Now that ChatGPT exists, it feels like everything is already well summarized, and my own summaries can hardly beat the AI's. For example, I asked about feature selection methods, and the AI answered:
Feature selection is the process of choosing a subset of relevant features (variables, predictors) for use in model construction. It can improve model performance, reduce overfitting, and enhance interpretability. Feature selection methods fall into three main categories:
1. Filter Methods
These use statistical techniques to evaluate the relationship between each feature and the target variable, independent of any machine learning algorithm.
Examples:
- Correlation coefficient: Pearson/Spearman correlation with the target.
- Chi-square test: For categorical variables and classification.
- Mutual Information: Measures dependency between feature and target.
- ANOVA F-test: For continuous features and categorical target.
- Variance Threshold: Removes features with low variance.
Pros: Fast and computationally efficient.
Cons: Ignores feature interactions.
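The filter idea above can be sketched in a few lines. Below is a minimal pure-Python variance-threshold filter; the toy matrix and the default threshold of 0 are invented for illustration, not taken from any library:

```python
def variance_threshold(X, threshold=0.0):
    """Filter method: keep indices of columns whose variance exceeds `threshold`."""
    n = len(X)
    kept = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        if var > threshold:
            kept.append(j)
    return kept

# Toy data: column 0 is constant, so it carries no information.
X = [
    [1.0, 0.0, 3.1],
    [1.0, 0.5, 2.9],
    [1.0, 1.0, 3.0],
]
print(variance_threshold(X))  # → [1, 2]: the constant column is dropped
```

Note that the target variable is never consulted, which is exactly why filter methods are fast but blind to feature interactions.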
2. Wrapper Methods
These use a predictive model to evaluate feature subsets and select the best-performing one.
Examples:
- Recursive Feature Elimination (RFE): Recursively removes the least important feature.
- Sequential Feature Selection (SFS):
  - Forward Selection: Start with none, add features.
  - Backward Elimination: Start with all, remove features.
Pros: Takes interactions into account.
Cons: Computationally expensive, prone to overfitting on small data.
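To make the wrapper idea concrete, here is a minimal sketch of forward selection wrapped around a 1-nearest-neighbour classifier scored by leave-one-out accuracy. The model choice and the tiny data set are assumptions for illustration only:

```python
def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of 1-NN using only the feature indices in `feats`."""
    correct = 0
    for i in range(len(X)):
        best, pred = None, None
        for j in range(len(X)):
            if j == i:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in feats)
            if best is None or d < best:
                best, pred = d, y[j]
        correct += pred == y[i]
    return correct / len(X)

def forward_selection(X, y, k):
    """Wrapper method: greedily add the feature that most improves the model."""
    selected = []
    while len(selected) < k:
        best_f, best_s = None, -1.0
        for f in range(len(X[0])):
            if f in selected:
                continue
            s = loo_accuracy(X, y, selected + [f])
            if s > best_s:
                best_f, best_s = f, s
        selected.append(best_f)
    return selected

# Toy data: feature 1 separates the classes, feature 0 is noise.
X = [[0.9, 0.0], [0.1, 0.1], [0.8, 1.0], [0.2, 1.1]]
y = [0, 0, 1, 1]
print(forward_selection(X, y, 1))  # → [1]: picks the informative feature
```

Every candidate subset requires retraining and re-scoring the model, which is where the computational cost of wrapper methods comes from.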
3. Embedded Methods
These perform feature selection during the model training process.
Examples:
- Lasso Regression (L1 regularization): Can shrink some coefficients exactly to zero, removing those features.
- Ridge Regression (L2): Shrinks coefficients, but does not eliminate them.
- Elastic Net: Combines L1 and L2.
- Tree-based models (e.g., Random Forest, XGBoost): Provide feature importance metrics.
Pros: Efficient and model-aware.
Cons: Feature importance may vary across models.
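The embedded idea is visible in how Lasso's L1 penalty zeroes out weights during fitting. Below is a minimal coordinate-descent sketch with soft thresholding; the toy data (where y depends only on feature 0) and the penalty value lam=0.5 are made up for illustration:

```python
def lasso_cd(X, y, lam, iters=200):
    """Embedded method sketch: Lasso via coordinate descent with soft thresholding."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Prediction using every feature except j.
                pred = sum(w[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred)
                z += X[i][j] ** 2
            # Soft thresholding: small correlations are snapped to exactly zero.
            if rho > lam:
                w[j] = (rho - lam) / z
            elif rho < -lam:
                w[j] = (rho + lam) / z
            else:
                w[j] = 0.0
    return w

# Toy data: y = 2 * x0; feature 1 is noise.
X = [[1.0, 0.1], [2.0, -0.2], [3.0, 0.1], [4.0, -0.1]]
y = [2.0, 4.0, 6.0, 8.0]
w = lasso_cd(X, y, lam=0.5)
print(w)  # feature 1's weight is driven exactly to 0
```

Selection happens inside the fit itself: the noise feature ends with a weight of exactly zero, so no separate selection pass is needed.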
So is it still worth organizing notes on my own?