MDI
General Overview: Mean Decrease Impurity (MDI) is a technique used to assess feature importance in decision tree-based models, such as Random Forests. It measures the importance of a feature as the total reduction in impurity (e.g., Gini impurity or entropy) achieved by splits on that feature, summed across all the trees in the ensemble.
Key Concepts:
Impurity: A metric used to measure the homogeneity of nodes in a decision tree. Common impurity measures include Gini impurity and entropy.
Decision Trees: Models that split data into subsets based on feature values, aiming to reduce impurity at each split.
Ensemble Methods: Techniques like Random Forests, which build multiple decision trees and aggregate their results.
Feature Splits: At each node of a decision tree, the feature (and threshold) that most reduces impurity is chosen for the split.
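The two impurity measures named above can be sketched directly; this is a minimal illustration, not tied to any particular library's internals:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Shannon entropy: -sum(p_k * log2(p_k)) over the class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# A pure node has zero impurity; a 50/50 two-class node is maximally impure.
print(gini([0, 0, 1, 1]))     # 0.5
print(entropy([0, 0, 1, 1]))  # 1.0
print(gini([1, 1, 1, 1]))     # 0.0
```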
Procedure:
Train Decision Trees: Build decision trees as part of an ensemble model (e.g., Random Forest).
Calculate Impurity Reduction: For each split on a feature, compute the impurity decrease, weighted by the fraction of training samples reaching that node.
Aggregate Importance: Sum the impurity reductions for each feature across all trees in the model.
Normalize Importance: Normalize these sums to get relative feature importance scores, often represented as percentages.
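The four steps above are what scikit-learn performs internally when it exposes the feature_importances_ attribute; a minimal sketch using the built-in iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ aggregates each tree's impurity reductions and
# normalizes the result so the scores sum to 1.
for name, score in zip(load_iris().feature_names, model.feature_importances_):
    print(f"{name}: {score:.3f}")
```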
Applications:
Feature Ranking: Identifying the most important features for making predictions.
Model Interpretation: Understanding the contribution of each feature in the decision-making process of the model.
Feature Selection: Selecting the most influential features to reduce model complexity and improve performance.
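For the feature-selection use case, scikit-learn's SelectFromModel can filter features by their MDI scores; a minimal sketch (the "median" threshold is one arbitrary choice, roughly halving the feature set):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_breast_cancer(return_X_y=True)
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    threshold="median",  # keep only features scoring at or above the median importance
).fit(X, y)

X_reduced = selector.transform(X)
print(X.shape, "->", X_reduced.shape)  # roughly half the columns survive
```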
Advantages:
Efficiency: Fast to compute since it is derived from the tree-building process.
Built-in Method: Naturally integrated into tree-based models, requiring no model evaluations beyond the training run itself.
Interpretability: Provides clear and intuitive insights into which features are most influential.
Disadvantages:
Bias Towards High Cardinality: Features with many unique values (high cardinality) may appear more important because they offer more candidate split points and can therefore be used more often, even when they carry no real signal.
Overfitting Risk: In trees that are too deep or not pruned properly, MDI can reflect noise rather than true importance.
Correlation Sensitivity: Like permutation importance, MDI can be misleading when features are highly correlated, as the credit for a split may be divided arbitrarily among the correlated features.
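The high-cardinality bias can be demonstrated with synthetic data: a pure-noise continuous feature (many unique values) accrues substantial MDI importance next to a genuinely predictive binary feature. This is an illustrative sketch; exact scores depend on the random seed and forest settings:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000
signal = rng.integers(0, 2, n)                    # binary, genuinely predictive
noise = rng.random(n)                             # continuous noise, ~n unique values
y = (signal + (rng.random(n) < 0.3)).clip(0, 1)   # label mostly follows `signal`

X = np.column_stack([signal, noise])
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# The binary feature can split each path only once; the noise feature is
# reused in many deep splits and picks up importance despite being random.
print(model.feature_importances_)
```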
Comparison with Permutation Importance:
Computation Method: MDI uses the tree-building process to measure importance, while permutation importance relies on shuffling feature values and observing performance changes.
Model Dependency: MDI is specific to tree-based models, whereas permutation importance is model-agnostic.
Bias: MDI can be biased towards high-cardinality features, while permutation importance is more robust to this bias but more computationally intensive.
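The contrast above can be seen side by side with scikit-learn: MDI comes free with training, while permutation importance requires extra model evaluations on held-out data (a minimal sketch):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

mdi = model.feature_importances_                  # derived from training impurities
perm = permutation_importance(model, X_te, y_te,  # derived from test-set shuffles
                              n_repeats=10, random_state=0)
for i, name in enumerate(load_iris().feature_names):
    print(f"{name}: MDI={mdi[i]:.3f}  perm={perm.importances_mean[i]:.3f}")
```

Because permutation importance is computed on held-out data, it is less prone to rewarding features the trees merely overfit on.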
Conclusion: Mean Decrease Impurity is a useful and efficient method for determining feature importance in decision tree-based models. While it provides valuable insights into feature contributions, users should be aware of its potential biases and limitations, especially regarding high-cardinality features and feature correlations. Properly understanding and addressing these issues can help in making more informed decisions about feature selection and model interpretation.