Abstract
The random forest (RF) algorithm is known for its predictive performance but has been criticized for its lack of interpretability, which stems from its complex ensemble nature. To address this explainability problem, our study questions the traditional approach of using most representative trees (MRTs) to simplify RF interpretation, and highlights the potential for misinterpretation caused by non-informative early splits. To overcome these limitations, we propose a new method that constructs artificial representative trees (ARTs) with a greedy algorithm: the tree is built iteratively so as to minimize its distance to the RF ensemble, thereby preserving the predictive performance of the RF. We give a detailed description of the methodological framework for ART construction, including strategies for reducing computational complexity through variable preselection and quantile-based splitting. Results from extensive simulations demonstrate that ARTs reflect the RF's predictive performance more accurately than MRTs and substantially reduce the false discovery rate, thus offering a more reliable interpretative model. The findings suggest that ARTs represent an advance in addressing the interpretation of RF models.
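The greedy construction outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance measure, so the sketch assumes the distance is the squared difference between the tree's leaf means and the RF's predictions on a reference data set. The function names (`build_tree`, `candidate_thresholds`, `predict`), the parameters (`max_depth`, `n_quantiles`, `min_leaf`), and the toy data are all hypothetical. The `features` argument stands in for variable preselection, and `candidate_thresholds` stands in for quantile-based splitting.

```python
from statistics import mean, quantiles


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def candidate_thresholds(column, n_quantiles):
    """Quantile-based split points: a small grid instead of every observed value."""
    if len(set(column)) < 2:
        return []
    return sorted(set(quantiles(column, n=n_quantiles)))


def build_tree(X, rf_pred, features, depth=0, max_depth=2, n_quantiles=4, min_leaf=2):
    """Greedily grow a tree whose leaf means approximate the forest predictions.

    X is a list of rows, rf_pred holds the forest's prediction for each row
    (assumed to be computed elsewhere), and features lists the preselected
    column indices that may be used for splitting.
    """
    node = {"value": mean(rf_pred)}
    if depth >= max_depth or len(rf_pred) < 2 * min_leaf:
        return node

    best = None  # (loss, feature index, threshold)
    for j in features:
        column = [row[j] for row in X]
        for t in candidate_thresholds(column, n_quantiles):
            left = [p for row, p in zip(X, rf_pred) if row[j] <= t]
            right = [p for row, p in zip(X, rf_pred) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            loss = sse(left) + sse(right)  # assumed distance to the forest
            if best is None or loss < best[0]:
                best = (loss, j, t)

    # Stop if no admissible split brings the tree closer to the forest.
    if best is None or best[0] >= sse(rf_pred):
        return node

    _, j, t = best
    left_idx = [i for i, row in enumerate(X) if row[j] <= t]
    right_idx = [i for i, row in enumerate(X) if row[j] > t]
    kwargs = dict(depth=depth + 1, max_depth=max_depth,
                  n_quantiles=n_quantiles, min_leaf=min_leaf)
    node["feature"] = j
    node["threshold"] = t
    node["left"] = build_tree([X[i] for i in left_idx],
                              [rf_pred[i] for i in left_idx], features, **kwargs)
    node["right"] = build_tree([X[i] for i in right_idx],
                               [rf_pred[i] for i in right_idx], features, **kwargs)
    return node


def predict(node, row):
    """Route a row down the tree and return the leaf value."""
    while "feature" in node:
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["value"]


# Toy data (made up for illustration): column 1 drives the forest predictions,
# column 0 is uninformative.
X = [[0.1, 1.0], [0.9, 2.0], [0.4, 3.0], [0.6, 4.0],
     [0.2, 5.0], [0.8, 6.0], [0.3, 7.0], [0.7, 8.0]]
rf_pred = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0]
tree = build_tree(X, rf_pred, features=[0, 1])
```

On this toy data the root split falls on the informative column rather than the uninformative one, which is the behaviour the abstract contrasts with non-informative early splits in MRTs. Restricting the search to a few quantiles per preselected variable keeps the number of candidate splits small at every node.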
| Original language | English |
|---|---|
| Title of host publication | Communications in Computer and Information Science |
| Publication date | 2024 |
| Publication status | Published - 2024 |