Can’t see the forest for the trees: Analyzing groves to explain random forests

Gero Szepannek*, Björn Hergen von Holt

*Corresponding author for this work

Abstract

Random forests are currently among the most popular algorithms for supervised machine learning tasks. Because the resulting model aggregates many trees instead of a single one, it is no longer easy to understand and is often described as a black box. This paper is dedicated to the interpretability of random forest models using tree-based explanations. Two different concepts, namely most representative trees and surrogate trees, are analyzed with respect to both their ability to explain the model and their understandability for humans. For this purpose, explanation trees are further extended to groves, i.e., small forests of a few trees. The results of an application to three real-world data sets underline the inherent trade-off between both requirements. Using groves makes it possible to control the complexity of an explanation while simultaneously analyzing its explanatory power.
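To make the explanation concepts concrete: a surrogate tree is a simple tree trained to mimic the forest's predictions rather than the true labels, and a grove generalizes this to a small forest of a few shallow trees. The following is a minimal sketch, assuming scikit-learn; the data set, the depth and size settings, and the agreement-based fidelity measure are illustrative assumptions, not the paper's exact procedure.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Black-box model: a full random forest.
forest = RandomForestClassifier(n_estimators=500, random_state=0)
forest.fit(X_train, y_train)

# Surrogate tree: one shallow tree trained on the forest's predictions.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, forest.predict(X_train))

# Fidelity: how often the surrogate agrees with the forest on unseen data.
fidelity = accuracy_score(forest.predict(X_test), surrogate.predict(X_test))
print(f"surrogate tree fidelity: {fidelity:.3f}")

# Grove (in the sense sketched above): a small forest of few shallow trees,
# a middle ground between a single tree and the full forest.
grove = RandomForestClassifier(n_estimators=5, max_depth=3, random_state=0)
grove.fit(X_train, forest.predict(X_train))
grove_fidelity = accuracy_score(forest.predict(X_test), grove.predict(X_test))
print(f"grove fidelity:          {grove_fidelity:.3f}")
```

Increasing the number of trees in the grove typically raises fidelity to the black-box forest at the cost of a more complex explanation, which is the trade-off the abstract describes.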

Original language: English
Journal: Behaviormetrika
Volume: 51
Issue number: 1
Pages (from-to): 411-423
Number of pages: 13
ISSN: 0385-7417
DOIs
Publication status: Published - 01.2024
Externally published: Yes
