Abstract
Double machine learning (DML) has become an increasingly popular tool for automated variable selection in high-dimensional settings. Although the ability to handle a large number of potential covariates can make selection-on-observables assumptions more plausible, it also raises the risk that endogenous variables are included, which would violate conditional independence. This article demonstrates that DML is very sensitive to the inclusion of even a few "bad controls" in the covariate space. The resulting bias varies with the nature of the theoretical causal model, which raises concerns about the feasibility of selecting control variables in a data-driven way.
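The collider case named in the keywords can be illustrated with a small simulation. The following is a minimal sketch, not the article's own simulation design: it assumes a linear data-generating process, uses ordinary least squares in place of the machine-learning nuisance estimators, and omits cross-fitting. All names (`partial_out_effect`, `collider`, the coefficient values) are hypothetical and chosen only for illustration.

```python
import numpy as np


def partial_out_effect(y, d, controls):
    """Residual-on-residual estimate of the effect of d on y.

    Both y and d are linearly projected on the controls (plus an intercept),
    and the residual of y is regressed on the residual of d. OLS is used here
    as a simple stand-in for the flexible learners used in DML (assumption).
    """
    X = np.column_stack([np.ones(len(y)), controls])
    res_y = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    res_d = d - X @ np.linalg.lstsq(X, d, rcond=None)[0]
    return float(res_d @ res_y / (res_d @ res_d))


rng = np.random.default_rng(0)
n = 20_000
true_effect = 1.0

# Good controls: confounders that affect both treatment and outcome.
x = rng.normal(size=(n, 5))
d = x @ np.full(5, 0.5) + rng.normal(size=n)
y = true_effect * d + x @ np.full(5, 0.5) + rng.normal(size=n)

# Bad control: a collider caused by both treatment and outcome.
collider = d + y + rng.normal(size=n)

est_good = partial_out_effect(y, d, x)
est_bad = partial_out_effect(y, d, np.column_stack([x, collider]))

print(f"true effect:            {true_effect:.2f}")
print(f"good controls only:     {est_good:.2f}")
print(f"with one collider:      {est_bad:.2f}")
```

Under these assumptions, the estimate that uses only the confounders should land near the true effect, while adding the single collider should pull the estimate far away from it. The size and direction of that distortion depend on the assumed causal structure, which is the point the abstract makes about bias varying with the theoretical model.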
Original language | English |
---|---|
Article number | 20220078 |
Number of pages | 12 |
Journal | Journal of Causal Inference |
Volume | 11 |
Issue number | 1 |
Publication status | Published - 23 May 2023 |
Keywords
- double/debiased machine learning
- bad controls
- backdoor adjustment
- collider bias
- causal hierarchy