## Abstract

Causal diagrams, also known as directed acyclic graphs [1,2], provide an entirely graphical yet mathematically rigorous methodology for minimizing bias in epidemiologic studies [3,4]. Analyzing causal diagrams by hand can be cumbersome in practice, and the task lends itself well to automation by a computer program. Important first steps in this regard include the development of the DAG program by Knüppel and Stang [5] and dagR by Breitling [6]. We announce the release of DAGitty, which provides a graphical user interface tailored to drawing and analyzing causal diagrams. DAGitty overcomes some performance obstacles (pointed out by Breitling [6]) that affect earlier software when analyzing large diagrams.

The performance issues are twofold. First, previous software employed backtracking algorithms [5] to enumerate and categorize all paths from exposure to outcome. This is a reasonable approach for small diagrams, but diagrams with tens of variables can already contain millions of paths. A full listing is of little interest to the human user, yet it can take hours or days to generate. Instead of a path list, DAGitty identifies the subdiagrams involved in causal and biasing paths and highlights them in different colors. This highlighting algorithm [7] scales to very large diagrams. It gives a vivid impression of how causal and biasing effects “flow” in the diagram, that is, which variables and causal arrows mediate these effects.
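To illustrate how quickly path counts grow, the following minimal sketch counts the directed paths between two variables without listing them. It is not DAGitty's highlighting algorithm; the function names and the "complete" test diagram (every earlier variable points to every later one) are assumptions made for illustration, and only directed paths are counted, so biasing paths would add to these numbers.

```python
from functools import lru_cache


def count_directed_paths(children, source, target):
    """Count directed paths from source to target in a DAG without listing them."""

    @lru_cache(maxsize=None)
    def paths_from(node):
        # Exactly one (empty) path leads from the target to itself.
        if node == target:
            return 1
        # Otherwise, sum the counts over all outgoing arrows.
        return sum(paths_from(child) for child in children.get(node, ()))

    return paths_from(source)


def complete_dag(n):
    """Hypothetical dense diagram: variable i points to every variable j > i."""
    return {i: tuple(range(i + 1, n)) for i in range(n)}


if __name__ == "__main__":
    for n in (5, 12, 22, 32):
        # In this diagram each count equals 2 ** (n - 2).
        print(n, count_directed_paths(complete_dag(n), 0, n - 1))
```

With 22 variables this dense diagram already contains more than a million directed paths from the first variable to the last. The memoized count finishes immediately, whereas a program that writes out every path has to do work proportional to the number of paths.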

The second problem with previous software arises when identifying minimally sufficient adjustment sets (MSA sets). According to causal diagram theory, adjustment for the covariates in an MSA set minimizes bias when estimating the total effect of exposure on outcome. A straightforward way to find MSA sets is to check each covariate set to see whether it is an MSA set. In a diagram with 50 covariates, this means that 2^50 sets may have to be tested, a 16-digit number that is too large even for computers. To identify MSA sets more efficiently, we adapted an algorithm proposed recently for a related graph-theoretical problem [8]. This algorithm is guaranteed to output the list of MSA sets reasonably quickly (ie, in polynomial time per MSA set output). Note, however, that very large or very regularly structured diagrams could in theory have millions of different MSA sets. If such diagrams become practically relevant, further research will be necessary to develop computational methods that help the user choose among them.
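The straightforward approach described above can be sketched as follows. This is the brute-force search, not the more efficient algorithm adapted for DAGitty. As assumptions for illustration, sufficiency is tested with Pearl's back-door criterion (a sufficient condition for a valid adjustment set), d-separation is checked on the moralized ancestral graph, and the function names and the example diagram are hypothetical.

```python
from itertools import combinations


def reachable(start, neighbours, blocked=frozenset()):
    """Return all nodes reachable from start, never entering a blocked node."""
    seen, stack = set(start), list(start)
    while stack:
        for nxt in neighbours.get(stack.pop(), ()):
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def d_separated(edges, x, y, z):
    """Test whether the set z d-separates x from y in the DAG given by edges."""
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    # Restrict to x, y, z and their ancestors (the ancestral graph).
    keep = reachable({x, y} | set(z), parents)
    # Moralize: connect each node to its parents and its parents to each other.
    moral = {v: set() for v in keep}
    for v in keep:
        ps = parents.get(v, set())
        for p in ps:
            moral[v].add(p)
            moral[p].add(v)
        for p, q in combinations(ps, 2):
            moral[p].add(q)
            moral[q].add(p)
    # z separates x and y if y cannot be reached once z is removed.
    return y not in reachable({x}, moral, blocked=set(z))


def minimal_backdoor_sets(edges, exposure, outcome):
    """Brute force: test covariate subsets in order of increasing size."""
    children = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)
    nodes = {v for edge in edges for v in edge}
    descendants = reachable({exposure}, children) - {exposure}
    candidates = sorted(nodes - {exposure, outcome} - descendants)
    # Back-door graph: drop every arrow that leaves the exposure.
    backdoor = [(a, b) for a, b in edges if a != exposure]
    found = []
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            z = set(subset)
            if any(m <= z for m in found):
                continue  # contains a smaller sufficient set, so not minimal
            if d_separated(backdoor, exposure, outcome, z):
                found.append(z)
    return found


# Hypothetical diagram: Z confounds E and D and is also a collider between A and B.
example_edges = [("A", "E"), ("A", "Z"), ("B", "Z"), ("B", "D"),
                 ("Z", "E"), ("Z", "D"), ("E", "D")]

if __name__ == "__main__":
    print(minimal_backdoor_sets(example_edges, "E", "D"))
    print(2 ** 50, len(str(2 ** 50)))  # number of subsets of 50 covariates
```

In this small example, adjusting for Z alone is not sufficient because it opens the path through A and B, so the search reports the two sets {A, Z} and {B, Z}. The loop visits up to 2^n subsets of n candidate covariates, which is why this approach stops being practical long before 50 covariates.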

The described algorithms enable DAGitty's graphical interface to instantly reflect changes made to the diagram, such as adding a new arrow or inverting an arrow with unclear causal direction. This way, users can interactively assess the effects of their modifications on minimally sufficient adjustment sets and the flow of causal and biasing effects. We anticipate that these interactive possibilities will help users to develop an intuition about causal diagram theory, and to compare and decide among various causal diagrams.

DAGitty is available under an open-source license, allowing free access, redistribution, and modification. It runs out of the box in most modern web browsers and is available for online use and download at: www.dagitty.net.

Original language | English |
---|---|
Journal | Epidemiology |
Volume | 22 |
Issue number | 5 |
Number of pages | 1 |
ISSN | 1044-3983 |
Publication status | Published - 09.2011 |