Estimation of high-dimensional directed acyclic graphs with surrogate intervention

Min Jin Ha, Wei Sun

Research output: Contribution to journal › Article › peer-review

Abstract

Directed acyclic graphs (DAGs) are used to describe causal relationships between variables. The standard approach to determining such relationships relies on interventional data. For complex systems with high-dimensional data, however, interventional data are often unavailable, so it is desirable to estimate causal structure from observational data without subjecting the variables to interventions. Observational data alone can be used to estimate the skeleton of a DAG and the directions of only a limited number of edges. We develop a Bayesian framework that estimates a DAG using surrogate interventional data: the interventions are applied to a set of external variables and are therefore treated as surrogate interventions on the variables of interest. Our work is motivated by expression quantitative trait locus (eQTL) studies, where the variables of interest are gene expression levels, the external variables are DNA variants, and the interventions on DNA variants occur when a randomly selected allele is passed to a child from either parent. Our method, surrogate intervention recovery of a DAG ($\texttt{sirDAG}$), first constructs a DAG skeleton using penalized regressions followed by partial correlation tests, and then estimates the posterior probabilities of all edge directions after incorporating DNA variant data. We demonstrate the utility of $\texttt{sirDAG}$ through simulation and an application to an eQTL study of 550 breast cancer patients.
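
As a rough illustration of two ideas mentioned in the abstract, the toy sketch below simulates a chain G -> X1 -> X2 -> X3, where G plays the role of a DNA variant and X1 to X3 play the role of gene expression levels. It is not the $\texttt{sirDAG}$ implementation: it omits the penalized regressions and the Bayesian posterior entirely, and every name, coefficient, and threshold in it is an assumption made for illustration. It only shows (a) how a partial correlation test can drop an edge between two variables that are connected only through a third, and (b) why an external variable that acts on X1 can hint at the direction of the X1 to X2 edge.

```python
import math
import random


def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given a single variable z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def fisher_p(r, n, k):
    """Two-sided p-value from Fisher's z-transform; k = size of conditioning set."""
    z = math.atanh(r) * math.sqrt(n - k - 3)
    return math.erfc(abs(z) / math.sqrt(2))


# Simulated data (all coefficients are illustrative assumptions).
rng = random.Random(0)
n = 2000
# Toy genotype: number of minor alleles (0, 1, or 2), each inherited with prob. 0.5.
g = [int(rng.random() < 0.5) + int(rng.random() < 0.5) for _ in range(n)]
x1 = [0.8 * gi + rng.gauss(0, 1) for gi in g]
x2 = [0.8 * a + rng.gauss(0, 1) for a in x1]
x3 = [0.8 * b + rng.gauss(0, 1) for b in x2]
expr = [x1, x2, x3]

# (a) Skeleton step: keep edge i - j only if the partial correlation given the
# remaining expression variable is significantly different from zero.
alpha = 0.01
skeleton = set()
pcors = {}
for i in range(3):
    for j in range(i + 1, 3):
        k = 3 - i - j  # index of the remaining variable
        r = partial_corr(expr[i], expr[j], expr[k])
        pcors[(i, j)] = r
        if fisher_p(r, n, 1) < alpha:
            skeleton.add((i, j))

# (b) Orientation hint: G acts on X1. If X1 -> X2, G should also be associated
# with X2; if instead X2 -> X1, G and X2 would be marginally independent.
r_g_x2 = corr(g, x2)
p_g_x2 = fisher_p(r_g_x2, n, 0)

print("skeleton edges (0-based indices):", sorted(skeleton))
print("partial correlations:", {e: round(r, 3) for e, r in pcors.items()})
print("corr(G, X2) =", round(r_g_x2, 3), "p =", p_g_x2)
```

In this toy setting the partial correlation between X1 and X3 given X2 should be close to zero, while the association between G and X2 is consistent with the direction X1 -> X2. The method described in the abstract instead estimates posterior probabilities of edge directions rather than relying on a single marginal test.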

Original language: English (US)
Pages (from-to): 659-675
Number of pages: 17
Journal: Biostatistics
Volume: 21
Issue number: 4
State: Published - Oct 1 2020

Keywords

  • Directed acyclic graphs
  • Surrogate intervention
  • eQTL

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

MD Anderson CCSG core facilities

  • Biostatistics Resource Group
