Latent Network Estimation and Variable Selection for Compositional Data Via Variational EM

Nathan Osborne, Christine B. Peterson, Marina Vannucci

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

Network estimation and variable selection have been extensively studied in the statistical literature, but only recently have these two challenges been addressed simultaneously. In this article, we develop a novel method to simultaneously estimate network interactions and associations with relevant covariates for count data, and specifically for compositional data, which are subject to a fixed-sum constraint. We use a hierarchical Bayesian model with latent layers and employ spike-and-slab priors for both edge and covariate selection. For posterior inference, we develop a novel variational inference scheme with an expectation–maximization step to enable efficient estimation. Through simulation studies, we demonstrate that the proposed model outperforms existing methods in the accuracy of network recovery. We show the practical utility of our model via an application to microbiome data. The human microbiome has been shown to contribute to many functions of the human body and to be linked with a number of diseases. In our application, we seek to better understand the interactions between microbes and relevant covariates, as well as the interactions of microbes with each other. We call our algorithm simultaneous inference for networks and covariates and provide a Python implementation, which is available online.
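
The spike-and-slab idea behind the edge and covariate selection can be illustrated in a much simpler setting than the article's hierarchical model. The sketch below is a hypothetical toy, not the authors' model or their variational EM scheme: it assumes a single regression coefficient in y = x * beta + noise with known Gaussian noise variance, a point-mass "spike" at zero, and a Gaussian "slab" N(0, slab_var). The function name inclusion_probability and its parameters are illustrative assumptions. It computes the posterior probability that the coefficient is nonzero by comparing the closed-form marginal likelihoods under the slab and the spike.

```python
import math


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def inclusion_probability(x, y, prior_incl=0.5, slab_var=1.0, noise_var=1.0):
    """Hypothetical toy: posterior P(beta != 0 | y) for y = x * beta + noise,
    with spike beta = 0 and slab beta ~ N(0, slab_var), noise ~ N(0, noise_var)."""
    if not 0.0 < prior_incl < 1.0:
        raise ValueError("prior_incl must lie strictly between 0 and 1")
    xtx = sum(xi * xi for xi in x)
    xty = sum(xi * yi for xi, yi in zip(x, y))
    # Log Bayes factor (slab vs. spike): y ~ N(0, noise_var*I + slab_var*x x^T)
    # versus y ~ N(0, noise_var*I), simplified with the matrix determinant
    # lemma and the Sherman-Morrison formula.
    log_bf = (-0.5 * math.log(1.0 + slab_var * xtx / noise_var)
              + 0.5 * slab_var * xty ** 2
              / (noise_var * (noise_var + slab_var * xtx)))
    log_prior_odds = math.log(prior_incl) - math.log(1.0 - prior_incl)
    # Posterior log-odds = prior log-odds + log Bayes factor.
    return _sigmoid(log_prior_odds + log_bf)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(inclusion_probability(x, [2.1, 3.9, 6.2, 8.0, 9.8]))   # strong linear signal
print(inclusion_probability(x, [0.3, -0.2, 0.1, -0.4, 0.2]))  # essentially noise
```

With a clear linear signal the inclusion probability is very close to 1, while the noise-only response gives roughly 0.12, below the prior value of 0.5. Working on the log-odds scale keeps the computation stable when the evidence is strong. The article instead embeds such selection indicators for both edges and covariates in a latent hierarchical model fitted by variational EM, which this toy does not attempt to reproduce.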

Original language: English (US)
Pages (from-to): 163-175
Number of pages: 13
Journal: Journal of Computational and Graphical Statistics
Volume: 31
Issue number: 1
State: Published - 2022

Keywords

  • Bayesian hierarchical model
  • Count data
  • EM algorithm
  • Graphical model
  • Microbiome data
  • Variational inference

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
  • Discrete Mathematics and Combinatorics

MD Anderson CCSG core facilities

  • Biostatistics Resource Group
