ProgPerm: Progressive permutation for a dynamic representation of the robustness of microbiome discoveries

Liangliang Zhang; Yushu Shi; Kim Anh Do; Christine B. Peterson; Robert R. Jenq

doi:10.1186/s12859-021-04061-3

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Background: Identifying differential features is a critical task in microbiome studies, and it is complicated by the high dimensionality and heterogeneity of microbial data. Masked by the complexity of the data, separating signals (features that differ between groups) from noise (features that do not differ between groups) becomes challenging. For instance, when performing differential abundance tests, multiple testing adjustments tend to be overconservative, because the probability of a type I error (false positive) increases dramatically with the large number of hypotheses. Moreover, the grouping effect of interest can be obscured by heterogeneity. These factors can lead to the incorrect conclusion that there are no differences in the microbiome compositions.

Results: We recast the problem of identifying features that differ in two-group comparisons (e.g., treatment versus control) as a dynamic layout that separates the signal from its random background. More specifically, we progressively permute the grouping factor labels of the microbiome samples and perform multiple differential abundance tests in each scenario. We then compare the signal strength of the most differential features from the original data with their performance in permutations, and observe a visually apparent decreasing trend if these features are true positives identified from the data. Simulations and applications on real data show that the proposed method creates a U-curve when plotting the number of significant features versus the proportion of mixing. The shape of the U-curve can convey the strength of the overall association between the microbiome and the grouping factor. We also define a fragility index to measure the robustness of the discoveries. Finally, we recommend the identified features by comparing p-values in the observed data with p-values in the fully mixed data.

Conclusions: We have developed the method into a user-friendly and efficient R-shiny tool with visualizations. By default, we use the Wilcoxon rank sum test to compute the p-values, since it is a robust nonparametric test. Our proposed method can also utilize p-values obtained from other testing methods, such as DESeq. This demonstrates the potential of the progressive permutation method to be extended to new settings.
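
The progressive permutation idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' R-shiny implementation. The mixing scheme (exchanging labels between an equal fraction of each group, so that a proportion of 0.5 is fully mixed and 1.0 is a complete label swap), the normal-approximation rank sum test without tie correction, the 0.05 threshold, the toy data, and all function names are assumptions made for illustration only.

```python
import math
import random


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank sum p-value (normal approximation, no tie correction)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # tied values share the average of ranks i+1..j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_x - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))


def mix_labels(labels, proportion, rng):
    """Exchange labels between a `proportion` of each group (assumed mixing scheme)."""
    group0 = [i for i, g in enumerate(labels) if g == 0]
    group1 = [i for i, g in enumerate(labels) if g == 1]
    k = round(proportion * min(len(group0), len(group1)))
    mixed = list(labels)
    for i in rng.sample(group0, k):
        mixed[i] = 1
    for i in rng.sample(group1, k):
        mixed[i] = 0
    return mixed


def count_significant(data, labels, alpha=0.05):
    """Number of features whose rank sum p-value falls below alpha."""
    count = 0
    for feature in data:  # data: one list of sample values per feature
        x = [v for v, g in zip(feature, labels) if g == 0]
        y = [v for v, g in zip(feature, labels) if g == 1]
        if rank_sum_pvalue(x, y) < alpha:
            count += 1
    return count


def progressive_permutation(data, labels, proportions, n_rep=20, alpha=0.05, seed=0):
    """Mean number of significant features at each mixing proportion."""
    rng = random.Random(seed)
    curve = []
    for p in proportions:
        counts = [count_significant(data, mix_labels(labels, p, rng), alpha)
                  for _ in range(n_rep)]
        curve.append((p, sum(counts) / n_rep))
    return curve


if __name__ == "__main__":
    rng = random.Random(42)
    labels = [0] * 10 + [1] * 10
    # Toy data: 5 "signal" features with fully separated groups, 25 pure-noise features.
    signal = [[float(f + j) for j in range(10)] + [100.0 + f + j for j in range(10)]
              for f in range(5)]
    noise = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(25)]
    proportions = [0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0]
    for p, mean_count in progressive_permutation(signal + noise, labels, proportions):
        print(f"mixing={p:.1f}  mean significant features={mean_count:.2f}")
```

Under this assumed scheme the two ends of the curve coincide, because a complete label swap leaves a two-sided test unchanged, while the signal features lose significance near a proportion of 0.5. That is one way the U-shape mentioned in the abstract could arise; the fragility index is not sketched here because the abstract does not define it.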

Original language	English (US)
Article number	126
Journal	BMC Bioinformatics
Volume	22
Issue number	1
DOIs	https://doi.org/10.1186/s12859-021-04061-3
State	Published - Dec 2021

Keywords

Differential test
Feature selection
Fragility index
Microbiome
Permutation
Robustness

ASJC Scopus subject areas

Structural Biology
Biochemistry
Molecular Biology
Computer Science Applications
Applied Mathematics

MD Anderson CCSG core facilities

Biostatistics Resource Group

Cite this

@article{2523034be008409c9234b218ce87ee7f,

title = "ProgPerm: Progressive permutation for a dynamic representation of the robustness of microbiome discoveries",

abstract = "Background: Identification of features is a critical task in microbiome studies that is complicated by the fact that microbial data are high dimensional and heterogeneous. Masked by the complexity of the data, the problem of separating signals (differential features between groups) from noise (features that are not differential between groups) becomes challenging and troublesome. For instance, when performing differential abundance tests, multiple testing adjustments tend to be overconservative, as the probability of a type I error (false positive) increases dramatically with the large numbers of hypotheses. Moreover, the grouping effect of interest can be obscured by heterogeneity. These factors can incorrectly lead to the conclusion that there are no differences in the microbiome compositions. Results: We translate and represent the problem of identifying differential features, which are differential in two-group comparisons (e.g., treatment versus control), as a dynamic layout of separating the signal from its random background. More specifically, we progressively permute the grouping factor labels of the microbiome samples and perform multiple differential abundance tests in each scenario. We then compare the signal strength of the most differential features from the original data with their performance in permutations, and will observe a visually apparent decreasing trend if these features are true positives identified from the data. Simulations and applications on real data show that the proposed method creates a U-curve when plotting the number of significant features versus the proportion of mixing. The shape of the U-Curve can convey the strength of the overall association between the microbiome and the grouping factor. We also define a fragility index to measure the robustness of the discoveries. Finally, we recommend the identified features by comparing p-values in the observed data with p-values in the fully mixed data. 
Conclusions: We have developed this into a user-friendly and efficient R-shiny tool with visualizations. By default, we use the Wilcoxon rank sum test to compute the p-values, since it is a robust nonparametric test. Our proposed method can also utilize p-values obtained from other testing methods, such as DESeq. This demonstrates the potential of the progressive permutation method to be extended to new settings.",

keywords = "Differential test, Feature selection, Fragility index, Microbiome, Permutation, Robustness",

author = "Zhang, Liangliang and Shi, Yushu and Do, {Kim Anh} and Peterson, {Christine B.} and Jenq, {Robert R.}",

note = "Funding Information: KAD is partially supported by MD Anderson Moon Shot Programs, Prostate Cancer SPORE P50CA140388, NIH/NCI CCSG Grant P30CA016672, CCTS 5UL1TR000371, and CPRIT RP160693 Grants. CBP is partially supported by NIH/NCI CCSG Grant P30CA016672 and MD Anderson Moon Shot Programs. RRJ is partially supported by NIH R01 HL124112 and CPRIT RR160089 Grants. The funders played no role in the design of the study, analysis of the data, or writing the manuscript. Publisher Copyright: {\textcopyright} 2021, The Author(s).",

year = "2021",

month = dec,

doi = "10.1186/s12859-021-04061-3",

language = "English (US)",

volume = "22",

journal = "BMC Bioinformatics",

issn = "1471-2105",

publisher = "BioMed Central",

number = "1",

}

TY - JOUR

T1 - ProgPerm

T2 - Progressive permutation for a dynamic representation of the robustness of microbiome discoveries

AU - Zhang, Liangliang

AU - Shi, Yushu

AU - Do, Kim Anh

AU - Peterson, Christine B.

AU - Jenq, Robert R.

N1 - Funding Information: KAD is partially supported by MD Anderson Moon Shot Programs, Prostate Cancer SPORE P50CA140388, NIH/NCI CCSG Grant P30CA016672, CCTS 5UL1TR000371, and CPRIT RP160693 Grants. CBP is partially supported by NIH/NCI CCSG Grant P30CA016672 and MD Anderson Moon Shot Programs. RRJ is partially supported by NIH R01 HL124112 and CPRIT RR160089 Grants. The funders played no role in the design of the study, analysis of the data, or writing the manuscript. Publisher Copyright: © 2021, The Author(s).

PY - 2021/12

Y1 - 2021/12

AB - Background: Identification of features is a critical task in microbiome studies that is complicated by the fact that microbial data are high dimensional and heterogeneous. Masked by the complexity of the data, the problem of separating signals (differential features between groups) from noise (features that are not differential between groups) becomes challenging and troublesome. For instance, when performing differential abundance tests, multiple testing adjustments tend to be overconservative, as the probability of a type I error (false positive) increases dramatically with the large numbers of hypotheses. Moreover, the grouping effect of interest can be obscured by heterogeneity. These factors can incorrectly lead to the conclusion that there are no differences in the microbiome compositions. Results: We translate and represent the problem of identifying differential features, which are differential in two-group comparisons (e.g., treatment versus control), as a dynamic layout of separating the signal from its random background. More specifically, we progressively permute the grouping factor labels of the microbiome samples and perform multiple differential abundance tests in each scenario. We then compare the signal strength of the most differential features from the original data with their performance in permutations, and will observe a visually apparent decreasing trend if these features are true positives identified from the data. Simulations and applications on real data show that the proposed method creates a U-curve when plotting the number of significant features versus the proportion of mixing. The shape of the U-Curve can convey the strength of the overall association between the microbiome and the grouping factor. We also define a fragility index to measure the robustness of the discoveries. Finally, we recommend the identified features by comparing p-values in the observed data with p-values in the fully mixed data. 
Conclusions: We have developed this into a user-friendly and efficient R-shiny tool with visualizations. By default, we use the Wilcoxon rank sum test to compute the p-values, since it is a robust nonparametric test. Our proposed method can also utilize p-values obtained from other testing methods, such as DESeq. This demonstrates the potential of the progressive permutation method to be extended to new settings.

KW - Differential test

KW - Feature selection

KW - Fragility index

KW - Microbiome

KW - Permutation

KW - Robustness

UR - http://www.scopus.com/inward/record.url?scp=85102703969&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85102703969&partnerID=8YFLogxK

U2 - 10.1186/s12859-021-04061-3

DO - 10.1186/s12859-021-04061-3

M3 - Article

C2 - 33731016

AN - SCOPUS:85102703969

SN - 1471-2105

VL - 22

JO - BMC Bioinformatics

JF - BMC Bioinformatics

IS - 1

M1 - 126

ER -
