TY - JOUR
T1 - Profiling human pathogenic repeat expansion regions by synergistic and multi-level impacts on molecular connections
AU - Fan, Cong
AU - Chen, Ken
AU - Wang, Yukai
AU - Ball, Edward V.
AU - Stenson, Peter D.
AU - Mort, Matthew
AU - Bacolla, Albino
AU - Kehrer-Sawatzki, Hildegard
AU - Tainer, John A.
AU - Cooper, David N.
AU - Zhao, Huiying
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2023/2
Y1 - 2023/2
AB - Whilst DNA repeat expansions cause numerous heritable human disorders, their origins and underlying pathological mechanisms are often unclear. We collated a dataset comprising 224 human repeat expansions encompassing 203 different genes, and performed a systematic analysis with respect to key topological features at the DNA, RNA and protein levels. Comparison with controls without known pathogenicity and genomic regions lacking repeats allowed the construction of the first tool to discriminate repeat regions harboring pathogenic repeat expansions (DPREx). At the DNA level, pathogenic repeat expansions exhibited stronger signals for DNA regulatory factors (e.g. H3K4me3, transcription factor-binding sites) in exons, promoters, 5′UTRs and 5′genes but were not significantly different from controls in introns, 3′UTRs and 3′genes. Additionally, pathogenic repeat expansions were found to be enriched in non-B DNA structures. At the RNA level, pathogenic repeat expansions were characterized by lower free energy for forming RNA secondary structure and were closer to splice sites in introns, exons, promoters and 5′genes than controls. At the protein level, pathogenic repeat expansions exhibited a preference to form coil rather than other types of secondary structure, and tended to encode surface-located protein domains. Guided by these features, DPREx (http://biomed.nscc-gz.cn/zhaolab/geneprediction/#) achieved an Area Under the Curve (AUC) value of 0.88 in a test on an independent dataset. Pathogenic repeat expansions are thus located such that they exert a synergistic influence on the gene expression pathway involving inter-molecular connections at the DNA, RNA and protein levels.
UR - http://www.scopus.com/inward/record.url?scp=85141361042&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85141361042&partnerID=8YFLogxK
U2 - 10.1007/s00439-022-02500-6
DO - 10.1007/s00439-022-02500-6
M3 - Article
C2 - 36344696
AN - SCOPUS:85141361042
SN - 0340-6717
VL - 142
SP - 245
EP - 274
JO - Human Genetics
JF - Human Genetics
IS - 2
ER -