TY - JOUR
T1 - Misidentification of MLL3 and other mutations in cancer due to highly homologous genomic regions
AU - Bowler, Timothy G.
AU - Pradhan, Kith
AU - Kong, Yu
AU - Bartenstein, Matthias
AU - Morrone, Kerry A.
AU - Sridharan, Ashwin
AU - Kessel, Rachel M.
AU - Shastri, Aditi
AU - Giricz, Orsi
AU - Bhagat, Tushar D.
AU - Gordon-Mitchell, Shanisha
AU - Rohanizadegan, Mersedeh
AU - Hooda, Lauren
AU - Datt, Ishan
AU - Przychodzen, Bartlomiej P.
AU - Parmar, Simrit
AU - Maqbool, Shahina
AU - Maciejewski, Jaroslaw P.
AU - Steidl, Ulrich
AU - Greally, John M.
AU - Verma, Amit
N1 - Publisher Copyright: © 2019 Informa UK Limited, trading as Taylor & Francis Group.
PY - 2019/11/10
Y1 - 2019/11/10
N2 - The MLL3 gene has been shown to be recurrently mutated in many malignancies, including in families with acute myeloid leukemia. We demonstrate that many MLL3 variant calls made by exome sequencing are false positives due to misalignment to homologous regions, including a region on chr21, and can only be validated by long-range PCR. Numerous other recurrently mutated genes reported in the COSMIC and TCGA databases have pseudogenes and also cannot be validated by conventional short-read sequencing approaches. Genome-wide identification of pseudogene regions demonstrates that the frequency of these homologous regions increases with sequencing read lengths below 200 bp. To enable identification of poor-quality sequencing variants in prospective studies, we generated novel genome-wide maps of regions with poor mappability that can be used in variant-calling algorithms. Taken together, our findings reveal that pseudogene regions are a source of false-positive mutations in cancers.
AB - The MLL3 gene has been shown to be recurrently mutated in many malignancies, including in families with acute myeloid leukemia. We demonstrate that many MLL3 variant calls made by exome sequencing are false positives due to misalignment to homologous regions, including a region on chr21, and can only be validated by long-range PCR. Numerous other recurrently mutated genes reported in the COSMIC and TCGA databases have pseudogenes and also cannot be validated by conventional short-read sequencing approaches. Genome-wide identification of pseudogene regions demonstrates that the frequency of these homologous regions increases with sequencing read lengths below 200 bp. To enable identification of poor-quality sequencing variants in prospective studies, we generated novel genome-wide maps of regions with poor mappability that can be used in variant-calling algorithms. Taken together, our findings reveal that pseudogene regions are a source of false-positive mutations in cancers.
KW - AML
KW - MLL3
KW - pseudogenes
UR - http://www.scopus.com/inward/record.url?scp=85068638138&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85068638138&partnerID=8YFLogxK
U2 - 10.1080/10428194.2019.1630620
DO - 10.1080/10428194.2019.1630620
M3 - Article
C2 - 31288594
AN - SCOPUS:85068638138
SN - 1042-8194
VL - 60
SP - 3132
EP - 3137
JO - Leukemia and Lymphoma
JF - Leukemia and Lymphoma
IS - 13
ER -