Optimized filtering reduces the error rate in detecting genomic variants by short-read sequencing

Joke Reumers, Peter De Rijk, Hui Zhao, Anthony Liekens, Dominiek Smeets, John Cleary, Peter Van Loo, Maarten Van Den Bossche, Kirsten Catthoor, Bernard Sabbe, Evelyn Despierre, Ignace Vergote, Brian Hilbush, Diether Lambrechts, Jurgen Del-Favero

    Research output: Contribution to journal › Article › peer-review

    184 Scopus citations

    Abstract

    Distinguishing single-nucleotide variants (SNVs) from errors in whole-genome sequences remains challenging. Here we describe a set of filters, together with a freely accessible software tool, that selectively reduce error rates and thereby facilitate variant detection in data from two short-read sequencing technologies, Complete Genomics and Illumina. By sequencing the nearly identical genomes from monozygotic twins and considering shared SNVs as 'true variants' and discordant SNVs as 'errors', we optimized thresholds for 12 individual filters and assessed which of the 1,048 filter combinations were effective in terms of sensitivity and specificity. Cumulative application of all effective filters reduced the error rate by 290-fold, facilitating the identification of genetic differences between monozygotic twins. We also applied an adapted, less stringent set of filters to reliably identify somatic mutations in a highly rearranged tumor and to identify variants in the NA19240 HapMap genome relative to a reference set of SNVs.
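    The twin-based evaluation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' software tool. The `Call` record, its `depth` and `qual` fields, the two filters and their thresholds (`min_depth`, `min_qual`), the `effective` cutoff, and the toy calls are all hypothetical. SNVs shared by both twins count as true variants and discordant SNVs count as errors. Sensitivity is the fraction of baseline shared SNVs retained after filtering. Error reduction is measured on the discordant count, which assumes the number of callable positions does not change.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Call:
    """Hypothetical SNV call; fields are illustrative, not the tool's format."""
    chrom: str
    pos: int
    alt: str
    depth: int   # read depth at the site (assumed field)
    qual: float  # variant quality score (assumed field)


Key = Tuple[str, int, str]
Filter = Callable[[Call], bool]  # returns True if the call is KEPT


def apply_filters(calls: List[Call], filters: List[Filter]) -> Dict[Key, Call]:
    """Cumulatively apply filters: a call survives only if it passes all of them."""
    return {(c.chrom, c.pos, c.alt): c for c in calls if all(f(c) for f in filters)}


def twin_stats(twin_a: List[Call], twin_b: List[Call], filters: List[Filter]) -> Tuple[int, int]:
    """Return (shared, discordant) SNV counts after filtering both twins.

    Shared SNVs are treated as true variants, discordant SNVs as errors.
    """
    a = apply_filters(twin_a, filters)
    b = apply_filters(twin_b, filters)
    return len(a.keys() & b.keys()), len(a.keys() ^ b.keys())


def evaluate_combinations(twin_a, twin_b, named_filters: Dict[str, Filter]):
    """Score every non-empty filter combination (exhaustive, so only for small sets)."""
    base_shared, base_disc = twin_stats(twin_a, twin_b, [])
    names = sorted(named_filters)
    results = []
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            shared, disc = twin_stats(twin_a, twin_b, [named_filters[n] for n in combo])
            # Sensitivity: fraction of baseline shared SNVs that survive.
            sensitivity = shared / base_shared if base_shared else 0.0
            # Fraction of baseline discordant SNVs removed (negative if filtering
            # breaks up shared calls and creates new discordance).
            error_removed = 1 - disc / base_disc if base_disc else 0.0
            results.append((combo, sensitivity, error_removed))
    return results


def effective(results, min_sensitivity: float = 0.95):
    """Keep combinations that remove some errors without losing too many shared SNVs."""
    return [combo for combo, sens, removed in results if sens >= min_sensitivity and removed > 0]


def fold_reduction(twin_a, twin_b, filters: List[Filter]) -> float:
    """Baseline / filtered discordant count (assumes callable positions are unchanged)."""
    before = twin_stats(twin_a, twin_b, [])[1]
    after = twin_stats(twin_a, twin_b, filters)[1]
    return before / after if after else float("inf")


# Hypothetical thresholds and toy calls, for illustration only.
FILTERS: Dict[str, Filter] = {
    "min_depth": lambda c: c.depth >= 10,
    "min_qual": lambda c: c.qual >= 30.0,
}
TWIN_A = [
    Call("chr1", 100, "A", depth=30, qual=60.0),  # shared
    Call("chr1", 200, "G", depth=25, qual=55.0),  # shared
    Call("chr2", 300, "T", depth=4, qual=12.0),   # discordant, low depth and quality
    Call("chr3", 400, "C", depth=40, qual=8.0),   # discordant, low quality only
]
TWIN_B = [
    Call("chr1", 100, "A", depth=28, qual=58.0),  # shared
    Call("chr1", 200, "G", depth=22, qual=50.0),  # shared
    Call("chr2", 500, "A", depth=3, qual=20.0),   # discordant, low depth and quality
    Call("chr4", 600, "T", depth=35, qual=45.0),  # discordant, passes both filters
]

if __name__ == "__main__":
    results = evaluate_combinations(TWIN_A, TWIN_B, FILTERS)
    for combo, sens, removed in results:
        print(combo, f"sensitivity={sens:.2f}", f"errors_removed={removed:.2f}")
    print("effective:", effective(results))
    print("fold reduction:", fold_reduction(TWIN_A, TWIN_B, list(FILTERS.values())))
```

    Because `evaluate_combinations` enumerates every non-empty subset, its cost grows as 2**n - 1 in the number of filters, so this exhaustive loop is only practical for a handful of filters. The `min_sensitivity` cutoff in `effective` is likewise an arbitrary choice made for the sketch.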

    Original language: English (US)
    Pages (from-to): 61-68
    Number of pages: 8
    Journal: Nature Biotechnology
    Volume: 30
    Issue number: 1
    State: Published - Jan 2012

    ASJC Scopus subject areas

    • Biotechnology
    • Bioengineering
    • Applied Microbiology and Biotechnology
    • Molecular Medicine
    • Biomedical Engineering
