TY - JOUR
T1 - Benchmarking Low-Frequency Variant Calling With Long-Read Data on Mitochondrial DNA
AU - Lüth, Theresa
AU - Schaake, Susen
AU - Grünewald, Anne
AU - May, Patrick
AU - Trinh, Joanne
AU - Weissensteiner, Hansi
N1 - Copyright © 2022 Lüth, Schaake, Grünewald, May, Trinh and Weissensteiner.
PY - 2022
Y1 - 2022
N2 -
Background: Sequencing quality has improved over the last decade for long-reads, allowing for more accurate detection of somatic low-frequency variants. In this study, we used mixtures of mitochondrial samples with different haplogroups (i.e., a specific set of mitochondrial variants) to investigate the applicability of nanopore sequencing for low-frequency single nucleotide variant detection.
Methods: We investigated the impact of base-calling, alignment/mapping, quality control steps, and variant calling by comparing the results to a previously derived short-read gold standard generated on the Illumina NextSeq. For nanopore sequencing, six mixtures of four different haplotypes were prepared, allowing us to reliably check for expected variants at the predefined 5%, 2%, and 1% mixture levels. We used two different versions of Guppy for base-calling, two aligners (i.e., Minimap2 and Ngmlr), and three variant callers (i.e., Mutserve2, Freebayes, and Nanopanel2) to compare low-frequency variants. We used F1 score measurements to assess the performance of variant calling.
Results: We observed a mean read length of 11 kb and a mean overall read quality of 15. Ngmlr showed not only higher F1 scores but also higher allele frequencies (AF) of false-positive calls across the mixtures (mean F1 score = 0.83; false-positive allele frequencies < 0.17) compared to Minimap2 (mean F1 score = 0.82; false-positive AF < 0.06). Mutserve2 had the highest F1 scores (5% level: F1 score >0.99, 2% level: F1 score >0.54, and 1% level: F1 score >0.70) across all callers and mixture levels.
Conclusion: We here present the benchmarking for low-frequency variant calling with nanopore sequencing by identifying current limitations.
AB -
Background: Sequencing quality has improved over the last decade for long-reads, allowing for more accurate detection of somatic low-frequency variants. In this study, we used mixtures of mitochondrial samples with different haplogroups (i.e., a specific set of mitochondrial variants) to investigate the applicability of nanopore sequencing for low-frequency single nucleotide variant detection.
Methods: We investigated the impact of base-calling, alignment/mapping, quality control steps, and variant calling by comparing the results to a previously derived short-read gold standard generated on the Illumina NextSeq. For nanopore sequencing, six mixtures of four different haplotypes were prepared, allowing us to reliably check for expected variants at the predefined 5%, 2%, and 1% mixture levels. We used two different versions of Guppy for base-calling, two aligners (i.e., Minimap2 and Ngmlr), and three variant callers (i.e., Mutserve2, Freebayes, and Nanopanel2) to compare low-frequency variants. We used F1 score measurements to assess the performance of variant calling.
Results: We observed a mean read length of 11 kb and a mean overall read quality of 15. Ngmlr showed not only higher F1 scores but also higher allele frequencies (AF) of false-positive calls across the mixtures (mean F1 score = 0.83; false-positive allele frequencies < 0.17) compared to Minimap2 (mean F1 score = 0.82; false-positive AF < 0.06). Mutserve2 had the highest F1 scores (5% level: F1 score >0.99, 2% level: F1 score >0.54, and 1% level: F1 score >0.70) across all callers and mixture levels.
Conclusion: We here present the benchmarking for low-frequency variant calling with nanopore sequencing by identifying current limitations.
U2 - 10.3389/fgene.2022.887644
DO - 10.3389/fgene.2022.887644
M3 - Journal articles
C2 - 35664331
SN - 1664-8021
VL - 13
SP - 887644
JO - Frontiers in Genetics
JF - Frontiers in Genetics
ER -