Evaluating the accuracy of variant calling methods using the frequency of parent-offspring genotype mismatch.

Jasper, Russ J; McDonald, Tegan Krista; Singh, Pooja; Lu, Mengmeng; Rougeux, Clément; Lind, Brandon M; Yeaman, Sam (2022). Evaluating the accuracy of variant calling methods using the frequency of parent-offspring genotype mismatch. Molecular Ecology Resources, 22(7), pp. 2524-2533. Wiley. DOI: 10.1111/1755-0998.13628

Molecular_Ecology_Resources_-_2022_-_Jasper_-_Evaluating_the_accuracy_of_variant_calling_methods_using_the_frequency_of.pdf - Accepted Version
Available under License Publisher holds Copyright.

The use of next-generation sequencing (NGS) datasets has increased dramatically over the last decade, but there have been few systematic analyses quantifying the accuracy of the commonly used variant calling programs. Here we used a familial design consisting of diploid tissue from a single lodgepole pine (Pinus contorta) parent and the maternally derived haploid tissue from 106 full-sibling offspring, where mismatches could only arise due to mutation or bioinformatic error. Given the rarity of mutation, we used the rate of mismatches between parent and offspring genotype calls to infer the single nucleotide polymorphism (SNP) genotyping error rates of FreeBayes, HaplotypeCaller, SAMtools, UnifiedGenotyper, and VarScan. With baseline filtering, HaplotypeCaller and UnifiedGenotyper yielded more SNPs and error rates higher by one to two orders of magnitude, whereas FreeBayes, SAMtools, and VarScan yielded fewer SNPs and more modest error rates. To facilitate comparison between variant callers, we standardized each SNP set to the same number of SNPs using additional filtering. After this standardization, UnifiedGenotyper consistently produced the smallest proportion of genotype errors, followed by HaplotypeCaller, VarScan, SAMtools, and FreeBayes. Additionally, we found that error rates were minimized for SNPs called by more than one variant caller. Finally, we evaluated the performance of various commonly used filtering metrics on SNP calling. Our analysis provides a quantitative assessment of the accuracy of five widely used variant calling programs and offers valuable insights into both the choice of variant caller and the choice of filtering metrics, especially for researchers using non-model study systems.
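
The mismatch-based error estimate described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the data layout, and the mismatch rule (a called haploid offspring allele that is absent from the parent's diploid genotype counts as a mismatch) are assumptions made for illustration only, and the toy genotypes are invented.

```python
from typing import Iterable, Optional, Sequence, Tuple

# Hypothetical data layout: one diploid parent genotype per SNP, plus one
# haploid allele call per offspring (None marks a missing call).
Site = Tuple[Tuple[str, str], Sequence[Optional[str]]]


def count_mismatches(parent_genotype: Tuple[str, str],
                     offspring_alleles: Sequence[Optional[str]]) -> Tuple[int, int]:
    """Return (mismatches, called) for a single SNP.

    Assumed rule: an offspring allele that does not appear in the parent's
    genotype is a mismatch. Missing calls are ignored.
    """
    parent_alleles = set(parent_genotype)
    called = [a for a in offspring_alleles if a is not None]
    mismatches = sum(1 for a in called if a not in parent_alleles)
    return mismatches, len(called)


def mismatch_rate(sites: Iterable[Site]) -> Optional[float]:
    """Pool mismatches over all SNPs and divide by the number of called alleles."""
    total_mismatches = 0
    total_called = 0
    for parent_genotype, offspring_alleles in sites:
        m, n = count_mismatches(parent_genotype, offspring_alleles)
        total_mismatches += m
        total_called += n
    # No called alleles means the rate is undefined.
    return total_mismatches / total_called if total_called else None


# Invented toy data, not taken from the study.
toy_sites = [
    (("A", "G"), ["A", "G", "G", None, "A"]),  # heterozygous parent, 4 called, 0 mismatches
    (("C", "C"), ["C", "C", "T", "C", "C"]),   # homozygous parent, 5 called, 1 mismatch
]
print(mismatch_rate(toy_sites))  # 1 mismatch out of 9 called alleles
```

Because mutation is rare, a pooled rate of this kind is attributed almost entirely to genotyping error. Running the same count on each variant caller's SNP set would give per-caller rates that can be compared.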

Item Type:	Journal Article (Original Article)
Division/Institute:	08 Faculty of Science > Department of Biology > Institute of Ecology and Evolution (IEE); 08 Faculty of Science > Department of Biology > Institute of Ecology and Evolution (IEE) > Aquatic Ecology
UniBE Contributor:	Singh, Pooja
Subjects:	500 Science > 570 Life sciences; biology
ISSN:	1755-0998
Publisher:	Wiley
Language:	English
Submitter:	Pubmed Import
Date Deposited:	06 May 2022 09:37
Last Modified:	07 May 2023 00:25
Publisher DOI:	10.1111/1755-0998.13628
PubMed ID:	35510784
Uncontrolled Keywords:	Bioinformatics; Genomics; Genotyping; Next generation sequencing; Non-model; Single Nucleotide Polymorphism
BORIS DOI:	10.48350/169771
URI:	https://boris.unibe.ch/id/eprint/169771
