Optimizing Selection of the Reference Population for Genotype Imputation From Array to Sequence Variants

Butty, Adrien M.; Sargolzaei, Mehdi; Miglior, Filippo; Stothard, Paul; Schenkel, Flavio S.; Gredler-Grandl, Birgit; Baes, Christine Francoise (2019). Optimizing Selection of the Reference Population for Genotype Imputation From Array to Sequence Variants. Frontiers in genetics, 10(510), p. 510. Frontiers Media SA 10.3389/fgene.2019.00510

ButteEtAl-2019-a.pdf - Published Version
Available under License Creative Commons: Attribution (CC-BY).


Imputation of high-density genotypes to whole-genome sequences (WGS) is a cost-effective method to increase the density of available markers within a population. Imputed genotypes have been successfully used for genomic selection and for the discovery of variants associated with traits of interest for the population. For imputed genotypes to be usable in genomic analyses, the accuracy of imputation must be high. Accuracy of imputation is influenced by multiple factors, such as the size and composition of the reference group and the allele frequency of the variants included. It is important to understand how the imputed WGS will be used before the reference population is generated, because the emphasis may fall, for instance, on accurate imputation of common variants or of rare variants. The aim of this study was to present and evaluate new methods for selecting animals for sequencing based on a previously genotyped population. The Genetic Diversity Index method optimizes the number of unique haplotypes in the future reference population, while the Highly Segregating Haplotype selection method targets haplotype alleles found throughout the majority of the population of interest. First, the WGS of a dairy cattle population were simulated. The simulated sequences mimicked the linkage disequilibrium level and the variant frequency distribution observed in currently available Holstein sequences. Then, reference populations of different sizes were created, in which animals were selected using the two novel methods proposed here as well as two other methods presented in previous studies. Finally, the accuracies of imputation obtained with the different reference populations were compared against each other. The novel methods were found to have overall accuracies of imputation of more than 0.85. Accuracies of imputation of rare variants reached values above 0.50.
In conclusion, if imputed sequences are to be used for the discovery of novel associations between variants and traits of interest in the population, animals carrying novel information should be selected and, consequently, the Genetic Diversity Index method proposed here may be used. If sequences are to be used to impute the overall genotyped population, a reference population consisting of carriers of common haplotypes, selected using the proposed Highly Segregating Haplotype method, is recommended.
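
The contrast between the two selection goals described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state how either method is optimized, so a simple greedy heuristic and a frequency ranking are used here as stand-ins. The toy animals, the haplotype labels, the function names and the `min_freq` threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data: each animal maps to the set of haplotype alleles it
# carries. Labels such as "b1:h2" (block 1, haplotype 2) are invented.
animals = {
    "A": {"b1:h1", "b2:h1"},
    "B": {"b1:h1", "b2:h2"},
    "C": {"b1:h2", "b2:h3"},
    "D": {"b1:h1", "b2:h1"},
    "E": {"b1:h3", "b2:h1"},
}


def select_for_diversity(animals, n):
    """Greedy stand-in for a diversity-oriented selection: repeatedly add the
    animal that contributes the most haplotypes not yet covered."""
    chosen, covered = [], set()
    candidates = dict(animals)
    while candidates and len(chosen) < n:
        # Iterating in sorted order makes ties resolve alphabetically.
        best = max(sorted(candidates), key=lambda a: len(candidates[a] - covered))
        chosen.append(best)
        covered |= candidates.pop(best)
    return chosen, covered


def select_common_carriers(animals, n, min_freq=0.5):
    """Stand-in for a common-haplotype selection: rank animals by how many
    haplotypes carried by at least min_freq of the population they carry."""
    counts = Counter(h for haps in animals.values() for h in haps)
    common = {h for h, c in counts.items() if c / len(animals) >= min_freq}
    ranked = sorted(animals, key=lambda a: (-len(animals[a] & common), a))
    return ranked[:n], common


diverse, covered = select_for_diversity(animals, 2)
carriers, common = select_common_carriers(animals, 2)
print(diverse, len(covered))     # animals chosen to maximize unique haplotypes
print(carriers, sorted(common))  # animals chosen as carriers of common haplotypes
```

On this toy data the two criteria pick different animals: the diversity-oriented selection favors an animal carrying rare haplotypes, while the common-carrier selection favors animals that share the most frequent ones, which mirrors the trade-off between rare-variant discovery and population-wide imputation described in the conclusion.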

Item Type:

Journal Article (Original Article)


05 Veterinary Medicine > Department of Clinical Research and Veterinary Public Health (DCR-VPH) > Institute of Genetics
05 Veterinary Medicine > Department of Clinical Research and Veterinary Public Health (DCR-VPH)

UniBE Contributor:

Baes, Christine Francoise


500 Science > 590 Animals (Zoology)
600 Technology > 630 Agriculture
500 Science > 570 Life sciences; biology




Frontiers Media SA




Christine Francoise Baes

Date Deposited:

07 Aug 2019 12:15

Last Modified:

16 Sep 2020 01:36

Publisher DOI:

10.3389/fgene.2019.00510
