Sequencing platform shifts provide opportunities but pose challenges for combining genomic data sets

De-Kayne, Rishi; Frei, David; Greenway, Ryan; Mendes, Sofia L.; Retel, Cas; Feulner, Philine G. D. (2021). Sequencing platform shifts provide opportunities but pose challenges for combining genomic data sets. Molecular Ecology Resources, 21(3), pp. 653-660. Wiley. DOI: 10.1111/1755-0998.13309

De-Kayne_et_al._2020_MolEcolRes_Sequencing_platform_shifts_provide_opportunities_but_pose_challenges_for_combining_genomic_datasets_accepted.pdf - Accepted Version
Available under License Publisher holds Copyright.

1755-0998.13309.pdf - Published Version
Restricted to registered users only
Available under License Publisher holds Copyright.


Technological advances in DNA sequencing over the last decade now permit the production and curation of large genomic datasets in an increasing number of non-model species. Additionally, this new data provides the opportunity for combining datasets, resulting in larger studies with a broader taxonomic range. Whilst the development of new sequencing platforms has been beneficial, resulting in a higher throughput of data at a lower per-base cost, shifts in sequencing technology can also pose challenges for those wishing to combine new sequencing data with data sequenced on older platforms. Here, we outline the types of studies where the use of curated data might be beneficial, and highlight potential biases that might be introduced by combining data from different sequencing platforms. As an example of the challenges associated with combining data across sequencing platforms, we focus on the impact of the shift in Illumina’s base calling technology from a four-channel to a two-channel system. We caution that when data is combined from these two systems, erroneous guanine base calls that result from the two-channel chemistry can make their way through a bioinformatic pipeline, eventually leading to inaccurate and potentially misleading conclusions. We also suggest solutions for dealing with such potential artifacts, which make samples sequenced on different sequencing platforms appear more differentiated from one another than they really are. Finally, we stress the importance of archiving tissue samples and the associated sequences for the continued reproducibility and reusability of sequencing data in the face of ever-changing sequencing platform technology.
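As an illustration of the kind of artifact described in the abstract, the following is a minimal sketch of trimming a run of guanine calls from the 3' end of a read. It is not taken from the article and is not the authors' method. As general background, in two-channel chemistry a G is called when no signal is detected, so signal loss towards the end of a read can appear as a tail of G calls. The function name `trim_poly_g_tail` and the threshold `min_len=10` are hypothetical choices made for this example.

```python
def trim_poly_g_tail(seq: str, qual: str, min_len: int = 10) -> tuple[str, str]:
    """Remove a trailing run of G bases from a read if it is long enough.

    Assumption (not from the article): a 3' run of at least `min_len`
    consecutive G calls is treated as a likely two-channel artifact.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")

    # Count how many G bases sit at the very end of the read.
    tail_len = len(seq) - len(seq.rstrip("G"))

    # Short G runs are left alone, since they are plausible real sequence.
    if tail_len < min_len:
        return seq, qual

    # Trim the same number of characters from sequence and qualities.
    cut = len(seq) - tail_len
    return seq[:cut], qual[:cut]


if __name__ == "__main__":
    read = "ACGT" + "G" * 12
    quals = "I" * len(read)
    print(trim_poly_g_tail(read, quals))
```

In practice, established read-trimming tools are usually a better choice than a hand-written filter; fastp, for example, provides a poly-G tail trimming option. Trimming read tails alone may not remove every erroneous G call, so a sketch like this is only a first step before variant calling.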

Item Type: Journal Article (Further Contribution)
Division/Institute: 08 Faculty of Science > Department of Biology > Institute of Ecology and Evolution (IEE); 08 Faculty of Science > Department of Biology > Institute of Ecology and Evolution (IEE) > Aquatic Ecology
UniBE Contributor: De-Kayne, Rishi; Frei, David Florian; Feulner, Philine
Subjects: 500 Science > 570 Life sciences; biology
ISSN: 1755-098X
Publisher: Wiley
Language: English
Submitter: Marcel Häsler
Date Deposited: 15 Jan 2021 09:08
Last Modified: 05 Dec 2022 15:43
Publisher DOI: 10.1111/1755-0998.13309
PubMed ID: 33314612
BORIS DOI: 10.48350/149659
URI: https://boris.unibe.ch/id/eprint/149659
