User-friendly bioinformatics pipeline gDAT (graphical downstream analysis tool) for analysing rDNA sequences

Vasar, Martti; Davison, John; Neuenkamp, Lena; Sepp, Siim-Kaarel; Young, J. Peter W.; Moora, Mari; Öpik, Maarja (2021). User-friendly bioinformatics pipeline gDAT (graphical downstream analysis tool) for analysing rDNA sequences. Molecular Ecology Resources, 21(4), pp. 1380-1392. Wiley. DOI: 10.1111/1755-0998.13340

High‐throughput sequencing (HTS) of multiple organisms in parallel (metabarcoding) has become a routine and cost‐effective method for the analysis of microbial communities in environmental samples. However, careful data treatment is required to identify potential errors in HTS data, and the large volume of data generated by HTS requires in‐house experience with command line tools for downstream analysis. This paper introduces a pipeline that incorporates the most common command line tools into an easy‐to‐use graphical interface—gDAT. By using the Python scripting language, the pipeline is compatible with the latest Windows, macOS and Linux operating systems. The pipeline supports analysis of Sanger, 454, IonTorrent, Illumina and PacBio sequences, allows custom modification of quality filtering steps, and implements both open and closed‐reference operational taxonomic unit‐picking for sequence identification. Predefined parameters are optimized for analysis of small subunit (SSU) rRNA gene amplicons from arbuscular mycorrhizal fungi, but the pipeline is widely applicable to metabarcoding studies targeting a broad range of organisms. The pipeline was additionally tested with data using general eukaryotic primers from the SSU gene region and fungal primers from the internal transcribed spacer (ITS) marker region. We describe the pipeline design and evaluate its performance and speed by conducting analysis of example data sets using different marker regions sequenced on Illumina platforms. The graphical interface, with the option to use the command line if needed, provides an accessible tool for rapid data analysis with repeatability and logging capabilities. Keeping the software open‐source maximizes code accessibility, allowing scrutiny and bug fixes by the community.

Item Type: Journal Article (Original Article)
Division/Institute: 08 Faculty of Science > Department of Biology > Institute of Plant Sciences (IPS) > Plant Ecology; 08 Faculty of Science > Department of Biology > Institute of Plant Sciences (IPS)
UniBE Contributor: Neuenkamp, Lena
Subjects: 500 Science > 580 Plants (Botany)
ISSN: 1755-098X
Publisher: Wiley
Language: English
Submitter: Peter Alfred von Ballmoos-Haas
Date Deposited: 01 Apr 2021 14:02
Last Modified: 14 Apr 2021 01:34
Publisher DOI: 10.1111/1755-0998.13340
PubMed ID: 33527735
Uncontrolled Keywords: arbuscular mycorrhizal fungi, high-throughput sequencing, pipeline, sequencing data analysis, software, teaching tool
BORIS DOI: 10.7892/boris.153486
URI: https://boris.unibe.ch/id/eprint/153486
