Building and querying a kraken2 database

Rating: 4.5 (18 reviews)

A useful Next Generation Sequencing (NGS) Metagenomics tool

Highest Rated

Created by Matthew Cserhati

Last updated 2/2021

English

What you'll learn

how to design, build and query a kraken2 database
how to classify Next Generation Sequencing reads against a metagenomics database
how to visualize a classification report in Pavian

Course content

4 sections • 20 lectures • 2h 0m total length

Introduction 5:03
Build a kraken2 database for metagenomics, matching sample DNA to reference sequences to identify pathogens and microbial communities in biomedical and environmental contexts.
Installing kraken2 2:49
Install kraken2 on a new system using commands from a supplementary file, switch users, download and save a key file, run the install, and verify with kraken2 --help.
Kraken2 database and query input 9:42
Identify Kraken2 database input types (whole or partial genomes and individual genes) and describe the FASTQ query input: single or paired reads of 35–200 bases, with each record made up of header, sequence, plus, and quality lines.
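The four-line FASTQ record described above (header, sequence, plus, quality) can be illustrated with a minimal Python sketch. The function name `read_fastq` and the two example reads are invented for illustration; real pipelines usually rely on an established parser instead.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, four lines per record."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        header = header.rstrip("\n")
        if not header:          # tolerate blank lines between records
            continue
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Two invented reads, only to exercise the parser.
EXAMPLE = "@read1\nACGTACGT\n+\nIIIIHHHH\n@read2\nGGCC\n+\nFFFF\n"
records = list(read_fastq(io.StringIO(EXAMPLE)))
for rec in records:
    print(rec.name, len(rec.sequence))
```

The length check reflects the one structural rule of the format: every base has exactly one quality character.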
Kraken2 database input
Search the SRA website for SARS-CoV-2 data sets
How to use sequences from outside NCBI 2:58
Learn to process sequences from outside NCBI by identifying the species with the NCBI taxonomy, obtaining the taxonomy ID, and adding an annotation line with two pipes and the ID.
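As a rough illustration of the annotation step, the sketch below appends a taxonomy ID to a FASTA header using the `|kraken:taxid|<ID>` form documented in the Kraken 2 manual, which matches the "two pipes and the ID" description. The helper name `add_kraken_taxid` and the contig header are hypothetical; 562 is the NCBI taxonomy ID for Escherichia coli.

```python
def add_kraken_taxid(header: str, taxid: int) -> str:
    """Return a FASTA header with '|kraken:taxid|<taxid>' appended to the sequence ID."""
    if not header.startswith(">"):
        raise ValueError("FASTA headers must start with '>'")
    # Split the sequence ID from the free-text description that may follow it.
    seq_id, _, description = header[1:].partition(" ")
    tagged = f">{seq_id}|kraken:taxid|{taxid}"
    return f"{tagged} {description}" if description else tagged


# Hypothetical header for a sequence that did not come from NCBI.
print(add_kraken_taxid(">contig1 my lab isolate", 562))
```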
Find NCBI taxon id for several species
The kraken2 algorithm 8:34
Explore how the kraken2 algorithm classifies reads by chopping sequences into k-mers, mapping them to a taxonomy tree, and using the lowest common ancestor to assign taxa.
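The two ideas named above, k-mers and the lowest common ancestor (LCA), can be sketched on a toy example. This is a deliberately simplified illustration, not kraken2's implementation: the real tool works with minimizers and a compact hash table, and scores root-to-leaf paths over all of a read's k-mers. The tree and function names below are invented for the sketch.

```python
from typing import Dict, List, Optional


def kmers(sequence: str, k: int) -> List[str]:
    """Return every overlapping substring of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Toy taxonomy: each taxon points to its parent (the root has no parent).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Bacteria": "root",
    "Escherichia": "Bacteria",
    "Escherichia coli": "Escherichia",
    "Escherichia albertii": "Escherichia",
}


def path_to_root(taxon: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """List a taxon and all of its ancestors, ending at the root."""
    path = []
    current: Optional[str] = taxon
    while current is not None:
        path.append(current)
        current = parent[current]
    return path


def lowest_common_ancestor(a: str, b: str, parent: Dict[str, Optional[str]]) -> str:
    """Return the deepest taxon that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = set(path_to_root(a, parent))
    for taxon in path_to_root(b, parent):
        if taxon in ancestors_of_a:
            return taxon
    raise ValueError("taxa do not share a root")


print(kmers("ACGTA", 3))
print(lowest_common_ancestor("Escherichia coli", "Escherichia albertii", PARENT))
```

A k-mer found in genomes of both species cannot distinguish them, so it is assigned to their LCA (here the genus) rather than to either species.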
Understanding the kraken2 algorithm
How many sequences to add to your database? 4:12
Balance the number of genomes in a kraken2 database to optimize coverage and speed: more genomes boost classification accuracy but increase build time, as illustrated with influenza B variants.

SIBO database overview 2:45
Explore building and querying a kraken2 database by downloading genomes for 12 bacteria, building the database, and mapping sequencing reads to compare healthy and SIBO samples.
SIBO project overview
Adding genomes to the SIBO kraken2 database 10:12
Build and query a kraken2 database by collecting bacterial genomes, selecting sequences, and automating their addition with a script that updates the library and mapping files while masking low-complexity regions.
Adding genomes to a kraken2 database
Other types of kraken2 databases and libraries 3:02
Learn about kraken2 database variants beyond the standard database, including downloadable libraries for bacteria, viruses, protozoa, and human sequences, and how to build or augment the database with specific libraries.
Types of other kraken2 databases
Building the database 7:48
Build a kraken2 database by downloading the taxonomy into the database, inspecting the accession-to-taxid mappings, and running the build with the k-mer length flag set to 41 to create and verify the database.
What is in the database? kraken2-inspect 4:53
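A hedged sketch of the build sequence follows. It only assembles the command lines and does not run them. The `kraken2-build` options `--download-taxonomy`, `--build`, `--db`, and `--kmer-len` exist in the Kraken 2 manual; the database name `sibo_db`, the helper `build_commands`, and the reading of "41" as a `--kmer-len` value (the kraken2 default is 35) are assumptions. The library is assumed to have been populated already.

```python
import shlex
from typing import List, Optional


def build_commands(db: str, kmer_len: Optional[int] = None) -> List[List[str]]:
    """Assemble (but do not run) the taxonomy download and build commands."""
    download = ["kraken2-build", "--download-taxonomy", "--db", db]
    build = ["kraken2-build", "--build", "--db", db]
    if kmer_len is not None:
        # Assumption: 41 in the lecture summary refers to the k-mer length.
        build += ["--kmer-len", str(kmer_len)]
    return [download, build]


# "sibo_db" is a hypothetical database directory name.
commands = build_commands("sibo_db", kmer_len=41)
for cmd in commands:
    print(shlex.join(cmd))
```

Keeping each command as an argument list makes it straightforward to hand to `subprocess.run` later without shell quoting problems.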
Inspect your kraken2 database to verify its structure and contents using the inspect command, exploring the hierarchical taxonomic levels, taxon IDs, and numbers at each node.
A note on classification 3:47
Explain taxonomy basics and the eight main levels: domain, kingdom, phylum, class, order, family, genus, species. Illustrate the three domains and issues like gram staining with pathogenic E. coli.
How do we classify organisms?

Downloading SIBO data sets from the SRA database 4:58
Download two Illumina data sets for small intestinal bacterial overgrowth from the SRA database, producing fastq files with fasterq-dump, then prepare to run them against the SIBO database.
Running a query and interpreting output 6:53
Run a kraken2 query against the database, with paired or single reads. Specify output and report files, then interpret the taxonomic output and unclassified reads.
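The query step can be sketched as a small command builder for single or paired reads. The `kraken2` options `--db`, `--paired`, `--output`, and `--report` are taken from the Kraken 2 manual; the helper name `kraken2_command` and all file and database names are hypothetical, and nothing is executed.

```python
import shlex
from typing import List


def kraken2_command(db: str, reads: List[str], output: str, report: str) -> List[str]:
    """Assemble (but do not run) a kraken2 classification command."""
    if len(reads) not in (1, 2):
        raise ValueError("expected one file (single reads) or two files (paired reads)")
    cmd = ["kraken2", "--db", db, "--output", output, "--report", report]
    if len(reads) == 2:
        cmd.append("--paired")   # the two files hold mate 1 and mate 2
    return cmd + reads


# Hypothetical file names for one paired-end and one single-end sample.
paired = kraken2_command("sibo_db", ["s1_1.fastq", "s1_2.fastq"], "s1.out", "s1.report")
single = kraken2_command("sibo_db", ["s2.fastq"], "s2.out", "s2.report")
print(shlex.join(paired))
print(shlex.join(single))
```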
Running queries and output
Interpreting a kraken2 report 14:16
Interpret a kraken2 report by decoding its six columns, including the percent of reads mapped at each taxonomic level, then examine family-level mappings and sample-to-sample correlations.
Querying the SIBO database with a Parkinson's gut sample 7:01
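A minimal sketch of decoding the six tab-separated report columns, following the layout in the Kraken 2 manual: percent of reads in the clade, reads in the clade, reads assigned directly to the taxon, rank code, NCBI taxonomy ID, and the indented scientific name. The three report lines and their read counts are invented for illustration, and the names `ReportRow` and `parse_report_line` are hypothetical.

```python
from typing import List, NamedTuple


class ReportRow(NamedTuple):
    percent: float      # percent of reads in the clade rooted at this taxon
    clade_reads: int    # reads in the clade rooted at this taxon
    taxon_reads: int    # reads assigned directly to this taxon
    rank: str           # rank code, e.g. U, R, D, P, C, O, F, G, S
    taxid: int          # NCBI taxonomy ID
    name: str           # scientific name with indentation removed
    depth: int          # indentation level (two spaces per level)


def parse_report_line(line: str) -> ReportRow:
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}")
    percent, clade, direct, rank, taxid, name = fields
    stripped = name.lstrip(" ")
    depth = (len(name) - len(stripped)) // 2
    return ReportRow(float(percent), int(clade), int(direct), rank, int(taxid), stripped, depth)


# Invented numbers; only the column layout is meaningful.
REPORT = (
    " 10.00\t100\t100\tU\t0\tunclassified\n"
    " 90.00\t900\t5\tR\t1\troot\n"
    " 40.00\t400\t20\tF\t543\t          Enterobacteriaceae\n"
)
rows: List[ReportRow] = [parse_report_line(line) for line in REPORT.splitlines()]
families = [(row.name, row.percent) for row in rows if row.rank == "F"]
print(families)
```

Filtering on the rank code `F` gives the family-level view mentioned above; per-family percentages from several samples can then be compared or correlated.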
Explore querying a Kraken2 database with a Parkinson's gut sample from a SIBO dataset to identify bacteria linked to Parkinson's, including a 42.1% mapping to a Parkinson's-associated bacterial family.
Visualizing results with Pavian 7:51
Visualize metagenomic results with Pavian, an online tool that uploads partitioned kraken2 outputs and displays taxonomic breakdowns, Sankey diagrams, and ranked tables of classified reads.
Pavian test quiz

kraken2 environmental variables 5:16
Learn how kraken2 environmental variables govern thread counts and database paths, override defaults with a flag, and configure PATH to access the database across directories.
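The precedence described above (an explicit flag overrides the environment) can be modelled in a few lines. `KRAKEN2_NUM_THREADS` and `KRAKEN2_DB_PATH` are the variable names given in the Kraken 2 manual; the functions below are an illustrative model written for this sketch, not kraken2's own code, and the directory names are hypothetical. The fallback of one thread is assumed to match the tool's default.

```python
from typing import List, Mapping, Optional


def effective_threads(env: Mapping[str, str], flag_value: Optional[int] = None) -> int:
    """An explicit thread count wins; otherwise fall back to the environment, then to 1."""
    if flag_value is not None:
        return flag_value
    return int(env.get("KRAKEN2_NUM_THREADS", "1"))


def candidate_db_dirs(env: Mapping[str, str], db_name: str) -> List[str]:
    """Expand a bare database name against the colon-separated KRAKEN2_DB_PATH."""
    search_path = env.get("KRAKEN2_DB_PATH", "")
    directories = [d for d in search_path.split(":") if d]
    return [f"{d.rstrip('/')}/{db_name}" for d in directories]


# Hypothetical environment; in practice this would be os.environ.
env = {"KRAKEN2_NUM_THREADS": "8", "KRAKEN2_DB_PATH": "/data/kraken:/scratch/db"}
print(effective_threads(env))
print(effective_threads(env, flag_value=2))
print(candidate_db_dirs(env, "sibo_db"))
```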
Kraken2 environmental variables
MPA style reports 3:49
Produce MPA style reports from Kraken2 outputs, detailing taxonomic classifications and reads mapped to each category, and generate heat maps showing intensity by sample and species.
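As a sketch of how MPA style output feeds a heat map, the code below parses lines of the form `lineage<TAB>count`, where the lineage is a `|`-separated chain of rank-prefixed names such as `d__Bacteria|...|s__Escherichia coli`, and arranges species counts into a species-by-sample matrix. The sample names, counts, and function names are invented for illustration; plotting is left to whichever library is preferred.

```python
from typing import Dict, List

LINEAGE = ("d__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales"
           "|f__Enterobacteriaceae|g__Escherichia|s__Escherichia coli")


def species_counts(mpa_text: str) -> Dict[str, int]:
    """Return reads per species from MPA style report text."""
    counts: Dict[str, int] = {}
    for line in mpa_text.splitlines():
        if not line.strip():
            continue
        lineage, count = line.rsplit("\t", 1)
        last_rank = lineage.rsplit("|", 1)[-1]
        if last_rank.startswith("s__"):          # keep species-level rows only
            counts[last_rank[3:]] = int(count)
    return counts


# Invented counts for two hypothetical samples.
samples = {
    "healthy": f"d__Bacteria\t900\n{LINEAGE}\t120\n",
    "sibo": f"d__Bacteria\t950\n{LINEAGE}\t310\n",
}
per_sample = {name: species_counts(text) for name, text in samples.items()}
species: List[str] = sorted({sp for counts in per_sample.values() for sp in counts})
sample_names = list(samples)
# One row per species, one column per sample: the input a heat map needs.
matrix = [[per_sample[s].get(sp, 0) for s in sample_names] for sp in species]
print(species, sample_names, matrix)
```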
Summary and Conclusion, further ideas and other metagenomics tools 4:49
Build and query a Kraken2 database by adding custom or standard virus and organism genomes, then classify unknown metagenomic samples and visualize results with MEGAN6.

Requirements

basic Linux knowledge
understand basic shell commands

Description

Kraken2 is a well-known Next Generation Sequencing (NGS) metagenomics classification tool that is widely used in scientific research. With kraken2 you can build a database from whole genome sequences and then classify sequencing reads against it to identify unknown samples. It is easy to use and classifies samples very rapidly. Kraken2 has applications in biology, medical research and even geology.

In this course we will see a general overview of the algorithm behind kraken2. Then, we will do a hands-on example in which we download read data sets from the SRA database at NCBI (the National Center for Biotechnology Information) and classify them against the database that we have built. We will also look at the Pavian visualization software, which presents a graphical overview of our results (e.g. a Sankey diagram). Figures made in this program can also be used for publication purposes.

Students of this course are mainly biology or medical students or researchers. Some knowledge of Linux is required to take the course, but only a few commands need clarification, and they will be explained in detail.

Overall, the skills taught in this course are worth learning and can give you an edge in metagenomics research and analysis.

Who this course is for:

students studying bioinformatics, biotechnology

Course content

Introduction • 6 lectures • 33min

Building the SIBO kraken2 database • 6 lectures • 32min

Querying the database • 5 lectures • 41min

Extra kraken2 material • 3 lectures • 14min