
Build a kraken2 database for metagenomics, matching sample DNA to reference sequences to identify pathogens and microbial communities in biomedical and environmental contexts.
Learn to prepare sequences that do not come from NCBI: look up the species in the NCBI Taxonomy, obtain its taxonomy ID, and add that ID to each FASTA header in the form |kraken:taxid|<ID>.
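A minimal Python sketch of the header edit. It assumes that every record in the file belongs to the same taxon and that you have already looked up its taxonomy ID. The function name add_kraken_taxid is hypothetical. The >SEQID|kraken:taxid|TAXID header layout follows the custom-database format described in the Kraken 2 manual.

```python
def add_kraken_taxid(fasta_text: str, taxid: int) -> str:
    """Return fasta_text with |kraken:taxid|<taxid> appended to every sequence ID.

    Assumes every record in the input belongs to the same taxon.
    """
    out_lines = []
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            out_lines.append(line)  # sequence lines are copied unchanged
            continue
        parts = line[1:].split(maxsplit=1)  # sequence ID, optional description
        if not parts or "|kraken:taxid|" in parts[0]:
            out_lines.append(line)  # empty or already-tagged header: leave as-is
            continue
        header = f">{parts[0]}|kraken:taxid|{taxid}"
        if len(parts) == 2:
            header += " " + parts[1]  # keep the free-text description
        out_lines.append(header)
    return "\n".join(out_lines) + "\n"
```

For example, read a (hypothetical) my_genome.fa, pass its contents and the taxonomy ID to add_kraken_taxid, and write the result to a new file. That new file is the one you add to the database library.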
Build a kraken2 database: download the taxonomy into the database directory, inspect the accession-to-taxid mappings, and then run the build with the k-mer length flag (--kmer-len) set to 41 to create and verify the database.
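The build steps can be sketched in Python as an ordered list of kraken2-build calls. The helper names build_commands and run_all are hypothetical, and the database and file names are placeholders. Reading the course's "flag 41" as --kmer-len 41 is an assumption. By default run_all only prints the commands (a dry run). It executes them only when dry_run=False.

```python
import shlex
import subprocess
from typing import List, Optional


def build_commands(db: str, fasta_files: List[str], threads: int = 4,
                   kmer_len: Optional[int] = None) -> List[List[str]]:
    """Return the kraken2-build calls for a custom database, in order."""
    # 1. Download the NCBI taxonomy (including accession-to-taxid maps) into the db.
    cmds = [["kraken2-build", "--download-taxonomy", "--db", db]]
    # 2. Add each prepared FASTA file to the database library.
    for fasta in fasta_files:
        cmds.append(["kraken2-build", "--add-to-library", fasta, "--db", db])
    # 3. Build step; --kmer-len is passed only when explicitly requested.
    build = ["kraken2-build", "--build", "--db", db, "--threads", str(threads)]
    if kmer_len is not None:
        build += ["--kmer-len", str(kmer_len)]
    cmds.append(build)
    return cmds


def run_all(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
```

For example, run_all(build_commands("my_db", ["my_genome.kraken.fa"], kmer_len=41)) prints the three-step sequence without running anything. This lets you check the commands before starting a long build.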
Use the kraken2-inspect command to verify the structure and contents of your database, exploring the hierarchical taxonomic levels, the taxon IDs and the number of minimizers assigned to each node.
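According to the Kraken 2 manual, kraken2-inspect prints the same tab-separated layout as a kraken2 --report file, but it counts database minimizers instead of reads. Below is a Python sketch that parses that layout. It assumes six columns (percentage, clade count, direct count, rank code, taxid, indented name), header lines prefixed with '#', and two spaces of indentation per tree level. ReportRow, parse_report and print_tree are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReportRow:
    percent: float     # share of all counted minimizers (or reads) in this clade
    clade_count: int   # count for this taxon and everything below it
    direct_count: int  # count assigned directly to this taxon
    rank: str          # rank code, e.g. "D", "P", "G", "S"
    taxid: int
    name: str
    depth: int         # indentation level of the name = depth in the tree


def parse_report(text: str) -> List[ReportRow]:
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and '#' header lines
        fields = line.split("\t")
        raw_name = fields[-1]
        indent = len(raw_name) - len(raw_name.lstrip(" "))
        rows.append(ReportRow(
            percent=float(fields[0]),
            clade_count=int(fields[1]),
            direct_count=int(fields[2]),
            rank=fields[-3].strip(),
            taxid=int(fields[-2]),
            name=raw_name.strip(),
            depth=indent // 2,  # assumes two spaces per level
        ))
    return rows


def print_tree(rows: List[ReportRow], max_depth: int = 3) -> None:
    """Print the upper levels of the taxonomy tree with their counts."""
    for row in rows:
        if row.depth <= max_depth:
            print(f"{'  ' * row.depth}{row.name} (taxid {row.taxid}, {row.rank}): {row.clade_count}")
```

For example, save the output of kraken2-inspect --db my_db to a text file, pass its contents to parse_report, and filter the rows on rank == "S" to list the species in the database.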
Learn the basics of taxonomy and its eight main ranks: domain, kingdom, phylum, class, order, family, genus and species. Explore the three domains of life and topics such as Gram staining, using pathogenic E. coli as an example.
Learn how kraken2 environment variables set the default thread count and database paths, how to override these defaults with command-line flags, and how to set KRAKEN2_DB_PATH (which works like the shell's PATH) so the database can be found from any directory.
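The way kraken2 picks its thread count and database can be mimicked in a few lines of Python. This sketch follows the precedence described in the Kraken 2 manual: a command-line flag first, then KRAKEN2_NUM_THREADS or KRAKEN2_DEFAULT_DB. KRAKEN2_DB_PATH is searched like a colon-separated PATH for database names that contain no slash. Two fallbacks are assumptions: one thread, and the current directory when the variables are unset. resolve_threads and resolve_db are hypothetical helpers, not part of kraken2.

```python
import os
from typing import Mapping, Optional


def resolve_threads(cli_threads: Optional[int], env: Mapping[str, str]) -> int:
    """--threads wins; otherwise KRAKEN2_NUM_THREADS; otherwise 1 (assumed)."""
    if cli_threads is not None:
        return cli_threads
    return int(env.get("KRAKEN2_NUM_THREADS", "1"))


def resolve_db(cli_db: Optional[str], env: Mapping[str, str]) -> str:
    """Locate the database directory, mimicking the documented KRAKEN2_DB_PATH search."""
    name = cli_db if cli_db is not None else env.get("KRAKEN2_DEFAULT_DB")
    if name is None:
        raise ValueError("no database: pass --db or set KRAKEN2_DEFAULT_DB")
    if "/" in name:
        return name  # names containing a slash are treated as paths
    search_path = env.get("KRAKEN2_DB_PATH", ".")  # assumption: cwd when unset
    for directory in search_path.split(":"):
        candidate = os.path.join(directory or ".", name)
        if os.path.isdir(candidate):
            return candidate
    raise FileNotFoundError(f"database {name!r} not found in {search_path!r}")
```

Calling resolve_db(None, os.environ) reflects your current shell settings. Exporting KRAKEN2_DB_PATH in your shell start-up file is what makes a short database name usable from any working directory.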
Produce MPA-style (MetaPhlAn-style) reports from Kraken2 output, listing each taxonomic classification and the number of reads assigned to it, and generate heat maps showing read abundance by sample and species.
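A heat map needs a species-by-sample matrix. Below is a Python sketch that builds one from several MPA-style reports. It assumes each line is a pipe-separated lineage with rank prefixes such as d__ and s__, followed by a tab and a read count. Only species rows (s__) are kept. The function names are hypothetical, and drawing the heat map itself is left to a plotting library of your choice (for example matplotlib).

```python
from typing import Dict, List, Tuple


def species_counts(mpa_text: str) -> Dict[str, int]:
    """Map species name -> read count from one MPA-style report."""
    counts: Dict[str, int] = {}
    for line in mpa_text.splitlines():
        lineage, _, count = line.rpartition("\t")  # "d__...|s__Name<TAB>count"
        leaf = lineage.split("|")[-1]
        if leaf.startswith("s__"):  # keep species-level rows only
            counts[leaf[3:]] = int(count)
    return counts


def heatmap_matrix(reports: Dict[str, str]) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (species, samples, matrix) with matrix[i][j] = reads of species i in sample j."""
    per_sample = {sample: species_counts(text) for sample, text in reports.items()}
    species = sorted({sp for counts in per_sample.values() for sp in counts})
    samples = sorted(per_sample)
    # Species missing from a sample get 0 reads in that column.
    matrix = [[per_sample[s].get(sp, 0) for s in samples] for sp in species]
    return species, samples, matrix
```

The matrix holds raw read counts. When samples differ in sequencing depth, dividing each column by its total (relative abundance) usually gives a more comparable heat map.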
Build and query a Kraken2 database by adding custom or standard virus and organism genomes, then classify unknown metagenomic samples and visualize results with MEGAN6.
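Classification can be scripted the same way. Below is a Python sketch that assembles the kraken2-build call for a standard library (such as viral) and the kraken2 call for one sample. The helper names and file names are hypothetical. Only flags documented in the Kraken 2 manual are used: --download-library, --report, --output, --use-mpa-style, --paired and --gzip-compressed.

```python
import shlex
from typing import List


def library_command(db: str, library: str) -> List[str]:
    """kraken2-build call that downloads a standard library (e.g. "viral") into db."""
    return ["kraken2-build", "--download-library", library, "--db", db]


def classify_command(db: str, reads: List[str], report: str, output: str,
                     threads: int = 4, paired: bool = False,
                     mpa_style: bool = False, gzipped: bool = False) -> List[str]:
    """kraken2 call that writes per-read output plus a summary report."""
    if paired and len(reads) != 2:
        raise ValueError("--paired expects exactly two read files")
    cmd = ["kraken2", "--db", db, "--threads", str(threads),
           "--report", report, "--output", output]
    if mpa_style:
        cmd.append("--use-mpa-style")  # MetaPhlAn-style report layout
    if paired:
        cmd.append("--paired")
    if gzipped:
        cmd.append("--gzip-compressed")
    return cmd + reads


if __name__ == "__main__":
    # Hypothetical database and file names; this only prints a dry run.
    print(shlex.join(library_command("my_db", "viral")))
    print(shlex.join(classify_command(
        "my_db", ["sample_1.fastq.gz", "sample_2.fastq.gz"],
        report="sample.report", output="sample.kraken",
        paired=True, gzipped=True)))
```

To execute a command rather than print it, pass the list to subprocess.run(cmd, check=True). The default-format --report file is typically the one loaded into a viewer such as Pavian.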
Kraken2 is a well-known metagenomics classification tool for Next Generation Sequencing (NGS) data and is widely used in scientific research. With kraken2 you can build a database from whole genome sequences and then classify sequencing reads against it to identify the organisms in unknown samples. It is easy to use and classifies samples very rapidly. Kraken2 has applications in biology, medical research and even geology.
In this video we will first give a general overview of the algorithm behind kraken2 and its database. Then we will work through a hands-on example in which we download read datasets from the Sequence Read Archive (SRA) of NCBI (the National Center for Biotechnology Information) and classify them against the database we have built. We will also look at the Pavian visualization software, which presents a graphical overview of our results (e.g. a Sankey diagram). Figures made in Pavian can also be used in publications.
This course is mainly aimed at biology and medical students and researchers. Some knowledge of Linux is required, but only a few commands need further explanation, and these are covered in detail.
Overall, the skills taught in this course are worth learning and can give you an edge in metagenomics research and analysis.