
Explore how DNA encodes genetic information, how transcription copies it to RNA, and how translation produces proteins, along with promoters, exons, introns, enhancers, and regulation by CpG islands.
Explain Sanger sequencing as a method for validating next-generation sequencing data and for sequencing genes or plasmids, and show how dideoxynucleotides terminate synthesis, with chromatograms from gel electrophoresis stored in ABI and SCF formats.
Install FinchTV, open AB1 or SCF trace files, and explore sequences with the vertical and horizontal scales; export to FASTA or FASTQ for analysis and prepare segments for BLAST.
Explore how high-throughput sequencing enables genomics and epigenomics studies, including SNPs, indels, and structural variants. Investigate regulatory elements, chromatin accessibility, and three-dimensional genome architecture with Hi-C and ChIA-PET.
Explore the FASTA and GenBank formats, including unique identifiers, file extensions such as fna, faa, and frn, and GenBank locus information, definitions, and accessions.
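As an illustration of how a FASTA record pairs a unique identifier with its sequence, here is a minimal Python sketch. The function name `parse_fasta` and the example records are hypothetical, and the sketch assumes the identifier is the first whitespace-delimited token after `>`.

```python
def parse_fasta(text):
    """Return a dict mapping each record identifier to its full sequence."""
    records = {}
    identifier = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Assumption: the identifier is the first token after '>'.
            identifier = line[1:].split()[0]
            records[identifier] = []
        elif identifier is not None:
            # Sequence lines may be wrapped, so collect them all.
            records[identifier].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical two-record FASTA text.
example = """>seq1 hypothetical example record
ACGTAC
GTTA
>seq2 another hypothetical record
GGGCCC
"""
records = parse_fasta(example)
print(records)
```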
Understand the GFF3 and GTF annotation formats, their nine-column structure, and the key fields seqid, source, type, start, end, score, strand, phase, and attributes; also learn the basics of BED tracks.
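The nine-column layout can be made concrete with a small Python sketch that splits one GFF3 line into its named fields. The names `GffRecord` and `parse_gff3_line` and the example line are hypothetical; the sketch assumes tab-separated columns and `key=value` attributes separated by semicolons, and it does not handle URL-escaped characters.

```python
from collections import namedtuple

# One field per GFF3 column, in order.
GffRecord = namedtuple(
    "GffRecord",
    "seqid source type start end score strand phase attributes",
)


def parse_gff3_line(line):
    """Split a single GFF3 feature line into its nine named columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    seqid, source, ftype, start, end, score, strand, phase, attrs = fields
    # Column 9 holds semicolon-separated key=value pairs.
    attributes = dict(pair.split("=", 1) for pair in attrs.split(";") if pair)
    return GffRecord(seqid, source, ftype, int(start), int(end),
                     score, strand, phase, attributes)


# Hypothetical feature line; '.' marks an undefined score or phase.
line = "chr1\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene0001;Name=hypothetical"
rec = parse_gff3_line(line)
print(rec.type, rec.start, rec.end, rec.attributes["ID"])
```

GFF3 coordinates are 1-based and inclusive, so the length of this feature is `rec.end - rec.start + 1`.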
Understand the VCF and BCF formats and the essential VCF header, and explore the record fields (chromosome, position, ID, REF, ALT, QUAL, FILTER) plus sample fields such as GT, GQ, DP, and AF.
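To show how the fixed columns and the per-sample fields fit together, the following Python sketch parses one VCF data line. The function `parse_vcf_record`, the sample name, and the record itself are hypothetical; the sketch assumes a tab-separated line whose ninth column (FORMAT) lists the keys for each sample column.

```python
def parse_vcf_record(line, sample_names):
    """Parse one VCF data line into fixed fields and per-sample values."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]
    record = {
        "CHROM": chrom,
        "POS": int(pos),
        "ID": vid,
        "REF": ref,
        "ALT": alt.split(","),  # several ALT alleles are comma-separated
        "QUAL": qual,
        "FILTER": filt,
        "INFO": info,
        "samples": {},
    }
    if len(fields) > 9:
        # FORMAT keys (e.g. GT:GQ:DP) apply to every sample column.
        keys = fields[8].split(":")
        for name, values in zip(sample_names, fields[9:]):
            record["samples"][name] = dict(zip(keys, values.split(":")))
    return record


# Hypothetical record with a single sample.
line = "chr1\t12345\trs0000\tA\tG\t50\tPASS\tDP=30\tGT:GQ:DP\t0/1:48:30"
rec = parse_vcf_record(line, ["sample1"])
print(rec["CHROM"], rec["POS"], rec["samples"]["sample1"]["GT"])
```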
Learn to extract regions from a sequence with extractseq by specifying start and end positions, handle FASTA and GenBank formats, and manage overlap removal with blastn and the output options.
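The idea of extracting a region by start and end can be sketched in a few lines of Python. This is not how extractseq is implemented; `extract_region` is a hypothetical helper that assumes 1-based, inclusive coordinates, the convention used by most sequence tools.

```python
def extract_region(sequence, start, end):
    """Return the subsequence from start to end (1-based, inclusive)."""
    if start < 1 or end > len(sequence) or start > end:
        raise ValueError("region %d-%d is outside the sequence" % (start, end))
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return sequence[start - 1:end]


region = extract_region("ACGTACGT", 3, 6)
print(region)
```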
Explore advanced options for extracting genomic features, including combining the CDSs of a single gene into one sequence, translating a CDS to protein, and exporting in GenBank and NCBI FASTA formats.
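Joining the CDS segments of one gene and translating the result can be sketched as follows. The helper names `join_cds` and `translate`, the toy sequence, and the segment coordinates are hypothetical; the sketch assumes forward-strand segments given as 1-based inclusive pairs and uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def join_cds(genome, segments):
    """Concatenate CDS segments given as (start, end), 1-based inclusive."""
    return "".join(genome[start - 1:end] for start, end in segments)


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy sequence: two coding segments (3-8 and 13-18) split by a 4-base intron.
genome = "CCATGGCCGTAGAAATGACC"
cds = join_cds(genome, [(3, 8), (13, 18)])
protein = translate(cds)
print(cds, protein)
```

A reverse-strand gene would additionally need the joined sequence to be reverse-complemented before translation, which this sketch omits.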
This lecture shows how to filter BAM files to keep mapped reads by removing unmapped reads with the -q and -f flags, sort by coordinate, and optionally convert to FASTQ.
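The logic behind this kind of filter can be illustrated on SAM text, the human-readable form of BAM. In the SAM specification, bit 0x4 of the FLAG column marks an unmapped read and column 5 holds the mapping quality. The sketch below is a plain-Python illustration, not the samtools implementation; `keep_mapped` and the example reads are hypothetical. In samtools view, `-F 4` excludes reads with the unmapped bit set, `-f` requires the given bits to be set, and `-q` sets a minimum mapping quality.

```python
UNMAPPED = 0x4  # SAM FLAG bit: the read itself is unmapped


def keep_mapped(sam_lines, min_mapq=0):
    """Keep header lines and mapped reads with MAPQ >= min_mapq."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)  # header lines pass through unchanged
            continue
        fields = line.split("\t")
        flag = int(fields[1])   # column 2: bitwise FLAG
        mapq = int(fields[4])   # column 5: mapping quality
        if flag & UNMAPPED:
            continue            # drop unmapped reads
        if mapq < min_mapq:
            continue            # drop low-confidence alignments
        kept.append(line)
    return kept


# Hypothetical SAM content: one header line and three reads.
sam_lines = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read3\t16\tchr1\t200\t5\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
filtered = keep_mapped(sam_lines, min_mapq=20)
print([line.split("\t")[0] for line in filtered])
```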
Extract fields from VCF or BCF files with bcftools query, producing tab-delimited, user-defined output. The lecture also introduces the Lyve-SET phylogenomics pipeline for studying foodborne pathogens at the genomic level.
Convert BAM to FASTQ by extracting unmapped reads and merging them into forward and reverse FASTQ files. Use the r2 and q2 options to output read 2 separately.
Learn to split sequences at long runs of N with seqtk_cutn, remove gaps by adjusting the non-N penalty, and extract the removed regions in de Bruijn graph-based genome assembly workflows using simplitigs.
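Splitting at long N runs can be sketched with a regular expression. This is a simplified illustration rather than the seqtk algorithm: `split_at_n_runs` and its `min_run` parameter are hypothetical, and the sketch ignores the non-N penalty, cutting only at uninterrupted runs of N.

```python
import re


def split_at_n_runs(sequence, min_run=3):
    """Split a sequence wherever at least min_run consecutive Ns occur."""
    pattern = re.compile("[Nn]{%d,}" % min_run)
    # Drop empty pieces produced by N runs at the very start or end.
    return [piece for piece in pattern.split(sequence) if piece]


pieces = split_at_n_runs("ACGTNNNNNGGCCNTTAA", min_run=3)
print(pieces)
```

The single N inside the second piece is shorter than `min_run`, so it stays in place.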
Learn to concatenate multiple FASTA exons into a single sequence using union and combine FASTA, inspect overlaps, and verify the resulting length (444) and GC content with infoseq.
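The length and GC checks are simple enough to reproduce by hand. The Python sketch below uses hypothetical helpers `concatenate` and `gc_content` and two short toy exons, not the 444-base example from the lecture.

```python
def concatenate(exons):
    """Join exon sequences end to end, in the order given."""
    return "".join(exons)


def gc_content(sequence):
    """Return the percentage of G and C bases in the sequence."""
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    return 100.0 * gc / len(sequence)


# Toy exons (hypothetical).
exons = ["ATGGCC", "AAATGA"]
joined = concatenate(exons)
print(joined, len(joined), round(gc_content(joined), 2))
```

The joined length should equal the sum of the exon lengths; a shorter result would suggest that overlapping bases were merged.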
Learn how to remove poly-A tails from nucleotide sequences with trimest, including the minimum tail length, allowed mismatches, and optional reverse-complement handling of poly-T tails; see how the output headers reflect the removals.
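The role of the minimum tail length can be seen in a simplified Python sketch. Unlike trimest, this hypothetical `trim_poly_a` helper allows no mismatches inside the tail and looks only at the 3' end; the `min_length` default is an assumption chosen for the example.

```python
def trim_poly_a(sequence, min_length=4):
    """Remove a 3' poly-A tail if it has at least min_length bases."""
    stripped = sequence.rstrip("Aa")
    tail_length = len(sequence) - len(stripped)
    if tail_length >= min_length:
        return stripped
    # Tails shorter than min_length are treated as ordinary sequence.
    return sequence


trimmed = trim_poly_a("ACGTCGAAAAAA")
untouched = trim_poly_a("ACGTCGAA")
print(trimmed, untouched)
```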
Learn to remove adapter sequences from nucleotide reads with vectorstrip, adjusting the mismatch tolerance and interpreting the output statistics that report the removed regions.
Learning a range of tools and manipulation methods helps you understand files, their components, and their outputs accurately, which broadens what you can do across bioinformatics applications.
Practical applications focus on nucleic acids.
The programs used are popular and free; most run online, and the rest can be installed on all operating systems.
The first part covers Sanger sequencing of DNA, which remains highly important despite being an old method. This part covers:
An explanation of the Sanger sequencing method, the files it produces, and common problems and their causes.
A practical session on opening and manipulating Sanger sequencing files using three different programs.
The second part covers high-throughput sequencing, which is the basis of recent nucleic acid analysis research. This part covers:
Common high-throughput sequencing methods: Illumina, Ion Torrent, Pacific Biosciences, and Oxford Nanopore.
Applications of high-throughput sequencing in different fields.
The third part covers bioinformatics files, the raw material for biologists, in which sequences, alignments, variations, and annotations are stored. The main file types are:
FASTQ
FASTA
GenBank
GTF/GFF3
BED
SAM/BAM
BCF/VCF
The fourth and subsequent parts are practical applications on the files above, using programs from online packages such as:
Sequence Manipulation Suite 2.
EMBOSS.
Packages on the Galaxy platform:
Seqtk.
Bedtools.
Samtools.
Bcftools.
The course uses many programs. Those that are fundamental to sequence manipulation are explained in detail; others are included to make the examples clearer and are used for a specific function without detailed explanation.
The quizzes are not yet complete.