
Biological databases are libraries of biological data collected from scientific experiments, published literature, high-throughput experimental technologies, and computational analysis. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. Information contained in biological databases includes gene function, structure, localization (both cellular and chromosomal), and clinical effects of mutations, as well as similarities of biological sequences and structures.
Importance of Biological Databases
Databases are important tools that help scientists analyze and explain a host of biological phenomena, from the structure of biomolecules and their interactions, to the whole metabolism of organisms, to the evolution of species. This knowledge facilitates the fight against diseases, assists in the development of medications and the prediction of certain genetic diseases, and helps in discovering basic relationships among species in the history of life.
Biological database design, development, and long-term management is a core area of the discipline of bioinformatics.
Types of biological databases on the basis of the sequence, structure and function of biological molecules:
Nucleic acid databases
Amino acid / protein databases
Specialized databases
Types of biological databases:
Based on the data they contain, biological databases fall into four main types:
Primary databases
Secondary databases
Specialized databases
Literature databases
Primary Database: Primary databases (also known as data repositories) are highly organized, user-friendly gateways to the huge amount of biological data produced by researchers around the world. Most protein sequences found in databases are the product of conceptual translation of genes and genomes determined by DNA sequencing.
Examples include Swiss-Prot and PIR for protein sequences, GenBank and DDBJ for nucleotide sequences, and the Protein Data Bank for protein structures.
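To make "conceptual translation" concrete, here is a minimal Python sketch that turns a coding DNA sequence into a protein sequence using the standard genetic code. The function name translate and the example sequences are illustrative assumptions, not part of any database's tooling, and real annotation pipelines also deal with reading frames, alternative genetic codes and ambiguous bases.

```python
# Standard genetic code, laid out in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence, stopping at the first stop codon.

    Hypothetical helper for illustration: reads frame 0 only and
    reports codons it does not recognise as 'X'.
    """
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[pos:pos + 3], "X")
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
```

A protein entry produced this way is only as good as the underlying gene prediction, which is one reason such entries are later reviewed and annotated.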
Secondary Database: Secondary databases comprise data derived from analyzing entries in primary databases. In most cases, they also provide tools for investigating the genes and proteins further.
Secondary databases contain information derived from primary sequence data in the form of regular expressions (patterns), fingerprints, profiles, blocks, or Hidden Markov Models. The type of information stored differs from one secondary database to another.
Examples of Secondary databases are as follows.
InterPro
UniProt
RefSeq
1000 Genomes Project
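As an illustration of how a pattern stored in a secondary database can be used, the sketch below converts a PROSITE-style pattern into a Python regular expression and scans a sequence with it. The helper names prosite_to_regex and find_sites and the test sequence are assumptions made for this example, and only a common subset of the pattern syntax is handled (x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repeats, < and > for sequence ends).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern into a Python regex string."""
    pattern = pattern.rstrip(".")
    parts = []
    for element in pattern.split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P} -> [^P]
        element = element.replace("(", "{").replace(")", "}")   # x(2,4) -> x{2,4}
        element = element.replace("x", ".")                      # x -> any residue
        element = element.replace("<", "^").replace(">", "$")    # sequence ends
        parts.append(element)
    return "".join(parts)


def find_sites(pattern: str, sequence: str):
    """Return (0-based start, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # N-glycosylation style pattern: N, not P, S or T, not P.
    print(prosite_to_regex("N-{P}-[ST]-{P}."))
    print(find_sites("N-{P}-[ST]-{P}.", "MNGTANPS"))
```

Profiles and Hidden Markov Models capture more variation than a pattern like this, which is why secondary databases often store several kinds of signature for the same protein family.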
Primary Databases
Are populated with experimentally derived data
Original biological data
Raw data submitted by researchers
DDBJ, EMBL, NCBI
Explained the different primary databases
DDBJ
EMBL
NCBI
Secondary Databases
Computationally processed or manually curated information, based on primary databases
23andMe
RefSeq
OMIM
1000 Genomes Browser
Explained Secondary Databases
23andMe
RefSeq
OMIM
1000 Genomes Browser
Protein sequence databases
PROSITE: database of protein families and domains
Protein Information Resource (Georgetown University Medical Center [GUMC])
Swiss-Prot: protein knowledgebase (Swiss Institute of Bioinformatics)
NCBI: protein sequence and knowledgebase (National Center for Biotechnology Information)
Explained Protein Sequence Databases
PROSITE
NCBI
Swiss-Prot
Protein Information Resource
Protein structure databases
Protein Data Bank (PDB), comprising:
Protein Data Bank in Europe (PDBe)
Protein Data Bank Japan (PDBj)
Research Collaboratory for Structural Bioinformatics (RCSB)
Structural Classification of Proteins (SCOP)
CATH: Protein Structure Classification database
Explained Protein Structure Databases
Protein Data Bank
CATH
Structural Classification of Proteins
Protein model databases
ModBase: database of comparative protein structure models
Similarity Matrix of Proteins (SIMAP): database of protein similarities computed using FASTA
SWISS-MODEL: server and repository for protein structure models
AAindex: database of amino acid indices, amino acid mutation matrices, and pair-wise contact potentials
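An amino acid index is simply one number per amino acid, so a sequence can be summarised by looking each residue up and averaging. The sketch below uses the widely cited Kyte-Doolittle hydropathy values, hard-coded here as an assumption instead of being read from AAindex, and the function name mean_hydropathy is made up for this example.

```python
# Kyte-Doolittle hydropathy values, one number per amino acid.
# Hard-coded for illustration; a real workflow would load the index from a file.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence: str) -> float:
    """Average the index value over all residues of a protein sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    # A residue outside the 20 standard amino acids raises KeyError.
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return sum(values) / len(values)


if __name__ == "__main__":
    print(mean_hydropathy("MKTAYIAKQR"))
```

Swapping in a different index (for example a volume or charge scale) only requires replacing the dictionary, which is the main convenience of keeping such indices in one database.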
Protein-protein and other molecular interactions
BioGRID: general repository for interaction datasets
RNA-binding protein database
Database of Interacting Proteins
IntAct: open-source database for molecular interactions (EMBL-EBI)
Protein production is the biotechnological process of generating a specific protein. It is typically achieved by the manipulation of gene expression in an organism such that it expresses large amounts of a recombinant gene. This includes the transcription of the recombinant DNA to messenger RNA (mRNA), the translation of mRNA into polypeptide chains, which are ultimately folded into functional proteins and may be targeted to specific subcellular or extracellular locations.
Protein expression databases
Human Protein Atlas: aims at mapping all the human proteins in cells, tissues and organs
This bioinformatics course is going to be a game changer for you. Biological data are currently growing explosively, and bioinformatics sits at the intersection of biology and computer science.
What is Bioinformatics?
In biology, bioinformatics is defined as "the use of computers to store, retrieve, analyze or predict the composition or structure of biomolecules". Bioinformatics is the application of computational techniques and information technology to the organization and management of biological data. Classical bioinformatics deals primarily with sequence analysis.
Aims of Bioinformatics
Development of databases containing all biological information.
Development of better tools for data design, annotation and mining.
Design and development of drugs by using simulation software.
Design and development of software tools for protein structure prediction, function annotation and docking analysis.
Creation and development of software to improve tools for analyzing sequences for their function and similarity with other sequences.
Biological Databases
Biological data are complex, exception-ridden, vast, and incomplete. Several databases have therefore been created and curated so that the data can be interpreted without ambiguity. A biological database is a collection of biological data arranged in a computer-readable form that speeds up search and retrieval and is convenient to use. A good database must be kept up to date.
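A small sketch of what "computer-readable form that speeds up search and retrieval" can mean in practice: the Python code below parses FASTA-formatted text into a dictionary keyed by sequence identifier, so that a record can be retrieved directly by its ID. The function name parse_fasta and the identifiers seq1 and seq2 are hypothetical, and production databases use far more elaborate indexing than an in-memory dictionary.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {identifier: sequence}.

    The identifier is taken as the first word after '>' on a header line.
    """
    records = {}
    current_id = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


if __name__ == "__main__":
    example = """>seq1 hypothetical example record
ATGGCC
TAA
>seq2 another hypothetical record
MKTAYIAK
"""
    index = parse_fasta(example)
    print(index["seq1"])  # direct lookup by identifier
```

Once the records are indexed by identifier, retrieving one entry no longer requires scanning the whole file, which is the basic idea behind database search.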
Importance of Biological Databases
A range of information such as biological sequences, structures, binding sites, metabolic interactions, molecular action, functional relationships, protein families, motifs and homologs can be retrieved from biological databases. The main purpose of a biological database is to store and manage biological data and information in computer-readable form.
In this course we learn about the different biological databases used in bioinformatics and get to know a little about their details. These databases are mainly divided into four categories, which we cover one by one. We also explain the difference between primary and secondary databases and how they are used in bioinformatics.
Amino acid / protein databases
Several publicly available data repositories and resources have been developed to support and manage protein-related information, biological knowledge discovery and data-driven hypothesis generation. The databases in the table below are selected from the databases listed in the Nucleic Acids Research (NAR) database issues and database collection and from the databases cross-referenced in UniProtKB. Most of these databases are cross-referenced with UniProt / UniProtKB so that identifiers can be mapped to each other.
Sequence databases
CCDS: The Consensus CDS protein set database
DDBJ: DNA Data Bank of Japan
ENA: European Nucleotide Archive
GenBank: GenBank nucleotide sequence database
RefSeq: NCBI Reference Sequence Database
UniGene: Database of computationally identified transcripts from the same locus
UniProtKB: Universal Protein Resource (UniProt)
3D structure protein databases
DisProt: Database of Protein Disorder
MobiDB: Database of intrinsically disordered and mobile proteins
ModBase: Database of Comparative Protein Structure Models
PDBsum: Pictorial database of 3D structures in the Protein Data Bank
ProteinModelPortal: Protein Model Portal of the PSI-Nature Structural Biology Knowledgebase
SMR: Database of annotated 3D protein structure models
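Because most of the databases listed above are cross-referenced with UniProtKB, identifiers can be mapped in both directions. The sketch below shows the idea with a tiny in-memory table. All identifiers in it are made-up placeholders (not real accessions), and build_indexes is a hypothetical helper, not an API of UniProt or any other resource.

```python
from collections import defaultdict

# Hypothetical cross-reference rows:
# (UniProtKB accession, external database, external identifier).
CROSS_REFERENCES = [
    ("UP_EXAMPLE_1", "RefSeq", "REFSEQ_EXAMPLE_A"),
    ("UP_EXAMPLE_1", "PDB", "PDB_EXAMPLE_A"),
    ("UP_EXAMPLE_1", "PDB", "PDB_EXAMPLE_B"),
    ("UP_EXAMPLE_2", "RefSeq", "REFSEQ_EXAMPLE_B"),
]


def build_indexes(rows):
    """Build lookups in both directions from a list of cross-reference rows."""
    forward = defaultdict(lambda: defaultdict(list))  # accession -> db -> [ids]
    reverse = {}                                      # (db, id) -> accession
    for accession, database, external_id in rows:
        forward[accession][database].append(external_id)
        reverse[(database, external_id)] = accession
    return forward, reverse


if __name__ == "__main__":
    forward, reverse = build_indexes(CROSS_REFERENCES)
    print(forward["UP_EXAMPLE_1"]["PDB"])
    print(reverse[("RefSeq", "REFSEQ_EXAMPLE_B")])
```

One protein entry can point to several structures or transcripts, so the forward lookup returns a list while the reverse lookup in this simplified sketch returns a single accession.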
This course will be extremely helpful to data analysis students and bioinformaticians, because they use these databases a lot in their work.
If you have any questions or suggestions, please let me know through the instructor inbox. I'll try to answer all of your questions within 12 hours.