Hey, how’re things? Welcome back to another of my introductions. This one is all about performing bioinformatics in Python. I like doing introductions because this is the moment when we get time to flesh out the subject.
At later stages, when we’re learning algorithms and working on datasets, we can’t really afford to go back and explain three pages of fundamentals.
So this is an intro to bioinformatics in Python: the application of statistics and computer science to the field of molecular biology.
The Need Of Bioinformatics in Computer Science
The key purpose of bioinformatics is to improve our understanding of biological processes. To accomplish this, it focuses on creating and applying computationally intensive techniques, including pattern recognition, data mining, machine learning algorithms, and visualization.
Major research sectors
Many major research areas make use of bioinformatics. To name a few:
- Sequence alignment
- Gene finding
- Genome assembly
- Drug design and discovery
- Protein structure alignment and prediction
- Gene expression prediction
- Protein-protein interactions
- Genome-wide association studies
- Evolution modeling
Bioinformatics is interdisciplinary. If you are a biologist, you will find that your studies benefit greatly from a knowledge of bioinformatics.
Jobs in the field of Bioinformatics
The job market is eager for people with bioinformatics expertise. Large pharmaceutical, biotech, and software firms look to hire experienced bioinformatics experts to work on biological and health care projects.
Here are two of the major research organizations conducting active research:
- NCBI (National Center for Biotechnology Information)
- RCSB PDB (Research Collaboratory for Structural Bioinformatics PDB)
Basic Terminologies In The Study Of Bioinformatics
Let’s get into the study of bioinformatics now. Below is a list of some of the most basic elements of biological studies.
1. Amino acids
Amino acids are the building blocks of proteins. And I don’t mean the meat you had for lunch, but the molecules that those proteins are made of.
Almost all proteins are built from the same 20 standard amino acids.
Around 500 amino acids are known in nature, but only these 20 are encoded by the standard genetic code.
The most important thing amino acids do is link together into long chains to form proteins. Each protein has its own amino acid sequence, and that sequence determines the shape the protein folds into and therefore its function.
Amino acids are like an alphabet for proteins: even with just a few letters, you can make many different words by combining them in different orders.
2. Genes and DNA
How does DNA come into all of this?
DNA, short for deoxyribonucleic acid, is the molecule that carries the genetic code of every species. That includes animals, plants, fungi, protists, archaea, and bacteria.
DNA is found in each cell of the body and tells the cell which proteins to make. Many of these proteins are enzymes. Kids inherit DNA from their parents, which is why they share characteristics such as skin, hair, and eye color with them. A child’s DNA is a mixture of the parents’ DNA.
A section of DNA that contains instructions to make a protein is called a gene.
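To make the gene-to-protein idea a bit more concrete, here is a minimal pure-Python sketch that reads a DNA coding sequence three letters (one codon) at a time and looks up the amino acid each codon stands for. The codon table is the standard genetic code; the function name `translate` and the example sequence are just illustrative choices, and in real projects you would normally let a library do this for you.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# for each of the three positions. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence into single-letter amino acid codes.

    Illustrative helper: trailing bases that do not fill a whole codon are ignored.
    """
    dna = dna.upper()
    usable_length = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(0, usable_length, 3)
    )


# A short made-up gene fragment: start codon, six more codons, stop codon.
print(translate("ATGGCCATTGTAATGGGCCGCTGA"))
```

Each letter in the result is one amino acid, which is exactly the single-letter alphabet that protein sequence files use.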
3. FASTA format
The FASTA format has become a near-universal standard in the field of bioinformatics.
It is a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, using single-letter codes for the nucleotides or amino acids.
It looks like this:
> sequence A
ggtccccaatattgtgatataattaaaattatattcatat
tctgtacaaacacctatattagagcttgccagaaaaaacacttttaggaagtcctctagcatcttctttgaagcgttgtc
> sequence B
ggtaagtcctctaaatattgtgccagaaaaaacacttttaatataattaaaattatattca
tattctgttgggctatattagagccatcgtacaaacacccccttctttgaagcgttgtc
Each sequence is preceded by a header line that starts with ">" and carries the sequence name and any remarks. The sequence itself follows and may be split across several lines.
To read more about the FASTA format, go here: https://en.wikipedia.org/wiki/FASTA_format
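Because FASTA is plain text, reading it by hand is not much work. The sketch below is a minimal parser written only with the standard library; the function name `read_fasta` and the two tiny demo records are made up for illustration, and it does none of the validation a real parser would.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines are joined back into one string.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# In-memory stand-in for a FASTA file (hypothetical demo data).
example = StringIO(
    ">seq_a demo record\n"
    "ggtccccaat\n"
    "attgtgatat\n"
    ">seq_b another demo record\n"
    "ggtaagtcct\n"
)

records = dict(read_fasta(example))
for name, sequence in records.items():
    print(name, len(sequence))
```

The same function works on a real file if you pass it an open file object instead of the `StringIO` stand-in.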
Bioinformatics in Python using BioPython
The Biopython Project is an open-source collection of non-commercial Python tools for computational biology and bioinformatics, developed by an international group of developers.
It’s very easy to install the library using the pip command:
pip install biopython
If you run into any problems, here’s a full tutorial on downloading and setting it up: BioPython setup tutorial
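Once the install has finished, a first session could look something like the sketch below. It uses Biopython’s `Seq` object for a few basic sequence operations and `SeqIO.parse` to read FASTA records; the DNA string and the demo FASTA record are made-up inputs, and the FASTA text is fed in through `StringIO` only so the example does not need a file on disk.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq

# A short, made-up coding sequence wrapped in a Seq object.
coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")

rev_comp = coding_dna.reverse_complement()  # opposite strand, read 5' to 3'
mrna = coding_dna.transcribe()              # DNA to RNA (T becomes U)
protein = coding_dna.translate()            # standard genetic code by default

print("reverse complement:", rev_comp)
print("mRNA:", mrna)
print("protein:", protein)

# Parse FASTA text; with a real file you would pass its path instead.
fasta_text = ">seq_a demo record\nggtccccaat\nattgtgatat\n"
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))
for record in records:
    print(record.id, len(record.seq))
```

Each parsed record keeps the header in `record.id` and `record.description`, and the sequence in `record.seq`.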
In the upcoming articles, we’ll slowly cover a few more important sections like BLAST and PDB visualization tools.
See you all next time. Bookmark the site and follow me as an author to stay updated. Also, if you’re interested in Data Science as a whole, check out some of my other articles.