CNVkit is also available, but it is a command-line tool and is not easy to use. PyCogent, which was developed by researchers at NCBI from the National Institutes of Health (NIH), is another useful tool, but it is not easy to use either. We will use a package called Bio, the core package of Biopython (https://github.com/biopython/biopython/tree/master/Bio), and libraries from Python programming for biology.
In bioinformatics, the sequence is the key object in almost every experiment, research project, or study. As a mathematician, I picture a sequence as a string with certain patterns (such as ATAGCATATGCT). To begin with, here is a simple example that shows a sequence, its GC ratio, and its codons:
# Note: this listing targets older Biopython releases;
# Bio.Alphabet was removed in Biopython 1.78.
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC
from Bio.SeqUtils import GC

def DNACodons(seq):
    # Ignore a trailing partial codon, if any
    end = len(seq) - (len(seq) % 3)
    codons = [seq[i:i+3] for i in range(0, end, 3)]
    return codons

# Define the sequence before using it
my_seq = Seq('GGTCGATGGGCCTAGCAGCATATCTGAGC', IUPAC.unambiguous_dna)
print("GC Result==> %s" % GC(my_seq))
print(DNACodons(my_seq))

GC Result==> 58.6206896552
[Seq('GGT', IUPACUnambiguousDNA()), Seq('CGA', IUPACUnambiguousDNA()),
 Seq('TGG', IUPACUnambiguousDNA()), Seq('GCC', IUPACUnambiguousDNA()),
 Seq('TAG', IUPACUnambiguousDNA()), Seq('CAG', IUPACUnambiguousDNA()),
 Seq('CAT', IUPACUnambiguousDNA()), Seq('ATC', IUPACUnambiguousDNA()),
 Seq('TGA', IUPACUnambiguousDNA())]
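To make explicit what the GC ratio and the codon list represent, the same two quantities can be sketched in plain Python, without Biopython. The function names gc_percent and split_codons are hypothetical and chosen only for this illustration. This is a minimal sketch that assumes an unambiguous DNA string, not a replacement for the library routines.

```python
def gc_percent(seq):
    """Return the percentage of bases in seq that are G or C."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return 100.0 * (seq.count('G') + seq.count('C')) / len(seq)


def split_codons(seq):
    """Split seq into consecutive triplets, dropping a trailing partial codon."""
    end = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, end, 3)]


dna = 'GGTCGATGGGCCTAGCAGCATATCTGAGC'
print(round(gc_percent(dna), 2))   # 17 of the 29 bases are G or C
print(split_codons(dna))           # 9 full codons; the final 'GC' is dropped
```

For the 29-base sequence used above, 17 bases are G or C, which gives the same ratio of roughly 58.62 percent, and the two trailing bases do not form a complete codon.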
Let's consider two molecular structures, collect certain atoms, and try to plot their positions with their Phi and Psi angles. The allowed molecular structures are DNA, RNA, and protein. Using the Modelling and Maths modules from the PythonForBiology library, we will attempt to plot these structures side by side:
The two plots use data from two files: testTransform.pdb and 1A12.pdb. The latter contains the human regulator of chromosome condensation (RCC1). The following code produces the plots:
#
# bio_1.py
#
import matplotlib.pyplot as plt
from phipsi import getPhiPsi
from Modelling import getStructuresFromFile

def genPhiPsi(fileName):
    # Use the first structure found in the file
    struc = getStructuresFromFile(fileName)[0]
    phiList = []
    psiList = []
    for chain in struc.chains:
        # Skip the first and last residues of each chain
        for residue in chain.residues[1:-1]:
            phi, psi = getPhiPsi(residue)
            phiList.append(phi)
            psiList.append(psi)
    return phiList, psiList

if __name__ == '__main__':
    phiList, psiList = genPhiPsi('examples/testTransform.pdb')
    phiList2, psiList2 = genPhiPsi('examples/1A12.pdb')

    # One figure with two panels that share the y axis
    f, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize=(12,9))
    ax1.scatter(phiList, psiList, s=90, alpha=0.65)
    ax1.axis([-160,160,-180,180])
    ax1.set_title('Ramachandran Plot for Two Structures')
    ax2.scatter(phiList2, psiList2, s=60, alpha=0.65, color='r')
    plt.show()
The library used in this example will be available with the code examples in a file called PythonForBiology.zip. You can extract it and run this code via a command line, assuming that you have numpy and matplotlib installed.
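The Phi and Psi values in a Ramachandran plot are backbone dihedral (torsion) angles: Phi is defined by the atoms C(i-1), N(i), CA(i), C(i), and Psi by N(i), CA(i), C(i), N(i+1). Each angle needs an atom from a neighboring residue, which is presumably why the listing skips the first and last residues of each chain. The sketch below shows one common way to compute a dihedral angle from four 3D points. It is an illustration only: the function name dihedral_degrees and its helpers are hypothetical, and this is not necessarily how getPhiPsi in the PythonForBiology library is implemented.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Dihedral angle in degrees, in (-180, 180], about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Components of b0 and b2 perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    # Signed angle between the two perpendicular components
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates, used only to check the geometry
cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis)         # outer atoms on the same side: 0 degrees
print(abs(trans))  # outer atoms on opposite sides: 180 degrees
```

To obtain Phi for a residue with a function like this, you would pass the coordinates of C(i-1), N(i), CA(i), and C(i) in that order; for Psi, the coordinates of N(i), CA(i), C(i), and N(i+1).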