BLAST Output Files

The following is part of a BLAST output file. I created it by entering a few lines of the sample.dna file from Chapter 8 into the BLAST program at the NCBI web site, without changing any of the default parameters. I then saved the output as text in the file blast.txt, which is available from this book's web site. I use it repeatedly in the parsing routines throughout this chapter. Because the output is several pages long, I've truncated it here to show the beginning, the middle, and the end of the file.

BLASTN 2.1.3 [Apr-11-2001]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.
RID: 991533563-27495-9092
         (400 letters)

Database: nt
           868,831 sequences; 3,298,558,333 total letters

                                                                   Score     E
Sequences producing significant alignments:                        (bits)  Value

dbj|AB031069.1|AB031069 Homo sapiens PCCX1 mRNA for protein cont...   793  0.0
ref|NM_014593.1| Homo sapiens CpG binding protein (CGBP), mRNA        779  0.0
gb|AF149758.1|AF149758 Homo sapiens CpG binding protein (CGBP) m...   779  0.0
ref|XM_008699.3| Homo sapiens CpG binding protein (CGBP), mRNA        765  0.0
emb|AL136862.1|HSM801830 Homo sapiens mRNA; cDNA DKFZp434F174 (f...   450  e-124
emb|AJ132339.1|HSA132339 Homo sapiens CpG island sequence, subcl...   446  e-123
emb|AJ236590.1|HSA236590 Homo sapiens chromosome 18 CpG island D...   406  e-111
dbj|AK010337.1|AK010337 Mus musculus ES cells cDNA, RIKEN full-l...   234  3e-59
dbj|AK017941.1|AK017941 Mus musculus adult male thymus cDNA, RIK...   210  5e-52
gb|AC009750.7|AC009750 Drosophila melanogaster, chromosome 2L, r...    46  0.017
gb|AE003580.2|AE003580 Drosophila melanogaster genomic scaffold ...    46  0.017
ref|NC_001905.1| Leishmania major chromosome 1, complete sequence      40  1.0
gb|AE001274.1|AE001274 Leishmania major chromosome 1, complete s...    40  1.0
gb|AC008299.5|AC008299 Drosophila melanogaster, chromosome 3R, r...    38  4.1
gb|AC018662.3|AC018662 Human Chromosome 7 clone RP11-339C9, comp...    38  4.1
gb|AE003774.2|AE003774 Drosophila melanogaster genomic scaffold ...    38  4.1
gb|AC008039.1|AC008039 Homo sapiens clone SCb-391H5 from 7q31, c...    38  4.1
gb|AC005315.2|AC005315 Arabidopsis thaliana chromosome II sectio...    38  4.1
emb|AL353748.13|AL353748 Human DNA sequence from clone RP11-317B...    38  4.1

>dbj|AB031069.1|AB031069 Homo sapiens PCCX1 mRNA for protein containing CXXC
domain 1,
           complete cds
          Length = 2487

 Score =  793 bits (400), Expect = 0.0
 Identities = 400/400 (100%)
 Strand = Plus / Plus

Query: 1   agatggcggcgctgaggggtcttgggggctctaggccggccacctactggtttgcagcgg 60
Sbjct: 1   agatggcggcgctgaggggtcttgggggctctaggccggccacctactggtttgcagcgg 60

Query: 61  agacgacgcatggggcctgcgcaataggagtacgctgcctgggaggcgtgactagaagcg 120
Sbjct: 61  agacgacgcatggggcctgcgcaataggagtacgctgcctgggaggcgtgactagaagcg 120

Query: 121 gaagtagttgtgggcgcctttgcaaccgcctgggacgccgccgagtggtctgtgcaggtt 180
Sbjct: 121 gaagtagttgtgggcgcctttgcaaccgcctgggacgccgccgagtggtctgtgcaggtt 180

Query: 181 cgcgggtcgctggcgggggtcgtgagggagtgcgccgggagcggagatatggagggagat 240
Sbjct: 181 cgcgggtcgctggcgggggtcgtgagggagtgcgccgggagcggagatatggagggagat 240

Query: 241 ggttcagacccagagcctccagatgccggggaggacagcaagtccgagaatggggagaat 300
Sbjct: 241 ggttcagacccagagcctccagatgccggggaggacagcaagtccgagaatggggagaat 300

Query: 301 gcgcccatctactgcatctgccgcaaaccggacatcaactgcttcatgatcgggtgtgac 360
Sbjct: 301 gcgcccatctactgcatctgccgcaaaccggacatcaactgcttcatgatcgggtgtgac 360

Query: 361 aactgcaatgagtggttccatggggactgcatccggatca 400
Sbjct: 361 aactgcaatgagtggttccatggggactgcatccggatca 400

>ref|NM_014593.1| Homo sapiens CpG binding protein (CGBP), mRNA

 ... (file truncated here)

>dbj|AK010337.1|AK010337 Mus musculus ES cells cDNA, RIKEN full-length
enriched library,
           clone:2410002I16, full insert sequence
          Length = 2538

 Score =  234 bits (118), Expect = 3e-59
 Identities = 166/182 (91%)
 Strand = Plus / Plus

Query: 219 gagcggagatatggagggagatggttcagacccagagcctccagatgccggggaggacag 278
           ||||||||||||||| |||||||| |||||||  || ||||| ||||||||||| |||||
Sbjct: 260 gagcggagatatggaaggagatggctcagacctggaacctccggatgccggggacgacag 319

Query: 279 caagtccgagaatggggagaatgcgcccatctactgcatctgccgcaaaccggacatcaa 338
           |||||| |||||||||||||| || ||||||||||||||||| |||||||||||||||||
Sbjct: 320 caagtctgagaatggggagaacgctcccatctactgcatctgtcgcaaaccggacatcaa 379

Query: 339 ctgcttcatgatcgggtgtgacaactgcaatgagtggttccatggggactgcatccggat 398
            ||||||||||| || |||||||||||||| |||||||||||||| ||||||||||||||
Sbjct: 380 ttgcttcatgattggatgtgacaactgcaacgagtggttccatggagactgcatccggat 439

Query: 399 ca 400
Sbjct: 440 ca 441

 Score = 44.1 bits (22), Expect = 0.066
 Identities = 25/26 (96%)
 Strand = Plus / Plus

Query: 118 gcggaagtagttgtgggcgcctttgc 143
           ||||||||||||| ||||||||||||
Sbjct: 147 gcggaagtagttgcgggcgcctttgc 172

>dbj|AK017941.1|AK017941 Mus musculus adult male thymus cDNA, RIKEN
full-length enriched library, clone:5830420C16, full insert sequence
          Length = 1461

 Score =  210 bits (106), Expect = 5e-52
 Identities = 151/166 (90%)
 Strand = Plus / Plus

Query: 235  ggagatggttcagacccagagcctccagatgccggggaggacagcaagtccgagaatggg 294
            |||||||| |||||||  || ||||| ||||||||||| ||||||||||| |||||||||
Sbjct: 1048 ggagatggctcagacctggaacctccggatgccggggacgacagcaagtctgagaatggg 1107

Query: 295  gagaatgcgcccatctactgcatctgccgcaaaccggacatcaactgcttcatgatcggg 354
            ||||| || ||||||||||||||||| ||||||||||||||||| ||||||||||| ||
Sbjct: 1108 gagaacgctcccatctactgcatctgtcgcaaaccggacatcaattgcttcatgattgga 1167

Query: 355  tgtgacaactgcaatgagtggttccatggggactgcatccggatca 400
            |||||||||||||| |||||||||||||| ||||||||||||||||
Sbjct: 1168 tgtgacaactgcaacgagtggttccatggagactgcatccggatca 1213

 Score = 44.1 bits (22), Expect = 0.066
 Identities = 25/26 (96%)
 Strand = Plus / Plus

Query: 118 gcggaagtagttgtgggcgcctttgc 143
           ||||||||||||| ||||||||||||
Sbjct: 235 gcggaagtagttgcgggcgcctttgc 260

>gb|AC009750.7|AC009750 Drosophila melanogaster, chromosome 2L, region 23F-24A,
BAC clone

 ... (file truncated here)

>emb|AL353748.13|AL353748 Human DNA sequence from clone RP11-317B17 on
chromosome 9, complete
             sequence [Homo sapiens]
          Length = 179155

 Score = 38.2 bits (19), Expect = 4.1
 Identities = 22/23 (95%)
 Strand = Plus / Plus

Query: 192   ggcgggggtcgtgagggagtgcg 214
             |||| ||||||||||||||||||
Sbjct: 48258 ggcgtgggtcgtgagggagtgcg 48280

  Database: nt
    Posted date:  May 30, 2001  3:54 AM
  Number of letters in database: -996,408,959
  Number of sequences in database:  868,831

Lambda     K      H
    1.37    0.711     1.31

Lambda     K      H
    1.37    0.711     1.31

Matrix: blastn matrix:1 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Hits to DB: 436021
Number of Sequences: 868831
Number of extensions: 436021
Number of successful extensions: 7536
Number of sequences better than 10.0: 19
length of query: 400
length of database: 3,298,558,333
effective HSP length: 20
effective length of query: 380
effective length of database: 3,281,181,713
effective search space: 1246849050940
effective search space used: 1246849050940
T: 0
A: 30
X1: 6 (11.9 bits)
X2: 15 (29.7 bits)
S1: 12 (24.3 bits)
S2: 19 (38.2 bits)

As you can see, the file consists of three parts: header information at the beginning, followed by a one-line-per-hit summary of the alignments; the alignments themselves, each introduced by a line starting with >; and some additional summary parameters and statistics at the end.
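
To make the three-part layout concrete, here is a minimal sketch that splits a BLAST text report into those parts. It's written in Python purely for illustration and isn't one of the parsing routines developed in this chapter. It rests on two assumptions drawn only from the listing above: the alignment section begins at the first line starting with >, and the closing statistics begin at the first Database: line that follows it. The function names split_blast_output and split_alignments are hypothetical, and the sample string is a heavily abbreviated stand-in for the real file.

```python
def split_blast_output(text):
    """Split BLAST text output into (header, alignments, trailer).

    Assumptions (taken from the sample listing, not from a format spec):
    the alignments start at the first line beginning with '>', and the
    trailer starts at the first 'Database:' line after that point.
    """
    lines = text.splitlines(keepends=True)
    start = None          # index of the first '>' line
    end = len(lines)      # index of the trailing 'Database:' line
    for i, line in enumerate(lines):
        if start is None:
            if line.startswith(">"):
                start = i
        elif line.strip().startswith("Database:"):
            end = i
            break
    if start is None:
        raise ValueError("no alignment records (lines starting with '>') found")
    return ("".join(lines[:start]),
            "".join(lines[start:end]),
            "".join(lines[end:]))


def split_alignments(alignments):
    """Break the alignment section into one string per '>' record."""
    records = []
    for line in alignments.splitlines(keepends=True):
        if line.startswith(">"):
            records.append(line)      # a new hit begins here
        elif records:
            records[-1] += line       # continuation of the current hit
    return records


# Abbreviated stand-in for the report shown above (not the full file).
sample = """BLASTN 2.1.3 [Apr-11-2001]

Database: nt

Sequences producing significant alignments:                        (bits)  Value

dbj|AB031069.1|AB031069 Homo sapiens PCCX1 mRNA for protein cont...   793  0.0

>dbj|AB031069.1|AB031069 Homo sapiens PCCX1 mRNA for protein containing CXXC
          Length = 2487

 Score =  793 bits (400), Expect = 0.0

>ref|NM_014593.1| Homo sapiens CpG binding protein (CGBP), mRNA

  Database: nt
  Number of sequences in database:  868,831
"""

header, alignments, trailer = split_blast_output(sample)
records = split_alignments(alignments)
print(len(records), "alignment records found")
```

Note that the header also contains a Database: line, which is why the sketch looks for the trailer only after it has seen the first > line. Working line by line rather than with one large pattern keeps that ordering explicit.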

