[BioPython] Blast parser missing first record

Bzy Bee nomy2020 at yahoo.com
Fri Jul 16 21:34:00 EDT 2004


Hi everyone,

My program, which uses the NCBI standalone BLAST parser,
works just fine, well, almost!

The only problem is that it misses the first record (i.e.
all the details and hits of the first query sequence) and
starts from the second query. Has anyone come across
this before? Sample code is below:

------------------------------------
from Bio.Blast import NCBIStandalone
import sys,os,string


in_file = 'myblast.txt'

def testp(blast_res):
    bf = open(blast_res)
    parser = NCBIStandalone.BlastParser()
    iter = NCBIStandalone.Iterator(bf, parser)
    myrec = parser.parse(bf)


    while 1:
        myrec = iter.next()
        if myrec is None:
            break

        for alignment in myrec.alignments:
            hsps = alignment.hsps
            num_hsp = len(alignment.hsps)
            al_tit = alignment.title[1:40]
            i=0
            hsp_i = alignment.hsps[i]

        
            if num_hsp > 1:
                print myrec.query[0:40],',', myrec.query_letters, ',', al_tit,',',\
                  num_hsp,",", alignment.length,",", hsp_i.strand[1]


# Main
testp(in_file)
-----------------------------------------------
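
A possible explanation, stated as an assumption rather than a confirmed diagnosis: the sample passes the open handle `bf` to the iterator and then also calls `parser.parse(bf)` on that same handle before the loop starts. If `parse()` reads one whole record from the handle, the iterator would pick up at the second record. The sketch below reproduces that effect in current Python with a hypothetical stand-in reader; `read_record` and `query_names` are made-up names for illustration, not Biopython functions.

```python
import io

# Three tiny mock "records", each starting with a BLASTN header line.
SAMPLE = (
    "BLASTN 2.2.9\nQuery= test_seq1\n"
    "BLASTN 2.2.9\nQuery= seq2\n"
    "BLASTN 2.2.9\nQuery= seq3\n"
)


def read_record(handle):
    """Hypothetical one-record reader: consume lines up to the next header."""
    lines = []
    while True:
        pos = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith("BLASTN") and lines:
            handle.seek(pos)  # leave the next header for the next call
            break
        lines.append(line)
    return "".join(lines) or None


def query_names(handle, parse_first=False):
    """Collect query names; optionally read one record before looping."""
    if parse_first:
        read_record(handle)  # mirrors the extra parser.parse(bf) call
    names = []
    while True:
        record = read_record(handle)
        if record is None:
            break
        names.append(record.split("Query= ")[1].split()[0])
    return names


print(query_names(io.StringIO(SAMPLE), parse_first=True))
print(query_names(io.StringIO(SAMPLE)))
```

With `parse_first=True` the list no longer contains `test_seq1`, because the shared handle has already moved past the first record. If the same thing is happening in the sample above, dropping the `myrec = parser.parse(bf)` line, or giving it a separate handle, should let the loop start at the first query.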

I am running BLAST 2.2.9 and the output is just the
normal output. Here is a sample of the output from my
standalone BLAST run:

================================================
BLASTN 2.2.9 [May-01-2004]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= test_seq1
         (1563 letters)

Database: mydb.seq 
           24,211 sequences; 12,519,432 total letters

Searching..................................................done

                                                      
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

Sequence1,......................................................  2557   0.0

>sequence1: this is just a test sequence
          Length = 701

 Score =  184 bits (93), Expect = 1e-45
 Identities = 171/197 (86%)
 Strand = Plus / Plus

Query: 346 accatggactctgtgcgctcggggcctttcgggcagatcttcaggccggacaactttgtc 405
           |||||||||||||| ||||| || ||||| || ||||||||||| || |||||||||||
Sbjct: 350 accatggactctgttcgctcaggtccttttggtcagatcttcagaccagacaactttgtt 409

Query: 406 ttcggtcagagcggtgccgggaacaactgggccaagggacactacacggaaggggcggag 465
           || ||||||  ||| || || ||||||||||||||||| ||||| || || || || |||
Sbjct: 410 tttggtcagtccggggcaggcaacaactgggccaagggccactatacagagggagccgag 469

Query: 466 ctcgtcgactcggtgctggatgtcgtgaggaaggaggccgagagctgtgactgcctgcag 525
           || || |||||||| |||||||| ||  ||||||||||||||||||||||||||||||||
Sbjct: 470 ctggttgactcggtcctggatgtggttcggaaggaggccgagagctgtgactgcctgcag 529

Query: 526 ggcttccagctgaccca 542
           |||||||||||||||||
Sbjct: 530 ggcttccagctgaccca 546



 Score =  153 bits (77), Expect = 4e-36
 Identities = 146/169 (86%)
 Strand = Plus / Plus

Query: 57  atgagggagatcgtgcacctgcaggccggccagtgcggcaaccagatcggcgccaagttt 116
           |||||||| ||||||||| | |||||||| ||||| ||||| |||||||| ||||||||
Sbjct: 137 atgagggaaatcgtgcacatccaggccggtcagtgtggcaatcagatcggtgccaagttc 196

Query: 117 tgggaggtgatcagcgacgagcatggcatcgaccccaccggcacctaccacggggacagc 176
           |||||||| ||||| || || |||||||||||||||||||||||||| || |||||||||
Sbjct: 197 tgggaggtaatcagtgatgaacatggcatcgaccccaccggcacctatcatggggacagc 256

Query: 177 gacctgcagctggagcgaatcaacgtgtactacaacgaggccaccggtg 225
           ||||| || ||||| || |||  |||||||||||| || ||||| ||||
Sbjct: 257 gacctacaactggaccgcatctccgtgtactacaatgaagccacaggtg 305


  Database: mydb.seq
    Posted date:  Sep 21, 1998  11:19 AM
  Number of letters in database: 12,519,432
  Number of sequences in database:  24,211
  
Lambda     K      H
    1.37    0.711     1.31 

Gapped
Lambda     K      H
    1.37    0.711     1.31 


Matrix: blastn matrix:1 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Hits to DB: 130
Number of Sequences: 24211
Number of extensions: 132
Number of successful extensions: 111
Number of sequences better than 1.0e-20: 13
Number of HSP's better than  0.0 without gapping: 12
Number of HSP's successfully gapped in prelim test: 1
Number of HSP's that attempted gapping in prelim test: 147
Number of HSP's gapped (non-prelim): 59
length of query: 1563
length of database: 12,519,432
effective HSP length: 18
effective length of query: 1545
effective length of database: 11,343,111
effective search space: 211578144
effective search space used: 211578144
T: 0
A: 0
X1: 6 (11.9 bits)
X2: 15 (29.7 bits)
S1: 12 (24.3 bits)
S2: 52 (103.6 bits)
BLASTN 2.2.9 [May-01-2004]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= seq2
         (1352 letters)

Database: mydb.seq 
           24,211 sequences; 12,519,432 total letters

Searching..................................................done

                                                      
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

Sequence2,......................................................   737   0.0

>sequence2 this is just a test sequence
          Length = 1523

 Score =  737 bits (372), Expect = 0.0
 Identities = 502/544 (92%), Gaps = 1/544 (0%)
 Strand = Plus / Plus

Query: 540  taagtcatcagacagttcatactggagagaaaccttacaaatgtgatgagtgtggcgagg 599
            |||||||| ||| | ||||||||||||||| ||||||||||||||||||||||||| |||
Sbjct: 440  taagtcataagagatttcatactggagagagaccttacaaatgtgatgagtgtggcaagg 499

Query: 600  cctttcgtctaaagtcaacccttttaagtcatcagacggttcatactggtgataaacctt 659
            |||||||| ||||||||| |||||||||||||||||| ||||||||||| || |||||||
Sbjct: 500  cctttcgtgtaaagtcaatccttttaagtcatcagacagttcatactggagagaaacctt 559

Query: 660  acaagtgtgatgagtgtggaaaagtctttggtcgaaaaccacatcttcaacttcactgga 719
            |||| || |||||||||||||||||||| |||| |||||||||||||| |||||||||||
Sbjct: 560  acaaatgcgatgagtgtggaaaagtcttcggtcaaaaaccacatcttcgacttcactgga 619

Query: 720  gaattcataatggagagagacctttcagatgtaatgagtgtggcaagttcttcagtcaaa 779
            ||||||||| ||||||||||||||||| ||||||||||||||||||||||||||||| ||
Sbjct: 620  gaattcatactggagagagacctttcaaatgtaatgagtgtggcaagttcttcagtcgaa 679

Query: 780  attcacaccttaaaaaacattggagaatacatatagagaaacctttcaaatgtttcaagt 839
            |||||||||||| ||  ||| |||||||||||||||||||||||||||| |||||| |||
Sbjct: 680  attcacaccttacaagtcatcggagaatacatatagagaaacctttcaagtgtttcgagt 739

Query: 840  gtggaaaatcctttattcaggtctcagcactcactaaacatcagaaaatccatacatgag 899
            ||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 740  gtggaaaatcctttactcaggtctcagcactcactaaacatcagaaaatccatacatgag 799

Query: 900  agaaactcatgtgaatgtcatatatggtagaactctt-gaaatttaaagttcaaaattca 958
            ||||||||||||||||||   ||||||||||| ||||  ||||| |||||||||||||||
Sbjct: 800  agaaactcatgtgaatgtggaatatggtagaagtcttcaaaattcaaagttcaaaattca 859

Query: 959  caccctacagttgctcagagaattcatcttactgaaaaaccatacaaatatcatgagtgt 1018
            |||||||| |||| ||||||||||||   |||||| ||||||||||||||||| ||||||
Sbjct: 860  caccctactgttggtcagagaattcacactactgagaaaccatacaaatatcacgagtgt 919

Query: 1019 ggcaacatcttaggcttcatgagaaaattcatactggatagaagggagatataaatgtat 1078
            ||||||||||||||||||||||||||||||||||||||||||||  ||||||| ||||||
Sbjct: 920  ggcaacatcttaggcttcatgagaaaattcatactggatagaagccagatatatatgtat 979

Query: 1079 ttat 1082
            ||||
Sbjct: 980  ttat 983



 Score =  327 bits (165), Expect = 9e-89
 Identities = 180/185 (97%)
 Strand = Plus / Plus

Query: 1133 ctcacatctcaccagtagtatcagagtatttatcctggagaaaacacacagcagtgtaat 1192
            ||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||
Sbjct: 1076 ctcacatctcaccagtagtatcagagtatttatcctggagaaaacacacagcaatgtaat 1135

Query: 1193 gtgtgtggcaaggatcttacccaaaagtcacaagataggaatatagtggagccatacttc 1252
            ||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||
Sbjct: 1136 gtgtgtggcaaggatcttacccaaaagtcacaagacaggaatatagtggagccatacttc 1195

Query: 1253 ccagactcagtggtggtggcaaatcctttacctttttcactctatggattctctgatgac 1312
            |||||||||||||| |||||||||||||||||||||||||| |||| |||||||||||||
Sbjct: 1196 ccagactcagtggttgtggcaaatcctttacctttttcactatatgtattctctgatgac 1255

Query: 1313 tttag 1317
            |||||
Sbjct: 1256 tttag 1260


  Database: mydb.seq
    Posted date:  Sep 21, 1998  11:19 AM
  Number of letters in database: 12,123,335
  Number of sequences in database:  24,211
  
Lambda     K      H
    1.37    0.711     1.31 

Gapped
Lambda     K      H
    1.37    0.711     1.31 


Matrix: blastn matrix:1 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Hits to DB: 130
Number of Sequences: 24211
Number of extensions: 132
Number of successful extensions: 111
Number of sequences better than 1.0e-20: 13
Number of HSP's better than  0.0 without gapping: 12
Number of HSP's successfully gapped in prelim test: 1
Number of HSP's that attempted gapping in prelim test: 147
Number of HSP's gapped (non-prelim): 59
length of query: 1563
length of database: 12,519,432
effective HSP length: 18
effective length of query: 1545
effective length of database: 11,343,111
effective search space: 211578144
effective search space used: 211578144
T: 0
A: 0
X1: 6 (11.9 bits)
X2: 15 (29.7 bits)
S1: 12 (24.3 bits)
S2: 52 (103.6 bits)
BLASTN 2.2.9 [May-01-2004]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= seq3
         (768 letters)

Database: mydb.seq 
           24,211 sequences; 12,519,432 total letters

Searching..................................................done

                                                      
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

Sequence3,......................................................  305   2e-82

>Sequence3 this is just a test sequence
          Length = 1523

 Score =  305 bits (154), Expect = 2e-82
 Identities = 199/214 (92%)
 Strand = Plus / Plus

Query: 304 ttcatgctggagagaaacgttacaaatgtgatgagtgtggtgaggcctttcctgaaaagt 363
           ||||| ||||||||| || |||||||||||||||||||||  ||||||||| || |||||
Sbjct: 455 ttcatactggagagagaccttacaaatgtgatgagtgtggcaaggcctttcgtgtaaagt 514

Query: 364 caacccttttaaggcatcagacagttcatactggtgagaaagcttacaaatgtgatgagt 423
           ||| ||||||||| |||||||||||||||||||| |||||| |||||||||| |||||||
Sbjct: 515 caatccttttaagtcatcagacagttcatactggagagaaaccttacaaatgcgatgagt 574

Query: 424 gtggataagtcttcggtcaaaaaccacatcttcaacttcactggagaattcatactggag 483
           ||||| ||||||||||||||||||||||||||| ||||||||||||||||||||||||||
Sbjct: 575 gtggaaaagtcttcggtcaaaaaccacatcttcgacttcactggagaattcatactggag 634

Query: 484 agagacctttcagatgtaatgagtgtggcaagtt 517
           |||||||||||| |||||||||||||||||||||
Sbjct: 635 agagacctttcaaatgtaatgagtgtggcaagtt 668



 Score =  232 bits (117), Expect = 2e-60
 Identities = 139/145 (95%), Gaps = 1/145 (0%)
 Strand = Plus / Plus

Query: 519 ggagaatacatatagagaaacctttcaa-tgtttcgagtgtggaaaatccttttctcagg 577
           |||||||||||||||||||||||||||| |||||||||||||||||||||||| ||||||
Sbjct: 701 ggagaatacatatagagaaacctttcaagtgtttcgagtgtggaaaatcctttactcagg 760

Query: 578 tctcagcactcactaaacatcaaaaaatccatacgtgagagaaactcaggtgaatgtggt 637
           |||||||||||||||||||||| ||||||||||| ||||||||||||| ||||||||||
Sbjct: 761 tctcagcactcactaaacatcagaaaatccatacatgagagaaactcatgtgaatgtgga 820

Query: 638 atatggtagaagtcttcaaaattca 662
           |||||||||||||||||||||||||
Sbjct: 821 atatggtagaagtcttcaaaattca 845


  Database: mydb.seq
    Posted date:  Sep 21, 1998  11:19 AM
  Number of letters in database: 12,519,432
  Number of sequences in database:  24,211
  
Lambda     K      H
    1.37    0.711     1.31 

Gapped
Lambda     K      H
    1.37    0.711     1.31 


Matrix: blastn matrix:1 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Hits to DB: 130
Number of Sequences: 24211
Number of extensions: 132
Number of successful extensions: 111
Number of sequences better than 1.0e-20: 13
Number of HSP's better than  0.0 without gapping: 12
Number of HSP's successfully gapped in prelim test: 1
Number of HSP's that attempted gapping in prelim test: 147
Number of HSP's gapped (non-prelim): 59
length of query: 1563
length of database: 12,519,432
effective HSP length: 18
effective length of query: 1545
effective length of database: 11,343,111
effective search space: 211578144
effective search space used: 211578144
T: 0
A: 0
X1: 6 (11.9 bits)
X2: 15 (29.7 bits)
S1: 12 (24.3 bits)
S2: 52 (103.6 bits)
BLASTN 2.2.9 [May-01-2004]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= seq4
         (1100 letters)

Database: mydb.seq 
           24,211 sequences; 12,519,432 total letters

Searching..................................................done

                                                      
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

sequence4,.....................................................   1237   0.0

>seqeunce4 this is just a test sequence
          Length = 782

 Score = 1237 bits (624), Expect = 0.0
 Identities = 624/624 (100%)
 Strand = Plus / Plus

Query: 357 ctgatctgcagaatgatgaagtagcatttagaaaattcaagctaattactgaggatgttc 416
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 159 ctgatctgcagaatgatgaagtagcatttagaaaattcaagctaattactgaggatgttc 218

Query: 417 agggcaaaaactgcctgactaactttcatggtatggatcttacccgtgacaaaatgtgct 476
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 219 agggcaaaaactgcctgactaactttcatggtatggatcttacccgtgacaaaatgtgct 278

Query: 477 ccatggtcaaaaaatggcagaccatgattgaagctcacgtagacgtcaagactaccgatg 536
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 279 ccatggtcaaaaaatggcagaccatgattgaagctcacgtagacgtcaagactaccgatg 338

Query: 537 gttacttgcttcgtctgttctgtgtgggttttactaaaaagcgcaacaatcagattcgga 596
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 339 gttacttgcttcgtctgttctgtgtgggttttactaaaaagcgcaacaatcagattcgga 398

Query: 597 agacctcttacgcccagcaccagcaggtgcgccagatccgcaagaagatgatggagatca 656
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 399 agacctcttacgcccagcaccagcaggtgcgccagatccgcaagaagatgatggagatca 458

Query: 657 tgacccgagaggtgcagaccaacgacctgaaagaggtggtcaataaactgattccagata 716
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 459 tgacccgagaggtgcagaccaacgacctgaaagaggtggtcaataaactgattccagata 518

Query: 717 gcattggaaaagacatagaaaaggcttgccaatctatttatccactccatgatgtcttcg 776
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 519 gcattggaaaagacatagaaaaggcttgccaatctatttatccactccatgatgtcttcg 578

Query: 777 ttagaaaagtaaaaatgctgaagaagcccaaatttgaattgggaaaactcatggagctac 836
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 579 ttagaaaagtaaaaatgctgaagaagcccaaatttgaattgggaaaactcatggagctac 638

Query: 837 atggtgaaggtagtagttctggaaaagctactggggatgagacaggtgctaaagttgaac 896
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 639 atggtgaaggtagtagttctggaaaagctactggggatgagacaggtgctaaagttgaac 698

Query: 897 gagctgatggatacgagccaccagtccaagaatcggtttaaaatgcagactcttaatggt 956
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 699 gagctgatggatacgagccaccagtccaagaatcggtttaaaatgcagactcttaatggt 758

Query: 957 gacaaataaaagatcttatttgtg 980
           ||||||||||||||||||||||||
Sbjct: 759 gacaaataaaagatcttatttgtg 782



 Score =  192 bits (97), Expect = 3e-48
 Identities = 97/97 (100%)
 Strand = Plus / Plus

Query: 137 ggcaccatggcggtcggcaagaacaagcgccttacgaaaggaggcaaaaagggagccaag 196
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1   ggcaccatggcggtcggcaagaacaagcgccttacgaaaggaggcaaaaagggagccaag 60

Query: 197 aagaaagtggttgacccattttctaaaaaagattggt 233
           |||||||||||||||||||||||||||||||||||||
Sbjct: 61  aagaaagtggttgacccattttctaaaaaagattggt 97


  Database: mydb.seq
    Posted date:  Sep 21, 1998  11:19 AM
  Number of letters in database: 12,519,432
  Number of sequences in database:  24,211
  
Lambda     K      H
    1.37    0.711     1.31 

Gapped
Lambda     K      H
    1.37    0.711     1.31 


Matrix: blastn matrix:1 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Hits to DB: 130
Number of Sequences: 24211
Number of extensions: 132
Number of successful extensions: 111
Number of sequences better than 1.0e-20: 13
Number of HSP's better than  0.0 without gapping: 12
Number of HSP's successfully gapped in prelim test: 1
Number of HSP's that attempted gapping in prelim test: 147
Number of HSP's gapped (non-prelim): 59
length of query: 1563
length of database: 12,519,432
effective HSP length: 18
effective length of query: 1545
effective length of database: 11,343,111
effective search space: 211578144
effective search space used: 211578144
T: 0
A: 0
X1: 6 (11.9 bits)
X2: 15 (29.7 bits)
S1: 12 (24.3 bits)
S2: 52 (103.6 bits)
================================================
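
A quick way to see how many records the parser ought to return, independent of Biopython, is to count the `Query=` lines in the raw report and compare that with what the loop prints. This is only a rough sketch: `list_queries` is a hypothetical helper, and it assumes every record in the file has exactly one line beginning with `Query=`, as in the sample above.

```python
def list_queries(blast_text):
    """Return the names found on 'Query=' lines of plain-text BLAST output."""
    names = []
    for line in blast_text.splitlines():
        if line.startswith("Query="):
            names.append(line[len("Query="):].strip())
    return names


# Shortened stand-in for the report; in practice, read myblast.txt instead.
sample = (
    "BLASTN 2.2.9 [May-01-2004]\n"
    "Query= test_seq1\n"
    "         (1563 letters)\n"
    "BLASTN 2.2.9 [May-01-2004]\n"
    "Query= seq2\n"
    "         (1352 letters)\n"
)

queries = list_queries(sample)
print(len(queries), queries)
```

If this count is one higher than the number of records the iterator yields, the first record is being lost before the loop rather than being absent from the file.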

Any help would be great.

regards

JA


		