[BioPython] Blast parser missing first record
Bzy Bee
nomy2020 at yahoo.com
Fri Jul 16 21:34:00 EDT 2004
Hi everyone,
My program, which uses the NCBI standalone BLAST parser,
works just fine, well, almost!
The only problem is that it misses the first record (i.e.
all the details and hits of the first query sequence) and
starts from the second query. Has anyone come across
this before? Below is some sample code:
------------------------------------
from Bio.Blast import NCBIStandalone
import sys,os,string

in_file = 'myblast.txt'

def testp(blast_res):
    bf = open(blast_res)
    parser = NCBIStandalone.BlastParser()
    iter = NCBIStandalone.Iterator(bf, parser)
    myrec = parser.parse(bf)
    while 1:
        myrec = iter.next()
        if myrec is None:
            break
        for alignment in myrec.alignments:
            hsps = alignment.hsps
            num_hsp = len(alignment.hsps)
            al_tit = alignment.title[1:40]
            i=0
            hsp_i = alignment.hsps[i]
            if num_hsp > 1:
                print myrec.query[0:40],',', myrec.query_letters, ',', al_tit,',',\
                      num_hsp,",", alignment.length,",", hsp_i.strand[1]

# Main
testp(in_file)
-----------------------------------------------
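For reference, the symptom can be reproduced without Biopython. The sketch below is not the NCBIStandalone implementation; `read_record`, `read_all` and the three-record `REPORT` string are hypothetical stand-ins written in plain Python 3. It only illustrates what happens in general when one record is read from a file handle before a loop starts reading from that same handle, as the sample code appears to do with `parser.parse(bf)` ahead of `iter.next()`. Whether that is the actual cause here is an assumption.

```python
import io

# Hypothetical stand-in for a multi-record report: each record starts
# with a "BLASTN" header line, as in standalone BLAST text output.
REPORT = (
    "BLASTN 2.2.9\nQuery= test_seq1\n"
    "BLASTN 2.2.9\nQuery= seq2\n"
    "BLASTN 2.2.9\nQuery= seq3\n"
)

def read_record(handle):
    """Consume one record from handle; return its query name, or None at EOF.

    Illustrative helper only, not part of Biopython.
    """
    query = None
    while True:
        pos = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith("BLASTN") and query is not None:
            handle.seek(pos)  # leave the next record's header unread
            break
        if line.startswith("Query="):
            query = line.split("=", 1)[1].strip()
    return query

def read_all(handle):
    """Read records until the handle is exhausted."""
    names = []
    while True:
        name = read_record(handle)
        if name is None:
            break
        names.append(name)
    return names

# Looping on a fresh handle sees every record.
all_records = read_all(io.StringIO(REPORT))

# Reading one record first, then looping on the SAME handle:
# the loop starts at the second record.
handle = io.StringIO(REPORT)
first = read_record(handle)
rest = read_all(handle)

print(all_records)
print(first, rest)
```

If the same effect is at work in the sample code, the record returned by the call made before the loop would be the "missing" first one.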
I am running Blast 2.2.9 and the output is just a
normal output. Here is the sample output of my
standalone Blast.
================================================
BLASTN 2.2.9 [May-01-2004]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= test_seq1
(1563 letters)
Database: mydb.seq
24,211 sequences; 12,519,432 total letters
Searching..................................................done
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value
Sequence1,......................................................  2557   0.0
>sequence1: this is just a test sequence
Length = 701
Score = 184 bits (93), Expect = 1e-45
Identities = 171/197 (86%)
Strand = Plus / Plus
Query: 346
accatggactctgtgcgctcggggcctttcgggcagatcttcaggccggacaactttgtc
405
|||||||||||||| ||||| || ||||| ||
||||||||||| || |||||||||||
Sbjct: 350
accatggactctgttcgctcaggtccttttggtcagatcttcagaccagacaactttgtt
409
Query: 406
ttcggtcagagcggtgccgggaacaactgggccaagggacactacacggaaggggcggag
465
|| |||||| ||| || || |||||||||||||||||
||||| || || || || |||
Sbjct: 410
tttggtcagtccggggcaggcaacaactgggccaagggccactatacagagggagccgag
469
Query: 466
ctcgtcgactcggtgctggatgtcgtgaggaaggaggccgagagctgtgactgcctgcag
525
|| || |||||||| |||||||| ||
||||||||||||||||||||||||||||||||
Sbjct: 470
ctggttgactcggtcctggatgtggttcggaaggaggccgagagctgtgactgcctgcag
529
Query: 526 ggcttccagctgaccca 542
|||||||||||||||||
Sbjct: 530 ggcttccagctgaccca 546
Score = 153 bits (77), Expect = 4e-36
Identities = 146/169 (86%)
Strand = Plus / Plus
Query: 57
atgagggagatcgtgcacctgcaggccggccagtgcggcaaccagatcggcgccaagttt
116
|||||||| ||||||||| | |||||||| ||||| |||||
|||||||| ||||||||
Sbjct: 137
atgagggaaatcgtgcacatccaggccggtcagtgtggcaatcagatcggtgccaagttc
196
Query: 117
tgggaggtgatcagcgacgagcatggcatcgaccccaccggcacctaccacggggacagc
176
|||||||| ||||| || ||
|||||||||||||||||||||||||| || |||||||||
Sbjct: 197
tgggaggtaatcagtgatgaacatggcatcgaccccaccggcacctatcatggggacagc
256
Query: 177
gacctgcagctggagcgaatcaacgtgtactacaacgaggccaccggtg 225
||||| || ||||| || ||| |||||||||||| ||
||||| ||||
Sbjct: 257
gacctacaactggaccgcatctccgtgtactacaatgaagccacaggtg 305
Database: mydb.seq
Posted date: Sep 21, 1998 11:19 AM
Number of letters in database: 12,519,432
Number of sequences in database: 24,211
Lambda K H
1.37 0.711 1.31
Gapped
Lambda K H
1.37 0.711 1.31
Matrix: blastn matrix:1 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Hits to DB: 130
Number of Sequences: 24211
Number of extensions: 132
Number of successful extensions: 111
Number of sequences better than 1.0e-20: 13
Number of HSP's better than 0.0 without gapping: 12
Number of HSP's successfully gapped in prelim test: 1
Number of HSP's that attempted gapping in prelim test: 147
Number of HSP's gapped (non-prelim): 59
length of query: 1563
length of database: 12,519,432
effective HSP length: 18
effective length of query: 1545
effective length of database: 11,343,111
effective search space: 211578144
effective search space used: 211578144
T: 0
A: 0
X1: 6 (11.9 bits)
X2: 15 (29.7 bits)
S1: 12 (24.3 bits)
S2: 52 (103.6 bits)
BLASTN 2.2.9 [May-01-2004]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= seq2
(1352 letters)
Database: mydb.seq
24,211 sequences; 12,519,432 total letters
Searching..................................................done
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value
Sequence2,......................................................   737   0.0
>sequence2 this is just a test sequence
Length = 1523
Score = 737 bits (372), Expect = 0.0
Identities = 502/544 (92%), Gaps = 1/544 (0%)
Strand = Plus / Plus
Query: 540
taagtcatcagacagttcatactggagagaaaccttacaaatgtgatgagtgtggcgagg
599
|||||||| ||| | |||||||||||||||
||||||||||||||||||||||||| |||
Sbjct: 440
taagtcataagagatttcatactggagagagaccttacaaatgtgatgagtgtggcaagg
499
Query: 600
cctttcgtctaaagtcaacccttttaagtcatcagacggttcatactggtgataaacctt
659
|||||||| ||||||||| ||||||||||||||||||
||||||||||| || |||||||
Sbjct: 500
cctttcgtgtaaagtcaatccttttaagtcatcagacagttcatactggagagaaacctt
559
Query: 660
acaagtgtgatgagtgtggaaaagtctttggtcgaaaaccacatcttcaacttcactgga
719
|||| || |||||||||||||||||||| ||||
|||||||||||||| |||||||||||
Sbjct: 560
acaaatgcgatgagtgtggaaaagtcttcggtcaaaaaccacatcttcgacttcactgga
619
Query: 720
gaattcataatggagagagacctttcagatgtaatgagtgtggcaagttcttcagtcaaa
779
||||||||| |||||||||||||||||
||||||||||||||||||||||||||||| ||
Sbjct: 620
gaattcatactggagagagacctttcaaatgtaatgagtgtggcaagttcttcagtcgaa
679
Query: 780
attcacaccttaaaaaacattggagaatacatatagagaaacctttcaaatgtttcaagt
839
|||||||||||| || |||
|||||||||||||||||||||||||||| |||||| |||
Sbjct: 680
attcacaccttacaagtcatcggagaatacatatagagaaacctttcaagtgtttcgagt
739
Query: 840
gtggaaaatcctttattcaggtctcagcactcactaaacatcagaaaatccatacatgag
899
|||||||||||||||
||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 740
gtggaaaatcctttactcaggtctcagcactcactaaacatcagaaaatccatacatgag
799
Query: 900
agaaactcatgtgaatgtcatatatggtagaactctt-gaaatttaaagttcaaaattca
958
|||||||||||||||||| ||||||||||| ||||
||||| |||||||||||||||
Sbjct: 800
agaaactcatgtgaatgtggaatatggtagaagtcttcaaaattcaaagttcaaaattca
859
Query: 959
caccctacagttgctcagagaattcatcttactgaaaaaccatacaaatatcatgagtgt
1018
|||||||| |||| |||||||||||| ||||||
||||||||||||||||| ||||||
Sbjct: 860
caccctactgttggtcagagaattcacactactgagaaaccatacaaatatcacgagtgt
919
Query: 1019
ggcaacatcttaggcttcatgagaaaattcatactggatagaagggagatataaatgtat
1078
|||||||||||||||||||||||||||||||||||||||||||| |||||||
||||||
Sbjct: 920
ggcaacatcttaggcttcatgagaaaattcatactggatagaagccagatatatatgtat
979
Query: 1079 ttat 1082
||||
Sbjct: 980 ttat 983
Score = 327 bits (165), Expect = 9e-89
Identities = 180/185 (97%)
Strand = Plus / Plus
Query: 1133
ctcacatctcaccagtagtatcagagtatttatcctggagaaaacacacagcagtgtaat
1192
|||||||||||||||||||||||||||||||||||||||||||||||||||||
||||||
Sbjct: 1076
ctcacatctcaccagtagtatcagagtatttatcctggagaaaacacacagcaatgtaat
1135
Query: 1193
gtgtgtggcaaggatcttacccaaaagtcacaagataggaatatagtggagccatacttc
1252
|||||||||||||||||||||||||||||||||||
||||||||||||||||||||||||
Sbjct: 1136
gtgtgtggcaaggatcttacccaaaagtcacaagacaggaatatagtggagccatacttc
1195
Query: 1253
ccagactcagtggtggtggcaaatcctttacctttttcactctatggattctctgatgac
1312
|||||||||||||| ||||||||||||||||||||||||||
|||| |||||||||||||
Sbjct: 1196
ccagactcagtggttgtggcaaatcctttacctttttcactatatgtattctctgatgac
1255
Query: 1313 tttag 1317
|||||
Sbjct: 1256 tttag 1260
Database: mydb.seq
Posted date: Sep 21, 1998 11:19 AM
Number of letters in database: 12,123,335
Number of sequences in database: 24,211
Lambda K H
1.37 0.711 1.31
Gapped
Lambda K H
1.37 0.711 1.31
Matrix: blastn matrix:1 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Hits to DB: 130
Number of Sequences: 24211
Number of extensions: 132
Number of successful extensions: 111
Number of sequences better than 1.0e-20: 13
Number of HSP's better than 0.0 without gapping: 12
Number of HSP's successfully gapped in prelim test: 1
Number of HSP's that attempted gapping in prelim test: 147
Number of HSP's gapped (non-prelim): 59
length of query: 1563
length of database: 12,519,432
effective HSP length: 18
effective length of query: 1545
effective length of database: 11,343,111
effective search space: 211578144
effective search space used: 211578144
T: 0
A: 0
X1: 6 (11.9 bits)
X2: 15 (29.7 bits)
S1: 12 (24.3 bits)
S2: 52 (103.6 bits)
BLASTN 2.2.9 [May-01-2004]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= seq3
(768 letters)
Database: mydb.seq
24,211 sequences; 12,519,432 total letters
Searching..................................................done
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value
Sequence3,......................................................   305   2e-82
>Sequence3 this is just a test sequence
Length = 1523
Score = 305 bits (154), Expect = 2e-82
Identities = 199/214 (92%)
Strand = Plus / Plus
Query: 304
ttcatgctggagagaaacgttacaaatgtgatgagtgtggtgaggcctttcctgaaaagt
363
||||| ||||||||| || |||||||||||||||||||||
||||||||| || |||||
Sbjct: 455
ttcatactggagagagaccttacaaatgtgatgagtgtggcaaggcctttcgtgtaaagt
514
Query: 364
caacccttttaaggcatcagacagttcatactggtgagaaagcttacaaatgtgatgagt
423
||| ||||||||| |||||||||||||||||||| ||||||
|||||||||| |||||||
Sbjct: 515
caatccttttaagtcatcagacagttcatactggagagaaaccttacaaatgcgatgagt
574
Query: 424
gtggataagtcttcggtcaaaaaccacatcttcaacttcactggagaattcatactggag
483
||||| |||||||||||||||||||||||||||
||||||||||||||||||||||||||
Sbjct: 575
gtggaaaagtcttcggtcaaaaaccacatcttcgacttcactggagaattcatactggag
634
Query: 484 agagacctttcagatgtaatgagtgtggcaagtt 517
|||||||||||| |||||||||||||||||||||
Sbjct: 635 agagacctttcaaatgtaatgagtgtggcaagtt 668
Score = 232 bits (117), Expect = 2e-60
Identities = 139/145 (95%), Gaps = 1/145 (0%)
Strand = Plus / Plus
Query: 519
ggagaatacatatagagaaacctttcaa-tgtttcgagtgtggaaaatccttttctcagg
577
||||||||||||||||||||||||||||
|||||||||||||||||||||||| ||||||
Sbjct: 701
ggagaatacatatagagaaacctttcaagtgtttcgagtgtggaaaatcctttactcagg
760
Query: 578
tctcagcactcactaaacatcaaaaaatccatacgtgagagaaactcaggtgaatgtggt
637
|||||||||||||||||||||| |||||||||||
||||||||||||| ||||||||||
Sbjct: 761
tctcagcactcactaaacatcagaaaatccatacatgagagaaactcatgtgaatgtgga
820
Query: 638 atatggtagaagtcttcaaaattca 662
|||||||||||||||||||||||||
Sbjct: 821 atatggtagaagtcttcaaaattca 845
Database: mydb.seq
Posted date: Sep 21, 1998 11:19 AM
Number of letters in database: 12,519,432
Number of sequences in database: 24,211
Lambda K H
1.37 0.711 1.31
Gapped
Lambda K H
1.37 0.711 1.31
Matrix: blastn matrix:1 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Hits to DB: 130
Number of Sequences: 24211
Number of extensions: 132
Number of successful extensions: 111
Number of sequences better than 1.0e-20: 13
Number of HSP's better than 0.0 without gapping: 12
Number of HSP's successfully gapped in prelim test: 1
Number of HSP's that attempted gapping in prelim test: 147
Number of HSP's gapped (non-prelim): 59
length of query: 1563
length of database: 12,519,432
effective HSP length: 18
effective length of query: 1545
effective length of database: 11,343,111
effective search space: 211578144
effective search space used: 211578144
T: 0
A: 0
X1: 6 (11.9 bits)
X2: 15 (29.7 bits)
S1: 12 (24.3 bits)
S2: 52 (103.6 bits)
BLASTN 2.2.9 [May-01-2004]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= seq4
(1100 letters)
Database: mydb.seq
24,211 sequences; 12,519,432 total letters
Searching..................................................done
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value
sequence4,.....................................................   1237   0.0
>seqeunce4 this is just a test sequence
Length = 782
Score = 1237 bits (624), Expect = 0.0
Identities = 624/624 (100%)
Strand = Plus / Plus
Query: 357
ctgatctgcagaatgatgaagtagcatttagaaaattcaagctaattactgaggatgttc
416
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 159
ctgatctgcagaatgatgaagtagcatttagaaaattcaagctaattactgaggatgttc
218
Query: 417
agggcaaaaactgcctgactaactttcatggtatggatcttacccgtgacaaaatgtgct
476
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 219
agggcaaaaactgcctgactaactttcatggtatggatcttacccgtgacaaaatgtgct
278
Query: 477
ccatggtcaaaaaatggcagaccatgattgaagctcacgtagacgtcaagactaccgatg
536
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 279
ccatggtcaaaaaatggcagaccatgattgaagctcacgtagacgtcaagactaccgatg
338
Query: 537
gttacttgcttcgtctgttctgtgtgggttttactaaaaagcgcaacaatcagattcgga
596
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 339
gttacttgcttcgtctgttctgtgtgggttttactaaaaagcgcaacaatcagattcgga
398
Query: 597
agacctcttacgcccagcaccagcaggtgcgccagatccgcaagaagatgatggagatca
656
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 399
agacctcttacgcccagcaccagcaggtgcgccagatccgcaagaagatgatggagatca
458
Query: 657
tgacccgagaggtgcagaccaacgacctgaaagaggtggtcaataaactgattccagata
716
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 459
tgacccgagaggtgcagaccaacgacctgaaagaggtggtcaataaactgattccagata
518
Query: 717
gcattggaaaagacatagaaaaggcttgccaatctatttatccactccatgatgtcttcg
776
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 519
gcattggaaaagacatagaaaaggcttgccaatctatttatccactccatgatgtcttcg
578
Query: 777
ttagaaaagtaaaaatgctgaagaagcccaaatttgaattgggaaaactcatggagctac
836
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 579
ttagaaaagtaaaaatgctgaagaagcccaaatttgaattgggaaaactcatggagctac
638
Query: 837
atggtgaaggtagtagttctggaaaagctactggggatgagacaggtgctaaagttgaac
896
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 639
atggtgaaggtagtagttctggaaaagctactggggatgagacaggtgctaaagttgaac
698
Query: 897
gagctgatggatacgagccaccagtccaagaatcggtttaaaatgcagactcttaatggt
956
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 699
gagctgatggatacgagccaccagtccaagaatcggtttaaaatgcagactcttaatggt
758
Query: 957 gacaaataaaagatcttatttgtg 980
||||||||||||||||||||||||
Sbjct: 759 gacaaataaaagatcttatttgtg 782
Score = 192 bits (97), Expect = 3e-48
Identities = 97/97 (100%)
Strand = Plus / Plus
Query: 137
ggcaccatggcggtcggcaagaacaagcgccttacgaaaggaggcaaaaagggagccaag
196
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1
ggcaccatggcggtcggcaagaacaagcgccttacgaaaggaggcaaaaagggagccaag
60
Query: 197 aagaaagtggttgacccattttctaaaaaagattggt 233
|||||||||||||||||||||||||||||||||||||
Sbjct: 61 aagaaagtggttgacccattttctaaaaaagattggt 97
Database: mydb.seq
Posted date: Sep 21, 1998 11:19 AM
Number of letters in database: 12,519,432
Number of sequences in database: 24,211
Lambda K H
1.37 0.711 1.31
Gapped
Lambda K H
1.37 0.711 1.31
Matrix: blastn matrix:1 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Hits to DB: 130
Number of Sequences: 24211
Number of extensions: 132
Number of successful extensions: 111
Number of sequences better than 1.0e-20: 13
Number of HSP's better than 0.0 without gapping: 12
Number of HSP's successfully gapped in prelim test: 1
Number of HSP's that attempted gapping in prelim test: 147
Number of HSP's gapped (non-prelim): 59
length of query: 1563
length of database: 12,519,432
effective HSP length: 18
effective length of query: 1545
effective length of database: 11,343,111
effective search space: 211578144
effective search space used: 211578144
T: 0
A: 0
X1: 6 (11.9 bits)
X2: 15 (29.7 bits)
S1: 12 (24.3 bits)
S2: 52 (103.6 bits)
================================================
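A quick way to confirm how many records a report like the one above holds, independently of any parser, is to count its `Query=` lines. This is a minimal Python 3 sketch. `count_queries` is a hypothetical helper, and the short string below is an abbreviated stand-in for the real file, keeping only the header and query lines.

```python
import io

def count_queries(handle):
    """Count lines starting with 'Query=' (one per record in this format)."""
    return sum(1 for line in handle if line.startswith("Query="))

# Abbreviated stand-in for the report: only the lines that matter here.
sample = io.StringIO(
    "BLASTN 2.2.9 [May-01-2004]\nQuery= test_seq1\n"
    "BLASTN 2.2.9 [May-01-2004]\nQuery= seq2\n"
    "BLASTN 2.2.9 [May-01-2004]\nQuery= seq3\n"
    "BLASTN 2.2.9 [May-01-2004]\nQuery= seq4\n"
)
n_records = count_queries(sample)
print(n_records)
```

On the real file the same function can be called on `open('myblast.txt')`. Comparing that count with the number of records the parsing loop prints shows whether exactly one record is being lost.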
Any help would be great.
regards
JA