[Biopython-dev] BUG: blastparser: expect(2)

thomas at cbs.dtu.dk thomas at cbs.dtu.dk
Fri Aug 11 07:47:38 EDT 2000


Hi,

The blastparser (NCBIStandalone.BlastParser) fails when reading blastall output that was generated with the "-g F" option, i.e. without gapped alignment.
(-g  Perfom gapped alignment (not available with tblastx) [T/F] default = T)

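In this output the alignment headers read "Expect(2)" instead of "Expect", which means that there are 2 alignments for the same Sbjct, and the summary table gets an extra "N" column.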
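As a rough illustration of the header format involved, here is a sketch of a regular expression that accepts both "Expect = 13" and "Expect(2) = 5e-45". It is not the parser's actual code; the pattern and the name parse_hsp_header are made up for this example.

```python
import re

# Hypothetical pattern: the "(N)" directly after "Expect" is optional.
_HSP_HEADER = re.compile(
    r"Score\s*=\s*(?P<bits>[\d.]+)\s*bits\s*\((?P<raw>\d+)\),\s*"
    r"Expect(?:\((?P<n>\d+)\))?\s*=\s*(?P<expect>\S+)"
)


def parse_hsp_header(line):
    """Return (bits, raw_score, expect, n) for a ' Score = ...' line.

    n is the number in "Expect(n)"; it defaults to 1 for a plain "Expect".
    """
    m = _HSP_HEADER.search(line)
    if m is None:
        raise ValueError("not an HSP header line: %r" % line)
    n = int(m.group("n")) if m.group("n") else 1
    return float(m.group("bits")), int(m.group("raw")), float(m.group("expect")), n
```

Both header variants from the blast file in this report go through the same function, so a caller would not need to know in advance whether the search was gapped.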

c ya
-thomas
example code:
##############################################
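from Bio.Blast import NCBIStandalone

# blastall output produced with "-g F" (see the blast file below)
blast_file = 'test.blastn'
parser = NCBIStandalone.BlastParser()
blast_iter = NCBIStandalone.Iterator(handle = open(blast_file), parser = parser)

# read one record at a time until the iterator runs out of records
while 1:
    rec = blast_iter.next()
    if not rec: break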
#############

results in:
##############################################
  File "/home/genome6/thomas/cbs/python/biopython/Bio/Blast/NCBIStandalone.py", line 587, in _parse
    dh.score = _safe_int(dh.score)
  File "/home/genome6/thomas/cbs/python/biopython/Bio/Blast/NCBIStandalone.py", line 1469, in _safe_int
    return long(str)
ValueError: invalid literal for long(): 5e-45
#########
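
The value in the error message, 5e-45, is the E-value from the summary line for ENST00000022209, which in this output is followed by a third column, N. So it looks as if the columns are read shifted by one. Below is a minimal sketch of splitting such a line from the right, assuming the N column is present exactly when the table header shows it. The helper split_description and its has_n flag are hypothetical and not part of Bio.Blast.

```python
def split_description(line, has_n=False):
    """Split a one-line hit summary into (title, score, e_value, n).

    has_n says whether the table header ended with an "N" column,
    as it does in the ungapped ("-g F") output shown in this report.
    """
    ncols = 3 if has_n else 2
    # Split from the right so that spaces inside the title are left alone.
    fields = line.rsplit(None, ncols)
    if len(fields) != ncols + 1:
        raise ValueError("cannot parse description line: %r" % line)
    title = fields[0]
    score = int(fields[1])
    e_value = float(fields[2])
    n = int(fields[3]) if has_n else 1
    return title, score, e_value, n
```

Called with has_n=False on a line that does carry the N column, the split is off by one column and int("5e-45") raises a ValueError, much like the error above.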

the blast file:
##############################################
BLASTN 2.0.14 [Jun-29-2000]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= HUMAGCGB
         (100 letters)

Database: ./ensembl.cdna
           37,720 sequences; 24,543,038 total letters

Searching..................................................done

                                                               Score     E
Sequences producing significant alignments:                    (bits)  Value  N

ENST00000022209 Gene:ENSG00000020685 Clone:AC012263 Cont...   153  5e-45  2
ENST00000008890 Gene:ENSG00000008430 Clone:AC007637 Cont...    28     13  1

>ENST00000022209 Gene:ENSG00000020685 Clone:AC012263 Contig:AC012263.00001
          Length = 2673

 Score = 46.1 bits (23), Expect(2) = 5e-45
 Identities = 23/23 (100%)
 Strand = Plus / Plus

                                   
Query: 1    atggagaccgtggtttgcccaag 23
            |||||||||||||||||||||||
Sbjct: 1742 atggagaccgtggtttgcccaag 1764


 Score =  153 bits (77), Expect(2) = 5e-45
 Identities = 77/77 (100%)
 Strand = Plus / Plus

                                                                        
Query: 24   gccctgggaagagaggcggaaacggagaagcctttccagtgaccgtgggaggacaaccca 83
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1764 gccctgggaagagaggcggaaacggagaagcctttccagtgaccgtgggaggacaaccca 1823

                             
Query: 84   ttcaccatatgaggaac 100
            |||||||||||||||||
Sbjct: 1824 ttcaccatatgaggaac 1840


>ENST00000008890 Gene:ENSG00000008430 Clone:AC007637 Contig:AC007637.00001
          Length = 1530

 Score = 28.2 bits (14), Expect =    13
 Identities = 14/14 (100%)
 Strand = Plus / Plus

                        
Query: 26 cctgggaagagagg 39
          ||||||||||||||
Sbjct: 57 cctgggaagagagg 70


  Database: ./ensembl.cdna
    Posted date:  Aug 3, 2000  1:07 PM
  Number of letters in database: 24,543,038
  Number of sequences in database:  37,720
  
Lambda     K      H
    1.37    0.711     1.31 


Matrix: blastn matrix:1 -3
Number of Hits to DB: 3
Number of Sequences: 37720
Number of extensions: 3
Number of successful extensions: 3
Number of sequences better than 10.0: 2
length of query: 100
length of database: 24,543,038
effective HSP length: 16
effective length of query: 84
effective length of database: 23,939,518
effective search space: 2010919512
effective search space used: 2010919512
T: 0
A: 0
X1: 6 (11.9 bits)
X2: 10 (19.8 bits)
S1: 12 (24.3 bits)
S2: 14 (28.2 bits)
########


-- 
Sicheritz Ponten Thomas E.  CBS, Department of Biotechnology
blippblopp at linux.nu         The Technical University of Denmark
CBS:  +45 45 252485         Building 208, DK-2800 Lyngby
Fax   +45 45 931585         http://www.cbs.dtu.dk/thomas/index.html

	De Chelonian Mobile ... The Turtle Moves ...



