[BioPython] Problem with blastx output parsing

Italo Maia italo.maia at gmail.com
Mon Jun 4 17:22:15 UTC 2007


Well, I have 24 thousand of those, and I think it would be very painful to
redo them... I'll file the bug, but could there be a workaround? The
file is below:

<<<begin>>>

BLASTX 2.2.15 [Oct-15-2006]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= 26
         (858 letters)

Database: Leigo
           4,535,438 sequences; 1,573,298,872 total letters

Searching..................................................done



                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

gi|15778340|gb|AAL07392.1|AF411412_4 polymerase [Hepatitis B virus]    39   0.33
gi|12060441|dbj|BAB20611.1| DNA polymerase [Hepatitis B virus]         38   0.57
gi|84095095|dbj|BAE66661.1| P protein [Hepatitis B virus]              38   0.57
gi|57021117|ref|NP_647604.2| Polymerase [Hepatitis B virus]            38   0.75

>gi|15778340|gb|AAL07392.1|AF411412_4 polymerase [Hepatitis B virus]
          Length = 843

 Score = 38.9 bits (89), Expect = 0.33
 Identities = 24/89 (26%), Positives = 42/89 (47%), Gaps = 1/89 (1%)
 Frame = +1

Query: 562 VSPLLGAMTRGKRRKPGRIWSISHPLPITNLWQHPDGAWHANNRPTSVLAAAN*KE-RKF 738
           + P  G++ RGK  + G IW+  HP    +    P G+ H +N  +S  +  +    RK
Sbjct: 225 LQPQQGSLARGKSGRSGSIWARVHPTTRQSFGVEPSGSRHIDNSASSTTSCLHQSAVRKT 284

Query: 739 FFYKQTSCKAANNTGRATPDAQWTPSTHR 825
            +   ++ K  +++GRA       PS+ R
Sbjct: 285 AYSHLSTSKRQSSSGRAVELHNIPPSSVR 313


>gi|12060441|dbj|BAB20611.1| DNA polymerase [Hepatitis B virus]
          Length = 843

 Score = 38.1 bits (87), Expect = 0.57
 Identities = 23/90 (25%), Positives = 42/90 (46%), Gaps = 1/90 (1%)
 Frame = +1

Query: 562 VSPLLGAMTRGKRRKPGRIWSISHPLPITNLWQHPDGAWHANNRPTSVLAAAN*KE-RKF 738
           + P  G++ RGK  + G IWS  HP    +    P G+ H +N  +S  +  +    RK
Sbjct: 225 LQPQQGSLARGKSGRSGSIWSRVHPTTRRSFGVEPSGSGHIDNSASSTSSCLHQSAVRKT 284

Query: 739 FFYKQTSCKAANNTGRATPDAQWTPSTHRA 828
            +   ++ K  +++G A       P++ R+
Sbjct: 285 AYSHLSTSKRQSSSGHAVEFHNIPPNSARS 314


>gi|84095095|dbj|BAE66661.1| P protein [Hepatitis B virus]
          Length = 843

 Score = 38.1 bits (87), Expect = 0.57
 Identities = 23/90 (25%), Positives = 42/90 (46%), Gaps = 1/90 (1%)
 Frame = +1

Query: 562 VSPLLGAMTRGKRRKPGRIWSISHPLPITNLWQHPDGAWHANNRPTSVLAAAN*KE-RKF 738
           + P  G++ RGK  + G IW+  HP    +    P G+ H +N  +S  +  +    RK
Sbjct: 225 LQPQQGSLARGKSGRSGSIWARVHPTSRRSFGVEPSGSGHIDNSASSASSCLHQSAVRKT 284

Query: 739 FFYKQTSCKAANNTGRATPDAQWTPSTHRA 828
            +   ++ K  +++G A       PS+ R+
Sbjct: 285 AYSHLSTSKRQSSSGHAVELLNIPPSSARS 314


>gi|57021117|ref|NP_647604.2| Polymerase [Hepatitis B virus]
          Length = 843

 Score = 37.7 bits (86), Expect = 0.75
 Identities = 24/90 (26%), Positives = 41/90 (45%), Gaps = 1/90 (1%)
 Frame = +1

Query: 562 VSPLLGAMTRGKRRKPGRIWSISHPLPITNLWQHPDGAWHANNRPTSVLAAAN*KE-RKF 738
           + P  G++ RGK  + G IWS  HP         P G+ H +N  +S  +  +    RK
Sbjct: 225 LQPQQGSLARGKSGRSGSIWSRVHPTTRRPFGVEPSGSGHIDNTASSTSSCLHQSAVRKT 284

Query: 739 FFYKQTSCKAANNTGRATPDAQWTPSTHRA 828
            +   ++ K  +++G A       PS+ R+
Sbjct: 285 AYSHLSTSKRQSSSGHAVELHNIPPSSARS 314


  Database: Leigo
    Posted date:  Jan 22, 2007 11:26 AM
  Number of letters in database: 1,573,298,872
  Number of sequences in database:  4,535,438

Lambda     K      H
   0.318    0.134    0.401

Gapped
Lambda     K      H
   0.267   0.0410    0.140


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 4535438
Number of Hits to DB: 2,724,816,234
Number of extensions: 65999927
Number of successful extensions: 158184
Number of sequences better than  2.0: 4
Number of HSP's gapped: 158133
Number of HSP's successfully gapped: 4
Length of query: 286
Length of database: 1,573,298,872
Length adjustment: 130
Effective length of query: 156
Effective length of database: 983,691,932
Effective search space: 153455941392
Effective search space used: 153455941392
Neighboring words threshold: 12
Window for multiple hits: 40
X1: 16 ( 7.3 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.7 bits)
S2: 32 (16.9 bits)

<<<end>>>




2007/6/4, Peter <biopython at maubp.freeserve.co.uk>:
>
> Italo Maia wrote:
> > Well, I have a perfectly fine blastx output that throws an error when parsed
> > by Biopython.
> > It gives me this output:
> >
> > Traceback (most recent call last):
> >  File "<stdin>", line 1, in <module>
> >  File "/var/lib/python-support/python2.5/Bio/Blast/NCBIStandalone.py", line 624, in parse
> >    self._scanner.feed(handle, self._consumer)
> >  File "/var/lib/python-support/python2.5/Bio/Blast/NCBIStandalone.py", line 99, in feed
> >    self._scan_parameters(uhandle, consumer)
> >  File "/var/lib/python-support/python2.5/Bio/Blast/NCBIStandalone.py", line 570, in _scan_parameters
> >    has_re=re.compile(r"[Ll]ength of \s*[Dd]atabase"))
> >  File "/var/lib/python-support/python2.5/Bio/ParserSupport.py", line 300, in read_and_call
> >    raise SyntaxError, errmsg
> > SyntaxError: Line does not match regex '[Ll]ength of \s*[Dd]atabase':
> > Number of HSP's gapped: 136690
> >
> > What can I do? I'm using Ubuntu Feisty here.
>
> It looks like you are using the plain text output from blast, so we
> would recommend you try the XML output instead.
>
> See section 3.4 of the tutorial:
> http://biopython.org/DIST/docs/tutorial/Tutorial.html
>
> If you really want to use the plain text output, please file a bug
> (including Biopython version number) and then attach the plain text
> blast output which fails. But no promises - it's an uphill battle to keep
> the parser up to date with each version of Blast!
>
> Peter
>
>
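For new searches, the XML suggestion is the sturdier route, since each field is tagged by name instead of being recovered from line order. Biopython's own `Bio.Blast.NCBIXML` module is the usual way to read it; as a dependency-free sketch of the same idea, the code below walks a report with Python's standard `xml.etree.ElementTree`, assuming the element names of NCBI's BLAST XML format (`BlastOutput_query-def`, `Iteration`, `Iteration_query-def`, `Hit`, `Hit_id`, `Hit_def`, `Hsp`, `Hsp_bit-score`, `Hsp_evalue`, `Hsp_query-frame`). With the legacy `blastall` that produced the report above, XML output is requested with `-m 7`. The function name `iter_xml_hsps` is hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Iterator


def iter_xml_hsps(root: ET.Element) -> Iterator[dict]:
    """Yield one dict per HSP from the root <BlastOutput> element of a BLAST XML report."""
    # Older reports may lack Iteration_query-def, so fall back to the top-level query.
    default_query = root.findtext("BlastOutput_query-def", default="")
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def", default=default_query)
        for hit in iteration.iter("Hit"):
            for hsp in hit.iter("Hsp"):
                yield {
                    "query": query,
                    "hit_id": hit.findtext("Hit_id", default=""),
                    "description": hit.findtext("Hit_def", default=""),
                    "bits": float(hsp.findtext("Hsp_bit-score")),
                    "evalue": float(hsp.findtext("Hsp_evalue")),
                    # Translated-query frame for blastx; 0 if the element is absent.
                    "frame": int(hsp.findtext("Hsp_query-frame", default="0")),
                }


if __name__ == "__main__":
    # Usage (hypothetical): python iter_xml_hsps.py report1.xml report2.xml ...
    for path in sys.argv[1:]:
        for hsp in iter_xml_hsps(ET.parse(path).getroot()):
            print(path, hsp["hit_id"], hsp["bits"], hsp["evalue"], sep="\t")
```

Because the values are looked up by element name, a reordering of the footer statistics between BLAST releases, the kind of change behind the traceback above, would not affect this reader.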


-- 
"Arrogance is the weapon of the weak."

===========================
Italo Moreira Campelo Maia
Computer Science - UECE
Web developer
Java and Python programmer

My blog ^^ http://eusouolobomal.blogspot.com/

===========================



