[Bioperl-l] Bio::SearchIO::Writer::TextResultWriter is buggy

Lee Katz lskatz at gmail.com
Tue Mar 13 18:06:48 EDT 2012


Hi, I am splitting a BLAST output file into individual results so that I
can read the results in multiple threads.  I cannot pass a result object
between Perl threads because it contains code references, which cannot be
shared via threads::shared (Thread::Queue uses sharing internally), so I
must pass a sub-file instead.  My strategy is to read the whole file with
Bio::SearchIO and then write each result object to its own file for a
thread to read.  Each thread would thus read one file at a time, containing
one query and all of its results.
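
One way to get per-query sub-files without round-tripping through a writer
is to split the report as plain text.  The following is a minimal Python
sketch, not BioPerl code; `split_report` and its `marker` argument are
hypothetical names, and it assumes each query's section begins with a line
starting with "Query=" (alignment rows start with "Query:" and are not
matched).  If the report repeats a program banner before each query, the
banner prefix would be the better marker.

```python
def split_report(text, marker="Query="):
    """Split a plain-text report into per-query chunks.

    Assumption: every query section starts with a line beginning with
    `marker`. Lines before the first marker are returned separately as
    the preamble so the caller can decide whether to prepend them.
    """
    preamble = []
    chunks = []
    for line in text.splitlines(keepends=True):
        if line.startswith(marker):
            # A new query section begins here.
            chunks.append([line])
        elif chunks:
            # Still inside the current query section.
            chunks[-1].append(line)
        else:
            # Nothing matched yet: this is header material.
            preamble.append(line)
    return "".join(preamble), ["".join(chunk) for chunk in chunks]


if __name__ == "__main__":
    demo = "BLASTP\n\nQuery= q1\nhit A\nQuery= q2\nhit B\n"
    header, parts = split_report(demo)
    for part in parts:
        print(repr(header + part))
```

Each chunk can then be written to its own file and handed to a thread by
file name, so no parsed object has to cross a thread boundary.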

Reading the original file works, but the BLAST report that gets written
back out is malformed: the last alignment block of the HSP is empty and
has bad coordinates.  Below are the error I get when reading the output
back in with SearchIO, an example of the output, and the subject's FASTA
entry.

Any help debugging?  Maybe I just need to update BioPerl; I installed it
several months ago, possibly as long as a year ago.  Thanks.


MSG: In sequence lcl|R009125 residue count gives end value 341.
Overriding value [340] with value 341 for Bio::LocatableSeq::end().
ANKTEKPTPKKLKDAAKKGQSFKFKDLTTVVIILVGTFTIISFF-----SLSDVMLL-----YRYVIINDFE-------INEGKYF----FAVVIVFFKIIGFPLFFCVLSAVLPTLVQTKFVLATKAIKIDFSVLNPVKGLKKIFSIKTIKEFFKSILLLIILALTTYFFWINDRKIIFSQVFSSVDGLYLIWGRLFKDIILFFLAFSILVIILDFVIEFILYMKDMMMDKQEIKREYIEQEGHFETKSRRRELHIEILSEQTKSDIRNSKLVVMNPTHIAIGIYFNPEIAPAPFISLIETNQCALAVRKYANEVGIPTVRDVKLARKLYKTHTKYSFVDFEHLDEVLRLIVWLEQVEN1
---------------------------------------------------



>lcl|R009125 (gi:13449103) spa40 (pWR501_p164) - Type III secretion protein
[Shigella flexneri str. M90T (serotype 5a) plasmid pWR501]
         Length = 342

 Score = 79.3 bits (194), Expect = 2e-15
 Identities = 87/360 (24%), Positives = 175/360 (48%), Gaps = 35/360 (9%)


Query: 4   GDKTEQASSQKLDKARKQGQIARSKEFSSAIMLMV----CIGYFYANADSLSGHLMQLFE 59
            +KTE+ + +KL  A K+GQ  + K+ ++ ++++V     I +F     SLS  ++
Sbjct: 2   ANKTEKPTPKKLKDAAKKGQSFKFKDLTTVVIILVGTFTIISFF-----SLSDVMLL--- 53

Query: 60  VSFRFTAESQSDHDHILHLITQSLYLMIKVFAPLIIF-QFIASAIATCLLGGF------- 111
             +R+   +  +       I +  Y     FA +I+F + I   +  C+L
Sbjct: 54  --YRYVIINDFE-------INEGKYF----FAVVIVFFKIIGFPLFFCVLSAVLPTLVQT 100

Query: 112 HFNLSLLAPK--FSKINPLSGIKRIFSKQTLVEFLKNVAKISLIFALLYYMISTNFHMIG 169
            F L+  A K  FS +NP+ G+K+IFS +T+ EF K++  + ++    Y+    +  +I
Sbjct: 101 KFVLATKAIKIDFSVLNPVKGLKKIFSIKTIKEFFKSILLLIILALTTYFFWINDRKIIF 160

Query: 170 SLVRASFQTTIHFSLQYVLELLGMLILIAILFGVIDIPYQKMTFGTQMKMTkqevkqehk 229
           S V +S         +   +++   +  +IL  ++D   + + +   M M KQE+K+E+
Sbjct: 161 SQVFSSVDGLYLIWGRLFKDIILFFLAFSILVIILDFVIEFILYMKDMMMDKQEIKREYI 220

Query: 230 eqeGRPEIKSRIRQIQMQNARRSASQTVPTADVVLMNPTHFAVALKYDLTKAEAPFVVAK 289
           EQEG  E KSR R++ ++         +  + +V+MNPTH A+ + ++   A APF+
Sbjct: 221 EQEGHFETKSRRRELHIEILSEQTKSDIRNSKLVVMNPTHIAIGIYFNPEIAPAPFISLI 280

Query: 290 GKNEVAFYIRTLAEQHQVEVLVVPEITRSIYHTTQLNQMIPNQLFLAVAQILKYVQQLKS 349
             N+ A  +R  A +  +  +   ++ R +Y T      +  +    V +++ +++Q+++
Sbjct: 281 ETNQCALAVRKYANEVGIPTVRDVKLARKLYKTHTKYSFVDFEHLDEVLRLIVWLEQVEN 340

Query: 350  349

Sbjct: 341  340
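
The coordinates in the report above can be sanity-checked with simple
arithmetic: in each alignment row, the end coordinate should equal the
start plus the number of non-gap residues, minus one.  A minimal Python
sketch of that check follows; `expected_end` is a hypothetical helper, not
part of BioPerl.  Under this rule a row with no residues gets an end one
less than its start, which matches the trailing "Query: 350  349" and
"Sbjct: 341  340" rows.

```python
def expected_end(start, aligned):
    """Return the end coordinate implied by a start and an aligned string.

    Gap characters ('-') do not consume sequence positions, so only the
    remaining characters are counted.
    """
    residues = sum(1 for ch in aligned if ch != "-")
    return start + residues - 1


if __name__ == "__main__":
    # Last non-empty Sbjct row of the report (start 281, printed end 340).
    last_row = "ETNQCALAVRKYANEVGIPTVRDVKLARKLYKTHTKYSFVDFEHLDEVLRLIVWLEQVEN"
    print(expected_end(281, last_row))
    # The empty trailing rows: zero residues, so end = start - 1.
    print(expected_end(341, ""))
    print(expected_end(350, ""))
```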

And the whole fasta entry is:
>lcl|R009125 (gi:13449103) spa40 (pWR501_p164) - Type III secretion protein
[Shigella flexneri str. M90T (serotype 5a) plasmid pWR501]
MANKTEKPTPKKLKDAAKKGQSFKFKDLTTVVIILVGTFTIISFFSLSDVMLLYRYVIINDFEINEGKYFFAVVIVFFKI
IGFPLFFCVLSAVLPTLVQTKFVLATKAIKIDFSVLNPVKGLKKIFSIKTIKEFFKSILLLIILALTTYFFWINDRKIIF
SQVFSSVDGLYLIWGRLFKDIILFFLAFSILVIILDFVIEFILYMKDMMMDKQEIKREYIEQEGHFETKSRRRELHIEIL
SEQTKSDIRNSKLVVMNPTHIAIGIYFNPEIAPAPFISLIETNQCALAVRKYANEVGIPTVRDVKLARKLYKTHTKYSFV
DFEHLDEVLRLIVWLEQVENTH



-- 
Lee Katz, Ph.D.


