[Bioperl-l] Bio::SearchIO::Writer::TextResultWriter is buggy
Lee Katz
lskatz at gmail.com
Wed Mar 14 10:35:59 EDT 2012
I just want to clarify: I have an already existing blast output. Is there
a non-buggy way to split it? It is in human-readable text form (-m 0).
On Tue, Mar 13, 2012 at 6:06 PM, Lee Katz <lskatz at gmail.com> wrote:
> Hi, I am separating a blast output file into individual results, so that I
> can multithread the reading of the results. I cannot pass a result object
> through Perl threads because it contains code, which is not sharable via
> threads::share (sharing is used internally in Thread::Queue)--therefore I
> must pass a sub-file. My strategy is to read the whole file into
> Bio::SearchIO and then write the result objects to a file, so that a thread
> can read the file. The thread would thus read one file at a time
> containing one query and all its results.
>
> Reading the original file works, but then outputting the blast file is
> buggy. The last line of the HSP is empty and has bad coordinates. I have
> an example, with an error when trying to read it again with SearchIO, and
> its fasta file below.
>
> Any help debugging? Maybe I just need to update BioPerl since I installed
> it around several months ago, maybe a year ago? Thanks.
>
>
> MSG: In sequence lcl|R009125 residue count gives end value 341.
> Overriding value [340] with value 341 for Bio::LocatableSeq::end().
>
> ANKTEKPTPKKLKDAAKKGQSFKFKDLTTVVIILVGTFTIISFF-----SLSDVMLL-----YRYVIINDFE-------INEGKYF----FAVVIVFFKIIGFPLFFCVLSAVLPTLVQTKFVLATKAIKIDFSVLNPVKGLKKIFSIKTIKEFFKSILLLIILALTTYFFWINDRKIIFSQVFSSVDGLYLIWGRLFKDIILFFLAFSILVIILDFVIEFILYMKDMMMDKQEIKREYIEQEGHFETKSRRRELHIEILSEQTKSDIRNSKLVVMNPTHIAIGIYFNPEIAPAPFISLIETNQCALAVRKYANEVGIPTVRDVKLARKLYKTHTKYSFVDFEHLDEVLRLIVWLEQVEN1
> ---------------------------------------------------
>
>
>
> >lcl|R009125 (gi:13449103) spa40 (pWR501_p164) - Type III secretion
> protein [Shigella flexneri str. M90T (serotype 5a) plasmid pWR501]
> Length = 342
>
> Score = 79.3 bits (194), Expect = 2e-15
> Identities = 87/360 (24%), Positives = 175/360 (48%), Gaps = 35/360 (9%)
>
>
> Query: 4 GDKTEQASSQKLDKARKQGQIARSKEFSSAIMLMV----CIGYFYANADSLSGHLMQLFE 59
> +KTE+ + +KL A K+GQ + K+ ++ ++++V I +F SLS ++
> Sbjct: 2 ANKTEKPTPKKLKDAAKKGQSFKFKDLTTVVIILVGTFTIISFF-----SLSDVMLL--- 53
>
> Query: 60 VSFRFTAESQSDHDHILHLITQSLYLMIKVFAPLIIF-QFIASAIATCLLGGF------- 111
> +R+ + + I + Y FA +I+F + I + C+L
> Sbjct: 54 --YRYVIINDFE-------INEGKYF----FAVVIVFFKIIGFPLFFCVLSAVLPTLVQT 100
>
> Query: 112 HFNLSLLAPK--FSKINPLSGIKRIFSKQTLVEFLKNVAKISLIFALLYYMISTNFHMIG 169
> F L+ A K FS +NP+ G+K+IFS +T+ EF K++ + ++ Y+ + +I
> Sbjct: 101 KFVLATKAIKIDFSVLNPVKGLKKIFSIKTIKEFFKSILLLIILALTTYFFWINDRKIIF 160
>
> Query: 170 SLVRASFQTTIHFSLQYVLELLGMLILIAILFGVIDIPYQKMTFGTQMKMTkqevkqehk 229
> S V +S + +++ + +IL ++D + + + M M KQE+K+E+
> Sbjct: 161 SQVFSSVDGLYLIWGRLFKDIILFFLAFSILVIILDFVIEFILYMKDMMMDKQEIKREYI 220
>
> Query: 230 eqeGRPEIKSRIRQIQMQNARRSASQTVPTADVVLMNPTHFAVALKYDLTKAEAPFVVAK 289
> EQEG E KSR R++ ++ + + +V+MNPTH A+ + ++ A APF+
> Sbjct: 221 EQEGHFETKSRRRELHIEILSEQTKSDIRNSKLVVMNPTHIAIGIYFNPEIAPAPFISLI 280
>
> Query: 290 GKNEVAFYIRTLAEQHQVEVLVVPEITRSIYHTTQLNQMIPNQLFLAVAQILKYVQQLKS 349
> N+ A +R A + + + ++ R +Y T + + V +++ +++Q+++
> Sbjct: 281 ETNQCALAVRKYANEVGIPTVRDVKLARKLYKTHTKYSFVDFEHLDEVLRLIVWLEQVEN 340
>
> Query: 350 349
>
> Sbjct: 341 340
>
> And the whole fasta entry is:
> >lcl|R009125 (gi:13449103) spa40 (pWR501_p164) - Type III secretion
> protein [Shigella flexneri str. M90T (serotype 5a) plasmid pWR501]
>
> MANKTEKPTPKKLKDAAKKGQSFKFKDLTTVVIILVGTFTIISFFSLSDVMLLYRYVIINDFEINEGKYFFAVVIVFFKI
>
> IGFPLFFCVLSAVLPTLVQTKFVLATKAIKIDFSVLNPVKGLKKIFSIKTIKEFFKSILLLIILALTTYFFWINDRKIIF
>
> SQVFSSVDGLYLIWGRLFKDIILFFLAFSILVIILDFVIEFILYMKDMMMDKQEIKREYIEQEGHFETKSRRRELHIEIL
>
> SEQTKSDIRNSKLVVMNPTHIAIGIYFNPEIAPAPFISLIETNQCALAVRKYANEVGIPTVRDVKLARKLYKTHTKYSFV
> DFEHLDEVLRLIVWLEQVENTH
>
>
>
> --
> Lee Katz, Ph.D.
>
--
Lee Katz, Ph.D.
More information about the Bioperl-l
mailing list