[Bioperl-l] parsing blast report with long description

Dave Messina David.Messina at sbc.su.se
Sat May 15 19:34:58 UTC 2010


Shalabh,

Could you please file a bug report on this at bugzilla.open-bio.org?

Please include a description (pasting this email will do) and, most importantly, a test script and a sample BLAST output file that reproduce the problem. We need those to diagnose and fix it.

Thanks!
Dave



On May 13, 2010, at 5:07 PM, shalabh sharma wrote:

> Hi All,
>        I need some help parsing BLAST output.
> I have an in-house database that contains sequences with really long
> descriptions.
> 
>> SMPL_IDI_1105131728043
> /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_1000008216887/Open
> Ocean/Galapagos Islands/134 miles NE of Galapagos/Ecuador/0.1 -
> 0.8/1d15'51N"/90d17'42W"/2 m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04
> IHWWLFEVGQKGFLNFSWCFGQVFKRLEHVCIRPKYVPYSSNLYRDSVKTLETPMWRRNSMRVFLKGSLFAVSLIASGAV
> 
> So my blast report looks like this:
> 
> .....
> .....
>> SMPL_IDI_1105131728043
> /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_100000821
>           6887/Open Ocean/Galapagos Islands/134 miles NE of
>           Galapagos/Ecuador/0.1 - 0.8/1d15'51N"/90d17'42W"/2
>           m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04
>          Length = 213
> 
> Score =  124 bits (310), Expect = 5e-27,   Method: Compositional matrix adjust.
> Identities = 62/155 (40%), Positives = 96/155 (61%), Gaps = 1/155 (0%)
> .....
> .....
> 
> (Note that the tag "TI_1000008216887" is split across two lines.)
> 
> I am using SearchIO to parse this report, and then splitting each hit's
> description field again to get all the tags, like this:
> ....
> ....
> my $desc = $hit->description;
> my @f = split('/', $desc);
> for (my $i = 0; $i < scalar @f; $i++) { print OUT "$f[$i]\t"; }
> .....
> .....
> 
> 
> The report parses correctly otherwise, but the field containing
> TI_1000008216887 comes out with a space in it: "TI_100000821 6887".
> 
> I would really appreciate it if anyone could help me out.
> 
> Thanks
> Shalabh Sharma
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
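
Until the parser behaviour is looked at, one possible stopgap is to clean the fields after splitting rather than before. The sketch below is plain Perl and only an illustration: the helper name split_description and the tag pattern are assumptions, not part of BioPerl. It assumes ID-like tags consist of an uppercase prefix, an underscore, and digits (as in TI_1000008216887) and never legitimately contain whitespace, so any whitespace inside such a field must have come from line wrapping. Fields such as "Open Ocean" are left alone. It would not repair a wrap that falls inside the prefix itself.

```perl
use strict;
use warnings;

# Hypothetical helper: split a hit description on '/' and re-join
# ID-like tags that the report's line wrapping broke with a space.
sub split_description {
    my ($desc) = @_;
    my @fields = split m{/}, $desc;
    for my $field (@fields) {
        # Assumed tag shape: UPPERCASE prefix, underscore, then digits.
        # Only fields of that shape have their whitespace removed.
        $field =~ s/\s+//g if $field =~ /^\s*[A-Z][A-Z_]*_[\d\s]+$/;
    }
    return @fields;
}

# Example input modelled on the description in the report above.
my $desc = '/GS026/SMPL_READ_1095454077952/TI_100000821 6887/Open Ocean/2 m';
my @f = split_description($desc);
print join("\t", @f), "\n";
```

In the original loop, $desc would come from $hit->description and the output would go to the OUT filehandle instead of STDOUT. As in the original split, a description that starts with "/" yields an empty first field.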




