[Bioperl-l] parsing blast report with long description

shalabh sharma shalabh.sharma7 at gmail.com
Thu May 13 15:07:26 UTC 2010


Hi All,
        I need some help parsing BLAST output.
I have an in-house database that contains sequences with really long
descriptions.

>SMPL_IDI_1105131728043
/GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_1000008216887/Open
Ocean/Galapagos Islands/134 miles NE of Galapagos/Ecuador/0.1 -
0.8/1d15'51N"/90d17'42W"/2 m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04
IHWWLFEVGQKGFLNFSWCFGQVFKRLEHVCIRPKYVPYSSNLYRDSVKTLETPMWRRNSMRVFLKGSLFAVSLIASGAV

So my blast report looks like this:

.....
.....
>SMPL_IDI_1105131728043 /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_100000821
           6887/Open Ocean/Galapagos Islands/134 miles NE of
           Galapagos/Ecuador/0.1 - 0.8/1d15'51N"/90d17'42W"/2
           m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04
          Length = 213

 Score =  124 bits (310), Expect = 5e-27,   Method: Compositional matrix adjust.
 Identities = 62/155 (40%), Positives = 96/155 (61%), Gaps = 1/155 (0%)
.....
.....

(Note that the tag "TI_1000008216887" is split across two lines.)

I am using Bio::SearchIO to parse this report. What I am doing is parsing the
description field again to get all the tags, like this:
....
....
    my $desc = $hit->description;
    my @f = split('/', $desc);
    for (my $i = 0; $i < scalar @f; $i++) { print OUT "$f[$i]\t"; }
.....
.....


The report parses perfectly otherwise, but the field TI_1000008216887
comes out with a space in it: "TI_100000821 6887".
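
To make the problem concrete, here is a minimal sketch of one possible workaround in plain Perl. It assumes the space comes from the wrapped description lines being joined back together, and that ID-like tags (those starting with TI_ or SMPL_) never legitimately contain whitespace. The helper name split_description is hypothetical and is not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a re-joined description
# on '/' and strip whitespace only from fields that look like ID tags.
sub split_description {
    my ($desc) = @_;
    my @fields = split m{/}, $desc;
    for my $field (@fields) {
        # Assumption: TI_... and SMPL_... tags never contain spaces, so any
        # whitespace inside them was introduced by line wrapping.
        $field =~ s/\s+//g if $field =~ /^\s*(?:TI|SMPL)_[\w\s]+$/;
    }
    return @fields;
}

# Description as it might come back from $hit->description (shortened here).
my $desc = '/GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/'
         . 'TI_100000821 6887/Open Ocean/Galapagos Islands';
my @f = split_description($desc);
print join("\t", @f), "\n";
```

Free-text fields such as "Open Ocean" are left alone, so this only helps when the wrap falls inside a recognizable tag. If a wrap could fall inside any field, looking up the original FASTA header by the hit name is probably the more robust route.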

I would really appreciate it if anyone could help me out.

Thanks
Shalabh Sharma


