[Bioperl-l] SearchIO: Features in/flanking this part of a subject sequence

Razi Khaja razi.khaja at gmail.com
Wed Apr 29 19:08:14 UTC 2009


Hello,

I am generating BLAST alignments using the BLAST URL API from NCBI.

I want to parse details from BLAST reports whenever there are
"Features in/flanking this part of subject sequence".  A portion of
the BLAST report showing "Features flanking ..." is pasted below.

I am using Bio::SearchIO to parse details.  The relevant part of the
script is below.

The problem I am having is that the first occurrence of "Features
in/flanking this part of subject sequence" in the report is skipped.
The script prints every occurrence from the second through the last,
but never the first.

I believe the code responsible for parsing this information is in
Bio/SearchIO/blast.pm, starting on line 760.
I have tried fixing the code in Bio/SearchIO/blast.pm myself but was
not able to correct the problem.
Would it be possible for someone to fix the code in the
Bio/SearchIO/blast.pm module, or help me fix the code so that the
first occurrence is not skipped?

Thanks,
Razi

===== The part of the script relevant to parsing "Features in/flanking ..." =====
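use strict;
use warnings;
use Bio::SearchIO;

# Open the saved BLAST report as a Bio::SearchIO stream.
my $bio_searchio_in = Bio::SearchIO->new(
    -file   => 'blast_result.txt',
    -format => 'blast'
);

# Walk result -> hit -> HSP and print the feature text attached to each HSP.
my $i = 1;
while( my $result = $bio_searchio_in->next_result() ){
    while( my $hit = $result->next_hit() ){
        while( my $hsp = $hit->next_hsp() ){
            # hit_features() returns the "Features in/flanking ..." text, if any.
            my $hsp_features = $hsp->hit_features();
            if( $hsp_features ) {
                print "HSP FEATURE $i\t$hsp_features\n";
                $i++;
            }
        }
    }
}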

===== A portion of a BLAST report with "Features flanking ..." =====
...
...
 Score = 54.7 bits (29),  Expect = 0.003
 Identities = 29/29 (100%), Gaps = 0/29 (0%)
 Strand=Plus/Minus

Query  6556     CCTGGGTGACAGAGTGAGACTCCATCTCA  6584
                |||||||||||||||||||||||||||||
Sbjct  6953042  CCTGGGTGACAGAGTGAGACTCCATCTCA  6953014


>gi|51459264|ref|NT_077382.3|Hs1_77431 Homo sapiens chromosome 1 genomic contig
Length=237250

 Features flanking this part of subject sequence:
   16338 bp at 5' side: PRAME family member 8
   11926 bp at 3' side: PRAME family member 9

 Score = 7286 bits (3945),  Expect = 0.0
 Identities = 5437/6145 (88%), Gaps = 152/6145 (2%)
 Strand=Plus/Plus

Query  23225  GGTTGGTTAATATTGATAATTAAATGACTTGGTACTGAGAAGAAGCTATAGGTGCAAATG  23284
              |||||||||||||||||||||||||||||||| |||||| ||||||||||| ||||||||
Sbjct  86128  GGTTGGTTAATATTGATAATTAAATGACTTGGCACTGAGCAGAAGCTATAGATGCAAATG  86187

Query  23285  GGTGGCCTATGACTATTATTGATTTCATTACTGGTAATTTATCTCTATGCCTAGAAAACA  23344
              ||||||||||||||||| |||||||||||||| |||| ||||||| |||| ||| |||||
Sbjct  86188  GGTGGCCTATGACTATTGTTGATTTCATTACTTGTAACTTATCTCCATGCATAGGAAACA  86247
...
...
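
===== A raw-text cross-check for the "Features ..." blocks =====
One way to confirm that the first block is really present in the file,
and to see what the parser should be returning, is to scan the raw
report text directly. The following is only a minimal sketch, not a fix
for Bio/SearchIO/blast.pm. It assumes the header line reads "Features
in this part of subject sequence:" or "Features flanking this part of
subject sequence:" as in the excerpt above, and that each block ends at
the first blank line. extract_feature_blocks is a hypothetical helper,
not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): collect every
# "Features in/flanking this part of subject sequence:" block from the
# raw text of a BLAST report, independently of Bio::SearchIO.
# Returns a list of array refs, one per block, each holding the
# trimmed feature lines that follow the header.
sub extract_feature_blocks {
    my ($report_text) = @_;
    my @blocks;
    my $current;    # array ref while inside a block, undef otherwise

    for my $line ( split /\n/, $report_text ) {
        if ( $line =~ /^\s*Features (?:in|flanking) this part of subject sequence:/ ) {
            # Header line: start a new block.
            $current = [];
            push @blocks, $current;
        }
        elsif ( defined $current ) {
            if ( $line =~ /^\s*$/ ) {
                # Assumption: a blank line terminates the block.
                undef $current;
            }
            else {
                ( my $trimmed = $line ) =~ s/^\s+|\s+$//g;
                push @$current, $trimmed;
            }
        }
    }
    return @blocks;
}

# Usage: perl this_script.pl blast_result.txt
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $text = do { local $/; <$fh> };    # slurp the whole report
    close $fh;

    my $n = 1;
    for my $block ( extract_feature_blocks($text) ) {
        print "RAW FEATURE ", $n++, "\t", join( '; ', @$block ), "\n";
    }
}
```

If the number of RAW FEATURE lines is one more than the number of HSP
FEATURE lines printed by the Bio::SearchIO loop, the missing block is
in the file and is being dropped during parsing.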



