[Bioperl-l] SearchIO: Features in/flanking this part of a subject sequence
Razi Khaja
razi.khaja at gmail.com
Wed Apr 29 19:08:14 UTC 2009
Hello,
I am generating BLAST alignments using the BLAST URL API from NCBI.
I want to parse details from BLAST reports whenever there are
"Features in/flanking this part of subject sequence". A portion of
the BLAST report showing "Features flanking ..." is pasted below.
I am using Bio::SearchIO to parse details. The relevant part of the
script is below.
The problem is that the first occurrence of "Features flanking this
part of subject sequence" is skipped. Only the second through the
last occurrences of "Features in/flanking this part of subject
sequence" are parsed and printed.
I believe the code responsible for parsing this information is in
Bio/SearchIO/blast.pm, starting on line 760.
I have tried fixing the code in Bio/SearchIO/blast.pm myself but was
not able to correct the problem.
Would it be possible for someone to fix the code in the
Bio/SearchIO/blast.pm module, or help me fix the code so that the
first occurrence is not skipped?
Thanks,
Razi
===== The part of the script that is relevant to parsing "Features in/flanking..." =====
use strict;
use warnings;
use Bio::SearchIO;

my $bio_searchio_in = Bio::SearchIO->new(
    -file   => 'blast_result.txt',
    -format => 'blast'
);
my $i = 1;    # running count of HSPs that carry feature text
while ( my $result = $bio_searchio_in->next_result() ) {
    while ( my $hit = $result->next_hit() ) {
        while ( my $hsp = $hit->next_hsp() ) {
            # hit_features() holds the "Features in/flanking ..." text, if any
            my $hsp_features = $hsp->hit_features();
            if ($hsp_features) {
                print "HSP FEATURE $i\t$hsp_features\n";
                $i++;
            }
        }
    }
}
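To confirm how many feature blocks the report really contains, independently of Bio::SearchIO, the raw text can be scanned directly. The following is only a minimal sketch: extract_feature_blocks is a hypothetical helper (not part of BioPerl), and it assumes that each block starts with a header worded as in the report excerpt and ends at the next "Score =" line or the next ">" hit line.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: collect every
# "Features in/flanking this part of subject sequence:" block
# straight from the raw report text.
sub extract_feature_blocks {
    my ($text) = @_;
    my @blocks;
    my $current;    # block currently being collected, if any
    for my $line ( split /\n/, $text ) {
        if ( $line =~ /^\s*Features (?:in|flanking) this part of subject sequence:/ ) {
            ( my $header = $line ) =~ s/^\s+|\s+$//g;
            $current = { header => $header, entries => [] };
            push @blocks, $current;
        }
        elsif ( $current && $line =~ /^\s*Score\s*=|^>/ ) {
            undef $current;    # HSP statistics or a new hit end the block
        }
        elsif ( $current && $line =~ /\S/ ) {
            ( my $entry = $line ) =~ s/^\s+|\s+$//g;
            push @{ $current->{entries} }, $entry;
        }
    }
    return @blocks;
}

# Usage: read the same file that is given to Bio::SearchIO.
my $file = 'blast_result.txt';
if ( -e $file ) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    my @blocks = extract_feature_blocks($text);
    printf "Raw report contains %d feature block(s)\n", scalar @blocks;
    print join( "\t", $_->{header}, @{ $_->{entries} } ), "\n" for @blocks;
}
```

If the count printed here is one higher than the final value of $i - 1 from the script above, that would support the suspicion that the parser drops exactly one block.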
===== A portion of a BLAST report with "Features flanking ..." =====
...
...
Score = 54.7 bits (29), Expect = 0.003
Identities = 29/29 (100%), Gaps = 0/29 (0%)
Strand=Plus/Minus
Query  6556     CCTGGGTGACAGAGTGAGACTCCATCTCA  6584
                |||||||||||||||||||||||||||||
Sbjct  6953042  CCTGGGTGACAGAGTGAGACTCCATCTCA  6953014
>gi|51459264|ref|NT_077382.3|Hs1_77431 Homo sapiens chromosome 1 genomic contig
Length=237250
Features flanking this part of subject sequence:
16338 bp at 5' side: PRAME family member 8
11926 bp at 3' side: PRAME family member 9
Score = 7286 bits (3945), Expect = 0.0
Identities = 5437/6145 (88%), Gaps = 152/6145 (2%)
Strand=Plus/Plus
Query  23225  GGTTGGTTAATATTGATAATTAAATGACTTGGTACTGAGAAGAAGCTATAGGTGCAAATG  23284
              |||||||||||||||||||||||||||||||| |||||| ||||||||||| ||||||||
Sbjct  86128  GGTTGGTTAATATTGATAATTAAATGACTTGGCACTGAGCAGAAGCTATAGATGCAAATG  86187
Query  23285  GGTGGCCTATGACTATTATTGATTTCATTACTGGTAATTTATCTCTATGCCTAGAAAACA  23344
              ||||||||||||||||| |||||||||||||| |||| ||||||| |||| ||| |||||
Sbjct  86188  GGTGGCCTATGACTATTGTTGATTTCATTACTTGTAACTTATCTCCATGCATAGGAAACA  86247
...
...