[Bioperl-l] parsing an html blast result file

Jason Stajich jason at cgt.duhs.duke.edu
Wed Jul 23 10:14:32 EDT 2003


Later versions of NCBI BLAST XML output either aren't well-formed, or
XML::Parser is tripping up on something it should ignore.

I have not had time to really figure out how to fix it, but basically if
you make your XML file look like

<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN"
"NCBI_BlastOutput.dtd"><BlastOutput>

instead of
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN"
"NCBI_BlastOutput.dtd">
<BlastOutput>

it should work.  I wanted to put something in the preprocessing in SearchIO
to handle it, but don't have time.
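
The workaround above can be sketched as a small standalone preprocessing
step. This is only a sketch under some assumptions: join_doctype_and_root
is a hypothetical helper name (it is not part of BioPerl or SearchIO), the
whole report is assumed to fit in memory, and the DOCTYPE declaration is
assumed to contain no '>' before its closing bracket.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: remove the whitespace
# (including newlines) between the end of the DOCTYPE declaration and
# the opening <BlastOutput> tag. Input without that pattern is returned
# unchanged.
sub join_doctype_and_root {
    my ($xml) = @_;
    $xml =~ s/(<!DOCTYPE\s+BlastOutput\b[^>]*>)\s+(<BlastOutput>)/$1$2/;
    return $xml;
}

# When called with an input and an output file name, rewrite the report.
if (@ARGV == 2) {
    my ($in, $out) = @ARGV;

    open my $ifh, '<', $in or die "cannot read $in: $!";
    my $xml = do { local $/; <$ifh> };    # slurp the whole file
    close $ifh;

    open my $ofh, '>', $out or die "cannot write $out: $!";
    print {$ofh} join_doctype_and_root($xml);
    close $ofh or die "cannot close $out: $!";
}
```

Saved as, say, fixblastxml.pl (a made-up name), it would be run as
"perl fixblastxml.pl report.xml fixed.xml", and fixed.xml could then be
passed to Bio::SearchIO->new with -format => 'blastxml'.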

I'm no XML lover/expert, so I haven't really tried to dig deep into why this
is tripping up XML::Parser.

-jason


On Wed, 23 Jul 2003, Wes Barris wrote:

> Hi,
>
> I have installed bioperl et al. on a Sun (Solaris 8).  I would now like
> to try parsing an html blast results file.  I saved example 4 from this
> page into a file:
>
>      http://www.bioperl.org/HOWTOs/html/Graphics-HOWTO.html
>
> The only thing I changed in the file is the format of the input file
> from this:
>
> -format => 'blast') or die "parse failed";
>
> to this:
>
> -format => 'blastxml') or die "parse failed";
>
> I am assuming that the format of an html blast result file is "blastxml",
> but I could be wrong.  I could not find a list of valid formats that can
> be used with the Bio::SearchIO->new constructor.
>
> When I run the example 4 script, I get this error:
>
> wes at sequence> blasttoimg.pl junk.html >junk.png
>
> -------------------- WARNING ---------------------
> MSG: error in parsing a report:
>
> not well-formed (invalid token) at line 9, column 34, byte 238 at
> /usr/local/lib/perl5/site_perl/5.6.1/sun4-solaris/XML/Parser.pm line 185
>
> ---------------------------------------------------
> no result at /home/wes/proj/blast/blasttoimg.pl line 15, <GEN1> line 669.
>
> Could anyone suggest what I might try to make this work?
>
> #!/usr/local/bin/perl
>
> # This is code example 4 in the Graphics-HOWTO
> use strict;
> #use lib "$ENV{HOME}/projects/bioperl-live";
> use Bio::Graphics;
> use Bio::SearchIO;
> use Bio::SeqFeature::Generic;
>
> my $file = shift or die "Usage: render4.pl <blast file>\n";
>
> my $searchio = Bio::SearchIO->new(-file   => $file,
>                                    -format => 'blastxml') or die "parse failed";
>
>
> my $result = $searchio->next_result() or die "no result";
>
> my $panel = Bio::Graphics::Panel->new(-length    => $result->query_length,
>                                        -width     => 800,
>                                        -pad_left  => 10,
>                                        -pad_right => 10,
>                                       );
>
> my $full_length = Bio::SeqFeature::Generic->new(-start  => 1,
>                                                 -end    => $result->query_length,
>                                                 -seq_id => $result->query_name,
>                                                );
>
> $panel->add_track($full_length,
>                    -glyph   => 'arrow',
>                    -tick    => 2,
>                    -fgcolor => 'black',
>                    -double  => 1,
>                    -label   => 1,
>                   );
>
> my $track = $panel->add_track(-glyph       => 'graded_segments',
>                                -label       => 1,
>                                -connector   => 'dashed',
>                                -bgcolor     => 'blue',
>                                -font2color  => 'red',
>                                -sort_order  => 'high_score',
>                                -description => sub {
>                                  my $feature = shift;
>                                  return unless $feature->has_tag('description');
>                                  my ($description) = $feature->each_tag_value('description');
>                                  my $score = $feature->score;
>                                  "$description, score=$score";
>                                 });
>
> while( my $hit = $result->next_hit ) {
>    next unless $hit->significance < 1E-20;
>    my $feature = Bio::SeqFeature::Generic->new(-score   => $hit->raw_score,
>                                                -seq_id => $hit->name,
>                                                -tag     => {
>                                                             description => $hit->description
>                                                            },
>                                               );
>    while( my $hsp = $hit->next_hsp ) {
>      $feature->add_sub_SeqFeature($hsp,'EXPAND');
>    }
>
>    $track->add_feature($feature);
> }
>
> print $panel->png;
>
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

