[Bioperl-l] Re: parsing BLAST html

Brian Osborne brian_osborne at cognia.com
Wed Aug 13 08:42:04 EDT 2003


Sofia,

Just making sure here. The output from StripHTML can be parsed by SearchIO?
This probably belongs in the FAQ.

Brian O.

-----Original Message-----
From: bioperl-l-bounces at portal.open-bio.org
[mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Sofia
Sent: Tuesday, August 12, 2003 10:41 AM
To: Bioperl Mailing List
Subject: [Bioperl-l] Re: parsing BLAST html

I use PerlIO::via::StripHTML and it works quite successfully
-
Sofia

Hi Wes,
Before I parse my HTML BLAST output I run it through PerlIO::via::StripHTML.  It
removes all the HTML, and I save the new file as originalFileName.out.  I like
the HTML BLAST output because I keep those files for another use later. But if
I didn't need them I would just use text output.

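use strict;
use warnings;
use Bio::SearchIO;
use PerlIO::via::StripHTML;

my @dir_html_files = </usr/local/www/Blast/NcbiBlast/*.htm>;
foreach my $file (@dir_html_files) {
    my $outfile = "$file.out";
    # Read through the StripHTML layer so the markup is removed on input.
    open my $in, '<:via(StripHTML)', $file
        or die "Can't open $file: $!\n";
    open my $out, '>', $outfile
        or die "Can't open $outfile: $!\n";
    print {$out} $_ while <$in>;
    close $in;
    close $out or die "Can't close $outfile: $!\n";
}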
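
The parsing step that follows could look roughly like this. It is only a
sketch: it assumes the stripped .out files are close enough to plain text BLAST
reports for the SearchIO 'blast' format to accept them, which nothing in this
thread confirms, and report_hits is a made-up helper name.

```perl
use strict;
use warnings;
use Bio::SearchIO;

# Hypothetical helper: print one tab-separated line per HSP in a stripped
# BLAST report and return the number of HSPs seen.
sub report_hits {
    my ($file) = @_;
    my $in = Bio::SearchIO->new(-format => 'blast', -file => $file);
    my $count = 0;
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                print join("\t",
                    $result->query_name,
                    $hit->name,
                    $hsp->evalue,
                    $hsp->percent_identity), "\n";
                $count++;
            }
        }
    }
    return $count;
}

# The .out files are the stripped copies written next to the .htm originals.
report_hits($_) for glob '/usr/local/www/Blast/NcbiBlast/*.out';
```

Bio::SearchIO also takes an open filehandle through -fh, so in principle the
StripHTML layer could feed the parser directly without the intermediate file;
that combination is untested here.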

-Sofia
----- Original Message -----
From: "Jason Stajich" <jason at cgt.duhs.duke.edu>
To: "Wes Barris" <wes.barris at csiro.au>
Cc: "Bioperl Mailing List" <bioperl-l at bioperl.org>
Sent: Tuesday, August 12, 2003 6:23 AM
Subject: Re: [Bioperl-l] Parsing html blast output?


> No, it is not currently possible to parse BLAST HTML output.
>
> On Tue, 12 Aug 2003, Wes Barris wrote:
>
> > Hi,
> >
> > I know it is possible to use the SearchIO functions to parse either
> > text blast output or xml blast output.  However, I would like to know
> > if it is possible to parse html blast output?  For example, if I wanted
> > to parse the output of this command:
> >
> > blastcl3 -d nr -p blastn -T -i fasta.txt -o blast.html
> >
> > When I try parsing the above "blast.html" file using example number 4
> > from this file:
> >
> > http://bioperl.org/HOWTOs/html/Graphics-HOWTO.html
> >
> > I get errors.
> >
> > What I ended up doing is writing a perl "de-htmlizer" that I use to
> > convert an html blast output file into a text-only blast output file.
> > Then I run the result through a bioperl blast parsing script.  Is
> > there a more elegant way to do this?
> >
> >
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu



