[Bioperl-l] Bio::SearchIO::Fasta problem

Jason Stajich jason at cgt.duhs.duke.edu
Mon Aug 25 13:52:42 EDT 2003


Martin - it's tested on FASTA 3.4 and some versions of 3.3.  It can parse
the -m 9 tabular output as well as standard default output (with or
without Histograms).

Personally I would just use the latest distribution:
ftp://ftp.virginia.edu/pub/fasta/fasta3.shar.Z

It has not been tested with the GCG-ized FASTA and as you report it
doesn't seem to work. I took the liberty of posting a bug report for you
with an example report as this is the type of information needed for
someone to diagnose a problem.

I don't know that fixing this will be a priority, given that it is pretty
easy to install and run FASTA directly from Bill's distro and we can parse
that output just fine.
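
In the meantime, one could check a report for the GCG-style header before
handing it to Bio::SearchIO. A minimal sketch in plain Perl, assuming the only
signal is the trailing ".." on the "The best scores are:" line described in
the report below; looks_like_gcg_fasta is a hypothetical helper, not part of
BioPerl, and other GCG differences may well exist:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function).
# Returns 1 if the first "The best scores are:" header line in the report
# text ends in "..", as reported for GCG-ized FASTA output; otherwise 0.
sub looks_like_gcg_fasta {
    my ($text) = @_;

    for my $line ( split /\n/, $text ) {
        next unless $line =~ /^The best scores are:/;
        return $line =~ /\.\.\s*$/ ? 1 : 0;
    }

    return 0;    # no header line found at all
}

# Optional command-line use: pass a report file name as the first argument.
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $text = do { local $/; <$fh> };    # slurp the whole report
    close $fh;

    print looks_like_gcg_fasta($text)
        ? "$file: looks like GCG-style FASTA output\n"
        : "$file: no GCG-style header found\n";
}
```

This only flags the report; it does not make GCG output parseable.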

-jason

On Mon, 25 Aug 2003, Martin A. Hansen wrote:

> hi
>
> im trying to parse fasta search reports with Bio::SearchIO. however, i get this
> warning message:
>
> maasha at homer:~/bin$ parse_fasta btg1.fasta
>
> -------------------- WARNING ---------------------
> MSG: unrecognized FASTA Family report file!
> ---------------------------------------------------
>
> this indicates that there might be something wrong with the fasta report file,
> but im not sure what that could be. am i supposed to run a certain version of
> fasta? and with a certain set of options? e.g. i have noticed that running
> fasta from the wisconsin packages (GCG) outputs a double dot (..) between the
> introtext and the data:
>
> The best scores are:                    init1 initn   opt    z-sc E(7402)..
>
> whereas running "normal" fasta does not produce the double dot?
>
> and to really twist the fork i am failing in identifying the different fasta
> versions :/
>
> anyways, here is the snippet of code im using to parse:
>
>
> #!/usr/bin/perl -w
>
> use strict;
> use Bio::SearchIO;
>
> my ( $script, $usage, $file );
>
> $script = ( split "/", $0 )[ -1 ];
>
> $usage = qq(
>
> $script by Martin A. Hansen, August 2003.
>
> $script parses a FASTA report file
>
> Usage: $script [file]
>                [file]       - file with fasta report
>
> );
>
> print $usage and exit if not @ARGV;
>
> $file = shift @ARGV;
>
>
> # >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> MAIN <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>
>
> my ( $lines );
>
> $lines = &parse_fasta( $file );
>
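> # each element is a hash ref, so print its fields rather than the ref itself
> foreach my $line ( @{ $lines } )
> {
>     print "$line->{QUERY_NAME}\t$line->{QUERY_STRING}\n";
>     print "$line->{SUBJECT_NAME}\t$line->{SUBJECT_STRING}\n";
> }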
>
> exit;
>
>
> # >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SUBROUTINES <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>
>
> sub parse_fasta
> {
>     # Martin A. Hansen, August 2003.
>
>     # parses fasta reports using Bioperl
>
>     my ( $file,   # file with fasta report
>        ) = @_;
>
>     # returns list of hashes with query/subject names and alignment strings
>
>     my ( $result, $hit, $hit_name, $searchio, $white_space, $query_beg, $hsp, $hit_string, @lines, $query_string, $query_name );
>
>     $searchio = Bio::SearchIO->new( -format => 'fasta', -file => $file );
>     $result   = $searchio->next_result;
>
>     while ( $hit = $result->next_hit )
>     {
>         $query_name   = $result->query_name;
>         $hit_name     = $hit->name;
>         $hsp          = $hit->next_hsp;
>
>         $query_string = $hsp->query_string;
>         $query_beg    = $hsp->query->start;
>         $hit_string   = $hsp->hit_string;
>
>         $white_space  = ' ' x ( $query_beg - 1 );
>
>         push @lines, {
>                        "QUERY_NAME"     => $query_name,
>                        "QUERY_STRING"   => $white_space . $query_string,
>                        "SUBJECT_NAME"   => $hit_name,
>                        "SUBJECT_STRING" => $white_space . $hit_string,
>         }
>     }
>
>     return wantarray ? @lines : \@lines;
> }
>
>
>
>
> # >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>
>
> __END__
>
>
>
> any suggestions?
>
>
> martin
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

