Bioperl: Another problem, this one uglier.

Michael B. Thornton lost@sea.incyte.com
Thu, 18 Nov 1999 15:32:21 -0800


Hello All,

We ran into this same problem.  I think it comes from using the ">"
character as a record delimiter.  Reports with no hits do not contain a ">"
character, so they all get strung together.
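
A minimal sketch of that effect, using toy text rather than real BLAST
output.  The report layout, the query names q1..q4 and the split on '>' are
assumptions for illustration only; this is not how Blast.pm itself reads its
input.

```perl
use strict;
use warnings;

# Toy stand-in for four concatenated reports (NOT real BLAST output):
# q1 and q4 have a hit, q2 and q3 have none.
my $stream = join '',
    "BLASTP 2.0.10\nQuery= q1\nSequences producing significant alignments:\n",
    ">hit_A\n Score = 100\n",
    "BLASTP 2.0.10\nQuery= q2\n ***** No hits found ******\n",
    "BLASTP 2.0.10\nQuery= q3\n ***** No hits found ******\n",
    "BLASTP 2.0.10\nQuery= q4\nSequences producing significant alignments:\n",
    ">hit_B\n Score = 90\n";

# Cut the stream into records wherever a line starts with '>'.
my @chunks = split /^>/m, $stream;

# The record that begins with hit_A also carries the headers of q2, q3
# and q4, because no '>' separates them.
my @queries_in_chunk = $chunks[1] =~ /^Query= (\S+)/mg;
print "Queries in second chunk: @queries_in_chunk\n";
```

If a parser takes the first "Query=" line it finds in such a record, it would
report q2 for hits that really belong to q4, which looks like the symptom
Carl describes.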

We modified sub _get_parse_blast_func in Blast.pm as follows:

--------------------------------------------------------------------------------
       if($data =~ m/Database:\s+(.+?)$Newline/so ) {
           $current_db = $1;
       } else {
           # In some reports, the Database is only listed at end.
           #$Blast->warn("Can't determine database name from BLAST report.");
       }
     }

     # Incyte_Fix:   Nasty Invisible Bug.
     # Records in blast report are delimited by '>', but... when
     #  there are no hits for a query, there won't be a '>'.  That
     #  causes several blast reports to run together in the data
     #  passed to this routine.  Need to get rid of non-hits in data
     if ($data =~ /.+(No hits? found.+Sequences.+)/so) {
         $data = $1;
     }
     # End Incyte_Fix


     # Determine if we need to create a new Blast object
     # or use the $self object for this method.

     if($Blast->{'_multi_stream'} or $self->name eq 'Static Blast object') {
--------------------------------------------------------------------------------

I don't know if this is the best or right way to fix this, but it seems to
make the problem go away.
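
To see what the added regex does in isolation, here is a small sketch on toy
text.  The contents of $data and the query names are made up for
illustration; only the regex itself is taken from the patch.

```perl
use strict;
use warnings;

# Toy stand-in for the run-together data (NOT real BLAST output):
# two no-hit reports followed by the header of a report with hits.
my $data = join '',
    "BLASTP 2.0.10\nQuery= q2\n ***** No hits found ******\n",
    "BLASTP 2.0.10\nQuery= q3\n ***** No hits found ******\n",
    "BLASTP 2.0.10\nQuery= q4\nSequences producing significant alignments:\n";

# Same pattern as in the patch.  The leading .+ is greedy and /s lets
# '.' match newlines, so $1 starts at the LAST "No hits found" that is
# still followed by "Sequences".
if ($data =~ /.+(No hits? found.+Sequences.+)/s) {
    $data = $1;
}

# Only the query of the report that has hits is left.
my @queries = $data =~ /^Query= (\S+)/mg;
print "Queries left: @queries\n";
```

Everything before that last "No hits found" is thrown away, and data with no
such text is left untouched.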

hope this helps

cheers,

ok
mbt



----- Original Message -----
From: carl virtanen <carl@cimmed.com>
To: <vsns-bcd-perl@lists.uni-bielefeld.de>
Sent: Wednesday, November 17, 1999 9:23 PM
Subject: Bioperl: Another problem, this one uglier.


> Hi.
> Still haven't figured out what was happening in the last problem i had, but
> i managed a sort of unelegant work around.
> But here's something i don't understand. I have a bunch of blast reports
> (2.0.10) in a single file. Some are hits, some are misses. Now, i use the
> example program thusly:
>
> cat myblast.reports | perl parse_blast.pl -table 1
>
> everything works, and it has extracted a nice looking table of all the hits
> in my blast file. Only one problem.... The query names DO NOT match with
> the correct subjects (ie-sequence identifiers-column 3).  Actually, this
> problem is related to my earlier problem  i think (actually, i know it is).
> Say for example you have a blast file with reports like this:
> Blast 1 report->has hits
> Blast 2 report->no hits
> Blast 3 report->no hits
> Blast 4 report-> has hits
>
> Now, the BioBlast picks up the first hits. Then it skips the next 2 reports
> (no hits) and picks up the Blast 4 report. Ok. Only problem, is that the
> query name reported in the table is picked up off of Blast 2 and NOT Blast
> 4, which is the correct query name for that set of hits.  This is a major
> problem. At least for me!  If i'm doing something wrong here in my parsing,
> let me know. Better yet, let everybody know EXACTLY what must be done. The
> worst thing is to get results, that look ok but which are wrong.
>
> Carl
> =========== Bioperl Project Mailing List Message Footer =======
> Project URL: http://bio.perl.org/
> For info about how to (un)subscribe, where messages are archived, etc:
> http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
> ====================================================================
