[Bioperl-l] BLAST parsing Question

Brian Osborne brian_osborne@cognia.com
Thu, 9 May 2002 16:31:20 -0400


Simon,

>> There is no way to find out, for each
>> subject in the database, which query in the
>> query file hits it best (based on P value)?

Yes, you can write a script to do this using the information returned by
BPlite, but I'm not aware of a module or script in the current distribution
that performs exactly this operation. Bioperl has done the hard work for you
(running BLAST, reading and parsing the reports), but I think you'll have to
collect the data returned from each blastall run, in arrays for example, and
write a script to do exactly what you're suggesting.
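
A minimal sketch of such a script, in plain Perl so that it runs without
Bioperl installed. It assumes the hits have already been pulled out of the
BLAST reports (via BPlite, for example) into [query, subject, P value]
triples. The @hits data and the best_query_per_subject name are made up for
illustration, following the seq A / seq B / seq 1 example from the original
question:

```perl
use strict;
use warnings;

# Hypothetical input: one [query, subject, P value] triple per hit,
# as might be collected while looping over parsed BLAST reports.
my @hits = (
    [ 'seqA', 'seq1', 10e-100 ],
    [ 'seqB', 'seq1', 2e-15 ],
);

# For each subject, keep only the query with the lowest P value.
sub best_query_per_subject {
    my @hits = @_;
    my %best;    # subject => [query, P value]
    for my $hit (@hits) {
        my ($query, $subject, $p) = @$hit;
        if (!exists $best{$subject} || $p < $best{$subject}[1]) {
            $best{$subject} = [ $query, $p ];
        }
    }
    return %best;
}

my %best = best_query_per_subject(@hits);
for my $subject (sort keys %best) {
    my ($query, $p) = @{ $best{$subject} };
    print "$subject is best hit by $query (P = $p)\n";
}
```

Keying the hash on the subject name means a second query hitting the same
database sequence simply overwrites the stored entry when its P value is
lower, so there is no need for nested arrays of candidates.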

Brian O.


-----Original Message-----
From: bioperl-l-admin@bioperl.org [mailto:bioperl-l-admin@bioperl.org]On
Behalf Of Simon Chan
Sent: Thursday, May 09, 2002 1:56 PM
To: brian_osborne@cognia.com
Cc: bioperl-l@bioperl.org
Subject: Re: RE: [Bioperl-l] BLAST parsing Question

Brian,

Thanks for the help.

I want to make sure I have this right.
There is no way to find out, for each
subject in the database, which query in the
query file hits it best (based on P value)?
Is that what you're saying?

The purpose of this program
is to match up as many of the seqs in
the query file to one of the seqs
in the database as possible.  Perhaps there is
another method of doing this?
(that is, besides forcing people to
follow more consistent naming
conventions.... :-)
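
One possible approach to the matching goal described above is a greedy
one-to-one assignment: sort all hits by P value and accept a pair only when
neither sequence has been used yet. This is a sketch under assumptions, not
something taken from Bioperl: greedy_pairs is a hypothetical name, the hits
are invented [query, subject, P value] triples, and a greedy pass is not
guaranteed to produce the largest possible number of pairs.

```perl
use strict;
use warnings;

# Greedy one-to-one pairing: walk the hits from best (lowest) P value to
# worst, and accept a pair only if neither sequence has been used yet.
sub greedy_pairs {
    my @hits = sort { $a->[2] <=> $b->[2] } @_;   # [query, subject, P value]
    my (%query_used, %subject_used, @pairs);
    for my $hit (@hits) {
        my ($query, $subject, $p) = @$hit;
        next if $query_used{$query} || $subject_used{$subject};
        $query_used{$query}     = 1;
        $subject_used{$subject} = 1;
        push @pairs, [ $query, $subject, $p ];
    }
    return @pairs;
}

# Hypothetical hits: seqA and seqB both hit seq1, seqB also hits seq2.
my @pairs = greedy_pairs(
    [ 'seqA', 'seq1', 1e-99 ],
    [ 'seqB', 'seq1', 2e-15 ],
    [ 'seqB', 'seq2', 1e-10 ],
);
print "$_->[0] <-> $_->[1] (P = $_->[2])\n" for @pairs;
```

With this input seqA takes seq1, and seqB falls back to seq2 instead of
being left unmatched.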

>>> "Brian Osborne" <brian_osborne@cognia.com> 05/09/02 09:10 AM >>>
Simon,

>> In other words, I don't want any sequence in the database to be hit more
than once.

I'm guessing you're going to have to code this yourself; I don't think this
is built into any of the Blast modules in Bioperl. Perhaps there's some
example code in there somewhere. Here's a possible starting point, though I
haven't thought about it in the context of BPlite:

How do I compute the difference of two arrays? How do I compute the
intersection of two arrays?
Use a hash. Here's code to do both and more. It assumes that each element is
unique in a given array:
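    # Count how many of the two arrays each element appears in (1 or 2).
    @union = @intersection = @difference = ();
    %count = ();
    foreach $element (@array1, @array2) { $count{$element}++ }
    foreach $element (keys %count) {
        push @union, $element;
        # In both arrays: intersection. In only one: (symmetric) difference.
        push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
    }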
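
As a sketch of how the same hash idea might apply to the BLAST question:
treat the query names from the fasta file as one array and the queries that
ended up with a match as the other; the queries left over are the ones that
matched nothing. Note this is a one-directional difference, not the
symmetric difference computed by the FAQ code. The array contents are
invented, following the seq A / seq B / seq C example from the original
question, and unmatched() is a hypothetical helper name.

```perl
use strict;
use warnings;

# Return the elements of @$all that do not appear in @$matched.
sub unmatched {
    my ($all, $matched) = @_;
    my %seen = map { $_ => 1 } @$matched;
    return grep { !$seen{$_} } @$all;
}

my @queries = ('seqA', 'seqB', 'seqC');   # every sequence in the fasta file
my @matched = ('seqA');                   # queries that won a database sequence

my @no_match = unmatched(\@queries, \@matched);
print "No match: @no_match\n";            # prints "No match: seqB seqC"
```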


Brian O.


-----Original Message-----
From: bioperl-l-admin@bioperl.org [mailto:bioperl-l-admin@bioperl.org]On
Behalf Of Simon K. Chan
Sent: Thursday, May 09, 2002 11:14 AM
To: Wiepert, Mathieu; 'Leonardo Marino-Ramirez'
Cc: Bioperl Help
Subject: RE: [Bioperl-l] BLAST parsing Question

Mathieu, Leonardo, and Brian,

thanks for responding.  Leonardo, I have been using
BPlite.  I know that it is a blast parser.  However, I
think you misunderstood my question.

The sequences in the fasta file will no doubt hit more
than one sequence in the database made by formatdb.  I
would like to get what each sequence in the file hits
(if at all).  So, if 2 different sequences in the file
hit the same sequence in the database, I only want the
one with the better P value.  In other words, I don't
want any sequence in the database to be hit more than
once.


But maybe I missed something, so I'll carefully look
at the sites you guys mentioned.

Mathieu,  thanks for the sample script!


Thanks for your help, guys!
--- "Wiepert, Mathieu" <Wiepert.Mathieu@mayo.edu> wrote:
> Hi,
>
> I believe the class to use is now Bio::SearchIO?
> I'll send you a long-winded
> program that has an example, or check out
>
> http://docs.bioperl.org/releases/bioperl-1.0/Bio/SearchIO.html
>
>
>
>
> my $searchio = new Bio::SearchIO(-format => 'blast',
>                                  -file   => 'blast.out');
>
> -Mat
>
> -----Original Message-----
> From: Leonardo Marino-Ramirez
> [mailto:marino@tofu.tamu.edu]
> Sent: Thursday, May 09, 2002 6:34 AM
> To: Simon K. Chan
> Cc: Bioperl Help
> Subject: Re: [Bioperl-l] BLAST parsing Question
>
>
> Dear Simon,
>
> I encourage you to read the bioperl tutorial
>
> http://bio.perl.org/Core/bptutorial.html
>
> The module to use is BPlite. For usage see:
>
>
> http://docs.bioperl.org/releases/bioperl-1.0/Bio/Tools/BPlite.html
>
> Leonardo
>
> On Wed, 8 May 2002, Simon K. Chan wrote:
>
> > My fellow BioPerl-ers,
> >
> > It took me a couple of hours to realize that what I
> > need to do is EXTREMELY common and that it's already
> > been done.  D'oh!! :-0
> >
> > So, I made a fasta database by using the formatdb
> > command.  I have a fasta file with a bunch of
> > sequences.
> >
> > Example:
> >
> > fasta file: seq A, seq B, seq C
> > db made with formatdb: seq 1, seq 2, seq 3
> >
> > I blast the sequences in the file against the
> > database.
> > So, let's say that seq A only hits seq 1 with a P
> > value of 10e-100 and seq B only hits seq 1 with a P
> > value of 2e-15.  Seq C hits nothing.  How would I pull
> > out that Seq A matches Seq1 and that Seq C AND Seq B
> > match nothing?
> >
> > I tried to do this with hashes where the keys were the
> > matches from the db and the values were arrays with
> > the P value and query name. But it got real messy.
> > Yuck!  Does anyone know of a module or some other
> > quick way of doing this?
> >
> > Thanks, all.
> >
> > =====
> >
> > #################
> >
> > Warmest Regards,
> >
> > Simon K. Chan - bioinformatics_rocks@yahoo.com
> >
> > "Great spirits have always encountered violent opposition from mediocre
> > minds."
> >
> > - Albert Einstein
> >
> >
> > __________________________________________________
> > Do You Yahoo!?
> > Yahoo! Shopping - Mother's Day is May 12th!
> > http://shopping.yahoo.com
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> >
>
> --
>
> ___ _/ _______________________________________________________________
>    _/
>   _/    _/      _/_/_/    Leonardo Marino-Ramirez lmarino@tamu.edu
>  _/    _/_/  _/_/    _/  Biochemistry Department, Texas A&M University
> _/_/_/_/  _/  _/_/_/    2128 TAMU, College Station, TX 77843-2128, USA
>      _/      _/     _/ Voice: (979) 862-4055   Fax: (979) 845-9274
> ___ _/      _/     _/ ________________________________________________
>

