[Bioperl-l] Sequence retrieval from BLAST indexes

Jason Stajich jason@cgt.mc.duke.edu
Wed, 20 Mar 2002 10:17:23 -0500 (EST)


On Wed, 20 Mar 2002, Martin Schenker wrote:

> Hi Brian & Ewan,
>
> now I'm a bit confused.
>
> Does Bio::Index::Fasta create the same indexes as the BLAST utility
> "formatdb"?
> From the module description it doesn't look that way...

Nope, Bio::Index::Fasta uses BerkeleyDB.  BLAST has its own home-grown
index format.  We have not supported it in the past because the index
format changes between versions of BLAST.

You'd have to reindex your files with Bio::Index::Fasta.  That index would
not be compatible with the BLAST index files, so you'd use about 2x as much
disk space.

Additionally, you can store multiple IDs per sequence in the Index files.
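
To make the "multiple IDs per sequence" idea concrete, here is a minimal
sketch in plain Perl. It is not the Bio::Index::Fasta implementation: an
in-memory hash stands in for the BerkeleyDB file, the file name demo.fa and
the header layout (">ID ACCESSION description") are assumptions, and the
sequence data is made up (only the two IDs are borrowed from the synopsis
quoted further down). In Bio::Index::Fasta itself the hook for choosing
which IDs get stored should be the id_parser method; check the module's POD
for the details.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustration only: a hash replaces the on-disk BerkeleyDB index, and the
# records below are invented for the demo.

# Write a small FASTA file to index.
my $fasta = 'demo.fa';
open my $out, '>', $fasta or die "Cannot write $fasta: $!";
print $out ">MUSIGHBA1 J00522 example record one\nACGTACGT\nACGT\n";
print $out ">SEQ2 X00002 example record two\nTTTTGGGG\n";
close $out;

# Build the index: every ID found in a header line maps to the byte
# offset at which that record's '>' line starts.
my %offset_of;
open my $in, '<', $fasta or die "Cannot read $fasta: $!";
while (1) {
    my $pos  = tell $in;          # offset before reading the line
    my $line = <$in>;
    last unless defined $line;
    next unless $line =~ /^>(\S+)\s+(\S+)/;
    $offset_of{$_} = $pos for ($1, $2);   # two keys, one record
}

# Fetch a sequence by any of its IDs: seek to the stored offset, skip the
# header, and concatenate lines until the next record starts.
sub fetch {
    my ($id) = @_;
    return unless exists $offset_of{$id};
    seek $in, $offset_of{$id}, 0;
    my $header = <$in>;
    my $seq    = '';
    while (defined(my $line = <$in>)) {
        last if $line =~ /^>/;
        chomp $line;
        $seq .= $line;
    }
    return $seq;
}

# Both keys lead to the same record.
print fetch('MUSIGHBA1'), "\n";
print fetch('J00522'),    "\n";
```

Both lookups print the same sequence, because the ID and the accession
point at the same byte offset. The index stores only offsets, so the
original FASTA file has to stay on disk next to it.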

Chris Mungall sent me some code a long time ago that reads BLAST 2
format indexes in Perl.

Finally, the OBDA project has defined a new open-bio standard for
BerkeleyDB and index files that all the Open-Bio projects will support.
See Bio::DB::Flat for this functionality.

No idea about Bio::DB::BlastDB.  It is not part of our distribution, so you
would have to talk to Bradford.


>
> I really want to use the BLAST db indexes (like *.phr, *.pin, *.psq etc) to
> get some seqs back.
> On the command line, the utility "fastacmd" does that and displays the
> result to STDOUT.
> The only reference to "fastacmd" and BioPerl was in the module from Bradford
> Powell (1999).
>
> (snip)
> 		=head1 NAME
>
> 		Bio::DB::BlastDB - Database object interface to local blast
> 		databases via fastacmd
>
> 		=head1 SYNOPSIS
>
> 		    $db = new Bio::DB::BlastDB;
>
> 		    $seq = $db->get_Seq_by_id('MUSIGHBA1'); # Unique ID
>
> 		    # or ...
>
> 		    $seq = $db->get_Seq_by_acc('J00522'); # Accession Number
>
> 		=head1 DESCRIPTION
>
> 		Permits retrieval of sequence data from a file which has been
> 		prepared for blast processing by the ncbi toolkit program
> 		'formatdb'. The databases must be created with the option
> 		'-o T' (see the ncbi toolkit docs) This module requires the
> 		presence of another ncbi program, 'fastacmd', which performs
> 		the actual sequence access.
>
> 		=head1 FEEDBACK
> 		...
>
> (\snip)
>
> So my initial question was whether this module was developed further in
> BioPerl 1.0.
>
> Any ideas?
>
> Best, Martin
>
>
> > On Tue, 19 Mar 2002, Brian Osborne wrote:
> >
> > > Martin,
> > >
> > > There are a couple of ways to do this in Bioperl v. 1.0. The simplest
> > > way is to use Bio::Index::Fasta, but one of the problems with this
> > > approach is that you might not be able to use the id that's most
> > > easily available to you. An alternative, Bio::DB::Fasta, has more
> > > features and gets around this problem. Take a look at section III.1.2
> > > of bptutorial.pl as a starting point.
> >

-- 
Jason Stajich
Duke University
jason@cgt.mc.duke.edu