[Bioperl-l] Bio::Index::Fasta vs Bio::DB::Fasta

Lincoln Stein lstein@cshl.org
Mon, 14 Jan 2002 18:47:42 -0400


Hi Tony,

I wasn't proposing that Index::Fasta go away, but that DB::Fasta (my
own hack) exit.  So relax!

Lincoln

Tony Cox writes:
 > On Sat, 12 Jan 2002, Ewan Birney wrote:
 > 
 > Just a note that I have a _lot_ of code and time invested internally in the
 > Bio::Index::Fasta modules. It forms a fairly major plank of our internal
 > sequence fetching architecture here in Sanger (along with the more complex
 > functionality of SRS). Most of the time it is used for "normal" sequence
 > fetching (EMBL clones etc) and not for chr-sized DNA chunks where the DB::Fasta
 > really wins. It also complements the Fastq modules that can be used to get
 > matching quality data if it exists. 
 > 
 > In short, does Index::Fasta _have_ to go?
 > 
 > Tony
 > 
 > 
 > +>On Fri, 11 Jan 2002, Lincoln Stein wrote:
 > +>
 > +>> Hi Folks,
 > +>> 
 > +>> I've just recently become aware that Bio::Index::Fasta has very heavy 
 > +>> overlapping functionality with Bio::DB::Fasta, and this is likely to lead to 
 > +>> some user confusion down the road.
 > +>> 
 > +>> I would remove Bio::DB::Fasta in favor of the Bio::Index version, except that 
 > +>> I don't think that Bio::Index::Fasta does the thing that first motivated 
 > +>> Bio::DB::Fasta, which was the ability to retrieve subsequences efficiently.  
 > +>> I have big (tens of megabyte) fasta files that contain 
 > +>> whole C. elegans chromosomes, and want to fetch a few base pairs from the 
 > +>> middle of them without reading the whole record into memory.  Can 
 > +>> Bio::Index::Fasta do this?
 > +>
 > +>
 > +>I am pretty sure it can't do this (which is why I believe you checked in
 > +>DB::Fasta in the first place). Does DB::Fasta make assumptions about line
 > +>length so it can SEEK to the right place?
 > +>
 > +>
 > +>Clearly merging the two pieces would be great. It is not something I am
 > +>overly worried about but it would be nice. 
 > +>
 > +>
 > +>Two routes:
 > +>
 > +>(I am assuming that we are still calling it Bio::Index::Fasta...)
 > +>
 > +>  (a)
 > +>
 > +>     Bio::Index::Fasta gives back a Bio::SeqI compliant object which is
 > +>actually a new thing called Bio::Seq::LargeFastaFixedLineLength (silly
 > +>name...). This object does not load the sequence into memory but executes
 > +>
 > +>     $seq->subseq(100000,1000020);
 > +>
 > +>     with a SEEK.
 > +>
 > +>
 > +>  (b) Bio::Index::Fasta will accept gets on slices
 > +>
 > +>
 > +>Reading the documentation of Bio::DB::Fasta I notice that you have put
 > +>nearly every access in (!) ---- I am always *so* impressed by your modules
 > +>Lincoln, they nearly always have every route into them first off.
 > +>
 > +>
 > +>
 > +>So --- you have carte blanche to rearrange this area. As long as you are
 > +>convinced that you won't be affecting existing FASTA indexes you can do
 > +>what you like with Bio::Index::Fasta before 1.0 ---- it should work
 > +>however with existing indexes - (ie, don't change the hash key
 > +>representations etc).
 > +>
 > +>
 > +>If you want to do a more serious reorganisation then it has got to be post
 > +>1.0.
 > +>
 > +>
 > +>
 > +>Your choice of options and code.
 > +>
 > +>
 > +>> 
 > +>> Lincoln
 > +>> 
 > +>> 
 > +>
 > +>_______________________________________________
 > +>Bioperl-l mailing list
 > +>Bioperl-l@bioperl.org
 > +>http://bioperl.org/mailman/listinfo/bioperl-l
 > +>
 > 
 > ******************************************************
 > Tony Cox			Email:avc@sanger.ac.uk
 > Sanger Institute		WWW:www.sanger.ac.uk
 > Wellcome Trust Genome Campus	Webmaster
 > Hinxton				Tel: +44 1223 834244
 > Cambs. CB10 1SA			Fax: +44 1223 494919
 > ******************************************************
 > 

-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org			                  Cold Spring Harbor, NY
Positions available at my lab: see http://stein.cshl.org/#hire
========================================================================
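
The thread asks whether a fixed line length lets a reader SEEK straight to
a subsequence instead of loading a whole chromosome-sized record. A minimal
sketch of that arithmetic follows, assuming every sequence line except the
last has the same number of bases and the same line terminator. It is
written in Python for illustration only and is not the Bio::DB::Fasta or
Bio::Index::Fasta implementation; the names byte_offset, fetch_subseq and
write_demo_fasta are hypothetical.

```python
import os
import tempfile


def byte_offset(pos, seq_start, bases_per_line, newline_len=1):
    """Return the file offset of 1-based residue `pos`.

    seq_start is the offset of the first residue (just past the header
    line). Assumes a fixed number of bases on every line but the last.
    """
    full_lines, within = divmod(pos - 1, bases_per_line)
    return seq_start + full_lines * (bases_per_line + newline_len) + within


def fetch_subseq(path, seq_start, bases_per_line, start, end, newline_len=1):
    """Read residues start..end (1-based, inclusive) with a single seek."""
    first = byte_offset(start, seq_start, bases_per_line, newline_len)
    last = byte_offset(end, seq_start, bases_per_line, newline_len)
    with open(path, "rb") as fh:
        fh.seek(first)
        chunk = fh.read(last - first + 1)
    # The chunk spans line breaks, so drop the terminators it picked up.
    return chunk.replace(b"\r", b"").replace(b"\n", b"").decode("ascii")


def write_demo_fasta(bases_per_line=12):
    """Write a small one-record FASTA file (hypothetical demo data)."""
    sequence = "".join("ACGT"[(i * 7 + i // 5) % 4] for i in range(53))
    header = b">demo_record hypothetical\n"
    lines = [sequence[i:i + bases_per_line]
             for i in range(0, len(sequence), bases_per_line)]
    fd, path = tempfile.mkstemp(suffix=".fa")
    with os.fdopen(fd, "wb") as fh:
        fh.write(header)
        fh.write(("\n".join(lines) + "\n").encode("ascii"))
    return path, len(header), sequence


if __name__ == "__main__":
    path, seq_start, sequence = write_demo_fasta()
    try:
        # Both lines should print the same 21 residues.
        print(fetch_subseq(path, seq_start, 12, 10, 30))
        print(sequence[9:30])
    finally:
        os.remove(path)
```

An index built along these lines would need to store, per record, the
offset of the first residue, the bases per line and the terminator length.
Whether DB::Fasta records exactly these values is not stated in the thread.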