[Bioperl-l] Bio::Index::Fasta vs Bio::DB::Fasta

Ewan Birney birney@ebi.ac.uk
Sat, 12 Jan 2002 10:18:49 +0000 (GMT)


On Fri, 11 Jan 2002, Lincoln Stein wrote:

> Hi Folks,
> 
> I've just recently become aware that Bio::Index::Fasta has very heavy 
> overlapping functionality with Bio::DB::Fasta, and this is likely to lead to 
> some user confusion down the road.
> 
> I would remove Bio::DB::Fasta in favor of the Bio::Index version, except that 
> I don't think that Bio::Index::Fasta does the thing that first motivated 
> Bio::DB::Fasta, which was the ability to retrieve subsequences efficiently.  
> I have big (tens of megabyte) fasta files that contain 
> whole C. elegans chromosomes, and want to fetch a few base pairs from the 
> middle of them without reading the whole record into memory.  Can 
> Bio::Index::Fasta do this?


I am pretty sure it can't do this (which is why I believe you checked in
DB::Fasta in the first place). Does DB::Fasta make assumptions about line
length so it can SEEK to the right place?
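
To make the question concrete, here is a minimal sketch of the kind of
offset arithmetic I have in mind. It is an illustration only and not a
description of how Bio::DB::Fasta actually works: subseq_by_seek is a
made-up name, and the code assumes that every sequence line except the
last holds the same number of bases, that the line terminator has a
known byte length, and that the byte offset of the first base is already
known (this is what an index would store).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: return bases $start..$end
# (1-based, inclusive) of one FASTA record without reading the whole record.
# Assumptions: $data_start is the byte offset of the first base, every
# sequence line except the last holds exactly $line_len bases, and each
# line ends with $term_len terminator bytes.
sub subseq_by_seek {
    my ($fh, $data_start, $line_len, $term_len, $start, $end) = @_;

    # Map a 1-based base position to a byte offset in the file: skip the
    # preceding bases plus one terminator for every full line before it.
    my $offset_of = sub {
        my $pos0 = $_[0] - 1;
        return $data_start + $pos0 + int($pos0 / $line_len) * $term_len;
    };

    my $from = $offset_of->($start);
    my $to   = $offset_of->($end);

    seek($fh, $from, 0) or die "seek failed: $!";
    my $n = read($fh, my $buf, $to - $from + 1);
    die "read failed: $!" unless defined $n;

    $buf =~ s/[\r\n]//g;    # drop line terminators that fall inside the span
    return $buf;
}
```

If the line length varies anywhere in the record, the arithmetic above
gives the wrong offset, which is why the assumption matters.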


Clearly merging the two pieces would be great. It is not something I am
overly worried about but it would be nice. 


Two routes:

(I am assuming that we are still calling it Bio::Index::Fasta...)

  (a)

     Bio::Index::Fasta gives back a Bio::SeqI compliant object which is
actually a new thing called Bio::Seq::LargeFastaFixedLineLength (silly
name...). This object does not load the sequence into memory but executes

     $seq->subseq(100000,1000020);

     with a SEEK.


  (b) Bio::Index::Fasta will accept gets on slices (ie, requests for a
      subsequence range rather than the whole record)


Reading the documentation of Bio::DB::Fasta I notice that you have put
nearly every access route in (!) ---- I am always *so* impressed by your
modules, Lincoln; they nearly always have every route into them first off.



So --- you have carte blanche to rearrange this area. As long as you are
convinced that you won't be affecting existing FASTA indexes you can do
what you like with Bio::Index::Fasta before 1.0 ---- it should work
however with existing indexes - (ie, don't change the hash key
representations etc).


If you want to do a more serious reorganisation then it has got to be post
1.0.



Your choice of options and code.


> 
> Lincoln
> 
>