[Bioperl-l] added Bio::SeqIO::largefasta

Ewan Birney birney@ebi.ac.uk
Tue, 5 Dec 2000 10:22:37 +0000 (GMT)


On Mon, 4 Dec 2000, Jason Stajich wrote:

> I have added support for reading in a large fasta file and making it a
> Bio::Seq::LargePrimarySeq.  Some more testing and debugging will
> need to be done to ensure all the weird fasta cases are handled,
> because I cannot use the same patterns that are possible in the fasta.pm
> module: to meet our requirement of not holding the sequence in memory,
> I can only read in one line at a time.

Right.
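For anyone curious how the one-line-at-a-time approach works, here is a minimal pure-Perl sketch. It is not the Bio::SeqIO::largefasta code, and the helper name stream_fasta_to_tempfile is made up for illustration. It assumes a file holding a single FASTA record and streams the residues into a temp file under File::Spec->tmpdir, so memory use stays flat however long the sequence is.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempfile);

# Hypothetical helper (not the Bio::SeqIO::largefasta implementation).
# Reads a FASTA file one line at a time and appends residues to a temp file,
# so the sequence is never held in memory. Only the first record is read.
# Returns (id, temp file name, sequence length).
sub stream_fasta_to_tempfile {
    my ($fasta_path, $tmp_dir) = @_;
    $tmp_dir = File::Spec->tmpdir unless defined $tmp_dir;  # default temp location

    open my $in, '<', $fasta_path or die "Cannot open $fasta_path: $!";
    my ($out, $tmp_name) = tempfile('largeseqXXXXXX', DIR => $tmp_dir, UNLINK => 1);

    my ($id, $length) = (undef, 0);
    while (my $line = <$in>) {
        $line =~ s/\r?\n\z//;            # strip Unix or DOS line endings
        next if $line =~ /^\s*$/;        # skip blank lines
        if ($line =~ /^>(\S*)/) {
            last if defined $id;         # stop at the second record in this sketch
            $id = $1;
            next;
        }
        $line =~ s/\s+//g;               # drop whitespace embedded in sequence lines
        print {$out} $line;
        $length += length $line;
    }
    close $in;
    close $out or die "Cannot close $tmp_name: $!";
    return ($id, $tmp_name, $length);
}
```

Calling `my ($id, $file, $len) = stream_fasta_to_tempfile('big.fa');` (file name hypothetical) returns the id, the temp file path and the residue count. A real reader would also keep the description line and loop over multiple records.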

> 
> Please note that currently next_seq will return a PrimarySeq
> until I decide whether we can have (or need) a LargeSeq class or just a
> wrapper as well. Also, the Bio::Seq::LargePrimarySeq implementation means
> that it will make a copy of the fasta file in your tmpdir (as defined by
> File::Spec->tmpdir), which, if the file is very large, could make your
> machine very unhappy as it could run out of swap space. You can override
> the location of the tmp file by setting
> $Bio::Seq::LargePrimarySeq::DEFAULT_TEMP_DIR = 'somedir'
> BEFORE you instantiate a new LargePrimarySeq object.

I am with Hilmar that this should return a Seq object which has-a
Bio::Seq::LargePrimarySeq.

> 
> The test, largefasta.t, has been added as well, and some additional
> routines were added to LargePrimarySeq to bring it up to the PrimarySeqI spec.
> 
> One likely use, at least from my perspective, is the ability to read in
> a large sequence file and chop it into smaller, manageable chunks for some
> specific tasks.
> 

Also for adding features on a massive coordinate scale (perhaps produced
by some database group somewhere...) and then dumping out the sequence
associated with them efficiently.
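Chopping a sequence into chunks mostly comes down to working out the coordinate windows and then asking for each one via subseq. Here is a minimal sketch of the window arithmetic, assuming the 1-based, inclusive (start, end) coordinates that subseq uses; chunk_ranges is a hypothetical helper, not part of Bioperl.

```perl
use strict;
use warnings;

# Hypothetical helper: split a sequence of $length residues into consecutive
# windows of at most $size residues. Coordinates are 1-based and inclusive,
# matching the (start, end) arguments of subseq().
sub chunk_ranges {
    my ($length, $size) = @_;
    die "chunk size must be positive\n" unless $size > 0;
    my @ranges;
    for (my $start = 1; $start <= $length; $start += $size) {
        my $end = $start + $size - 1;
        $end = $length if $end > $length;   # last window may be shorter
        push @ranges, [$start, $end];
    }
    return @ranges;
}
```

With a LargePrimarySeq in hand, a loop such as `for my $r (chunk_ranges($seq->length, 10_000)) { my $chunk = $seq->subseq(@$r); ... }` would then pull each chunk from disk in turn. The 10_000 window size is arbitrary.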

BTW - so that people know, LargePrimarySeq relies on people using the

   $seq->subseq(1000,1100);

method to get out regions, not

   substr($seq->seq,1000,100);
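The reason this matters: subseq($start, $end) takes 1-based, inclusive coordinates (so subseq(1000,1100) returns 101 residues), and a disk-backed implementation can seek straight to that region, whereas $seq->seq has to build the entire sequence as one Perl string before substr ever sees it. Here is a minimal sketch of the seek-and-read idea, assuming a file that contains only the raw residues with no header or newlines; subseq_from_file is a hypothetical helper, not the LargePrimarySeq internals.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);

# Hypothetical helper: return residues $start..$end (1-based, inclusive) from a
# filehandle over raw sequence characters, seeking directly to the region
# instead of reading the whole file into memory.
sub subseq_from_file {
    my ($fh, $start, $end) = @_;
    die "invalid range $start..$end\n" if $start < 1 || $end < $start;
    my $want = $end - $start + 1;
    seek($fh, $start - 1, SEEK_SET) or die "seek failed: $!";   # 1-based -> byte offset
    my $got = read($fh, my $buf, $want);
    die "read failed: $!" unless defined $got;
    die "range $start..$end runs past end of sequence\n" if $got < $want;
    return $buf;
}
```

Only $want bytes are ever read, so the cost depends on the size of the region, not the size of the sequence.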



> This will likely not be on the 0.7 branch as it is new code, so we'll have
> to omit it from the branch.
> 

I, personally, think this is fine on the branch, but Hilmar is branch
king, so he has the final say ...

I don't think this is going to break anything.


> Suggestions and Comments are always appreciated.
> 
> -Jason
> 
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center 
> http://www.chg.mc.duke.edu/ 
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------