[Bioperl-l] extracting subsequences

Sean Davis sdavis2 at mail.nih.gov
Wed Oct 26 06:36:48 EDT 2005


As an alternative, Jim Kent's tools from UCSC work with .nib and .2bit
(twoBit) files, which are compact binary formats for large sequences, and
are also very fast. They have names like "twoBitToFa".

That said, I really like Bio::DB::Fasta and use it.
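
For anyone wondering why an indexed lookup is so much faster than reading
the whole chromosome: Bio::DB::Fasta indexes the file once, and later
requests can seek straight to the bases they need. Below is a minimal
sketch of that general idea in plain Python (standard library only). It
is not Bio::DB::Fasta's actual implementation. The names build_index and
fetch and the record "chrToy" are hypothetical. The sketch assumes that
every sequence line in a record has the same width, except the last.

```python
import os
import tempfile


def build_index(path):
    """Scan the FASTA file once and record, for each sequence, where its
    first base starts and how its lines are laid out on disk."""
    index = {}
    name = None
    with open(path, "rb") as fh:
        while True:
            line = fh.readline()
            if not line:
                break
            if line.startswith(b">"):
                # Sequence name is the first word after '>'.
                name = line[1:].split()[0].decode()
                index[name] = {
                    "offset": fh.tell(),  # byte position of the first base
                    "line_bases": 0,      # bases per full line
                    "line_bytes": 0,      # bytes per full line, incl. newline
                    "length": 0,          # total bases in the record
                }
            elif name is not None:
                entry = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if entry["line_bases"] == 0:
                    entry["line_bases"] = bases
                    entry["line_bytes"] = len(line)
                entry["length"] += bases
    return index


def fetch(path, index, name, start, end):
    """Return bases start..end (1-based, inclusive) by seeking to them,
    without reading the rest of the file."""
    entry = index[name]
    if not 1 <= start <= end <= entry["length"]:
        raise ValueError("coordinates out of range")
    line_bases = entry["line_bases"]
    line_bytes = entry["line_bytes"]

    def byte_pos(i):
        # i is a 0-based base index; skip whole lines, then step into the line.
        return entry["offset"] + (i // line_bases) * line_bytes + (i % line_bases)

    first = byte_pos(start - 1)
    last = byte_pos(end - 1)
    with open(path, "rb") as fh:
        fh.seek(first)
        chunk = fh.read(last - first + 1)
    # The chunk may span line breaks; drop them to leave only bases.
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        toy = os.path.join(tmp, "toy.fa")
        with open(toy, "w", newline="\n") as out:
            out.write(">chrToy hypothetical record\nACGTACGTAC\nGGGTTTAAAC\nCCA\n")
        idx = build_index(toy)
        print(fetch(toy, idx, "chrToy", 9, 13))  # prints ACGGG
```

The index is built by one pass over the file. After that, a request like
167506667..167523040 costs one seek and one short read, however large the
chromosome is.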

Sean


On 10/25/05 10:57 PM, "Jason Stajich" <jason.stajich at duke.edu> wrote:

> Bio::DB::Fasta is the best way to do this for big sequences.
> 
> -jason
> On Oct 25, 2005, at 4:48 PM, Amit Indap wrote:
> 
>> Hi,
>> 
>> I have to extract subsequences from fasta files containing entire
>> human chromosomes. For example, I would like to extract bp
>> 167506667..167523040. I know how to do this using the Bio::Seq and
>> Bio::SeqIO APIs. The problem is that it takes a long time to read in an
>> entire fasta file containing a chromosome. Is there a way I can speed
>> this up?
>> 
>> The bp indices are taken from BLAT-ing my sequences to the genome. I
>> could use megablast to find which contigs my sequences lie on, and
>> then read in those files rather than the whole chromosome.
>> 
>> Any suggestions would be helpful. Thanks.
>> 
>> Amit
>> --
>> Amit Indap
>> http://www.bscb.cornell.edu/Homepages/Amit_Indap/
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 


