[Bioperl-l] Getting sequences by base pair locations

Chris Fields cjfields at uiuc.edu
Fri Jul 28 14:37:09 UTC 2006


...


> > If your genome file is in some standard format, use SeqIO.
> > http://www.bioperl.org/wiki/HOWTO:SeqIO
> >
> > And then get the sequence corresponding to the correct chromosome and
> > get the desired chunk with subseq();
> > http://www.bioperl.org/wiki/HOWTO:Beginners#The_Sequence_Object
> 
> My guess is that Yuval will need random access to the sequences.  With
> SeqIO, this is possible with a relatively large amount of memory, but
> Bio::DB::Fasta might be the better bet.

Agreed.  This is one of the areas where BioPerl has known 'speed' issues:

http://www.bioperl.org/wiki/Project_priority_list

Bio::DB::Fasta indexes the FASTA file and returns a specialized PrimarySeq
object, which gets around the current speed issues with SeqIO.
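
To make the random-access idea concrete, here is a minimal sketch of
offset-based indexing of a FASTA file.  It is written in plain Python rather
than Perl so that it runs with no BioPerl install, and it is not how
Bio::DB::Fasta is actually implemented.  The names build_index and fetch are
hypothetical, and the sketch assumes every sequence line in a record (except
possibly the last) has the same width.

```python
import os
import tempfile


def build_index(path):
    """Scan the file once; map each ID to
    [offset of first base, bases per line, bytes per line, total length]."""
    index = {}
    with open(path, "rb") as fh:
        seq_id = None
        while True:
            line = fh.readline()
            if not line:
                break
            if line.startswith(b">"):
                # The ID is the first word after '>'; sequence starts here.
                seq_id = line[1:].split()[0].decode()
                index[seq_id] = [fh.tell(), 0, 0, 0]
            elif seq_id is not None:
                entry = index[seq_id]
                bases = len(line.rstrip(b"\r\n"))
                if entry[1] == 0:
                    # Line geometry is taken from the first sequence line.
                    entry[1], entry[2] = bases, len(line)
                entry[3] += bases
    return index


def fetch(path, index, seq_id, start, stop):
    """Return bases start..stop (1-based, inclusive), reading only that span."""
    offset, bases_per_line, bytes_per_line, length = index[seq_id]
    if not 1 <= start <= stop <= length:
        raise ValueError("coordinates out of range")

    def byte_pos(base0):
        # Translate a 0-based base position into a byte position in the file.
        return offset + (base0 // bases_per_line) * bytes_per_line + base0 % bases_per_line

    first, last = byte_pos(start - 1), byte_pos(stop - 1)
    with open(path, "rb") as fh:
        fh.seek(first)
        chunk = fh.read(last - first + 1)
    # Drop the line breaks that fall inside the requested span.
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        toy = os.path.join(tmp, "toy.fa")
        with open(toy, "w", newline="\n") as out:
            out.write(">chr1 toy record\nACGTACGTAC\nGGGTTTAAAC\nCC\n")
        idx = build_index(toy)
        print(fetch(toy, idx, "chr1", 9, 13))
```

Only the requested span is read from disk, so memory use stays small however
large the chromosome is.  That is the property that matters for this use case.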
 
> Alternatively, make a custom track (see the documentation for doing so
> at the UCSC genome browser site) and upload it; getting the DNA is then
> trivial with just a couple of mouse clicks.  This method also has the
> advantage of letting you view the data in genome coordinates, and it
> allows intersections with known chimp genes, so you could find hits
> that don't overlap known chimp genes, for example.
> 
> Sean

It would be nice to have a more automated and direct way of doing something
along these lines within BioPerl (with the obvious caveat of not spamming
the server).  You can currently retrieve chunks of sequence from GenBank
by start, stop, and strand.
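
As a rough illustration of a start/stop/strand request, the sketch below
builds a URL for NCBI's E-utilities efetch service.  Using efetch is an
assumption made for this example and not necessarily what BioPerl does
internally.  The function name efetch_region_url is hypothetical, as is the
placeholder accession.  In keeping with the caveat about the server, the code
only constructs the URL and sends nothing.

```python
from urllib.parse import urlencode

# Public E-utilities efetch endpoint (assumed here for illustration).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_region_url(accession, start, stop, strand=1):
    """Build an efetch URL for bases start..stop (1-based, inclusive).

    strand is 1 for the plus strand or -1 for the minus strand; efetch
    itself encodes these as 1 and 2.
    """
    if start < 1 or stop < start:
        raise ValueError("need 1 <= start <= stop")
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": start,
        "seq_stop": stop,
        "strand": 1 if strand == 1 else 2,
    }
    return EFETCH + "?" + urlencode(params)


if __name__ == "__main__":
    # "ACCESSION_HERE" is a placeholder, not a real record.
    print(efetch_region_url("ACCESSION_HERE", 100, 200, strand=-1))
```

Any real use should throttle its requests and follow NCBI's usage guidelines.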

Ah, one can dream...

Chris



