[Bioperl-l] How to download the exon sequences, and the exon and CDS boundary for a RefSeq ID?

Sean Davis sdavis2 at mail.nih.gov
Tue Dec 29 18:06:17 EST 2009


On Tue, Dec 29, 2009 at 5:01 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
> I see the following example, but it is not clear to me how to get the
> exon sequences. I also want to get the exon boundaries and the associated
> CDS boundaries. Although I can get the boundary information from the UCSC
> Table Browser, it would be convenient if I could get it in BioPerl
> along with the sequence.
>
> Could somebody let me know how to do it?
>
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/RefSeq.html

Hi, Peng.  There may be some confusion here: the UCSC database aligns
RefSeq sequences to the genome to generate exon start and end
coordinates.  However, the RefSeq records retrieved by Bio::DB::RefSeq
are not in genomic context, so they do not carry start and end locations
on the genome.  That is, if you want the starts and ends along the
genome, I don't think that information is available from the RefSeq
record itself.  If that is what you need (genomic coordinates), you can
download the information directly from UCSC, download flat files from
NCBI Map Viewer, or get it from Ensembl (using BioMart, for instance).
If you are looking for a BioPerl-compliant way of doing this, look at
the Ensembl Perl API.
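As a rough illustration of the "download it from UCSC" route, here is a minimal sketch in Python (not BioPerl, and not part of any UCSC tool) that parses one line of a `refGene.txt` table dump and pulls out the exon boundaries plus the coding portion of each exon. It assumes the usual refGene column layout (genePred columns preceded by a `bin` column), tab-separated fields, and UCSC's 0-based, half-open coordinates. The names `parse_refgene_line` and `Transcript`, and the example row, are hypothetical and made up for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed column order of a UCSC refGene.txt row (genePred plus leading "bin").
REFGENE_COLUMNS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd",
    "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds",
    "score", "name2", "cdsStartStat", "cdsEndStat", "exonFrames",
]


@dataclass
class Transcript:  # hypothetical container, for illustration only
    name: str
    chrom: str
    strand: str
    cds: Tuple[int, int]          # 0-based, half-open genomic interval
    exons: List[Tuple[int, int]]  # 0-based, half-open, ascending genomic order

    def coding_exons(self) -> List[Tuple[int, int]]:
        """Intersect each exon with the CDS; exons that are all UTR are dropped."""
        cds_start, cds_end = self.cds
        parts = []
        for ex_start, ex_end in self.exons:
            s, e = max(ex_start, cds_start), min(ex_end, cds_end)
            if s < e:  # non-empty overlap with the CDS
                parts.append((s, e))
        return parts


def _int_list(field: str) -> List[int]:
    # exonStarts/exonEnds are comma-separated with a trailing comma, e.g. "100,500,"
    return [int(x) for x in field.split(",") if x]


def parse_refgene_line(line: str) -> Transcript:
    row = dict(zip(REFGENE_COLUMNS, line.rstrip("\n").split("\t")))
    starts = _int_list(row["exonStarts"])
    ends = _int_list(row["exonEnds"])
    if len(starts) != int(row["exonCount"]) or len(ends) != len(starts):
        raise ValueError("exonCount does not match exonStarts/exonEnds")
    return Transcript(
        name=row["name"],
        chrom=row["chrom"],
        strand=row["strand"],
        cds=(int(row["cdsStart"]), int(row["cdsEnd"])),
        exons=list(zip(starts, ends)),
    )


# Made-up row (not real data) showing the expected shape of a refGene line.
example = "\t".join([
    "0", "NM_HYPOTHETICAL", "chr1", "+", "1000", "5000",
    "1200", "4500", "3", "1000,2000,4000,", "1500,2500,5000,",
    "0", "GENE1", "cmpl", "cmpl", "0,0,2,",
])
tx = parse_refgene_line(example)
print(tx.exons)           # [(1000, 1500), (2000, 2500), (4000, 5000)]
print(tx.coding_exons())  # [(1200, 1500), (2000, 2500), (4000, 4500)]
```

Exons in refGene are listed in ascending genomic order even for minus-strand transcripts, so reverse the list if you want them in transcript order. For non-coding (NR_) entries, cdsStart equals cdsEnd, and the sketch then returns no coding exons. The exon sequences themselves would still have to be extracted from the genome assembly using these coordinates.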

Sean

