[Bioperl-l] UCSC database backend

Sean Davis sdavis2 at mail.nih.gov
Fri Sep 1 11:53:14 UTC 2006


On Thursday 31 August 2006 19:53, Caleb Davis wrote:
> Hi folks, first time caller here.  Love the show!
>
> I just started going through the archive and saw this thread.  I vote in
> favor of this interface, for what it's worth.  What about doing it this
> way?:
>
> $objSeqIO  = Bio::SeqIO->new(-file => '~/seq/myseqCustomTrack.bed',
>                          -format => 'bed',
>                          -assembly => 'hg18',
>                          -track => 'hg18_myfavgenes');    #see example

Hi, Caleb.  Welcome to the list.  

What you are proposing seems to be two separate but related tasks.  First, 
parse BED-format files into BioPerl-compatible sequence objects.  Second, 
once the features are parsed, pull the sequence from UCSC if desired.  

For the first, you could certainly write a parser for BED format that would 
give back sequence objects.  You might also want to look at the GFF format, 
as there are quite a few tools for GFF parsing, formatting, and sequence 
retrieval from local databases.  
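
As a rough illustration of the first task, here is a minimal sketch of a BED 
line parser in plain Perl.  It is not an existing BioPerl module: the function 
name parse_bed_line and the hash keys are made up for this example, and it 
only looks at the first six columns.  It assumes the usual BED convention of 
0-based, end-exclusive coordinates and converts the start to 1-based.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): parse one BED line into a
# hash reference. Only chrom, start and end are required by the format;
# name, score and strand are optional.
sub parse_bed_line {
    my ($line) = @_;
    chomp $line;

    # Skip blank lines, comments, and "track"/"browser" header lines.
    return if $line =~ /^\s*$/;
    return if $line =~ /^(?:#|track\b|browser\b)/;

    my ($chrom, $start, $end, $name, $score, $strand) = split /\s+/, $line;
    return unless defined $end;    # fewer than three columns: not a feature

    return {
        chrom  => $chrom,
        start  => $start + 1,      # BED starts are 0-based; make them 1-based
        end    => $end,            # BED ends are exclusive, so no change
        name   => $name,
        # 1 / -1 for +/-; 0 when the strand column is absent
        strand => defined $strand ? ($strand eq '-' ? -1 : 1) : 0,
    };
}

my $feat = parse_bed_line("chr1\t999\t5000\tmyGene\t0\t+\n");
printf "%s:%d-%d (%s)\n", @{$feat}{qw(chrom start end name)};
```

A hash like this could then be used to fill in whatever feature or sequence 
object you settle on.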

For the second task, if what you are after is a straightforward way of 
retrieving arbitrary sequences based on location, then you might want to look 
at the DAS service set up by UCSC.  Doing what you propose would be as simple 
as reading in a format of your choice and then constructing a URL like:

http://genome.ucsc.edu/cgi-bin/das/hg18/dna?segment=chr1:1,5000;segment=chr10:52000,53000

This returns an XML file containing the two sequences.  As you can 
see, the construction of the URL is trivial.  See the UCSC FAQ for more 
information:

http://genome.ucsc.edu/FAQ/FAQdownloads#download23
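
To show how little is involved in building such a URL, here is a minimal 
sketch in plain Perl.  The function name das_dna_url and the variable 
$DAS_BASE are made up for this example; the base URL and the segment syntax 
are simply copied from the example URL above, and the regions are assumed to 
be in 1-based, inclusive coordinates.

```perl
use strict;
use warnings;

# Base URL copied from the example above (an assumption that it stays valid).
my $DAS_BASE = 'http://genome.ucsc.edu/cgi-bin/das';

# Hypothetical helper: build a DAS "dna" request URL for one assembly from a
# list of [chrom, start, end] array references.
sub das_dna_url {
    my ($assembly, @regions) = @_;
    my @segments = map {
        my ($chrom, $start, $end) = @$_;
        "segment=${chrom}:${start},${end}";
    } @regions;
    return "$DAS_BASE/$assembly/dna?" . join(';', @segments);
}

# Rebuilds the example URL shown above.
print das_dna_url('hg18', ['chr1', 1, 5000], ['chr10', 52000, 53000]), "\n";
```

The resulting string could then be handed to something like get() from 
LWP::Simple and the returned XML parsed for the sequence.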

Sean


