[Bioperl-l] UCSC database backend

Thu Aug 10 14:37:46 UTC 2006

On 8/10/06 10:21 AM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> Sendu,
> 
> Sean indicates that the sequences would be held in flatfiles.  The
> trick would be grabbing location information from a particular MySQL
> table, then using that to retrieve the sequence slice from the
> indexed flatfile.
> 
> MySQL table-->SeqFeatureI(?)-->
> Bio::LocationI(Simple/Split/Fuzzy etc)-->sequence slice from Indexed
> file

For genomic information, that can be done relatively easily, either using
DAS or local flat files indexed by whatever means.  Data at UCSC is stored
as coordinates relative to the genome, so this may be enough, as long as one
does not care about having the "original" sequence that generated the
alignment that UCSC is reporting.
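
To make the local flat file route concrete, here is a rough sketch in Python
(not BioPerl, and not anything that exists in either project).  It assumes a
FASTA file wrapped at a fixed line width and coordinates given the way UCSC
tables give them, i.e. zero-based start and exclusive end.  The function
names build_index and fetch_slice are made up for illustration.

```python
import os
import tempfile


def build_index(fasta_path):
    """Return {name: [length, offset, bases_per_line, bytes_per_line]}.

    Assumes every sequence line except the last has the same width.
    """
    index = {}
    name = None
    with open(fasta_path, "rb") as handle:
        while True:
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                name = line[1:].split()[0].decode("ascii")
                # handle.tell() is now the byte offset of the first base.
                index[name] = [0, handle.tell(), 0, 0]
            elif name is not None:
                entry = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if entry[2] == 0:
                    # Record the line geometry from the first sequence line.
                    entry[2], entry[3] = bases, len(line)
                entry[0] += bases
    return index


def fetch_slice(fasta_path, index, name, start, end):
    """Return bases [start, end) of a sequence (zero-based, end exclusive)."""
    length, offset, bases_per_line, bytes_per_line = index[name]
    if not 0 <= start <= end <= length:
        raise ValueError("coordinates fall outside the sequence")
    if start == end:
        return ""
    # Translate base positions into byte positions, skipping newlines.
    first = offset + (start // bases_per_line) * bytes_per_line + start % bases_per_line
    last_base = end - 1
    last = offset + (last_base // bases_per_line) * bytes_per_line + last_base % bases_per_line + 1
    with open(fasta_path, "rb") as handle:
        handle.seek(first)
        raw = handle.read(last - first)
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")


if __name__ == "__main__":
    # Toy genome standing in for a real assembly flat file.
    with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False, newline="\n") as tmp:
        tmp.write(">chr1\nACGT\nACGT\nAC\n>chr2\nTTTT\nGGGG\nCC\n")
    idx = build_index(tmp.name)
    # A table row reduced to the three columns that matter here.
    row = {"chrom": "chr1", "chromStart": 2, "chromEnd": 7}
    print(fetch_slice(tmp.name, idx, row["chrom"], row["chromStart"], row["chromEnd"]))
    os.unlink(tmp.name)
```

The row dictionary stands in for whatever the MySQL query returns; only the
chromosome name and the two coordinates are needed to get the slice back.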

> Would be relatively easy if the MySQL table contains information
> about which flatfile is used; that I don't know.  If not, maybe use
> an .ini file to map the tables to flatfiles?

I don't think maintaining an additional file that maps tables to flatfiles
is reasonable, given the complexity of the system at UCSC, but it is
certainly worth mentioning as a possibility.
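
For the record, the mapping-file idea might look something like the sketch
below, again in Python rather than Perl.  Everything in it is hypothetical:
the file layout (one section per UCSC database, one key per table, plus a
default_flatfile key), the paths, and the helper names load_mapping and
flatfile_for.

```python
import configparser

# Hypothetical mapping: section = UCSC database, key = table name.
EXAMPLE_INI = """
[hg18]
default_flatfile = /data/ucsc/hg18/hg18.fa
est = /data/ucsc/hg18/est.fa
"""


def load_mapping(text):
    """Parse the .ini text into a ConfigParser object."""
    config = configparser.ConfigParser()
    config.read_string(text)
    return config


def flatfile_for(config, database, table):
    """Return the flat file for a table, falling back to the database default."""
    section = config[database]  # raises KeyError for an unknown database
    return section.get(table, section.get("default_flatfile"))


if __name__ == "__main__":
    mapping = load_mapping(EXAMPLE_INI)
    print(flatfile_for(mapping, "hg18", "refGene"))
    print(flatfile_for(mapping, "hg18", "est"))
```

Even in this toy form, every table that does not use the genome sequence
needs its own entry, which is the maintenance burden I am worried about.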

Sean