[Bioperl-l] Retrieving genomic sequence remotely

Stefan Kirov skirov@utk.edu
Mon, 10 Jun 2002 14:04:16 -0500


I thought this might be useful.
If you want to retrieve a genomic sequence without installing the
database locally (I can think of several reasons why some people would
want to do this), there are a couple of options.
The best way, if you have MySQL and Bioperl installed (plus the Perl
DBD::mysql driver), is to install the Ensembl API as described in the
tutorial: http://www.ensembl.org/Docs/ensembl_tutorial_28.pdf. Access to
the database is straightforward, and the annotation is rich. If you
already use Bioperl, it will look familiar. Alternatively, you can query
the database directly through MySQL (as described in the tutorial).
Another option is to query the NCBI server, strip the HTML, locate the
regions of interest in the GenBank record (which contains no sequence),
then query the server again for the FASTA file (sequence only; download
it into a LargeSeq object), and finally extract the sequence region you
need. I have written such a parser to get a gene with a certain genomic
context, but it is still pretty ugly. Anyway, if you need something like
this you can write your own, or I can send you mine and you can refine
it if you wish.
Cheers
Stefan
UT/ORNL