[Bioperl-l] sequence proxy server

Smithies, Russell Russell.Smithies at agresearch.co.nz
Mon Apr 16 21:28:11 UTC 2012


I assume you've done the obvious thing and tried downloading from your local mirror?
ftp://biomirror.aarnet.edu.au/biomirror/
Or ours:
http://www.biomirror.org.nz/

If you have a large number of requests, it's almost always faster to download the RefSeq files and extract the records locally than to run queries against NCBI via the web.
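As a rough illustration of extracting records locally: once the flat files are downloaded, the records can be indexed by accession and served without touching NCBI. This is a minimal sketch in Python for brevity (on the BioPerl side, Bio::DB::Flat fills this role). It assumes the standard GenBank flat-file layout, where each record has a LOCUS line and an ACCESSION line and ends with a `//` line. `index_genbank` is a hypothetical helper name.

```python
def index_genbank(text):
    """Map each primary accession to its full GenBank record text.

    Assumes standard flat-file layout: each record starts with a LOCUS
    line, carries an ACCESSION line, and ends with a '//' line.
    """
    index = {}
    record_lines = []
    accession = None
    for line in text.splitlines(keepends=True):
        record_lines.append(line)
        if line.startswith("ACCESSION"):
            # The first token after the keyword is the primary accession;
            # any further tokens are secondary accessions and are ignored.
            fields = line.split()
            if len(fields) > 1:
                accession = fields[1]
        elif line.rstrip() == "//":
            # End of record: store it under its accession and reset.
            if accession is not None:
                index[accession] = "".join(record_lines)
            record_lines = []
            accession = None
    return index
```

This keeps whole records in memory, which is fine for a quick test but not for a full RefSeq release. A real index would store file offsets and seek to them on demand.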

--Russell 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Kevin Murray
> Sent: Saturday, 7 April 2012 1:50 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] sequence proxy server
> 
> Hi all,
> 
> I'm an undergrad student in molecular biology at the ANU in Australia, and
> my research projects are becoming increasingly bioinformatics-heavy. The
> latest one has involved quite a large amount of sequence retrieval from
> GenBank and GenPept. The download speed to Australia from NCBI's servers
> is rather slow, and I've been thinking about how we can improve this. One
> solution would be to use Bio::DB::Flat with GenBank sequences on a local
> computer. However, when multiple people in a lab are doing
> bioinformatics, it seems a bit of a waste to keep the entire
> GenBank/GenPept database, or even the relevant sections of it, on each
> computer. So, I thought about writing a "sequence proxy" CGI script, and a
> corresponding module, which would work a bit like this:
> 
> The user calls Bio::DB::SeqProxy::GenBank as they would Bio::DB::GenBank,
> with the exception that a parameter for the address of the sequence proxy
> server is required.
> The module then sends the sequence proxy server a request similar to the
> one Bio::DB::GenBank->get_Seq_by_x() sends to NCBI's servers. I believe all
> requests go to the efetch page now (please correct me if I'm wrong; I have
> read the relevant BioPerl module code, but not thoroughly), so the CGI
> script on the sequence proxy would take arguments in a similar fashion to
> make writing the client-side module easier.
> The CGI script would use a Bio::DB::Flat database, or an interface to an
> SQL database, to determine whether the requested sequence is stored
> locally. (As an aside, I'd like your thoughts on Bio::DB::Flat vs.
> Bio::DB::Sql or similar.) If the sequence exists locally, it would be
> returned to the user, either as plain text or inside an XML container
> (see below).
> If not, it would be retrieved from the remote database using the relevant
> Bio::DB module, and returned.
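A minimal sketch of the proxy-side logic described above, written in Python for brevity rather than as a Perl CGI script. It assumes the proxy accepts the EFetch parameter names `db`, `id` and `rettype`. `parse_request`, `efetch_url`, `serve_ids` and `fetch_remote` are hypothetical names. `fetch_remote` stands in for whatever actually queries NCBI (on the Perl side, for example, Bio::DB::GenBank), and the plain dict stands in for a Bio::DB::Flat index or SQL table.

```python
from urllib.parse import parse_qs, urlencode

# NCBI E-utilities EFetch endpoint; the proxy mirrors its parameter names.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def parse_request(query_string):
    """Extract efetch-style db, id and rettype values from a CGI query string."""
    params = parse_qs(query_string)
    db = params.get("db", ["nucleotide"])[0]    # assumed default database
    rettype = params.get("rettype", ["gb"])[0]  # default to GenBank format
    # Comma-separated IDs, as EFetch accepts; drop empty entries.
    ids = [i for i in params.get("id", [""])[0].split(",") if i]
    return db, ids, rettype


def efetch_url(db, ids, rettype):
    """Build the upstream EFetch URL used when a record is not held locally."""
    query = urlencode({"db": db, "id": ",".join(ids),
                       "rettype": rettype, "retmode": "text"})
    return EFETCH_URL + "?" + query


def serve_ids(ids, local_store, fetch_remote):
    """Return (id, record, source) tuples, consulting local_store first.

    local_store is any dict-like cache (a stand-in for Bio::DB::Flat or an
    SQL table). fetch_remote(id) is a hypothetical callable that queries
    NCBI. Remote results are written back to the cache, so later requests
    for the same ID are served locally.
    """
    results = []
    for seq_id in ids:
        if seq_id in local_store:
            results.append((seq_id, local_store[seq_id], "local"))
        else:
            record = fetch_remote(seq_id)
            local_store[seq_id] = record
            results.append((seq_id, record, "remote"))
    return results
```

Writing remote results back into the shared store is what makes one proxy worthwhile for a whole lab. The first person to request a record pays the slow round trip to NCBI, and everyone after that gets a local hit.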
> 
> The sequence would be returned either as plain text in the requested
> format (defaulting to GenBank format), or as an XML document similar to:
> 
> <result>
>   <successful>1</successful>
>   <sequence>___YOUR GENBANK FILE HERE___</sequence>
>   <source>Local Database</source>
> </result>
> 
> The aim of the XML document would be to simplify handling of server errors
> and to allow other metadata to be specified, such as which database the
> sequence came from.
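An XML envelope like this could be produced and consumed with a standard XML library. The library also escapes characters such as `&` and `<` that can appear in GenBank free text. The following is a sketch in Python. The `<error>` element used for failures is an assumption that the proposal does not include, and `build_result` and `parse_result` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def build_result(sequence=None, source=None, error=None):
    """Server side: wrap a record (or an error message) in a <result> envelope."""
    result = ET.Element("result")
    ET.SubElement(result, "successful").text = "0" if error else "1"
    if error:
        # Hypothetical <error> element carrying a human-readable message.
        ET.SubElement(result, "error").text = error
    else:
        ET.SubElement(result, "sequence").text = sequence
        ET.SubElement(result, "source").text = source
    # ElementTree escapes &, < and > in the text content automatically.
    return ET.tostring(result, encoding="unicode")


def parse_result(xml_text):
    """Client side: return (sequence, source) or raise on a server error."""
    root = ET.fromstring(xml_text)
    if root.findtext("successful") != "1":
        raise RuntimeError(root.findtext("error") or "unknown server error")
    return root.findtext("sequence"), root.findtext("source")
```

Raising on `successful != 1` gives the client module a single place to turn server failures into exceptions. This is roughly the error-handling simplification the envelope is meant to provide.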
> 
> 
> Firstly, I'd like to know whether this sounds feasible and, if so, whether
> anyone is already working on something similar. I don't want to reinvent the wheel.
> Secondly, I'd like to ask for your comments and advice. Being reasonably new
> to BioPerl (I started using it about 6 months ago, though I've been coding in
> various languages for 8 years), I expect I've missed things that may seem
> obvious to a more experienced bioperl-er, so please be as brutally
> constructive in your criticism as you see fit =].
> 
> I know this is a lot of questions, so thanks in advance for your help.
> 
> Cheers, and a happy Easter to those who celebrate it.
> 
> Regards
> Kevin Murray
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l