[Bioperl-l] Bio::DB::GenPept server error

Ewan Birney birney at ebi.ac.uk
Mon Feb 3 07:53:18 EST 2003

On Mon, 3 Feb 2003, Neil Saunders wrote:

> Dear all,
> Still having problems with Bio::DB::GenPept, get_Stream_by_id().
> I have a test file, containing 3 UIDs separated by commas.  If I read in
> this file and assign it to an array:
> open IN,'test.file';
> @array=<IN>;
> then my code works fine and retrieves what I want using \@array.
> Now I move to my real file, which contains about 112 000 UIDs.  Same
> procedure and I get:
> MSG: WebDBSeqI Request Error:
> 500 (Internal Server Error) short write
> Is this because the server doesn't like such a large file, or some other
> problem?  Should I even be using this module to retrieve 112 000
> records?  I would get them using fastacmd from a local nr database, but
> the required -i option seems to be broken (gives duplicate records).

Getting 112 000 records over the web is going to:

  (a) take a while
  (b) be horribly inefficient
  (c) do nasty things to the webserver

The right thing to do here is to download the relevant section of
EMBL/GenBank, reformat it to a Fasta file if you only want the sequence
and want to save space, and then index it with Bio::Index::Fasta,
Bio::Index::GenBank or the indexer for whatever format you have decided on.
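
As a rough sketch of that local-index approach (not tested against the
poster's data): the helper names read_ids, build_index and fetch_to_fasta,
and the file names in the usage comment, are hypothetical. The sketch also
assumes the IDs in the list match the first word of each Fasta header, which
is what Bio::Index::Fasta keys on by default; for NCBI-style headers you
would probably need to set a custom parser with id_parser().

```perl
use strict;
use warnings;

# Read identifiers from a text file. IDs may be separated by commas,
# whitespace or newlines; empty fields are discarded.
sub read_ids {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @ids;
    while (my $line = <$fh>) {
        push @ids, grep { length } split /[,\s]+/, $line;
    }
    close $fh;
    return @ids;
}

# Build the index once. BioPerl is loaded lazily here so that read_ids()
# can be used on its own.
sub build_index {
    my ($index_file, @fasta_files) = @_;
    require Bio::Index::Fasta;
    my $inx = Bio::Index::Fasta->new(
        -filename   => $index_file,
        -write_flag => 1,            # needed when creating or updating
    );
    $inx->make_index(@fasta_files);  # absolute paths are safest
    return;
}

# Look up each ID in the existing index and write the hits as Fasta to
# the given filehandle. Returns the list of IDs that were not found.
sub fetch_to_fasta {
    my ($index_file, $out_fh, @ids) = @_;
    require Bio::Index::Fasta;
    require Bio::SeqIO;
    my $inx = Bio::Index::Fasta->new(-filename => $index_file);
    my $out = Bio::SeqIO->new(-format => 'fasta', -fh => $out_fh);
    my @missing;
    for my $id (@ids) {
        my $seq = $inx->fetch($id);
        if ($seq) { $out->write_seq($seq) }
        else      { push @missing, $id }
    }
    return @missing;
}

# Hypothetical usage:
#   build_index('/data/nr.idx', '/data/nr.fasta');
#   my @ids     = read_ids('uids.txt');
#   my @missing = fetch_to_fasta('/data/nr.idx', \*STDOUT, @ids);
```

The index only has to be built once; after that each lookup is a local
file seek, so a list of 112 000 IDs never touches the web server.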

Then you will be able to pull sequences out to your heart's content. Spare
a thought for the NCBI web servers - in no way should they try to honour a
request to pull out 100,000 sequences....

> thanks for any pointers,
> Neil
> --
>  School of Biotechnology and Biomolecular Sciences,
>  The University of New South Wales,
>  Sydney 2052,
>  Australia
> http://psychro.bioinformatics.unsw.edu.au/neil/index.php
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
