[Bioperl-l] Bio::DB::GenPept server error

Mon Feb 3 07:53:18 EST 2003

On Mon, 3 Feb 2003, Neil Saunders wrote:

> Dear all,
>
> Still having problems with Bio::DB::GenPept, get_Stream_by_id().
>
> I have a test file, containing 3 UIDs separated by commas.  If I read in
> this file and assign it to an array:
>
> open IN,'test.file';
> @array=<IN>;
>
> then my code works fine and retrieves what I want using \@array.
>
> Now I move to my real file, which contains about 112 000 UIDs.  Same
> procedure and I get:
>
> MSG: WebDBSeqI Request Error:
> 500 (Internal Server Error) short write
>
> Is this because the server doesn't like such a large file, or some other
> problem?  Should I even be using this module to retrieve 112 000
> records?  I would get them using fastacmd from a local nr database, but
> the required -i option seems to be broken (gives duplicate records).
>

Getting 112 000 records over the web is going to

  (a) take a while
  (b) be horribly inefficient
  (c) do nasty things to the webserver

The right thing to do here is to download the relevant section of
EMBL/GenBank, reformat it to a Fasta file if you only want the sequence
and want to save space, and then index it with Bio::Index::Fasta,
Bio::Index::GenBank or whichever module matches the format you have
decided on.
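
As a rough sketch of the indexing route (not tested against your data):
the code below assumes you already have a local Fasta file such as nr,
that its headers carry NCBI-style "gi|<number>|" fields, and that your
UIDs are GI numbers. The sub names (parse_gi, read_ids, build_index,
fetch_to_fasta) and all file names are made up for illustration; the
BioPerl calls used are Bio::Index::Fasta (new, id_parser, make_index,
fetch) and Bio::SeqIO (new, write_seq).

```perl
use strict;
use warnings;
use File::Spec;

# Return every GI number found in a Fasta header line, so that a merged
# nr record is indexed under each of its GIs. If the header has no GI,
# fall back to the first word after '>'.
sub parse_gi {
    my ($header) = @_;
    my @gis = $header =~ /\bgi\|(\d+)/g;
    return @gis if @gis;
    return $header =~ /^>\s*(\S+)/ ? ($1) : ();
}

# Read UIDs from a file in which they are separated by commas and/or
# whitespace (including newlines). Returns a flat list of IDs.
sub read_ids {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @ids;
    while (my $line = <$fh>) {
        push @ids, grep { length } split /[,\s]+/, $line;
    }
    close $fh;
    return @ids;
}

# Build the index once. BioPerl is loaded here rather than at the top so
# that the helper subs above can be used without it.
sub build_index {
    my ($index_file, @fasta_files) = @_;
    require Bio::Index::Fasta;
    my $inx = Bio::Index::Fasta->new(
        -filename   => $index_file,
        -write_flag => 1,
    );
    $inx->id_parser(\&parse_gi);
    # Absolute paths, so the index still resolves from another directory.
    $inx->make_index(map { File::Spec->rel2abs($_) } @fasta_files);
    return;
}

# Look up each ID in the local index and write the hits to a Fasta file.
# Returns the number of sequences written; missing IDs produce a warning.
sub fetch_to_fasta {
    my ($index_file, $out_path, @ids) = @_;
    require Bio::Index::Fasta;
    require Bio::SeqIO;
    my $inx = Bio::Index::Fasta->new(-filename => $index_file);
    my $out = Bio::SeqIO->new(-file => ">$out_path", -format => 'fasta');
    my $written = 0;
    for my $id (@ids) {
        my $seq = $inx->fetch($id);
        if ($seq) {
            $out->write_seq($seq);
            $written++;
        }
        else {
            warn "No entry for $id in $index_file\n";
        }
    }
    return $written;
}
```

You would call build_index('nr.idx', 'nr') once, and after that
fetch_to_fasta('nr.idx', 'hits.fa', read_ids('uids.txt')) as often as
you like. Every lookup is local, so none of the 112 000 requests goes
near the NCBI servers.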

Then you will be able to pull sequences out to your heart's content. Spare
a thought for the NCBI web servers - in no way should they try to honour a
request to pull out 100,000 sequences...

> thanks for any pointers,
> Neil
> --
>  School of Biotechnology and Biomolecular Sciences,
>  The University of New South Wales,
>  Sydney 2052,
>  Australia
>
> http://psychro.bioinformatics.unsw.edu.au/neil/index.php