[Bioperl-l] retrieve refseq ids from UIDs

Carnë Draug carandraug+dev at gmail.com
Tue Jun 28 21:24:17 UTC 2011


2011/6/28 Smithies, Russell <Russell.Smithies at agresearch.co.nz>:
> It's fairly common for NCBI to return partial or incomplete data; often half a record is missing, or requests time out at random.
> If you have a lot of records, it may be better to download all the data from the ftp site and then parse it locally. This is what we tend to do if there are more than a few hundred queries. I'd like to point out that it's NCBI's problem, not the BioPerl code at fault. You'll run into the same problems if you use NCBI's Perl API (http://www.ncbi.nlm.nih.gov/books/NBK1058/) directly.

Is there any way to catch this kind of error, other than fetching
the data repeatedly until two consecutive requests return the same
result?
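
To make the question concrete, here is a minimal sketch of the
"repeat until two consecutive results agree" approach. It is written in
Python purely for illustration, and `fetch` is a hypothetical callable
standing in for whatever actually performs the NCBI request; it is not
part of BioPerl or of any NCBI API. Exceptions raised by `fetch` (such
as a time-out) are simply allowed to propagate.

```python
import time


def fetch_until_stable(fetch, max_attempts=5, delay=0.0):
    """Call fetch() until two consecutive calls return equal results.

    `fetch` is a hypothetical zero-argument callable supplied by the
    caller; `max_attempts` and `delay` (seconds between requests) are
    illustrative parameters, not values recommended by NCBI.
    """
    if max_attempts < 2:
        raise ValueError("need at least two attempts to compare results")

    previous = fetch()
    for _ in range(max_attempts - 1):
        if delay:
            time.sleep(delay)
        current = fetch()
        if current == previous:
            # Two consecutive responses agree, so accept this one.
            return current
        previous = current

    raise RuntimeError(
        f"no two consecutive results matched in {max_attempts} attempts"
    )
```

Note that this only detects responses that differ between requests: two
responses truncated in exactly the same way would still be accepted, and
every record costs at least two requests.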

> Take a look at the gene2accession, gene2refseq, and gene_info data at ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ and at the tax data at ftp://ftp.ncbi.nih.gov/pub/taxonomy/ if you need to decode the taxids without doing web queries.
> It's much easier/faster to download these files, index them, then search, rather than do queries against NCBI.

Is there a module already written to parse these files?
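
In case no such module turns up, here is a rough sketch of the
"download, index, then search" idea for gene2refseq, again in Python
purely for illustration. It rests on several assumptions that should be
checked against the README on the ftp site: that the file is
tab-delimited, that comment/header lines start with "#", that the GeneID
is in the second column and the RNA accession in the fourth, and that
"-" marks a missing value. The function name and column constants are
made up for this example.

```python
from collections import defaultdict

# Assumed zero-based column positions; verify against the README in
# ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ before relying on them.
GENE_ID_COL = 1
RNA_ACC_COL = 3


def index_refseq_by_gene(lines, wanted_gene_ids=None):
    """Map each GeneID to the set of RNA accessions listed for it.

    `lines` is any iterable of text lines (for example an open file
    handle). If `wanted_gene_ids` is given, only those GeneIDs are kept.
    """
    index = defaultdict(set)
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= RNA_ACC_COL:
            continue  # skip short or malformed rows
        gene_id = fields[GENE_ID_COL]
        accession = fields[RNA_ACC_COL]
        if accession == "-":
            continue  # assumed placeholder for "no value"
        if wanted_gene_ids is not None and gene_id not in wanted_gene_ids:
            continue
        index[gene_id].add(accession)
    return dict(index)


# Possible usage on a locally downloaded copy (file name assumed):
#   import gzip
#   with gzip.open("gene2refseq.gz", "rt") as handle:
#       index = index_refseq_by_gene(handle, wanted_gene_ids={"1", "2"})
```

GeneIDs are kept as strings so they can be compared directly with the
text read from the file.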

Thanks for all your answers,
Carnë



