[Bioperl-l] Starting to use Bioperl

Gordon Haverland ghaverla at materialisations.com
Sat May 12 23:26:19 UTC 2018


On Fri, 11 May 2018 10:12:04 +0100
Peter Cock <p.j.a.cock at googlemail.com> wrote:

> This year the NCBI started offering this data in a slightly newer
> format:
> 
> https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/
> 
> Most of these files are plain text tables using the rather
> unusual field separator of "\t|\t" (tab, pipe, tab), but the
> README files are very comprehensive.

I found this and got the tarball version. I thought the README said the
separator was \t|\n? It doesn't matter; either way it's an unusual separator.
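For reference, the taxdump README describes two delimiters: fields within a record are separated by "\t|\t", and each record (line) ends with "\t|\n". A minimal Python sketch of a reader that handles both, assuming plain UTF-8 .dmp files such as nodes.dmp or names.dmp (the helper names `parse_dmp_line` and `read_dmp` are hypothetical, not part of any NCBI or Bio* package):

```python
from typing import Iterator, List


def parse_dmp_line(line: str) -> List[str]:
    """Split one record from an NCBI taxonomy .dmp file into fields.

    Assumes the README layout: fields separated by "\t|\t" and each
    record terminated by "\t|\n".
    """
    # Strip the record terminator first so the last field comes out clean.
    if line.endswith("\t|\n"):
        line = line[:-3]
    elif line.endswith("\t|"):  # last line of a file may lack the newline
        line = line[:-2]
    return line.split("\t|\t")


def read_dmp(path: str) -> Iterator[List[str]]:
    """Yield the list of fields for every record in a .dmp file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            yield parse_dmp_line(line)
```

For example, `parse_dmp_line("1\t|\t1\t|\tno rank\t|\n")` gives `["1", "1", "no rank"]`. Splitting on the full three-character separator, rather than on "|" and then stripping whitespace, keeps empty fields in place.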

There are Perl scripts in the tarball. I think I read there that if
the NCBI dump files are older than 180 days, the script downloads newer
versions? Or maybe I was reading something else.

In any event, the BioSQL repository on GitHub doesn't see much updating. It
looks to me like all the activity is in Biopython, so I downloaded that
for my Devuan machine.

> This is in Python, but my most recent occasion to process
> this data was to make a cut-down version of the NCBI
> taxonomy as part of constructing a small test dataset:
> 
> https://github.com/abaizan/kodoja/blob/master/test/taxonomy/filter_taxonomy.py
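One common way to cut the taxonomy down to a small test set is to keep a chosen set of taxids plus every ancestor up to the root, so the reduced tree stays connected. The sketch below illustrates that idea in Python; it is not necessarily what filter_taxonomy.py does, and `ancestors_closure` and `filter_rows` are hypothetical names. It assumes `parent` maps each taxid to its parent taxid (the first two columns of nodes.dmp), with the root (taxid 1) listed as its own parent.

```python
from typing import Dict, Iterable, List, Set


def ancestors_closure(parent: Dict[str, str], wanted: Iterable[str]) -> Set[str]:
    """Return the wanted taxids plus all of their ancestors."""
    keep: Set[str] = set()
    for taxid in wanted:
        # Walk up the tree, stopping at a node already kept or at the root.
        while taxid not in keep:
            keep.add(taxid)
            up = parent.get(taxid)
            if up is None or up == taxid:  # unknown taxid or root self-loop
                break
            taxid = up
    return keep


def filter_rows(rows: Iterable[List[str]], keep: Set[str]) -> List[List[str]]:
    """Keep only .dmp records whose first field (the taxid) is in `keep`."""
    return [row for row in rows if row[0] in keep]
```

Stopping as soon as a node is already in `keep` means shared ancestors are visited only once, even when many leaf taxids are requested.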

I saw this via Google; you labelled something in it as a bug.

While looking for the new_taxdump files (via Google), another Perl script,
something like findingSpeciesFromGenus, popped up. So I have a few
source files to look through.
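Finding the species in a genus from the dump files amounts to collecting every descendant of the genus node whose rank is "species". A minimal Python sketch of that idea, not the Perl script mentioned above, assuming `nodes` yields (taxid, parent_taxid, rank) triples taken from the first three columns of nodes.dmp; `species_under` is a hypothetical name:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def species_under(nodes: Iterable[Tuple[str, str, str]], genus_taxid: str) -> List[str]:
    """Return the taxids of rank 'species' anywhere below genus_taxid."""
    children: Dict[str, List[str]] = defaultdict(list)
    rank: Dict[str, str] = {}
    for taxid, parent, node_rank in nodes:
        rank[taxid] = node_rank
        if taxid != parent:  # the root is its own parent; skip that self-loop
            children[parent].append(taxid)

    found: List[str] = []
    stack = list(children[genus_taxid])
    while stack:
        taxid = stack.pop()
        if rank.get(taxid) == "species":
            found.append(taxid)
        # Keep descending: intermediate ranks (e.g. subgenus or species
        # group) can sit between a genus and its species.
        stack.extend(children[taxid])
    return sorted(found, key=int)
```

Searching the whole subtree, instead of only the genus's direct children, avoids missing species that hang off an intermediate node.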

Thanks.

Gord


