[Bioperl-l] Oddness in Bio::SeqIO

Torsten Seemann torsten.seemann at infotech.monash.edu.au
Tue May 9 23:42:29 UTC 2006


Chris,

> I noticed an odd thing with SeqIO parsing of species lines (those
> problematic bacterial tax names again).  I have a simple script that runs
> output to STDOUT to generate a list of hits.  Here's what I get:

> Bacterium: Mycobacterium avium subsp. paratuberculosis K-10 paratuberculosis K-10 <--

In this case,

Genus = Mycobacterium
Species = avium
Subspecies = paratuberculosis
Strain = K-10

which suggests that BioPerl is trying to do something special with the 
subspecies, because the 'subsp.' is gone from the repeated part?
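
The breakdown above can be sketched in a few lines. This is Python rather than BioPerl, and it is only an illustration of the naming convention, not how Bio::SeqIO parses species lines. The function name split_organism and the list of rank markers are my own assumptions.

```python
# Rank markers that may sit between the species and the subspecies epithet.
# (Assumed list; extend as needed.)
SUBSPECIES_MARKERS = {"subsp.", "ssp."}


def split_organism(name):
    """Split 'Genus species [subsp. epithet] [strain ...]' into its parts."""
    tokens = name.split()
    parts = {"genus": None, "species": None, "subspecies": None, "strain": None}
    if tokens:
        parts["genus"] = tokens.pop(0)
    if tokens:
        parts["species"] = tokens.pop(0)
    # A marker such as 'subsp.' must be followed by the subspecies epithet.
    if len(tokens) >= 2 and tokens[0] in SUBSPECIES_MARKERS:
        tokens.pop(0)  # drop the marker itself
        parts["subspecies"] = tokens.pop(0)
    # Whatever is left over is treated as the strain designation.
    if tokens:
        parts["strain"] = " ".join(tokens)
    return parts


print(split_organism("Mycobacterium avium subsp. paratuberculosis K-10"))
```

Anything after the subspecies is lumped into the strain, which is a simplification: real organism names can carry other trailing designations.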

Here are the pertinent parts of the GenBank file:

LOCUS       NC_002944            4829781 bp    DNA     circular BCT 18-JAN-2006
DEFINITION  Mycobacterium avium subsp. paratuberculosis K-10, complete genome.
SOURCE      Mycobacterium avium subsp. paratuberculosis K-10
   ORGANISM  Mycobacterium avium subsp. paratuberculosis K-10
             Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
             Corynebacterineae; Mycobacteriaceae; Mycobacterium; Mycobacterium
             avium complex (MAC).

                      /organism="Mycobacterium avium subsp. paratuberculosis K-10"
                      /strain="K-10"
                      /sub_species="paratuberculosis"
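
Since the ORGANISM name already contains the subspecies and strain that the qualifiers repeat, one possible workaround when building a display name is to append a qualifier value only if it is not already there. A rough sketch in Python (not BioPerl; read_qualifiers and display_name are made-up names, and I am not claiming this is what Bio::SeqIO does or where the duplication comes from):

```python
import re

# Matches feature qualifiers of the form /key="value".
QUALIFIER = re.compile(r'/(\w+)="([^"]*)"')


def read_qualifiers(text):
    """Return qualifiers as a dict, collapsing whitespace from wrapped values."""
    return {key: " ".join(value.split()) for key, value in QUALIFIER.findall(text)}


def display_name(organism, qualifiers):
    """Append sub_species and strain only if the name does not contain them."""
    name = organism
    for key in ("sub_species", "strain"):
        value = qualifiers.get(key)
        if not value:
            continue
        # Look for the value as a whole whitespace-delimited run in the name.
        already_there = re.search(r"(?<!\S)" + re.escape(value) + r"(?!\S)", name)
        if not already_there:
            name += " " + value
    return name


features = '''
/organism="Mycobacterium avium subsp.
paratuberculosis K-10"
/strain="K-10"
/sub_species="paratuberculosis"
'''
quals = read_qualifiers(features)
print(display_name(quals["organism"], quals))
```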


> Most (but not all) of the strain numbers get repeated (marked with arrows).
> This is actually in the GenBank file itself, downloaded via Bio::DB::GenBank
> (and thus passed through Bio::SeqIO).  Anyone seen this before?

The problem is mentioned on the wiki, so it must have come up before:
http://bioperl.org/wiki/Project_priority_list#Taxonomy_.2F_Species_data

I also deal mainly with bacteria, and should look into this too. I 
haven't been using the GenBank headers directly, only the features, so I 
never came across this.

Another thing which may crop up is when no species has been allocated 
yet but the genus is known (or something like that). In that case the 
name is written as "Genus spp.", e.g. Gallibacterium spp.
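
A parser would need to recognise that placeholder rather than treat it as a species epithet. A minimal check might look like the following (Python, purely illustrative; the function name is made up, and accepting the singular "sp." alongside "spp." is my assumption):

```python
# Placeholders used where a species epithet would normally appear.
# 'sp.' is included as an assumption alongside the 'spp.' form.
PLACEHOLDERS = {"sp.", "spp."}


def has_unnamed_species(name):
    """True if the second word of the name is a species placeholder."""
    tokens = name.split()
    return len(tokens) >= 2 and tokens[1] in PLACEHOLDERS


print(has_unnamed_species("Gallibacterium spp."))
print(has_unnamed_species("Mycobacterium avium subsp. paratuberculosis K-10"))
```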

--Torsten




