[Bioperl-l] Species name problems with bioperl-db

Roy Chaudhuri roy at colibase.bham.ac.uk
Thu Jan 25 22:19:00 UTC 2007


Hi.

I'm having problems similar to those discussed in this thread:
http://comments.gmane.org/gmane.comp.lang.perl.bio.general/13766

and in bug 2092.

I'm using the 1.5.2 release code, which includes Sendu's fix for the 
problem, but I'm still getting errors with some species names. The 
process seems to fall foul of line 167 of Bio::Species, which checks 
that the lineage starts at the species in question.

Here are some of the error messages I'm getting:

Uniprot entry P21215:
MSG: The supplied lineage does not start near 'Clostridium sp.' (I was 
supplied 'sp. ATCC29733 | Clostridium | Clostridiaceae | Clostridiales | 
Clostridia | Firmicutes | Bacteria')

Uniprot entry Q98AM7:
MSG: The supplied lineage does not start near 'Rhizobium loti' (I was 
supplied 'loti | Mesorhizobium | Phyllobacteriaceae | Rhizobiales | 
Alphaproteobacteria | Proteobacteria | Bacteria')

Genbank entry CP000026:
MSG: The supplied lineage does not start near 'Salmonella enterica 
subsp. enterica serovar Paratyphi A str. ATCC 9150' (I was supplied 
'paratyphi | Salmonella | Enterobacteriaceae | Enterobacteriales | 
Gammaproteobacteria | Proteobacteria | Bacteria')

It is easy to see why problems are arising: the species name used in the 
GenBank/Uniprot entry is sometimes a synonym of that in the supplied 
lineage, rather than an exact duplicate. Is the check on line 167 really 
necessary? Or at least could the throw be changed to a warn?
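
To illustrate the change I have in mind, here is a minimal standalone sketch. 
It is not the actual Bio::Species code: the sub names and the "strict" option 
are hypothetical, and I am assuming the check compares the name against the 
first two lineage entries and their "genus species" combination.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the check; NOT the real Bio::Species code.
# Assumption: the name must equal the first or second lineage entry, or
# the "second first" (genus + species) combination.
sub lineage_starts_near {
    my ($name, @lineage) = @_;
    return 1 unless defined $name && length $name && @lineage >= 2;
    my ($first, $second) = @lineage[0, 1];
    return ($name eq $first || $name eq $second
            || $name eq "$second $first") ? 1 : 0;
}

# Warn by default; die only when the caller asks for strict behaviour.
sub check_lineage {
    my ($name, $lineage, %opt) = @_;
    return 1 if lineage_starts_near($name, @$lineage);
    my $msg = "The supplied lineage does not start near '$name' "
            . "(I was supplied '" . join(' | ', @$lineage) . "')";
    die "$msg\n" if $opt{strict};
    warn "$msg\n";
    return 0;
}

my @lineage = qw(loti Mesorhizobium Phyllobacteriaceae Rhizobiales
                 Alphaproteobacteria Proteobacteria Bacteria);
check_lineage('Rhizobium loti', \@lineage);      # synonym: warns, carries on
check_lineage('Mesorhizobium loti', \@lineage);  # exact match: silent
```

With something along these lines, entries whose name is a synonym would still 
load, and anyone who wants the current behaviour could ask for the strict mode.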

Roy.
--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk


