[Bioperl-l] Species name validation problem

David Waner dwaner at scitegic.com
Sat Mar 25 00:19:04 UTC 2006


I have found that Bio::Seq->new() throws exceptions on some "species"
names containing special characters, or consisting of a single letter,
e.g.:

	SwissProt: POLN_ONNVG	O'nyong-nyong virus
	SwissProt: FIBP_ADE1H	Human adenovirus 15/H9
	SwissProt: POLG_FMDVZ	Foot-and-mouth disease virus (strain A22/550 Azerbaijan 65)
	SwissProt: RIR1_BHV1C	Bovine herpesvirus 1.1
	SwissProt: SODF_METJ	Methylomonas J
	GenBank: AJ416726		Stylosanthes aff. calcicola
	
It seems that the regex in validate_species_name() is too restrictive,
but I can't find a way to turn off validation without editing bioperl
modules.  There has been some recent discussion of this issue on the
mailing list (see below).  Does anyone know if or when a
-validate_species option to Bio::Seq->new() will be added? Or should I
just propose the code change?
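
To make the question concrete, here is a minimal, self-contained sketch of
what I have in mind. Note the assumptions: the pattern in $STRICT_NAME is
hypothetical and is NOT copied from validate_species_name(); it only stands
in for "a regex that is too restrictive". Likewise check_species_name() is
an illustrative helper, not a BioPerl function, and its -validate_species
argument merely mimics the option discussed on the list.

```perl
use strict;
use warnings;

# HYPOTHETICAL strict pattern (an assumption, not BioPerl's actual regex):
# words of two or more letters, separated by a single space or hyphen.
my $STRICT_NAME = qr/^[A-Za-z]{2,}(?:[ -][A-Za-z]{2,})*$/;

# Illustrative helper (not part of BioPerl). Returns the name unchanged.
# Dies if validation is enabled (the default) and the name does not match.
sub check_species_name {
    my ($name, %opt) = @_;
    my $validate = exists $opt{-validate_species} ? $opt{-validate_species} : 1;
    die "Invalid species name: '$name'\n"
        if $validate && $name !~ $STRICT_NAME;
    return $name;
}

# The names from the SwissProt and GenBank entries listed above.
my @names = (
    "O'nyong-nyong virus",
    "Human adenovirus 15/H9",
    "Foot-and-mouth disease virus (strain A22/550 Azerbaijan 65)",
    "Bovine herpesvirus 1.1",
    "Methylomonas J",
    "Stylosanthes aff. calcicola",
);

for my $name (@names) {
    # With validation on, trap the exception instead of aborting the parse.
    my $strict = eval { check_species_name($name); 1 } ? 'accepted' : 'rejected';
    # With validation off, the name is passed through untouched.
    my $loose  = eval { check_species_name($name, -validate_species => 0); 1 }
               ? 'accepted' : 'rejected';
    print "$name: strict=$strict, unvalidated=$loose\n";
}
```

The point of the sketch is only that a caller-controlled switch lets
records like these load while leaving the check on by default for everyone
else.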

Thanks,
  David Waner


> Stefan Kirov skirov at utk.edu
> Wed Sep 21 08:46:05 EDT 2005
>
> Thanks for the great answer Hilmar!
> I would prefer to have some kind of a check if the user wishes so. For
> example Entrezgene file contains some HTML tags in some entries species
> names which is good to know.
> I will put an option -validate_species in the constructor to turn the
> check on and off. Maybe a species filter can be of some use as well.
> though you can just select the correct file from the NCBI site....
> Thanks again!
> Stefan



