[Bioperl-l] Bug in SeqIO/swiss.pm

rfsouza at cecm.usp.br rfsouza at cecm.usp.br
Wed Jan 5 16:08:45 EST 2005


Hi,

I have found what might be a bug in the SeqIO parser for Swissprot flat files
(swiss.pm). The error message printed is

Invalid [] range "a-S" before HERE mark in regex m/^Cotton leaf curl
Gezira virus - [Okra-S << HERE hambat]$/ at
/home/users/rfsouza/projects/geral/lib/perl5/site_perl/5.8.1/Bio/SeqIO/swiss.pm
line 985, <GEN0> line 10.

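and the Swissprot entry is pasted below. The problem is the match operator
at line 985, which interpolates the organism name into the pattern unescaped,
so "[Okra-Shambat]" is read as a character class: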

984 #if the organism belongs to taxid 32644 then no Bio::Species object.
985 return if grep { /^$binomial$/ } @Unknown_names;

I managed to fix this and got swiss.pm to parse the entire Uniprot release
2.1 by adding this line

$binomial =~ s/(\[|\])/\\$1/g;

just before line 985. Would anybody like to add this fix to the CVS
version of swiss.pm? Since this is the only entry, out of 1520915 in
Uniprot, that swiss.pm was not able to parse, I wonder whether it is an
annotation error in Uniprot that violates their own standard...
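
For anyone who wants to reproduce this outside swiss.pm, here is a minimal
sketch of the failure and of a more general way to escape the name. It is
not the actual swiss.pm code: the sub names and the sample list are
hypothetical, made up for illustration. It uses \Q...\E, which quotes every
regex metacharacter in the interpolated string rather than only the square
brackets.

```perl
use strict;
use warnings;

# Organism name taken from the OS line of the entry below.
my $binomial = 'Cotton leaf curl Gezira virus - [Okra-Shambat]';

# Hypothetical stand-in for the name list checked in swiss.pm.
my @names = ('unknown', $binomial);

# Unescaped interpolation, as on line 985: the brackets are parsed as a
# character class, and "a-S" inside it is an invalid range.
sub matches_unescaped {
    my ($name, @list) = @_;
    my $count = grep { /^$name$/ } @list;
    return $count;
}

# Same test with \Q...\E, so the name is matched as a literal string.
sub matches_quoted {
    my ($name, @list) = @_;
    my $count = grep { /^\Q$name\E$/ } @list;
    return $count;
}

# The unescaped version dies when the pattern is compiled at run time.
my $ok = eval { matches_unescaped($binomial, @names); 1 };
print "unescaped match failed: $@" unless $ok;

# The quoted version compiles and finds the single exact match.
print "quoted matches: ", matches_quoted($binomial, @names), "\n";
```

Since the pattern is anchored at both ends, a plain string comparison
(grep { $_ eq $name } @list) would probably do the same job without
involving the regex engine at all.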

Greetings and happy new year :).
Robson

#==============

ID   Q8UYF6         STANDARD;      PRT;   258 AA.
AC   Q8UYF6;
DT   01-MAR-2002 (TrEMBLrel. 20, Created)
DT   01-MAR-2002 (TrEMBLrel. 20, Last sequence update)
DT   01-MAR-2004 (TrEMBLrel. 26, Last annotation update)
DE   Coat protein.
OS   Cotton leaf curl Gezira virus - [Okra-Shambat].
OC   Viruses; ssDNA viruses; Geminiviridae; Begomovirus.
OX   NCBI_TaxID=268964;
RN   [1]
RP   SEQUENCE FROM N.A.
RA   Idris A.M., Brown J.K.;
RT   "Molecular analysis of cotton leaf curl virus-Sudan reveals an
RT   evolutionary history of recombination.";
RL   Virus Genes 0:0-0(2002).
DR   EMBL; AY036008; AAK64541.1; -.
DR   GO; GO:0019028; C:viral capsid; IEA.
DR   GO; GO:0005198; F:structural molecule activity; IEA.
DR   InterPro; IPR000650; Gem_coat_AR1.
DR   InterPro; IPR000263; GV_A/BR1_coat.
DR   Pfam; PF00844; Gemini_coat; 1.
DR   PRINTS; PR00224; GEMCOATAR1.
DR   PRINTS; PR00223; GEMCOATARBR1.
DR   ProDom; PD000901; Gem_coat_AR1; 1.
KW   Coat protein.
SQ   SEQUENCE   258 AA;  29778 MW;  6FB1960A9D8763DD CRC64;
     MSKRPADIII STPASKVRRR LNFDSPGLSS ARAPTVLVTN KRRSWTNRPT YRKPRMYRMY
     RSPDVPKGCE GPCKVQSYEQ RDDIKHTGIV RCVSDVTKGV GITHRTGKRF TIKSIYILGK
     VWMDDNIKKQ NHTNNVMFFL VRDRRPYGNS PLDFGQVFNM FDNEPSTATV KNDLRDHFQV
     LRKFTATVIG GPSGMKEQAL VRRFYRINSQ IVYNHQEAGK FENHTENAIL LYMACTHASN
     PVYATLKIRI YFYDSVSN
//



