[Bioperl-l] Question about embl format

Heikki Lehvaslaiho heikki at ebi.ac.uk
Thu Apr 17 18:12:03 EDT 2003


Lincoln,

The feature table documentation states that:

"Component names may be no more than 20 characters long  (Feature keys
15, Feature qualifiers 20)  and must contain at least one letter."

http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html#naming_conventions

It also says that only certain keys are accepted. The parser used by the
EBI EMBL database group ignores all unknown keys. Since you are using
your own keys, you are free to do whatever you want. Incidentally, it
looks like no one is using an asterisk to start private key names.

In my opinion, all sane parsers should read all the valid name
characters [a-zA-Z0-9*'_-] to build the key. Bioperl seems to do the
right thing.
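
To make the rules above concrete, here is a rough Python sketch (not
Bioperl's code, and not the EBI parser). It checks a name against the
limits quoted from the documentation and reads a feature-key line by
taking every valid name character instead of cutting at a fixed column.
The function names check_name and split_ft_key_line are made up for
illustration, and the "FT" plus three spaces prefix is assumed from the
sample dump quoted below.

```python
import re

# Limits quoted from the feature table documentation cited above.
MAX_KEY_LEN = 15
MAX_QUALIFIER_LEN = 20

# Character class given in this message: [a-zA-Z0-9*'_-]
NAME_RE = re.compile(r"[A-Za-z0-9*'_-]+")


def check_name(name, max_len):
    """Return a list of problems with a component name (empty if none)."""
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append("contains characters outside [a-zA-Z0-9*'_-]")
    if len(name) > max_len:
        problems.append(f"longer than {max_len} characters")
    if not re.search(r"[A-Za-z]", name):
        problems.append("contains no letter")
    return problems


# Tolerant reader (assumption: "FT" followed by three spaces, as in the
# sample dump). The key is the whole run of name characters, however long.
FT_KEY_LINE = re.compile(r"^FT {3}([A-Za-z0-9*'_-]+)\s+(\S.*)$")


def split_ft_key_line(line):
    """Return (key, location) for a feature-key line, or None otherwise."""
    m = FT_KEY_LINE.match(line)
    return (m.group(1), m.group(2)) if m else None
```

With this, "Transposon_insertion" is flagged only for exceeding the
15-character key limit, yet the tolerant reader still recovers it as a
key. Qualifier lines (those starting with "/") are not key lines, so
split_ft_key_line returns None for them.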

I do not know what is best. Try with long keys, and wait and see if
someone complains?

	-Heikki

On Thu, 2003-04-17 at 14:45, Lincoln Stein wrote:
> Hello,
> 
> The "sequence dumper" plugin for the Generic Genome Browser has been crashing 
> when making an EMBL dump of a particular region of the worm genome.  The 
> issue is a "Transposon_insertion" feature, which exceeds the 15 character 
> limit for EMBL feature tags.  If I remove the Bio::SeqIO::embl check for this 
> limit, I get an output that looks like this:
> 
> ...
> FT   Transposon_insertion complement(13204595..13204596)
> FT                   /score=""
> FT                   /group="cxP4108"
> FT                   /id=7726466
> FT                   /method="Transposon_insertion"
> FT                   /source="Allele"
> FT                   /phase=""
> FT   repeat          13204572..13204602
> FT                   /score=80
> FT                   /group=""
> FT                   /notes="loop 283"
> FT                   /id=7775180
> FT                   /method="repeat"
> FT                   /source="inverted"
> FT                   /phase=""
> FT                   /note="score=80"
> ...
> 
> My question is whether this is acceptable EMBL format.  If not, I will have to 
> truncate feature type names at 15 characters, but this is going to lose 
> information.
> 
> Lincoln
-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki at ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________


