[Bioperl-l] more fuzziness checked in

Heikki Lehvaslaiho heikki@ebi.ac.uk
Wed, 31 Jan 2001 15:33:57 +0000


Jason Stajich wrote:
> 
> more robust fuzzy and split feature handling checked in.
> 
> FTHelper will try and see if start==end, if it does and there is no
> splitlocation delimiter then the code will return just a single number
> representing the location ie
> 
> variation       500
>                 /allele="C"
>                 /allele="T"
> 

I am just back from a one-week holiday. I'll catch up with the list
in a day or two.

Jason,

In case you really are going to use the above format, it is not valid
according to The DDBJ/EMBL/GenBank Feature Table Definition. 

The allele qualifier gives a common name of the allele in free text,
e.g.:

	/allele="adh1-1"

In general, the rule is that there should not be identical feature
keys on the same location, but 'variation' is an exception. When we
are dealing with SNPs, we generally do not know which of the alleles
is present in the particular sequence the SNP is mapped to (unless
you want to check the sequence).

The correct way to represent a diallelic variation in the
DDBJ/EMBL/GenBank feature table is to repeat the feature key for each
allele and use the /replace qualifier.

 variation       500
                 /replace="C"
 variation       500
                 /replace="T"
 
It is ugly, but that's what they (the EMBL database people) told me to
do a few weeks ago when I was writing the to_FTHelper method for SNPs
in EnsEMBL.
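
To make the "one feature per allele" rule concrete, here is a minimal
sketch in Python. It is not BioPerl or EnsEMBL code; the function name
variation_features is hypothetical, and the column widths simply
mirror the example layout in this message rather than the official
flat-file column positions.

```python
def variation_features(position, alleles):
    """Return feature table lines with one 'variation' feature per allele.

    Hypothetical helper for illustration only. Each allele gets its own
    feature key line plus a /replace qualifier, instead of several
    /allele qualifiers attached to a single feature.
    """
    lines = []
    for allele in alleles:
        # Feature key padded to 16 characters, followed by the location
        # (layout copied from the example in this message).
        lines.append(" {:<16}{}".format("variation", position))
        # Qualifier line indented to line up under the location.
        lines.append('{}/replace="{}"'.format(" " * 17, allele))
    return lines


if __name__ == "__main__":
    for line in variation_features(500, ["C", "T"]):
        print(line)
```

Called with position 500 and alleles C and T, this builds the same
four lines as the example above.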

	-Heikki


-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________