Bioperl: Re: feature parsing for GenBank/EMBL

Hilmar Lapp hlapp@gmx.net
Mon, 08 May 2000 12:32:23 +0200


The feature table format is documented on the NCBI website
(http://www.ncbi.nlm.nih.gov/collab/FT/index.html). Locations in particular are
documented at http://www.ncbi.nlm.nih.gov/collab/FT/index.html#location

A couple of these location forms are presently not handled by the feature
table parsing methods (they fail to parse and are ignored after a warning),
and some cannot even be represented properly through the SeqFeatureI
interface, for example (quoted from the URL):

(23.45)..600       Specifies that the starting point is one of the bases
                   between bases 23 and 45, inclusive, and the end point is
                   base 600

(122.133)..(204.221) The feature starts at a base between 122 and 133,
                     inclusive, and ends at a base between 204 and 221,
                     inclusive

145^177            Points to a site between two adjacent bases anywhere 
                   between bases 145 and 177 


order(location,location, ... location)
                   The elements can be found in the specified order (5' to 3'
                   direction), but nothing is implied about the reasonableness
                   about joining them

J00194:(100..202)  Points to bases 100 to 202, inclusive, in the entry (in 
                   this database) with primary accession number 
                   'J00194'

Do you see a point in having 'wobble' information for start and end in the
SeqFeatureI interface, or in an implementation module?
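
To make the question more concrete, here is a minimal sketch of what carrying
'wobble' information could look like. It is written in Python purely for
illustration and is not Bioperl code: the names FuzzyLocation and
parse_location and the min/max field layout are all assumptions of mine, not
part of SeqFeatureI or of any existing parser. It only handles the simple
forms quoted above; order(...) and join(...) are deliberately left out and
raise an error.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FuzzyLocation:
    """Hypothetical container for a location with 'wobble' on both ends."""
    min_start: int
    max_start: int
    min_end: int
    max_end: int
    between: bool = False            # True for 'a^b' (site between two bases)
    accession: Optional[str] = None  # set for remote entries like 'J00194:(...)'


# A single endpoint is either '(a.b)' (fuzzy) or 'n' (exact).
_POINT = r"(?:\((\d+)\.(\d+)\)|(\d+))"
_RANGE = re.compile(rf"^{_POINT}\.\.{_POINT}$")
_SITE = re.compile(r"^(\d+)\^(\d+)$")
_REMOTE = re.compile(r"^([A-Za-z]\w*):\((.+)\)$")


def _bounds(low, high, exact):
    """Return (min, max) for one endpoint; an exact base has min == max."""
    if exact is not None:
        return int(exact), int(exact)
    return int(low), int(high)


def parse_location(text, accession=None):
    """Parse one simple location descriptor into a FuzzyLocation."""
    text = text.strip()

    remote = _REMOTE.match(text)
    if remote:
        # 'ACC:(location)': parse the inner location, remember the accession.
        return parse_location(remote.group(2), accession=remote.group(1))

    site = _SITE.match(text)
    if site:
        low, high = int(site.group(1)), int(site.group(2))
        # Assumption: a site gets the same bounds for start and end.
        return FuzzyLocation(low, high, low, high,
                             between=True, accession=accession)

    rng = _RANGE.match(text)
    if rng:
        min_s, max_s = _bounds(rng.group(1), rng.group(2), rng.group(3))
        min_e, max_e = _bounds(rng.group(4), rng.group(5), rng.group(6))
        return FuzzyLocation(min_s, max_s, min_e, max_e, accession=accession)

    raise ValueError(f"unsupported location descriptor: {text!r}")
```

With this sketch, parse_location("(23.45)..600") yields min_start=23,
max_start=45 and min_end=max_end=600, while an ordinary "100..202" simply has
identical minimum and maximum on each end, so exact locations would not need
special treatment.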

I think just saying that we won't let ourselves be governed by GenBank parsing
issues (actually, it's a joint definition for GenBank/EMBL/DDBJ) may not be the
best answer, because the feature annotation rules obviously reflect the
biological knowledge we have at present, and I think that's what we are trying
to model, at least to some extent.

Just a few thoughts off the top of my head.

Cheers,

	Hilmar
-- 
-----------------------------------------------------------------------
Hilmar Lapp                                      email: hlapp@gmx.net
NFI Vienna, IFD/Bioinformatics                   phone: +43 1 86634 631
A-1235 Vienna                                      fax: +43 1 86634 727
ROI: Bioinformatics (arrays, expression, seqs), Programming, Databases,
     Mountain Biking (hard tail, hard fork: feel the trail)
-----------------------------------------------------------------------