[Biojava-l] Ensembl read problems
Keith James
kdj@sanger.ac.uk
26 Nov 2002 10:01:48 +0000
>>>>> "Mark" == Schreiber, Mark <mark.schreiber@agresearch.co.nz> writes:
Mark> Hi - My guess would be that line where it tries to join on a
Mark> negative location. That doesn't seem to make a whole lot of
Mark> sense and suggests to me an error in that record.
Mark> Can any embl experts confirm that?
According to the BNF (excerpts below) this is allowed: a location can
contain a signed integer as a coordinate. However, a negative
coordinate would then conflict with two existing conventions, low/high
base bounds (< or >) and remote locations. The spec does not state
which, if any, takes precedence. That said, I can't recall seeing
negative coordinates in any entries, though I've never explicitly
looked for them. My hunch is that they should be expressed as remote
locations (which I think someone suggested earlier).
  location          ::= <absolute_location> | <feature_name> |
                        <functional_operator>(<location_list>)
  absolute_location ::= <local_location> | <path> : <local_location>
  local_location    ::= <base_position> | <between_position> | <base_range>
  base_range        ::= <base_position>..<base_position>
  base_position     ::= <integer> | <low_base_bound> | <high_base_bound> |
                        <two_base_bound>
  integer           ::= <unsigned_integer> | - <unsigned_integer>
  unsigned_integer  ::= <digit> | <unsigned_integer><digit>
  digit             ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
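To make the point about signed coordinates concrete, here is a minimal sketch of the <integer> and <base_range> productions as regular expressions. It is only an illustration: LocationGrammarSketch and its methods are hypothetical names, not BioJava classes, and I am assuming no whitespace between the minus sign and the digits and that both ends of the range are plain integers (no bounds, no paths).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava: it mirrors only the
// <integer> and <base_range> productions quoted above.
final class LocationGrammarSketch {
    // integer ::= <unsigned_integer> | - <unsigned_integer>
    // Assumption: the sign is written directly before the digits.
    private static final Pattern INTEGER = Pattern.compile("-?[0-9]+");

    // base_range ::= <base_position>..<base_position>, with
    // <base_position> narrowed to the <integer> alternative only.
    private static final Pattern INTEGER_RANGE =
        Pattern.compile("(-?[0-9]+)\\.\\.(-?[0-9]+)");

    // True if the whole token is a signed or unsigned integer.
    static boolean isInteger(String token) {
        return INTEGER.matcher(token).matches();
    }

    // True if the whole token is a range between two integers.
    static boolean isIntegerRange(String token) {
        return INTEGER_RANGE.matcher(token).matches();
    }
}
```

Under these assumptions a token such as -5..100 is a valid range as far as the grammar goes, which is why the BNF alone cannot settle whether such a record is in error.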
There seems to be a knock-on effect from the first rejected feature
now that the parser error handler allows a broken parse to
continue. Possibly a finally block needs to be added to restore the
location parser's state before the next feature starts.
This is a guess based on eyeballing the code - I haven't found time to
get into this yet. May get there in a few days.
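The finally-block idea could look roughly like the sketch below. This is not the BioJava location parser: StatefulLocationParserSketch, pendingTokens and parse are hypothetical stand-ins, and the comma split and the rejection of negative coordinates are assumptions made only to mimic a parse that fails part-way through.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a stateful location parser; the names are
// illustrative and are not BioJava's actual classes.
final class StatefulLocationParserSketch {
    // State accumulated while one feature's location is being parsed.
    private final List<String> pendingTokens = new ArrayList<>();

    // Splits a location on commas and rejects any element that starts
    // with a minus sign, to imitate a parse that fails part-way through.
    List<String> parse(String location) {
        try {
            for (String token : location.split(",")) {
                pendingTokens.add(token);
                if (token.startsWith("-")) {
                    throw new IllegalArgumentException(
                        "Negative coordinate: " + token);
                }
            }
            // Return a copy so the caller never sees the internal list.
            return new ArrayList<>(pendingTokens);
        } finally {
            // Runs on success and on failure, so leftovers from a
            // rejected feature cannot leak into the next one.
            pendingTokens.clear();
        }
    }
}
```

With the reset in the finally block, a rejected location such as 1..10,-5..20 leaves nothing behind, and the next call starts from an empty state. Without it, the 1..10 token would still be pending when the next feature is parsed.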
Keith
--
- Keith James <kdj@sanger.ac.uk> bioinformatics programming support -
- Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, UK -