[Open-bio-l] GenBank and EMBL - join(complement(...)) vs complement(join(...))

Fri Jan 8 12:33:02 EST 2010

Hi all,

Currently Biopython reads both GenBank and EMBL files, and writes GenBank.
I'm looking at writing EMBL files too - and wanted to see if any of you knew
anything definitive on join(complement(...)) vs complement(join(...)) in
feature location strings.

References:
http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
http://www.genbank.lipi.go.id/docs/FTv6_2.html

Both give these examples, two ways of writing the same location:

complement(join(2691..4571,4918..5163))
                          Joins regions 2691 to 4571 and 4918 to 5163, then
                          complements the joined segments (the feature is
                          on the strand complementary to the presented strand)

join(complement(4918..5163),complement(2691..4571))
                          Complements regions 4918 to 5163 and 2691 to 4571,
                          then joins the complemented segments (the feature is
                          on the strand complementary to the presented strand)

This suggests that either form is valid in both GenBank and EMBL
format files.
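
To make the equivalence concrete, here is a minimal Python sketch (not
Biopython's actual parser) that reduces both strings to the same list of
(start, end, strand) parts in feature order. The names parse_location and
_part are made up for illustration, and it assumes only plain start..end
spans with no fuzzy positions or remote references.

```python
import re

SPAN = re.compile(r"(\d+)\.\.(\d+)")
COMP = re.compile(r"complement\((\d+)\.\.(\d+)\)")


def _part(text):
    """Parse 'a..b' or 'complement(a..b)' into (start, end, strand)."""
    m = COMP.fullmatch(text)
    if m:
        return (int(m.group(1)), int(m.group(2)), -1)
    m = SPAN.fullmatch(text)
    if m:
        return (int(m.group(1)), int(m.group(2)), +1)
    raise ValueError("unsupported location part: %r" % text)


def parse_location(text):
    """Return the parts of a simple location string in feature order."""
    text = text.replace(" ", "")
    outer = re.fullmatch(r"complement\(join\((.*)\)\)", text)
    if outer:
        # complement(join(a,b)) lists the spans in forward-strand order,
        # so flip each strand and reverse the list to get feature order.
        parts = []
        for item in outer.group(1).split(","):
            start, end, strand = _part(item)
            if strand != +1:
                raise ValueError("nested complement is not handled here")
            parts.append((start, end, -1))
        return parts[::-1]
    inner = re.fullmatch(r"join\((.*)\)", text)
    if inner:
        # join(...) already lists the parts in feature order.
        return [_part(item) for item in inner.group(1).split(",")]
    return [_part(text)]


first = parse_location("complement(join(2691..4571,4918..5163))")
second = parse_location("join(complement(4918..5163),complement(2691..4571))")
print(first == second)  # True
```

Both calls give [(4918, 5163, -1), (2691, 4571, -1)], so under these
assumptions a parser can treat the two forms as the same location.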

Anecdotally, I have observed GenBank uses the first form (which is
shorter) while EMBL seems to use the second form (which to me is
logical, if you consider how to represent mixed strand features).
This seems to fit with this BioPerl wiki page:

http://www.bioperl.org/wiki/BioPerl_Locations

Is there any official documentation regarding this discrepancy that
I have overlooked? Am I right to think that GenBank and EMBL
still use these different forms? Is there any word on whether they
might standardise on one form in future?

What do EMBOSS, BioPerl, etc. do in this situation? Do you treat
these two examples the same on parsing, and use one layout
when writing GenBank and the other for writing EMBL files?
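
As a rough sketch of the "one layout per format" idea, the hypothetical
function below takes parts as (start, end, strand) tuples in feature order
and writes either form. The style names "genbank" and "embl", and the
mapping of each style to a form, only reflect my anecdotal observation
above, not any documented rule. Mixed strand features always get the
per-part complement form.

```python
def format_location(parts, style):
    """Format (start, end, strand) parts, given in feature order.

    style is "genbank" or "embl"; both names are illustrative only.
    """
    if style not in ("genbank", "embl"):
        raise ValueError("unknown style: %r" % style)

    def span(start, end):
        return "%i..%i" % (start, end)

    def one(start, end, strand):
        text = span(start, end)
        return "complement(%s)" % text if strand == -1 else text

    if len(parts) == 1:
        return one(*parts[0])
    if style == "genbank" and all(strand == -1 for _, _, strand in parts):
        # Wrap the whole join once; the spans inside are listed in
        # forward-strand order, hence the reversal.
        inner = ",".join(span(s, e) for s, e, _ in reversed(parts))
        return "complement(join(%s))" % inner
    # EMBL-like form, also used for mixed strand features in either style.
    return "join(%s)" % ",".join(one(*p) for p in parts)


parts = [(4918, 5163, -1), (2691, 4571, -1)]
print(format_location(parts, "genbank"))
# complement(join(2691..4571,4918..5163))
print(format_location(parts, "embl"))
# join(complement(4918..5163),complement(2691..4571))
```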

Peter