[Open-bio-l] GenBank and EMBL - join(complement(...)) vs complement(join(...))
Chris Fields
cjfields at illinois.edu
Sat Jan 9 02:54:41 UTC 2010
On Jan 8, 2010, at 11:33 AM, Peter wrote:
> Hi all,
>
> Currently Biopython reads both GenBank and EMBL files, and writes GenBank.
> I'm looking at writing EMBL files too - and wanted to see if any of you knew
> anything definitive on join(complement(...)) vs complement(join(...)) in
> feature location strings.
>
> References:
> http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
> http://www.genbank.lipi.go.id/docs/FTv6_2.html
>
> Both give this example, showing two ways of writing the same location:
>
> complement(join(2691..4571,4918..5163))
> Joins regions 2691 to 4571 and 4918 to 5163, then
> complements the joined segments (the feature is
> on the strand complementary to the presented strand)
>
> join(complement(4918..5163),complement(2691..4571))
> Complements regions 4918 to 5163 and 2691 to 4571,
> then joins the complemented segments (the feature is
> on the strand complementary to the presented strand)
>
> This suggests that either form is valid in both GenBank and EMBL
> format files.
>
> Anecdotally, I have observed GenBank uses the first form (which is
> shorter) while EMBL seems to use the second form (which to me is
> logical, if you consider how to represent mixed strand features).
> This seems to fit with this BioPerl wiki page:
>
> http://www.bioperl.org/wiki/BioPerl_Locations
>
> Is there any official documentation regarding this discrepancy that
> I have overlooked? Am I right to think that GenBank and EMBL do
> still use these different forms (any word on whether they might
> standardise one way or the other in future?)
>
> What do EMBOSS, BioPerl, etc do in this situation? Do you treat
> these two examples the same on parsing, and use one layout
> when writing GenBank and the other for writing EMBL files?
>
> Peter
I can't recall which of the two forms BioPerl writes, but if it helps: it parses both and standardizes on one of them for output. I think GenBank and EMBL have converged on using the same format, but I'm not absolutely sure of that.
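To make "parses both, writes one" concrete, here is a rough sketch in plain Python. It is not BioPerl's or Biopython's actual code. The names parse_location and format_location and the "outer"/"inner" style labels are made up for illustration, and it only handles simple start..end ranges (no fuzzy positions, remote references, or order(...)). Both spellings from the feature table example reduce to the same list of parts, and either spelling can be written back out.

```python
import re

_RANGE = re.compile(r"^(\d+)\.\.(\d+)$")


def _split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def parse_location(loc):
    """Return [(start, end, strand), ...] in the order the parts are read."""
    loc = loc.strip()
    if loc.startswith("complement(") and loc.endswith(")"):
        inner = parse_location(loc[len("complement("):-1])
        # Complementing a whole join flips every strand and reverses the order.
        return [(s, e, -strand) for s, e, strand in reversed(inner)]
    if loc.startswith("join(") and loc.endswith(")"):
        parts = []
        for piece in _split_top_level(loc[len("join("):-1]):
            parts.extend(parse_location(piece))
        return parts
    match = _RANGE.match(loc)
    if match is None:
        raise ValueError(f"unsupported location: {loc!r}")
    return [(int(match.group(1)), int(match.group(2)), 1)]


def format_location(parts, style="inner"):
    """Write parts back out; style is a hypothetical 'outer' or 'inner' label."""
    if not parts:
        raise ValueError("no parts to write")
    if style == "outer" and all(strand == -1 for _, _, strand in parts):
        # complement(join(...)) lists the ranges in forward-strand order.
        ranges = ",".join(f"{s}..{e}" for s, e, _ in reversed(parts))
        body = f"join({ranges})" if len(parts) > 1 else ranges
        return f"complement({body})"
    # Per-part complement also copes with mixed-strand features.
    pieces = [
        f"complement({s}..{e})" if strand == -1 else f"{s}..{e}"
        for s, e, strand in parts
    ]
    return f"join({','.join(pieces)})" if len(pieces) > 1 else pieces[0]


outer = parse_location("complement(join(2691..4571,4918..5163))")
inner = parse_location("join(complement(4918..5163),complement(2691..4571))")
print(outer == inner)
print(format_location(outer, style="outer"))
print(format_location(outer, style="inner"))
```

A writer could then pick the style per output format, if the two formats do turn out to differ. Mixed-strand features always fall back to the per-part form, since a single outer complement cannot express them.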
Ironic actually that I can't remember, as I'm the author of the above page and started a discussion about this very subject a while back on the list (in an effort to sort out some issues with BioPerl locations).
chris