[Biopython-dev] [Bug 3069] More robust feature parser for GenBank/EMBL records

bugzilla-daemon at portal.open-bio.org
Fri May 14 09:33:48 EDT 2010


http://bugzilla.open-bio.org/show_bug.cgi?id=3069

------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk  2010-05-14 09:33 EST -------
(In reply to comment #11)
> 
> I think we should probably output all IMGT records using the increased
> indentation.  This way there will be no ambiguity and no information loss.  If
> you want to manually "convert" to standard EMBL format, I think the truncation
> makes sense as you proposed it, and we could issue a warning about lost
> information.

I've found a page describing the IMGT file format, and it says the feature
table indent should be 26 characters (while EMBL files use 21):
http://www.ebi.ac.uk/imgt/hla/docs/manual.html
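
To illustrate what the different indent means for a parser, here is a rough
sketch (not the actual Biopython code). The function names are hypothetical,
and it assumes the feature key sits between the "FT" prefix and the indent
column, with the location or qualifier text starting at the indent. The widths
21 and 26 are simply the values quoted above.

```python
def split_feature_line(line, indent=21):
    """Split one feature table line into (key, value) at a fixed indent.

    indent=21 is the EMBL width and indent=26 the IMGT width quoted above.
    The key is an empty string on continuation and qualifier lines.
    """
    if not line.startswith("FT"):
        raise ValueError("Not a feature table line: %r" % line)
    key = line[2:indent].strip()
    value = line[indent:].rstrip()
    return key, value


def guess_feature_indent(lines):
    """Guess the indent from the first qualifier line (text starting with '/').

    Returns None if no qualifier line is found. This is a heuristic for
    illustration only.
    """
    for line in lines:
        if not line.startswith("FT"):
            continue
        text = line[2:].lstrip()
        if text.startswith("/"):
            return len(line) - len(text)
    return None


# Build example lines by padding, so the column positions are explicit.
embl_line = "FT   " + "source".ljust(16) + "1..100"   # value starts at offset 21
imgt_line = "FT   " + "source".ljust(21) + "1..100"   # value starts at offset 26

print(split_feature_line(embl_line, indent=21))
print(split_feature_line(imgt_line, indent=26))
```

Parsing the IMGT style line with the EMBL indent would leave stray leading
spaces in the value, which is one reason a fixed per-format indent is simpler
than trying to cope with both widths in one parser.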

> 
> I have already notified IMGT regarding the ">" problem, though they seem like
> they will be slow to change it.  It's a very simple fix to the flatfile, and I
> did it manually with regular expressions.  My preference is that we do NOT
> support the backwards notation, as it's clearly wrong.  We'll have them fix
> it. In the meanwhile, I can post my python script that corrects it somewhere
> (maybe as a gist on github) and we can just point people to it in a warning if
> they are using the IMGT parser.
> 
> Regarding the 1. problem, I have not yet told the IMGT people, but I will do
> so shortly.
>

The document I found does not discuss the details of feature locations, so I
would expect them to follow the same rules as EMBL (and GenBank and DDBJ); see:
http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

I now agree with you that it makes sense to treat this as a new format in SeqIO
(i.e. "imgt" rather than "embl"). The amount of new code needed should be
minimal too.

Peter

