[Biopython-dev] IMGT parser (modified EMBL format)
Peter
biopython at maubp.freeserve.co.uk
Tue Aug 24 11:56:57 UTC 2010
Hi all,
IMGT is the international ImMunoGeneTics information system, a global
reference in immunogenetics and immunoinformatics. It provides sequence
databases, a genome database, a structure database, and a monoclonal
antibodies database.
IMGT uses a variant of the EMBL flat file format with longer feature indents:
http://imgt.cines.fr/download/LIGM-DB/userman_doc.html
http://imgt.cines.fr/download/LIGM-DB/ftable_doc.html
http://www.ebi.ac.uk/imgt/hla/docs/manual.html
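To illustrate what a longer feature indent means for a parser, here is a
minimal sketch in plain Python. It is not the Biopython code. The two column
offsets (21 for standard EMBL, 25 for IMGT), the function name and the long
feature key are assumptions made up for this example, so check them against
the documents above before relying on them.

```python
# Assumed 0-based column at which the feature location starts.
# Both values are assumptions for illustration, not taken from the manuals.
EMBL_LOCATION_OFFSET = 21
IMGT_LOCATION_OFFSET = 25


def split_feature_line(line, location_offset):
    """Split one 'FT' line into (feature_key, location) at a fixed column."""
    if not line.startswith("FT"):
        raise ValueError("not a feature table line: %r" % line)
    key = line[2:location_offset].strip()
    location = line[location_offset:].strip()
    return key, location


# Build the example lines programmatically so the columns are exact.
# "LONG-EXAMPLE-KEY-1" is a hypothetical key, longer than EMBL's key column.
embl_line = "FT   " + "source".ljust(16) + "1..1859"
imgt_line = "FT   " + "LONG-EXAMPLE-KEY-1".ljust(20) + "1..1859"

embl_result = split_feature_line(embl_line, EMBL_LOCATION_OFFSET)
imgt_result = split_feature_line(imgt_line, IMGT_LOCATION_OFFSET)

# Using the shorter EMBL offset on the IMGT-style line cuts the key in two.
wrong_result = split_feature_line(imgt_line, EMBL_LOCATION_OFFSET)
```

The point of the sketch is that a parser written for one indent only needs
the offset to be configurable in order to handle the other.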
Uri and I have been working on extending the SeqIO EMBL/GenBank parser
and writer to support IMGT files too. This uncovered a number of data
formatting issues (e.g. wrong sequence length in the ID line, partial
feature locations), and Uri has been liaising with the IMGT curators to
address these. With their latest (Aug 2010) release, we can now parse the
whole file without errors:
http://imgt.cines.fr/download/LIGM-DB/imgt.dat.Z
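As a rough idea of how a wrong sequence length in the ID line can be spotted,
here is a small standalone check in plain Python. It is only a sketch and not
what the SeqIO parser does: it assumes the ID line ends in "<number> BP." and
that the sequence sits between the SQ line and the "//" terminator. The
function name and the example record are invented for illustration.

```python
import re


def check_declared_length(record_text):
    """Return (declared, actual) sequence lengths for one EMBL-style record.

    'declared' is the number before 'BP.' on the ID line (None if absent);
    'actual' is the count of letters between the SQ line and '//'.
    """
    declared = None
    actual = 0
    in_sequence = False
    for line in record_text.splitlines():
        if line.startswith("ID"):
            match = re.search(r"(\d+)\s+BP\.?\s*$", line)
            if match:
                declared = int(match.group(1))
        elif line.startswith("SQ"):
            in_sequence = True
        elif line.startswith("//"):
            in_sequence = False
        elif in_sequence:
            # Sequence lines mix residues, spaces and a trailing counter;
            # only the letters are residues.
            actual += sum(1 for ch in line if ch.isalpha())
    return declared, actual


# Invented record whose ID line claims 25 bp but which holds 20 residues.
record = """\
ID   EXAMPLE1; SV 1; linear; genomic DNA; STD; HUM; 25 BP.
SQ   Sequence 20 BP; 5 A; 5 C; 5 G; 5 T; 0 other;
     acgtacgtac gtacgtacgt                                                    20
//
"""

declared, actual = check_declared_length(record)
if declared != actual:
    print("ID line says %s bp, sequence has %i" % (declared, actual))
```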
I think this code is now ready to merge - comments welcome:
http://github.com/peterjc/biopython/commits/seqio-imgt
Potentially we could even include this in Biopython 1.55, although the
more cautious option would be not to add any new features between the
beta and the final release...
Peter