[Biopython-dev] IMGT parser (modified EMBL format)
Uri Laserson
laserson at mit.edu
Tue Aug 24 10:35:01 EDT 2010
Hi all,
I would obviously prefer it to go into the distribution as soon as
possible, but I don't want to mess with the releases. The IMGT people said
they'll put a news announcement on their site and a link to Biopython once
the code is in the official release.
Uri
On Tue, Aug 24, 2010 at 07:56, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Hi all,
>
> The IMGT is the international ImMunoGeneTics information system, a global
> reference in immunogenetics and immunoinformatics. They have sequence
> databases, a genome database, a structure database, and a monoclonal
> antibodies database.
>
> The IMGT use a variant of the EMBL flat file format with longer feature
> indents:
> http://imgt.cines.fr/download/LIGM-DB/userman_doc.html
> http://imgt.cines.fr/download/LIGM-DB/ftable_doc.html
> http://www.ebi.ac.uk/imgt/hla/docs/manual.html
>
> Uri and I have been working on extending the SeqIO EMBL/GenBank parser
> and writer to support IMGT files too. This uncovered a number of data
> formatting issues (e.g. wrong sequence length in ID line, partial feature
> locations) and Uri has been liaising with the IMGT curators to address
> these. With their latest (Aug 2010) release, we can now parse the whole
> file without errors:
> http://imgt.cines.fr/download/LIGM-DB/imgt.dat.Z
>
> I think this code is now ready to merge - comments welcome:
> http://github.com/peterjc/biopython/commits/seqio-imgt
>
> Potentially we could even include this in Biopython 1.55, although it would
> be more cautious not to add any new features between the beta and the
> final release...
>
> Peter
>
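
To illustrate the kind of data formatting issue mentioned in the quoted
message (wrong sequence length in the ID line), here is a minimal
standard-library sketch. It is not the code on the seqio-imgt branch. The
record is a hand-made, hypothetical example in an EMBL-like layout, and the
helper names (declared_length, actual_length, length_matches) are invented
for illustration. It assumes the ID line ends in an "N BP." field and that
the sequence lines sit between the SQ line and the "//" terminator.

```python
import re

# Hypothetical, hand-made record in an EMBL-like layout (not real IMGT data).
RECORD = """\
ID   X00001; SV 1; linear; genomic DNA; STD; HUM; 24 BP.
XX
SQ   Sequence 24 BP;
     acgtacgtac gtacgtacgt acgt                                               24
//
"""


def declared_length(record):
    """Return the length claimed on the ID line (assumed 'N BP.' at line end)."""
    for line in record.splitlines():
        if line.startswith("ID "):
            match = re.search(r"(\d+)\s+BP\.?\s*$", line)
            if match is None:
                raise ValueError("no 'BP' length on ID line: %r" % line)
            return int(match.group(1))
    raise ValueError("record has no ID line")


def actual_length(record):
    """Count the residue letters between the SQ line and the '//' terminator."""
    in_sequence = False
    count = 0
    for line in record.splitlines():
        if line.startswith("SQ "):
            in_sequence = True
        elif line.startswith("//"):
            break
        elif in_sequence:
            # Skip the spacing and the trailing position counter; keep letters.
            count += sum(ch.isalpha() for ch in line)
    return count


def length_matches(record):
    """True when the ID line length agrees with the sequence actually present."""
    return declared_length(record) == actual_length(record)
```

Running length_matches over each record of a flat file would flag entries
whose ID line disagrees with the sequence data, which is one way such
problems could be collected before reporting them to the curators.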
--
Uri Laserson
Graduate Student, Biomedical Engineering
Harvard-MIT Division of Health Sciences and Technology
M +1 917 742 8019
laserson at mit.edu