[Biopython-dev] Merging Uniprot XML parser?

Peter biopython at maubp.freeserve.co.uk
Wed Nov 3 14:02:48 UTC 2010


On Tue, Oct 19, 2010 at 4:54 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Hi all,
>
> I've fixed a few issues I felt were holding up merging Andrea's UniProt
> XML parser.
>
> I've now tested that uniprot_sprot.txt and uniprot_sprot.xml are parsed
> into more or less equivalent objects, and that these can be written out
> as GenBank (well, GenPept) files or as EMBL/IMGT files (given recent
> work to support protein EMBL files - which do exist but are rarely used).
>
> This required "fixing" Bug 3026 to cope with long annotations that cannot
> be line wrapped nicely (lots of long URL strings in UniProt XML comments).
> http://bugzilla.open-bio.org/show_bug.cgi?id=3026
> I'm tempted to remove the warning because it is so common... or make
> it use the same text each time so you get warned once.
>
> There are also some additions to the Bio.SeqFeature position classes,
> since SwissProt/UniProt files can have uncertain positions.
>
> Could someone take a look at the code here (a rebased branch), as I'd
> like some independent testing (and better yet, code review):
> http://github.com/peterjc/biopython/tree/uniprot

I've now merged this into the trunk (with a git rebase first so the history
is linear - no branch+merge), and Andrea has agreed to retest it.
Other testing and comments are most welcome.
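
On the warn-once idea quoted above: with Python's "default" warning
filter, a warning raised from the same line with the same text is shown
only once, while a message that embeds per-record details is shown every
time. Here is a minimal sketch of that behaviour using only the standard
library. The helper wrap_annotation and its fixed_message flag are
hypothetical names for illustration, not the actual Bio.SeqIO code.

```python
import warnings


def wrap_annotation(text, width=80, fixed_message=True):
    """Hypothetical helper: warn if a word is too long to wrap at width."""
    longest = max((len(word) for word in text.split()), default=0)
    if longest > width:
        if fixed_message:
            # Same text on every call, so the "default" filter
            # reports it once for this source line.
            msg = "Annotation contains a word too long to wrap"
        else:
            # Text varies with the data, so each distinct message is reported.
            msg = "Annotation word of length %i is too long to wrap" % longest
        warnings.warn(msg, UserWarning)
    return text


def count_warnings(fixed_message):
    """Count the warnings shown for three over-long URL-like annotations."""
    annotations = ["http://example.org/" + "x" * n for n in (100, 150, 200)]
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("default")
        for annotation in annotations:
            wrap_annotation(annotation, fixed_message=fixed_message)
    return len(caught)


if __name__ == "__main__":
    print("fixed message:", count_warnings(True))
    print("varying message:", count_warnings(False))
```

With the fixed text the three long annotations give a single warning;
with the varying text each one is reported separately.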

Peter


