[Open-bio-l] SwissProt DE lines and UniProt XML / TagTree as XML in BioSQL

Chris Fields cjfields at illinois.edu
Thu Jan 21 08:34:12 EST 2010


Peter,

The relevant code is in Bio::Annotation::TagTree in bioperl-live, which is a decorator for Data::Stag:

http://search.cpan.org/~cmungall/Data-Stag-0.11/Data/Stag.pm

That is where the text output comes from. It's a fairly heavyweight solution, but it can round-trip the DE data and it parses the data into a structure that's approachable. We could probably abstract out the serialization backend there and allow a pure BioPerl solution (or the current solution) as a fallback.
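
To make "round-tripping" concrete, here is a rough Python sketch (not the TagTree/Data::Stag code, and not anything Biopython ships). The function names parse_de and format_de, the nested list-of-pairs layout, and the sample protein names are all made up for illustration. It only handles RecName/AltName/Flags blocks and ignores the nested Includes:/Contains: sections.

```python
# Illustrative DE block; the protein names are invented, not from a real entry.
SAMPLE = "\n".join([
    "DE   RecName: Full=Hypothetical kinase X;",
    "DE            Short=HKX;",
    "DE   AltName: Full=Example protein 1;",
    "DE   Flags: Precursor;",
])


def parse_de(text):
    """Parse DE lines into [(category, [(key, value), ...]), ...].

    A list of pairs is used instead of a dict so that repeated
    categories (several AltName blocks) and their order survive.
    """
    tree = []
    for line in text.splitlines():
        body = line[5:].strip()            # drop the "DE   " line code
        if not body:
            continue
        head, sep, rest = body.partition(": ")
        if sep and "=" not in head:        # e.g. "RecName: Full=..." opens a category
            tree.append((head, []))
            body = rest
        if not tree:
            raise ValueError("DE field found before any category: %r" % line)
        body = body.rstrip(";")
        key, eq, value = body.partition("=")
        if eq:
            tree[-1][1].append((key, value))
        else:
            tree[-1][1].append((None, body))   # bare values such as "Precursor"
    return tree


def format_de(tree):
    """Write the nested structure back out as DE lines."""
    lines = []
    for category, items in tree:
        prefix = category + ": "
        for key, value in items:
            field = value if key is None else key + "=" + value
            lines.append("DE   " + prefix + field + ";")
            prefix = " " * len(prefix)     # continuation lines align under the first field
    return "\n".join(lines)


tree = parse_de(SAMPLE)
assert format_de(tree) == SAMPLE           # text -> tree -> text is lossless here
```

The point is only that the tree keeps enough information (order, repeated categories, key names) to regenerate the original lines.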

If UniProt XML already represents the plain-text DE information as a hierarchy, we should probably conform to that as closely as possible (using a standard format such as XML or JSON).
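
As a rough illustration of what serialising such a hierarchy could look like, here is a small Python sketch using only the standard library. The element names simply mirror the DE keywords (RecName, Full, ...); they are placeholders, not the UniProt XML schema and not the XML that BioPerl currently writes, and the protein names are invented.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical nested DE structure: [(category, [(key, value), ...]), ...]
DE_TREE = [
    ("RecName", [("Full", "Hypothetical kinase X"), ("Short", "HKX")]),
    ("AltName", [("Full", "Example protein 1")]),
]


def tree_to_xml(tree):
    """Serialise the nested structure as an XML string."""
    root = ET.Element("DE")
    for category, items in tree:
        node = ET.SubElement(root, category)
        for key, value in items:
            ET.SubElement(node, key).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_tree(xml_text):
    """Rebuild the nested structure from the XML string."""
    root = ET.fromstring(xml_text)
    return [(node.tag, [(leaf.tag, leaf.text) for leaf in node]) for node in root]


def tree_to_json(tree):
    """Serialise the same structure as JSON (tuples become lists)."""
    return json.dumps(tree)


def json_to_tree(json_text):
    """Rebuild the structure from JSON, restoring the tuples."""
    return [(cat, [tuple(item) for item in items])
            for cat, items in json.loads(json_text)]


xml_text = tree_to_xml(DE_TREE)
assert xml_to_tree(xml_text) == DE_TREE
assert json_to_tree(tree_to_json(DE_TREE)) == DE_TREE
```

Either string could go into a single BioSQL text value. Whichever format is chosen, the important part is that every Bio* project agrees on the same element or key names.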

chris

On Jan 21, 2010, at 6:33 AM, Peter wrote:

> Hi all,
> 
> This is cross posted to try and ensure relevant people see it.
> I suggest we continue the discussion on the BioSQL list
> (for how to serialise structured annotation to BioSQL), and/or
> the OpenBio list (for things like file format naming conventions).
> 
> I am hoping we (Bio*) can be consistent in how we parse and load
> into BioSQL the SwissProt DE lines (known as "swiss" format in
> both BioPerl and Biopython's SeqIO, and by EMBOSS) or the
> equivalent UniProt XML tags (which we are tentatively going to
> call the "uniprot" format in Biopython's SeqIO - comments?).
> 
> Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss")
> files and load them into BioSQL. Biopython currently treats the DE
> (description) lines as one long string, as BioPerl used to:
> 
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html
> http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html
> 
> I understand that BioPerl now turns the SwissProt DE lines into a
> TagTree, and for storing this in BioSQL this gets serialised as XML.
> I would like Biopython to handle this the same way (although rather
> than a Perl TagTree, we'd use a Python structure of course), and
> would appreciate clarification of what exactly was implemented
> (e.g. which bit of the BioPerl source code should we look at,
> and could you show a worked example?).
> 
> Andrea Pierleoni (CC'd - not sure if he is on the BioSQL or
> Open-Bio lists yet) has started work on parsing UniProt XML
> files for Biopython. Here the DE description data is already
> provided broken up with XML markup. Hopefully their nested
> structure matches what BioPerl was doing with the SwissProt
> DE lines.
> 
> Regards,
> 
> Peter


