[Bioperl-l] [BioSQL-l] SwissProt DE lines and UniProt XML / TagTree as XML in BioSQL

Richard Holland holland at eaglegenomics.com
Fri Jan 22 10:51:52 UTC 2010


Nice idea. Currently, BioJava stores the complete DE section as a single unparsed string. It does provide a parser module that converts that string into a useful tag/value format within a user's program, but the parsed form is not what gets stored in BioSQL.
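
To make the tag/value idea concrete, here is a minimal Python sketch of splitting structured DE lines into nested (category, tag/value) groups and serialising them as XML. It is only an illustration under stated assumptions: the DE block is invented (not a real entry), the function names parse_de_lines and to_xml are hypothetical, the XML element names are my own choice, and none of this is claimed to match BioPerl's TagTree layout or BioJava's parser. Includes:/Contains: sections are not handled.

```python
import xml.etree.ElementTree as ET

# Invented DE block in the structured SwissProt style (not a real entry).
DE_LINES = """\
DE   RecName: Full=Example kinase 1;
DE            Short=EK1;
DE            EC=2.7.11.1;
DE   AltName: Full=Hypothetical protein X;
DE   Flags: Precursor;
"""


def parse_de_lines(text):
    """Return a list of (category, [(tag, value), ...]) pairs in file order."""
    groups = []
    for line in text.splitlines():
        if not line.startswith("DE"):
            continue
        body = line[2:].strip().rstrip(";")
        if not body:
            continue
        head, sep, rest = body.partition(": ")
        if sep and "=" not in head:
            # Start of a new category such as RecName, AltName or Flags.
            groups.append((head, []))
            body = rest
        elif not groups:
            # Old-style free-text DE line: keep it as one opaque value.
            groups.append(("Text", []))
        if "=" in body:
            tag, _, value = body.partition("=")
        else:
            # Untagged content (e.g. Flags); "value" is an arbitrary tag name.
            tag, value = "value", body
        groups[-1][1].append((tag.strip(), value.strip()))
    return groups


def to_xml(groups):
    """Serialise the nested groups as XML (element names are assumptions)."""
    root = ET.Element("description")
    for category, pairs in groups:
        node = ET.SubElement(root, category)
        for tag, value in pairs:
            ET.SubElement(node, tag).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_xml(parse_de_lines(DE_LINES)))
```

Keeping the groups as an ordered list rather than a dict preserves repeated categories (several AltName blocks are common), which matters if the structure is to round-trip through BioSQL.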

On 21 Jan 2010, at 12:33, Peter wrote:

> Hi all,
> 
> This is cross posted to try and ensure relevant people see it.
> I suggest we continue the discussion on the BioSQL list
> (for how to serialise structured annotation to BioSQL), and/or
> the OpenBio list (for things like file format naming conventions).
> 
> I am hoping we (Bio*) can be consistent in how we parse and load
> into BioSQL the SwissProt DE lines (known as "swiss" format in
> both BioPerl and Biopython's SeqIO, and by EMBOSS) or the
> equivalent UniProt XML tags (which we are tentatively going to
> call the "uniprot" format in Biopython's SeqIO - comments?).
> 
> Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss")
> files and load them into BioSQL. Biopython currently treats the DE
> comment lines as a long string, as BioPerl used to:
> 
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html
> http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html
> 
> I understand that BioPerl now turns the SwissProt DE lines into a
> TagTree, and for storing this in BioSQL this gets serialised as XML.
> I would like Biopython to handle this the same way (although rather
> than a Perl TagTree, we'd use a Python structure of course), and
> would appreciate clarification of what exactly was implemented
> (e.g. which bit of the BioPerl source code should we look at,
> and could you show a worked example?).
> 
> Andrea Pierlenoin (CC'd - not sure if he is on the BioSQL or
> Open-Bio lists yet) has started work on parsing UniProt XML
> files for Biopython. Here the DE comment lines are already
> provided broken up with XML markup. Hopefully their nested
> structure matches what BioPerl was doing with the SwissProt
> DE lines.
> 
> Regards,
> 
> Peter
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
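
On the UniProt XML side, a rough Python sketch of pulling the already-structured protein names into nested tag/value groups might look like the following. The XML fragment is invented for illustration, the function name protein_names is hypothetical, and this is not the Biopython parser mentioned above; it only assumes the usual UniProt XML namespace and the protein/recommendedName/alternativeName nesting.

```python
import xml.etree.ElementTree as ET

# Namespace used by UniProt XML documents.
NS = {"up": "http://uniprot.org/uniprot"}

# Invented fragment shaped like the <protein> element of a UniProt XML entry.
ENTRY_XML = """\
<entry xmlns="http://uniprot.org/uniprot">
  <protein>
    <recommendedName>
      <fullName>Example kinase 1</fullName>
      <shortName>EK1</shortName>
    </recommendedName>
    <alternativeName>
      <fullName>Hypothetical protein X</fullName>
    </alternativeName>
  </protein>
</entry>
"""


def local_name(element):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return element.tag.split("}", 1)[-1]


def protein_names(entry_xml):
    """Return [(category, [(tag, value), ...]), ...] for one entry."""
    entry = ET.fromstring(entry_xml)
    protein = entry.find("up:protein", NS)
    groups = []
    if protein is None:
        return groups
    for name_group in protein:
        pairs = [(local_name(child), (child.text or "").strip())
                 for child in name_group]
        groups.append((local_name(name_group), pairs))
    return groups


if __name__ == "__main__":
    for category, pairs in protein_names(ENTRY_XML):
        print(category, pairs)
```

The result has the same list-of-groups shape one would get from splitting the plain-text DE lines by category, so mapping tag names between the two formats (for example fullName to Full) would be the remaining step before storing either in BioSQL.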

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/




