[Biopython-dev] New: Uniprot XML parser

Peter Cock p.j.a.cock at googlemail.com
Thu Jan 14 18:04:36 EST 2010


On Thu, Jan 14, 2010 at 10:41 PM, Andrea Pierleoni
<andrea at biocomp.unibo.it> wrote:
>
>>
>> By default, copy the "swiss" parser. If that doesn't have the
>> annotation, see if there is anything similar in the "genbank"
>> parser (effectively our reference for rich annotation parsing).
>> If in doubt, for now discard the data with a comment in the
>> code - and then discuss it here.
>>
>> Peter
>>
> I'll take a look at both the swissprot and genbank parsers.
> Right now the annotation parsing scheme is based on the XML schema,
> e.g.
> <comment type="function">
> <text>function text</text>
> </comment>
>
> is parsed in the annotations as:
>
> seqrecord.annotations['comment_function']=['function text']
>
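
For anyone following along, the mapping described above can be
sketched roughly as follows. This is only a standalone illustration
using the standard library's xml.etree.ElementTree, not the actual
parser code; the helper name comments_to_annotations and the <entry>
wrapper element are made up for the example, and it ignores the XML
namespace that real UniProt files declare.

```python
import xml.etree.ElementTree as ET


def comments_to_annotations(xml_text):
    """Hypothetical helper (not part of Biopython).

    Collects <comment type="X"><text>...</text></comment> elements
    into a dict keyed as 'comment_X', each value a list of strings.
    """
    annotations = {}
    root = ET.fromstring(xml_text)
    for comment in root.iter("comment"):
        ctype = comment.get("type")
        text = comment.findtext("text")
        if ctype is None or text is None:
            # Skip comments without a type or a <text> child.
            continue
        annotations.setdefault("comment_" + ctype, []).append(text)
    return annotations


# Assumed minimal input: no namespace, a single wrapper element.
example = """<entry>
<comment type="function">
<text>function text</text>
</comment>
</entry>"""

print(comments_to_annotations(example))
```

Using a list per key means several comments of the same type on
one record are all kept rather than overwriting each other.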

My reasoning is that it should be (almost) transparent for
users to switch from parsing the plain text SwissProt
files ("swiss") to the XML form. There are also knock-on
implications for saving to BioSQL and for file format
conversions, e.g. saving as a GenBank protein file
(aka GenPept format).

However, the comment parsing in the plain text "swiss"
format is currently a little simplistic - partly to match
what BioPerl did at the time. We can revisit that as
part of this work.

Peter

