[Biopython] Some help to access "hidden" features :-)

Téletchéa Stéphane stephane.teletchea at inserm.fr
Fri Mar 8 14:46:02 UTC 2013


On 07/03/2013 23:19, Peter Cock wrote:
>
> Excellent - a self contained example :)

:-)

> That makes it much easier for us to see what you're
> doing and how to help. Thank you.
>
>> In the Uniprot file, there are annotations for the 1AFO model:
>> NMR method, starts at 81 and ends at 120.
>>
>> The corresponding entry in the xml file is:
>>
>> <dbReference type="PDB" id="1AFO">
>> <property type="method" value="NMR"/>
>> <property type="chains" value="A/B=81-120"/>
>> </dbReference>
>>
>> According to the module source code
>> (http://biopython.org/DIST/docs/api/Bio.SeqIO.UniprotIO-pysrc.html),
>> it is possible to access this data; it is handled correctly:
>>
>>          def  _parse_dbReference(element):
>>              self.ParsedSeqRecord.dbxrefs.append(element.attrib['type'] + ':' + element.attrib['id'])
>>              ...
> As you will have seen, the SeqRecord's dbxrefs does get
> populated with the key information - but this is (based on
> usage in other file formats) a very simple list of strings.
>
> Right now the extra information *is not returned*, mainly as
> it doesn't naturally map onto the existing SeqRecord model.

OK, I suspected this; thank you for confirming it.
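
As a minimal, stdlib-only illustration (this is not Biopython code, and the helper name `flatten_dbreference` is hypothetical), the sketch below mimics the `type:id` string that the quoted `_parse_dbReference` line appends to `dbxrefs`, and lists the `<property>` details that such a string cannot hold:

```python
import xml.etree.ElementTree as ET

# The dbReference entry quoted above (UniProt namespace omitted for brevity).
SNIPPET = """
<dbReference type="PDB" id="1AFO">
  <property type="method" value="NMR"/>
  <property type="chains" value="A/B=81-120"/>
</dbReference>
""".strip()

def flatten_dbreference(element):
    """Hypothetical helper: build the same 'type:id' string that the
    quoted parser line appends to SeqRecord.dbxrefs."""
    return element.attrib["type"] + ":" + element.attrib["id"]

element = ET.fromstring(SNIPPET)
print(flatten_dbreference(element))  # PDB:1AFO

# The nested <property> children hold the extra details (method, chains),
# none of which survive in the flattened string above.
dropped = {prop.attrib["type"]: prop.attrib["value"]
           for prop in element.findall("property")}
print(dropped)  # {'method': 'NMR', 'chains': 'A/B=81-120'}
```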

> A little later in that method you'd have seen a comment:
> "TODO - How best to store these, do SeqFeatures make sense?"
> and the following lines created a SeqFeature object, but never
> added it to the returned SeqRecord. Elsewhere the UniProt
> file does have things we store as SeqFeature objects - so
> doing this for the database reference information is a bit
> odd. Perhaps we'd be better off following the approach
> used for references in GenBank files instead? I'm unclear
> what is best (partly since I don't use these bits of data).
>
> What do you think the parser should do with this data?

Bah, parse it all :-) Seriously, it is very nice that so many fields are
already parsed properly. I'm not involved enough to give an opinion on this.
At first I assumed your question meant "maybe this should be posted
elsewhere", but after a second reading I understand better.


> [Note that in this situation you might be better off using one
> of the Python standard library modules to work with the XML
> directly (e.g. ElementTree or cElementTree) if you need
> all the details in the UniProt XML file which are not yet
> handled in the conversion to a SeqRecord object.]

Yes, this is what I will do. This information was only optional, really,
and I was planning to retrieve it by other means too.
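
Following Peter's suggestion, a rough sketch of the ElementTree route might look like the following. It assumes the standard UniProt XML namespace (`http://uniprot.org/uniprot`) and that a `chains` value is a comma-separated list of `IDS=START-END` segments such as `A/B=81-120`; the helper names `parse_chains` and `pdb_references` and the cut-down entry are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

# Assumed default namespace declared on the root of UniProt XML files.
NS = "{http://uniprot.org/uniprot}"

# A cut-down, illustrative UniProt entry containing the 1AFO reference.
EXAMPLE = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <dbReference type="PDB" id="1AFO">
      <property type="method" value="NMR"/>
      <property type="chains" value="A/B=81-120"/>
    </dbReference>
  </entry>
</uniprot>"""

def parse_chains(value):
    """Turn a 'chains' value such as 'A/B=81-120' into a list of
    (chain IDs, start, end) tuples; unparseable segments are skipped."""
    ranges = []
    for segment in value.split(","):
        match = re.fullmatch(r"\s*([^=]+)=(\d+)-(\d+)\s*", segment)
        if match:
            chains, start, end = match.groups()
            ranges.append((chains.split("/"), int(start), int(end)))
    return ranges

def pdb_references(root):
    """Yield (PDB id, method, chain ranges) for every PDB dbReference."""
    for ref in root.iter(NS + "dbReference"):
        if ref.get("type") != "PDB":
            continue
        props = {p.get("type"): p.get("value")
                 for p in ref.findall(NS + "property")}
        yield ref.get("id"), props.get("method"), parse_chains(props.get("chains", ""))

root = ET.fromstring(EXAMPLE)
for pdb_id, method, ranges in pdb_references(root):
    print(pdb_id, method, ranges)  # 1AFO NMR [(['A', 'B'], 81, 120)]
```

For a downloaded entry, `ET.parse(path).getroot()` returns the same kind of root element; for very large files, `ET.iterparse` avoids loading everything into memory at once.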

>
> Regards,
>
> Peter

Thanks a lot for the quick answer, and keep up the good work on Biopython;
it is really appreciated.

Stéphane

-- 
Equipe DSIMB - Dynamique des Structures et
des Interactions des Macromolécules Biologiques
INTS, INSERM-Paris-Diderot UMR-S665
6 rue Alexandre Cabanel - 75739 Paris cedex 15- France
Tel: +33 144 493 057
Fax : +33 147 347 431
http://www.dsimb.inserm.fr / http://steletch.free.fr




