[Biopython] processing XML files in Biopython

David Suárez Pascal david.suarez at yahoo.com
Mon Jun 6 14:37:43 UTC 2011


Sheila,
I don't think you need to deal with the XML yourself. I tried your code, and
Entrez.read has already parsed the data for you.
What I get back from your code is a list:
>>> type(record)
<class 'Bio.Entrez.Parser.ListElement'>

which contains a dict with the following keys:
>>> record[0].keys()
[u'GBSeq_moltype',
 u'GBSeq_source',
 u'GBSeq_sequence',
 u'GBSeq_primary-accession',
 u'GBSeq_definition',
 u'GBSeq_accession-version',
 u'GBSeq_topology',
 u'GBSeq_length',
 u'GBSeq_feature-table',
 u'GBSeq_create-date',
 u'GBSeq_other-seqids',
 u'GBSeq_division',
 u'GBSeq_taxonomy',
 u'GBSeq_comment',
 u'GBSeq_source-db',
 u'GBSeq_references',
 u'GBSeq_update-date',
 u'GBSeq_organism',
 u'GBSeq_locus']

If you get the same response, you can look up any field by its key:
>>> record[0]['GBSeq_locus']
'NP_997807'
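
For the two things you asked about (GeneID and amino acid length), here is a
rough sketch rather than a definitive recipe. It assumes the usual GBSeq
layout, where each entry in 'GBSeq_feature-table' carries a 'GBFeature_quals'
list of 'GBQualifier_name' / 'GBQualifier_value' pairs, and where the GeneID
appears as a db_xref qualifier of the form 'GeneID:...'. I have not checked
that against this particular record. The helper names and the sample values
are made up for illustration; the sample dict only stands in for record[0].

```python
def protein_length(gbseq):
    """Return the sequence length as an int.

    int() normalizes the value, since the parser may hand it back as a string.
    """
    return int(gbseq["GBSeq_length"])


def db_xrefs(gbseq, prefix="GeneID:"):
    """Collect db_xref qualifier values that start with `prefix`.

    The prefix is stripped and duplicates are dropped, keeping first-seen order.
    """
    found = []
    for feature in gbseq.get("GBSeq_feature-table", []):
        for qual in feature.get("GBFeature_quals", []):
            if qual.get("GBQualifier_name") != "db_xref":
                continue
            value = qual.get("GBQualifier_value", "")
            if value.startswith(prefix):
                ident = value[len(prefix):]
                if ident not in found:
                    found.append(ident)
    return found


# Hand-made stand-in for record[0]; every value here is invented.
sample = {
    "GBSeq_locus": "NP_997807",
    "GBSeq_length": "123",
    "GBSeq_feature-table": [
        {"GBFeature_key": "source",
         "GBFeature_quals": [
             {"GBQualifier_name": "db_xref",
              "GBQualifier_value": "taxon:9606"}]},
        {"GBFeature_key": "CDS",
         "GBFeature_quals": [
             {"GBQualifier_name": "db_xref",
              "GBQualifier_value": "GeneID:12345"}]},
    ],
}

print(protein_length(sample))  # 123
print(db_xrefs(sample))        # ['12345']
```

With the real data you would pass record[0] instead of sample. If db_xrefs
comes back empty, print record[0]['GBSeq_feature-table'] and look at where
the qualifiers actually sit.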

I hope this helps.

David

2011/6/6 Sheila the angel <from.d.putto at gmail.com>

> Hi All,
>
> I am new to Biopython. I have a simple question: how can I process XML files
> in Biopython?
> For example I have NCBI Reference Sequence ID 'NP_997807.1'
> I want to download the XML file and extract certain information
> (e.g. GeneID, amino acid length, etc.).
> To download the file I did:
>
> from Bio import Entrez
> handle = Entrez.efetch(db="protein", id="NP_997807.1", retmode="xml")
> record = Entrez.read(handle)
> handle.close()
>
> Now I have no clue how to extract certain information (like GeneID) :(
> Please help.
>
> --
> Cheers
>
> Sheila d. Angela


