[Biopython] processing XML files in Biopython

Mon Jun 6 13:35:15 UTC 2011

On Mon, Jun 6, 2011 at 2:29 PM, Sheila the angel <from.d.putto at gmail.com> wrote:
> Hi All,
>
> I am new to Biopython. I have a simple question: 'How can I process XML
> files in Biopython?'
> For example, I have the NCBI Reference Sequence ID 'NP_997807.1'.

Personally I still download the plain text GenBank format file, and
use Biopython's Bio.SeqIO module to parse that.

> I want to download the 'xml' file and want to extract certain information
> (e.g. GeneID, amino acid length etc.).
> To download the file I did
>
> from Bio import Entrez
> handle = Entrez.efetch(db="protein", id="NP_997807.1", retmode="xml")
> record = Entrez.read(handle)
> handle.close()
>
> Now I have no clue how to extract certain information (like GeneID) :(
> plz help

If you want to use the XML, then the Bio.Entrez.read() call you already
have (or Bio.Entrez.parse() for files holding many records) should turn
it into a nested structure of Python objects (dicts and lists), which you
can explore by looking at the keys. Alternatively, Python comes with
several built-in XML parsers, such as ElementTree. That could be more
efficient if you only want one or two bits of information like a GeneID.
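
As a rough sketch of the ElementTree route: the snippet below assumes the
download is GenBank-style (GBSeq) XML, where the length sits in a
GBSeq_length element and the GeneID appears as a db_xref qualifier. The
embedded sample is a made-up, heavily trimmed record (the GeneID 12345 and
the locus name are placeholders), and extract_summary is a hypothetical
helper name, so check the element names against your real file first.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed GBSeq-style record. The values are
# placeholders for illustration, not real NCBI data.
SAMPLE_XML = """<GBSet>
  <GBSeq>
    <GBSeq_locus>EXAMPLE</GBSeq_locus>
    <GBSeq_length>10</GBSeq_length>
    <GBSeq_feature-table>
      <GBFeature>
        <GBFeature_key>CDS</GBFeature_key>
        <GBFeature_quals>
          <GBQualifier>
            <GBQualifier_name>db_xref</GBQualifier_name>
            <GBQualifier_value>GeneID:12345</GBQualifier_value>
          </GBQualifier>
        </GBFeature_quals>
      </GBFeature>
    </GBSeq_feature-table>
  </GBSeq>
</GBSet>
"""


def extract_summary(xml_text):
    """Return (sequence length, list of GeneIDs) for the first GBSeq record."""
    root = ET.fromstring(xml_text)
    seq = root.find("GBSeq")  # first record under the GBSet root
    length = int(seq.findtext("GBSeq_length"))

    gene_ids = []
    # Walk every qualifier in the record and keep the GeneID cross-references.
    for qual in seq.iter("GBQualifier"):
        name = qual.findtext("GBQualifier_name")
        value = qual.findtext("GBQualifier_value", default="")
        if name == "db_xref" and value.startswith("GeneID:"):
            gene_ids.append(value.split(":", 1)[1])
    return length, gene_ids


length, gene_ids = extract_summary(SAMPLE_XML)
print(length, gene_ids)
```

With a real download you would pass in the text read from the efetch
handle instead of SAMPLE_XML. The same GeneID can show up on more than one
feature, so you may want to de-duplicate the list.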

Peter