[Biopython] can I use the xml parser in biopython on other xml files? how?

Thu Jul 12 09:35:09 UTC 2012

On Thu, Jul 12, 2012 at 9:53 AM, Wheaton Little <wheatontrue at gmail.com> wrote:
> I would like to use the Biopython xml parser, if possible, on google
> patent xmls:
>
> http://www.google.com/googlebooks/uspto-patents-applications-text.html
>
> unfortunately, this is what I get:
>
>>>> t=open('ipa111229.xml','r').read()
>>>> import Bio
>>>> ttt=Bio.Entrez.read(t[:30000])
>
> Traceback (most recent call last):
>   ...
> TypeError: argument must have 'read' attribute
>
> What would I have to do to use the parser on this xml?

In your example, you opened the file and read all the data into a
string (variable t).

The parser expects a handle (an open file-like object), not a string.
String objects have no 'read' method, hence the error message.
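
To illustrate the difference between a string and a handle, here is a
small sketch using only the standard library. The XML text is a made-up
stand-in for the file contents, and io.StringIO is just one way of
getting a handle from data that is already in a string:

```python
import io

# Made-up stand-in for the contents of a file (not real patent data)
text = "<root><item>example</item></root>"

# A plain string has no read() method, which is what the parser asks for
string_has_read = hasattr(text, "read")

# io.StringIO wraps the same text in a file-like object (a handle)
handle = io.StringIO(text)
handle_has_read = hasattr(handle, "read")

# A handle can be read from incrementally, here the first 6 characters
first_chunk = handle.read(6)

print(string_has_read, handle_has_read, first_chunk)
```

An object returned by open() behaves like the StringIO handle here,
which is why passing the open file rather than its contents avoids the
TypeError.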

You could 'fix' this particular error by doing:

from Bio import Entrez

# Pass the open file handle itself, not the string returned by .read()
handle = open('ipa111229.xml', 'r')
ttt = Entrez.read(handle)
handle.close()

However, I doubt this will work as the Entrez parser is intended to be
used with the NCBI XML files only.

Python comes with several XML libraries in the standard library.
ElementTree (or cElementTree) is quite popular, but as Brandon points
out there are also DOM and SAX style parsers.
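
As a rough sketch of the ElementTree route, something along these lines
should work for a well-formed XML file. The sample document, the tag
names ('records', 'record', 'title') and the read_titles function are
all invented for illustration; the real patent files will have their
own structure, so the tag names would need adjusting:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; real patent XML uses different tags
SAMPLE = b"""<?xml version="1.0"?>
<records>
  <record id="1"><title>First example</title></record>
  <record id="2"><title>Second example</title></record>
</records>
"""


def read_titles(handle):
    """Return (id, title) pairs for each <record> element in the handle."""
    results = []
    # iterparse takes a handle and yields elements as they are completed,
    # so a large file does not have to be loaded into memory in one go
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == "record":
            results.append((elem.get("id"), elem.findtext("title")))
            elem.clear()  # drop the children of records already processed
    return results


# io.BytesIO stands in for a file opened with open(filename, 'rb')
titles = read_titles(io.BytesIO(SAMPLE))
print(titles)
```

For a small file, ET.parse(handle).getroot() is simpler; iterparse is
used here only because it copes better with large inputs.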

Peter