[Biopython] can I use the xml parser in biopython on other xml files? how?

Peter Cock p.j.a.cock at googlemail.com
Thu Jul 12 09:35:09 UTC 2012


On Thu, Jul 12, 2012 at 9:53 AM, Wheaton Little <wheatontrue at gmail.com> wrote:
> I would like to use the Biopython xml parser, if possible, on google
> patent xmls:
>
> http://www.google.com/googlebooks/uspto-patents-applications-text.html
>
> unfortunately, this is what I get:
>
>>>> t=open('ipa111229.xml','r').read()
>>>> import Bio
>>>> ttt=Bio.Entrez.read(t[:30000])
>
> Traceback (most recent call last):
>   ...
> TypeError: argument must have 'read' attribute
>
> What would I have to do to use the parser on this xml?

In your example, you opened the file and read all the data into a
string (variable t).

The parser expects a handle (an open file object), not a string.
String objects don't have a 'read' method, hence this error message.
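
To make the string/handle distinction concrete, here is a minimal sketch
using only the standard library. The sample text is made up, and
io.StringIO is my own addition for illustration (it is just one way of
getting a handle-like object from a string), not something your code used:

```python
import io

# Stands in for the string you got from open(...).read()
text = "<root><item>1</item></root>"

# A plain string has no read() method, which is what the parser complained about
print(hasattr(text, "read"))

# Wrapping the string in StringIO gives a handle-like object with read()
handle = io.StringIO(text)
print(hasattr(handle, "read"))

# read(n) returns the next n characters, just as it would for an open file
first_chunk = handle.read(6)
print(first_chunk)
```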

You could 'fix' this particular error by doing:

from Bio import Entrez

# Pass the open file handle itself, not the string returned by .read()
handle = open('ipa111229.xml', 'r')
ttt = Entrez.read(handle)
handle.close()

However, I doubt this will work, as the Entrez parser is intended to be
used only with NCBI XML files.

Python comes with several XML libraries in the standard library.
ElementTree (or cElementTree) is quite popular, but as Brandon points
out, there are also DOM- and SAX-style parsers.
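
As a rough sketch of the ElementTree route (the sample XML and the
element names "application" and "title" are hypothetical, chosen only for
illustration; I have not checked them against the actual patent files,
so you would need to adapt them to the real structure):

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; these element names are made up and are
# not taken from the real patent XML.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<applications>
  <application id="A1"><title>First example</title></application>
  <application id="A2"><title>Second example</title></application>
</applications>
"""


def read_titles(handle):
    """Return (id, title) pairs from an open XML file handle."""
    # ET.parse accepts a filename or an open file handle
    tree = ET.parse(handle)
    root = tree.getroot()
    return [(app.get("id"), app.findtext("title"))
            for app in root.iter("application")]


# BytesIO stands in for a file opened with open(filename, 'rb')
titles = read_titles(io.BytesIO(SAMPLE))
print(titles)
```

With a real file you would pass open('ipa111229.xml', 'rb') instead of
the BytesIO object. For very large files, ElementTree's iterparse may be
a better fit than loading the whole tree at once.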

Peter


