[Biopython] can I use the xml parser in biopython on other xml files? how?

Thu Jul 12 11:57:51 UTC 2012

Indeed, Entrez.read didn't like that file.  Using BeautifulSoup seems
to work, but I'm not sure how well...

Thanks for the advice, all!

On Thu, Jul 12, 2012 at 5:35 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Jul 12, 2012 at 9:53 AM, Wheaton Little <wheatontrue at gmail.com> wrote:
>> I would like to use the Biopython XML parser, if possible, on the
>> Google patent XML files:
>>
>> http://www.google.com/googlebooks/uspto-patents-applications-text.html
>>
>> Unfortunately, this is what I get:
>>
>>>>> t=open('ipa111229.xml','r').read()
>>>>> import Bio
>>>>> ttt=Bio.Entrez.read(t[:30000])
>>
>> Traceback (most recent call last):
>>   ...
>> TypeError: argument must have 'read' attribute
>>
>> What would I have to do to use the parser on this xml?
>
> In your example, you opened the file and read all the data into a
> string (variable t).
>
> The parser is not expecting a string, but a handle. String objects
> don't have a 'read' method, thus this error message.
>
> You could 'fix' this particular error by doing:
>
> from Bio import Entrez
> handle = open('ipa111229.xml', 'r')
> ttt = Entrez.read(handle)
>
> However, I doubt this will work as the Entrez parser is intended to be
> used with the NCBI XML files only.
>
> Python comes with several XML libraries in the standard library.
> ElementTree (or cElementTree) is quite popular, but as Brandon points
> out there are also DOM and SAX style parsers.
>
> Peter
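
For anyone finding this thread later, here is a minimal sketch of the
ElementTree route Peter mentions, along with the string-versus-handle
distinction behind the original TypeError.  The sample XML and its
element names (records, record, title) are made up for illustration
and are not the real patent schema, and read_titles is a hypothetical
helper, not part of any library.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a patent record. The real Google patent files
# use their own element names, which are not shown in this thread.
SAMPLE_XML = """<?xml version="1.0"?>
<records>
  <record id="1"><title>First example</title></record>
  <record id="2"><title>Second example</title></record>
</records>
"""


def read_titles(handle):
    """Parse XML from a file-like handle and return (id, title) pairs."""
    # ET.parse accepts a filename or any object with a read() method.
    tree = ET.parse(handle)
    root = tree.getroot()
    return [(rec.get("id"), rec.findtext("title"))
            for rec in root.iter("record")]


# A plain string has no read() method; io.StringIO wraps it in a handle.
handle = io.StringIO(SAMPLE_XML)
titles = read_titles(handle)
print(titles)
```

With a file on disk, passing open('ipa111229.xml') in place of the
StringIO object should work the same way, assuming the file is a single
well-formed XML document.  That has not been checked here for the
patent files.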