[Biopython] Pubmeddata XML parsing with Entrez .fetch and .read

Peter biopython at maubp.freeserve.co.uk
Thu Jul 15 13:50:41 UTC 2010


On Thu, Jul 15, 2010 at 2:36 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> --- On Thu, 7/15/10, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> This is why I was suggesting to Michiel that we override
>> the __repr__ for our subclassed objects, so that rather
>> than seeing things like this:
>>
>> ['btp163', '10.1093/bioinformatics/btp163', '19304878',
>> 'PMC2682512']
>>
>> we get something like:
>>
>> ListElement(['btp163', '10.1093/bioinformatics/btp163',
>> '19304878', 'PMC2682512'], attributes={...})
>>
>> On deeper reflection, the trouble with this is that all the
>> children within the list would get longer, so the full
>> representation of a ListElement (or any container) would
>> become very very long - swamping the console output.
>
> The attributes are almost always only a small fraction of the Entrez XML file.
> So while it's true that each element gets larger, it's a small relative increase.
> The elements that are very long after adding the attributes are also very long
> without the attributes. So I am in favor of your original suggestion. If there are
> no other suggestions, I'll make the change in Bio.Entrez over the weekend
> (or feel free to do so before that).

Maybe you can keep the basic data type repr if there are no attributes,
and only expand it if needed? It would be inconsistent but would keep the
total string length down.
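A minimal sketch of that conditional repr in Python. The ListElement class here is a hypothetical stand-in (a list subclass carrying an attributes dict), not the actual Bio.Entrez parser class, and the IdType attribute value is only illustrative:

```python
class ListElement(list):
    """Hypothetical stand-in for a parsed Entrez list: a list plus XML attributes."""

    def __init__(self, items=(), attributes=None):
        super().__init__(items)
        # XML attributes of the element; an empty dict when the tag had none
        self.attributes = dict(attributes or {})

    def __repr__(self):
        text = list.__repr__(self)
        if not self.attributes:
            # No attributes: look exactly like a plain list, keeping output short
            return text
        # Attributes present: expand the repr so they are visible at the console
        return f"ListElement({text}, attributes={self.attributes!r})"


ids = ListElement(["btp163", "10.1093/bioinformatics/btp163", "19304878", "PMC2682512"])
print(repr(ids))     # prints the plain list repr
tagged = ListElement(["19304878"], attributes={"IdType": "pubmed"})
print(repr(tagged))  # prints ListElement(['19304878'], attributes={'IdType': 'pubmed'})
```

The trade-off is the one noted above: two elements holding the same data can print differently depending on whether the XML tag carried attributes.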

Peter


