[Biopython-dev] Uniprot XML parser on TrEmbl
Peter
biopython at maubp.freeserve.co.uk
Fri Nov 12 12:00:42 UTC 2010
On Fri, Nov 12, 2010 at 11:05 AM, Andrea Pierleoni
<andrea at biocomp.unibo.it> wrote:
>
>> That's good - but I thought the patch broke the unit test so I reverted it
>> last night. I'll double check this.
>>
>
> Yes, I've seen it on GitHub. Can you fix it?
>
Probably. I'll make time to look at it before the Biopython 1.56 release
(which is unlikely to happen this week, delayed by the identification of
some problems running under Jython on Windows).
>> On the other hand, you only download it once, and will probably only
>> decompress it once (although you can parse gzipped files from within
>> python if you want to), but you will parse it many times.
>>
>
> Well, if you're looking at performance, you're not scanning a 62 GB file
> each time you search for an entry; you're going to index it. Then of
> course it depends on what you are doing... but, given the monthly
> release, maybe you're downloading and decompressing (or parsing
> a compressed file) once a month.
Yeah, it depends.
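
To make the "parse gzipped files from within python" remark concrete, here is
a minimal sketch using only the standard library (gzip plus ElementTree's
iterparse), not Biopython's own parser. The two-entry sample and its
accessions are made up for illustration, and iter_accessions is a hypothetical
helper name. A real file would be opened with gzip.open(path, "rb") instead of
the in-memory buffer.

```python
import gzip
import io
import xml.etree.ElementTree as ET

# XML namespace declared at the top of Uniprot XML files.
NS = "{http://uniprot.org/uniprot}"


def iter_accessions(handle):
    """Yield the first accession of each <entry>, one entry at a time."""
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == NS + "entry":
            yield elem.findtext(NS + "accession")
            # Drop the finished entry's children so memory use stays low.
            elem.clear()


# Tiny made-up sample standing in for a real download (assumption).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<uniprot xmlns="http://uniprot.org/uniprot">
  <entry><accession>TEST01</accession><name>FAKE1_TEST</name></entry>
  <entry><accession>TEST02</accession><name>FAKE2_TEST</name></entry>
</uniprot>
"""

# Compress in memory, then read it back without decompressing to disk.
compressed = io.BytesIO(gzip.compress(SAMPLE))
with gzip.open(compressed, "rb") as handle:
    accessions = list(iter_accessions(handle))
```

Streaming like this avoids holding the decompressed file on disk or in memory,
at the cost of decompressing again on every pass, which is the trade-off being
discussed above.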
>> My point is it probably could be made faster (if anyone wanted to spend
>> the time), but it is fast enough already to be useful, and worth having
>> in Biopython :)
>
> Yes, I hope it can be made faster, but I have no idea how, since
> the process is very straightforward. I have not profiled the
> parser, so I cannot rule out a bottleneck.
That would be worthwhile at some point.
> The only obvious speed-up would be using the multiprocessing library on
> multi-CPU systems, but I've never seen it used in Biopython.
We haven't been able to due to the Python 2.4 requirement, but
I know of people using Biopython and multiprocessing together.
> It should be really easy to implement, and maybe we can think about
> it after Python 2.4 support is dropped. As far as I know, multiprocessing
> is included in Python 2.6 and available for Python 2.5.
Personally I'd try profiling the current single-threaded code before
going to multiprocessing.
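
As a rough illustration of what that profiling step could look like, here is
a sketch using the standard library's cProfile and pstats. The function
parse_records is a hypothetical stand-in for the real parsing loop, not the
Biopython code itself; in practice you would call the parser on a
representative slice of the file.

```python
import cProfile
import io
import pstats


def parse_records(n):
    """Hypothetical stand-in for the parsing loop being measured."""
    return [str(i).zfill(6) for i in range(n)]


# Run the function under the profiler and keep its return value.
profiler = cProfile.Profile()
records = profiler.runcall(parse_records, 10000)

# Write the ten most expensive calls (by cumulative time) to a string.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(10)
summary = report.getvalue()
```

Printing summary shows where the time goes. If one call dominates, fixing it
is likely to be simpler than spreading the work over several processes.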
> On the other hand, Biopython has the fastest Uniprot XML parser
> among the Bio* projects and (to my knowledge) the fastest public
> parser on the planet ;) I bet the Uniprot guys have their own parser...
Which of the other Bio* projects have a Uniprot XML parser?
(Or was that intended as a joke?)
Peter