[Biopython-dev] MMCIF Parser

Lenna Peterson arklenna at gmail.com
Fri Mar 16 18:21:25 EDT 2012


Hi João,

Thanks for bringing Glen into the discussion.

My code is in this github branch: https://github.com/lennax/biopython/tree/ply2

It requires PLY, which I haven't added to setup.py yet. PLY is available
here: http://www.dabeaz.com/ply/

Currently the lex (tokenizer) portion runs fine (~30k lines in 3.6 sec
on my machine). But the yacc (parser) portion hangs on files over ~5k
lines, so its running time is clearly growing worse than linearly with
file size. I'm trying to debug the problem, but I'm also considering
the approach used by the current MMCIF2Dict module (which parses in
plain Python).
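
For illustration, the plain-Python route mentioned above could look roughly
like the sketch below. It is not the code from the ply2 branch or from
MMCIF2Dict; the name tokenize_cif and the regular expression are assumptions
made for this example. It handles comments, quoted strings, bare words, and
semicolon-delimited text fields, and ignores anything that follows a closing
semicolon on the same line.

```python
import re

# Hypothetical helper for illustration; not part of Biopython or PLY.
_TOKEN = re.compile(
    r"""
    \s*
    (?:
        (?P<comment>\#[^\n]*)           # comment runs to end of line
      | '(?P<squote>.*?)'(?=\s|$)       # closing quote must precede whitespace
      | "(?P<dquote>.*?)"(?=\s|$)
      | (?P<bare>\S+)                   # unquoted tag or value
    )
    """,
    re.VERBOSE,
)


def tokenize_cif(handle):
    """Yield CIF tokens (tags and values) from an iterable of lines."""
    lines = iter(handle)
    for line in lines:
        if line.startswith(";"):
            # Multi-line text field: ends at the next line starting with ';'.
            parts = [line[1:].rstrip("\n")]
            for line in lines:
                if line.startswith(";"):
                    break
                parts.append(line.rstrip("\n"))
            yield "\n".join(parts)
            continue
        for match in _TOKEN.finditer(line):
            if match.group("comment") is not None:
                continue  # comments carry no data
            for name in ("squote", "dquote", "bare"):
                value = match.group(name)
                if value is not None:
                    yield value
                    break
```

Because it makes a single pass over the lines, a tokenizer of this shape
should scale roughly linearly with file length.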

Re: your question about C, CIF is only moderately complex, but I think
the issue is that the files tend to be very long. Lexical analysis can
be a computationally intensive process, so any improvements in
efficiency that C can offer are beneficial. However, I haven't done
any performance comparisons between C (flex/bison) and Python (PLY).
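
A comparison of this kind, or a check of how parse time grows with file
size, could be sketched with a small timing harness such as the one below.
The name time_scaling and its parameters are hypothetical, and
time.perf_counter needs Python 3.3 or later. It times one callable on
inputs of increasing size and keeps the best of a few repeats.

```python
import time


def time_scaling(func, make_input, sizes, repeats=3):
    """Return (size, best_seconds) pairs for func(make_input(size)).

    func       -- callable under test, e.g. a lexer or parser entry point
    make_input -- builds an input of the requested size (hypothetical helper)
    sizes      -- iterable of input sizes, ideally doubling each step
    """
    results = []
    for size in sizes:
        data = make_input(size)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            func(data)
            best = min(best, time.perf_counter() - start)
        results.append((size, best))
    return results
```

If the sizes double at each step, times that roughly double suggest linear
behaviour, while times that roughly quadruple suggest quadratic behaviour.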

But considering that Jython etc. haven't implemented the C API yet, I'm
focusing on a pure Python implementation.

Lenna


On Fri, Mar 16, 2012 at 6:21 AM, João Rodrigues <anaryin at gmail.com> wrote:
>
> Hi all,
>
> I added Glen from PDBe to the thread. I asked him to have a look at this 'bug' report; his reply is below.
>
> "Had a look at bug #2619 and it seems the thread was reignited recently by Lenna Peterson, so we'll be keeping an eye on it. In terms of an mmCIF parser, we currently use a parser provided by our RCSB partners. It also has C dependencies, and after using it, there is much that could be improved; in particular, we'd also like a pure Python implementation."
>
> I'd say Lenna should go ahead and keep up the effort on the parser. Maybe you could share the code on GitHub or similar to gather some comments and suggestions?
>
> One question, though: why is everyone using C for it? I have never really used this format, so sorry for the ignorance.
>
> Cheers,
>
> João
>
> On 9 Mar 2012 at 20:13, "João Rodrigues" <anaryin at gmail.com> wrote:
>
>> Hi Lenna,
>>
>> First of all, sorry to join the discussion so late, but as I said before, I was at a conference, so I didn't read my email very frequently.
>>
>>
>> The PDBe have their own parsers, and I have yet to find out what dependencies they have and whether they map to the same SMCRA model we use. I will keep you informed. I sent them an email today and am waiting for a reply. I will eventually bring the discussion here so maybe we can take the best of both parsers. Nevertheless, thanks for the time and effort you are putting in; it will surely be put to good use! :)
>>
>>
>> Best,
>>
>> João [...] Rodrigues
>> http://nmr.chem.uu.nl/~joao
>>
>>
>>
>> On 9 March 2012 at 20:00, Lenna Peterson <arklenna at gmail.com> wrote:
>>>
>>> I am in the process of implementing the formal grammar of CIF in PLY (Python Lex-Yacc). The result should be a strict, robust, extensible CIF parser.
>>>
>>> It's going very smoothly, and I plan on continuing it regardless as a learning exercise in lexical analysis.
>>>
>>> Please let me know if PDBe has a robust mmCIF Python parser that would make mine redundant.
>>>
>>> Lenna
>>>
>>>
>>> On Mar 9, 2012, at 10:11, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>>
>>> > On Fri, Mar 9, 2012 at 3:07 PM, João Rodrigues <anaryin at gmail.com> wrote:
>>> >> Hi all,
>>> >>
>>> >> I was recently at a conference in Heidelberg and learned that the
>>> >> PDBe is interested in collaborating with us in building a consolidated
>>> >> Python module for structural bioinformatics. From what I understood,
>>> >> they already use our code sometimes.
>>> >>
>>> >> Since there is some movement on the mmCIF parser front, maybe it's a
>>> >> good idea to ask them whether they have something implemented already?
>>> >> I'm asking first so as not to step on anyone's toes, but it might save time.
>>> >>
>>> >> João [...] Rodrigues
>>> >> http://nmr.chem.uu.nl/~joao
>>> >
>>> > Sounds good - you're one of the experts on the Bio.PDB code now
>>> > after all, so a good person to talk to them.
>>> >
>>> > Peter
>>> >
>>> > _______________________________________________
>>> > Biopython-dev mailing list
>>> > Biopython-dev at lists.open-bio.org
>>> > http://lists.open-bio.org/mailman/listinfo/biopython-dev
>>
>>


