[BioPython] Prosite / Prorule

Michiel De Hoon mdehoon at c2b2.columbia.edu
Mon Nov 19 04:49:23 EST 2007


> Re-writing the parser might be the best choice here. Unfortunately, I have
> not much experience in writing parsers and also had quite a hard time trying
> to understand what was going on in the Prosite RecordParser... 8-/
>
> The way I THINK this should be done, is some event-driven mechanism, where
> the first letters of the scanned line determine what kind of information
> follows. As compared to iterating over a list (like in the current
> _scan_fns) and trying to match each entry with the line...
>
> Could you point me to a parser-implementation which functions as a
> 'template' of good parser design. Maybe I can merge it with the existing
> Prosite-Parser...

You could have a look at the function "parse" in 
Bio/KEGG/Enzyme/__init__.py

This is something I wrote for Biopython release 1.44, when it turned out that
the new version of mxTextTools caused the previous Bio/KEGG/Enzyme parser to
fail. At that time, I decided to write the parser from scratch instead of
trying to fix the existing parser (mainly because I didn't understand how the
existing parser worked). The result is a rather straightforward parser.
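
To illustrate that style of parser (a rough sketch only, not the actual
Bio/KEGG/Enzyme code), the function below looks at the first two characters
of each line and branches on them. The name parse_lines, the dict used as a
record, and the assumed layout (a two-letter line code, data starting in
column 6, "//" ending the record) are assumptions made for this example:

```python
def parse_lines(lines):
    """Collect the fields of one record, keyed by the leading line code.

    Hypothetical sketch: assumes a two-letter code, three spaces, then data.
    """
    record = {}
    for line in lines:
        code, value = line[:2], line[5:].rstrip()
        if code == "//":
            # Assumed end-of-record marker
            break
        elif code in ("ID", "AC", "DE"):
            # Single-line fields: store the value as-is
            record[code] = value
        elif code == "PA":
            # A pattern may continue over several PA lines: concatenate
            record["PA"] = record.get("PA", "") + value
        # Lines with any other code are ignored in this sketch
    return record


example = [
    "ID   ASN_GLYCOSYLATION; PATTERN.\n",
    "AC   PS00001;\n",
    "PA   N-{P}-\n",
    "PA   [ST]-{P}.\n",
    "//\n",
]
record = parse_lines(example)
```

Each line is inspected once and handled by the branch matching its code,
rather than being tested against every entry in a list of scanner functions.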

Now, for KEGG it is possible that one file contains several KEGG.Enzyme
records. The "parse" function pulls them out one by one (using an iterator).
This is why the function has a "yield", and no "return" at the end. From the
user perspective, it works as follows:

from Bio.KEGG import Enzyme
# "handle" avoids shadowing the built-in input()
handle = open("my_kegg_file_containing_lots_of_enzymes.txt")
records = Enzyme.parse(handle)
for record in records:
    # record is now one Bio.KEGG.Enzyme.Record instance
    # Do something with the record
    print(record)
handle.close()
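
On the implementation side, a generator-based "parse" function might look
roughly like the sketch below. This is not the Bio.KEGG.Enzyme code: the flat
file layout (two-letter line code, data from column 6, "//" as record
terminator) and the use of a plain dict as the record are assumptions chosen
to keep the example short.

```python
import io


def parse(handle):
    """Yield one record (here a dict of lists) per '//'-terminated block."""
    record = {}
    for line in handle:
        code, value = line[:2], line[5:].rstrip()
        if code == "//":
            # End of the current record: hand it to the caller, start anew
            if record:
                yield record
            record = {}
        else:
            record.setdefault(code, []).append(value)
    if record:
        # The file did not end with a terminator line
        yield record


# Usage with an in-memory file holding two records
handle = io.StringIO("ID   A\nDE   first\n//\nID   B\n//\n")
for record in parse(handle):
    print(record["ID"])
```

Because of the "yield", only one record is held in memory at a time, and the
caller can stop iterating early without reading the rest of the file.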

For Prosite, I don't know if you can have several Prosite records
concatenated in one file. If so, you can use the same approach as for the
KEGG parser. If not, I guess a Prosite "parse" function should just return
one record directly. As in:

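from Bio import Prosite
# "handle" avoids shadowing the built-in input()
handle = open("my_prosite_file.txt")
# Prosite.parse is the proposed function, not an existing one
record = Prosite.parse(handle)
# record is now one Bio.Prosite.Record instance
handle.close()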
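
If it turns out that both cases occur, one possible middle ground is a small
helper that takes any iterator of records and returns the record directly
when there is exactly one. The name read_single is hypothetical and this is
only a sketch of the idea, not an existing Biopython function:

```python
def read_single(records):
    """Return the only item produced by an iterable of records.

    Raises ValueError if the iterable yields no record or more than one.
    """
    iterator = iter(records)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("No records found")
    try:
        next(iterator)
    except StopIteration:
        # Exactly one record was available
        return first
    raise ValueError("More than one record found")


# Usage: any iterable works, including a generator-based parser
record = read_single(iter(["the only record"]))
```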


--Michiel.



