[Biopython-dev] Bio.Motif AlignAce parser

Michiel de Hoon mjldehoon at yahoo.com
Sat Aug 11 04:25:05 UTC 2012


Hi guys,

Looking some more at the parsers in Bio.Motif.

In the Record class in Bio/Motif/Parsers/AlignAce.py, we have an attribute self.current_motif that points to the motif currently being parsed by the parser (or, after the parser finishes, to the last motif that was parsed). As far as I can tell, a temporary variable current_motif within the read() function would be sufficient; we don't need to store it in the record.
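
As a rough sketch of the idea (not the actual parser code): the toy input format below, the dictionary-based motifs, and the simplified Record class are all assumptions made for illustration. The point is only that current_motif can live as a local variable in read() and never be attached to the record.

```python
import io


class Record:
    """Simplified stand-in for the parser's Record class (hypothetical)."""

    def __init__(self):
        self.motifs = []  # note: no current_motif attribute


def read(handle):
    """Parse a toy motif listing, tracking the current motif locally."""
    record = Record()
    current_motif = None  # temporary variable, local to read()
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith("Motif"):
            # Start a new motif and make it the one being filled in.
            current_motif = {"name": line, "instances": []}
            record.motifs.append(current_motif)
        elif current_motif is not None:
            current_motif["instances"].append(line)
    return record


record = read(io.StringIO("Motif 1\nACGT\nACGA\nMotif 2\nTTGA\n"))
```

After read() returns, the record holds only the list of motifs; the last motif parsed is still reachable as record.motifs[-1] if anyone needs it.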

I would also suggest that the read() function strip() all lines. Currently the end-of-line markers are kept. For example, the version and the command line are stored as "AlignACE 4.0 05/13/04\n" and "./AlignACE -i test.fa \n" respectively.
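
To illustrate the effect of stripping, here is a minimal sketch using the two strings quoted above. The helper read_header() is hypothetical, and it assumes the version and the command line are the first two non-empty lines of the output.

```python
import io


def read_header(handle):
    """Return (version, command) with surrounding whitespace removed."""
    stripped = (line.strip() for line in handle)
    lines = [line for line in stripped if line]
    return lines[0], lines[1]


handle = io.StringIO("AlignACE 4.0 05/13/04\n./AlignACE -i test.fa \n")
version, command = read_header(handle)
```

Note that strip() also removes the trailing space before the newline in the command line, not just the end-of-line marker itself.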

The version of the AlignACE program is stored in record.ver. The MEME and Mast parsers in Bio.Motif use record.version instead. For consistency I would suggest using record.version in the AlignACE parser as well.

The command line is stored in record.cmd_line. The MEME parser uses record.command instead. I think both are fine, but I would also prefer this to be consistent.

Then there are two attributes, param_dict and seq_dict. The former is a dictionary that stores the parameters used in the run. The latter is not a dictionary but a list of sequence-related information. Since we usually don't put the type of the object in attribute names, I would suggest calling these simply parameters and sequences. For comparison, the Mast parser uses record.sequences for an analogous attribute; MEME uses record.sequence_names. For consistency I would suggest using record.sequences in all three.
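
Collected in one place, the proposed naming might look like the sketch below. This Record class is a hypothetical illustration, not the existing one, and picking "command" over "cmd_line" is an assumption (following the MEME parser), since either name would do as long as the parsers agree.

```python
class Record:
    """Hypothetical AlignACE record using the proposed attribute names."""

    def __init__(self):
        self.version = None    # currently: ver
        self.command = None    # currently: cmd_line
        self.parameters = {}   # currently: param_dict
        self.sequences = []    # currently: seq_dict (a list, despite the name)


record = Record()
```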

These would be backward-incompatible changes that may confuse users. Currently the parsers are located in Bio.Motif.Parsers.AlignAce, Bio.Motif.Parsers.MEME, and Bio.Motif.Parsers.Mast. I would prefer Bio.Motif.AlignAce, Bio.Motif.MEME, and Bio.Motif.Mast. Currently, to parse the AlignAce output one would do
>>> from Bio.Motif.Parsers import AlignAce
>>> record = AlignAce.read(handle)
>>> record
<Bio.Motif.Parsers.AlignAce.Record object at 0x10058c7d0>
If we move the parsers one level up, this would be
>>> from Bio.Motif import AlignAce
>>> record = AlignAce.read(handle)
>>> record
<Bio.Motif.AlignAce.Record object at 0x10058c7d0>
which looks a bit more straightforward to me. In addition, this allows us to put a deprecation warning on the Bio.Motif.Parsers.AlignAce, Bio.Motif.Parsers.MEME, and Bio.Motif.Parsers.Mast modules as a whole, and we won't have to put deprecation warnings on each change separately.
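
A single module-level warning could be sketched roughly as follows. The helper warn_moved() is hypothetical, and the standard DeprecationWarning is used here only to keep the example self-contained; in practice the call would sit at the top of each old module so that it fires once on import.

```python
import warnings


def warn_moved(old_name, new_name):
    """Emit one DeprecationWarning covering a whole relocated module."""
    warnings.warn(
        "%s is deprecated; please use %s instead." % (old_name, new_name),
        DeprecationWarning,
        stacklevel=2,
    )


# In the old module this call would run at import time. Here the warning is
# captured instead, just to show what a user would see.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_moved("Bio.Motif.Parsers.AlignAce", "Bio.Motif.AlignAce")

message = str(caught[0].message)
```

With this in place, the renamed attributes would only exist in the new modules, and the old modules could stay as they are until they are removed.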

Any comments, objections?

Best,
-Michiel.


