[BioPython] Bio.Medline parser

Michiel de Hoon mjldehoon at yahoo.com
Sat Aug 2 13:32:32 UTC 2008




--- On Sat, 8/2/08, Peter <biopython at maubp.freeserve.co.uk> wrote:
> > 1) Use the key shown in the Medline file instead of
> > the name to store each field.
> > 2) Let the record class derive from a dictionary, and
> > store each field as a key, value pair in this dictionary.
....
> One downside of this is that the user then has to go and
> consult the file format documentation to discover "DA" is the
> entry date, etc.  In some cases the abbreviations are probably
> a little unclear.  I would find code using the current named
> properties easier to read than the suggested dictionary based
> approach which exposes the raw field names.

What I noticed when playing with this parser is that it is often unclear which (Biopython-chosen) name goes with which (NCBI-chosen) key. For example, PMID is the PubMed ID number in the flat file. Should I look under "pmid", "PMID", or "PubmedID"? (The correct answer is "pubmed_id".)

As you mention, the NCBI-chosen keys are often not very informative (who can guess that TT stands for "transliterated title"?). I was thinking of including a list of NCBI keys and their descriptions in the docstring of Bio.Medline's Record class, so users can always find them without having to go into NCBI's documentation.

Another possibility is to overload the dictionary class such that descriptive names are automatically mapped to the NCBI keys. The parser then only knows about the NCBI-defined keys, but if a user types record["Author"], the Record class knows it should return record["AU"]. record.keys() would be modified correspondingly.
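
To make this second idea concrete, here is a rough sketch of what such a dictionary subclass might look like. It is not the actual Bio.Medline code: the ALIASES table and the descriptive names in it ("Author", "PubMedID", "TransliteratedTitle") are made up for illustration, and whether keys() should return descriptive names or NCBI keys is still an open design choice.

```python
class Record(dict):
    """Dictionary keyed by NCBI field keys, also reachable by descriptive names."""

    # Hypothetical alias table (descriptive name -> NCBI key); illustration only.
    ALIASES = {
        "Author": "AU",
        "PubMedID": "PMID",
        "TransliteratedTitle": "TT",
    }

    def _ncbi_key(self, key):
        # Translate a descriptive name to its NCBI key; pass other keys through.
        return self.ALIASES.get(key, key)

    def __getitem__(self, key):
        return dict.__getitem__(self, self._ncbi_key(key))

    def __contains__(self, key):
        return dict.__contains__(self, self._ncbi_key(key))

    def keys(self):
        # Report descriptive names where one is defined, NCBI keys otherwise.
        # Returns a plain list rather than a dictionary view.
        reverse = {ncbi: name for name, ncbi in self.ALIASES.items()}
        return [reverse.get(key, key) for key in dict.keys(self)]


record = Record(AU=["Smith J"], PMID="12345")
print(record["Author"])   # same value as record["AU"]
print(record.keys())
```

The parser itself would still store values under the NCBI keys only; the translation happens purely on lookup.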

> Also, could you make the changes while leaving the older
> parser with the old record behaviour in place (with deprecation
> warnings) for a few releases?

Yes, that is possible. Existing scripts use the parser = RecordParser(); parser.parse(handle) approach. This approach can continue to use the same Record class, basically ignoring the fact that it now derives from a dictionary. A deprecation warning is issued when a user creates a RecordParser instance.
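
Roughly, the backward-compatible path could look like the sketch below. This is only an illustration under simplifying assumptions, not the real parser: the parse() body handles just the basic "KEY - value" layout with indented continuation lines, stores every field as a list, and the warning text is a placeholder.

```python
import io
import warnings


class Record(dict):
    """Record deriving from a dictionary; fields are stored under NCBI keys."""


class RecordParser:
    """Old-style parser kept for existing scripts; warns when instantiated."""

    def __init__(self):
        warnings.warn(
            "RecordParser is deprecated; placeholder message for this sketch.",
            DeprecationWarning,
            stacklevel=2,
        )

    def parse(self, handle):
        # Simplified parsing: the first four columns hold the key, the value
        # starts at column seven, and continuation lines have a blank key.
        record = Record()
        key = None
        for line in handle:
            line = line.rstrip()
            if not line:
                continue
            if line[:4].strip():
                key = line[:4].strip()
                record.setdefault(key, []).append(line[6:])
            elif key is not None:
                record[key][-1] += " " + line.strip()
        return record


# Existing scripts keep working unchanged, apart from the warning.
handle = io.StringIO("PMID- 12345\nAU  - Smith J\nAU  - Doe A\nTI  - A long\n      title\n")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    parser = RecordParser()
record = parser.parse(handle)
```

Since the old code only ever touched the record through whatever interface it already used, it does not need to know that Record is now a dictionary.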

--Michiel.


      


