[BioPython] Bio.Medline parser

Peter biopython at maubp.freeserve.co.uk
Thu Aug 7 18:15:08 UTC 2008


On Thu, Aug 7, 2008 at 3:56 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> If there are no further suggestions, I'll implement the .find() method as described below.
>
> ...
>
>> Thinking it over, I think that having a key and an
>> attribute mapping to the same value is not so clean.
>> Alternatively we could add a .find(term) method to the
>> Bio.Medline.Record class, which takes a term and returns the
>> appropriate value. So record.find("author")
>> returns record["AU"]. This gives a clear
>> separation between the raw keys in the Medline file and the
>> more descriptive names. Also, such a .find method can accept
>> a wider variety of terms than an attribute name (e.g.,
>> "Full Author", "full_author", etc. all
>> return record["FAU"]).
>>
>> --Michiel

When would anyone use the .find() method?  Perhaps when exploring at
the command line.  If you are writing a script, then once you know
that "FAU" means "Full Author" you would always just use
record["FAU"] directly.  Maybe it would make sense to just describe
the keys in the docstring, and that would be enough.
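
For comparison, here is a minimal sketch of the two access styles being
discussed.  It is not the Bio.Medline implementation: the Record class,
the TERM_TO_KEY table, the normalize() helper and the author values are
all made up for illustration, and only the "AU" and "FAU" keys come from
the thread.

```python
# Hypothetical sketch of the proposed .find() method; not Bio.Medline code.

# Assumed mapping from normalized descriptive terms to raw Medline keys.
TERM_TO_KEY = {
    "author": "AU",
    "fullauthor": "FAU",
}


def normalize(term):
    """Lower-case a term and drop spaces and underscores."""
    return term.lower().replace("_", "").replace(" ", "")


class Record(dict):
    """Dictionary keyed by raw Medline tags, plus a descriptive lookup."""

    def find(self, term):
        """Return the value stored under the raw key matching `term`."""
        return self[TERM_TO_KEY[normalize(term)]]


# Made-up example values.
record = Record(AU=["Smith J"], FAU=["Smith, John"])

# Descriptive lookup, as in the proposal:
full_authors = record.find("Full Author")

# Direct lookup, as a script would probably do once the key is known:
same_full_authors = record["FAU"]
```

Both lookups return the same list, which is why documenting the keys in
the docstring may be enough for scripts.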

On a related point, going by the Entrez documentation, can the MedLine
records be accessed as either plain text or XML (or HTML or ASN.1)?
How does the data structure from parsing the XML version with
Bio.Entrez.read() compare to your ideas for the MedLine plain text
parser?  Maybe we can just deprecate Bio.Medline (i.e. the plain text
parser) in favour of Bio.Entrez (and its XML parser)?
http://eutils.ncbi.nlm.nih.gov/entrez/query/static/efetchlit_help.html
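
As a rough sketch of how the two formats could be requested for the same
record, the following builds EFetch URLs that differ only in their
retmode/rettype parameters.  The efetch_url() helper and the PubMed ID
are hypothetical, and the endpoint shown is the current E-utilities
address as I understand it, not something taken from the help page
above.  No network request is made.

```python
from urllib.parse import urlencode

# Assumed E-utilities EFetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmid, retmode):
    """Build an EFetch URL for one PubMed record (hypothetical helper)."""
    params = {"db": "pubmed", "id": pmid, "retmode": retmode}
    if retmode == "text":
        # Ask for the tagged MEDLINE layout rather than another text format.
        params["rettype"] = "medline"
    return EFETCH + "?" + urlencode(params)


# Placeholder PubMed ID, used only to show the shape of the URLs.
text_url = efetch_url("12345678", "text")  # input for a plain text parser
xml_url = efetch_url("12345678", "xml")    # input for an XML parser
```

The plain text result is what Bio.Medline would parse, while the XML
result is what Bio.Entrez.read() would handle.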

Peter


