Bioperl: Parsing Medline Docs

Paul Gordon pgordon@cs.dal.ca
Tue, 28 Sep 1999 14:34:31 -0300 (ADT)


> The MEDLARS format is documented in the MEDLINE chapter of the NLM's
> Online Services Reference Manual at: 
> http://www.nlm.nih.gov/pubs/osrm_nlm.html
> 
> It's quite hard to find otherwise.
> 
> A while ago, I started working on a Python module to parse it, but never
> quite finished...
> 
> Jeff
> > Hi
> > Does anyone know of Perl scripts/modules that parse Medline documents
> > in Medlars format?  Surely someone has attempted this before.

It seems to be a common trend...  I too started to write a MedlarsII
parser in Perl, but didn't finish it because we ended up not using the
data.  

There is a distinction to be made here for those who aren't familiar
with the formats.  Medline as you normally see it is plain, readable
text, but NLM ships it to you in MedlarsII format, which is a very
complicated binary format with embedded EBCDIC text.  If that is what
you have, I have never seen a Perl parser for it (though Boulder::Medline
may have Medlars support; I haven't checked it out).
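
For what it's worth, the EBCDIC part on its own is the easy bit.  Below
is a minimal sketch in Python (not Perl, and not a MedlarsII parser) that
pulls readable text runs out of a binary blob, much like the Unix
"strings" tool does for ASCII.  The function name ebcdic_strings and the
choice of code page 037 (Python codec "cp037") are assumptions made for
illustration; nothing in this thread says which EBCDIC variant NLM uses,
and the binary record structure is not handled at all.

```python
import string

# Characters treated as "readable" once the bytes are decoded.
PRINTABLE = set(string.ascii_letters + string.digits + string.punctuation + " ")


def ebcdic_strings(data, min_len=4, codec="cp037"):
    """Return runs of at least min_len printable characters found in data.

    Assumption: the embedded text is EBCDIC code page 037. Pass a different
    codec (for example "cp500") if the data turns out to use another variant.
    """
    # errors="replace" keeps the scan going if a byte has no mapping.
    text = data.decode(codec, errors="replace")
    runs, current = [], []
    for ch in text:
        if ch in PRINTABLE:
            current.append(ch)
            continue
        # A non-printable character ends the current run.
        if len(current) >= min_len:
            runs.append("".join(current))
        current = []
    if len(current) >= min_len:
        runs.append("".join(current))
    return runs


if __name__ == "__main__":
    # Hypothetical blob: two EBCDIC strings separated by binary filler.
    blob = (b"\x00\x01" + "MEDLINE".encode("cp037")
            + b"\x00\xff" + "Perl parser".encode("cp037") + b"\x02")
    print(ebcdic_strings(blob))
```

Binary fields can decode to printable characters by accident, so expect
some junk in the output; raising min_len cuts that down.  It is only
good for checking what text a file contains, not for recovering records.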

=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================