[Biopython-dev] Removing obsolete bits of the Tutorial

Michiel de Hoon mjldehoon at yahoo.com
Sun Jun 29 11:07:36 EDT 2008



>> Then there is the section on "Parser Design" which focuses on the
>> scanner/consumer model and lists lots of the events these parsers
>> (used to) generate.  I don't think any of this is useful, and suspect
>> that a lot of it is out of date.  Again, should we just remove this
>> section?
>
> That too. Otherwise, we may inadvertently be causing new
> Biopython developers to write their parsers using this out of
> date parser design, which as far as I know is not being used
> in the major Biopython modules.

> It's not entirely out of date - don't SAX based XML parsers do
> something similar?

Yes, but there's a difference:

In an XML file, we need to find out where the XML tags are to be able to parse the file. These tags can appear anywhere in the file.
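To illustrate the event-driven style, here is a minimal sketch using the xml.sax module from the Python standard library. The handler class TagCollector, the helper collect_names, and the <name> tag are all made up for this example; nothing here is taken from Biopython.

```python
import xml.sax


class TagCollector(xml.sax.ContentHandler):
    """Collect the text inside each <name> element (a hypothetical tag)."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._inside = False
        self._chunks = []

    def startElement(self, tag, attrs):
        # Event: an opening tag was found somewhere in the stream.
        if tag == "name":
            self._inside = True
            self._chunks = []

    def characters(self, content):
        # Event: character data; it may arrive in several pieces.
        if self._inside:
            self._chunks.append(content)

    def endElement(self, tag):
        # Event: a closing tag was found.
        if tag == "name":
            self.names.append("".join(self._chunks))
            self._inside = False


def collect_names(xml_text):
    """Parse an XML string and return the text of every <name> element."""
    handler = TagCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.names
```

The handler has to keep track of where it is in the document, because a tag can open or close at any point in the input.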

In flat-file text formats, typically different information is stored in different lines. So finding out where one piece of information ends and another one starts becomes trivial. We just need to pull out the lines one by one, and check whether they are a new piece of information or a continuation of the current piece of information.
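As a rough sketch of that idea, assume a made-up flat-file layout in which each new piece of information starts with a key in the first column, and a line that starts with whitespace continues the previous one. The function name parse_fields and the layout are assumptions for illustration only, not an actual Biopython parser.

```python
def parse_fields(lines):
    """Return a list of (key, value) pairs from a hypothetical flat-file layout.

    Assumption: a line starting with whitespace continues the previous field;
    any other non-blank line starts a new field as "KEY value".
    """
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace() and fields:
            # Continuation of the current piece of information.
            key, value = fields[-1]
            fields[-1] = (key, value + " " + line.strip())
        else:
            # A new piece of information starts here.
            key, _, value = line.partition(" ")
            fields.append((key, value.strip()))
    return fields
```

Since lines is any iterable of strings, an open file handle can be passed in directly.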

Especially for simple formats (e.g. Fasta), using a scanner/consumer model can be unnecessarily complex. But even for more complicated formats, parsing line by line can be entirely straightforward. For example, have a look at Bio/SwissProt/KeyWList.py, which currently contains a line-by-line parser and a scanner/consumer parser (which is deprecated). The former takes 26 lines, the latter more than 100.
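For a simple format such as Fasta, a line-by-line parser might look something like the sketch below. This is not the code in Biopython, and the name parse_fasta is made up; it only assumes that a line starting with ">" begins a new record and that every other non-blank line belongs to the current sequence.

```python
def parse_fasta(lines):
    """Return a list of (title, sequence) tuples from Fasta-formatted lines."""
    records = []
    title = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; store the previous one, if any.
            if title is not None:
                records.append((title, "".join(chunks)))
            title = line[1:]
            chunks = []
        elif title is not None:
            # Continuation of the current record's sequence.
            chunks.append(line)
    if title is not None:
        records.append((title, "".join(chunks)))
    return records
```

No scanner, consumer, or event names are needed; the only state is the current title and the sequence lines collected so far.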

--Michiel.
