[Biopython-dev] Building Gene Ontology support into Biopython
Michiel de Hoon
mjldehoon at yahoo.com
Sun Oct 18 08:05:10 UTC 2009
--- On Sun, 10/18/09, Chris Lasher <chris.lasher at gmail.com> wrote:
> I'd like to make GO support in Biopython a reality now.
That would be nice.
> Bioperl has solid GO support. I don't find their code
> straightforward at all; I haven't picked out what component is
> responsible for what task.
To arrive at a good design for a Biopython module, it sometimes helps to write the documentation first, before writing the actual code.
> 2) What is the Biopython stance on introducing third-party
> dependencies?
I think we should avoid them as much as possible. Besides the extra hassle for users and developers, unforeseen changes in a third-party dependency may break your module.
> What would be the aversion to requiring either NetworkX or igraph
> as a dependency for the GO library.
Are these Python modules or C software? Do NetworkX or igraph have their own third-party dependencies? Do we need all of NetworkX or igraph, or just a part of it? In the latter case, assuming that these are open-source software packages, we may simply include the parts we need in Biopython. Also, how far do you get by using NumPy?
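To get a feel for how much of a graph library is really needed, here is a minimal sketch that uses only the standard library. It assumes the ontology can be held as a plain dict mapping each term ID to a list of its parent IDs; the function name `ancestors` and the "EX:" identifiers are made up for illustration and are not part of Biopython, NetworkX, or igraph.

```python
def ancestors(term, parents):
    """Return the set of all terms reachable from `term` via parent links.

    `parents` is a hypothetical dict: term ID -> list of parent term IDs.
    """
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:  # a term may be reachable by several paths
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical diamond-shaped ontology: EX:4 has two parents sharing a root.
example_parents = {
    "EX:4": ["EX:2", "EX:3"],
    "EX:2": ["EX:1"],
    "EX:3": ["EX:1"],
    "EX:1": [],
}
print(sorted(ancestors("EX:4", example_parents)))
```

If queries of this kind cover most of what the GO code needs, a dependency may not be necessary; if it needs layout, shortest paths, or other heavier algorithms, the answer may be different.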
> 3) What are parsers written using these days?
Current parsers typically work as follows, assuming that a data file contains exactly one record:
>>> handle = open("mydatafile")
>>> from Bio import SomeModule
>>> record = SomeModule.read(handle)
# record is now a SomeModule.Record object
If one data file typically contains multiple records, use a "parse" function to return an iterator:
>>> handle = open("mydatafile")
>>> from Bio import SomeModule
>>> records = SomeModule.parse(handle)
>>> for record in records:
...     print(record)  # record is now a SomeModule.Record object
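Inside a module, the two functions can share one implementation. The following is a minimal sketch, not the code of any existing Biopython module: it assumes a made-up file format of "key: value" lines with a blank line between records, and the `Record` class here is a hypothetical stand-in.

```python
import io


class Record(dict):
    """Hypothetical record: a plain dict of field name -> value."""


def parse(handle):
    """Yield one Record per blank-line-separated block of 'key: value' lines."""
    record = Record()
    for line in handle:
        line = line.strip()
        if not line:
            if record:  # a blank line ends the current record
                yield record
                record = Record()
            continue
        key, _, value = line.partition(":")
        record[key.strip()] = value.strip()
    if record:  # last record when the file does not end in a blank line
        yield record


def read(handle):
    """Return the single record in handle; raise ValueError otherwise."""
    records = parse(handle)
    try:
        record = next(records)
    except StopIteration:
        raise ValueError("No records found in handle") from None
    try:
        next(records)
    except StopIteration:
        return record
    raise ValueError("More than one record found in handle")


handle = io.StringIO("id: A\nname: first\n")
print(read(handle))
```

Writing read() on top of parse() keeps the format-specific logic in one place, and read() only adds the check that there is exactly one record.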
> Any pointers to recently written parsers?
Bio.SeqIO.read and parse are good examples. Also you can look at Bio.Medline for a simple parser using this approach.
> I seem to recall Biopython has moved away from Martel
> parsers, correct?
Yes.
> Has anything been done with pyparsing or some other
> parser, or is it strictly manual now?
Not as far as I know.
> Also, I'm welcoming tips on the
> architecture of parsers in general.
See above. Also note that few parsers nowadays use Bio.ParserSupport. This was previously used to implement parsers in Biopython (with parsers, scanners, and consumers). I would avoid Bio.ParserSupport and simply write a straightforward parser using the Python standard library.
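As an illustration of "straightforward", here is a rough sketch of a hand-written parser for stanza-style text such as the OBO flat files in which GO is distributed. It handles only the basics (bracketed stanza headers, "tag: value" lines, trailing "!" comments) and makes no claim to cover the full format; the function name `parse_stanzas` and the "EX:" identifiers are invented for this example.

```python
import io


def parse_stanzas(handle):
    """Yield (stanza_type, tags) pairs; tags maps each tag to a list of values.

    Lines before the first "[...]" header (the file header) are skipped.
    """
    stanza_type, tags = None, {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            if stanza_type is not None:
                yield stanza_type, tags
            stanza_type, tags = line[1:-1], {}
        elif stanza_type is not None:
            tag, _, value = line.partition(":")
            value = value.split("!", 1)[0].strip()  # drop trailing "! comment"
            tags.setdefault(tag.strip(), []).append(value)
    if stanza_type is not None:
        yield stanza_type, tags


# Hypothetical input; the identifiers and names are made up.
example_text = """\
format-version: 1.2

[Term]
id: EX:0001
name: example root term

[Term]
id: EX:0002
name: example child term
is_a: EX:0001 ! example root term
"""

for stanza_type, tags in parse_stanzas(io.StringIO(example_text)):
    print(stanza_type, tags)
```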
> 4) Tying the GO Annotations to a fundamental Biopython data
> structure.
> Any suggestions on this?
A SeqRecord doesn't seem to be appropriate for gene ontology.
How about a Record class specifically for GO?
Also, what should such a class contain?
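To make the question concrete, one possible starting point is sketched below. Every attribute here is an assumption put up for discussion (identifier, name, namespace, definition, and a list of parent identifiers), not a settled design; the class and the "EX:" identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    """Hypothetical container for one GO term; all fields are assumptions."""

    id: str = ""
    name: str = ""
    namespace: str = ""
    definition: str = ""
    # Identifiers of the parent terms; each Record gets its own list.
    is_a: List[str] = field(default_factory=list)


term = Record(id="EX:0002", name="example child term", is_a=["EX:0001"])
print(term)
```

Whether annotations (gene product to term) belong in the same class or in a separate one is left open here.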
Best,
--Michiel.