[Biopython-dev] Blast parsers and records
Michael Sandford
sandford at ufl.edu
Sat May 29 22:35:18 EDT 2010
I've got a few comments as well:
> 4) The current Blast record stores its information in attributes. If you use Bio.Entrez to parse Blast XML output (Biopython 1.54 contains the necessary DTDs to do so), the information is stored in dictionaries. This has some advantages. For example, it allows you to use record.keys() to find out what the record contains. Ideally, I think that a Blast Record class should inherit from a dictionary.
>
The disadvantage I can immediately think of with this approach
is that you lose the ability to have a heavyweight IDE give you
IntelliSense-style completion of the available fields. Many may say that
IntelliSense is evil and/or a crutch, and I won't really argue that. But
Eclipse is pretty good at offering options: type "variablename." and
it brings up a whole list of attributes and functions, which I find
handy. Moving to a dictionary-based approach would stop that.
Calling dir(variablename) lets you see not only the available
attributes but the functions as well. That may not be as elegant as
iterating over a dictionary's keys, but it is at least an
alternative.
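To make the trade-off concrete, here is a minimal sketch comparing the two styles. AttrRecord, DictRecord and public_names are hypothetical names made up for illustration. They are not Biopython's actual Blast record classes. It shows that dir() on an attribute-based record reports fields and methods together. On a dict subclass, the stored fields appear only through keys() and not in dir(), which is exactly what attribute-based IDE completion relies on.

```python
class AttrRecord:
    """Hypothetical attribute-based record (not Biopython's class)."""

    def __init__(self, query, hits):
        self.query = query
        self.hits = hits

    def num_hits(self):
        return len(self.hits)


class DictRecord(dict):
    """Hypothetical record that inherits from dict, as proposed above."""

    def num_hits(self):
        return len(self["hits"])


def public_names(obj):
    """Names reported by dir(), minus the underscore-prefixed internals."""
    return [name for name in dir(obj) if not name.startswith("_")]


attr_rec = AttrRecord("seq1", ["hitA", "hitB"])
dict_rec = DictRecord(query="seq1", hits=["hitA", "hitB"])

# dir() lists data attributes and methods together, sorted alphabetically.
print(public_names(attr_rec))    # ['hits', 'num_hits', 'query']
# keys() lists only the stored fields.
print(sorted(dict_rec.keys()))   # ['hits', 'query']
# Dictionary keys are not attributes, so dir() (and attribute completion) misses them.
print("query" in dir(dict_rec))  # False
```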
It seems to me that a fair amount of XML parsing gets done in
bioinformatics these days. I know that one of the project's goals is
minimal dependence on external libraries; however, I think that lxml
(http://codespeak.net/lxml/) could substantially reduce the complexity
of the parsing code. I also think that the lxml/etree representation
of parsed data is fairly reasonable.
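As a rough illustration of the kind of simplification meant here, the sketch below extracts the best e-value per hit from a BLAST XML fragment in a few lines. It is a minimal sketch under several assumptions. It prefers lxml.etree when it is installed and otherwise falls back to the standard library's xml.etree.ElementTree, whose API lxml.etree implements, so find/findall/findtext behave the same here. The XML fragment is made up and heavily trimmed, reusing NCBI BLAST XML element names. best_evalues is a hypothetical helper.

```python
# Prefer lxml if installed; fall back to the standard library's ElementTree,
# whose API lxml.etree implements.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# A trimmed, made-up fragment using NCBI BLAST XML element names.
BLAST_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_accession>XYZ001</Hit_accession>
          <Hit_def>example hit one</Hit_def>
          <Hit_hsps>
            <Hsp><Hsp_evalue>1e-50</Hsp_evalue></Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_accession>XYZ002</Hit_accession>
          <Hit_def>example hit two</Hit_def>
          <Hit_hsps>
            <Hsp><Hsp_evalue>0.003</Hsp_evalue></Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def best_evalues(xml_text):
    """Map each hit accession to its lowest HSP e-value.

    Assumes every Hit has at least one Hsp with an Hsp_evalue.
    """
    root = etree.fromstring(xml_text)
    result = {}
    # ".//Hit" matches Hit elements at any depth below the root.
    for hit in root.findall(".//Hit"):
        accession = hit.findtext("Hit_accession")
        evalues = [float(e.text) for e in hit.findall("Hit_hsps/Hsp/Hsp_evalue")]
        result[accession] = min(evalues)
    return result


print(best_evalues(BLAST_XML))  # {'XYZ001': 1e-50, 'XYZ002': 0.003}
```

Path expressions like these replace hand-written event handling. Whether that justifies an extra dependency is the trade-off raised above.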
Mike
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev