[Biopython-dev] [Wg-phyloinformatics] GSoC Weekly Update: PhyloXML for Biopython

Brad Chapman chapmanb at 50mail.com
Wed Jun 17 08:41:01 EDT 2009


Hi Eric;
Nice update and thanks again for copying the Biopython development
list on this.

>  * Added to_seqrecord and from_seqrecord methods to the PhyloXML.Sequence
> class
>    -- getting Bio.SeqRecord to stand in for PhyloXML.Sequence entirely will
>    require some more thought

I'm looking forward to seeing how you decide to go forward with
this. For the work I do on a day-to-day basis, a continual
struggle involves establishing relationships between things to
retrieve more information. For instance, a pair of nodes on a tree
is interesting -- how would I find papers, experiments and other
information associated with those sequences? It seems like Accession
and the ref attribute of Annotation would help establish these
relationships.

>  * Test-driven development kind of went out the window this week.

Heh. It happens -- sounds sensible to make this a cleanup and
documentation week; that will also help others who are
interested dig into using it.

>  * The unit tests I do have in place give some sense of memory and CPU usage.
>    For the full NCBI taxonomy, memory usage climbs up above 2 GB with the
>    read() function, which isn't a problem on this workstation but could be for
>    others.

Do you see an opportunity to offer iterating over clades instead of
loading them all into memory for these larger trees? This would
involve lazily loading subclades on request. The trade-off is that
queries which need the full tree would be limited, since the tree is
never all in memory at once.
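
To make the iteration idea concrete, here is a rough sketch using only
the standard library's xml.etree.ElementTree.iterparse, not the PhyloXML
parser itself. The function name iter_clade_names and the SAMPLE tree
are made up for illustration, and it only pulls out clade names. It
yields clades as their closing tags are seen (children before parents)
and clears each one afterwards, so finished subtrees do not pile up in
memory.

```python
import io
import xml.etree.ElementTree as ET

NS = "{http://www.phyloxml.org}"  # PhyloXML namespace

# Tiny hypothetical tree, standing in for a large file on disk.
SAMPLE = b"""<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <clade>
      <name>root</name>
      <clade><name>A</name></clade>
      <clade><name>B</name></clade>
    </clade>
  </phylogeny>
</phyloxml>"""


def iter_clade_names(source):
    """Yield (depth, name) for each clade, children before parents."""
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if elem.tag != NS + "clade":
            continue
        if event == "start":
            depth += 1
        else:
            # findtext only looks at direct children, so this is the
            # clade's own name, not a descendant's.
            yield depth, elem.findtext(NS + "name")
            depth -= 1
            # Drop the clade's children, text and attributes. An empty
            # shell stays attached to the parent until that is cleared.
            elem.clear()


if __name__ == "__main__":
    for depth, name in iter_clade_names(io.BytesIO(SAMPLE)):
        print(depth, name)
```

On the sample this prints "2 A", "2 B" and then "1 root". Anything that
needs a parent's full set of children at once would not fit this model,
which is the functionality trade-off mentioned above.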

Another option is to offer some pruning ability as a tree is
loading. For instance, I might be loading the whole NCBI taxonomy on a
memory-limited computer and only need the Angiosperm (flowering plant)
part of the tree. In this case, you'd want to throw away all clades
not under the clades of interest.
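
A rough sketch of what pruning during loading could look like, again
with plain xml.etree.ElementTree.iterparse rather than anything in the
PhyloXML module. The names load_subtree and SAMPLE are hypothetical, and
the sketch assumes each clade's <name> appears before its nested
<clade> elements, so the clade of interest is recognized before its
children are parsed.

```python
import io
import xml.etree.ElementTree as ET

NS = "{http://www.phyloxml.org}"  # PhyloXML namespace

# Tiny hypothetical taxonomy, standing in for a large file on disk.
SAMPLE = b"""<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <clade>
      <name>root</name>
      <clade>
        <name>Plants</name>
        <clade>
          <name>Angiosperms</name>
          <clade><name>Rosids</name></clade>
          <clade><name>Asterids</name></clade>
        </clade>
        <clade><name>Gymnosperms</name></clade>
      </clade>
      <clade><name>Animals</name></clade>
    </clade>
  </phylogeny>
</phyloxml>"""


def load_subtree(source, target_name):
    """Return the <clade> element named target_name, or None if absent."""
    open_elems = []    # stack of elements whose end tag is still pending
    keep_depth = None  # stack depth of the target clade, once its name is seen
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            open_elems.append(elem)
            continue
        open_elems.pop()
        parent = open_elems[-1] if open_elems else None
        if elem.tag == NS + "name":
            # Only a name sitting directly under a clade can mark the target.
            if (keep_depth is None and parent is not None
                    and parent.tag == NS + "clade"
                    and (elem.text or "").strip() == target_name):
                keep_depth = len(open_elems)
        elif elem.tag == NS + "clade":
            if keep_depth is None:
                elem.clear()  # outside the clade of interest: throw it away
            elif len(open_elems) + 1 == keep_depth:
                return elem   # target closed: stop reading the file here
    return None


if __name__ == "__main__":
    subtree = load_subtree(io.BytesIO(SAMPLE), "Angiosperms")
    print([n.text for n in subtree.iter(NS + "name")])
```

On the sample this prints ['Angiosperms', 'Rosids', 'Asterids'].
Parsing stops as soon as the clade of interest is closed, so the rest
of the file is never read. Handling several clades of interest would
need more bookkeeping than this single-target version.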

These are probably fringe cases; just brainstorming some ideas.

Thanks again,
Brad

