[Biopython] GSoC - BioPython and PyCogent Interoperability

Brad Chapman chapmanb at 50mail.com
Thu Apr 8 12:39:53 UTC 2010


Singer,
Thanks for the introduction and initial project plan. Glad that you
are interested. I'll try to tackle a few of the specific points
Peter has not already talked about, and suggest some specifics for
the application.

> Questions:
> What does it mean by BioPython's acquired sequences? I can't seem to
> find out what or where information about "acquired sequences" is.
> Thus, I do not discuss anything about it in my current proposal.

Following up on what Peter mentioned, the idea there is to use the
results from step 1 (interoperability) to create unique workflows
that use both Biopython and PyCogent. The workflow listed is a
suggestion that draws on some of the strengths of both packages.

> For the creation of workflows, do there already exist use and test
> cases for this or would I be best off looking for ones in papers and
> trying to mimic them? Right now, I have an example paper where the
> interoperability would have been helpful.

Yes, working from papers like the one you found is exactly the right
approach. The ideas we've suggested are just brainstorming; please
select workflows that are interesting to you.

> My current proposed schedule:
> 
> For Bio Python and PyCogent interoperability.
> Week 1: Familiarization with the code and soliciting requests. What
> seems intuitive to me might not seem so to others. It would be
> best to spend this time determining a group of people who would
> benefit highly from the interoperability and asking them what they
> would look for. For example, would they rather use one, save the data,
> and use the other? Would they want to use them directly? Basically, I
> want to get a good idea of how this code will be used before making my
> own decisions on how I think people will use it. Also important here
> is to create sets of data which can be used later in the process.

All of this non-coding work should be done in the community
bonding period, from April 26th to the start of coding. When week 1
hits, you want to be ready to code. See the timeline for more
specific information on dates:

http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/timeline

> Week 5: Familiarize with phyloXML and make interoperable with
> PyCogent. phyloXML has already been added with BioPython. Making
> phyloXML work with PyCogent could be based on how it was adapted for
> BioPython. Clear risks here include problems with making sure that the
> API for phyloXML in PyCogent gives an intuitive interface to use
> phyloXML.

Again, all of the non-coding activities should be moved to before
the actual coding period. In your timeline you want to focus on code
deliverables for each week. Of course there will be learning and
reading during the program, but you want to be sure to have a
code-centric focus.

> Week 6 and 7: Adapt PyCogent to query genomics databases. Currently
> there is at least some support for PyCogent to query ENSEMBL. It seems
> like it would be useful to query other genomics databases such as
> Entrez of NCBI. Unfortunately, it seems like NCBI only has PERL
> queries into their MySQL database. Ideally, if everything previously
> has been alright, the conversion of PyCogent to BioPython forms should
> already be accounted for.

Following up on your discussion with Peter, you should think about
some workflows that use Biopython Entrez queries and PyCogent
Ensembl queries to answer interesting questions that could not be
answered with either package alone. This should help to focus your
ideas on integration and workflows, as opposed to implementing new
functionality.
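
To make that a bit more concrete, here is a minimal sketch of the kind
of glue code such a workflow might start from. It is only an
illustration under stated assumptions, not a proposed design: the gene
symbol, organism and e-mail address are placeholders, and
`ensembl_lookup` is a hypothetical callable standing in for whatever
PyCogent Ensembl query you end up using. The Entrez side uses
Biopython's Bio.Entrez esearch call and needs network access.

```python
def build_entrez_term(gene_symbol, organism):
    """Build an Entrez search term restricted to one gene and organism."""
    return "%s[Gene Name] AND %s[Organism]" % (gene_symbol, organism)


def entrez_ids(term, email, retmax=5):
    """Return IDs of NCBI nucleotide records matching `term`.

    Requires Biopython and network access, so the import is kept local
    and the rest of the sketch can be exercised without either.
    """
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.esearch(db="nucleotide", term=term, retmax=retmax)
    try:
        result = Entrez.read(handle)
    finally:
        handle.close()
    return list(result["IdList"])


def combined_report(gene_symbol, organism, entrez_search, ensembl_lookup):
    """Collect results from both sources for one gene.

    `entrez_search` takes a search term and returns a list of IDs.
    `ensembl_lookup` is a hypothetical stand-in for the PyCogent side:
    it takes a gene symbol and returns whatever that query provides.
    """
    term = build_entrez_term(gene_symbol, organism)
    return {
        "gene": gene_symbol,
        "entrez_term": term,
        "entrez_ids": entrez_search(term),
        "ensembl": ensembl_lookup(gene_symbol),
    }
```

In real use you would pass something like
`lambda term: entrez_ids(term, "you@example.org")` as `entrez_search`
and wrap your PyCogent query as `ensembl_lookup`. Keeping the two
sources behind plain callables is just one way to test the integration
logic without hitting either database.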

> Week 8-12: Slip days and additional features. The initial set of use
> cases will surely expand and this is extra time to allow for those use
> cases to be accounted for.

You need to extend your detailed project plan to cover the entire
period. See the examples in the NESCent application documentation to
get an idea of the level of detail in accepted projects from previous
years:

https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2010#When_you_apply
http://spreadsheets.google.com/pub?key=puFMq1smOMEo20j0h5Dg9fA&single=true&gid=0&output=html

Practically, applications are due tomorrow, so you should have a
submission sent in to OpenBio through the GSoC interface
(http://socghop.appspot.com).

Hope this helps,
Brad


