From 88whacko at gmail.com  Sat Mar  2 11:49:48 2013
From: 88whacko at gmail.com (Andrea Rizzi)
Date: Sat, 2 Mar 2013 17:49:48 +0100
Subject: [Biopython-dev] New contributor
Message-ID: <CANKZbvM0tveKZL1dm8RRphh4gyPiDcuHpbLkm1k37TKvqaibpA@mail.gmail.com>

Hello!
My name is Andrea Rizzi and I'm a master's student in computer science and
computational biology. I would be glad to help you develop Biopython.
I've used the library quite extensively but I'm mostly familiar with
handling sequences, MSAs and PDB files.

I've read through the small contributing guide on the wiki and on the
tutorial and I thought I could start with something relatively
straightforward like writing/completing some unit tests (if I understood
correctly, there's a fairly strong need for them). I have good knowledge of
both git and unittest. Anyway, any task is fine with me :)

If you agree, I'll look for a module that needs more testing (or
maybe you have one to suggest); otherwise I could just go to the bug
tracker and try to help fix some bugs.

-- 
-- Andrea

From p.j.a.cock at googlemail.com  Sun Mar  3 07:00:25 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 3 Mar 2013 12:00:25 +0000
Subject: [Biopython-dev] Fwd: GSoC 2013 is ON
In-Reply-To: <20130303112326.GA5638@thebird.nl>
References: <20130303112326.GA5638@thebird.nl>
Message-ID: <CAKVJ-_6DDE8LJ-x3XZXd-hgZ5wo7XYsx_11z6+V3Se9F0p-t3w@mail.gmail.com>

Time to start preparations for Google Summer of Code 2013 :)

---------- Forwarded message ----------
From: *Pjotr Prins*
Date: Sunday, March 3, 2013
Subject: GSoC 2013 is ON

Game on! GSoC 2013 is ON. I am running with the OBF project
administration this year for the Google Summer of code (GSoC).
First and foremost I want to thank Robert Buels and others for making
OBF/GSoC a success in the previous three years! This year, Robert, Chris
Fields and Hilmar Lapp will act as backup administrators.

The deadline for the OBF application for GSoC2013 as a mentoring
organisation is Friday March 29! See

  http://www.google-melange.com/gsoc/events/google/gsoc2013

Similar to previous years, each Bio* project needs to update and add project
ideas on the project's individual OBF wiki page and create links from the
main OBF page at

  http://www.open-bio.org/wiki/Google_Summer_of_Code

(we will update the main information on that page soon).

So, for each of the OBF projects that wants to do GSoC again this
year:

1. Update the list of project ideas on your project's GSoC page (BioPython,
   BioPerl, BioRuby, etc). Add new ones, remove ones that have already been
   done or are no longer relevant, etc. For an example see

     http://bioruby.open-bio.org/wiki/Google_Summer_of_Code

2. Update the final list of project ideas on the main OBF GSoC page
   to match.

     http://www.open-bio.org/wiki/Google_Summer_of_Code

3. Register with gsoc at lists.open-bio.org

4. Announce it on that list when you are ready :)

Anyone can submit a project idea! Former GSoC students are especially
encouraged to contribute ideas to the mailing lists.

Please have the updates done by Friday March 22nd. The number and quality of
the project ideas are part of the evaluation process for whether OBF is
accepted as a Summer of Code organisation again this year, so let's come up
with some good ones!

Pj. (Pjotr Prins)

Important dates:

  * March 22nd: Finalise project ideas
  * March 29th: Deadline OBF mentoring organisation submission to Google

http://www.open-bio.org/wiki/Google_Summer_of_Code

From saketkc at gmail.com  Mon Mar  4 05:59:26 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Mon, 4 Mar 2013 16:29:26 +0530
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAKVJ-_40jy-NS5csf55Q_ds-DdGRpaH18rZxZXdcypNEijpcJA@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
	<CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
	<CAKVJ-_7MRu5tSSZ+S0dsOzpc=ojyC0oOxmg4e+A1Or5phfTz8w@mail.gmail.com>
	<CAEDHeivazvXaaCo2ZFP8sR3BOfv5tgEj6Bf7f_UZ6g_x-v9RZQ@mail.gmail.com>
	<CAEDHeisW+ymrW-Bb5XcgAm1mSPbDWMY7evhvyspVFxzTptW2Pg@mail.gmail.com>
	<CAKVJ-_5vn3p_iiOYTpnMs6W8qF-3DFwgzO2-HqL6=W-mHByaTA@mail.gmail.com>
	<CAEDHeitQA0JQ=vuRKEohgRwkRxLqs+NuV9dAYguhrc1-eYnCsQ@mail.gmail.com>
	<CAKVJ-_40jy-NS5csf55Q_ds-DdGRpaH18rZxZXdcypNEijpcJA@mail.gmail.com>
Message-ID: <CAEDHeitOkC8bPB1HceASHz2=hXZHT3Q1f0n0p77BGnaM8ZCdow@mail.gmail.com>

Hi,

I have updated the code here:
https://github.com/saketkc/biopython/tree/bwa_wrapper

I have added unit tests for the wrapper. And yes, writing them did help me
fix a lot of minor bugs in my original wrapper.

@Peter: Is this ready for a pull request?

Thanks

Saket

On 19 February 2013 19:55, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Feb 19, 2013 at 1:15 PM, Saket Choudhary <saketkc at gmail.com> wrote:
>>
>> Thanks Peter.
>>
>> I will add that. Any pointers to what would be a good reference test_aba.py
>> file in Tests/ directory for writing unit tests for this ?
>>
>> I have worked on BDD before but unit tests are new for me, so it may take
>> some time. I plan to finish it in the coming week once my university
>> examinations are done
>>
>> Thanks
>>
>> Saket
>
> There's a chapter in the Tutorial about our test framework. In this
> case existing command line tool wrappers are the best reference,
> e.g. test_Emboss.py or test_Muscle.py
>
> Also if you want to use doctests and have them included in the
> test suite, add the module to the list in Tests/run_tests.py - however
> this does not handle optional dependencies (other than NumPy).
> Therefore all the application wrapper doctests to date have carefully
> avoided actually invoking the command line; instead, most of them
> print the string representation. This allows us to check
> that the example use cases run (and catches silly errors in
> the examples like a typo in an argument name).
>
> Thanks,
>
> Peter
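For illustration, the doctest pattern Peter describes (build a command line and print it, never run it) can be sketched as below. `ToyCommandline` is a hypothetical stand-in, not Biopython's own wrapper API, and `bwa index reference.fasta` is only an example command; the point is that the docstring example passes whether or not the tool is installed.

```python
import doctest


class ToyCommandline:
    """Hypothetical minimal command-line wrapper (not Biopython's API).

    The doctest only builds and prints the command string, so it passes
    even on machines where the ``bwa`` binary is not installed:

    >>> cline = ToyCommandline("bwa", "index", "reference.fasta")
    >>> print(cline)
    bwa index reference.fasta
    """

    def __init__(self, program, *args):
        self.program = program
        self.args = list(args)

    def __str__(self):
        # Join the executable name and its arguments into one command line.
        return " ".join([self.program] + self.args)


if __name__ == "__main__":
    # Run the docstring examples in this module, as a test runner would.
    doctest.testmod()
```

As Peter notes, real wrappers also catch misspelled argument names in such examples; this toy makes no attempt at that.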

From saketkc at gmail.com  Tue Mar  5 12:26:57 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Tue, 5 Mar 2013 22:56:57 +0530
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
Message-ID: <CAEDHeivFbREH_QgaV2DGSVEe-h7C8F5CW8PM3ovQSiU22mJ3pA@mail.gmail.com>

I had the idea of an online Biopython shell along the lines of the BioRuby shell:
http://bioruby.open-bio.org/wiki/BioRubyOnRails


On 13 February 2013 07:38, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> It would be great to have better support for microarray analysis in Biopython. Something like lumi/limma in R. Perhaps this is an option for the GSoC?
>
> Best,
> -Michiel.
>
> --- On Tue, 2/12/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
>> From: Peter Cock <p.j.a.cock at googlemail.com>
>> Subject: [Biopython-dev] Project ideas for GSoC (or other student projects)
>> To: "Biopython-Dev Mailing List" <biopython-dev at biopython.org>
>> Date: Tuesday, February 12, 2013, 12:51 PM
>> Hello all,
>>
>> Google recently confirmed they will be running Google Summer
>> of Code 2013,
>> and we (Biopython and the other Bio* projects) would hope to
>> be accepted again
>> under the Open Bioinformatics Foundation as in previous
>> years:
>> http://lists.open-bio.org/pipermail/gsoc/2013/000196.html
>>
>> It would be great to start coming up with potential project
>> ideas, both larger
>> pieces of work suitable for GSoC but also smaller tasks for
>> other project
>> students, or 'low hanging fruit' for potential contributors
>> to cut
>> their teeth on.
>>
>> See also http://biopython.org/wiki/Active_projects
>> and the ideas list there.
>>
>> Regards,
>>
>> Peter
>> _______________________________________________
>> Biopython-dev mailing list
>> Biopython-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>>

From p.j.a.cock at googlemail.com  Fri Mar  8 11:08:46 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 8 Mar 2013 16:08:46 +0000
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAEDHeivFbREH_QgaV2DGSVEe-h7C8F5CW8PM3ovQSiU22mJ3pA@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAEDHeivFbREH_QgaV2DGSVEe-h7C8F5CW8PM3ovQSiU22mJ3pA@mail.gmail.com>
Message-ID: <CAKVJ-_40-y0QAY1hr363tffnzMDJObToa_QSiw=_pqhkQrHuPA@mail.gmail.com>

On Tue, Mar 5, 2013 at 5:26 PM, Saket Choudhary <saketkc at gmail.com> wrote:
> I had this idea of an online biopython shell on the lines of  bioruby shell :
> http://bioruby.open-bio.org/wiki/BioRubyOnRails
>

That screenshot makes me think of http://ipython.org/ - is that similar?

Peter

From redmine at redmine.open-bio.org  Fri Mar  8 11:49:48 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Fri, 8 Mar 2013 16:49:48 +0000
Subject: [Biopython-dev] [Biopython - Bug #3422] (New) Missing
Message-ID: <redmine.issue-3422.20130308164948@redmine.open-bio.org>


Issue #3422 has been reported by Jared Sampson.

----------------------------------------
Bug #3422: Missing 
https://redmine.open-bio.org/issues/3422

Author: Jared Sampson
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: http://www.ncbi.nlm.nih.gov/corehtml/query/DTD/bookdoc_130101.dtd


When using Entrez.efetch to retrieve an XML file, I get the following warning about a missing DTD: bookdoc_130101.dtd

===

/path/to/my/virtualenv/lib/python2.7/site-packages/Bio/Entrez/Parser.py:522: UserWarning: Unable to load DTD file bookdoc_130101.dtd.

Bio.Entrez uses NCBI's DTD files to parse XML files returned by NCBI Entrez.
Though most of NCBI's DTD files are included in the Biopython distribution,
sometimes you may find that a particular DTD file is missing. While we can
access the DTD file through the internet, the parser is much faster if the
required DTD files are available locally.

For this purpose, please download bookdoc_130101.dtd from

http://www.ncbi.nlm.nih.gov/corehtml/query/DTD/bookdoc_130101.dtd

and save it either in directory

/path/to/my/virtualenv/lib/python2.7/site-packages/Bio/Entrez/DTDs

or in directory

/Users/me/.biopython/Bio/Entrez/DTDs

in order for Bio.Entrez to find it.

Alternatively, you can save bookdoc_130101.dtd in the directory
Bio/Entrez/DTDs in the Biopython distribution, and reinstall Biopython.

Please also inform the Biopython developers about this missing DTD, by
reporting a bug on http://bugzilla.open-bio.org/ or sign up to our mailing
list and emailing us, so that we can include it with the next release of
Biopython.

Proceeding to access the DTD file through the internet...

  warnings.warn(message)

===
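Following the warning's own instructions, the manual workaround can be scripted roughly as below. This is a sketch, not part of Bio.Entrez: `save_missing_dtd`, `user_dtd_dir` and the `fetch` hook are hypothetical names, the per-user directory is the `~/.biopython/Bio/Entrez/DTDs` location quoted in the 2013 warning, and the NCBI URL prefix is copied from that warning and may since have moved. Newer Biopython releases may look for DTDs in a different place.

```python
import os
import urllib.request

# URL prefix taken from the 2013 warning text; an assumption, NCBI may have moved it.
DTD_URL = "http://www.ncbi.nlm.nih.gov/corehtml/query/DTD/"


def user_dtd_dir(home=None):
    """Per-user DTD directory named in the warning (~/.biopython/Bio/Entrez/DTDs)."""
    home = home if home is not None else os.path.expanduser("~")
    return os.path.join(home, ".biopython", "Bio", "Entrez", "DTDs")


def _download(url):
    """Default fetcher: read the whole response body via urllib."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def save_missing_dtd(filename, home=None, fetch=_download):
    """Fetch a DTD file into the per-user directory and return its path.

    ``fetch`` (URL -> bytes) is a hypothetical hook so the function can be
    exercised without network access.
    """
    target_dir = user_dtd_dir(home)
    os.makedirs(target_dir, exist_ok=True)  # create ~/.biopython/... if needed
    path = os.path.join(target_dir, filename)
    with open(path, "wb") as handle:
        handle.write(fetch(DTD_URL + filename))
    return path
```

For this report, `save_missing_dtd("bookdoc_130101.dtd")` would write the file where the warning says Bio.Entrez looks; the `fetch` parameter exists only so the function can be tested offline.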

Also, the bugzilla.open-bio.org URL mentioned comes up empty.

Thanks, 
Jared Sampson




From saketkc at gmail.com  Fri Mar  8 13:30:03 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Sat, 9 Mar 2013 00:00:03 +0530
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAKVJ-_40-y0QAY1hr363tffnzMDJObToa_QSiw=_pqhkQrHuPA@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAEDHeivFbREH_QgaV2DGSVEe-h7C8F5CW8PM3ovQSiU22mJ3pA@mail.gmail.com>
	<CAKVJ-_40-y0QAY1hr363tffnzMDJObToa_QSiw=_pqhkQrHuPA@mail.gmail.com>
Message-ID: <CAEDHeiuBKPo8TppiNccWg4QK7saovmFiDaK1=3YKTa2q5YSa_Q@mail.gmail.com>

It is essentially an online Ruby on Rails (RoR) application that lets you
try BioRuby in your browser without a native BioRuby install. I was
thinking of a Django/Flask application that would be a playground for
trying out Biopython.


Saket

On 08/03/2013, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Mar 5, 2013 at 5:26 PM, Saket Choudhary <saketkc at gmail.com> wrote:
>> I had this idea of an online biopython shell on the lines of  bioruby
>> shell :
>> http://bioruby.open-bio.org/wiki/BioRubyOnRails
>>
>
> That screenshot makes me think of http://ipython.org/ - is that similar?
>
> Peter
>

From chapmanb at 50mail.com  Sat Mar  9 11:06:34 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sat, 09 Mar 2013 11:06:34 -0500
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAEDHeiuBKPo8TppiNccWg4QK7saovmFiDaK1=3YKTa2q5YSa_Q@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAEDHeivFbREH_QgaV2DGSVEe-h7C8F5CW8PM3ovQSiU22mJ3pA@mail.gmail.com>
	<CAKVJ-_40-y0QAY1hr363tffnzMDJObToa_QSiw=_pqhkQrHuPA@mail.gmail.com>
	<CAEDHeiuBKPo8TppiNccWg4QK7saovmFiDaK1=3YKTa2q5YSa_Q@mail.gmail.com>
Message-ID: <87wqtgeewl.fsf@fastmail.fm>


Saket and Peter,
What you're describing is what IPython provides, a web-based way to edit
and interact with Python code. There are some projects that build on top
of it to provide more of a playground environment like you're describing:

http://continuum.io/wakari.html
https://github.com/Exhibitionist/Exhibitionist

Hope this helps,
Brad



From xbello at gmail.com  Tue Mar 12 05:36:35 2013
From: xbello at gmail.com (Xabier Bello)
Date: Tue, 12 Mar 2013 10:36:35 +0100
Subject: [Biopython-dev] Consumer of "KW" in embl format
Message-ID: <CAAGUp7UYfnSTga_gV=HP_yVjJXB1N=AazjDbr8_17pBUaJ_vAg@mail.gmail.com>

Hi:

I don't know if this is the right way to do this. The code:

from Bio import SeqIO

records = SeqIO.parse(open("MyFile.embl", "r"), "embl")
for record in records:
    print(record.annotations["keywords"])

Doesn't work

I've added to Bio/GenBank/Scanner.py, in _feed_header_lines():

elif line_type == 'KW':
    consumer.keywords(data.rstrip(";"))

And now it seems to parse the keyword lines.

Regards.

From p.j.a.cock at googlemail.com  Tue Mar 12 05:54:51 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 12 Mar 2013 09:54:51 +0000
Subject: [Biopython-dev] Consumer of "KW" in embl format
In-Reply-To: <CAAGUp7UYfnSTga_gV=HP_yVjJXB1N=AazjDbr8_17pBUaJ_vAg@mail.gmail.com>
References: <CAAGUp7UYfnSTga_gV=HP_yVjJXB1N=AazjDbr8_17pBUaJ_vAg@mail.gmail.com>
Message-ID: <CAKVJ-_6nuW=sduSdf8zM-qtLkOWEO-b=XwF8=dqgrs1YCyrHnQ@mail.gmail.com>

On Tue, Mar 12, 2013 at 9:36 AM, Xabier Bello <xbello at gmail.com> wrote:
> Hi:
>
> I don't know if this is the right way to do this. The code:
>
> records = SeqIO.parse(open("MyFile.embl", "r"), "embl")
> for record in records:
>     print record.annotations["keywords"]
>
> Doesn't work
>
> I've added to Bio/GenBank/Scanner.py, in _feed_header_lines():
>
> elif line_type == 'KW':
>     consumer.keywords(data.rstrip(";"))
>
> And now it seems to parse the keyword lines.
>
> Regards.

Good idea, although it needs a little more generalisation for handling
multiple keywords - a list of strings seems sensible here. Quoting
ftp://ftp.ebi.ac.uk/pub/databases/embl/doc/usrman.txt

<quote>
3.4.6  The KW Line
The KW (KeyWord) lines provide information which can be used to generate
cross-reference indexes of the sequence entries based on functional,
structural, or other categories deemed important.
The format for a KW line is:
     KW   keyword[; keyword ...].
More than one keyword may be listed on each KW line; the keywords are
separated by semicolons, and the last keyword is followed by a full
stop. Keywords may consist of more than one word, and they may contain
embedded blanks and stops. A keyword is never split between lines.
An example of a keyword line is:
     KW   beta-glucosidase.
The keywords are ordered alphabetically; the ordering implies no hierarchy
of importance or function.  If an entry has no keywords assigned to it,
it will contain a single KW line like this:
     KW   .
</quote>
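Following Peter's suggestion of storing the keywords as a list of strings, the quoted rules translate into a small parser like the one below. `parse_kw_lines` is a hypothetical helper, not the code committed to Bio/GenBank/Scanner.py; it assumes it is given the raw KW lines of one entry.

```python
def parse_kw_lines(lines):
    """Collect EMBL keywords from KW lines into a list of strings.

    Follows the quoted rules: keywords are separated by semicolons, the last
    one ends with a full stop, a keyword is never split between lines, and
    "KW   ." means the entry has no keywords.
    """
    # Join the data part of every KW line (the text after the "KW" line code).
    data = " ".join(line[2:].strip() for line in lines if line.startswith("KW"))
    # Drop the single terminating full stop; embedded stops are kept.
    if data.endswith("."):
        data = data[:-1]
    # Split on semicolons and discard empty entries (e.g. from "KW   .").
    return [kw.strip() for kw in data.split(";") if kw.strip()]
```

Joining all KW lines before splitting is safe only because, per the specification, a keyword is never split between lines; removing just the final full stop keeps embedded stops such as `cor6.6 gene` intact.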

Likewise, the GenBank parser should support the KEYWORDS line, and the
keywords should then be written out again too.

Is this something you'd like to work on, or should I do it?

(If you are interested in getting involved in Biopython development
this seems like a nice project to start with - not too complicated, but
large enough to make creating a fork on GitHub and your own
enhancement branch a good idea.)

Thanks,

Peter

From p.j.a.cock at googlemail.com  Tue Mar 12 06:02:15 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 12 Mar 2013 10:02:15 +0000
Subject: [Biopython-dev] Consumer of "KW" in embl format
In-Reply-To: <CAKVJ-_6nuW=sduSdf8zM-qtLkOWEO-b=XwF8=dqgrs1YCyrHnQ@mail.gmail.com>
References: <CAAGUp7UYfnSTga_gV=HP_yVjJXB1N=AazjDbr8_17pBUaJ_vAg@mail.gmail.com>
	<CAKVJ-_6nuW=sduSdf8zM-qtLkOWEO-b=XwF8=dqgrs1YCyrHnQ@mail.gmail.com>
Message-ID: <CAKVJ-_7dbgJP8Hxaqp1q_5nazUytsohVvSwpvUOFH9DnDFjF5A@mail.gmail.com>

On Tue, Mar 12, 2013 at 9:54 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Is this something you'd like to work on, or should I do it?

To clarify - Biopython should already be reading and writing any
KEYWORDS line in GenBank files - the same data structure should
be used for EMBL files (your suggestion looks good, but an explicit
unit test covering single and multiple keywords would be ideal),
and then the EMBL writer updated to write this. i.e. code added in
Bio/SeqIO/InsdcIO.py

Peter

From p.j.a.cock at googlemail.com  Tue Mar 12 06:58:39 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 12 Mar 2013 10:58:39 +0000
Subject: [Biopython-dev] Consumer of "KW" in embl format
In-Reply-To: <CAAGUp7Vvg+dizLYGC+T+M+iGkQk2Z0S+r9Y2s47irgDj87S3Vg@mail.gmail.com>
References: <CAAGUp7UYfnSTga_gV=HP_yVjJXB1N=AazjDbr8_17pBUaJ_vAg@mail.gmail.com>
	<CAKVJ-_6nuW=sduSdf8zM-qtLkOWEO-b=XwF8=dqgrs1YCyrHnQ@mail.gmail.com>
	<CAKVJ-_7dbgJP8Hxaqp1q_5nazUytsohVvSwpvUOFH9DnDFjF5A@mail.gmail.com>
	<CAAGUp7Vvg+dizLYGC+T+M+iGkQk2Z0S+r9Y2s47irgDj87S3Vg@mail.gmail.com>
Message-ID: <CAKVJ-_7VNEJNvpYxQV+FP9kvrt2C_tPYfY3YAeaLsPxqmgR+LA@mail.gmail.com>

On Tue, Mar 12, 2013 at 10:12 AM, Xabier Bello <xbello at gmail.com> wrote:
> I think I'm not that close to the Biopython code.
>
> I found a problem (I needed to read the Keywords), and solved it quick and
> dirty. In fact, it doesn't read multiline KW. I'm not sure I could implement
> that in a fair amount of time.
>
> Regards.

No problem - I've committed your fix, a basic test, and extended this for
multiple KW lines. As discussed I've thanked you in the NEWS file too.

https://github.com/biopython/biopython/commit/fc036dcdac22252a366647823a0c7c317c303313
https://github.com/biopython/biopython/commit/606ea9360d262d21c3e01eda66c4cf9118880d46

Updating the EMBL writer in Bio/SeqIO/InsdcIO.py should be a nice
small task for any volunteer wanting to make a first contribution...

(Potential Google Summer of Code students - Hint hint ;) )

Thank you Xabier,

Peter

From p.j.a.cock at googlemail.com  Tue Mar 12 10:40:16 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 12 Mar 2013 14:40:16 +0000
Subject: [Biopython-dev] Consumer of "KW" in embl format
In-Reply-To: <CAAGUp7V6J=SWRMR6c-G9k3D=_9eQ6mwvmUKxLAO45nOxcQu2Kg@mail.gmail.com>
References: <CAAGUp7UYfnSTga_gV=HP_yVjJXB1N=AazjDbr8_17pBUaJ_vAg@mail.gmail.com>
	<CAKVJ-_6nuW=sduSdf8zM-qtLkOWEO-b=XwF8=dqgrs1YCyrHnQ@mail.gmail.com>
	<CAKVJ-_7dbgJP8Hxaqp1q_5nazUytsohVvSwpvUOFH9DnDFjF5A@mail.gmail.com>
	<CAAGUp7Vvg+dizLYGC+T+M+iGkQk2Z0S+r9Y2s47irgDj87S3Vg@mail.gmail.com>
	<CAKVJ-_7VNEJNvpYxQV+FP9kvrt2C_tPYfY3YAeaLsPxqmgR+LA@mail.gmail.com>
	<CAAGUp7V6J=SWRMR6c-G9k3D=_9eQ6mwvmUKxLAO45nOxcQu2Kg@mail.gmail.com>
Message-ID: <CAKVJ-_43m0H6ZWVjdCiZgdW=J+4=1dCioKxPiRPZwdVjc1j4Jg@mail.gmail.com>

On Tue, Mar 12, 2013 at 11:35 AM, Xabier Bello <xbello at gmail.com> wrote:
> Let's try it:
>
> Line 997:
> def _write_keywords(self, record):
>     #Put the keywords right after DE line.
>     self._write_multi_line("KW", "%s." % "; ".join(
>         record.annotations["keywords"]))
>     self.handle.write("XX\n")

Looks good - although there is a potential problem here with long keywords:
this does not avoid splitting a single keyword over multiple KW lines
(which the EMBL specification forbids). This is a corner case though...

> Line 1070:
> if "keywords" in record.annotations:
>     self._write_keywords(record)
>
> Note to self: learn to make diff patches and forks on github.

Good plan :)

Meanwhile, I committed that change:
https://github.com/biopython/biopython/commit/41470eac55a665d1cb1c7e73ebfd3c1df98af5ad

I added a little more testing, from which I think we may need to
do some work with some of the other EMBL fields like dbxrefs:
https://github.com/biopython/biopython/commit/07639dde32083f4f024616292a5c736e85770a4e

Thanks,

Peter

From p.j.a.cock at googlemail.com  Tue Mar 12 11:13:23 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 12 Mar 2013 15:13:23 +0000
Subject: [Biopython-dev] Consumer of "KW" in embl format
In-Reply-To: <CAKVJ-_43m0H6ZWVjdCiZgdW=J+4=1dCioKxPiRPZwdVjc1j4Jg@mail.gmail.com>
References: <CAAGUp7UYfnSTga_gV=HP_yVjJXB1N=AazjDbr8_17pBUaJ_vAg@mail.gmail.com>
	<CAKVJ-_6nuW=sduSdf8zM-qtLkOWEO-b=XwF8=dqgrs1YCyrHnQ@mail.gmail.com>
	<CAKVJ-_7dbgJP8Hxaqp1q_5nazUytsohVvSwpvUOFH9DnDFjF5A@mail.gmail.com>
	<CAAGUp7Vvg+dizLYGC+T+M+iGkQk2Z0S+r9Y2s47irgDj87S3Vg@mail.gmail.com>
	<CAKVJ-_7VNEJNvpYxQV+FP9kvrt2C_tPYfY3YAeaLsPxqmgR+LA@mail.gmail.com>
	<CAAGUp7V6J=SWRMR6c-G9k3D=_9eQ6mwvmUKxLAO45nOxcQu2Kg@mail.gmail.com>
	<CAKVJ-_43m0H6ZWVjdCiZgdW=J+4=1dCioKxPiRPZwdVjc1j4Jg@mail.gmail.com>
Message-ID: <CAKVJ-_6T=4SNWw2=u=cGkV5Y68bhYqy0s9_P_Dg5AAH2B_dy7w@mail.gmail.com>

On Tue, Mar 12, 2013 at 2:40 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Mar 12, 2013 at 11:35 AM, Xabier Bello <xbello at gmail.com> wrote:
>> Let's try it:
>>
>> Line 997:
>> def _write_keywords(self, record):
>>     #Put the keywords right after DE line.
>>     self._write_multi_line("KW", "%s." % "; ".join(
>>         record.annotations["keywords"]))
>>     self.handle.write("XX\n")
>
> Looks good - although there is a potential problem here with long keywords
> where this does not avoid splitting a single keyword over multiple KW lines
> (as specified in the EMBL specification). This is a corner case though...

OK, not such a rare case:

$ python test_SeqIO_features.py
...
======================================================================
ERROR: test_cor6 (__main__.TestWriteRead)
Write and read back cor6_6.gb
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_SeqIO_features.py", line 1105, in test_cor6
    write_read(os.path.join("GenBank", "cor6_6.gb"), "gb")
  File "test_SeqIO_features.py", line 35, in write_read
    compare_records(gb_records, gb_records2)
  File "test_SeqIO_features.py", line 110, in compare_records
    if not compare_record(old,new,expect_minor_diffs):
  File "test_SeqIO_features.py", line 101, in compare_record
    % (key, old.annotations[key], new.annotations[key]))
ValueError: Annotation mis-match for keywords:
['antifreeze protein homology', 'cold-regulated gene', 'cor6.6 gene',
'KIN1 homology']
['antifreeze protein homology', 'cold-regulated gene', 'cor6.6 gene',
'KIN1', 'homology']

----------------------------------------------------------------------

I'll fix this later today...
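The failure above appears to be exactly that corner case: `KIN1 homology` was split across two KW lines and read back as two keywords. A minimal sketch of a writer that only wraps between keywords is below; `format_kw_lines` is a hypothetical helper, not the fix committed to Bio/SeqIO/InsdcIO.py, and the 80-column default width is an assumption.

```python
def format_kw_lines(keywords, width=80):
    """Format keywords as EMBL KW lines without splitting any keyword.

    Wrapping only happens between keywords; a single keyword longer than
    the line width is emitted on its own (over-long) line rather than split.
    """
    prefix = "KW   "
    if not keywords:
        # An entry with no keywords gets a single "KW   ." line.
        return [prefix + "."]
    lines = []
    current = ""
    for i, keyword in enumerate(keywords):
        # Semicolon after each keyword, full stop after the last one.
        token = keyword + ("." if i == len(keywords) - 1 else ";")
        candidate = current + " " + token if current else token
        if current and len(prefix) + len(candidate) > width:
            # Start a new KW line rather than breaking inside a keyword.
            lines.append(prefix + current)
            current = token
        else:
            current = candidate
    lines.append(prefix + current)
    return lines
```

With the four keywords from that test, the first line stops after `cor6.6 gene;` and `KIN1 homology.` goes whole onto the second line.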

Peter

From clements at galaxyproject.org  Tue Mar 12 18:01:49 2013
From: clements at galaxyproject.org (Dave Clements)
Date: Tue, 12 Mar 2013 15:01:49 -0700
Subject: [Biopython-dev] 2013 Galaxy Community Conference (GCC2013),
	30 June - 2 July, Oslo
Message-ID: <CA+He-X_a6gvMXcqiXPLHmTz0j==rGYzHDjZYeDBLqcPMdEf5KQ@mail.gmail.com>

Hello all,

We are pleased to announce that early registration
<http://wiki.galaxyproject.org/Events/GCC2013/Register> and paper and poster
abstract submission <http://wiki.galaxyproject.org/Events/GCC2013/Abstracts>
are now open for the 2013 Galaxy Community Conference (GCC2013)
<http://wiki.galaxyproject.org/Events/GCC2013>. GCC2013 will be held 30 June
through 2 July in Oslo, Norway, at the University of Oslo <http://uio.no/>.

GCC2013 <http://wiki.galaxyproject.org/Events/GCC2013> is an opportunity to
participate in two full days of presentations, discussions, poster
sessions, keynotes, lightning talks and breakouts, *all about
high-throughput biology and the tools that support it*. The conference also
includes a Training Day <http://wiki.galaxyproject.org/Events/GCC2013/TrainingDay>
for the second year in a row, this year with more in-depth topic coverage,
more concurrent sessions, and more topics.

If you are a biologist or bioinformatician performing or enabling
high-throughput biological research, then please consider attending.
GCC2013 is aimed at:

   - Bioinformatics tool developers and data providers
   - Workflow developers and power bioinformatics users
   - Sequencing and Bioinformatics core staff
   - Data archival and analysis reproducibility specialists

*Early registration <http://wiki.galaxyproject.org/Events/GCC2013/Register>*
*saves up to 75% on regular registration costs,* and is very affordable,
with combined registration (Training Day
<http://wiki.galaxyproject.org/Events/GCC2013/TrainingDay> + main meeting)
starting at ~ ?95 for post-docs and students. Registering early also
assures you a spot in the Training Day workshops you want to attend. Once a
Training Day session becomes full, it will be closed to new registrations.
Early registration closes 24 May.

*Abstract submission <http://wiki.galaxyproject.org/Events/GCC2013/Abstracts>*
for oral presentations closes 12 April, and for posters on 3 May. Please
consider presenting your work. If you are working with big biological data,
then the people at this meeting want to hear about your work.

Thanks, and hope to see you in Oslo!

The GCC2013 Organizing Committee
<http://wiki.galaxyproject.org/Events/GCC2013/Organizers>

PS: And please help get the word out
<http://wiki.galaxyproject.org/Events/GCC2013/Promotion>!

-- 
http://galaxyproject.org/GCC2013
http://galaxyproject.org/
http://getgalaxy.org/
http://usegalaxy.org/
http://wiki.galaxyproject.org/


From anaryin at gmail.com  Wed Mar 13 07:09:29 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Wed, 13 Mar 2013 12:09:29 +0100
Subject: [Biopython-dev] [Biopython] Updating GSOC page?
In-Reply-To: <CAJ9sUYNM5BwhyFkvM5nPJwJxDM6m4rmNvZTqED0Cnfhc_VCcwQ@mail.gmail.com>
References: <CAJ9sUYOgNRxES1GkJdEmDTpeo=XbhJvE__eYKEnQ1-OsLWg0AA@mail.gmail.com>
	<CAKVJ-_4Q4DRgXEOc9S1-0kB=U3YPaA=T30RzzHp-s=ZE_mizzg@mail.gmail.com>
	<CAJ9sUYNM5BwhyFkvM5nPJwJxDM6m4rmNvZTqED0Cnfhc_VCcwQ@mail.gmail.com>
Message-ID: <CAJ9sUYNXSmTZ6izM=fH8KXoMLJgwHauP8vgvTc-OR=bPH4jBeQ@mail.gmail.com>

Hello all,

I updated the GSOC page on the wiki to be more organized:
http://biopython.org/wiki/GSOC

If no one opposes, I'll replace the current page
(here<http://biopython.org/wiki/Google_Summer_of_Code>)
with it, just in time for GSOC 2013.

Best,

João

PS. Sorry for the spam, but I posted this 5 days ago on the non-dev list
and got no answers, so...


2013/3/8 João Rodrigues <anaryin at gmail.com>

> Small update: http://biopython.org/wiki/GSOC
>
> If OK, we can just point the normal page to this one. I kept it separate
> just in case.
>
>
> 2013/3/4 Peter Cock <p.j.a.cock at googlemail.com>
>
>> On Sun, Mar 3, 2013 at 11:07 PM, João Rodrigues <anaryin at gmail.com>
>> wrote:
>> > Hello all,
>> >
>> > Does anyone oppose refreshing our GSOC page
>> > <http://biopython.org/wiki/Google_Summer_of_Code> based on the BioRuby
>> > page <http://bioruby.open-bio.org/wiki/Google_Summer_of_Code>? It could
>> > use a facelift before the new round of projects/students comes in.
>> >
>> > Best,
>> >
>> > João
>>
>> A good idea - see also the GSoC discussions on the biopython-dev
>> list about potential project ideas.
>>
>> Thanks,
>>
>> Peter
>>
>
>


From mikael.trellet at gmail.com  Wed Mar 13 07:17:17 2013
From: mikael.trellet at gmail.com (Mikael Trellet)
Date: Wed, 13 Mar 2013 12:17:17 +0100
Subject: [Biopython-dev] [Biopython] Updating GSOC page?
In-Reply-To: <CAJ9sUYNXSmTZ6izM=fH8KXoMLJgwHauP8vgvTc-OR=bPH4jBeQ@mail.gmail.com>
References: <CAJ9sUYOgNRxES1GkJdEmDTpeo=XbhJvE__eYKEnQ1-OsLWg0AA@mail.gmail.com>
	<CAKVJ-_4Q4DRgXEOc9S1-0kB=U3YPaA=T30RzzHp-s=ZE_mizzg@mail.gmail.com>
	<CAJ9sUYNM5BwhyFkvM5nPJwJxDM6m4rmNvZTqED0Cnfhc_VCcwQ@mail.gmail.com>
	<CAJ9sUYNXSmTZ6izM=fH8KXoMLJgwHauP8vgvTc-OR=bPH4jBeQ@mail.gmail.com>
Message-ID: <CAMHOhY1VqM0vyJ1NtEscnbBxv3s=gsdKeEJhWCgRWtUyFCNU2Q@mail.gmail.com>

It's well-formatted and looks nice to me; the improvement over the former
one is significant, so I agree with updating the page.

Good work ;)

Mikael


On Wed, Mar 13, 2013 at 12:09 PM, João Rodrigues <anaryin at gmail.com> wrote:

> Hello all,
>
> I updated the GSOC page on the wiki to be more organized:
> http://biopython.org/wiki/GSOC
>
> If no one objects, I'll replace the current page
> (http://biopython.org/wiki/Google_Summer_of_Code)
> with it, just in time for GSoC 2013.
>
> Best,
>
> João
>
> PS. Sorry for the spam, but I posted this 5 days ago on the non-dev list
> and got no answers, so...
>
>
> 2013/3/8 João Rodrigues <anaryin at gmail.com>
>
> > Small update: http://biopython.org/wiki/GSOC
> >
> > If OK, we can just link the normal page to this one. I kept it separate
> > just in case.
> >
> >
> > 2013/3/4 Peter Cock <p.j.a.cock at googlemail.com>
> >
> >> On Sun, Mar 3, 2013 at 11:07 PM, João Rodrigues <anaryin at gmail.com>
> >> wrote:
> >> > Hello all,
> >> >
> >> > Does anyone object to a refresh of our GSoC page
> >> > (http://biopython.org/wiki/Google_Summer_of_Code) based on the BioRuby
> >> > page (http://bioruby.open-bio.org/wiki/Google_Summer_of_Code)? It could use
> >> > a facelift before the new round of projects/students comes in.
> >> >
> >> > Best,
> >> >
> >> > João
> >>
> >> A good idea - see also the GSoC discussions on the biopython-dev
> >> list about potential project ideas.
> >>
> >> Thanks,
> >>
> >> Peter
> >>
> >
> >
>
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>


-- 
--------------------------------------------
Mikael TRELLET,
- Groupe VENISE, CNRS LIMSI
91403 Orsay CEDEX
- LBT/IBPC,
75005 Paris
France
+33650607172


From p.j.a.cock at googlemail.com  Wed Mar 13 08:04:28 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 13 Mar 2013 12:04:28 +0000
Subject: [Biopython-dev] [Biopython] Updating GSOC page?
In-Reply-To: <CAJ9sUYNXSmTZ6izM=fH8KXoMLJgwHauP8vgvTc-OR=bPH4jBeQ@mail.gmail.com>
References: <CAJ9sUYOgNRxES1GkJdEmDTpeo=XbhJvE__eYKEnQ1-OsLWg0AA@mail.gmail.com>
	<CAKVJ-_4Q4DRgXEOc9S1-0kB=U3YPaA=T30RzzHp-s=ZE_mizzg@mail.gmail.com>
	<CAJ9sUYNM5BwhyFkvM5nPJwJxDM6m4rmNvZTqED0Cnfhc_VCcwQ@mail.gmail.com>
	<CAJ9sUYNXSmTZ6izM=fH8KXoMLJgwHauP8vgvTc-OR=bPH4jBeQ@mail.gmail.com>
Message-ID: <CAKVJ-_7hGEk14sumE25JyJBPsh-OeA-jMUaV7edUbBtLVZVnUw@mail.gmail.com>

On Wed, Mar 13, 2013 at 11:09 AM, João Rodrigues <anaryin at gmail.com> wrote:
> Hello all,
>
> I updated the GSOC page on the wiki to be more organized:
> http://biopython.org/wiki/GSOC
>
> If no one objects, I'll replace the current page
> (http://biopython.org/wiki/Google_Summer_of_Code)
> with it, just in time for GSoC 2013.
>
> Best,
>
> João

Sounds sensible, and you can set a redirect on GSOC to
Google_Summer_of_Code by replacing the content with:

#REDIRECT [[link]]

Peter


From anaryin at gmail.com  Wed Mar 13 09:22:23 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Wed, 13 Mar 2013 14:22:23 +0100
Subject: [Biopython-dev] [Biopython] Updating GSOC page?
In-Reply-To: <CAKVJ-_7hGEk14sumE25JyJBPsh-OeA-jMUaV7edUbBtLVZVnUw@mail.gmail.com>
References: <CAJ9sUYOgNRxES1GkJdEmDTpeo=XbhJvE__eYKEnQ1-OsLWg0AA@mail.gmail.com>
	<CAKVJ-_4Q4DRgXEOc9S1-0kB=U3YPaA=T30RzzHp-s=ZE_mizzg@mail.gmail.com>
	<CAJ9sUYNM5BwhyFkvM5nPJwJxDM6m4rmNvZTqED0Cnfhc_VCcwQ@mail.gmail.com>
	<CAJ9sUYNXSmTZ6izM=fH8KXoMLJgwHauP8vgvTc-OR=bPH4jBeQ@mail.gmail.com>
	<CAKVJ-_7hGEk14sumE25JyJBPsh-OeA-jMUaV7edUbBtLVZVnUw@mail.gmail.com>
Message-ID: <CAJ9sUYOjtScLgizcD9j=RWhdmUjRA=UPaTz72Kvc2WscyuGecw@mail.gmail.com>

Done, thanks.

http://biopython.org/wiki/Google_Summer_of_Code
http://biopython.org/wiki/GSOC


2013/3/13 Peter Cock <p.j.a.cock at googlemail.com>

> On Wed, Mar 13, 2013 at 11:09 AM, João Rodrigues <anaryin at gmail.com>
> wrote:
> > Hello all,
> >
> > I updated the GSOC page on the wiki to be more organized:
> > http://biopython.org/wiki/GSOC
> >
> > If no one objects, I'll replace the current page
> > (http://biopython.org/wiki/Google_Summer_of_Code)
> > with it, just in time for GSoC 2013.
> >
> > Best,
> >
> > João
>
> Sounds sensible, and you can set a redirect on GSOC to
> Google_Summer_of_Code by replacing the content with:
>
> #REDIRECT [[link]]
>
> Peter
>


From mjldehoon at yahoo.com  Wed Mar 13 10:44:55 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 13 Mar 2013 07:44:55 -0700 (PDT)
Subject: [Biopython-dev] New contributor
In-Reply-To: <CANKZbvM0tveKZL1dm8RRphh4gyPiDcuHpbLkm1k37TKvqaibpA@mail.gmail.com>
Message-ID: <1363185895.3324.YahooMailClassic@web164005.mail.gq1.yahoo.com>

Hi Andrea,

Welcome to Biopython!
It's great that you want to contribute.
Writing & finishing some unit tests sounds like a good idea, and of course bug fixing is always welcome.
Other options are to look at orphan modules in Biopython (modules without active maintainers, without documentation, or without unit tests).
Once you decide what specifically you want to work on, it's good to let us know on the mailing list, to see if anybody else is working on the same thing.

Good luck!

Best,
-Michiel. 


--- On Sat, 3/2/13, Andrea Rizzi <88whacko at gmail.com> wrote:

> From: Andrea Rizzi <88whacko at gmail.com>
> Subject: [Biopython-dev] New contributor
> To: biopython-dev at biopython.org
> Date: Saturday, March 2, 2013, 11:49 AM
> Hello!
> My name is Andrea Rizzi and I'm a master's student in computer science and
> computational biology. I would be glad to help you developing biopython.
> I've used the library quite extensively but I'm mostly familiar with
> handling sequences, MSAs and PDB files.
> 
> I've read through the small contributing guide on the wiki and on the
> tutorial and I thought I could start with something relatively
> straightforward like writing/completing some unit tests (if I understood
> correctly there's a fairly strong need for them). I've good knowledge of
> both git and unittest. Anyway any task is actually fine to me :) .
> 
> If you agree I'll try to look for a module that needs some more testing (or
> maybe you have one to suggest me), otherwise I could just go to the bug
> tracker and try to help out fixing some bugs.
> 
> -- 
> -- Andrea
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
> 

From eric.talevich at gmail.com  Wed Mar 13 14:32:25 2013
From: eric.talevich at gmail.com (Eric Talevich)
Date: Wed, 13 Mar 2013 14:32:25 -0400
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
Message-ID: <CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>

On Tue, Feb 12, 2013 at 9:08 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> It would be great to have better support for microarray analysis in
> Biopython. Something like lumi/limma in R. Perhaps this is an option for
> the GSoC?
>
> Best,
> -Michiel.
>

I like Michiel's idea, and I'll suggest two more:

1. Codon alignment & analysis:
- PAL2NAL-style conversion of unaligned nucleic acid sequences and a
protein sequence alignment to a codon alignment. (Previously discussed)
- dN/dS and the related functions needed to calculate it.
- Possible AlignIO or MultipleSeqAlignment tweaks to take full advantage of
codon alignments, including validation (testing for frame shifts etc.)

2. Phylo enhancements:
2a. Tree drawing:
- A proper draw_unrooted function to perform radial layout, with an
optional "iterations" argument to use Felsenstein's Equal Daylight
algorithm -- I feel this layout approach is neglected in most libraries.
- Better matplotlib/pylab integration, so the plot components can be
tweaked using matplotlib functions.
- Other common layout approaches, e.g. circular.
2b. A "Phylo.consensus" module:
- strict consensus, like Bio.Nexus already implements.
- other consensus methods, time permitting.
2c. A "Phylo.distance" module:
- Robinson-Foulds distance -- though others might be working on this
already.
2d. Simple tree inference:
- Straightforward algorithms exist for neighbor-joining and parsimony tree
estimation. For small alignments (and perhaps medium-sized ones with PyPy),
it would be nice to run these without an external program, e.g. to
construct a guide tree for another algorithm or quickly view a phylogenetic
clustering of sequences.
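To make the PAL2NAL-style conversion in item 1 concrete, here is a minimal sketch using plain Python strings rather than Biopython's Seq or MultipleSeqAlignment objects. It assumes each nucleotide sequence is a gap-free coding sequence whose codons map one-to-one onto the residues of its aligned protein (optionally followed by one extra codon, such as a stop codon); mismatched translations and frame shifts are only detected as length errors. The function names `back_translate` and `protein_to_codon_alignment` are hypothetical.

```python
def back_translate(aligned_protein, nucleotides):
    """Map one gapped protein row onto its unaligned coding sequence.

    Each residue consumes the next codon (3 nt); each '-' becomes '---'.
    """
    codons = []
    pos = 0
    for residue in aligned_protein:
        if residue == "-":
            codons.append("---")
            continue
        codon = nucleotides[pos:pos + 3]
        if len(codon) != 3:
            raise ValueError("nucleotide sequence too short for protein")
        codons.append(codon)
        pos += 3
    # Allow one trailing codon (e.g. a stop codon); anything else suggests
    # a frame shift or a mismatched sequence pair.
    if len(nucleotides) - pos not in (0, 3):
        raise ValueError("nucleotide length does not match protein")
    return "".join(codons)


def protein_to_codon_alignment(protein_alignment, nucleotide_seqs):
    """Build a codon alignment from {id: gapped protein} and {id: CDS}."""
    return {
        seq_id: back_translate(protein, nucleotide_seqs[seq_id])
        for seq_id, protein in protein_alignment.items()
    }
```

For example, the protein rows `MK-L` and `M-RL` with coding sequences `ATGAAACTG` and `ATGCGTTTATAA` give the codon rows `ATGAAA---CTG` and `ATG---CGTTTA`, so every column of the protein alignment becomes exactly three columns of the codon alignment.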

Any interest in either of these? Shall I add them to the wiki?

-Eric


> --- On Tue, 2/12/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> > From: Peter Cock <p.j.a.cock at googlemail.com>
> > Subject: [Biopython-dev] Project ideas for GSoC (or other student projects)
> > To: "Biopython-Dev Mailing List" <biopython-dev at biopython.org>
> > Date: Tuesday, February 12, 2013, 12:51 PM
> > Hello all,
> >
> > Google recently confirmed they will be running Google Summer of Code 2013,
> > and we (Biopython and the other Bio* projects) would hope to be accepted
> > again under the Open Bioinformatics Foundation as in previous years:
> > http://lists.open-bio.org/pipermail/gsoc/2013/000196.html
> >
> > It would be great to start coming up with potential project ideas, both
> > larger pieces of work suitable for GSoC but also smaller tasks for other
> > project students, or 'low hanging fruit' for potential contributors to cut
> > their teeth on.
> >
> > See also http://biopython.org/wiki/Active_projects
> > and the ideas list there.
> >
> > Regards,
> >
> > Peter
>

From p.j.a.cock at googlemail.com  Wed Mar 13 17:16:27 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 13 Mar 2013 21:16:27 +0000
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAEDHeitOkC8bPB1HceASHz2=hXZHT3Q1f0n0p77BGnaM8ZCdow@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
	<CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
	<CAKVJ-_7MRu5tSSZ+S0dsOzpc=ojyC0oOxmg4e+A1Or5phfTz8w@mail.gmail.com>
	<CAEDHeivazvXaaCo2ZFP8sR3BOfv5tgEj6Bf7f_UZ6g_x-v9RZQ@mail.gmail.com>
	<CAEDHeisW+ymrW-Bb5XcgAm1mSPbDWMY7evhvyspVFxzTptW2Pg@mail.gmail.com>
	<CAKVJ-_5vn3p_iiOYTpnMs6W8qF-3DFwgzO2-HqL6=W-mHByaTA@mail.gmail.com>
	<CAEDHeitQA0JQ=vuRKEohgRwkRxLqs+NuV9dAYguhrc1-eYnCsQ@mail.gmail.com>
	<CAKVJ-_40jy-NS5csf55Q_ds-DdGRpaH18rZxZXdcypNEijpcJA@mail.gmail.com>
	<CAEDHeitOkC8bPB1HceASHz2=hXZHT3Q1f0n0p77BGnaM8ZCdow@mail.gmail.com>
Message-ID: <CAKVJ-_4Qck=EaUuJVo-hGp5f5Rk1ua3tqPdEExaumzQ512wO1A@mail.gmail.com>

On Monday, March 4, 2013, Saket Choudhary wrote:

> Hi,
>
> I have updated the code here :
> https://github.com/saketkc/biopython/tree/bwa_wrapper
>
> I have added unittests for the wrapper. And yes, this did help me in
> fixing a lot of minor bugs in my original wrapper.
>
> @Peter :  Is this 'pull request' ready ?
>
> Thanks
>
> Saket
>
>
Sorry I've not had time to test this yet - and have
been off ill today as well. The basic approach you've
taken seems sound, and a good basis for other
samtools style tools.

Peter

From p.j.a.cock at googlemail.com  Thu Mar 14 07:25:41 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 14 Mar 2013 11:25:41 +0000
Subject: [Biopython-dev] Fwd: [biopython] Add the ability to parse CEL
 version 4 files from Affy (#168)
In-Reply-To: <biopython/biopython/pull/168@github.com>
References: <biopython/biopython/pull/168@github.com>
Message-ID: <CAKVJ-_71jbGz2y2WFquRYQxX2AeANMc6Ek=RwdcQLK5jSNdKJA@mail.gmail.com>

Who would be the best person to review this? Michael?

Peter

---------- Forwarded message ----------
From: *Jeff Hammerbacher*
Date: Thursday, March 14, 2013
Subject: [biopython] Add the ability to parse CEL version 4 files from Affy
(#168)
To: biopython/biopython <biopython at noreply.github.com>


Hey,

I noticed that Biopython was missing the ability to parse binary CEL files
(version 4), so I've added a rough implementation. I've kept TODOs in the
code and a main method to demonstrate example use. I realize these are not
best practices for a mature library, but this corner of Biopython (the Affy
module) seems quite immature, so I figured I'd leave the code in this state
to indicate to others that there is much room for improvement.
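For readers new to the binary format: a version 4 CEL file starts with fixed-size little-endian integers, which Python's standard `struct` module can decode directly. The sketch below is not the code in the pull request; it assumes the commonly documented Affymetrix layout (magic number 64, version 4, then the column, row and cell counts, then a length-prefixed ASCII header) and stops after the header. The function name `read_cel_v4_header` is hypothetical.

```python
import struct

# Six little-endian 32-bit ints: magic, version, cols, rows, cells, header length
_CEL_V4_PREFIX = "<6i"


def read_cel_v4_header(data):
    """Parse the leading fields of a binary CEL v4 file held in `data` (bytes)."""
    magic, version, cols, rows, cells, header_len = struct.unpack_from(
        _CEL_V4_PREFIX, data, 0
    )
    if magic != 64 or version != 4:
        raise ValueError(
            "not a version 4 CEL file (magic=%d, version=%d)" % (magic, version)
        )
    start = struct.calcsize(_CEL_V4_PREFIX)  # 24 bytes
    header = data[start:start + header_len].decode("ascii", errors="replace")
    return {
        "cols": cols,
        "rows": rows,
        "cells": cells,
        "header": header,
        # Where the remaining (unparsed) fields of the file begin
        "offset": start + header_len,
    }
```

In practice the bytes would come from reading the file in binary mode (`open(path, "rb")`); reading the whole file into memory keeps the sketch simple, at the cost of memory for large arrays.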

I have not contributed to this project before, so please let me know how to
get this pull request in shape for a commit.

Thanks,
Jeff
------------------------------
You can merge this Pull Request by running

  git pull https://github.com/hammer/biopython master

Or view, comment on, or merge it at:

  https://github.com/biopython/biopython/pull/168
Commit Summary

   - Add the ability to parse CEL version 4 files from Affy.

File Changes

   - *A* Bio/Affy/CelFileV4.py (186) https://github.com/biopython/biopython/pull/168/files#diff-0

Patch Links:

   - https://github.com/biopython/biopython/pull/168.patch
   - https://github.com/biopython/biopython/pull/168.diff

From mjldehoon at yahoo.com  Fri Mar 15 09:09:18 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 15 Mar 2013 06:09:18 -0700 (PDT)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CAKVJ-_7Py6yQm=MGnjoa+cE596a7z48HXpukg3XWzkRPjsF-Bg@mail.gmail.com>
Message-ID: <1363352958.71694.YahooMailClassic@web164004.mail.gq1.yahoo.com>

Hi everybody,

I looked at the mmCIF parser again, and it turned out that the Python standard library contains a shlex lexical analyzer module that makes mmCIF parsing straightforward without relying on flex or PLY. I uploaded a modified version of MMCIF2Dict.py to the git repository. This parser does the exact same thing as the flex-based parser, but is in pure Python. If you're interested, have a look at MMCIF2Dict.py in the git repository; comments and suggestions are welcome.
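A minimal sketch of the idea, using only the standard library `shlex` module; this is not the actual MMCIF2Dict code, which may configure the lexer differently. It splits a single mmCIF line on whitespace while honouring single and double quotes. Assumptions: backslashes are kept as literal characters (mmCIF has no escape sequences), `#` comments and multi-line semicolon-delimited text fields are not handled, and a quote character inside a quoted value (which mmCIF permits when it is not followed by whitespace) is not supported. The function name `tokenize_mmcif_line` is hypothetical.

```python
import shlex


def tokenize_mmcif_line(line):
    """Split one mmCIF line into tokens, respecting quoted values."""
    lexer = shlex.shlex(line, posix=True)
    lexer.whitespace_split = True  # tokens are separated only by whitespace
    lexer.commenters = ""          # this sketch does not treat '#' specially
    lexer.escape = ""              # mmCIF has no backslash escapes
    return list(lexer)
```

For example, `_struct.title 'CRYSTAL STRUCTURE OF LYSOZYME'` yields two tokens, the item name and the unquoted title, which is the main thing a hand-written flex lexer was needed for.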

If there are no objections, I think we can remove everything in Bio.PDB.mmCIF. Also I'm a bit unhappy with how the information in an mmCIF file is represented in Biopython. I think there are more Pythonic ways to store the contents of an mmCIF file in a Python object.

Best,
-Michiel.

--- On Sat, 2/16/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> From: Peter Cock <p.j.a.cock at googlemail.com>
> Subject: Re: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> Cc: "BioPython-Dev Mailing List" <biopython-dev at biopython.org>, "Lenna Peterson" <arklenna at gmail.com>
> Date: Saturday, February 16, 2013, 5:42 AM
> On Sat, Feb 16, 2013 at 2:46 AM, Michiel de Hoon <mjldehoon at yahoo.com>
> wrote:
> > Hi Lenna,
> >
> > Maybe we are confusing each other..
> > I am looking for a solution that (a) doesn't introduce new dependencies,
> 
> +1
> 
> > (b) is pure-Python so it can run on Jython,
> 
> +1 And on PyPy (which to me is more interesting than Jython) etc.
> 
> > and (c) if that is not possible and we do need to use C, then that C code
> > should be understandable so that it can be debugged if necessary.
> >
> > I was suggesting to clean up lex.yy.c so that we can at least achieve (c).
> 
> This does mean we essentially give up on ever regenerating the lex.yy.c
> file again - could that be a problem if Flex itself changes much?
> 
> > The alternative is to start from the PLY-based parser and remove the
> > dependency on PLY.
> >
> > Best,
> > -Michiel.
> 
> Peter
> 

From anaryin at gmail.com  Fri Mar 15 09:20:16 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Fri, 15 Mar 2013 14:20:16 +0100
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <1363352958.71694.YahooMailClassic@web164004.mail.gq1.yahoo.com>
References: <CAKVJ-_7Py6yQm=MGnjoa+cE596a7z48HXpukg3XWzkRPjsF-Bg@mail.gmail.com>
	<1363352958.71694.YahooMailClassic@web164004.mail.gq1.yahoo.com>
Message-ID: <CAJ9sUYN3kwRYLi1Bgc1N5AzMrqPd5bnNRZvBRqdPH0+xafz+Jg@mail.gmail.com>

Hi Michiel,

Speaking without really checking the code... What we perhaps should have is
the parsers, whatever they are, populating the same type of object in the
end (PDBParser and mmCIFParser). Is this the current status of the mmCIF?

Best,

João


2013/3/15 Michiel de Hoon <mjldehoon at yahoo.com>

> Hi everybody,
>
> I looked at the mmCIF parser again, and it turned out that the Python
> standard library contains a shlex lexical analyzer module that makes mmCIF
> parsing straightforward without relying on flex or PLY. I uploaded a
> modified version of MMCIF2Dict.py to the git repository. This parser does
> the exact same thing as the flex-based parser, but is in pure Python. If
> you're interested, have a look at MMCIF2Dict.py in the git repository;
> comments and suggestions are welcome.
>
> If there are no objections, I think we can remove everything in
> Bio.PDB.mmCIF. Also I'm a bit unhappy with how the information in an mmCIF
> file is represented in Biopython. I think there are more Pythonic ways to
> store the contents of an mmCIF file in a Python object.
>
> Best,
> -Michiel.
>
> --- On Sat, 2/16/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> > From: Peter Cock <p.j.a.cock at googlemail.com>
> > Subject: Re: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
> > To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> > Cc: "BioPython-Dev Mailing List" <biopython-dev at biopython.org>, "Lenna Peterson" <arklenna at gmail.com>
> > Date: Saturday, February 16, 2013, 5:42 AM
> > On Sat, Feb 16, 2013 at 2:46 AM, Michiel de Hoon <mjldehoon at yahoo.com>
> > wrote:
> > > Hi Lenna,
> > >
> > > Maybe we are confusing each other..
> > > I am looking for a solution that (a) doesn't introduce new dependencies,
> >
> > +1
> >
> > > (b) is pure-Python so it can run on Jython,
> >
> > +1 And on PyPy (which to me is more interesting than Jython) etc.
> >
> > > and (c) if that is not possible and we do need to use C, then that C code
> > > should be understandable so that it can be debugged if necessary.
> > >
> > > I was suggesting to clean up lex.yy.c so that we can at least achieve (c).
> >
> > This does mean we essentially give up on ever regenerating the lex.yy.c
> > file again - could that be a problem if Flex itself changes much?
> >
> > > The alternative is to start from the PLY-based parser and remove the
> > > dependency on PLY.
> > >
> > > Best,
> > > -Michiel.
> >
> > Peter
> >
>


From p.j.a.cock at googlemail.com  Fri Mar 15 09:21:50 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 15 Mar 2013 13:21:50 +0000
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <1363352958.71694.YahooMailClassic@web164004.mail.gq1.yahoo.com>
References: <CAKVJ-_7Py6yQm=MGnjoa+cE596a7z48HXpukg3XWzkRPjsF-Bg@mail.gmail.com>
	<1363352958.71694.YahooMailClassic@web164004.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_4GbeJBy2qe8Qz=b5qg4uaMFwxBU21YuubnH=Y+8VqOKg@mail.gmail.com>

On Fri, Mar 15, 2013 at 1:09 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi everybody,
>
> I looked at the mmCIF parser again, and it turned out that the Python
> standard library contains a shlex lexical analyzer module that makes mmCIF
> parsing straightforward without relying on flex or PLY. I uploaded a
> modified version of MMCIF2Dict.py to the git repository. This parser does
> the exact same thing as the flex-based parser, but is in pure Python. If
> you're interested, have a look at MMCIF2Dict.py in the git repository;
> comments and suggestions are welcome.

That makes MMCIF2Dict look a lot shorter :)
https://github.com/biopython/biopython/commit/b2bafdfcd67c738f91722495bb732297b7936828

> If there are no objections, I think we can remove everything in
> Bio.PDB.mmCIF. Also I'm a bit unhappy with how the information in an mmCIF
> file is represented in Biopython. I think there are more Pythonic ways to
> store the contents of an mmCIF file in a Python object.
>
> Best,
> -Michiel.

Do you think we need a deprecation cycle for Bio.PDB.mmCIF? It has
been available by default on Debian etc where the dependency was
taken care of by the packagers.

I've never used this code so perhaps Eric or João's perspective would be
more helpful than mine.

Peter


From mjldehoon at yahoo.com  Fri Mar 15 11:08:43 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 15 Mar 2013 08:08:43 -0700 (PDT)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CAKVJ-_4GbeJBy2qe8Qz=b5qg4uaMFwxBU21YuubnH=Y+8VqOKg@mail.gmail.com>
Message-ID: <1363360123.26690.YahooMailClassic@web164004.mail.gq1.yahoo.com>

Hi all,

--- On Fri, 3/15/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Do you think we need a deprecation cycle for Bio.PDB.mmCIF?
> It has been available by default on Debian etc where the
> dependency was taken care of by the packagers.

Probably not. The Bio.PDB.mmCIF module was essentially a private module used by Bio.PDB.MMCIF2Dict, whose usage is unchanged. Also, AFAICT the Bio.PDB.mmCIF module is not documented anywhere. And finally, all this module does is tokenize the mmCIF file, so probably not something an end user would be interested in.

I am not a heavy user of Bio.PDB myself, so feel free to correct me if I am wrong.

Best,
-Michiel.


From p.j.a.cock at googlemail.com  Fri Mar 15 11:28:48 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 15 Mar 2013 15:28:48 +0000
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <1363360950.87852.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <CAJ9sUYN3kwRYLi1Bgc1N5AzMrqPd5bnNRZvBRqdPH0+xafz+Jg@mail.gmail.com>
	<1363360950.87852.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_7gK0+=H-H4LFOsduLqSHqahD3mELOxHeGsSgxV0-Khaw@mail.gmail.com>

On Fri, Mar 15, 2013 at 3:22 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> Speaking of which, we have a Biopython Structural Bioinformatics FAQ (i.e.
> how to use the Bio.PDB module) on the Biopython website with additional
> information on Bio.PDB, including some information on things that are not in
> the main Biopython Tutorial. Perhaps this is a good time to integrate this
> FAQ into the main documentation?
>

Both are LaTeX documents so this shouldn't be too hard to do.

Peter

From mjldehoon at yahoo.com  Fri Mar 15 11:22:30 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 15 Mar 2013 08:22:30 -0700 (PDT)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CAJ9sUYN3kwRYLi1Bgc1N5AzMrqPd5bnNRZvBRqdPH0+xafz+Jg@mail.gmail.com>
Message-ID: <1363360950.87852.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Hi João,

--- On Fri, 3/15/13, João Rodrigues <anaryin at gmail.com> wrote:
> What we perhaps should have is the parsers, whatever they are, populating the same type of object in the end (PDBParser and mmCIFParser).

I think that there are two options:

1) PDBParser and mmCIFParser both produce Structure objects, with any additional information found in mmCIF files stored as additional attributes of Structure objects (and the same thing for PDB files);

2) We make a module mmCIF with a function mmCIF.read that reads an mmCIF file and stores the information in an mmCIF.Record object that is optimized for storing mmCIF information. The mmCIFParser uses mmCIF.read, and pulls out the necessary information from the mmCIF.Record object to create a Structure object (which is free of mmCIF-specific stuff). Users can make Structure objects if that is all they need, or use mmCIF.read if they want to have all information in an mmCIF file.

Currently the situation is closer to (2), with MMCIF2Dict playing the role of mmCIF.read, but I don't much like the way MMCIF2Dict stores information.
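As one hypothetical illustration of a more structured record for option (2): starting from a flat MMCIF2Dict-style mapping (keys such as `_atom_site.id`, each value either a single string or a list of strings for `loop_` data), the items can be grouped by category so that each category becomes a list of row dictionaries. The function name `group_by_category` and this row-oriented layout are assumptions for discussion, not an existing Biopython API.

```python
def group_by_category(flat):
    """Group '_category.item' keys into {category: [row_dict, ...]}."""
    # First collect the columns of each category
    columns = {}
    for key, value in flat.items():
        category, _, item = key.lstrip("_").partition(".")
        # Single (non-loop) values are treated as a one-row table
        values = value if isinstance(value, list) else [value]
        columns.setdefault(category, {})[item] = values

    # Then transpose the columns into rows
    record = {}
    for category, items in columns.items():
        lengths = {len(values) for values in items.values()}
        if len(lengths) != 1:
            raise ValueError("ragged loop in category %r" % category)
        n_rows = lengths.pop()
        record[category] = [
            {item: values[i] for item, values in items.items()}
            for i in range(n_rows)
        ]
    return record
```

With this layout `record["atom_site"][0]` is the first atom as a dictionary, instead of one parallel list per item as in a flat dictionary.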

Since I am not a power user of Bio.PDB, other people may have more insight into whether (1) or (2) (or something completely different) is best.

> Is this the current status of the mmCIF?

I just replaced the flex-dependent part of mmCIF with pure Python code, but I didn't change the functionality or usage of the mmCIF code. So the current status is still the same as described in the documentation.

Speaking of which, we have a Biopython Structural Bioinformatics FAQ (i.e. how to use the Bio.PDB module) on the Biopython website with additional information on Bio.PDB, including some information on things that are not in the main Biopython Tutorial. Perhaps this is a good time to integrate this FAQ into the main documentation?

Best,
-Michiel 


From jacobs at bioinformed.com  Fri Mar 15 11:40:38 2013
From: jacobs at bioinformed.com (Kevin Jacobs)
Date: Fri, 15 Mar 2013 08:40:38 -0700
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAKVJ-_4Qck=EaUuJVo-hGp5f5Rk1ua3tqPdEExaumzQ512wO1A@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
	<CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
	<CAKVJ-_7MRu5tSSZ+S0dsOzpc=ojyC0oOxmg4e+A1Or5phfTz8w@mail.gmail.com>
	<CAEDHeivazvXaaCo2ZFP8sR3BOfv5tgEj6Bf7f_UZ6g_x-v9RZQ@mail.gmail.com>
	<CAEDHeisW+ymrW-Bb5XcgAm1mSPbDWMY7evhvyspVFxzTptW2Pg@mail.gmail.com>
	<CAKVJ-_5vn3p_iiOYTpnMs6W8qF-3DFwgzO2-HqL6=W-mHByaTA@mail.gmail.com>
	<CAEDHeitQA0JQ=vuRKEohgRwkRxLqs+NuV9dAYguhrc1-eYnCsQ@mail.gmail.com>
	<CAKVJ-_40jy-NS5csf55Q_ds-DdGRpaH18rZxZXdcypNEijpcJA@mail.gmail.com>
	<CAEDHeitOkC8bPB1HceASHz2=hXZHT3Q1f0n0p77BGnaM8ZCdow@mail.gmail.com>
	<CAKVJ-_4Qck=EaUuJVo-hGp5f5Rk1ua3tqPdEExaumzQ512wO1A@mail.gmail.com>
Message-ID: <CAPipXkKS8c3CCMQef8zn2a2oSFuj=XbTKGaKFWRUCcVZ8iWt4A@mail.gmail.com>

FYI, I am working on a direct Cython wrapper around the new BWA-MEM
aligner, which will allow API-level access to Heng Li's extremely
impressive new algorithm.  It is still in early development and is missing
many bells and whistles, but will be shaping up in the next few weeks.

Test program:

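from __future__ import print_function

import bwamem

# Build an aligner for the reference FASTA
mem = bwamem.MEMAligner('ref/human_g1k_v37.fasta')
# align() returns a list of hits; take the first one
a = mem.align('TCACGACGCTCTTCCGATCTGTT...GTGCATTCTCTGGTCAGACAGCCAAGG')
a = a[0]

print('ref id =', a.rid)
print('pos    =', a.pos)
print('CIGAR  =', a.cigar.to_string())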


Output (correct):

ref id = 0
pos    = 115250385
CIGAR  = 17N134M
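The CIGAR string in this output encodes the alignment as length/operation pairs. As a small self-contained sketch, independent of the `bwamem` wrapper (whose API is still in development), the standard SAM operation codes can be parsed with a regular expression to recover how many reference and read bases an alignment spans. Per the SAM specification, M, D, N, = and X consume the reference, while M, I, S, = and X consume the query. The function names `parse_cigar` and `spans` are hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

REF_CONSUMING = set("MDN=X")
QUERY_CONSUMING = set("MIS=X")


def parse_cigar(cigar):
    """Return a list of (length, op) tuples, rejecting malformed strings."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Re-assemble the string to make sure nothing was silently skipped
    if "".join("%d%s" % pair for pair in ops) != cigar:
        raise ValueError("malformed CIGAR string: %r" % cigar)
    return ops


def spans(cigar):
    """Return (reference_length, query_length) covered by the alignment."""
    ops = parse_cigar(cigar)
    ref = sum(n for n, op in ops if op in REF_CONSUMING)
    query = sum(n for n, op in ops if op in QUERY_CONSUMING)
    return ref, query
```

For the output above, `spans("17N134M")` gives `(151, 134)`: the 17-base N operation skips reference positions without consuming any read bases.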


On Wed, Mar 13, 2013 at 2:16 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Monday, March 4, 2013, Saket Choudhary wrote:
>
> > Hi,
> >
> > I have updated the code here :
> > https://github.com/saketkc/biopython/tree/bwa_wrapper
> >
> > I have added unittests for the wrapper. And yes, this did help me in
> > fixing a lot of minor bugs in my original wrapper.
> >
> > @Peter :  Is this 'pull request' ready ?
> >
> > Thanks
> >
> > Saket
> >
> >
> Sorry I've not had time to test this yet - and have
> been off ill today as well. The basic approach you've
> taken seems sound, and a good basis for other
> samtools style tools.
>
> Peter
>

From anaryin at gmail.com  Fri Mar 15 11:53:41 2013
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Fri, 15 Mar 2013 16:53:41 +0100
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <1363360950.87852.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <CAJ9sUYN3kwRYLi1Bgc1N5AzMrqPd5bnNRZvBRqdPH0+xafz+Jg@mail.gmail.com>
	<1363360950.87852.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <CAJ9sUYNQMu1x-4vx0gqQX4=hmAAro7GE=Lr2cPOgL7kr792O6g@mail.gmail.com>

Hi Michiel,


> 1) PDBParser and mmCIFParser both produce Structure objects, with any
> additional information found in mmCIF files stored as additional attributes
> of Structure objects (and the same thing for PDB files);
>

This approach has a few advantages. First and most obvious, it allows
seamless conversion from one file format to another. Second, it reduces the
code to something easier to maintain and to extend. The disadvantage is that
the Structure objects might become a bit too bloated. On the other hand, we
can make them lighter and take advantage of Python's dynamic attributes (if I
need a B-factor, I just add atom.bfactor). This would also help a lot with
the current parser, which is quite "sluggish" for some purposes, and bring a
lot more flexibility (parsing PQR files, mol2 files, etc.). All we'd need
would be a parser for each file format and a generic container that holds the
backbone of the structure and can be extended as we need. A simple flag for
the parser type would also make it easier to check whether function X can be
used on a particular structure.
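The lighter container described above might look roughly like the hypothetical sketch below: atoms carry only a few core fields, format-specific data (a B-factor from PDB, a charge and radius from PQR) is attached as ordinary attributes only when a parser supplies it, and the structure records which format it came from. `LightAtom`, `LightStructure` and `mean_bfactor` are illustrative names, not Bio.PDB classes.

```python
class LightAtom:
    """Minimal atom: core fields only; parsers may attach more attributes."""

    def __init__(self, name, coord, **extra):
        self.name = name
        self.coord = coord
        # Format-specific fields (e.g. bfactor, charge) become attributes
        for key, value in extra.items():
            setattr(self, key, value)


class LightStructure:
    """Generic container that remembers which parser produced it."""

    def __init__(self, source_format):
        self.source_format = source_format  # e.g. "pdb", "pqr", "mmcif"
        self.atoms = []

    def has_field(self, field):
        """True if every atom carries the given optional attribute."""
        return bool(self.atoms) and all(hasattr(a, field) for a in self.atoms)


def mean_bfactor(structure):
    """Example function that only works when B-factors were parsed."""
    if not structure.has_field("bfactor"):
        raise ValueError("no B-factors in %r structure" % structure.source_format)
    return sum(a.bfactor for a in structure.atoms) / len(structure.atoms)
```

A structure built from a PQR file would then report `has_field("charge")` as true and `has_field("bfactor")` as false, so a function such as `mean_bfactor` can refuse it cleanly rather than failing deep inside a calculation.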


>
> 2) We make a module mmCIF with a function mmCIF.read that reads an mmCIF
> file and stores the information in a mmCIF.Record object that is optimized
> for storing mmCIF information. The mmCIFParser uses mmCIF.read, and pulls
> out the necessary information from the mmCIF.Record object to create a
> Structure object (which is free of mmCIF-specific stuff). Users can make
> Structure objects if that is all they need, or use mmCIF.read if they want
> to have all information in an mmCIF file.
>

I'm completely unfamiliar with mmCIF files.. how much more information do
they have than a PDB file? And what kind of information is useful to
extract from them?

> Speaking of which, we have a Biopython Structural Bioinformatics FAQ (i.e.
> how to use the Bio.PDB module) on the Biopython website with additional
> information on Bio.PDB, including some information on things that are not
> in the main Biopython Tutorial. Perhaps this is a good time to integrate
> this FAQ into the main documentation?


We could also update it a bit, since it's been a while: some things have
changed here and there, and there are additions too.

Best,

João


From bartek at rezolwenta.eu.org  Fri Mar 15 19:06:57 2013
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Sat, 16 Mar 2013 00:06:57 +0100
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
Message-ID: <CABHxouWGVH6R=pXTVp0yHAm62Gfu+sKSNrhLR+g3ydvoeO02bw@mail.gmail.com>

Hi All,
I would add one more (old) idea to the GSoC pool, i.e. adding support
for different biological ontologies to Biopython.

This was already discussed some time ago
(http://www.biopython.org/w/index.php?title=Gene_Ontology&redirect=no),
mostly in the context of the Gene Ontology, and to some extent this is
addressed by the development of GOAtools
(https://github.com/tanghaibao/goatools), but I think it would be
worthwhile to have decent support in Biopython for OBO-file-based
ontologies (not only the Gene Ontology: I'm also interested myself in
anatomical ontologies, and there are others available at obofoundry.org).

I think it would need to include support for IO operations on both OBO
and annotation files, as well as statistical enrichment measures and
potentially some visualisation.
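To give a feel for the IO side, a minimal reader for the [Term] stanzas of an OBO file could look like the sketch below. It is an illustration only: parse_obo_terms is a hypothetical helper (not an existing Biopython or GOAtools function), and it deliberately ignores escapes, trailing {...} qualifiers and every stanza type other than [Term].

```python
def parse_obo_terms(lines):
    """Return {term_id: {tag: [values]}} for every [Term] stanza (hypothetical helper).

    Simplifications: '!' comments are dropped, trailing {...} qualifiers and
    backslash escapes are not handled, and non-[Term] stanzas are skipped.
    """
    terms = []
    current = None  # tag dictionary of the [Term] stanza being read, if any
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None or not line or line.startswith("!"):
            continue  # header lines, skipped stanzas, blank lines, comments
        tag, _, value = line.partition(":")
        value = value.split(" !", 1)[0].strip()
        current.setdefault(tag.strip(), []).append(value)
    return {term["id"][0]: term for term in terms if "id" in term}


example = """format-version: 1.2

[Term]
id: GO:0008150
name: biological_process
namespace: biological_process

[Term]
id: GO:0009987
name: cellular process
namespace: biological_process
is_a: GO:0008150 ! biological_process

[Typedef]
id: part_of
name: part of
"""

terms = parse_obo_terms(example.splitlines())
print(sorted(terms))                # ['GO:0008150', 'GO:0009987']
print(terms["GO:0009987"]["is_a"])  # ['GO:0008150']
```

Keeping every tag as a list of strings avoids having to know in advance which tags may repeat (is_a, synonym, xref and so on).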

Would anyone be interested in co-mentoring this project? There is one
student in my department who would be interested in applying to GSoC
for this project, but I think it would be great if other people joined
the discussion on the functionality and having more people involved is
always better...

best
Bartek Wilczynski

On Wed, Mar 13, 2013 at 7:32 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> On Tue, Feb 12, 2013 at 9:08 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
>> It would be great to have better support for microarray analysis in
>> Biopython. Something like lumi/limma in R. Perhaps this is an option for
>> the GSoC?
>>
>> Best,
>> -Michiel.
>>
>
> I like Michiel's idea, and I'll suggest two more:
>
> 1. Codon alignment & analysis:
> - PAL2NAL-style conversion of unaligned nucleic acid sequences and a
> protein sequence alignment to a codon alignment. (Previously discussed)
> - dN/dS and the related functions needed to calculate it.
> - Possible AlignIO or MultipleSeqAlignment tweaks to take full advantage of
> codon alignments, including validation (testing for frame shifts etc.)
>
> 2. Phylo enhancements:
> 2a. Tree drawing:
> - A proper draw_unrooted function to perform radial layout, with an
> optional "iterations" argument to use Felsenstein's Equal Daylight
> algorithm -- I feel this layout approach is neglected in most libraries.
> - Better matplotlib/pylab integration, so the plot components can be
> tweaked using matplotlib functions.
> - Other common layout approaches, e.g. circular.
> 2b. A "Phylo.consensus" module:
> - strict consensus, like Bio.Nexus already implements.
> - other consensus methods, time permitting.
> 2c. A "Phylo.distance" module:
> - Robinson-Foulds distance -- though others might be working on this
> already.
> 2d. Simple tree inference:
> - Straightforward algorithms exist for neighbor-joining and parsimony tree
> estimation. For small alignments (and perhaps medium-sized ones with PyPy),
> it would be nice to run these without an external program, e.g. to
> construct a guide tree for another algorithm or quickly view a phylogenetic
> clustering of sequences.
>
> Any interest in either of these? Shall I add them to the wiki?
>
> -Eric
>
>
> --- On Tue, 2/12/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>
>> > From: Peter Cock <p.j.a.cock at googlemail.com>
>> > Subject: [Biopython-dev] Project ideas for GSoC (or other student
>> projects)
>> > To: "Biopython-Dev Mailing List" <biopython-dev at biopython.org>
>> > Date: Tuesday, February 12, 2013, 12:51 PM
>> > Hello all,
>> >
>> > Google recently confirmed they will be running Google Summer
>> > of Code 2013,
>> > and we (Biopython and the other Bio* projects) would hope to
>> > be accepted again
>> > under the Open Bioinformatics Foundation as in previous
>> > years:
>> > http://lists.open-bio.org/pipermail/gsoc/2013/000196.html
>> >
>> > It would be great to start coming up with potential project
>> > ideas, both larger
>> > pieces of work suitable for GSoC but also smaller tasks for
>> > other project
>> > students, or 'low hanging fruit' for potential contributors
>> > to cut
>> > their teeth on.
>> >
>> > See also http://biopython.org/wiki/Active_projects
>> > and the ideas list there.
>> >
>> > Regards,
>> >
>> > Peter
>>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


--
Bartek Wilczynski

From mjldehoon at yahoo.com  Fri Mar 15 22:38:48 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 15 Mar 2013 19:38:48 -0700 (PDT)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CAJ9sUYNQMu1x-4vx0gqQX4=hmAAro7GE=Lr2cPOgL7kr792O6g@mail.gmail.com>
Message-ID: <1363401528.82829.YahooMailClassic@web164001.mail.gq1.yahoo.com>


--- On Fri, 3/15/13, João Rodrigues <anaryin at gmail.com> wrote:
> I'm completely unfamiliar with mmCIF files. How much more information do they have than a PDB file?

These are two examples from the Biopython tests:

https://github.com/biopython/biopython/blob/master/Tests/PDB/1A8O.cif
https://github.com/biopython/biopython/blob/master/Tests/PDB/1LCD.cif

> And what kind of information is useful to extract from them?

I think we should extract all information from these files, and let the user decide which parts are useful.

Best,
-Michiel.


From p.j.a.cock at googlemail.com  Sat Mar 16 10:38:22 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 16 Mar 2013 14:38:22 +0000
Subject: [Biopython-dev] Modifications to CircularDrawer
In-Reply-To: <CAKVJ-_7LjEdGtm3T26L3wwkraN9C+11mLvjogDBXAK+EDjGeDQ@mail.gmail.com>
References: <959CFF5060375249824CC633DDDF896F1C0C58B1@AMSPRD0410MB351.eurprd04.prod.outlook.com>
	<E72D33BF424829408854FEB604A6959B6E47307C@DUEXC02.ad.hutton.ac.uk>
	<959CFF5060375249824CC633DDDF896F1C0C5C9B@AMSPRD0410MB351.eurprd04.prod.outlook.com>
	<CAKVJ-_7bf_n8dPXvw+pH+edo9aCWVBuiXRL_Gjx62b3faRgjJg@mail.gmail.com>
	<959CFF5060375249824CC633DDDF896F1C0C5E48@AMSPRD0410MB351.eurprd04.prod.outlook.com>
	<CAKVJ-_5LatE+JT04d0dcf_WiTRE9tEUNPw6oPC004H=TTqB43A@mail.gmail.com>
	<CAKVJ-_7LjEdGtm3T26L3wwkraN9C+11mLvjogDBXAK+EDjGeDQ@mail.gmail.com>
Message-ID: <CAKVJ-_4Fcb5FqnmH+LjRquDBOUDYMgOi_TOG-JGt6Nk6gFD9vw@mail.gmail.com>

On Wed, Dec 5, 2012 at 6:41 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi David,
>
> I've been experimenting with your pull request, thank you:
> https://github.com/biopython/biopython/pull/116
>

Hi again David,

I've not used your code as is, but have started by pulling out
and generalising what I felt was the least contentious part:

https://github.com/biopython/biopython/commit/087712510421ec7f655a7981926a757aa93e9177

This means that label_position = start, middle, end (and some
historic aliases defined in the linear drawer code) now work
on circular GenomeDiagrams. I have made the default None,
which gives the current behaviour (as 'start' on linear, the
more complicated to explain vertical bottom on circular).

Support for allowing the default label orientation to be radially
consistent all round the circle (rather than the current flipping
for the left/right halves which assumes the output is kept
vertical) would be nice, but the thing I am most keen on is the
inside/outside of the track label placement. Hopefully I'll have
time to finish that this weekend...

Peter

From p.j.a.cock at googlemail.com  Sat Mar 16 16:37:12 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 16 Mar 2013 20:37:12 +0000
Subject: [Biopython-dev] Modifications to CircularDrawer
In-Reply-To: <CAKVJ-_4Fcb5FqnmH+LjRquDBOUDYMgOi_TOG-JGt6Nk6gFD9vw@mail.gmail.com>
References: <959CFF5060375249824CC633DDDF896F1C0C58B1@AMSPRD0410MB351.eurprd04.prod.outlook.com>
	<E72D33BF424829408854FEB604A6959B6E47307C@DUEXC02.ad.hutton.ac.uk>
	<959CFF5060375249824CC633DDDF896F1C0C5C9B@AMSPRD0410MB351.eurprd04.prod.outlook.com>
	<CAKVJ-_7bf_n8dPXvw+pH+edo9aCWVBuiXRL_Gjx62b3faRgjJg@mail.gmail.com>
	<959CFF5060375249824CC633DDDF896F1C0C5E48@AMSPRD0410MB351.eurprd04.prod.outlook.com>
	<CAKVJ-_5LatE+JT04d0dcf_WiTRE9tEUNPw6oPC004H=TTqB43A@mail.gmail.com>
	<CAKVJ-_7LjEdGtm3T26L3wwkraN9C+11mLvjogDBXAK+EDjGeDQ@mail.gmail.com>
	<CAKVJ-_4Fcb5FqnmH+LjRquDBOUDYMgOi_TOG-JGt6Nk6gFD9vw@mail.gmail.com>
Message-ID: <CAKVJ-_7C0Oqk4MXdNPy-ONCDOHF2gRTAWF5pe6sYQ80ej+0WgQ@mail.gmail.com>

On Sat, Mar 16, 2013 at 2:38 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Wed, Dec 5, 2012 at 6:41 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> Hi David,
>>
>> I've been experimenting with your pull request, thank you:
>> https://github.com/biopython/biopython/pull/116
>>
>
> Hi again David,
>
> I've not used your code as is, but have started by pulling out
> and generalising what I felt was the least contentious part:
>
> https://github.com/biopython/biopython/commit/087712510421ec7f655a7981926a757aa93e9177
>
> This means that label_position = start, middle, end (and some
> historic aliases defined in the linear drawer code) now work
> on circular GenomeDiagrams. I have made the default None,
> which gives the current behaviour (as 'start' on linear, the
> more complicated to explain vertical bottom on circular).
>
> Support for allowing the default label orientation to be radially
> consistent all round the circle (rather than the current flipping
> for the left/right halves which assumes the output is kept
> vertical) would be nice, but the thing I am most keen on is the
> inside/outside of the track label placement. Hopefully I'll have
> time to finish that this weekend...

Here's a version on a branch which addresses the label placement
by adding a label_strand argument, where +1 means the label is
on the forward strand side of the track (above or outside), while
-1 means the reverse strand side of the track (below or inside),
and the default is to follow the strand of the feature being drawn.
This seemed to me quite an intuitive arrangement:

https://github.com/peterjc/biopython/tree/label_strand
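The placement rule described above is small enough to state as code. This is a hypothetical helper written only to illustrate the rule (label_side is not part of GenomeDiagram; the branch exposes the option as a label_strand argument), and treating unstranded features like forward-strand ones is an assumption the email does not cover.

```python
def label_side(feature_strand, label_strand=None, circular=False):
    """Which side of the track a feature label is drawn on (hypothetical helper).

    label_strand=+1   -> forward strand side (above on linear, outside on circular)
    label_strand=-1   -> reverse strand side (below on linear, inside on circular)
    label_strand=None -> follow the strand of the feature being drawn
    """
    strand = feature_strand if label_strand is None else label_strand
    if strand not in (1, -1):
        strand = 1  # assumption: unstranded (0/None) features use the forward side
    if circular:
        return "outside" if strand == 1 else "inside"
    return "above" if strand == 1 else "below"


print(label_side(-1))                                  # below
print(label_side(-1, label_strand=+1, circular=True))  # outside
```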

This branch also (without making it optional) switches circular
diagram feature labels to be "outside" the sigil like the linear
diagram, rather than "inside" the sigil. Labels outside the sigil do
tend to take up more space (which would explain the original
motivation), but labels inside the sigil rarely give a legible result
except with a box sigil and a very small/short label which falls
completely within the sigil. This could be made a user option if there
is demand... my inclination is not to (the API is already quite complex).

David, I will email you an updated version of your example
script using this branch for you to look at. It allows me to
recreate the same effect as your code (bar the orientation
changes which I have not at this point incorporated).

David & Leighton, what do you think of this label idea?

Peter

From p.j.a.cock at googlemail.com  Mon Mar 18 07:58:49 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 18 Mar 2013 11:58:49 +0000
Subject: [Biopython-dev] Modifications to CircularDrawer
In-Reply-To: <CAKVJ-_7C0Oqk4MXdNPy-ONCDOHF2gRTAWF5pe6sYQ80ej+0WgQ@mail.gmail.com>
References: <959CFF5060375249824CC633DDDF896F1C0C58B1@AMSPRD0410MB351.eurprd04.prod.outlook.com>
	<E72D33BF424829408854FEB604A6959B6E47307C@DUEXC02.ad.hutton.ac.uk>
	<959CFF5060375249824CC633DDDF896F1C0C5C9B@AMSPRD0410MB351.eurprd04.prod.outlook.com>
	<CAKVJ-_7bf_n8dPXvw+pH+edo9aCWVBuiXRL_Gjx62b3faRgjJg@mail.gmail.com>
	<959CFF5060375249824CC633DDDF896F1C0C5E48@AMSPRD0410MB351.eurprd04.prod.outlook.com>
	<CAKVJ-_5LatE+JT04d0dcf_WiTRE9tEUNPw6oPC004H=TTqB43A@mail.gmail.com>
	<CAKVJ-_7LjEdGtm3T26L3wwkraN9C+11mLvjogDBXAK+EDjGeDQ@mail.gmail.com>
	<CAKVJ-_4Fcb5FqnmH+LjRquDBOUDYMgOi_TOG-JGt6Nk6gFD9vw@mail.gmail.com>
	<CAKVJ-_7C0Oqk4MXdNPy-ONCDOHF2gRTAWF5pe6sYQ80ej+0WgQ@mail.gmail.com>
Message-ID: <CAKVJ-_6RMyKfxzjUi9spf6NTSAgo5HNW37KeroDV1Mtcbndidw@mail.gmail.com>

On Sat, Mar 16, 2013 at 8:37 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> David & Leighton, what do you think of this label idea?
>
> Peter

From discussion off list, my branch seems to have been positively
received by both, and so I've applied that to the master.
I probably will need to update some images in the Tutorial...

We appear to agree that label orientation is an aesthetic
judgement, and therefore a user option to control this on
circular diagrams would be nice - but I've not done this (yet)
and remain cautious about further complicating this bit of
the code while trying to keep a consistent API between
the linear and circular drawers.

See also: https://github.com/biopython/biopython/pull/116

Peter

From chapmanb at 50mail.com  Mon Mar 18 12:49:33 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 18 Mar 2013 12:49:33 -0400
Subject: [Biopython-dev] SciPy Bioinformatics symposium: abstracts due
	Wednesday Mar 20th
Message-ID: <87y5dkvejm.fsf@fastmail.fm>


Hi all;
I'm helping organize a bioinformatics mini-symposium as part of SciPy 2013:

Bioinformatics mini-symposia: http://j.mp/Z4xxXB
SciPy info: http://conference.scipy.org/scipy2013/about.php

This is a great chance for the Python bioinformatics community to connect
with the wider Python scientific computing world. SciPy will feature programmers
working on IPython reproducible research, scikit-learn machine learning
approaches, large scale computing problems with NumPy and lots more relevant to
bioinformatics work.

This year there will be a special symposium track dedicated to bioinformatics and
I'd like to encourage everyone to submit abstracts. The deadline is this
Wednesday, March 20th:

http://conference.scipy.org/scipy2013/speaking_overview.php
http://conference.scipy.org/scipy2013/speaking_submission.php

SciPy takes place June 24-29th in Austin, TX. I'm looking forward to seeing lots of
bioinformatics people there. Please feel free to write me if you have any
questions,
Brad

From 88whacko at gmail.com  Wed Mar 20 14:10:14 2013
From: 88whacko at gmail.com (Andrea Rizzi)
Date: Wed, 20 Mar 2013 19:10:14 +0100
Subject: [Biopython-dev] New contributor
In-Reply-To: <1363185895.3324.YahooMailClassic@web164005.mail.gq1.yahoo.com>
References: <CANKZbvM0tveKZL1dm8RRphh4gyPiDcuHpbLkm1k37TKvqaibpA@mail.gmail.com>
	<1363185895.3324.YahooMailClassic@web164005.mail.gq1.yahoo.com>
Message-ID: <CANKZbvNmQy25k9D3PtbGf+sLNZRQ3K8_K9qGXdnhvoV_J1w5nA@mail.gmail.com>

Thank you for your welcome Michiel!

I will be looking for a good project to work on in the next few days and I
will let you know soon. Meanwhile I've started to read some code to become
familiar with the modules and I bumped into a few small bugs concerning the
Seq objects; in particular I found:

1) a duplicated test method name (one test in test_Seq_objs.py wasn't
performed);
2) an error in Alphabet._case_less().

I've also expanded the documentation a little and replaced the
tostring() method with the suggested str() in a function of
MutableSeq. The branch is located here:

https://github.com/andrrizzi/biopython/tree/seq-branch

I'm not sure whether it is more convenient for you to merge this kind of
commit from a git branch, or whether it is better to open a ticket and
attach a patch. Anyway, if you think these small commits may be useful, feel
free to use them.

Best,
Andrea


2013/3/13 Michiel de Hoon <mjldehoon at yahoo.com>

> Hi Andrea,
>
> Welcome to Biopython!
> It's great that you want to contribute.
> Writing & finishing some unit tests sounds like a good idea, and of course
> bug fixing is always welcome.
> Other options are to look at orphan modules in Biopython (modules without
> active maintainers, without documentation, or without unit tests).
> Once you decide what specifically you want to work on, it's good to let us
> know on the mailing list, to see if anybody else is working on the same.
>
> Good luck!
>
> Best,
> -Michiel.
>
>
>
> --- On Sat, 3/2/13, Andrea Rizzi <88whacko at gmail.com> wrote:
>
> > From: Andrea Rizzi <88whacko at gmail.com>
> > Subject: [Biopython-dev] New contributor
> > To: biopython-dev at biopython.org
> > Date: Saturday, March 2, 2013, 11:49 AM
> > Hello!
> > My name is Andrea Rizzi and I'm a master's student in
> > computer science and
> > computational biology. I would be glad to help you
> > developing biopython.
> > I've used the library quite extensively but I'm mostly
> > familiar with
> > handling sequences, MSAs and PDB files.
> >
> > I've read through the small contributing guide on the wiki
> > and on the
> > tutorial and I thought I could start with something
> > relatively
> > straightforward like writing/completing some unit tests (if
> > I understood
> > correctly there's a fairly strong need for them). I've good
> > knowledge of
> > both git and unittest. Anyway any task is actually fine to
> > me :) .
> >
> > If you agree I'll try to look for a module that needs some
> > more testing (or
> > maybe you have one to suggest me), otherwise I could just go
> > to the bug
> > tracker and try to help out fixing some bugs.
> >
> > --
> > -- Andrea
> > _______________________________________________
> > Biopython-dev mailing list
> > Biopython-dev at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biopython-dev
> >
>


-- 
-- Andrea

From p.j.a.cock at googlemail.com  Thu Mar 21 08:17:51 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 21 Mar 2013 12:17:51 +0000
Subject: [Biopython-dev] New contributor
In-Reply-To: <CANKZbvNmQy25k9D3PtbGf+sLNZRQ3K8_K9qGXdnhvoV_J1w5nA@mail.gmail.com>
References: <CANKZbvM0tveKZL1dm8RRphh4gyPiDcuHpbLkm1k37TKvqaibpA@mail.gmail.com>
	<1363185895.3324.YahooMailClassic@web164005.mail.gq1.yahoo.com>
	<CANKZbvNmQy25k9D3PtbGf+sLNZRQ3K8_K9qGXdnhvoV_J1w5nA@mail.gmail.com>
Message-ID: <CAKVJ-_43AKSoGRPhW1M6vYnbk+HedYeTLEKUxo5VGT9A12u6BA@mail.gmail.com>

On Wed, Mar 20, 2013 at 6:10 PM, Andrea Rizzi <88whacko at gmail.com> wrote:
> Thank you for your welcome Michiel!
>
> I will be looking for a good project to work on in the next few days and I
> will let you know soon. Meanwhile I've started to read some code to become
> familiar with the modules and I bumped into a few small bugs concerning the
> Seq objects; in particular I found:
>
> 1) a duplicated test method name (one test in test_Seq_objs.py wasn't
> performed);
> 2) an error in Alphabet._case_less().

Well spotted - changes applied to the master, thanks.

> I've also expanded the documentation a little and replaced the
> tostring() method with the suggested str() in a function of
> MutableSeq. The branch is located here:
>
> https://github.com/andrrizzi/biopython/tree/seq-branch
>
> I'm not sure whether it is more convenient for you to merge this kind of
> commit from a git branch, or whether it is better to open a ticket and
> attach a patch. Anyway, if you think these small commits may be useful, feel
> free to use them.

If you're happy on GitHub, a pull request is simplest. I've looked
at these changes one by one and applied and/or commented
on them.

(We're debating moving our issue tracker from Redmine to
GitHub, which would make things a little easier in future).

Thank you!

Peter

From p.j.a.cock at googlemail.com  Thu Mar 21 12:11:44 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 21 Mar 2013 16:11:44 +0000
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CABHxouWGVH6R=pXTVp0yHAm62Gfu+sKSNrhLR+g3ydvoeO02bw@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
	<CABHxouWGVH6R=pXTVp0yHAm62Gfu+sKSNrhLR+g3ydvoeO02bw@mail.gmail.com>
Message-ID: <CAKVJ-_74KnM9H5O1vb2hAwTm3HUN2u70NY2KZ_EGDYhm9Jc4zA@mail.gmail.com>

On Fri, Mar 15, 2013 at 11:06 PM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:
> Hi All,
> I would add one more (old) idea for a GSoC pool, i.e. adding support
> for different biological ontologies to biopython.
>
> This was already discussed some time ago
> (http://www.biopython.org/w/index.php?title=Gene_Ontology&redirect=no)
> mostly in the context of gene ontology, and to some extent this is
> addressed by the development of GOAtools
> (https://github.com/tanghaibao/goatools), but I think it would be
> worth having decent support for OBO-file-based ontologies (not only
> gene ontology; I'm also interested myself in anatomical ontologies,
> and there are others available at obofoundry.org) in biopython.
>
> I think it would need to include support for IO operations on both OBO
> and annotation files, as well as statistical enrichment measures and
> potentially some visualisation.
>
> Would anyone be interested in co-mentoring this project? There is one
> student in my department who would be interested in applying to GSoC
> for this project, but I think it would be great if other people joined
> the discussion on the functionality and having more people involved is
> always better...
>
> best
> Bartek Wilczynski

That's a good idea - I would have used this recently with some GO
stuff (e.g. given a GO term, is it a molecular function, biological
process, or cellular component? This can be solved easily by traversing
up any branch of the DAG).
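A minimal sketch of that traversal, assuming the is_a parents have already been loaded into a plain dictionary (for example from the is_a lines of an OBO file). go_namespace is a hypothetical name; the three root IDs are the real GO roots, and the approach relies on is_a links staying within one GO aspect, so relations such as part_of or regulates are deliberately not followed.

```python
# The three GO root terms, one per aspect (namespace).
GO_ROOTS = {
    "GO:0003674": "molecular_function",
    "GO:0008150": "biological_process",
    "GO:0005575": "cellular_component",
}


def go_namespace(term_id, is_a_parents):
    """Find the GO aspect of term_id by walking up is_a parents to a root.

    is_a_parents maps a term ID to a list of its is_a parent IDs. Only the
    first parent is followed, since any is_a branch ends at the same root.
    """
    seen = set()
    current = term_id
    while current not in GO_ROOTS:
        if current in seen:
            raise ValueError(f"is_a cycle detected at {current}")
        seen.add(current)
        parents = is_a_parents.get(current)
        if not parents:
            raise ValueError(f"{term_id} does not reach a GO root via is_a")
        current = parents[0]
    return GO_ROOTS[current]


print(go_namespace("GO:0009987", {"GO:0009987": ["GO:0008150"]}))  # biological_process
```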

Right now we need to put this list of ideas on the wiki page (ready
for combining into the OBF page which will be shown to Google
to make our case for taking part in the GSoC 2013 program).
http://biopython.org/wiki/Google_Summer_of_Code

If any of you as a potential mentor want to put up an outline
proposal, even better.

Peter

From p.j.a.cock at googlemail.com  Thu Mar 21 12:29:29 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 21 Mar 2013 16:29:29 +0000
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAKVJ-_74KnM9H5O1vb2hAwTm3HUN2u70NY2KZ_EGDYhm9Jc4zA@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
	<CABHxouWGVH6R=pXTVp0yHAm62Gfu+sKSNrhLR+g3ydvoeO02bw@mail.gmail.com>
	<CAKVJ-_74KnM9H5O1vb2hAwTm3HUN2u70NY2KZ_EGDYhm9Jc4zA@mail.gmail.com>
Message-ID: <CAKVJ-_45Y559rqS=A4Ho-NAAMdsjUk1JA5JDVwidhdRLJC8rHQ@mail.gmail.com>

On Thu, Mar 21, 2013 at 4:11 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> Right now we need to put this list of ideas on the wiki page (ready
> for combining into the OBF page which will be shown to Google
> to make our case for taking part in the GSoC 2013 program).
> http://biopython.org/wiki/Google_Summer_of_Code
>
> If any of you as a potential mentor want to put up an outline
> proposal, even better.
>

I've been wondering about potential GSoC projects which I'd
be interested in mentoring (or co-mentoring), and thus far I've
only got one outline idea.

I'm interested in taking the Bio.SeqIO.index(...) / index_db(...)
functionality (which does whole record parsing on demand)
and extending this with lazy-loading or lazy-parsing (which
has precedent in our BioSQL wrappers). For example, with
whole genome FASTA files you may never need to load the
entire sequence, but using an index system like tabix (or
even actually using a tabix index) Biopython could provide
a lazy-loading Seq object which extracts only the sequence
region of interest on demand.
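To make the FASTA case concrete, here is a minimal sketch of a lazily loaded sequence. Rather than tabix it borrows the layout of a samtools faidx (.fai) index (sequence length, byte offset of the first base, bases per line, bytes per line), which is enough to turn a sequence coordinate into a file offset. build_fai and LazyFastaSeq are hypothetical names, not Biopython API, and the sketch assumes every sequence line except the last has the same length, as faidx itself requires.

```python
import os
import tempfile


def build_fai(path):
    """Map name -> (length, offset, linebases, linewidth), like a .fai file."""
    index = {}
    with open(path, "rb") as handle:
        name = None
        while True:
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                name = line[1:].split()[0].decode()
                # Sequence data starts right after the header line.
                index[name] = [0, handle.tell(), 0, 0]
            elif name is not None:
                entry = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if entry[2] == 0:  # first sequence line fixes the line layout
                    entry[2], entry[3] = bases, len(line)
                entry[0] += bases
    return {key: tuple(value) for key, value in index.items()}


class LazyFastaSeq:
    """Sequence whose bases are only read from disk when sliced."""

    def __init__(self, path, name, fai):
        self.path = path
        self.length, self.offset, self.linebases, self.linewidth = fai[name]

    def __len__(self):
        return self.length

    def _file_pos(self, i):
        # Base i (0-based) is on line i // linebases, column i % linebases.
        return (self.offset + (i // self.linebases) * self.linewidth
                + i % self.linebases)

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("this sketch only supports slicing")
        start, stop, step = key.indices(self.length)
        if step != 1:
            raise ValueError("this sketch only supports a step of 1")
        if start >= stop:
            return ""
        first, last = self._file_pos(start), self._file_pos(stop - 1)
        with open(self.path, "rb") as handle:
            handle.seek(first)
            raw = handle.read(last - first + 1)
        # Drop the line breaks that fall inside the requested region.
        return raw.replace(b"\r", b"").replace(b"\n", b"").decode()


# Usage: write a tiny two-record FASTA file and pull out a region.
with tempfile.NamedTemporaryFile("wb", suffix=".fasta", delete=False) as tmp:
    tmp.write(b">chr1 test\nACGTA\nCCGTT\nGG\n>chr2\nTTTT\n")
fai = build_fai(tmp.name)
chr1 = LazyFastaSeq(tmp.name, "chr1", fai)
region = chr1[3:8]
print(len(chr1), region)  # 12 TACCG
os.remove(tmp.name)
```

Only the bytes spanning the requested region are read, however large the record is.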

The same idea applies to richer file formats too, like EMBL
and GenBank. Here lazy loading the sequence is actually
easier (the number of bases per line is strictly defined),
but you can apply the same ideas to lazy loading features
too. This means indexing both the sequence and the feature
table.

Likewise, this makes sense for GTF/GFF/GFF3 where you
would index the features, and also if present index the
embedded FASTA sequence at the end of the file. Clearly
handling this would ideally build on Lenna and Brad's work
with the underlying parser.

With what I have in mind, there are two technical sides to
this. First, the index format (binning strategies etc.), for
which we should review tabix and BAM's indexing and its
planned replacement CSI (able to handle longer references).
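For reference when reviewing binning strategies, the scheme used by BAM's .bai index and by tabix is short enough to port directly. The sketch below is a Python transcription of the reg2bin/reg2bins functions given in the SAM format specification (0-based, half-open coordinates). With its five fixed levels plus bin 0, references are limited to 2**29 bases, which is the limit CSI relaxes by making the minimum bin size and the depth configurable.

```python
# (shift, first bin number) for each level of the BAI/tabix binning scheme,
# from the largest bins (2**26 bases) down to the smallest (2**14 bases).
LEVELS = ((26, 1), (23, 9), (20, 73), (17, 585), (14, 4681))


def reg2bin(beg, end):
    """Smallest bin that wholly contains the region [beg, end)."""
    end -= 1
    for shift, first in reversed(LEVELS):  # try the smallest bins first
        if beg >> shift == end >> shift:
            return first + (beg >> shift)
    return 0  # bin 0 spans the whole reference


def reg2bins(beg, end):
    """Every bin that may hold a feature overlapping [beg, end)."""
    end -= 1
    bins = [0]
    for shift, first in LEVELS:
        bins.extend(range(first + (beg >> shift), first + (end >> shift) + 1))
    return bins


feature_bin = reg2bin(100000, 100500)
print(feature_bin)                             # 4687
print(feature_bin in reg2bins(99000, 101000))  # True
```

A region query then only has to inspect the features stored in the handful of bins returned by reg2bins.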

Second, to avoid code duplication, this would mean some
re-factoring of the existing parser code to ensure that if
a record is loaded in full via the traditional API, it would
go through the same code as if it were loaded via the new
lazy loading approach. Potentially the existing parsers
could optionally also become lazy loaders (this would require
them to take ownership of the file handle, as they would
use seek and tell to move the file pointer). That in theory
could make our parsers much faster (depending on the
overheads) for tasks where only a minority of the data
is ever used. I've had some fun chats with Pjotr Prins
from BioRuby about this at a CodeFest/BOSC meeting.
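A minimal sketch of that shared code path, using a GFF3-like file: a single record parser is called both by the traditional eager iterator and by a lazy index that only remembers file positions and uses seek/tell on a handle it keeps open. All names here are hypothetical rather than existing Biopython API, and the parser ignores GFF3 percent-encoding, multi-line features sharing an ID (only the first line is indexed) and the content of the embedded FASTA section.

```python
import io


def parse_gff_line(line):
    """Shared record parser used by both the eager and the lazy code paths."""
    cols = line.rstrip("\n").split("\t")
    attributes = dict(item.split("=", 1) for item in cols[8].split(";") if item)
    return {"seqid": cols[0], "type": cols[2], "start": int(cols[3]),
            "end": int(cols[4]), "strand": cols[6], "attributes": attributes}


def _is_feature_line(line):
    return bool(line.strip()) and not line.startswith("#")


def parse_all(handle):
    """Traditional API: parse every feature line in full."""
    for line in handle:
        if line.startswith("##FASTA"):
            break
        if _is_feature_line(line):
            yield parse_gff_line(line)


class LazyGffIndex:
    """Remembers where each feature starts; parses it only when requested."""

    def __init__(self, handle):
        self._handle = handle  # must stay open and seekable: the index owns it
        self._positions = {}
        handle.seek(0)
        while True:
            position = handle.tell()
            line = handle.readline()
            if not line or line.startswith("##FASTA"):
                break
            if _is_feature_line(line):
                # Cheap scan for the ID attribute, without building the record.
                for item in line.rstrip("\n").split("\t")[8].split(";"):
                    if item.startswith("ID="):
                        self._positions.setdefault(item[3:], position)

    def __len__(self):
        return len(self._positions)

    def __getitem__(self, feature_id):
        self._handle.seek(self._positions[feature_id])
        return parse_gff_line(self._handle.readline())  # same code as parse_all


gff_text = ("##gff-version 3\n"
            "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=gene1;Name=abc\n"
            "chr1\tsrc\tgene\t200\t300\t.\t-\t.\tID=gene2\n"
            "##FASTA\n>chr1\nACGT\n")
lazy = LazyGffIndex(io.StringIO(gff_text))
eager = list(parse_all(io.StringIO(gff_text)))
print(len(lazy), lazy["gene2"] == eager[1])  # 2 True
```

Because both paths end in parse_gff_line, a fix to the record parser automatically applies to the eager and lazy APIs alike.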

Brad and Lenna, I've CC'd you explicitly as I'm guessing
from the GFF work you are most likely to have considered
some of these issues.

Does this sound like something worth exploring further,
and worth proposing as an outline GSoC project? I think
it would be quite a challenging project - but like last year,
it is something I would like to try myself if I had the time.

Regards,

Peter

From p.j.a.cock at googlemail.com  Thu Mar 21 13:01:51 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 21 Mar 2013 17:01:51 +0000
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
Message-ID: <CAKVJ-_5fX+dFL1CJH47cjsz2gEU7HGi83OZyb3O2iCDQUiSwww@mail.gmail.com>

On Wed, Mar 13, 2013 at 6:32 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
>
> I like Michiel's idea, and I'll suggest two more:
>
> 1. Codon alignment & analysis:

Already up on the wiki :)

>
> 2. Phylo enhancements:
> 2a. Tree drawing:
> - A proper draw_unrooted function to perform radial layout, with an optional
> "iterations" argument to use Felsenstein's Equal Daylight algorithm -- I
> feel this layout approach is neglected in most libraries.
> - Better matplotlib/pylab integration, so the plot components can be tweaked
> using matplotlib functions.
> - Other common layout approaches, e.g. circular.
> 2b. A "Phylo.consensus" module:
> - strict consensus, like Bio.Nexus already implements.
> - other consensus methods, time permitting.
> 2c. A "Phylo.distance" module:
> - Robinson-Foulds distance -- though others might be working on this
> already.
> 2d. Simple tree inference:
> - Straightforward algorithms exist for neighbor-joining and parsimony tree
> estimation. For small alignments (and perhaps medium-sized ones with PyPy),
> it would be nice to run these without an external program, e.g. to construct
> a guide tree for another algorithm or quickly view a phylogenetic clustering
> of sequences.

One more idea for a sub-task?

2e. Using multiple trees for bootstrapping a master tree. Take the master
tree and for each edge you have a partition of the leaves, which can be
used as a dictionary hash (e.g. as a binary representation). Then for
each of the bootstrap runs, look at each edge, compute the hash for
that split of the leaves, and increment the count. Then at the end, you
have a dictionary of counts which are the branch bootstrap supports.

I wrote that once in Python some time back, and used it to take a set
of bootstrap trees generated on a cluster and give the support values
to the master tree.
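The edge-hashing idea in 2e can be sketched in a few lines. This is not the script mentioned above, and several choices are assumptions made for illustration: trees are plain nested tuples of leaf names rather than Bio.Phylo objects, each split is encoded as an integer bitmask (one bit per leaf, i.e. the binary representation), and splits are canonicalised to the side that excludes the alphabetically first leaf so that the rooting of each tree does not matter. Supports are reported as the fraction of replicates containing each split, and every replicate must have the same leaf names as the master tree.

```python
from collections import Counter


def leaf_names(tree):
    """Leaves of a tree given as nested tuples of leaf-name strings."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaf_names(child)]


def split_keys(tree, bit_of):
    """Bitmask keys for every non-trivial split (internal edge) of the tree."""
    full = (1 << len(bit_of)) - 1
    keys = set()

    def walk(node):
        if isinstance(node, str):
            return 1 << bit_of[node]
        mask = 0
        for child in node:
            mask |= walk(child)
        # Canonical form: the side of the split that excludes leaf bit 0.
        key = full ^ mask if mask & 1 else mask
        # Keep only splits with at least two leaves on each side.
        if bin(key).count("1") > 1 and bin(full ^ key).count("1") > 1:
            keys.add(key)
        return mask

    walk(tree)
    return keys


def bootstrap_support(master, replicates):
    """Fraction of replicate trees containing each split of the master tree."""
    names = sorted(leaf_names(master))
    bit_of = {name: i for i, name in enumerate(names)}
    counts = Counter()
    for tree in replicates:
        counts.update(split_keys(tree, bit_of))
    return {
        frozenset(n for n in names if (key >> bit_of[n]) & 1): counts[key] / len(replicates)
        for key in split_keys(master, bit_of)
    }


master = ((("A", "B"), "C"), ("D", "E"))
replicates = [master, ((("A", "C"), "B"), ("D", "E"))]
support = bootstrap_support(master, replicates)
for side, value in sorted(support.items(), key=lambda item: sorted(item[0])):
    print(sorted(side), value)
# ['C', 'D', 'E'] 0.5
# ['D', 'E'] 1.0
```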

>
> Any interest in either of these? Shall I add them to the wiki?
>

They both seem worth posting on the wiki, although we may not have
enough mentors for both to go ahead :(

Peter

From p.j.a.cock at googlemail.com  Thu Mar 21 12:55:30 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 21 Mar 2013 16:55:30 +0000
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
Message-ID: <CAKVJ-_7CNkGiKS2=ipQu5EzmKifeeOMEqfa7X2yuW+SOkVQfQw@mail.gmail.com>

On Wed, Mar 13, 2013 at 6:32 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> I like Michiel's idea, and I'll suggest two more:
>
> 1. Codon alignment & analysis:
> - PAL2NAL-style conversion of unaligned nucleic acid sequences and a protein
> sequence alignment to a codon alignment. (Previously discussed)

e.g. https://github.com/peterjc/picobio/blob/master/align/align_back_trans.py
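The heart of a PAL2NAL-style conversion, threading each unaligned nucleotide sequence back onto its gapped protein, is sketched below. It is an illustration only, not the linked script: back_translate is a hypothetical name, it assumes the standard genetic code's stop codons, and it does not check that each codon actually translates to its aligned residue, which the validation idea in this proposal would need.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code only (assumption)


def back_translate(aligned_protein, nucleotides, gap="-"):
    """Thread an unaligned CDS onto its aligned protein, giving aligned codons.

    Each residue consumes the next three bases; each gap becomes three gaps.
    Raises ValueError if the lengths do not match (e.g. a frame shift), allowing
    only a single trailing stop codon to be left over.
    """
    codons = []
    pos = 0
    for residue in aligned_protein:
        if residue == gap:
            codons.append(gap * 3)
            continue
        codon = nucleotides[pos:pos + 3]
        if len(codon) < 3:
            raise ValueError("nucleotide sequence is too short for the protein")
        codons.append(codon)
        pos += 3
    leftover = nucleotides[pos:].upper()
    if leftover and leftover not in STOP_CODONS:
        raise ValueError(f"unexpected trailing bases: {leftover!r}")
    return "".join(codons)


print(back_translate("M-K", "ATGAAATAA"))  # ATG---AAA
```

Applying this to every row of a protein alignment yields a codon alignment with the same column structure, three nucleotide columns per protein column.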

> - dN/dS and the related functions needed to calculate it.
> - Possible AlignIO or MultipleSeqAlignment tweaks to take full advantage of
> codon alignments, including validation (testing for frame shifts etc.)

http://biopython.org/wiki/Google_Summer_of_Code#Codon_alignment_and_analysis

I see you've started fleshing this idea out on the wiki, which is great.
Right now it seems a little on the lightweight side - or is that deliberate
(to see if a student can take this idea and come up with a solid
project proposal in this area)? Things like model selection might
be a fun extension - I can think of a local expert who would be
great to get involved on the science side if he's interested.

Alternatively this could include doing some more general work
on the alignment object - for instance per-column-annotation
for things like a consensus sequence - or an array-of-char
implementation as an alternative to the list-of-SeqRecords
we have now (with its poor column access speed).
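To illustrate why an array-of-char layout helps column access, here is a toy alignment (a hypothetical class, not a proposed API) that stores all rows back to back in one bytes buffer. A column is then a single strided slice rather than a loop over every SeqRecord, and a simple majority-rule consensus, one example of per-column annotation, falls out of it. A real implementation would more likely use a NumPy array; the standard library is used here only to keep the sketch self-contained.

```python
from collections import Counter


class CharArrayAlignment:
    """Toy alignment storing equal-length rows back to back in one bytes buffer."""

    def __init__(self, rows):
        self.width = len(rows[0])
        if any(len(row) != self.width for row in rows):
            raise ValueError("all aligned rows must have the same length")
        self.nrows = len(rows)
        self._data = "".join(rows).encode("ascii")

    def row(self, i):
        return self._data[i * self.width:(i + 1) * self.width].decode("ascii")

    def column(self, j):
        # Every width-th byte, starting at offset j, belongs to column j.
        return self._data[j::self.width].decode("ascii")

    def consensus(self):
        """Most common character per column.

        Ties go to whichever tied character appears first reading down the column.
        """
        return "".join(Counter(self.column(j)).most_common(1)[0][0]
                       for j in range(self.width))


aln = CharArrayAlignment(["ACGT", "AGGT", "ACGA"])
print(aln.column(1))    # CGC
print(aln.consensus())  # ACGT
```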

Peter

From p.j.a.cock at googlemail.com  Thu Mar 21 13:29:44 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 21 Mar 2013 17:29:44 +0000
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CADEGkF6s6vbTeutQrx6g_V1yCMFeKKCtouPKc_CgZyf52hd1PA@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<CADEGkF6s6vbTeutQrx6g_V1yCMFeKKCtouPKc_CgZyf52hd1PA@mail.gmail.com>
Message-ID: <CAKVJ-_6Na2QWDKaVO5wK+QDKTv=Vh-6PGi33hmShz-qErVRkSg@mail.gmail.com>

On Tue, Feb 12, 2013 at 6:29 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi everyone,
>
> It's more or less a 'low hanging fruit', but I've been thinking
> perhaps it may be useful if we have our own interface to the HMMER3
> online service? The corresponding SearchIO parsers may be written for
> this as well (they return different formats for which we haven't any
> parsers currently).

Worth adding to the projects list here (or filing an enhancement bug)
http://biopython.org/wiki/Active_projects#Project_ideas - but not
enough to base a whole GSoC project around.

> And I think there are more things being worked on, not yet mentioned
> in the wiki:
>
> 1. Porting our docs to Sphinx[1]
> 2. Converting some/all of the print and compare tests to unit tests.
> For example, our Bio.Seq's tests are still print and compare tests.
>
> regards,
> Bow
>
> [1] See the original feature request here:
> https://redmine.open-bio.org/issues/3221
> https://redmine.open-bio.org/issues/3220
> https://redmine.open-bio.org/issues/3219

I don't think a purely documentation focused project is eligible
for GSoC. But both ideas make sense separately from GSoC.

Regards,

Peter

From p.j.a.cock at googlemail.com  Thu Mar 21 13:36:24 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 21 Mar 2013 17:36:24 +0000
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAKVJ-_45Y559rqS=A4Ho-NAAMdsjUk1JA5JDVwidhdRLJC8rHQ@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
	<CABHxouWGVH6R=pXTVp0yHAm62Gfu+sKSNrhLR+g3ydvoeO02bw@mail.gmail.com>
	<CAKVJ-_74KnM9H5O1vb2hAwTm3HUN2u70NY2KZ_EGDYhm9Jc4zA@mail.gmail.com>
	<CAKVJ-_45Y559rqS=A4Ho-NAAMdsjUk1JA5JDVwidhdRLJC8rHQ@mail.gmail.com>
Message-ID: <CAKVJ-_7VQ3AS=A-VJ9XxzDTwx0qvrCBEQdKmJwHf8NyXXu3aLg@mail.gmail.com>

On Thu, Mar 21, 2013 at 4:29 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Mar 21, 2013 at 4:11 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>
>> Right now we need to put this list of ideas on the wiki page (ready
>> for combining into the OBF page which will be shown to Google
>> to make our case for taking part in the GSoC 2013 program).
>> http://biopython.org/wiki/Google_Summer_of_Code
>>
>> If any of you as a potential mentor want to put up an outline
>> proposal, even better.
>>
>
> I've been wondering about potential GSoC projects which I'd
> be interested in mentoring (or co-mentoring), and thus far I've
> only got one outline idea.
>
> I'm interested in taking the Bio.SeqIO.index(...) / index_db(...)
> functionality (which does whole record parsing on demand)
> and extending this with lazy-loading or lazy-parsing (which
> has precedent in our BioSQL wrappers). For example, with
> whole genome FASTA files you may never need to load the
> entire sequence, but using an index system like tabix (or
> even actually using a tabix index) Biopython could provide
> a lazy-loading Seq object which extracts only the sequence
> region of interest on demand.
>
> The same idea applies to richer file formats too, like EMBL
> and GenBank. ...
>
> Likewise, this makes sense for GTF/GFF/GFF3 ...

P.S. An example use case, http://www.biostars.org/p/64363/
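To make the lazy-loading idea concrete for a plain FASTA file, here is a minimal sketch, not Biopython API. For each record it stores only a few byte offsets: the sequence length, the offset of the first base, the number of bases per line and the number of bytes per line. This is similar in spirit to the .fai files written by `samtools faidx`, although the sketch neither reads nor writes real .fai files. A region can then be fetched by seeking straight to the right byte. The class and method names are hypothetical. The sketch assumes that every sequence line in a record, except possibly the last, has the same length, and that records contain no blank lines.

```python
import io


class LazyFastaIndex:
    """Hypothetical faidx-style index: byte offsets only, no sequence kept in memory.

    Assumes every sequence line of a record, except possibly the last, has the
    same length, and that records contain no blank lines.
    """

    def __init__(self, handle):
        self.handle = handle  # binary file-like object supporting seek()
        # name -> [sequence length, offset of first base, bases per line, bytes per line]
        self.index = {}
        self._build()

    def _build(self):
        self.handle.seek(0)
        name = None
        offset = 0  # byte offset of the start of the current line
        for line in iter(self.handle.readline, b""):
            if line.startswith(b">"):
                name = line[1:].split()[0].decode()
                self.index[name] = [0, offset + len(line), None, None]
            elif name is not None:
                entry = self.index[name]
                bases = len(line.rstrip(b"\r\n"))
                if entry[2] is None:  # the first sequence line sets the layout
                    entry[2], entry[3] = bases, len(line)
                entry[0] += bases
            offset += len(line)

    def fetch(self, name, start, end):
        """Return bases [start:end) (0-based, half-open) of record `name`."""
        length, first, per_line, line_bytes = self.index[name]
        start, end = max(0, start), min(end, length)
        if start >= end:
            return ""

        def byte_pos(i):
            # Skip whole lines (newline bytes included), then bases within the line.
            return first + (i // per_line) * line_bytes + i % per_line

        self.handle.seek(byte_pos(start))
        raw = self.handle.read(byte_pos(end - 1) - byte_pos(start) + 1)
        return raw.replace(b"\r", b"").replace(b"\n", b"").decode()


data = b">chr1 test\nACGTACGTAC\nGGGGTTTTCC\nAA\n>chr2\nTTTT\n"
idx = LazyFastaIndex(io.BytesIO(data))
print(idx.fetch("chr1", 8, 13))  # ACGGG
```

With a real genome you would pass a handle such as `open(path, "rb")`. Only the bytes that span the requested region are read from disk.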

Part of this work could include enhancements to the SeqRecord
handling of SeqFeatures - offering more than just the current
simple list - for example lookup by ID, dbxref, or position. That
would be nice to have now with the current in-memory parsers.
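Lookups by ID, dbxref or position could be layered on top of the existing simple list without changing it. This sketch uses a hypothetical stand-in `Feature` tuple (0-based, half-open coordinates) rather than Biopython's `SeqFeature`, so it runs on its own. It builds dictionaries for ID and dbxref lookup, plus a start-sorted list for position queries.

```python
import bisect
from collections import defaultdict
from typing import NamedTuple


class Feature(NamedTuple):
    """Hypothetical stand-in for a SeqFeature (0-based, half-open location)."""

    id: str
    start: int
    end: int
    dbxrefs: tuple = ()


class FeatureIndex:
    """Lookup by ID, by dbxref, or by position over an unchanged feature list."""

    def __init__(self, features):
        self.by_id = {f.id: f for f in features}
        self.by_dbxref = defaultdict(list)
        for f in features:
            for xref in f.dbxrefs:
                self.by_dbxref[xref].append(f)
        # Sorted by start so position queries can ignore features starting later.
        self._sorted = sorted(features, key=lambda f: f.start)
        self._starts = [f.start for f in self._sorted]

    def at(self, pos):
        """Return features whose location covers position `pos`."""
        cut = bisect.bisect_right(self._starts, pos)  # features with start <= pos
        return [f for f in self._sorted[:cut] if f.end > pos]


features = [
    Feature("gene1", 0, 900, ("GeneID:101",)),
    Feature("gene2", 500, 1500, ("GeneID:102",)),
    Feature("gene3", 2000, 2600),
]
index = FeatureIndex(features)
print([f.id for f in index.at(600)])  # ['gene1', 'gene2']
print(index.by_dbxref["GeneID:102"][0].id)  # gene2
```

A position query still checks every feature that starts at or before `pos`. For large annotations, an interval tree would avoid that scan.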

An old but still relevant example use case:
http://www.warwick.ac.uk/go/peter_cock/python/genbank/#indexing_features

Regards,

Peter

From eric.talevich at gmail.com  Thu Mar 21 13:42:19 2013
From: eric.talevich at gmail.com (Eric Talevich)
Date: Thu, 21 Mar 2013 13:42:19 -0400
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAKVJ-_7CNkGiKS2=ipQu5EzmKifeeOMEqfa7X2yuW+SOkVQfQw@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
	<CAKVJ-_7CNkGiKS2=ipQu5EzmKifeeOMEqfa7X2yuW+SOkVQfQw@mail.gmail.com>
Message-ID: <CAMC681k8VoX3pe+XxXF1P8nSSR5tuMpVooKJtQEdsFksL9m2=w@mail.gmail.com>

On Thu, Mar 21, 2013 at 12:55 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Wed, Mar 13, 2013 at 6:32 PM, Eric Talevich <eric.talevich at gmail.com>
> wrote:
> > I like Michiel's idea, and I'll suggest two more:
> >
> > 1. Codon alignment & analysis:
> > - PAL2NAL-style conversion of unaligned nucleic acid sequences and a
> > protein sequence alignment to a codon alignment. (Previously discussed)
>
> e.g.
> https://github.com/peterjc/picobio/blob/master/align/align_back_trans.py
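The core of a PAL2NAL-style conversion fits in a few lines. Walk along the aligned protein, consume three nucleotides for each residue, and emit `---` for each gap. This is a hypothetical sketch, not the API of the linked script or of Biopython. It assumes the nucleotide string is exactly the coding sequence of the ungapped protein: three bases per residue, no frame shifts and no trailing stop codon. It also does not check that the codons actually translate to the given residues.

```python
def back_translate(aligned_protein, nucleotides, gap="-"):
    """Map one gapped protein row onto its unaligned coding sequence (hypothetical helper)."""
    residues = len(aligned_protein.replace(gap, ""))
    if len(nucleotides) != 3 * residues:
        raise ValueError(
            f"expected {3 * residues} bases for {residues} residues, got {len(nucleotides)}"
        )
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)  # a protein gap becomes a codon-sized gap
        else:
            codons.append(nucleotides[pos:pos + 3])
            pos += 3
    return "".join(codons)


print(back_translate("M-KV", "ATGAAAGTT"))  # ATG---AAAGTT
```

Applying this to every row of a protein alignment yields a codon alignment whose columns stay in register with the protein columns.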


Well, check you out. Would you be interested in mentoring this project?


> > - dN/dS and the related functions needed to calculate it.
> > - Possible AlignIO or MultipleSeqAlignment tweaks to take full advantage
> > of codon alignments, including validation (testing for frame shifts etc.)
>
>
> http://biopython.org/wiki/Google_Summer_of_Code#Codon_alignment_and_analysis
>
> I see you've started fleshing this idea out on the wiki, which is great.
> Right now it seems a little on the lightweight side - or is that deliberate
> (to see if a student can take this idea and come up with a solid
> project proposal in this area)? Things like model selection might
> be a fun extension - I can think of a local expert who would be
> great to get involved on the science side if he's interested.
>

I put up a quick sketch to avoid locking the wiki page for too long, but
also deliberately left it vague to see where the applicants take it. Model
selection would be cool, I added it. Local expert, also great.
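The codon-alignment validation mentioned above (testing for frame shifts) could start with two cheap checks on each row. First, the number of bases must be a multiple of three. Second, every gap run must start on a codon boundary and be a whole number of codons long. This is a sketch with a hypothetical function name. Real data, for example sequences with genuine frame shifts that should be reported rather than rejected, would need more care.

```python
import re


def frame_problems(row, gap="-"):
    """Return human-readable problems found in one codon-alignment row (hypothetical helper)."""
    problems = []
    bases = len(row) - row.count(gap)
    if bases % 3:
        problems.append(f"{bases} bases is not a whole number of codons")
    for match in re.finditer(re.escape(gap) + "+", row):
        start, length = match.start(), match.end() - match.start()
        # A gap must begin at a codon boundary and cover whole codons.
        if start % 3 or length % 3:
            problems.append(f"gap of length {length} at column {start} breaks the reading frame")
    return problems


print(frame_problems("ATG---AAA"))  # []
print(frame_problems("AT---GAAA"))  # ['gap of length 3 at column 2 breaks the reading frame']
```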


> Alternatively this could include doing some more general work
> on the alignment object - for instance per-column-annotation
> for things like a consensus sequence - or an array-of-char
> implementation as an alternative to the list-of-SeqRecords
> we have now (with its poor column access speed).
>
> Peter
>

I wonder if that's something we could just do incrementally -- change the
MultipleSeqAlignment class to store a list-of-lists-of-chars (or
list-of-strings), a list of SeqRecord-like husks (all the annotations, but
without the Seq itself) for each row, a list of column annotations, and a
single alphabet for the whole alignment.
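Here is a rough sketch of that layout. The class and attribute names are hypothetical; this is not the current MultipleSeqAlignment API. Rows are plain strings. Each row keeps a small dict of annotations in place of a full SeqRecord. Per-column annotations are lists with one entry per column, and a single alphabet covers the whole alignment.

```python
from collections import Counter


class StringAlignment:
    """Hypothetical alignment stored as row strings plus lightweight annotations."""

    def __init__(self, rows, ids, alphabet="DNA"):
        if len({len(r) for r in rows}) > 1:
            raise ValueError("all rows must have the same length")
        self.rows = list(rows)
        # SeqRecord-like "husks": annotations only, no sequence object.
        self.row_annotations = [{"id": i} for i in ids]
        self.column_annotations = {}  # name -> list with one value per column
        self.alphabet = alphabet  # one alphabet for the whole alignment

    def width(self):
        return len(self.rows[0]) if self.rows else 0

    def column(self, index):
        return "".join(row[index] for row in self.rows)

    def add_column_annotation(self, name, values):
        if len(values) != self.width():
            raise ValueError("need one value per column")
        self.column_annotations[name] = list(values)


aln = StringAlignment(["ACGT", "ACGA", "TCGA"], ["a", "b", "c"])
# A simple majority-rule consensus stored as a per-column annotation.
consensus = [Counter(aln.column(i)).most_common(1)[0][0] for i in range(aln.width())]
aln.add_column_annotation("consensus", consensus)
print("".join(aln.column_annotations["consensus"]))  # ACGA
```

Column access here is still a loop over rows. That cost is the trade-off against a NumPy matrix raised in the next paragraph.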

How do you suppose the speed of that would compare to the current
list-of-SeqRecords, and also to that of a wrapped NumPy matrix? Would it be
a significant enough speed improvement to justify both replacing the
current implementation, and to make the NumPy approach less tempting (given
PyPy's progress toward including a compliant implementation)?
Alternatively, we could post a GSoC project for creating a separate
TurboAlignment class/module based on NumPy which would be mostly
interchangeable and interconvertible with the pure-Python version in the
Biopython core.

Speaking of which, should we also post the idea of storing sequences as an
efficient byte array, BioJava-style?

-Eric

From p.j.a.cock at googlemail.com  Thu Mar 21 13:59:10 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 21 Mar 2013 17:59:10 +0000
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAMC681k8VoX3pe+XxXF1P8nSSR5tuMpVooKJtQEdsFksL9m2=w@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
	<CAKVJ-_7CNkGiKS2=ipQu5EzmKifeeOMEqfa7X2yuW+SOkVQfQw@mail.gmail.com>
	<CAMC681k8VoX3pe+XxXF1P8nSSR5tuMpVooKJtQEdsFksL9m2=w@mail.gmail.com>
Message-ID: <CAKVJ-_60s40p0PNihxayxFEWRg+jx=KbxRsOO5z=AXO0hgq8Mw@mail.gmail.com>

On Thu, Mar 21, 2013 at 5:42 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> On Thu, Mar 21, 2013 at 12:55 PM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>>
>> On Wed, Mar 13, 2013 at 6:32 PM, Eric Talevich <eric.talevich at gmail.com>
>> wrote:
>> > I like Michiel's idea, and I'll suggest two more:
>> >
>> > 1. Codon alignment & analysis:
>> > - PAL2NAL-style conversion of unaligned nucleic acid sequences and a
>> > protein sequence alignment to a codon alignment. (Previously discussed)
>>
>> e.g.
>> https://github.com/peterjc/picobio/blob/master/align/align_back_trans.py
>
> Well, check you out. Would you be interested in mentoring this project?
>

If I'm not primary mentor on another project, I'd be open to co-mentoring
something on the alignment side.

>> > - dN/dS and the related functions needed to calculate it.
>> > - Possible AlignIO or MultipleSeqAlignment tweaks to take full advantage
>> > of codon alignments, including validation (testing for frame shifts etc.)
>>
>>
>> http://biopython.org/wiki/Google_Summer_of_Code#Codon_alignment_and_analysis
>>
>> I see you've started fleshing this idea out on the wiki, which is great.
>> Right now it seems a little on the lightweight side - or is that deliberate
>> (to see if a student can take this idea and come up with a solid
>> project proposal in this area)? Things like model selection might
>> be a fun extension - I can think of a local expert who would be
>> great to get involved on the science side if he's interested.
>
>
> I put up a quick sketch to avoid locking the wiki page for too long, but
> also deliberately left it vague to see where the applicants take it. Model
> selection would be cool, I added it. Local expert, also great.

If he's available and willing, yes. I've not mentioned this to him
yet so no promises - the idea only occurred to me while writing
that email ;)

>>
>> Alternatively this could include doing some more general work
>> on the alignment object - for instance per-column-annotation
>> for things like a consensus sequence - or an array-of-char
>> implementation as an alternative to the list-of-SeqRecords
>> we have now (with its poor column access speed).
>>
>> Peter
>
>
> I wonder if that's something we could just do incrementally -- change the
> MultipleSeqAlignment class to store a list-of-lists-of-chars (or
> list-of-strings), a list of SeqRecord-like husks (all the annotations, but
> without the Seq itself) for each row, a list of column annotations, and a
> single alphabet for the whole alignment.
>
> How do you suppose the speed of that would compare to the current
> list-of-SeqRecords, and also to that of a wrapped NumPy matrix? Would it be
> a significant enough speed improvement to justify both replacing the current
> implementation, and to make the NumPy approach less tempting (given PyPy's
> progress toward including a compliant implementation)? Alternatively, we
> could post a GSoC project for creating a separate TurboAlignment
> class/module based on NumPy which would be mostly interchangeable and
> interconvertible with the pure-Python version in the Biopython core.

When I said array-of-char I did have NumPy in mind, and PyPy does now
cope with two or more dimensional arrays in NumPyPy. Note that NumPy
handles both row and column orientated arrays via a simple option when
creating the array (the order argument), so this can easily be set up to
favour row or column access.
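As an illustration, you can build a character matrix with NumPy's `order` argument set to `"F"` (column-major, Fortran order). This keeps each alignment column contiguous in memory, whereas the default `"C"` order favours rows. The sketch assumes NumPy is installed and stores each character as a one-byte string (dtype `"S1"`).

```python
import numpy as np

rows = ["ACGT-A", "ACGTTA", "AC-TTA"]
# One byte per character; order="F" stores each column contiguously.
aln = np.array([list(r) for r in rows], dtype="S1", order="F")

print(aln.shape)  # (3, 6)
print(aln.flags["F_CONTIGUOUS"])  # True
# Column 2 back as text: the np.bytes_ items join like ordinary bytes.
print(b"".join(aln[:, 2]).decode())  # GG-
```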

Last time I did anything with the alignment object where column access
was a bottleneck (calculating mutual information between columns), I
just loaded all the columns into memory as a list of strings, and computed
on that. It worked very nicely.
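That approach looks roughly like this. First, transpose the rows once into a list of column strings. Then compute the mutual information between a pair of columns from their joint character counts. This is a plain-Python sketch, not the code referred to in the message. It reports values in bits and does no gap handling or small-sample correction.

```python
import math
from collections import Counter


def columns(rows):
    """Transpose equal-length row strings into a list of column strings."""
    return ["".join(chars) for chars in zip(*rows)]


def mutual_information(col_x, col_y):
    """Mutual information in bits between two alignment columns."""
    n = len(col_x)
    px = Counter(col_x)
    py = Counter(col_y)
    pxy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (x, y), count in pxy.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


cols = columns(["AG", "AG", "CT", "CT"])
print(cols)  # ['AACC', 'GGTT']
print(mutual_information(*cols))  # 1.0
```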

> Speaking of which, should we also post the idea of storing sequences as an
> efficient byte array, BioJava-style?

I'd wondered about that (in combination with the discussion about strict
alphabet checking), but is there enough for a whole GSoC project?
Related to this, one could look at something with k-mer hashes...
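The k-mer hash idea, and more generally packing bases more compactly than one character each, can be illustrated by storing each base in two bits. A k-mer then becomes an integer, and that integer can be updated as a rolling value. This is a hypothetical sketch for unambiguous DNA only (A, C, G, T). Any other character simply restarts the window.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_codes(seq, k):
    """Yield (position, integer code) for every k-mer made only of A/C/G/T."""
    mask = (1 << (2 * k)) - 1  # keep only the last k bases (2 bits each)
    code = 0
    valid = 0  # number of consecutive unambiguous bases seen
    for i, base in enumerate(seq.upper()):
        value = CODE.get(base)
        if value is None:
            code, valid = 0, 0  # ambiguous base: restart the window
            continue
        code = ((code << 2) | value) & mask
        valid += 1
        if valid >= k:
            yield i - k + 1, code


print(list(kmer_codes("ACGT", 2)))  # [(0, 1), (1, 6), (2, 11)]
```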

(It's good to see lots of possible project ideas bouncing around)

Peter

From chapmanb at 50mail.com  Fri Mar 22 08:48:34 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Fri, 22 Mar 2013 08:48:34 -0400
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAKVJ-_45Y559rqS=A4Ho-NAAMdsjUk1JA5JDVwidhdRLJC8rHQ@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
	<CABHxouWGVH6R=pXTVp0yHAm62Gfu+sKSNrhLR+g3ydvoeO02bw@mail.gmail.com>
	<CAKVJ-_74KnM9H5O1vb2hAwTm3HUN2u70NY2KZ_EGDYhm9Jc4zA@mail.gmail.com>
	<CAKVJ-_45Y559rqS=A4Ho-NAAMdsjUk1JA5JDVwidhdRLJC8rHQ@mail.gmail.com>
Message-ID: <87zjxvsiql.fsf@fastmail.fm>


Peter;

> I've been wondering about potential GSoC projects which I'd
> be interested in mentoring (or co-mentoring), and thus far I've
> only got one outline idea.
>
> I'm interested in taking the Bio.SeqIO.index(...) / index_db(...)
> functionality (which does whole record parsing on demand)
> and extending this with lazy-loading or lazy-parsing (which
> has precedent in our BioSQL wrappers). For example, with
> whole genome FASTA files you may never need to load the
> entire sequence, but using an index system like tabix (or
> even actually using a tabix index) Biopython could provide
> a lazy-loading Seq object which extracts only the sequence
> region of interest on demand.

This sounds incredibly useful. It's definitely worthwhile writing up if
you'll have time this summer to mentor it.

> Likewise, this makes sense for GTF/GFF/GFF3 where you
> would index the features, and also if present index the
> embedded FASTA sequence at the end of the file.

I'm cc'ing Ryan, who has been thinking about similar work as part of
gffutils. We're planning now on an approach that takes the BCBio.GFF
parsing and rolls it into gffutils so we can parse, index in a SQLite
database and expose as Biopython objects. Here is some initial
discussion and planning:

https://github.com/daler/gffutils/issues/2
https://docs.google.com/document/d/15l_yZ_pge22ETw-pz2g4NWRAUAccmr1MYPmqXbj1Jl8/edit?usp=sharing
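Here is a small taste of the parse-then-index-in-SQLite approach, using only the standard library `sqlite3` module. This sketch does not use gffutils' schema or API. It keeps only a few GFF3 columns, takes the feature ID from the `ID=` attribute, skips any embedded `##FASTA` section, and answers overlap queries in 1-based closed coordinates. The table and column names are hypothetical. An index on (seqid, start) only partly narrows an overlap search.

```python
import sqlite3


def load_gff(lines, db=":memory:"):
    """Parse GFF3 feature lines into a SQLite table (hypothetical minimal schema)."""
    conn = sqlite3.connect(db)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features "
        "(feature_id TEXT, seqid TEXT, ftype TEXT, start_pos INTEGER, end_pos INTEGER, strand TEXT)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS by_location ON features (seqid, start_pos)")
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded FASTA section: not indexed in this sketch
        if not line.strip() or line.startswith("#"):
            continue
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = (
            line.rstrip("\n").split("\t")
        )
        attributes = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
        conn.execute(
            "INSERT INTO features VALUES (?, ?, ?, ?, ?, ?)",
            (attributes.get("ID"), seqid, ftype, int(start), int(end), strand),
        )
    conn.commit()
    return conn


def overlapping(conn, seqid, start, end):
    """Features on `seqid` overlapping the 1-based closed interval [start, end]."""
    return conn.execute(
        "SELECT feature_id, ftype, start_pos, end_pos FROM features "
        "WHERE seqid = ? AND start_pos <= ? AND end_pos >= ? ORDER BY start_pos",
        (seqid, end, start),
    ).fetchall()


gff = [
    "##gff-version 3\n",
    "chr1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1\n",
    "chr1\texample\tgene\t1200\t1800\t.\t-\t.\tID=gene2;Name=abc\n",
    "chr2\texample\tgene\t50\t400\t.\t+\t.\tID=gene3\n",
]
conn = load_gff(gff)
print(overlapping(conn, "chr1", 850, 1300))
# [('gene1', 'gene', 100, 900), ('gene2', 'gene', 1200, 1800)]
```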

Brad

From dalerr at niddk.nih.gov  Fri Mar 22 12:20:45 2013
From: dalerr at niddk.nih.gov (Ryan Dale)
Date: Fri, 22 Mar 2013 12:20:45 -0400
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <87zjxvsiql.fsf@fastmail.fm>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
	<CABHxouWGVH6R=pXTVp0yHAm62Gfu+sKSNrhLR+g3ydvoeO02bw@mail.gmail.com>
	<CAKVJ-_74KnM9H5O1vb2hAwTm3HUN2u70NY2KZ_EGDYhm9Jc4zA@mail.gmail.com>
	<CAKVJ-_45Y559rqS=A4Ho-NAAMdsjUk1JA5JDVwidhdRLJC8rHQ@mail.gmail.com>
	<87zjxvsiql.fsf@fastmail.fm>
Message-ID: <514C84DD.9070306@niddk.nih.gov>

Hi Brad & Peter -


On 03/22/2013 08:48 AM, Brad Chapman wrote:
> Peter;
>
>> I've been wondering about potential GSoC projects which I'd
>> be interested in mentoring (or co-mentoring), and thus far I've
>> only got one outline idea.
>>
>> I'm interested in taking the Bio.SeqIO.index(...) / index_db(...)
>> functionality (which does whole record parsing on demand)
>> and extending this with lazy-loading or lazy-parsing (which
>> has precedent in our BioSQL wrappers). For example, with
>> whole genome FASTA files you may never need to load the
>> entire sequence, but using an index system like tabix (or
>> even actually using a tabix index) Biopython could provide
>> a lazy-loading Seq object which extracts only the sequence
>> region of interest on demand.
> This sounds incredibly useful. It's definitely worthwhile writing up if
> you'll have time this summer to mentor it.

Agreed - a general, lazy-loading/lazy-parsing, indexed mechanism for
accessing annotation-like data file formats would be fantastic.

>> Likewise, this makes sense for GTF/GFF/GFF3 where you
>> would index the features, and also if present index the
>> embedded FASTA sequence at the end of the file.
> I'm cc'ing Ryan, who has been thinking about similar work as part of
> gffutils. We're planning now on an approach that takes the BCBio.GFF
> parsing and rolls it into gffutils so we can parse, index in a SQLite
> database and expose as Biopython objects. Here is some initial
> discussion and planning:
>
> https://github.com/daler/gffutils/issues/2
> https://docs.google.com/document/d/15l_yZ_pge22ETw-pz2g4NWRAUAccmr1MYPmqXbj1Jl8/edit?usp=sharing

As Peter pointed out on the GitHub issues page, what he has in mind is 
more general than just GFF/GTF, and I see gffutils as extending upon a 
specific subset of the functionality he proposes.

For example, there are common use-cases that I think make sense for a 
GFF/GTF-only library (say, adding new annotations for introns, as 
inferred from the isoform + exon annotations) that might not be readily 
generalizable to all annotation-like file formats. But if this general 
indexing approach were already available, then gffutils could just be a 
wrapper around that, adding the specific GFF/GTF functionality as 
another layer.

Then again . . . currently gffutils imports GFF data into a sqlite3 
database, so data are persistent and both read/write.  For the 
intron-inferring example, we simply add new records to the db, but with 
an indexing approach, the file would presumably have to be re-indexed 
before reading again.  So how you'd like to use your GFF files 
(read-only vs read/write) would influence which strategy you'd choose.
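Once exons are grouped by transcript, the intron-inferring example is easy to express: each intron spans the bases between two consecutive exons. Here is a hypothetical helper that uses 1-based closed (GFF-style) coordinates. It assumes that the exons of a single transcript do not overlap.

```python
from collections import defaultdict


def infer_introns(exons):
    """exons: iterable of (transcript_id, start, end), 1-based inclusive.

    Returns {transcript_id: [(intron_start, intron_end), ...]}.
    """
    by_transcript = defaultdict(list)
    for transcript, start, end in exons:
        by_transcript[transcript].append((start, end))
    introns = {}
    for transcript, spans in by_transcript.items():
        spans.sort()
        introns[transcript] = [
            (prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(spans, spans[1:])
            if next_start - prev_end > 1  # abutting exons leave no intron
        ]
    return introns


exons = [("tx1", 100, 200), ("tx1", 500, 650), ("tx1", 300, 400), ("tx2", 100, 150)]
print(infer_introns(exons))  # {'tx1': [(201, 299), (401, 499)], 'tx2': []}
```

With a database, the results can simply be inserted as new rows. With a read-only index, the file would have to be rewritten and re-indexed, which is the trade-off described above.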

So I think there's actually smaller-than-expected overlap between 
gffutils and Peter's general indexing idea, and in the context of GSoC, 
I'm not sure you'd have to take gffutils into account.  But gffutils 
would certainly benefit from general indexing, especially when 
retrieving sequences for features!

-ryan


From mjldehoon at yahoo.com  Tue Mar 26 09:21:35 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 26 Mar 2013 06:21:35 -0700 (PDT)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CAJ9sUYNQMu1x-4vx0gqQX4=hmAAro7GE=Lr2cPOgL7kr792O6g@mail.gmail.com>
Message-ID: <1364304095.69042.YahooMailClassic@web164002.mail.gq1.yahoo.com>

Hi all,


Speaking of which, we have a Biopython Structural Bioinformatics FAQ (i.e. how to use the Bio.PDB module) on the Biopython website with additional information on Bio.PDB, including some information on things that are not in the main Biopython Tutorial. Perhaps this is a good time to integrate this FAQ into the main documentation?


We could also update it a bit because it's been a while and there are some different things here and there. And additions too.


I went over the Biopython Structural Bioinformatics FAQ and integrated it into the main Biopython tutorial; see

biopython.org/DIST/docs/tutorial/Tutorial-dev.html
or
biopython.org/DIST/docs/tutorial/Tutorial-dev.pdf

Though I think everything is there, it may be good if somebody more experienced with Bio.PDB were to look it over to see if it still makes sense.

In addition, I converted the Biopython Structural Bioinformatics FAQ to our wiki format and added it to our wiki documentation; see
http://biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ
This wiki page now contains the same information (apart from some minor updates/fixes) as the PDF of the Biopython Structural Bioinformatics FAQ that we have on the Biopython website.

I guess with this we can remove the lyx/tex source code of the Biopython Structural Bioinformatics FAQ from the git repository, as well as the PDF from the Biopython website. Any objections?

Best,
-Michiel.


From p.j.a.cock at googlemail.com  Tue Mar 26 09:53:52 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 26 Mar 2013 13:53:52 +0000
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <1364304095.69042.YahooMailClassic@web164002.mail.gq1.yahoo.com>
References: <CAJ9sUYNQMu1x-4vx0gqQX4=hmAAro7GE=Lr2cPOgL7kr792O6g@mail.gmail.com>
	<1364304095.69042.YahooMailClassic@web164002.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_7n6RzA6e5ZV2Mn0BtyrcsD--MxSa6f3U8rLEYM9E6DLw@mail.gmail.com>

On Tue, Mar 26, 2013 at 1:21 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> I guess with this we can remove the lyx/tex source code of the Biopython Structural Bioinformatics FAQ from the git repository, as well as the PDF from the Biopython website. Any objections?
>

Good work Michiel :)

I would suggest making a final revision to the Biopython Structural
Bioinformatics FAQ to explain this document is now obsolete, and where the
information has moved to. Commit that to git, and put the final PDF online
replacing the current version. That way anyone looking at the PDF online (or
the git history) will have a clear route to finding the current information.

Thanks,

Peter

From anaryin at gmail.com  Tue Mar 26 09:54:55 2013
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Tue, 26 Mar 2013 14:54:55 +0100
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CAKVJ-_7n6RzA6e5ZV2Mn0BtyrcsD--MxSa6f3U8rLEYM9E6DLw@mail.gmail.com>
References: <CAJ9sUYNQMu1x-4vx0gqQX4=hmAAro7GE=Lr2cPOgL7kr792O6g@mail.gmail.com>
	<1364304095.69042.YahooMailClassic@web164002.mail.gq1.yahoo.com>
	<CAKVJ-_7n6RzA6e5ZV2Mn0BtyrcsD--MxSa6f3U8rLEYM9E6DLw@mail.gmail.com>
Message-ID: <CAJ9sUYN5CBMYrpXpK=pB_v5tVbT6mjLNs_9oZ+R+-FR6RWwa-A@mail.gmail.com>

Great work!

I'll go over it in the next few days.


2013/3/26 Peter Cock <p.j.a.cock at googlemail.com>

> On Tue, Mar 26, 2013 at 1:21 PM, Michiel de Hoon <mjldehoon at yahoo.com>
> wrote:
> >
> > I guess with this we can remove the lyx/tex source code of the Biopython
> > Structural Bioinformatics FAQ from the git repository, as well as the PDF
> > from the Biopython website. Any objections?
> >
>
> Good work Michiel :)
>
> I would suggest making a final revision to the Biopython Structural
> Bioinformatics FAQ to explain this document is now obsolete, and where the
> information has moved to. Commit that to git, and put the final PDF online
> replacing the current version. That way anyone looking at the PDF online (or
> the git history) will have a clear route to finding the current information.
>
> Thanks,
>
> Peter
>

From lara.vignotto at gmail.com  Wed Mar 27 10:09:50 2013
From: lara.vignotto at gmail.com (Lara Vignotto)
Date: Wed, 27 Mar 2013 15:09:50 +0100
Subject: [Biopython-dev] [GSoC] Further info about Codon alignment idea
Message-ID: <CAJnvgQhTTioX4+pZOmf7=XbuRBhfUEwjSXsQZQmzcZwkRcT+MQ@mail.gmail.com>

Hello,
I'm a student from Italy. I'm attending the first year of Biotechnology at
the University of Udine, and I'm interested in the Codon alignment and
analysis project proposed for the Google Summer of Code 2013.
Since I would like to know if I have got the skills required to contribute,
can you tell me more about the project?

Regards,
Lara Vignotto

From 88whacko at gmail.com  Thu Mar 28 06:39:07 2013
From: 88whacko at gmail.com (Andrea Rizzi)
Date: Thu, 28 Mar 2013 11:39:07 +0100
Subject: [Biopython-dev] New contributor
In-Reply-To: <CAKVJ-_43AKSoGRPhW1M6vYnbk+HedYeTLEKUxo5VGT9A12u6BA@mail.gmail.com>
References: <CANKZbvM0tveKZL1dm8RRphh4gyPiDcuHpbLkm1k37TKvqaibpA@mail.gmail.com>
	<1363185895.3324.YahooMailClassic@web164005.mail.gq1.yahoo.com>
	<CANKZbvNmQy25k9D3PtbGf+sLNZRQ3K8_K9qGXdnhvoV_J1w5nA@mail.gmail.com>
	<CAKVJ-_43AKSoGRPhW1M6vYnbk+HedYeTLEKUxo5VGT9A12u6BA@mail.gmail.com>
Message-ID: <CANKZbvPB549up+O7Q4fGmmOX2EBshYtxBbRA6jmqrMsrVkD7yg@mail.gmail.com>

Thank you for the great feedback Peter.

I'll write a test case for Bio.Alphabet then since I couldn't find any.
When it's ready I'll open a pull request.

Thank you again!
Andrea


2013/3/21 Peter Cock <p.j.a.cock at googlemail.com>

> On Wed, Mar 20, 2013 at 6:10 PM, Andrea Rizzi <88whacko at gmail.com> wrote:
> > Thank you for your welcome Michiel!
> >
> > I will be looking for a good project to work on in the next few days and I
> > will let you know soon. Meanwhile I've started to read some code to become
> > familiar with the modules and I bumped into a few small bugs concerning the
> > Seq objects, in particular I found:
> >
> > 1) a duplicated test method name (one test in test_Seq_objs.py wasn't
> > performed);
> > 2) an error in Alphabet._case_less().
>
> Well spotted - changes applied to the master, thanks.
>
> > I've also expanded the documentation a little and I've replaced the
> > tostring() method with the suggested str() in a function of
> > MutableSeq. The branch is located here:
> >
> > https://github.com/andrrizzi/biopython/tree/seq-branch
> >
> > I'm not sure if it is more comfortable for you to merge this kind of
> > commits from a git branch or it is more advisable to open a ticket and
> > create a patch. Anyway, if you think these small commits may be useful,
> > feel free to use them.
>
> If you're happy on GitHub, a pull request is simplest. I've looked
> at these changes one by one and applied and/or commented
> on them.
>
> (We're debating moving our issue tracker from RedMine to
> GitHub, which would make things a little easier in future).
>
> Thank you!
>
> Peter
>
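The quoted message above mentions a duplicated test method name that caused a test to be silently skipped. This happens because a `class` body runs like ordinary code: the second `def` with the same name rebinds that name, so `unittest` only ever sees one method. The minimal illustration below uses made-up test names. Static checkers such as pyflakes can flag redefinitions like this.

```python
import unittest


class ExampleTests(unittest.TestCase):
    def test_case_change(self):
        self.assertEqual("acgt".upper(), "ACGT")

    def test_case_change(self):  # same name: silently replaces the method above
        self.assertEqual("ACGT".lower(), "acgt")


loader = unittest.TestLoader()
print(loader.loadTestsFromTestCase(ExampleTests).countTestCases())  # 1, not 2
```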

From p.j.a.cock at googlemail.com  Thu Mar 28 09:39:57 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 28 Mar 2013 13:39:57 +0000
Subject: [Biopython-dev] [GSoC] Further info about Codon alignment idea
In-Reply-To: <CAJnvgQhTTioX4+pZOmf7=XbuRBhfUEwjSXsQZQmzcZwkRcT+MQ@mail.gmail.com>
References: <CAJnvgQhTTioX4+pZOmf7=XbuRBhfUEwjSXsQZQmzcZwkRcT+MQ@mail.gmail.com>
Message-ID: <CAKVJ-_7LfAmbnLvUXtLaS+KmQ+3MXtVf5MLWcHbEWxmmfvSNQA@mail.gmail.com>

On Wed, Mar 27, 2013 at 2:09 PM, Lara Vignotto <lara.vignotto at gmail.com> wrote:
> Hello,
> I'm a student from Italy. I'm attending the first year of Biotechnology at
> the University of Udine, and I'm interested in the Codon alignment and
> analysis project proposed for the Google Summer of Code 2013.
> Since I would like to know if I have got the skills required to contribute,
> can you tell me more about the project?
>
> Regards,
> Lara Vignotto

Hi Lara,

Welcome and thank you for your interest in taking part in GSoC 2013.

The background discussion to the outline idea on the wiki was here:
http://lists.open-bio.org/pipermail/biopython-dev/2013-March/010449.html
http://lists.open-bio.org/pipermail/biopython-dev/2013-March/010471.html
http://lists.open-bio.org/pipermail/biopython-dev/2013-March/010474.html
http://lists.open-bio.org/pipermail/biopython-dev/2013-March/010475.html
(I think that was all the posts - check the archive to be sure).

The text of the wiki is hopefully enough to spark your interest - what
we'd really like to see is a student intrigued by the idea and driven
to expand the topic into a full project proposal. If for example your
current course work included some phylogenetics that might help
give you perspective about what is useful and worth adding to
Biopython. You should probably also have a look at the NESCent
GSoC project ideas if it is the phylogenetic side that really interests
you - in previous years Biopython has mentored GSoC students
with NESCent:
http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2013

You would also need to be competent with Python - although if you
also know and love Perl or Ruby (etc) there might be a mentor
willing to supervise a related project with BioPerl or BioRuby -
that's good too from the wider OBF and Bio* perspective.

For tree traversal, some background reading on things like
breadth-first search and other algorithms for 'walking' the
tree would be a good idea (see also the Python os.walk
function for 'walking' a file system tree).
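Breadth-first search, for instance, visits a tree level by level using a queue. In this sketch the tree is a hypothetical dict that maps each node name to its children; it does not use Bio.Phylo's tree classes. For comparison, the standard library's `os.walk` walks a directory tree top-down in depth-first order.

```python
from collections import deque


def breadth_first(tree, root):
    """Return node names in breadth-first order starting from `root`."""
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree.get(node, []))  # leaves have no entry in the dict
    return order


tree = {"root": ["A", "B"], "A": ["a1", "a2"], "B": ["b1"]}
print(breadth_first(tree, "root"))  # ['root', 'A', 'B', 'a1', 'a2', 'b1']
```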

I'm sure there will be other technical things to learn about and use,
depending on where a GSoC project based on this idea went.

Did that help? Is there something more specific I can try to
answer?

Regards,

Peter

From p.j.a.cock at googlemail.com  Thu Mar 28 11:44:11 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 28 Mar 2013 15:44:11 +0000
Subject: [Biopython-dev] Fwd: [biopython] Custom GenBank locus length (#171)
In-Reply-To: <biopython/biopython/pull/171@github.com>
References: <biopython/biopython/pull/171@github.com>
Message-ID: <CAKVJ-_5PEWquDm0_kjC7E=jusOwWL2ZVW066je1YTfUcWWrGdQ@mail.gmail.com>

For those not getting the pull request emails from GitHub,

---------- Forwarded message ----------
From: Marco Galardini <notifications at github.com>
Date: Thu, Mar 28, 2013 at 3:19 PM
Subject: [biopython] Custom GenBank locus length (#171)
To: biopython/biopython <biopython at noreply.github.com>


Instead of an exception, raise a warning, so the file is saved and the
user can decide to correct the error.

I don't know if this is a good practice, but I have some GenBank files
provided by the JGI/DOE with locus names longer than 16 chars, so I
guess that providing a warning to the user instead of a complete
failure could be better.
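The change described in this pull request essentially swaps a `raise` for `warnings.warn`. Below is a hypothetical sketch of that pattern; it is not the actual code in Bio/SeqIO/InsdcIO.py. It assumes a 16-character LOCUS name field and pads shorter names to that width. A longer name also shifts the other fixed-width columns of the LOCUS line, so other checks on line length may still fail.

```python
import warnings

LOCUS_NAME_WIDTH = 16  # field width assumed in this sketch


def locus_name_field(name):
    """Return `name` padded to the LOCUS field width, warning if it is too long."""
    if len(name) > LOCUS_NAME_WIDTH:
        warnings.warn(
            f"Locus name {name!r} is longer than {LOCUS_NAME_WIDTH} characters; "
            "the output may not be a valid GenBank file",
            UserWarning,
        )
        return name  # written anyway, so the user can fix it afterwards
    return name.ljust(LOCUS_NAME_WIDTH)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    field = locus_name_field("scaffold_000000000123")  # made-up 21-character name
print(len(caught), field)  # 1 scaffold_000000000123
```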

________________________________

You can merge this Pull Request by running

  git pull https://github.com/mgalardini/biopython patch-1

Or view, comment on, or merge it at:

  https://github.com/biopython/biopython/pull/171

Commit Summary

Custom GenBank locus length

File Changes

M Bio/SeqIO/InsdcIO.py (4)

Patch Links:

https://github.com/biopython/biopython/pull/171.patch
https://github.com/biopython/biopython/pull/171.diff

From marco.galardini at unifi.it  Thu Mar 28 11:54:38 2013
From: marco.galardini at unifi.it (Marco Galardini)
Date: Thu, 28 Mar 2013 16:54:38 +0100
Subject: [Biopython-dev] Fwd: [biopython] Custom GenBank locus length
 (#171)
In-Reply-To: <CAKVJ-_5PEWquDm0_kjC7E=jusOwWL2ZVW066je1YTfUcWWrGdQ@mail.gmail.com>
References: <biopython/biopython/pull/171@github.com>
	<CAKVJ-_5PEWquDm0_kjC7E=jusOwWL2ZVW066je1YTfUcWWrGdQ@mail.gmail.com>
Message-ID: <515467BE.7090105@unifi.it>

Good afternoon everyone,

Actually, I have been testing a bit more and some other changes may be
needed (sorry about that, this is my first change to the Biopython code).
The assertions on the line length still fail, so my guess is that it's
probably not a good idea to try to write a GenBank file with
unusual identifiers (even if they are from JGI!).

Marco

On 03/28/2013 04:44 PM, Peter Cock wrote:
> For those not getting the pull request emails from GitHub,
>
> ---------- Forwarded message ----------
> From: Marco Galardini <notifications at github.com>
> Date: Thu, Mar 28, 2013 at 3:19 PM
> Subject: [biopython] Custom GenBank locus length (#171)
> To: biopython/biopython <biopython at noreply.github.com>
>
>
> Instead of an exception, raise a warning, so the file is saved and the
> user can decide to correct the error.
>
> I don't know if this is a good practice, but I have some GenBank files
> provided by the JGI/DOE with locus names longer than 16 chars, so I
> guess that providing a warning to the user instead of a complete
> failure could be better.
>
> ________________________________
>
> You can merge this Pull Request by running
>
>    git pull https://github.com/mgalardini/biopython patch-1
>
> Or view, comment on, or merge it at:
>
>    https://github.com/biopython/biopython/pull/171
>
> Commit Summary
>
> Custom GenBank locus length
>
> File Changes
>
> M Bio/SeqIO/InsdcIO.py (4)
>
> Patch Links:
>
> https://github.com/biopython/biopython/pull/171.patch
> https://github.com/biopython/biopython/pull/171.diff
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev


-- 
-------------------------------------------------
Marco Galardini, PhD
Dipartimento di Biologia
Via Madonna del Piano, 6 - 50019 Sesto Fiorentino (FI)

e-mail: marco.galardini at unifi.it
www: http://www.unifi.it/dblage/CMpro-v-p-51.html
phone:  +39 055 4574737
mobile: +39 340 2808041
-------------------------------------------------


From p.j.a.cock at googlemail.com  Thu Mar 28 14:00:38 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 28 Mar 2013 18:00:38 +0000
Subject: [Biopython-dev] stdout/stderr handling oddity
Message-ID: <CAKVJ-_65RBkCh-5tV+rmEJmv29FKPW8YhiNdVS5g3WT8weXWkA@mail.gmail.com>

Hi all,

While looking at the BWA wrapper from Saket Choudhary
https://github.com/biopython/biopython/pull/167 and the
associated enhancement to the __call__ functionality of the
command line wrapper base class, I wrote a couple of unit
tests - which have left me a little puzzled:

https://github.com/biopython/biopython/commit/3f5d4c442424a7ca33ae0bafa60c840e80ae2fda

Could a few of you try running this test_Application.py
file and confirm it works as is, and try uncommenting the
two problem test cases?

(I'm curious if the echo test works as intended on a plain
Windows machine without cygwin installed - I hope so).

Unless anyone else can explain this, I think the next step
is a simple test program which produces predictable
output to both stdout and stderr, just in case this is due
to there being no stderr output in these tests.

e.g. Print integers 1, 2, 3, 4, ..., to some sensible limit,
like 20, where non-primes go to stdout while primes go to
stderr.
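Here is a sketch of the suggested test program. It writes 1 to 20, sending primes to stderr and everything else to stdout, so a wrapper test can check both streams exactly. The function names are hypothetical. Passing the streams in as arguments is a design choice that lets the same logic also run against in-memory buffers. Run as a script, it writes to the real stdout and stderr.

```python
import io
import sys


def is_prime(n):
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


def emit(limit=20, out=None, err=None):
    """Print 1..limit, one per line: primes to `err`, everything else to `out`."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    for i in range(1, limit + 1):
        print(i, file=err if is_prime(i) else out)


def run_to_strings(limit=20):
    """Capture what emit() writes, for checking without a subprocess."""
    out, err = io.StringIO(), io.StringIO()
    emit(limit, out, err)
    return out.getvalue(), err.getvalue()


if __name__ == "__main__":
    emit()
```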

Peter

From arklenna at gmail.com  Thu Mar 28 16:54:11 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Thu, 28 Mar 2013 16:54:11 -0400
Subject: [Biopython-dev] stdout/stderr handling oddity
In-Reply-To: <CAKVJ-_65RBkCh-5tV+rmEJmv29FKPW8YhiNdVS5g3WT8weXWkA@mail.gmail.com>
References: <CAKVJ-_65RBkCh-5tV+rmEJmv29FKPW8YhiNdVS5g3WT8weXWkA@mail.gmail.com>
Message-ID: <CALfq9tK9Mm7faKA6Cx2_ikV7thCPtLxRfFiwbkHGwYjOnMpf9w@mail.gmail.com>

Hi Peter,

On Mac OS X, opening os.devnull with mode 'w' on lines 418 and 422 of the
Application __init__.py causes the tests to pass for me.
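The pattern behind this fix is to open the null device for writing before passing it to a child process as stdout or stderr. `open()` defaults to read mode, and a read-only handle cannot accept the child's output. This is a minimal sketch of the pattern, not the actual Bio.Application code.

```python
import os
import subprocess
import sys

# The child writes to stdout, so os.devnull must be opened in "w" mode.
with open(os.devnull, "w") as devnull:
    return_code = subprocess.call(
        [sys.executable, "-c", "print('discarded')"],
        stdout=devnull,
    )
print(return_code)  # 0
```

On Python 3.3 and later, passing `subprocess.DEVNULL` achieves the same thing without opening a file yourself.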

Lenna

From saketkc at gmail.com  Thu Mar 28 16:57:54 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Fri, 29 Mar 2013 02:27:54 +0530
Subject: [Biopython-dev] stdout/stderr handling oddity
In-Reply-To: <CALfq9tK9Mm7faKA6Cx2_ikV7thCPtLxRfFiwbkHGwYjOnMpf9w@mail.gmail.com>
References: <CAKVJ-_65RBkCh-5tV+rmEJmv29FKPW8YhiNdVS5g3WT8weXWkA@mail.gmail.com>
	<CALfq9tK9Mm7faKA6Cx2_ikV7thCPtLxRfFiwbkHGwYjOnMpf9w@mail.gmail.com>
Message-ID: <CAEDHeis+xQXYhHVe6ziTZ-59Xe6hMGKMZZ5fvB7VNnNvmzaAtQ@mail.gmail.com>

Yes.
And the reason is this:
http://stackoverflow.com/questions/2368967/bad-file-descriptor-error

On 29 March 2013 02:24, Lenna Peterson <arklenna at gmail.com> wrote:
> Hi Peter,
>
> On Mac OS X, opening os.devnull with mode 'w' on lines 418 and 422 of the
> Application __init__.py causes the tests to pass for me.
>
> Lenna
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev

From saketkc at gmail.com  Thu Mar 28 17:00:00 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Fri, 29 Mar 2013 02:30:00 +0530
Subject: [Biopython-dev] stdout/stderr handling oddity
In-Reply-To: <CAEDHeis+xQXYhHVe6ziTZ-59Xe6hMGKMZZ5fvB7VNnNvmzaAtQ@mail.gmail.com>
References: <CAKVJ-_65RBkCh-5tV+rmEJmv29FKPW8YhiNdVS5g3WT8weXWkA@mail.gmail.com>
	<CALfq9tK9Mm7faKA6Cx2_ikV7thCPtLxRfFiwbkHGwYjOnMpf9w@mail.gmail.com>
	<CAEDHeis+xQXYhHVe6ziTZ-59Xe6hMGKMZZ5fvB7VNnNvmzaAtQ@mail.gmail.com>
Message-ID: <CAEDHeisqqY2TcQz5wGsDfduMuhGjs61VbBffZx7Mb0+3nseVYA@mail.gmail.com>

Forgot to add : Tested on Ubuntu 12.04

On 29 March 2013 02:27, Saket Choudhary <saketkc at gmail.com> wrote:
> Yes.
> And the reason is this
> :http://stackoverflow.com/questions/2368967/bad-file-descriptor-error
>
> On 29 March 2013 02:24, Lenna Peterson <arklenna at gmail.com> wrote:
>> Hi Peter,
>>
>> On Mac OS X, opening os.devnull with mode 'w' on lines 418 and 422 of the
>> Application __init__.py causes the tests to pass for me.
>>
>> Lenna

From p.j.a.cock at googlemail.com  Thu Mar 28 18:11:11 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 28 Mar 2013 22:11:11 +0000
Subject: [Biopython-dev] stdout/stderr handling oddity
In-Reply-To: <CAEDHeis+xQXYhHVe6ziTZ-59Xe6hMGKMZZ5fvB7VNnNvmzaAtQ@mail.gmail.com>
References: <CAKVJ-_65RBkCh-5tV+rmEJmv29FKPW8YhiNdVS5g3WT8weXWkA@mail.gmail.com>
	<CALfq9tK9Mm7faKA6Cx2_ikV7thCPtLxRfFiwbkHGwYjOnMpf9w@mail.gmail.com>
	<CAEDHeis+xQXYhHVe6ziTZ-59Xe6hMGKMZZ5fvB7VNnNvmzaAtQ@mail.gmail.com>
Message-ID: <CAKVJ-_794tzTdsmBiYfcxs0y_5ksjK5rC2qKBUrrUUogwJhLzA@mail.gmail.com>

> On 29 March 2013 02:24, Lenna Peterson <arklenna at gmail.com> wrote:
>> Hi Peter,
>>
>> On Mac OS X, opening os.devnull with mode 'w' on lines 418 and 422 of the
>> Application __init__.py causes the tests to pass for me.
>>
>> Lenna

On Thu, Mar 28, 2013 at 8:57 PM, Saket Choudhary <saketkc at gmail.com> wrote:
> Yes.
> And the reason is this
> :http://stackoverflow.com/questions/2368967/bad-file-descriptor-error
>

Thank you both - I am kicking myself now - maybe I should have
taken another sick day this week instead of returning to work? ;)

Fixed:
https://github.com/biopython/biopython/commit/bba2acbf3d690ad7b99e94ac8ead6763b1d05ab8

I guess no one had bothered to use this option to send stderr
to /dev/null - or if they had, they never reported this error. The only
thing that puzzles me is why this worked for stdout. Odd.

Cheers,

Peter

From p.j.a.cock at googlemail.com  Fri Mar 29 07:54:33 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 29 Mar 2013 11:54:33 +0000
Subject: [Biopython-dev] Fwd: [biopython] Fix Biopython installation with
	pip (#172)
In-Reply-To: <biopython/biopython/pull/172@github.com>
References: <biopython/biopython/pull/172@github.com>
Message-ID: <CAKVJ-_6RBuZiET0bVpWohPj9WOj5Dnt0aX2uR3WJhPr-RxmegA@mail.gmail.com>

Hi Brad,

This sounds sensible in principle - it just needs some hands-on testing
on various systems. Any volunteers who use pip and virtualenvs?

Thanks,

Peter

---------- Forwarded message ----------
From: Brad Chapman <notifications at github.com>
Date: Fri, Mar 29, 2013 at 11:47 AM
Subject: [biopython] Fix Biopython installation with pip (#172)
To: biopython/biopython <biopython at noreply.github.com>


Hi all;
This is yet another take on making Biopython install nicely with pip
in virtual environments. This avoids adding numpy as an explicit
dependency and instead uses it if present or skips it if not.

The problem with the previous install_requires approach is that pip
doesn't build and install all requirements before setting up
Biopython, so Biopython will fail with a numpy missing error.
Additionally, our old approach drags in numpy and so creates a heavyweight
dependency for isolated environments.

The new approach requires users to explicitly install numpy if needed
but doesn't penalize them if it's not present.
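As a rough illustration of the "use numpy if present, skip it if not" idea (this is not the code in pull request #172, which is not shown here), a setup script might probe for NumPy and only add NumPy-dependent C extensions when it can be imported. The module names `example._core` and `example._numpy_helpers` are hypothetical placeholders.

```python
import importlib


def numpy_include_dir():
    """Return NumPy's C header directory, or None if NumPy is not installed."""
    try:
        numpy = importlib.import_module("numpy")
    except ImportError:
        return None
    return numpy.get_include()


def select_extensions(numpy_include):
    """Pick which (hypothetical) C extensions to build.

    Extensions needing NumPy headers are only included when NumPy was
    found; everything else is built unconditionally.
    """
    extensions = ["example._core"]  # hypothetical: no NumPy needed
    if numpy_include is not None:
        extensions.append("example._numpy_helpers")  # hypothetical: needs NumPy
    else:
        print("NumPy not found; skipping NumPy-dependent extensions")
    return extensions
```

In a real setup.py these names would become setuptools `Extension` objects, with the NumPy include directory passed via `include_dirs`; nothing here forces pip to install NumPy first.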

I submitted as a pull request for documentation and feedback from
anyone. If y'all agree, merge away. Thanks,
Brad

________________________________

You can merge this Pull Request by running

  git pull https://github.com/chapmanb/biopython master

Or view, comment on, or merge it at:

  https://github.com/biopython/biopython/pull/172

Commit Summary

Improve Biopython installation with pip: avoid including numpy as
dependency when automated. Instead explicitly avoid needing numpy
installed to continue
Add helpful comment on pip dependency management

File Changes

M setup.py (38)

Patch Links:

https://github.com/biopython/biopython/pull/172.patch
https://github.com/biopython/biopython/pull/172.diff

From chapmanb at 50mail.com  Fri Mar  1 02:25:42 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 28 Feb 2013 21:25:42 -0500
Subject: [Biopython-dev] Coming soon: BOSC/Broad Hackathon,
	SciPy Bioinformatics, BOSC Codefest
Message-ID: <87lia7ua8p.fsf@fastmail.fm>


Hi all; 
There are some upcoming coding events and conferences of interest to open source
biology programmers:

- BOSC/Broad Interoperability Hackathon -- This is a two-day coding session at
  the Broad Institute in Cambridge, MA on April 7-8 focused on improving tool
  interoperability.
  
  Sign up and details: http://j.mp/XJT6ew
  
- SciPy 2013 -- The Scientific Python conference is June 26-27 in Austin and has
  a Bioinformatics mini-symposium this year. The SciPy community does great work
  on projects like IPython, NumPy, SciPy and scikit-learn, and this is a nice
  opportunity to reach a new set of like-minded programmers and expand the open
  source bioinformatics community.
  
  Bioinformatics mini-symposium: http://j.mp/Z4xxXB
  Abstract details: http://conference.scipy.org/scipy2013/about.php
  
- Codefest at the Bioinformatics Open Source Conference -- This year BOSC is
  taking place in Berlin from July 19-20 and we'll have a two-day coding session
  before the conference. This is the 4th year of Codefests and they've proven to
  be a productive and fun time to work collectively on open source projects.

  Sign up and details: http://www.open-bio.org/wiki/Codefest_2013
  BOSC conference: http://www.open-bio.org/wiki/BOSC_2013

Here are the key dates for the events and abstracts:

March   20, 2013: SciPy abstracts due
April  7-8, 2013: BOSC/Broad Interoperability Hackathon, Cambridge, MA
April   12, 2013: BOSC abstracts due
June 24-29, 2013: SciPy in Austin, TX
July 17-18, 2013: Codefest 2013, Berlin
July 19-20, 2013: BOSC 2013, Berlin

Looking forward to seeing everyone this spring and summer for plenty of fun
science and code,
Brad


From chapmanb at 50mail.com  Fri Mar  1 02:36:34 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 28 Feb 2013 21:36:34 -0500
Subject: [Biopython-dev] [ANN] SciPy2013: Call for abstracts
In-Reply-To: <CAKVJ-_5-a2FiZ_+rCX16MuDRe4-6B3Jk112XAtNvuTjsbaRKfA@mail.gmail.com>
References: <CAOzk5QcVrMu+7chbRggKX8i+bWVAc4cxQ5Nrvd-32J55REXwMg@mail.gmail.com>
	<CAKVJ-_5-a2FiZ_+rCX16MuDRe4-6B3Jk112XAtNvuTjsbaRKfA@mail.gmail.com>
Message-ID: <87ppzjsv65.fsf@fastmail.fm>


Peter;
Thanks for sending this out. I'm helping with the organization of the
SciPy bioinformatics session thanks to Peter's recommendation and wrote
up a little bit about the types of abstracts that would fit well with
the overall theme of SciPy:

http://j.mp/Z4xxXB

This is a great chance to connect with another open source scientific
community so definitely send in an abstract if this is of interest; the
deadline is coming up next month: March 20th. Austin also has awesome
music and barbecue in addition to science and hacking so lots of reasons
to attend,
Brad


> The new bioinformatics mini-symposium this year makes SciPy 2013
> especially interesting.
>
> Peter
>
> ---------- Forwarded message ----------
> From: *Jonathan Rocher*
> Date: Wednesday, February 27, 2013
> Subject: [Numpy-discussion] [ANN] SciPy2013: Call for abstracts
> To: SciPy Users List <scipy-user at scipy.org>, numfocus at googlegroups.com,
> Discussion of Numerical Python <numpy-discussion at scipy.org>
>
>
> [Apologies for cross-posts]
>
> Dear all,
>
> The annual SciPy Conference (Scientific Computing with
> Python)<http://conference.scipy.org/scipy2013/about.php> allows
> participants from academic, commercial, and governmental organizations to
> showcase their latest projects, learn from skilled users and developers,
> and collaborate on code development. *The deadline for abstract submissions
> is March 20th, 2013. *
>
> Submissions are welcome that address general Scientific Computing with
> Python, one of the two special themes for this year's conference (machine
> learning & reproducible science), or the domain-specific
> mini-symposia <http://conference.scipy.org/scipy2013/about.php> held
> during the conference (Meteorology, climatology, and atmospheric and
> oceanic science, Astronomy and astrophysics, Medical imaging,
> Bio-informatics).
>
> Please submit your abstract at the SciPy 2013 website abstract submission
> form <http://conference.scipy.org/scipy2013/speaking_submission.php>.
> Abstracts will be accepted for posters or presentations. Optional papers to
> be published in the conference proceedings will be requested following
> abstract submission. This year the proceedings will be made available prior
> to the conference to help attendees navigate the conference.
>
> We look forward to an exciting and interesting set of talks, posters, and
> discussions and hope to see you at the conference.
> The SciPy 2013 Program Committee Chairs
>
> Matt McCormick, Kitware, Inc.
> Katy Huff, University of Wisconsin-Madison and Argonne National Laboratory
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython


From saketkc at gmail.com  Mon Mar  4 10:59:26 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Mon, 4 Mar 2013 16:29:26 +0530
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAKVJ-_40jy-NS5csf55Q_ds-DdGRpaH18rZxZXdcypNEijpcJA@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
	<CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
	<CAKVJ-_7MRu5tSSZ+S0dsOzpc=ojyC0oOxmg4e+A1Or5phfTz8w@mail.gmail.com>
	<CAEDHeivazvXaaCo2ZFP8sR3BOfv5tgEj6Bf7f_UZ6g_x-v9RZQ@mail.gmail.com>
	<CAEDHeisW+ymrW-Bb5XcgAm1mSPbDWMY7evhvyspVFxzTptW2Pg@mail.gmail.com>
	<CAKVJ-_5vn3p_iiOYTpnMs6W8qF-3DFwgzO2-HqL6=W-mHByaTA@mail.gmail.com>
	<CAEDHeitQA0JQ=vuRKEohgRwkRxLqs+NuV9dAYguhrc1-eYnCsQ@mail.gmail.com>
	<CAKVJ-_40jy-NS5csf55Q_ds-DdGRpaH18rZxZXdcypNEijpcJA@mail.gmail.com>
Message-ID: <CAEDHeitOkC8bPB1HceASHz2=hXZHT3Q1f0n0p77BGnaM8ZCdow@mail.gmail.com>

Hi,

I have updated the code here:
https://github.com/saketkc/biopython/tree/bwa_wrapper

I have added unit tests for the wrapper. And yes, this did help me
fix a lot of minor bugs in my original wrapper.

@Peter: Is this pull request ready?

Thanks

Saket

On 19 February 2013 19:55, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Feb 19, 2013 at 1:15 PM, Saket Choudhary <saketkc at gmail.com> wrote:
>>
>> Thanks Peter.
>>
>> I will add that. Any pointers to what would be a good reference test_aba.py
>> file in Tests/ directory for writing unit tests for this ?
>>
>> I have worked on BDD before but Unit Tests are new for me, so it may take
>> some time.I plan to finish it the coming week once my university
>> examinations are done
>>
>> Thanks
>>
>> Saket
>
> There's a chapter in the Tutorial about our test framework. In this
> case existing command line tool wrappers are the best reference,
> e.g. test_Emboss.py or test_Muscle.py
>
> Also if you want to use doctests and have them included in the
> test suite, add the module to the list in Tests/run_tests.py - however
> this does not handle optional dependencies (other than NumPy).
> Therefore all the application wrapper doctests to date have carefully
> avoided actually invoking the command line - and instead most
> print the string representation instead. This allows us to check
> the example use cases should run (and catches silly errors in
> the examples like a typo in an argument name).
>
> Thanks,
>
> Peter
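To illustrate the doctest style described above, here is a minimal sketch using a hypothetical `EchoCommandline` class (not a Biopython wrapper): the doctest prints the command string instead of running it, so it passes whether or not the tool is installed, while a typo in an argument name would still make it fail with a TypeError.

```python
import doctest


class EchoCommandline(object):
    """Hypothetical toy wrapper that builds, but never runs, a command.

    The doctest only prints the command string:

    >>> cline = EchoCommandline("hello", no_newline=True)
    >>> print(cline)
    echo -n hello
    """

    def __init__(self, text, no_newline=False):
        self.text = text
        self.no_newline = no_newline

    def __str__(self):
        # Assemble the command line from the stored options.
        parts = ["echo"]
        if self.no_newline:
            parts.append("-n")
        parts.append(self.text)
        return " ".join(parts)


if __name__ == "__main__":
    doctest.testmod()
```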


From saketkc at gmail.com  Tue Mar  5 17:26:57 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Tue, 5 Mar 2013 22:56:57 +0530
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
Message-ID: <CAEDHeivFbREH_QgaV2DGSVEe-h7C8F5CW8PM3ovQSiU22mJ3pA@mail.gmail.com>

I had an idea for an online Biopython shell along the lines of the BioRuby shell:
http://bioruby.open-bio.org/wiki/BioRubyOnRails


On 13 February 2013 07:38, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> It would be great to have better support for microarray analysis in Biopython. Something like lumi/limma in R. Perhaps this is an option for the GSoC?
>
> Best,
> -Michiel.
>
> --- On Tue, 2/12/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
>> From: Peter Cock <p.j.a.cock at googlemail.com>
>> Subject: [Biopython-dev] Project ideas for GSoC (or other student projects)
>> To: "Biopython-Dev Mailing List" <biopython-dev at biopython.org>
>> Date: Tuesday, February 12, 2013, 12:51 PM
>> Hello all,
>>
>> Google recently confirmed they will be running Google Summer
>> of Code 2013,
>> and we (Biopython and the other Bio* projects) would hope to
>> be accepted again
>> under the Open Bioinformatics Foundation as in previous
>> years:
>> http://lists.open-bio.org/pipermail/gsoc/2013/000196.html
>>
>> It would be great to start coming up with potential project
>> ideas, both larger
>> pieces of work suitable for GSoC but also smaller tasks for
>> other project
>> students, or 'low hanging fruit' for potential contributors
>> to cut
>> their teeth on.
>>
>> See also http://biopython.org/wiki/Active_projects
>> and the ideas list there.
>>
>> Regards,
>>
>> Peter


From p.j.a.cock at googlemail.com  Fri Mar  8 16:08:46 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 8 Mar 2013 16:08:46 +0000
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAEDHeivFbREH_QgaV2DGSVEe-h7C8F5CW8PM3ovQSiU22mJ3pA@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAEDHeivFbREH_QgaV2DGSVEe-h7C8F5CW8PM3ovQSiU22mJ3pA@mail.gmail.com>
Message-ID: <CAKVJ-_40-y0QAY1hr363tffnzMDJObToa_QSiw=_pqhkQrHuPA@mail.gmail.com>

On Tue, Mar 5, 2013 at 5:26 PM, Saket Choudhary <saketkc at gmail.com> wrote:
> I had this idea of an online biopython shell on the lines of  bioruby shell :
> http://bioruby.open-bio.org/wiki/BioRubyOnRails
>

That screenshot makes me think of http://ipython.org/ - is that similar?

Peter


From redmine at redmine.open-bio.org  Fri Mar  8 16:49:48 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Fri, 8 Mar 2013 16:49:48 +0000
Subject: [Biopython-dev] [Biopython - Bug #3422] (New) Missing
Message-ID: <redmine.issue-3422.20130308164948@redmine.open-bio.org>


Issue #3422 has been reported by Jared Sampson.

----------------------------------------
Bug #3422: Missing 
https://redmine.open-bio.org/issues/3422

Author: Jared Sampson
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: http://www.ncbi.nlm.nih.gov/corehtml/query/DTD/bookdoc_130101.dtd


When using Entrez.efetch to retrieve an XML file, I get the following warning about a missing DTD: bookdoc_130101.dtd

===

/path/to/my/virtualenv/lib/python2.7/site-packages/Bio/Entrez/Parser.py:522: UserWarning: Unable to load DTD file bookdoc_130101.dtd.

Bio.Entrez uses NCBI's DTD files to parse XML files returned by NCBI Entrez.
Though most of NCBI's DTD files are included in the Biopython distribution,
sometimes you may find that a particular DTD file is missing. While we can
access the DTD file through the internet, the parser is much faster if the
required DTD files are available locally.

For this purpose, please download bookdoc_130101.dtd from

http://www.ncbi.nlm.nih.gov/corehtml/query/DTD/bookdoc_130101.dtd

and save it either in directory

/path/to/my/virtualenv/lib/python2.7/site-packages/Bio/Entrez/DTDs

or in directory

/Users/me/.biopython/Bio/Entrez/DTDs

in order for Bio.Entrez to find it.

Alternatively, you can save bookdoc_130101.dtd in the directory
Bio/Entrez/DTDs in the Biopython distribution, and reinstall Biopython.

Please also inform the Biopython developers about this missing DTD, by
reporting a bug on http://bugzilla.open-bio.org/ or sign up to our mailing
list and emailing us, so that we can include it with the next release of
Biopython.

Proceeding to access the DTD file through the internet...

  warnings.warn(message)

===
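As the warning suggests, the user-side workaround is to save the DTD into one of the listed directories. A minimal Python 3 sketch that downloads a DTD into the per-user `~/.biopython/Bio/Entrez/DTDs` directory; the helper names are hypothetical, and the base URL is copied from the warning and may no longer be current.

```python
import os
from urllib.request import urlopen

# Base URL copied from the warning above (NCBI may have moved it since).
DTD_BASE_URL = "http://www.ncbi.nlm.nih.gov/corehtml/query/DTD/"


def user_dtd_dir(home=None):
    """Return the per-user DTD directory named in the warning message."""
    if home is None:
        home = os.path.expanduser("~")
    return os.path.join(home, ".biopython", "Bio", "Entrez", "DTDs")


def fetch_dtd(filename, home=None):
    """Download one DTD into the per-user directory and return its path."""
    target_dir = user_dtd_dir(home)
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, filename)
    with urlopen(DTD_BASE_URL + filename) as response:
        data = response.read()
    with open(target, "wb") as handle:
        handle.write(data)
    return target
```

For this report that would be `fetch_dtd("bookdoc_130101.dtd")`; the lasting fix is still to ship the DTD with the next Biopython release.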

Also, the bugzilla.open-bio.org URL mentioned comes up empty.

Thanks, 
Jared Sampson


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From saketkc at gmail.com  Fri Mar  8 18:30:03 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Sat, 9 Mar 2013 00:00:03 +0530
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAKVJ-_40-y0QAY1hr363tffnzMDJObToa_QSiw=_pqhkQrHuPA@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAEDHeivFbREH_QgaV2DGSVEe-h7C8F5CW8PM3ovQSiU22mJ3pA@mail.gmail.com>
	<CAKVJ-_40-y0QAY1hr363tffnzMDJObToa_QSiw=_pqhkQrHuPA@mail.gmail.com>
Message-ID: <CAEDHeiuBKPo8TppiNccWg4QK7saovmFiDaK1=3YKTa2q5YSa_Q@mail.gmail.com>

It is essentially an online RoR-based application that lets you
try BioRuby in your browser without needing a native BioRuby
install. I was thinking of a Django/Flask application that would
essentially be a playground for trying out Biopython.


Saket

On 08/03/2013, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Mar 5, 2013 at 5:26 PM, Saket Choudhary <saketkc at gmail.com> wrote:
>> I had this idea of an online biopython shell on the lines of  bioruby
>> shell :
>> http://bioruby.open-bio.org/wiki/BioRubyOnRails
>>
>
> That screenshot makes me think of http://ipython.org/ - is that similar?
>
> Peter
>


From chapmanb at 50mail.com  Sat Mar  9 16:06:34 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sat, 09 Mar 2013 11:06:34 -0500
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAEDHeiuBKPo8TppiNccWg4QK7saovmFiDaK1=3YKTa2q5YSa_Q@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAEDHeivFbREH_QgaV2DGSVEe-h7C8F5CW8PM3ovQSiU22mJ3pA@mail.gmail.com>
	<CAKVJ-_40-y0QAY1hr363tffnzMDJObToa_QSiw=_pqhkQrHuPA@mail.gmail.com>
	<CAEDHeiuBKPo8TppiNccWg4QK7saovmFiDaK1=3YKTa2q5YSa_Q@mail.gmail.com>
Message-ID: <87wqtgeewl.fsf@fastmail.fm>


Saket and Peter;
What you're describing is what IPython provides: a web-based way to edit
and interact with Python code. There are some projects that build on top
of it to provide more of a playground environment like you're describing:

http://continuum.io/wakari.html
https://github.com/Exhibitionist/Exhibitionist

Hope this helps,
Brad


> It is essentially an online RoR based application that allows you to
> try bioruby through your browser without the need of a bioruby native
> install . I was thinking of a django/flask application that would
> essentially be  a  playground for trying out biopython
>
>
> Saket
>
> On 08/03/2013, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> On Tue, Mar 5, 2013 at 5:26 PM, Saket Choudhary <saketkc at gmail.com> wrote:
>>> I had this idea of an online biopython shell on the lines of  bioruby
>>> shell :
>>> http://bioruby.open-bio.org/wiki/BioRubyOnRails
>>>
>>
>> That screenshot makes me think of http://ipython.org/ - is that similar?
>>
>> Peter
>>


From xbello at gmail.com  Tue Mar 12 09:36:35 2013
From: xbello at gmail.com (Xabier Bello)
Date: Tue, 12 Mar 2013 10:36:35 +0100
Subject: [Biopython-dev] Consumer of "KW" in embl format
Message-ID: <CAAGUp7UYfnSTga_gV=HP_yVjJXB1N=AazjDbr8_17pBUaJ_vAg@mail.gmail.com>

Hi:

I don't know if this is the right way to do this. The code:

from Bio import SeqIO

records = SeqIO.parse(open("MyFile.embl", "r"), "embl")
for record in records:
    print(record.annotations["keywords"])

Doesn't work

I've added to Bio/GenBank/Scanner.py, in _feed_header_lines():

elif line_type == 'KW':
    consumer.keywords(data.rstrip(";"))

And now it seems to parse the keyword lines.

Regards.


From p.j.a.cock at googlemail.com  Tue Mar 12 09:54:51 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 12 Mar 2013 09:54:51 +0000
Subject: [Biopython-dev] Consumer of "KW" in embl format
In-Reply-To: <CAAGUp7UYfnSTga_gV=HP_yVjJXB1N=AazjDbr8_17pBUaJ_vAg@mail.gmail.com>
References: <CAAGUp7UYfnSTga_gV=HP_yVjJXB1N=AazjDbr8_17pBUaJ_vAg@mail.gmail.com>
Message-ID: <CAKVJ-_6nuW=sduSdf8zM-qtLkOWEO-b=XwF8=dqgrs1YCyrHnQ@mail.gmail.com>

On Tue, Mar 12, 2013 at 9:36 AM, Xabier Bello <xbello at gmail.com> wrote:
> Hi:
>
> I don't know if this is the right way to do this. The code:
>
> records = SeqIO.parse(open("MyFile.embl", "r"), "embl")
> for record in records:
>     print record.annotations["keywords"]
>
> Doesn't work
>
> I've added to Bio/GenBank/Scanner.py, in _feed_header_lines():
>
> elif line_type == 'KW':
>     consumer.keywords(data.rstrip(";"))
>
> And now it seems to parse the keyword lines.
>
> Regards.

Good idea, although it needs a little more generalisation for handling
multiple keywords - a list of strings seems sensible here. Quoting
ftp://ftp.ebi.ac.uk/pub/databases/embl/doc/usrman.txt

<quote>
3.4.6  The KW Line
The KW (KeyWord) lines provide information which can be used to generate
cross-reference indexes of the sequence entries based on functional,
structural, or other categories deemed important.
The format for a KW line is:
     KW   keyword[; keyword ...].
More than one keyword may be listed on each KW line; the keywords are
separated by semicolons, and the last keyword is followed by a full
stop. Keywords may consist of more than one word, and they may contain
embedded blanks and stops. A keyword is never split between lines.
An example of a keyword line is:
     KW   beta-glucosidase.
The keywords are ordered alphabetically; the ordering implies no hierarchy
of importance or function.  If an entry has no keywords assigned to it,
it will contain a single KW line like this:
     KW   .
</quote>

Likewise the GenBank parser should support the KEYWORDS line
too - and the writers should then output the keywords again.

Is this something you'd like to work on, or should I do it?
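Following the KW format quoted above, a minimal sketch of turning the data from one or more KW lines into a list of strings. The `parse_kw_lines` helper is hypothetical, not the code later committed to Biopython, and it assumes the "KW   " prefix has already been stripped from each line.

```python
def parse_kw_lines(kw_data):
    """Return the keywords from the data part of one or more KW lines.

    Keywords are separated by semicolons, the last one is followed by a
    full stop, and "KW   ." means the entry has no keywords.
    """
    text = " ".join(line.strip() for line in kw_data)
    if text.endswith("."):
        # Keywords may contain embedded stops, so drop only the final one.
        text = text[:-1]
    return [keyword.strip() for keyword in text.split(";") if keyword.strip()]
```

Since a keyword is never split between lines, joining the lines first and then splitting on semicolons handles multiple KW lines the same way as a single one.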

(If you are interested in getting involved in Biopython development
this seems like a nice project to start with - not too complicated, but
large enough to make creating a fork on GitHub and your own
enhancement branch a good idea.)

Thanks,

Peter


From p.j.a.cock at googlemail.com  Tue Mar 12 10:02:15 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 12 Mar 2013 10:02:15 +0000
Subject: [Biopython-dev] Consumer of "KW" in embl format
In-Reply-To: <CAKVJ-_6nuW=sduSdf8zM-qtLkOWEO-b=XwF8=dqgrs1YCyrHnQ@mail.gmail.com>
References: <CAAGUp7UYfnSTga_gV=HP_yVjJXB1N=AazjDbr8_17pBUaJ_vAg@mail.gmail.com>
	<CAKVJ-_6nuW=sduSdf8zM-qtLkOWEO-b=XwF8=dqgrs1YCyrHnQ@mail.gmail.com>
Message-ID: <CAKVJ-_7dbgJP8Hxaqp1q_5nazUytsohVvSwpvUOFH9DnDFjF5A@mail.gmail.com>

On Tue, Mar 12, 2013 at 9:54 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Mar 12, 2013 at 9:36 AM, Xabier Bello <xbello at gmail.com> wrote:
>> Hi:
>>
>> I don't know if this is the right way to do this. The code:
>>
>> records = SeqIO.parse(open("MyFile.embl", "r"), "embl")
>> for record in records:
>>     print record.annotations["keywords"]
>>
>> Doesn't work
>>
>> I've added to Bio/GenBank/Scanner.py, in _feed_header_lines():
>>
>> elif line_type == 'KW':
>>     consumer.keywords(data.rstrip(";"))
>>
>> And now it seems to parse the keyword lines.
>>
>> Regards.
>
> Good idea, although it needs a little more generalisation for handling
> multiple keywords - a list of strings seems sensible here. Quoting
> ftp://ftp.ebi.ac.uk/pub/databases/embl/doc/usrman.txt
>
> <quote>
> 3.4.6  The KW Line
> The KW (KeyWord) lines provide information which can be used to generate
> cross-reference indexes of the sequence entries based on functional,
> structural, or other categories deemed important.
> The format for a KW line is:
>      KW   keyword[; keyword ...].
> More than one keyword may be listed on each KW line; the keywords are
> separated by semicolons, and the last keyword is followed by a full
> stop. Keywords may consist of more than one word, and they may contain
> embedded blanks and stops. A keyword is never split between lines.
> An example of a keyword line is:
>      KW   beta-glucosidase.
> The keywords are ordered alphabetically; the ordering implies no hierarchy
> of importance or function.  If an entry has no keywords assigned to it,
> it will contain a single KW line like this:
>      KW   .
> </quote>
>
> Likewise the GenBank parser should support the KEYWORDS line
> too - and then writing the keywords out again too.
>
> Is this something you'd like to work on, or should I do it?

To clarify - Biopython should already be reading and writing any
KEYWORDS line in GenBank files - the same data structure should
be used for EMBL files (your suggestion looks good, but an explicit
unit test covering single and multiple keywords would be ideal),
and then the EMBL writer should be updated to write them out, i.e. code
added to Bio/SeqIO/InsdcIO.py

Peter


From p.j.a.cock at googlemail.com  Tue Mar 12 10:58:39 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 12 Mar 2013 10:58:39 +0000
Subject: [Biopython-dev] Consumer of "KW" in embl format
In-Reply-To: <CAAGUp7Vvg+dizLYGC+T+M+iGkQk2Z0S+r9Y2s47irgDj87S3Vg@mail.gmail.com>
References: <CAAGUp7UYfnSTga_gV=HP_yVjJXB1N=AazjDbr8_17pBUaJ_vAg@mail.gmail.com>
	<CAKVJ-_6nuW=sduSdf8zM-qtLkOWEO-b=XwF8=dqgrs1YCyrHnQ@mail.gmail.com>
	<CAKVJ-_7dbgJP8Hxaqp1q_5nazUytsohVvSwpvUOFH9DnDFjF5A@mail.gmail.com>
	<CAAGUp7Vvg+dizLYGC+T+M+iGkQk2Z0S+r9Y2s47irgDj87S3Vg@mail.gmail.com>
Message-ID: <CAKVJ-_7VNEJNvpYxQV+FP9kvrt2C_tPYfY3YAeaLsPxqmgR+LA@mail.gmail.com>

On Tue, Mar 12, 2013 at 10:12 AM, Xabier Bello <xbello at gmail.com> wrote:
> I think I'm not that close to the Biopython code.
>
> I found a problem (I needed to read the Keywords), and solved it quick and
> dirty. In fact, it doesn't read multiline KW. I'm not sure I could implement
> that in a fair amount of time.
>
> Regards.

No problem - I've committed your fix, a basic test, and extended this for
multiple KW lines. As discussed I've thanked you in the NEWS file too.

https://github.com/biopython/biopython/commit/fc036dcdac22252a366647823a0c7c317c303313
https://github.com/biopython/biopython/commit/606ea9360d262d21c3e01eda66c4cf9118880d46
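For anyone picking up related work, here is a minimal sketch of how the KW lines described in the EMBL specification quoted above could be turned into a list of keywords. It assumes the KW lines of one entry are already available as strings. `parse_kw_lines` is a hypothetical helper for illustration only, not the code in the commits above.

```python
def parse_kw_lines(lines):
    """Collect the keywords from the KW lines of one EMBL entry.

    Hypothetical helper (not Biopython's own parser). Assumes each line
    starts with "KW" plus three spaces, so content begins at column 6.
    """
    # Keywords are never split between lines, so joining with a space is safe.
    text = " ".join(line[5:].strip() for line in lines if line.startswith("KW"))
    text = text.strip()
    if text.endswith("."):
        text = text[:-1]  # drop the full stop that terminates the list
    # An entry with no keywords ("KW   .") leaves an empty string here.
    return [kw.strip() for kw in text.split(";") if kw.strip()]


print(parse_kw_lines(["KW   beta-glucosidase."]))  # ['beta-glucosidase']
print(parse_kw_lines(["KW   ."]))  # []
print(parse_kw_lines([
    "KW   antifreeze protein homology; cold-regulated gene; cor6.6 gene;",
    "KW   KIN1 homology.",
]))
```

Only the final full stop is removed, so keywords with embedded stops such as "cor6.6 gene" survive intact.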

Updating the EMBL writer in Bio/SeqIO/InsdcIO.py should be a nice
small task for any volunteer wanting to make a first contribution...

(Potential Google Summer of Code students - Hint hint ;) )

Thank you Xabier,

Peter


From p.j.a.cock at googlemail.com  Tue Mar 12 14:40:16 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 12 Mar 2013 14:40:16 +0000
Subject: [Biopython-dev] Consumer of "KW" in embl format
In-Reply-To: <CAAGUp7V6J=SWRMR6c-G9k3D=_9eQ6mwvmUKxLAO45nOxcQu2Kg@mail.gmail.com>
References: <CAAGUp7UYfnSTga_gV=HP_yVjJXB1N=AazjDbr8_17pBUaJ_vAg@mail.gmail.com>
	<CAKVJ-_6nuW=sduSdf8zM-qtLkOWEO-b=XwF8=dqgrs1YCyrHnQ@mail.gmail.com>
	<CAKVJ-_7dbgJP8Hxaqp1q_5nazUytsohVvSwpvUOFH9DnDFjF5A@mail.gmail.com>
	<CAAGUp7Vvg+dizLYGC+T+M+iGkQk2Z0S+r9Y2s47irgDj87S3Vg@mail.gmail.com>
	<CAKVJ-_7VNEJNvpYxQV+FP9kvrt2C_tPYfY3YAeaLsPxqmgR+LA@mail.gmail.com>
	<CAAGUp7V6J=SWRMR6c-G9k3D=_9eQ6mwvmUKxLAO45nOxcQu2Kg@mail.gmail.com>
Message-ID: <CAKVJ-_43m0H6ZWVjdCiZgdW=J+4=1dCioKxPiRPZwdVjc1j4Jg@mail.gmail.com>

On Tue, Mar 12, 2013 at 11:35 AM, Xabier Bello <xbello at gmail.com> wrote:
> Let's try it:
>
> Line 997:
> def _write_keywords(self, record):
>     #Put the keywords right after DE line.
>     self._write_multi_line("KW", "%s." % "; ".join(
>         record.annotations["keywords"]))
>     self.handle.write("XX\n")

Looks good - although there is a potential problem here with long keywords:
this does not prevent a single keyword from being split over two KW lines
(which the EMBL specification forbids). This is a corner case though...

> Line 1070:
> if "keywords" in record.annotations:
>     self._write_keywords(record)
>
> Note to self: learn to make diff patches and forks on github.

Good plan :)

Meanwhile, I committed that change:
https://github.com/biopython/biopython/commit/41470eac55a665d1cb1c7e73ebfd3c1df98af5ad

I added a little more testing, from which I think we may need to
do some work with some of the other EMBL fields like dbxrefs:
https://github.com/biopython/biopython/commit/07639dde32083f4f024616292a5c736e85770a4e

Thanks,

Peter


From p.j.a.cock at googlemail.com  Tue Mar 12 15:13:23 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 12 Mar 2013 15:13:23 +0000
Subject: [Biopython-dev] Consumer of "KW" in embl format
In-Reply-To: <CAKVJ-_43m0H6ZWVjdCiZgdW=J+4=1dCioKxPiRPZwdVjc1j4Jg@mail.gmail.com>
References: <CAAGUp7UYfnSTga_gV=HP_yVjJXB1N=AazjDbr8_17pBUaJ_vAg@mail.gmail.com>
	<CAKVJ-_6nuW=sduSdf8zM-qtLkOWEO-b=XwF8=dqgrs1YCyrHnQ@mail.gmail.com>
	<CAKVJ-_7dbgJP8Hxaqp1q_5nazUytsohVvSwpvUOFH9DnDFjF5A@mail.gmail.com>
	<CAAGUp7Vvg+dizLYGC+T+M+iGkQk2Z0S+r9Y2s47irgDj87S3Vg@mail.gmail.com>
	<CAKVJ-_7VNEJNvpYxQV+FP9kvrt2C_tPYfY3YAeaLsPxqmgR+LA@mail.gmail.com>
	<CAAGUp7V6J=SWRMR6c-G9k3D=_9eQ6mwvmUKxLAO45nOxcQu2Kg@mail.gmail.com>
	<CAKVJ-_43m0H6ZWVjdCiZgdW=J+4=1dCioKxPiRPZwdVjc1j4Jg@mail.gmail.com>
Message-ID: <CAKVJ-_6T=4SNWw2=u=cGkV5Y68bhYqy0s9_P_Dg5AAH2B_dy7w@mail.gmail.com>

On Tue, Mar 12, 2013 at 2:40 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Mar 12, 2013 at 11:35 AM, Xabier Bello <xbello at gmail.com> wrote:
>> Let's try it:
>>
>> Line 997:
>> def _write_keywords(self, record):
>>     #Put the keywords right after DE line.
>>     self._write_multi_line("KW", "%s." % "; ".join(
>>         record.annotations["keywords"]))
>>     self.handle.write("XX\n")
>
> Looks good - although there is a potential problem here with long keywords:
> this does not prevent a single keyword from being split over two KW lines
> (which the EMBL specification forbids). This is a corner case though...

OK, not such a rare case:

$ python test_SeqIO_features.py
...
======================================================================
ERROR: test_cor6 (__main__.TestWriteRead)
Write and read back cor6_6.gb
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_SeqIO_features.py", line 1105, in test_cor6
    write_read(os.path.join("GenBank", "cor6_6.gb"), "gb")
  File "test_SeqIO_features.py", line 35, in write_read
    compare_records(gb_records, gb_records2)
  File "test_SeqIO_features.py", line 110, in compare_records
    if not compare_record(old,new,expect_minor_diffs):
  File "test_SeqIO_features.py", line 101, in compare_record
    % (key, old.annotations[key], new.annotations[key]))
ValueError: Annotation mis-match for keywords:
['antifreeze protein homology', 'cold-regulated gene', 'cor6.6 gene',
'KIN1 homology']
['antifreeze protein homology', 'cold-regulated gene', 'cor6.6 gene',
'KIN1', 'homology']

----------------------------------------------------------------------

I'll fix this later today...
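One way to avoid this is to pack keywords greedily onto KW lines and start a new line only between keywords, so that no keyword is ever split. The sketch below does that. It is not necessarily the fix that went into Biopython. `format_kw_lines` and its `width` parameter are illustrative assumptions, and the default of 80 characters assumes the usual EMBL maximum line length.

```python
def format_kw_lines(keywords, width=80):
    """Build EMBL-style KW lines without splitting any keyword (sketch).

    Hypothetical helper: ``width`` is the assumed maximum line length.
    A single keyword longer than the available space still gets a line
    of its own rather than being broken up.
    """
    prefix = "KW   "
    if not keywords:
        return [prefix + "."]  # entry with no keywords
    lines = []
    current = ""
    for i, keyword in enumerate(keywords):
        # The last keyword is followed by a full stop, the others by ";".
        piece = keyword + ("." if i == len(keywords) - 1 else ";")
        candidate = piece if not current else current + " " + piece
        if current and len(prefix) + len(candidate) > width:
            lines.append(prefix + current)  # line is full: start a new one
            current = piece
        else:
            current = candidate
    lines.append(prefix + current)
    return lines


keywords = ["antifreeze protein homology", "cold-regulated gene",
            "cor6.6 gene", "KIN1 homology"]
for line in format_kw_lines(keywords):
    print(line)
# KW   antifreeze protein homology; cold-regulated gene; cor6.6 gene;
# KW   KIN1 homology.
```

With the cor6_6.gb keywords, "KIN1 homology." moves to the second line as a whole. A reader that joins KW lines and splits on semicolons therefore gets it back as one keyword.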

Peter


From clements at galaxyproject.org  Tue Mar 12 22:01:49 2013
From: clements at galaxyproject.org (Dave Clements)
Date: Tue, 12 Mar 2013 15:01:49 -0700
Subject: [Biopython-dev] 2013 Galaxy Community Conference (GCC2013),
	30 June - 2 July, Oslo
Message-ID: <CA+He-X_a6gvMXcqiXPLHmTz0j==rGYzHDjZYeDBLqcPMdEf5KQ@mail.gmail.com>

Hello all,

We are pleased to announce that early registration
<http://wiki.galaxyproject.org/Events/GCC2013/Register> and paper and poster
abstract submission <http://wiki.galaxyproject.org/Events/GCC2013/Abstracts>
are now open for the 2013 Galaxy Community Conference (GCC2013)
<http://wiki.galaxyproject.org/Events/GCC2013>. GCC2013 will be held from
30 June through 2 July in Oslo, Norway, at the University of Oslo
<http://uio.no/>.

GCC2013 <http://wiki.galaxyproject.org/Events/GCC2013> is an opportunity to
participate in two full days of presentations, discussions, poster
sessions, keynotes, lightning talks and breakouts, *all about
high-throughput biology and the tools that support it*. The conference also
includes a Training Day <http://wiki.galaxyproject.org/Events/GCC2013/TrainingDay>
for the second year in a row, this year with more in-depth topic coverage,
more concurrent sessions, and more topics.

If you are a biologist or bioinformatician performing or enabling
high-throughput biological research, then please consider attending.
GCC2013 is aimed at:

   - Bioinformatics tool developers and data providers
   - Workflow developers and power bioinformatics users
   - Sequencing and Bioinformatics core staff
   - Data archival and analysis reproducibility specialists

*Early registration* <http://wiki.galaxyproject.org/Events/GCC2013/Register>
*saves up to 75% off regular registration costs,* and is very affordable,
with combined registration (Training Day
<http://wiki.galaxyproject.org/Events/GCC2013/TrainingDay> + main meeting)
starting at ~ ?95 for post-docs and students. Registering early also assures
you a spot in the Training Day workshops you want to attend. Once a Training
Day session becomes full, it will be closed to new registrations. Early
registration closes 24 May.

*Abstract submission* <http://wiki.galaxyproject.org/Events/GCC2013/Abstracts>
for oral presentations closes 12 April, and for posters on 3 May. Please
consider presenting your work. If you are working with big biological data,
then the people at this meeting want to hear about your work.

Thanks, and hope to see you in Oslo!

The GCC2013 Organizing Committee
<http://wiki.galaxyproject.org/Events/GCC2013/Organizers>

PS: And please help get the word out
<http://wiki.galaxyproject.org/Events/GCC2013/Promotion>!

-- 
http://galaxyproject.org/GCC2013
http://galaxyproject.org/
http://getgalaxy.org/
http://usegalaxy.org/
http://wiki.galaxyproject.org/


From anaryin at gmail.com  Wed Mar 13 11:09:29 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Wed, 13 Mar 2013 12:09:29 +0100
Subject: [Biopython-dev] [Biopython] Updating GSOC page?
In-Reply-To: <CAJ9sUYNM5BwhyFkvM5nPJwJxDM6m4rmNvZTqED0Cnfhc_VCcwQ@mail.gmail.com>
References: <CAJ9sUYOgNRxES1GkJdEmDTpeo=XbhJvE__eYKEnQ1-OsLWg0AA@mail.gmail.com>
	<CAKVJ-_4Q4DRgXEOc9S1-0kB=U3YPaA=T30RzzHp-s=ZE_mizzg@mail.gmail.com>
	<CAJ9sUYNM5BwhyFkvM5nPJwJxDM6m4rmNvZTqED0Cnfhc_VCcwQ@mail.gmail.com>
Message-ID: <CAJ9sUYNXSmTZ6izM=fH8KXoMLJgwHauP8vgvTc-OR=bPH4jBeQ@mail.gmail.com>

Hello all,

I updated the GSOC page on the wiki to be more organized:
http://biopython.org/wiki/GSOC

If no one opposes, I'll replace the current page
(here<http://biopython.org/wiki/Google_Summer_of_Code>)
with it, just in time for GSOC 2013.

Best,

João

PS. Sorry for the spam, but I posted this 5 days ago on the non-dev list
and got no answers, so...


2013/3/8 João Rodrigues <anaryin at gmail.com>

> Small update: http://biopython.org/wiki/GSOC
>
> If OK, we can just link the normal page to this one. I kept it separate
> just in case.
>
>
> 2013/3/4 Peter Cock <p.j.a.cock at googlemail.com>
>
>> On Sun, Mar 3, 2013 at 11:07 PM, João Rodrigues <anaryin at gmail.com>
>> wrote:
>> > Hello all,
>> >
>> > Does anyone oppose a refresh of our GSOC
>> > page <http://biopython.org/wiki/Google_Summer_of_Code> based on the
>> > BioRuby page <http://bioruby.open-bio.org/wiki/Google_Summer_of_Code>?
>> > It could use a facelift before the new round of projects/students
>> > comes in.
>> >
>> > Best,
>> >
>> > João
>>
>> A good idea - see also the GSoC discussions on the biopython-dev
>> list about potential project ideas.
>>
>> Thanks,
>>
>> Peter
>>
>
>


From mikael.trellet at gmail.com  Wed Mar 13 11:17:17 2013
From: mikael.trellet at gmail.com (Mikael Trellet)
Date: Wed, 13 Mar 2013 12:17:17 +0100
Subject: [Biopython-dev] [Biopython] Updating GSOC page?
In-Reply-To: <CAJ9sUYNXSmTZ6izM=fH8KXoMLJgwHauP8vgvTc-OR=bPH4jBeQ@mail.gmail.com>
References: <CAJ9sUYOgNRxES1GkJdEmDTpeo=XbhJvE__eYKEnQ1-OsLWg0AA@mail.gmail.com>
	<CAKVJ-_4Q4DRgXEOc9S1-0kB=U3YPaA=T30RzzHp-s=ZE_mizzg@mail.gmail.com>
	<CAJ9sUYNM5BwhyFkvM5nPJwJxDM6m4rmNvZTqED0Cnfhc_VCcwQ@mail.gmail.com>
	<CAJ9sUYNXSmTZ6izM=fH8KXoMLJgwHauP8vgvTc-OR=bPH4jBeQ@mail.gmail.com>
Message-ID: <CAMHOhY1VqM0vyJ1NtEscnbBxv3s=gsdKeEJhWCgRWtUyFCNU2Q@mail.gmail.com>

It's well formatted and looks nice to me; the improvement over the former
one is significant, so I would agree to update the page.

Good work ;)

Mikael


On Wed, Mar 13, 2013 at 12:09 PM, João Rodrigues <anaryin at gmail.com> wrote:

> Hello all,
>
> I updated the GSOC page on the wiki to be more organized:
> http://biopython.org/wiki/GSOC
>
> If no one opposes, I'll replace the current page
> (here<http://biopython.org/wiki/Google_Summer_of_Code>)
> with it, just in time for GSOC 2013.
>
> Best,
>
> João
>
> PS. Sorry for the spam, but I posted this 5 days ago on the non-dev list
> and got no answers, so...
>
>
> 2013/3/8 João Rodrigues <anaryin at gmail.com>
>
> > Small update: http://biopython.org/wiki/GSOC
> >
> > If OK, we can just link the normal page to this one. I kept it separate
> > just in case.
> >
> >
> > 2013/3/4 Peter Cock <p.j.a.cock at googlemail.com>
> >
> >> On Sun, Mar 3, 2013 at 11:07 PM, João Rodrigues <anaryin at gmail.com>
> >> wrote:
> >> > Hello all,
> >> >
> >> > Does anyone oppose a refresh of our GSOC
> >> > page <http://biopython.org/wiki/Google_Summer_of_Code> based on the
> >> > BioRuby page <http://bioruby.open-bio.org/wiki/Google_Summer_of_Code>?
> >> > It could use a facelift before the new round of projects/students
> >> > comes in.
> >> >
> >> > Best,
> >> >
> >> > João
> >>
> >> A good idea - see also the GSoC discussions on the biopython-dev
> >> list about potential project ideas.
> >>
> >> Thanks,
> >>
> >> Peter
> >>
> >
> >
>
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>


-- 
--------------------------------------------
Mikael TRELLET,
- Groupe VENISE, CNRS LIMSI
91403 Orsay CEDEX
- LBT/IBPC,
75005 Paris
France
+33650607172


From p.j.a.cock at googlemail.com  Wed Mar 13 12:04:28 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 13 Mar 2013 12:04:28 +0000
Subject: [Biopython-dev] [Biopython] Updating GSOC page?
In-Reply-To: <CAJ9sUYNXSmTZ6izM=fH8KXoMLJgwHauP8vgvTc-OR=bPH4jBeQ@mail.gmail.com>
References: <CAJ9sUYOgNRxES1GkJdEmDTpeo=XbhJvE__eYKEnQ1-OsLWg0AA@mail.gmail.com>
	<CAKVJ-_4Q4DRgXEOc9S1-0kB=U3YPaA=T30RzzHp-s=ZE_mizzg@mail.gmail.com>
	<CAJ9sUYNM5BwhyFkvM5nPJwJxDM6m4rmNvZTqED0Cnfhc_VCcwQ@mail.gmail.com>
	<CAJ9sUYNXSmTZ6izM=fH8KXoMLJgwHauP8vgvTc-OR=bPH4jBeQ@mail.gmail.com>
Message-ID: <CAKVJ-_7hGEk14sumE25JyJBPsh-OeA-jMUaV7edUbBtLVZVnUw@mail.gmail.com>

On Wed, Mar 13, 2013 at 11:09 AM, João Rodrigues <anaryin at gmail.com> wrote:
> Hello all,
>
> I updated the GSOC page on the wiki to be more organized:
> http://biopython.org/wiki/GSOC
>
> If no one opposes, I'll replace the current page
> (here<http://biopython.org/wiki/Google_Summer_of_Code>)
> with it, just in time for GSOC 2013.
>
> Best,
>
> João

Sounds sensible, and you can set a redirect on GSOC to
Google_Summer_of_Code by replacing the content with:

#REDIRECT [[link]]

Peter


From anaryin at gmail.com  Wed Mar 13 13:22:23 2013
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Wed, 13 Mar 2013 14:22:23 +0100
Subject: [Biopython-dev] [Biopython] Updating GSOC page?
In-Reply-To: <CAKVJ-_7hGEk14sumE25JyJBPsh-OeA-jMUaV7edUbBtLVZVnUw@mail.gmail.com>
References: <CAJ9sUYOgNRxES1GkJdEmDTpeo=XbhJvE__eYKEnQ1-OsLWg0AA@mail.gmail.com>
	<CAKVJ-_4Q4DRgXEOc9S1-0kB=U3YPaA=T30RzzHp-s=ZE_mizzg@mail.gmail.com>
	<CAJ9sUYNM5BwhyFkvM5nPJwJxDM6m4rmNvZTqED0Cnfhc_VCcwQ@mail.gmail.com>
	<CAJ9sUYNXSmTZ6izM=fH8KXoMLJgwHauP8vgvTc-OR=bPH4jBeQ@mail.gmail.com>
	<CAKVJ-_7hGEk14sumE25JyJBPsh-OeA-jMUaV7edUbBtLVZVnUw@mail.gmail.com>
Message-ID: <CAJ9sUYOjtScLgizcD9j=RWhdmUjRA=UPaTz72Kvc2WscyuGecw@mail.gmail.com>

Done, thanks.

http://biopython.org/wiki/Google_Summer_of_Code
http://biopython.org/wiki/GSOC


2013/3/13 Peter Cock <p.j.a.cock at googlemail.com>

> On Wed, Mar 13, 2013 at 11:09 AM, João Rodrigues <anaryin at gmail.com>
> wrote:
> > Hello all,
> >
> > I updated the GSOC page on the wiki to be more organized:
> > http://biopython.org/wiki/GSOC
> >
> > If no one opposes, I'll replace the current page
> > (here<http://biopython.org/wiki/Google_Summer_of_Code>)
> > with it, just in time for GSOC 2013.
> >
> > Best,
> >
> > João
>
> Sounds sensible, and you can set a redirect on GSOC to
> Google_Summer_of_Code by replacing the content with:
>
> #REDIRECT [[link]]
>
> Peter
>


From mjldehoon at yahoo.com  Wed Mar 13 14:44:55 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 13 Mar 2013 07:44:55 -0700 (PDT)
Subject: [Biopython-dev] New contributor
In-Reply-To: <CANKZbvM0tveKZL1dm8RRphh4gyPiDcuHpbLkm1k37TKvqaibpA@mail.gmail.com>
Message-ID: <1363185895.3324.YahooMailClassic@web164005.mail.gq1.yahoo.com>

Hi Andrea,

Welcome to Biopython!
It's great that you want to contribute.
Writing & finishing some unit tests sounds like a good idea, and of course bug fixing is always welcome.
Other options are to look at orphan modules in Biopython (modules without active maintainers, without documentation, or without unit tests).
Once you decide what specifically you want to work on, it's good to let us know on the mailing list, to see if anybody else is working on the same.

Good luck!

Best,
-Michiel. 


--- On Sat, 3/2/13, Andrea Rizzi <88whacko at gmail.com> wrote:

> From: Andrea Rizzi <88whacko at gmail.com>
> Subject: [Biopython-dev] New contributor
> To: biopython-dev at biopython.org
> Date: Saturday, March 2, 2013, 11:49 AM
> Hello!
> My name is Andrea Rizzi and I'm a master's student in computer science and
> computational biology. I would be glad to help you developing biopython.
> I've used the library quite extensively but I'm mostly familiar with
> handling sequences, MSAs and PDB files.
> 
> I've read through the small contributing guide on the wiki and on the
> tutorial and I thought I could start with something relatively
> straightforward like writing/completing some unit tests (if I understood
> correctly there's a fairly strong need for them). I've good knowledge of
> both git and unittest. Anyway any task is actually fine to me :) .
> 
> If you agree I'll try to look for a module that needs some more testing (or
> maybe you have one to suggest me), otherwise I could just go to the bug
> tracker and try to help out fixing some bugs.
> 
> -- 
> -- Andrea
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
> 


From eric.talevich at gmail.com  Wed Mar 13 18:32:25 2013
From: eric.talevich at gmail.com (Eric Talevich)
Date: Wed, 13 Mar 2013 14:32:25 -0400
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
Message-ID: <CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>

On Tue, Feb 12, 2013 at 9:08 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> It would be great to have better support for microarray analysis in
> Biopython. Something like lumi/limma in R. Perhaps this is an option for
> the GSoC?
>
> Best,
> -Michiel.
>

I like Michiel's idea, and I'll suggest two more:

1. Codon alignment & analysis:
- PAL2NAL-style conversion of unaligned nucleic acid sequences and a
protein sequence alignment to a codon alignment. (Previously discussed)
- dN/dS and the related functions needed to calculate it.
- Possible AlignIO or MultipleSeqAlignment tweaks to take full advantage of
codon alignments, including validation (testing for frame shifts etc.)

2. Phylo enhancements:
2a. Tree drawing:
- A proper draw_unrooted function to perform radial layout, with an
optional "iterations" argument to use Felsenstein's Equal Daylight
algorithm -- I feel this layout approach is neglected in most libraries.
- Better matplotlib/pylab integration, so the plot components can be
tweaked using matplotlib functions.
- Other common layout approaches, e.g. circular.
2b. A "Phylo.consensus" module:
- strict consensus, like Bio.Nexus already implements.
- other consensus methods, time permitting.
2c. A "Phylo.distance" module:
- Robinson-Foulds distance -- though others might be working on this
already.
2d. Simple tree inference:
- Straightforward algorithms exist for neighbor-joining and parsimony tree
estimation. For small alignments (and perhaps medium-sized ones with PyPy),
it would be nice to run these without an external program, e.g. to
construct a guide tree for another algorithm or quickly view a phylogenetic
clustering of sequences.
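To make the first idea more concrete, here is a minimal sketch of PAL2NAL-style threading. Each unaligned coding sequence is mapped codon by codon onto its row of the protein alignment, and a length check serves as a crude frame-shift test. Plain dicts of strings stand in for Biopython alignment objects. `codon_align` is a hypothetical name, and the sketch does not check that each codon actually translates to its aligned residue.

```python
def codon_align(protein_alignment, cds_seqs):
    """Thread unaligned coding sequences onto a protein alignment (sketch).

    protein_alignment: dict of name -> aligned protein string ("-" = gap)
    cds_seqs: dict of name -> unaligned nucleotide string for the same names
    """
    codon_alignment = {}
    for name, aligned_protein in protein_alignment.items():
        cds = cds_seqs[name]
        n_residues = len(aligned_protein.replace("-", ""))
        # Crude validation: allow an optional trailing stop codon, otherwise
        # the CDS must be exactly three bases per residue (no frame shifts).
        if len(cds) not in (3 * n_residues, 3 * n_residues + 3):
            raise ValueError("%s: %i bases for %i residues"
                             % (name, len(cds), n_residues))
        codons = []
        pos = 0
        for residue in aligned_protein:
            if residue == "-":
                codons.append("---")  # a protein gap becomes a codon gap
            else:
                codons.append(cds[pos:pos + 3])
                pos += 3
        codon_alignment[name] = "".join(codons)
    return codon_alignment


proteins = {"seq1": "MK-L", "seq2": "MKVL"}
cds = {"seq1": "ATGAAACTG", "seq2": "ATGAAAGTTCTGTAA"}
print(codon_align(proteins, cds))
# {'seq1': 'ATGAAA---CTG', 'seq2': 'ATGAAAGTTCTG'}
```

A real implementation would also need to handle ambiguous bases, internal stop codons and translation checks. Those are where the "validation" part of the idea would come in.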

Any interest in either of these? Shall I add them to the wiki?

-Eric


--- On Tue, 2/12/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> > From: Peter Cock <p.j.a.cock at googlemail.com>
> > Subject: [Biopython-dev] Project ideas for GSoC (or other student
> projects)
> > To: "Biopython-Dev Mailing List" <biopython-dev at biopython.org>
> > Date: Tuesday, February 12, 2013, 12:51 PM
> > Hello all,
> >
> > Google recently confirmed they will be running Google Summer
> > of Code 2013,
> > and we (Biopython and the other Bio* projects) would hope to
> > be accepted again
> > under the Open Bioinformatics Foundation as in previous
> > years:
> > http://lists.open-bio.org/pipermail/gsoc/2013/000196.html
> >
> > It would be great to start coming up with potential project
> > ideas, both larger
> > pieces of work suitable for GSoC but also smaller tasks for
> > other project
> > students, or 'low hanging fruit' for potential contributors
> > to cut
> > their teeth on.
> >
> > See also http://biopython.org/wiki/Active_projects
> > and the ideas list there.
> >
> > Regards,
> >
> > Peter
>


From p.j.a.cock at googlemail.com  Wed Mar 13 21:16:27 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 13 Mar 2013 21:16:27 +0000
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAEDHeitOkC8bPB1HceASHz2=hXZHT3Q1f0n0p77BGnaM8ZCdow@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
	<CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
	<CAKVJ-_7MRu5tSSZ+S0dsOzpc=ojyC0oOxmg4e+A1Or5phfTz8w@mail.gmail.com>
	<CAEDHeivazvXaaCo2ZFP8sR3BOfv5tgEj6Bf7f_UZ6g_x-v9RZQ@mail.gmail.com>
	<CAEDHeisW+ymrW-Bb5XcgAm1mSPbDWMY7evhvyspVFxzTptW2Pg@mail.gmail.com>
	<CAKVJ-_5vn3p_iiOYTpnMs6W8qF-3DFwgzO2-HqL6=W-mHByaTA@mail.gmail.com>
	<CAEDHeitQA0JQ=vuRKEohgRwkRxLqs+NuV9dAYguhrc1-eYnCsQ@mail.gmail.com>
	<CAKVJ-_40jy-NS5csf55Q_ds-DdGRpaH18rZxZXdcypNEijpcJA@mail.gmail.com>
	<CAEDHeitOkC8bPB1HceASHz2=hXZHT3Q1f0n0p77BGnaM8ZCdow@mail.gmail.com>
Message-ID: <CAKVJ-_4Qck=EaUuJVo-hGp5f5Rk1ua3tqPdEExaumzQ512wO1A@mail.gmail.com>

On Monday, March 4, 2013, Saket Choudhary wrote:

> Hi,
>
> I have updated the code here :
> https://github.com/saketkc/biopython/tree/bwa_wrapper
>
> I have added unittests for the wrapper. And yes, this did help me in
> fixing a lot of minor bugs in my original wrapper.
>
> @Peter :  Is this 'pull request' ready ?
>
> Thanks
>
> Saket
>
>
Sorry I've not had time to test this yet - and have
been off ill today as well. The basic approach you've
taken seems sound, and a good basis for other
samtools style tools.

Peter


From p.j.a.cock at googlemail.com  Thu Mar 14 11:25:41 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 14 Mar 2013 11:25:41 +0000
Subject: [Biopython-dev] Fwd: [biopython] Add the ability to parse CEL
 version 4 files from Affy (#168)
In-Reply-To: <biopython/biopython/pull/168@github.com>
References: <biopython/biopython/pull/168@github.com>
Message-ID: <CAKVJ-_71jbGz2y2WFquRYQxX2AeANMc6Ek=RwdcQLK5jSNdKJA@mail.gmail.com>

Who would be the best person to review this? Michael?

Peter

---------- Forwarded message ----------
From: *Jeff Hammerbacher*
Date: Thursday, March 14, 2013
Subject: [biopython] Add the ability to parse CEL version 4 files from Affy
(#168)
To: biopython/biopython <biopython at noreply.github.com>


Hey,

I noticed that Biopython was missing the ability to parse binary CEL files
(version 4), so I've added a rough implementation. I've kept TODOs in the
code and a main method to demonstrate example use. I realize these are not
best practices for a mature library, but this corner of Biopython (the Affy
module) seems quite immature, so I figured I'd leave the code in this state
to indicate to others that there is much room for improvement.
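The sketch below shows roughly what reading a binary CEL file involves, using the standard struct module on the fixed-size start of a version 4 file. The layout is an assumption based on Affymetrix's CEL format description, not something taken from this pull request: little-endian 32-bit integers for the magic number (64), version (4), columns, rows, number of cells and header length, followed by the header text. Check it against that documentation before relying on it. `read_cel4_prefix` is a hypothetical name.

```python
import io
import struct


def read_cel4_prefix(handle):
    """Read the start of a binary (version 4) CEL file (sketch).

    Assumed layout: six little-endian 32-bit integers (magic number,
    version, columns, rows, number of cells, header length) followed by
    the header text. Everything after the header is not handled here.
    """
    fields = struct.unpack("<6i", handle.read(24))
    magic, version, cols, rows, n_cells, header_len = fields
    if magic != 64 or version != 4:
        raise ValueError("Not a version 4 CEL file (magic=%i, version=%i)"
                         % (magic, version))
    header = handle.read(header_len).decode("ascii")
    return {"cols": cols, "rows": rows, "cells": n_cells, "header": header}


# Build a tiny fake file in memory to exercise the function.
header_text = b"Cols=2\nRows=2\n"
fake = struct.pack("<6i", 64, 4, 2, 2, 4, len(header_text)) + header_text
print(read_cel4_prefix(io.BytesIO(fake)))
```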

I have not contributed to this project before, so please let me know how to
get this pull request in shape for a commit.

Thanks,
Jeff
------------------------------
You can merge this Pull Request by running

  git pull https://github.com/hammer/biopython master

Or view, comment on, or merge it at:

  https://github.com/biopython/biopython/pull/168
Commit Summary

   - Add the ability to parse CEL version 4 files from Affy.

File Changes

   - *A* Bio/Affy/CelFileV4.py <https://github.com/biopython/biopython/pull/168/files#diff-0> (186)

Patch Links:

   - https://github.com/biopython/biopython/pull/168.patch
   - https://github.com/biopython/biopython/pull/168.diff


From mjldehoon at yahoo.com  Fri Mar 15 13:09:18 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 15 Mar 2013 06:09:18 -0700 (PDT)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CAKVJ-_7Py6yQm=MGnjoa+cE596a7z48HXpukg3XWzkRPjsF-Bg@mail.gmail.com>
Message-ID: <1363352958.71694.YahooMailClassic@web164004.mail.gq1.yahoo.com>

Hi everybody,

I looked at the mmCIF parser again, and it turned out that the Python standard library contains a shlex lexical analyzer module that makes mmCIF parsing straightforward without relying on flex or PLY. I uploaded a modified version of MMCIF2Dict.py to the git repository. This parser does the exact same thing as the flex-based parser, but is in pure Python. If you're interested, have a look at MMCIF2Dict.py in the git repository; comments and suggestions are welcome.

If there are no objections, I think we can remove everything in Bio.PDB.mmCIF. Also I'm a bit unhappy with how the information in an mmCIF file is represented in Biopython. I think there are more Pythonic ways to store the contents of an mmCIF file in a Python object.
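To illustrate the approach, here is a minimal sketch of shlex-based mmCIF tokenizing, plus a simple conversion into a plain dict that maps each data name to a string, or to a list of strings for loop_ columns. This is not the committed MMCIF2Dict code. `tokenize` and `mmcif_to_dict` are hypothetical names, and the sketch only handles whole-line `#` comments, quoted values, `loop_` blocks and semicolon-delimited text fields. Because shlex strips the quotes, a quoted value that begins with `_` would be misread as a data name.

```python
import shlex


def tokenize(lines):
    """Yield mmCIF tokens from an iterable of text lines (sketch)."""
    lines = iter(lines)
    for line in lines:
        if line.startswith("#"):
            continue  # whole-line comment
        if line.startswith(";"):
            # Semicolon-delimited text field, ended by a line starting with ";"
            parts = [line[1:].rstrip("\n")]
            for line in lines:
                if line.startswith(";"):
                    break
                parts.append(line.rstrip("\n"))
            yield "\n".join(parts).strip()
        else:
            # shlex handles 'single' and "double" quoted values
            for token in shlex.split(line):
                yield token


def mmcif_to_dict(lines):
    """Map data names to a value, or to a list of values for loop_ columns."""
    result = {}
    loop_keys = []
    reading_loop_header = False
    position = 0
    pending_key = None
    for token in tokenize(lines):
        if pending_key is None and token.startswith("data_"):
            result["data_"] = token[5:]
        elif pending_key is None and token == "loop_":
            loop_keys, reading_loop_header, position = [], True, 0
        elif pending_key is None and token.startswith("_"):
            if reading_loop_header:
                loop_keys.append(token)
                result[token] = []
            else:
                loop_keys = []  # a plain data name ends any previous loop
                pending_key = token
        elif pending_key is not None:
            result[pending_key] = token
            pending_key = None
        elif loop_keys:
            reading_loop_header = False
            result[loop_keys[position % len(loop_keys)]].append(token)
            position += 1
        else:
            raise ValueError("Unexpected token %r" % token)
    return result


example = """data_1ABC
# a comment
_entry.id 1ABC
_struct.title
;Crystal structure
of an example
;
loop_
_atom_site.id
_atom_site.label_atom_id
1 N
2 "CA"
_cell.length_a 10.0
""".splitlines()
print(mmcif_to_dict(example)["_atom_site.label_atom_id"])  # ['N', 'CA']
```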

Best,
-Michiel.

--- On Sat, 2/16/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> From: Peter Cock <p.j.a.cock at googlemail.com>
> Subject: Re: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> Cc: "BioPython-Dev Mailing List" <biopython-dev at biopython.org>, "Lenna Peterson" <arklenna at gmail.com>
> Date: Saturday, February 16, 2013, 5:42 AM
> On Sat, Feb 16, 2013 at 2:46 AM, Michiel de Hoon <mjldehoon at yahoo.com>
> wrote:
> > Hi Lenna,
> >
> > Maybe we are confusing each other..
> > I am looking for a solution that (a) doesn't introduce new dependencies,
> 
> +1
> 
> > (b) is pure-Python so it can run on Jython,
> 
> +1 And on PyPy (which to me is more interesting than Jython) etc.
> 
> > and (c) if that is not possible and we do need to use C, then that C code
> > should be understandable so that it can be debugged if necessary.
> >
> > I was suggesting to clean up lex.yy.c so that we can at least achieve (c).
> 
> This does mean we essentially give up on ever regenerating the lex.yy.c
> file again - could that be a problem if Flex itself changes much?
> 
> > The alternative is to start from the PLY-based parser and remove the
> > dependency on PLY.
> >
> > Best,
> > -Michiel.
> 
> Peter
> 


From anaryin at gmail.com  Fri Mar 15 13:20:16 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Fri, 15 Mar 2013 14:20:16 +0100
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <1363352958.71694.YahooMailClassic@web164004.mail.gq1.yahoo.com>
References: <CAKVJ-_7Py6yQm=MGnjoa+cE596a7z48HXpukg3XWzkRPjsF-Bg@mail.gmail.com>
	<1363352958.71694.YahooMailClassic@web164004.mail.gq1.yahoo.com>
Message-ID: <CAJ9sUYN3kwRYLi1Bgc1N5AzMrqPd5bnNRZvBRqdPH0+xafz+Jg@mail.gmail.com>

Hi Michiel,

Speaking without really checking the code... What we should perhaps have is
the parsers, whatever they are, populating the same type of object in the
end (PDBParser and mmCIFParser). Is this the current status of the mmCIF parser?

Best,

João


2013/3/15 Michiel de Hoon <mjldehoon at yahoo.com>

> Hi everybody,
>
> I looked at the mmCIF parser again, and it turned out that the Python
> standard library contains a shlex lexical analyzer module that makes mmCIF
> parsing straightforward without relying on flex or PLY. I uploaded a
> modified version of MMCIF2Dict.py to the git repository. This parser does
> the exact same thing as the flex-based parser, but is in pure Python. If
> you're interested, have a look at MMCIF2Dict.py in the git repository;
> comments and suggestions are welcome.
>
> If there are no objections, I think we can remove everything in
> Bio.PDB.mmCIF. Also I'm a bit unhappy with how the information in an mmCIF
> file is represented in Biopython. I think there are more Pythonic ways to
> store the contents of an mmCIF file in a Python object.
>
> Best,
> -Michiel.
>
> --- On Sat, 2/16/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> > From: Peter Cock <p.j.a.cock at googlemail.com>
> > Subject: Re: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
> > To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> > Cc: "BioPython-Dev Mailing List" <biopython-dev at biopython.org>, "Lenna
> Peterson" <arklenna at gmail.com>
> > Date: Saturday, February 16, 2013, 5:42 AM
> > On Sat, Feb 16, 2013 at 2:46 AM, Michiel de Hoon <mjldehoon at yahoo.com>
> > wrote:
> > > Hi Lenna,
> > >
> > > Maybe we are confusing each other..
> > > I am looking for a solution that (a) doesn't introduce new dependencies,
> >
> > +1
> >
> > > (b) is pure-Python so it can run on Jython,
> >
> > +1 And on PyPy (which to me is more interesting than Jython) etc.
> >
> > > and (c) if that is not possible and we do need to use C, then that C code
> > > should be understandable so that it can be debugged if necessary.
> > >
> > > I was suggesting to clean up lex.yy.c so that we can at least achieve (c).
> >
> > This does mean we essentially give up on ever regenerating the lex.yy.c
> > file again - could that be a problem if Flex itself changes much?
> >
> > > The alternative is to start from the PLY-based parser and remove the
> > > dependency on PLY.
> > >
> > > Best,
> > > -Michiel.
> >
> > Peter
> >
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


From p.j.a.cock at googlemail.com  Fri Mar 15 13:21:50 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 15 Mar 2013 13:21:50 +0000
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <1363352958.71694.YahooMailClassic@web164004.mail.gq1.yahoo.com>
References: <CAKVJ-_7Py6yQm=MGnjoa+cE596a7z48HXpukg3XWzkRPjsF-Bg@mail.gmail.com>
	<1363352958.71694.YahooMailClassic@web164004.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_4GbeJBy2qe8Qz=b5qg4uaMFwxBU21YuubnH=Y+8VqOKg@mail.gmail.com>

On Fri, Mar 15, 2013 at 1:09 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi everybody,
>
> I looked at the mmCIF parser again, and it turned out that the Python
> standard library contains a shlex lexical analyzer module that makes mmCIF
> parsing straightforward without relying on flex or PLY. I uploaded a
> modified version of MMCIF2Dict.py to the git repository. This parser does
> the exact same thing as the flex-based parser, but is in pure Python. If
> you're interested, have a look at MMCIF2Dict.py in the git repository;
> comments and suggestions are welcome.

That makes MMCIF2Dict look a lot shorter :)
https://github.com/biopython/biopython/commit/b2bafdfcd67c738f91722495bb732297b7936828

> If there are no objections, I think we can remove everything in
> Bio.PDB.mmCIF. Also I'm a bit unhappy with how the information in an mmCIF
> file is represented in Biopython. I think there are more Pythonic ways to
> store the contents of an mmCIF file in a Python object.
>
> Best,
> -Michiel.

Do you think we need a deprecation cycle for Bio.PDB.mmCIF? It has
been available by default on Debian etc where the dependency was
taken care of by the packagers.

I've never used this code so perhaps Eric or João's perspective would be
more helpful than mine.

Peter


From mjldehoon at yahoo.com  Fri Mar 15 15:08:43 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 15 Mar 2013 08:08:43 -0700 (PDT)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CAKVJ-_4GbeJBy2qe8Qz=b5qg4uaMFwxBU21YuubnH=Y+8VqOKg@mail.gmail.com>
Message-ID: <1363360123.26690.YahooMailClassic@web164004.mail.gq1.yahoo.com>

Hi all,

--- On Fri, 3/15/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Do you think we need a deprecation cycle for Bio.PDB.mmCIF?
> It has been available by default on Debian etc where the
> dependency was taken care of by the packagers.

Probably not. The Bio.PDB.mmCIF module was essentially a private module used by Bio.PDB.MMCIF2Dict, whose usage is unchanged. Also, AFAICT the Bio.PDB.mmCIF module is not documented anywhere. And finally, all this module does is tokenize the mmCIF file, so probably not something an end user would be interested in.

I am not a heavy user of Bio.PDB myself, so feel free to correct me if I am wrong.

Best,
-Michiel.


From p.j.a.cock at googlemail.com  Fri Mar 15 15:28:48 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 15 Mar 2013 15:28:48 +0000
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <1363360950.87852.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <CAJ9sUYN3kwRYLi1Bgc1N5AzMrqPd5bnNRZvBRqdPH0+xafz+Jg@mail.gmail.com>
	<1363360950.87852.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_7gK0+=H-H4LFOsduLqSHqahD3mELOxHeGsSgxV0-Khaw@mail.gmail.com>

On Fri, Mar 15, 2013 at 3:22 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> Speaking of which, we have a Biopython Structural Bioinformatics FAQ (i.e.
> how to use the Bio.PDB module) on the Biopython website with additional
> information on Bio.PDB, including some information on things that are not in
> the main Biopython Tutorial. Perhaps this is a good time to integrate this
> FAQ into the main documentation?
>

Both are LaTeX documents so this shouldn't be too hard to do.

Peter


From mjldehoon at yahoo.com  Fri Mar 15 15:22:30 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 15 Mar 2013 08:22:30 -0700 (PDT)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CAJ9sUYN3kwRYLi1Bgc1N5AzMrqPd5bnNRZvBRqdPH0+xafz+Jg@mail.gmail.com>
Message-ID: <1363360950.87852.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Hi João,

--- On Fri, 3/15/13, João Rodrigues <anaryin at gmail.com> wrote:
> What we perhaps should have is the parsers, whatever they are, populating the same type of object in the end (PDBParser and mmCIFParser).
I think that there are two options:

1) PDBParser and mmCIFParser both produce Structure objects, with any additional information found in mmCIF files stored as additional attributes of Structure objects (and the same thing for PDB files);

2) We make a module mmCIF with a function mmCIF.read that reads an mmCIF file and stores the information in an mmCIF.Record object that is optimized for storing mmCIF information. The mmCIFParser uses mmCIF.read, and pulls out the necessary information from the mmCIF.Record object to create a Structure object (which is free of mmCIF-specific stuff). Users can make Structure objects if that is all they need, or use mmCIF.read if they want to have all information in an mmCIF file.

Currently the situation is closer to (2), with MMCIF2Dict playing the role of mmCIF.read, but I don't much like the way MMCIF2Dict stores information.

Since I am not a power user of Bio.PDB, other people may have more insight into whether (1) or (2) (or something completely different) is best.
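Here is a rough sketch that makes option (2) more concrete. In this sketch a record maps each mmCIF category to a list of row dictionaries, so looped and single-valued categories are accessed in the same way. `Record` and `read_tokens` are hypothetical names and are not existing Biopython API. The function assumes the file has already been tokenized into a flat list of strings (for example with shlex). Because quoting is lost at that point, a value that itself starts with "_" or "data_" would be misread.

```python
class Record(dict):
    """Hypothetical mmCIF.Record: category name -> list of row dicts."""


def read_tokens(tokens):
    """Build a Record from a flat list of mmCIF tokens (one data block)."""
    record = Record()
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("data_"):
            # Data block header, e.g. "data_1A8O"
            record["data_block"] = [{"name": token[len("data_"):]}]
            i += 1
        elif token == "loop_":
            # A loop: several keys of one category, followed by rows of values
            i += 1
            keys = []
            while i < len(tokens) and tokens[i].startswith("_"):
                keys.append(tokens[i])
                i += 1
            values = []
            while (i < len(tokens) and not tokens[i].startswith("_")
                   and tokens[i] != "loop_" and not tokens[i].startswith("data_")):
                values.append(tokens[i])
                i += 1
            category = keys[0].split(".", 1)[0]
            fields = [key.split(".", 1)[1] for key in keys]
            rows = record.setdefault(category, [])
            for start in range(0, len(values), len(fields)):
                rows.append(dict(zip(fields, values[start:start + len(fields)])))
        elif token.startswith("_"):
            # Single key/value pair, stored as a one-row category
            category, field = token.split(".", 1)
            record.setdefault(category, [{}])[0][field] = tokens[i + 1]
            i += 2
        else:
            raise ValueError("Unexpected token %r" % token)
    return record
```

With this layout, `record["_atom_site"]` is a list of per-atom dictionaries. A Structure builder could then iterate over it without knowing anything about mmCIF syntax.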
> Is this the current status of the mmCIF?
I just replaced the flex-dependent part of mmCIF by pure Python code, but I didn't change the functionality or usage of the mmCIF code. So the current status is still the same as described in the documentation.

Speaking of which, we have a Biopython Structural Bioinformatics FAQ (i.e. how to use the Bio.PDB module) on the Biopython website with additional information on Bio.PDB, including some information on things that are not in the main Biopython Tutorial. Perhaps this is a good time to integrate this FAQ into the main documentation?

Best,
-Michiel 


From jacobs at bioinformed.com  Fri Mar 15 15:40:38 2013
From: jacobs at bioinformed.com (Kevin Jacobs)
Date: Fri, 15 Mar 2013 08:40:38 -0700
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAKVJ-_4Qck=EaUuJVo-hGp5f5Rk1ua3tqPdEExaumzQ512wO1A@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
	<CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
	<CAKVJ-_7MRu5tSSZ+S0dsOzpc=ojyC0oOxmg4e+A1Or5phfTz8w@mail.gmail.com>
	<CAEDHeivazvXaaCo2ZFP8sR3BOfv5tgEj6Bf7f_UZ6g_x-v9RZQ@mail.gmail.com>
	<CAEDHeisW+ymrW-Bb5XcgAm1mSPbDWMY7evhvyspVFxzTptW2Pg@mail.gmail.com>
	<CAKVJ-_5vn3p_iiOYTpnMs6W8qF-3DFwgzO2-HqL6=W-mHByaTA@mail.gmail.com>
	<CAEDHeitQA0JQ=vuRKEohgRwkRxLqs+NuV9dAYguhrc1-eYnCsQ@mail.gmail.com>
	<CAKVJ-_40jy-NS5csf55Q_ds-DdGRpaH18rZxZXdcypNEijpcJA@mail.gmail.com>
	<CAEDHeitOkC8bPB1HceASHz2=hXZHT3Q1f0n0p77BGnaM8ZCdow@mail.gmail.com>
	<CAKVJ-_4Qck=EaUuJVo-hGp5f5Rk1ua3tqPdEExaumzQ512wO1A@mail.gmail.com>
Message-ID: <CAPipXkKS8c3CCMQef8zn2a2oSFuj=XbTKGaKFWRUCcVZ8iWt4A@mail.gmail.com>

FYI, I am working on a direct Cython wrapper around the new BWA-MEM
aligner, which will allow API-level access to Heng Li's extremely
impressive new algorithm.  It is still in early development and is missing
many bells and whistles, but will be shaping up in the next few weeks.

Test program:

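from __future__ import print_function  # lets the print() calls run on Python 2 and 3
import bwamem

# Aligner for the reference genome (bwamem is the in-development Cython wrapper)
mem = bwamem.MEMAligner('ref/human_g1k_v37.fasta')
# Align a single read; align() returns a sequence of hits
hits = mem.align('TCACGACGCTCTTCCGATCTGTT...GTGCATTCTCTGGTCAGACAGCCAAGG')
a = hits[0]  # look at the first hit only

print('ref id =', a.rid)
print('pos    =', a.pos)
print('CIGAR  =', a.cigar.to_string())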


Output (correct):

ref id = 0
pos    = 115250385
CIGAR  = 17N134M
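The CIGAR string in that output uses the standard SAM encoding: a run length followed by an operation code. For anyone post-processing results like this, the sketch below parses such strings using only the standard library. It is independent of the bwamem module, whose API is still in development. The helper names are illustrative.

```python
import re

# One CIGAR operation: a run length followed by an operation code (SAM spec)
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '17N134M' into (length, op) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Reject strings containing characters the pattern did not consume
    if "".join("%d%s" % pair for pair in ops) != cigar:
        raise ValueError("Malformed CIGAR string: %r" % cigar)
    return ops


def reference_span(ops):
    """Reference bases covered: M, D, N, = and X consume the reference."""
    return sum(n for n, op in ops if op in "MDN=X")


def query_length(ops):
    """Read bases used: M, I, S, = and X consume the query."""
    return sum(n for n, op in ops if op in "MIS=X")
```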


On Wed, Mar 13, 2013 at 2:16 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Monday, March 4, 2013, Saket Choudhary wrote:
>
> > Hi,
> >
> > I have updated the code here :
> > https://github.com/saketkc/biopython/tree/bwa_wrapper
> >
> > I have added unittests for the wrapper. And yes, this did help me in
> > fixing a lot of minor bugs in my original wrapper.
> >
> > @Peter :  Is this 'pull request' ready ?
> >
> > Thanks
> >
> > Saket
> >
> >
> Sorry I've not had time to test this yet - and have
> been off ill today as well. The basic approach you've
> taken seems sound, and a good basis for other
> samtools style tools.
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


From anaryin at gmail.com  Fri Mar 15 15:53:41 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Fri, 15 Mar 2013 16:53:41 +0100
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <1363360950.87852.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <CAJ9sUYN3kwRYLi1Bgc1N5AzMrqPd5bnNRZvBRqdPH0+xafz+Jg@mail.gmail.com>
	<1363360950.87852.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <CAJ9sUYNQMu1x-4vx0gqQX4=hmAAro7GE=Lr2cPOgL7kr792O6g@mail.gmail.com>

Hi Michiel,


> 1) PDBParser and mmCIFParser both produce Structure objects, with any
> additional information found in mmCIF files stored as additional attributes
> of Structure objects (and the same thing for PDB files);
>

This approach has a few advantages. First and most obvious, it allows one
file format to be converted to another seamlessly. Second, it reduces the code
to something easier to maintain and to extend. The disadvantage is that the
Structure objects might become a bit too bloated. On the other hand, we can
make them lighter and take advantage of Python's dynamic attributes (if I
need a b-factor, I just add atom.bfactor). This would also help a lot with
the current parser which is quite "sluggish" for some purposes and bring a
lot more flexibility (parsing pqr files, mol2 files, etc). All we'd need
would be a parser for each file format and a generic container holding the
backbone of the structure, which we extend as needed. A simple flag for the
parser type would make checking if function X can be used on this
particular structure easier too.
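The following toy example illustrates the "light container plus dynamic attributes" idea. The classes are hypothetical and much simpler than Bio.PDB's actual Structure and Atom classes. `source_format` stands in for the parser-type flag mentioned above, and `supports_bfactors` is an invented example of checking whether a given function can be used on a structure.

```python
class Atom:
    """Lightweight atom: only core fields are fixed, extras are added on demand."""

    def __init__(self, name, coord):
        self.name = name
        self.coord = coord


class Structure:
    """Generic container that records which parser produced it (hypothetical)."""

    def __init__(self, structure_id, source_format):
        self.id = structure_id
        self.source_format = source_format  # e.g. "pdb", "mmcif", "pqr"
        self.atoms = []


def supports_bfactors(structure):
    """True only if every atom carries the optional bfactor attribute."""
    return bool(structure.atoms) and all(hasattr(a, "bfactor") for a in structure.atoms)


s = Structure("demo", source_format="pqr")
ca = Atom("CA", (1.0, 2.0, 3.0))
ca.charge = -0.1  # PQR-style extra, attached dynamically
s.atoms.append(ca)
```

The trade-off is that attribute checks like `hasattr` replace a fixed schema, so functions that need a particular attribute have to test for it explicitly.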


>
> 2) We make a module mmCIF with a function mmCIF.read that reads an mmCIF
> file and stores the information in a mmCIF.Record object that is optimized
> for storing mmCIF information. The mmCIFParser uses mmCIF.read, and pulls
> out the necessary information from the mmCIF.Record object to create a
> Structure object (which is free of mmCIF-specific stuff). Users can make
> Structure objects if that is all they need, or use mmCIF.read if they want
> to have all information in an mmCIF file.
>

I'm completely unfamiliar with mmCIF files... How much more information do
they have than a PDB file? And what kind of information is useful to
extract from them?

> Speaking of which, we have a Biopython Structural Bioinformatics FAQ (i.e.
> how to use the Bio.PDB module) on the Biopython website with additional
> information on Bio.PDB, including some information on things that are not
> in the main Biopython Tutorial. Perhaps this is a good time to integrate
> this FAQ into the main documentation?


We could also update it a bit because it's been a while and there are some
different things here and there. And additions too.

Best,

João


From bartek at rezolwenta.eu.org  Fri Mar 15 23:06:57 2013
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Sat, 16 Mar 2013 00:06:57 +0100
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
Message-ID: <CABHxouWGVH6R=pXTVp0yHAm62Gfu+sKSNrhLR+g3ydvoeO02bw@mail.gmail.com>

Hi All,
I would add one more (old) idea to the GSoC pool, i.e. adding support
for different biological ontologies to biopython.

This was already discussed some time ago
(http://www.biopython.org/w/index.php?title=Gene_Ontology&redirect=no)
mostly in the context of gene ontology, and to some extent this is
addressed by the development of GOAtools
(https://github.com/tanghaibao/goatools), but I think it would be
worth having decent support for OBO-file-based ontologies (not only
gene ontology; I'm also interested myself in anatomical ontologies,
and there are others available at obofoundry.org) in biopython.

I think it would need to include support for IO operations on both OBO
and annotation files, as well as statistical enrichment measures and
potentially some visualisation.
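On the statistical side, a common enrichment measure is the one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The sketch below uses only the standard library (`math.comb`, which needs Python 3.8+). The function name and argument order are illustrative and do not describe an existing Biopython or GOAtools API.

```python
from math import comb  # Python 3.8+


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric P(X >= k) for over-representation of a term.

    N: genes in the background population
    K: background genes annotated with the term
    n: genes in the study set (drawn from the background)
    k: study-set genes annotated with the term
    """
    # Sum the probabilities of seeing k, k+1, ... annotated genes in the study set
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)
```

A full module would also need multiple-testing correction across terms, because thousands of ontology terms are usually tested at once.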

Would anyone be interested in co-mentoring this project? There is one
student in my department who would be interested in applying to GSoC
for this project, but I think it would be great if other people joined
the discussion on the functionality and having more people involved is
always better...

best
Bartek Wilczynski

On Wed, Mar 13, 2013 at 7:32 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> On Tue, Feb 12, 2013 at 9:08 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
>> It would be great to have better support for microarray analysis in
>> Biopython. Something like lumi/limma in R. Perhaps this is an option for
>> the GSoC?
>>
>> Best,
>> -Michiel.
>>
>
> I like Michiel's idea, and I'll suggest two more:
>
> 1. Codon alignment & analysis:
> - PAL2NAL-style conversion of unaligned nucleic acid sequences and a
> protein sequence alignment to a codon alignment. (Previously discussed)
> - dN/dS and the related functions needed to calculate it.
> - Possible AlignIO or MultipleSeqAlignment tweaks to take full advantage of
> codon alignments, including validation (testing for frame shifts etc.)
>
> 2. Phylo enhancements:
> 2a. Tree drawing:
> - A proper draw_unrooted function to perform radial layout, with an
> optional "iterations" argument to use Felsenstein's Equal Daylight
> algorithm -- I feel this layout approach is neglected in most libraries.
> - Better matplotlib/pylab integration, so the plot components can be
> tweaked using matplotlib functions.
> - Other common layout approaches, e.g. circular.
> 2b. A "Phylo.consensus" module:
> - strict consensus, like Bio.Nexus already implements.
> - other consensus methods, time permitting.
> 2c. A "Phylo.distance" module:
> - Robinson-Foulds distance -- though others might be working on this
> already.
> 2d. Simple tree inference:
> - Straightforward algorithms exist for neighbor-joining and parsimony tree
> estimation. For small alignments (and perhaps medium-sized ones with PyPy),
> it would be nice to run these without an external program, e.g. to
> construct a guide tree for another algorithm or quickly view a phylogenetic
> clustering of sequences.
>
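Here is a rough illustration of the codon alignment idea (item 1 above). The core of a PAL2NAL-style conversion threads each unaligned coding sequence onto its aligned protein, one codon per residue and "---" per gap. The sketch below is a minimal version. It assumes the coding sequences have no stop codon and no untranslated flanks. It only checks lengths and does not check that each codon actually translates to the aligned residue, which a real implementation should verify.

```python
def codon_align(protein_alignment, nucleotide_seqs):
    """Thread unaligned CDS strings onto aligned proteins (PAL2NAL-style sketch).

    protein_alignment: dict id -> aligned protein string ('-' for gaps)
    nucleotide_seqs:   dict id -> unaligned coding sequence (no stop codon)
    """
    codon_alignment = {}
    for seq_id, aligned_protein in protein_alignment.items():
        cds = nucleotide_seqs[seq_id]
        residues = len(aligned_protein.replace("-", ""))
        if len(cds) != 3 * residues:
            # A length mismatch hints at a frame shift or untranslated sequence
            raise ValueError("%s: CDS length %d does not match %d residues"
                             % (seq_id, len(cds), residues))
        codons = []
        pos = 0
        for residue in aligned_protein:
            if residue == "-":
                codons.append("---")
            else:
                codons.append(cds[pos:pos + 3])
                pos += 3
        codon_alignment[seq_id] = "".join(codons)
    return codon_alignment
```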
> Any interest in either of these? Shall I add them to the wiki?
>
> -Eric
>
>
> --- On Tue, 2/12/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>
>> > From: Peter Cock <p.j.a.cock at googlemail.com>
>> > Subject: [Biopython-dev] Project ideas for GSoC (or other student
>> projects)
>> > To: "Biopython-Dev Mailing List" <biopython-dev at biopython.org>
>> > Date: Tuesday, February 12, 2013, 12:51 PM
>> > Hello all,
>> >
>> > Google recently confirmed they will be running Google Summer
>> > of Code 2013,
>> > and we (Biopython and the other Bio* projects) would hope to
>> > be accepted again
>> > under the Open Bioinformatics Foundation as in previous
>> > years:
>> > http://lists.open-bio.org/pipermail/gsoc/2013/000196.html
>> >
>> > It would be great to start coming up with potential project
>> > ideas, both larger
>> > pieces of work suitable for GSoC but also smaller tasks for
>> > other project
>> > students, or 'low hanging fruit' for potential contributors
>> > to cut
>> > their teeth on.
>> >
>> > See also http://biopython.org/wiki/Active_projects
>> > and the ideas list there.
>> >
>> > Regards,
>> >
>> > Peter
>>


--
Bartek Wilczynski


From mjldehoon at yahoo.com  Sat Mar 16 02:38:48 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 15 Mar 2013 19:38:48 -0700 (PDT)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CAJ9sUYNQMu1x-4vx0gqQX4=hmAAro7GE=Lr2cPOgL7kr792O6g@mail.gmail.com>
Message-ID: <1363401528.82829.YahooMailClassic@web164001.mail.gq1.yahoo.com>


--- On Fri, 3/15/13, João Rodrigues <anaryin at gmail.com> wrote:
> I'm completely unfamiliar with mmCIF files... How much more information do they have than a PDB file?
These are two examples from the Biopython tests:

https://github.com/biopython/biopython/blob/master/Tests/PDB/1A8O.cif
https://github.com/biopython/biopython/blob/master/Tests/PDB/1LCD.cif
> And what kind of information is useful to extract from them?
I think we should extract all information from these files, and let the user decide which parts are useful.

Best,
-Michiel.


From p.j.a.cock at googlemail.com  Sat Mar 16 14:38:22 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 16 Mar 2013 14:38:22 +0000
Subject: [Biopython-dev] Modifications to CircularDrawer
In-Reply-To: <CAKVJ-_7LjEdGtm3T26L3wwkraN9C+11mLvjogDBXAK+EDjGeDQ@mail.gmail.com>
References: <959CFF5060375249824CC633DDDF896F1C0C58B1@AMSPRD0410MB351.eurprd04.prod.outlook.com>
	<E72D33BF424829408854FEB604A6959B6E47307C@DUEXC02.ad.hutton.ac.uk>
	<959CFF5060375249824CC633DDDF896F1C0C5C9B@AMSPRD0410MB351.eurprd04.prod.outlook.com>
	<CAKVJ-_7bf_n8dPXvw+pH+edo9aCWVBuiXRL_Gjx62b3faRgjJg@mail.gmail.com>
	<959CFF5060375249824CC633DDDF896F1C0C5E48@AMSPRD0410MB351.eurprd04.prod.outlook.com>
	<CAKVJ-_5LatE+JT04d0dcf_WiTRE9tEUNPw6oPC004H=TTqB43A@mail.gmail.com>
	<CAKVJ-_7LjEdGtm3T26L3wwkraN9C+11mLvjogDBXAK+EDjGeDQ@mail.gmail.com>
Message-ID: <CAKVJ-_4Fcb5FqnmH+LjRquDBOUDYMgOi_TOG-JGt6Nk6gFD9vw@mail.gmail.com>

On Wed, Dec 5, 2012 at 6:41 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi David,
>
> I've been experimenting with your pull request, thank you:
> https://github.com/biopython/biopython/pull/116
>

Hi again David,

I've not used your code as is, but have started by pulling out
and generalising what I felt was the least contentious part:

https://github.com/biopython/biopython/commit/087712510421ec7f655a7981926a757aa93e9177

This means that label_position = start, middle, end (and some
historic aliases defined in the linear drawer code) now work
on circular GenomeDiagrams. I have made the default None,
which gives the current behaviour (equivalent to 'start' on linear
diagrams, and the harder-to-explain vertical bottom on circular ones).

Support for allowing the default label orientation to be radially
consistent all round the circle (rather than the current flipping
for the left/right halves which assumes the output is kept
vertical) would be nice, but the thing I am most keen on is the
inside/outside of the track label placement. Hopefully I'll have
time to finish that this weekend...

Peter


From p.j.a.cock at googlemail.com  Sat Mar 16 20:37:12 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 16 Mar 2013 20:37:12 +0000
Subject: [Biopython-dev] Modifications to CircularDrawer
In-Reply-To: <CAKVJ-_4Fcb5FqnmH+LjRquDBOUDYMgOi_TOG-JGt6Nk6gFD9vw@mail.gmail.com>
References: <959CFF5060375249824CC633DDDF896F1C0C58B1@AMSPRD0410MB351.eurprd04.prod.outlook.com>
	<E72D33BF424829408854FEB604A6959B6E47307C@DUEXC02.ad.hutton.ac.uk>
	<959CFF5060375249824CC633DDDF896F1C0C5C9B@AMSPRD0410MB351.eurprd04.prod.outlook.com>
	<CAKVJ-_7bf_n8dPXvw+pH+edo9aCWVBuiXRL_Gjx62b3faRgjJg@mail.gmail.com>
	<959CFF5060375249824CC633DDDF896F1C0C5E48@AMSPRD0410MB351.eurprd04.prod.outlook.com>
	<CAKVJ-_5LatE+JT04d0dcf_WiTRE9tEUNPw6oPC004H=TTqB43A@mail.gmail.com>
	<CAKVJ-_7LjEdGtm3T26L3wwkraN9C+11mLvjogDBXAK+EDjGeDQ@mail.gmail.com>
	<CAKVJ-_4Fcb5FqnmH+LjRquDBOUDYMgOi_TOG-JGt6Nk6gFD9vw@mail.gmail.com>
Message-ID: <CAKVJ-_7C0Oqk4MXdNPy-ONCDOHF2gRTAWF5pe6sYQ80ej+0WgQ@mail.gmail.com>

On Sat, Mar 16, 2013 at 2:38 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Wed, Dec 5, 2012 at 6:41 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> Hi David,
>>
>> I've been experimenting with your pull request, thank you:
>> https://github.com/biopython/biopython/pull/116
>>
>
> Hi again David,
>
> I've not used your code as is, but have started by pulling out
> and generalising what I felt was the least contentious part:
>
> https://github.com/biopython/biopython/commit/087712510421ec7f655a7981926a757aa93e9177
>
> This means that label_position = start, middle, end (and some
> historic aliases defined in the linear drawer code) now work
> on circular GenomeDiagrams. I have made the default None,
> which gives the current behaviour (equivalent to 'start' on linear
> diagrams, and the harder-to-explain vertical bottom on circular ones).
>
> Support for allowing the default label orientation to be radially
> consistent all round the circle (rather than the current flipping
> for the left/right halves which assumes the output is kept
> vertical) would be nice, but the thing I am most keen on is the
> inside/outside of the track label placement. Hopefully I'll have
> time to finish that this weekend...

Here's a version on a branch which addresses the label placement
by adding a label_strand argument, where +1 means the label is
on the forward strand side of the track (above or outside), while
-1 means the reverse strand side of the track (below or inside),
and the default is to follow the strand of the feature being drawn.
This seemed to me quite an intuitive arrangement:

https://github.com/peterjc/biopython/tree/label_strand
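The rule described above can be written out as a small helper. The helper is hypothetical and is not the code on the branch. It only shows how a label_strand value of +1, -1 or None maps to a side of the track on linear and circular diagrams. Labelling unstranded features on the forward side is an assumption made for this sketch.

```python
def resolve_label_side(feature_strand, label_strand=None, circular=False):
    """Pick which side of a track a feature label goes on (hypothetical helper).

    label_strand=+1:   forward strand side (above, or outside on circular)
    label_strand=-1:   reverse strand side (below, or inside on circular)
    label_strand=None: follow the feature's own strand
    """
    strand = feature_strand if label_strand is None else label_strand
    if strand not in (+1, -1):
        strand = +1  # assumption: unstranded features are labelled on the forward side
    if circular:
        return "outside" if strand == +1 else "inside"
    return "above" if strand == +1 else "below"
```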

This branch also (without making it optional) switches circular
diagram feature labels to be "outside" the sigil like the linear
diagram, rather than "inside" the sigil. This does tend to take
up more space (which would explain the original motivation),
but labels inside the sigil rarely give a very legible result except with a box sigil
and a very small/short label which falls completely within the
sigil. This could be made a user option if there is demand...
my inclination is not to (the API is already quite complex).

David, I will email you an updated version of your example
script using this branch for you to look at. It allows me to
recreate the same effect as your code (bar the orientation
changes which I have not at this point incorporated).

David & Leighton, what do you think of this label idea?

Peter


From p.j.a.cock at googlemail.com  Mon Mar 18 11:58:49 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 18 Mar 2013 11:58:49 +0000
Subject: [Biopython-dev] Modifications to CircularDrawer
In-Reply-To: <CAKVJ-_7C0Oqk4MXdNPy-ONCDOHF2gRTAWF5pe6sYQ80ej+0WgQ@mail.gmail.com>
References: <959CFF5060375249824CC633DDDF896F1C0C58B1@AMSPRD0410MB351.eurprd04.prod.outlook.com>
	<E72D33BF424829408854FEB604A6959B6E47307C@DUEXC02.ad.hutton.ac.uk>
	<959CFF5060375249824CC633DDDF896F1C0C5C9B@AMSPRD0410MB351.eurprd04.prod.outlook.com>
	<CAKVJ-_7bf_n8dPXvw+pH+edo9aCWVBuiXRL_Gjx62b3faRgjJg@mail.gmail.com>
	<959CFF5060375249824CC633DDDF896F1C0C5E48@AMSPRD0410MB351.eurprd04.prod.outlook.com>
	<CAKVJ-_5LatE+JT04d0dcf_WiTRE9tEUNPw6oPC004H=TTqB43A@mail.gmail.com>
	<CAKVJ-_7LjEdGtm3T26L3wwkraN9C+11mLvjogDBXAK+EDjGeDQ@mail.gmail.com>
	<CAKVJ-_4Fcb5FqnmH+LjRquDBOUDYMgOi_TOG-JGt6Nk6gFD9vw@mail.gmail.com>
	<CAKVJ-_7C0Oqk4MXdNPy-ONCDOHF2gRTAWF5pe6sYQ80ej+0WgQ@mail.gmail.com>
Message-ID: <CAKVJ-_6RMyKfxzjUi9spf6NTSAgo5HNW37KeroDV1Mtcbndidw@mail.gmail.com>

On Sat, Mar 16, 2013 at 8:37 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> David & Leighton, what do you think of this label idea?
>
> Peter

From discussion off list, my branch seems positively
accepted by both, and so I've applied that to the master.
I probably will need to update some images in the Tutorial...

We appear to agree that label orientation is an aesthetic
judgement, and therefore a user option to control this on
circular diagrams would be nice - but I've not done this (yet)
and remain cautious about further complicating this bit of
the code, while trying to keep a consistent API between
the linear and circular drawers.

See also: https://github.com/biopython/biopython/pull/116

Peter


From chapmanb at 50mail.com  Mon Mar 18 16:49:33 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 18 Mar 2013 12:49:33 -0400
Subject: [Biopython-dev] SciPy Bioinformatics symposium: abstracts due
	Wednesday Mar 20th
Message-ID: <87y5dkvejm.fsf@fastmail.fm>


Hi all;
I'm helping organize a bioinformatics mini-symposium as part of SciPy 2013:

Bioinformatics mini-symposia: http://j.mp/Z4xxXB
SciPy info: http://conference.scipy.org/scipy2013/about.php

This is a great chance for the Python bioinformatics community to connect
with the wider Python scientific computing world. SciPy will feature programmers
working on IPython reproducible research, scikit-learn machine learning
approaches, large scale computing problems with NumPy and lots more relevant to
bioinformatics work.

This year there will be a special symposium track dedicated to bioinformatics and
I'd like to encourage everyone to submit abstracts. The deadline is this
Wednesday, March 20th:

http://conference.scipy.org/scipy2013/speaking_overview.php
http://conference.scipy.org/scipy2013/speaking_submission.php

SciPy takes place June 24-29th in Austin, TX. I'm looking forward to seeing lots of
bioinformatics people there. Please feel free to write me if you have any
questions,
Brad


From 88whacko at gmail.com  Wed Mar 20 18:10:14 2013
From: 88whacko at gmail.com (Andrea Rizzi)
Date: Wed, 20 Mar 2013 19:10:14 +0100
Subject: [Biopython-dev] New contributor
In-Reply-To: <1363185895.3324.YahooMailClassic@web164005.mail.gq1.yahoo.com>
References: <CANKZbvM0tveKZL1dm8RRphh4gyPiDcuHpbLkm1k37TKvqaibpA@mail.gmail.com>
	<1363185895.3324.YahooMailClassic@web164005.mail.gq1.yahoo.com>
Message-ID: <CANKZbvNmQy25k9D3PtbGf+sLNZRQ3K8_K9qGXdnhvoV_J1w5nA@mail.gmail.com>

Thank you for your welcome Michiel!

I will be looking for a good project to work on in the next few days and I
will let you know soon. Meanwhile I've started reading some code to become
familiar with the modules, and I bumped into a few small bugs concerning the
Seq objects, in particular I found:

1) a duplicated test method name (one test in test_Seq_objs.py wasn't
performed);
2) an error in Alphabet._case_less().
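A test can silently not run because, in Python, a second method with the same name in a class body simply replaces the first, so unittest never sees the earlier definition. The class and test names below are invented for this example and are not taken from test_Seq_objs.py. A linter such as pyflakes reports this kind of redefinition.

```python
import unittest


class SeqTests(unittest.TestCase):
    def test_upper(self):
        self.assertEqual("acgt".upper(), "ACGT")

    # Reusing the name rebinds it: the class keeps only this second
    # definition, so the first test above is never collected or run.
    def test_upper(self):  # noqa: F811 (deliberate redefinition)
        self.assertEqual("ACGT".upper(), "ACGT")


names = unittest.TestLoader().getTestCaseNames(SeqTests)
print(names)
```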

I've also expanded the documentation a little, and in one MutableSeq
method I've replaced the tostring() call with the suggested str() function.
The branch is located here:

https://github.com/andrrizzi/biopython/tree/seq-branch

I'm not sure whether it is more convenient for you to merge this kind of
commit from a git branch, or whether it is better to open a ticket and
create a patch. Anyway, if you think these small commits may be useful, feel
free to use them.

Best,
Andrea


2013/3/13 Michiel de Hoon <mjldehoon at yahoo.com>

> Hi Andrea,
>
> Welcome to Biopython!
> It's great that you want to contribute.
> Writing & finishing some unit tests sounds like a good idea, and of course
> bug fixing is always welcome.
> Other options are to look at orphan modules in Biopython (modules without
> active maintainers, without documentation, or without unit tests).
> Once you decide what specifically you want to work on, it's good to let us
> know on the mailing list, to see if anybody else is working on the same.
>
> Good luck!
>
> Best,
> -Michiel.
>
>
>
> --- On Sat, 3/2/13, Andrea Rizzi <88whacko at gmail.com> wrote:
>
> > From: Andrea Rizzi <88whacko at gmail.com>
> > Subject: [Biopython-dev] New contributor
> > To: biopython-dev at biopython.org
> > Date: Saturday, March 2, 2013, 11:49 AM
> > Hello!
> > My name is Andrea Rizzi and I'm a master's student in computer science and
> > computational biology. I would be glad to help you developing biopython.
> > I've used the library quite extensively but I'm mostly familiar with
> > handling sequences, MSAs and PDB files.
> >
> > I've read through the small contributing guide on the wiki and on the
> > tutorial and I thought I could start with something relatively
> > straightforward like writing/completing some unit tests (if I understood
> > correctly there's a fairly strong need for them). I've good knowledge of
> > both git and unittest. Anyway any task is actually fine to me :) .
> >
> > If you agree I'll try to look for a module that needs some more testing (or
> > maybe you have one to suggest me), otherwise I could just go to the bug
> > tracker and try to help out fixing some bugs.
> >
> > --
> > -- Andrea
>


-- 
-- Andrea


From p.j.a.cock at googlemail.com  Thu Mar 21 12:17:51 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 21 Mar 2013 12:17:51 +0000
Subject: [Biopython-dev] New contributor
In-Reply-To: <CANKZbvNmQy25k9D3PtbGf+sLNZRQ3K8_K9qGXdnhvoV_J1w5nA@mail.gmail.com>
References: <CANKZbvM0tveKZL1dm8RRphh4gyPiDcuHpbLkm1k37TKvqaibpA@mail.gmail.com>
	<1363185895.3324.YahooMailClassic@web164005.mail.gq1.yahoo.com>
	<CANKZbvNmQy25k9D3PtbGf+sLNZRQ3K8_K9qGXdnhvoV_J1w5nA@mail.gmail.com>
Message-ID: <CAKVJ-_43AKSoGRPhW1M6vYnbk+HedYeTLEKUxo5VGT9A12u6BA@mail.gmail.com>

On Wed, Mar 20, 2013 at 6:10 PM, Andrea Rizzi <88whacko at gmail.com> wrote:
> Thank you for your welcome Michiel!
>
> I will be looking for a good project to work on in the next few days and I
> will let you know soon. Meanwhile I've started reading some code to become
> familiar with the modules, and I bumped into a few small bugs concerning the
> Seq objects, in particular I found:
>
> 1) a duplicated test method name (one test in test_Seq_objs.py wasn't
> performed);
> 2) an error in Alphabet._case_less().

Well spotted - changes applied to the master, thanks.

> I've also expanded the documentation a little, and in one MutableSeq
> method I've replaced the tostring() call with the suggested str() function.
> The branch is located here:
>
> https://github.com/andrrizzi/biopython/tree/seq-branch
>
> I'm not sure whether it is more convenient for you to merge this kind of
> commit from a git branch, or whether it is better to open a ticket and
> create a patch. Anyway, if you think these small commits may be useful, feel
> free to use them.

If you're happy on GitHub, a pull request is simplest. I've looked
at these changes one by one and applied and/or commented
on them.

(We're debating moving our issue tracker from Redmine to
GitHub, which would make things a little easier in future).

Thank you!

Peter


From p.j.a.cock at googlemail.com  Thu Mar 21 16:11:44 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 21 Mar 2013 16:11:44 +0000
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CABHxouWGVH6R=pXTVp0yHAm62Gfu+sKSNrhLR+g3ydvoeO02bw@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
	<CABHxouWGVH6R=pXTVp0yHAm62Gfu+sKSNrhLR+g3ydvoeO02bw@mail.gmail.com>
Message-ID: <CAKVJ-_74KnM9H5O1vb2hAwTm3HUN2u70NY2KZ_EGDYhm9Jc4zA@mail.gmail.com>

On Fri, Mar 15, 2013 at 11:06 PM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:
> Hi All,
> I would add one more (old) idea to the GSoC pool, i.e. adding support
> for different biological ontologies to biopython.
>
> This was already discussed some time ago
> (http://www.biopython.org/w/index.php?title=Gene_Ontology&redirect=no)
> mostly in the context of gene ontology, and to some extent this is
> addressed by the development of GOAtools
> (https://github.com/tanghaibao/goatools), but I think it would be
> worth having decent support for OBO-file-based ontologies (not only
> gene ontology; I'm also interested myself in anatomical ontologies,
> and there are others available at obofoundry.org) in biopython.
>
> I think it would need to include support for IO operations on both OBO
> and annotation files, as well as statistical enrichment measures and
> potentially some visualisation.
>
> Would anyone be interested in co-mentoring this project? There is one
> student in my department who would be interested in applying to GSoC
> for this project, but I think it would be great if other people joined
> the discussion on the functionality and having more people involved is
> always better...
>
> best
> Bartek Wilczynski

That's a good idea. I would have used this recently with some GO
stuff (e.g. given a GO term, is it a molecular function, biological
process, or cellular component? This can be solved easily by traversing
up any branch of the DAG).
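The following minimal sketch shows that GO lookup. It reads the `is_a` links from OBO `[Term]` stanzas and walks upwards until one of the three GO root terms is reached. It assumes that is_a links alone lead to a root, and it ignores other relationships such as part_of. GO's OBO files also carry an explicit `namespace:` tag on each term, so the walk here mainly illustrates DAG traversal. The function names are hypothetical.

```python
# The three GO root terms, one per aspect (namespace)
GO_ROOTS = {
    "GO:0003674": "molecular_function",
    "GO:0008150": "biological_process",
    "GO:0005575": "cellular_component",
}


def parse_obo_is_a(lines):
    """Minimal OBO reader: map each [Term] id to its list of is_a parent ids."""
    parents = {}
    in_term = False
    term_id = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # New stanza; only [Term] stanzas are of interest here
            in_term = line == "[Term]"
            term_id = None
        elif in_term and line.startswith("id:"):
            term_id = line[len("id:"):].strip()
            parents[term_id] = []
        elif in_term and term_id and line.startswith("is_a:"):
            # "is_a: GO:0008150 ! biological_process" -> "GO:0008150"
            parents[term_id].append(line[len("is_a:"):].split()[0])
    return parents


def go_aspect(term_id, parents):
    """Walk up is_a links (depth first) until one of the GO roots is reached."""
    seen = set()
    stack = [term_id]
    while stack:
        current = stack.pop()
        if current in GO_ROOTS:
            return GO_ROOTS[current]
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return None
```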

Right now we need to put this list of ideas on the wiki page (ready
for combining into the OBF page which will be shown to Google
to make our case for taking part in the GSoC 2013 program).
http://biopython.org/wiki/Google_Summer_of_Code

If any of you as a potential mentor want to put up an outline
proposal, even better.

Peter


From p.j.a.cock at googlemail.com  Thu Mar 21 16:29:29 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 21 Mar 2013 16:29:29 +0000
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAKVJ-_74KnM9H5O1vb2hAwTm3HUN2u70NY2KZ_EGDYhm9Jc4zA@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
	<CABHxouWGVH6R=pXTVp0yHAm62Gfu+sKSNrhLR+g3ydvoeO02bw@mail.gmail.com>
	<CAKVJ-_74KnM9H5O1vb2hAwTm3HUN2u70NY2KZ_EGDYhm9Jc4zA@mail.gmail.com>
Message-ID: <CAKVJ-_45Y559rqS=A4Ho-NAAMdsjUk1JA5JDVwidhdRLJC8rHQ@mail.gmail.com>

On Thu, Mar 21, 2013 at 4:11 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> Right now we need to put this list of ideas on the wiki page (ready
> for combining into the OBF page which will be shown to Google
> to make our case for taking part in the GSoC 2013 program).
> http://biopython.org/wiki/Google_Summer_of_Code
>
> If any of you as a potential mentor want to put up an outline
> proposal, even better.
>

I've been wondering about potential GSoC projects which I'd
be interested in mentoring (or co-mentoring), and thus far I've
only got one outline idea.

I'm interested in taking the Bio.SeqIO.index(...) / index_db(...)
functionality (which does whole record parsing on demand)
and extending this with lazy-loading or lazy-parsing (which
has precedent in our BioSQL wrappers). For example, with
whole genome FASTA files you may never need to load the
entire sequence, but using an index system like tabix (or
even actually using a tabix index) Biopython could provide
a lazy-loading Seq object which extracts only the sequence
region of interest on demand.
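The email mentions tabix; for plain FASTA the closest existing scheme is the samtools faidx `.fai` index (name, length, byte offset, bases per line, bytes per line), and the sketch below borrows that arithmetic. It is not Biopython API: `index_fasta` and `LazyFastaSeq` are hypothetical names, and it assumes every sequence line in a record except the last has the same length, which is the same assumption faidx makes.

```python
import io


def index_fasta(handle):
    """Scan a binary FASTA handle once, recording faidx-style fields per record."""
    index = {}
    name = None
    handle.seek(0)
    while True:
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            name = line[1:].split()[0].decode()
            index[name] = {"offset": handle.tell(), "length": 0,
                           "line_bases": 0, "line_width": 0}
        elif name is not None:
            entry = index[name]
            bases = len(line.rstrip(b"\r\n"))
            if entry["line_bases"] == 0:
                entry["line_bases"] = bases        # bases per full line
                entry["line_width"] = len(line)    # bytes per line, incl. newline
            entry["length"] += bases
    return index


class LazyFastaSeq:
    """Read-only, sliceable view of one record; bases are read on demand."""

    def __init__(self, handle, offset, length, line_bases, line_width):
        self.handle = handle
        self.offset = offset
        self.length = length
        self.line_bases = line_bases
        self.line_width = line_width

    def __len__(self):
        return self.length

    def _byte_offset(self, i):
        # Same arithmetic as a .fai lookup: skip whole lines, then step into the line.
        lines, column = divmod(i, self.line_bases)
        return self.offset + lines * self.line_width + column

    def __getitem__(self, index):
        if not isinstance(index, slice):
            raise TypeError("this sketch only supports slicing")
        start, stop, step = index.indices(self.length)
        if step != 1:
            raise ValueError("this sketch only supports step 1")
        if start >= stop:
            return ""
        first = self._byte_offset(start)
        self.handle.seek(first)
        raw = self.handle.read(self._byte_offset(stop - 1) - first + 1)
        return raw.replace(b"\r", b"").replace(b"\n", b"").decode("ascii")


example = io.BytesIO(b">chr1 demo\nACGTACGTAC\nGGGGCCCCAA\nTT\n>chr2\nAAAA\n")
fields = index_fasta(example)["chr1"]
chr1 = LazyFastaSeq(example, **fields)
```

Slicing `chr1[8:13]` reads only the six bytes spanning that region (including one newline) rather than the whole record.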

The same idea applies to richer file formats too, like EMBL
and GenBank. Here lazy loading the sequence is actually
easier (the number of bases per line is strictly defined),
but you can apply the same ideas to lazy loading features
too. This means indexing both the sequence and the feature
table.
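As a sketch of why the fixed layout helps, assume the usual GenBank ORIGIN layout (60 bases per line, a 9-character right-justified position number, then six blocks of 10 bases each preceded by a space) and plain "\n" line endings. The byte offset of any base can then be computed directly and only the covering bytes read. The helper names are hypothetical, and a real implementation would record where the sequence lines start while indexing; `format_origin` exists only to build test data.

```python
import io

LINE_BASES = 60  # bases per full GenBank sequence line
LINE_WIDTH = 76  # 9-char position + 6 x (space + 10 bases) + newline


def base_offset(i):
    """Byte offset of 0-based base i, relative to the first sequence line."""
    line, within = divmod(i, LINE_BASES)
    block, column = divmod(within, 10)
    return line * LINE_WIDTH + 10 + block * 11 + column


def fetch(handle, seq_start, start, end):
    """Return bases [start, end) by reading only the bytes that cover them."""
    if start >= end:
        return ""
    first = seq_start + base_offset(start)
    last = seq_start + base_offset(end - 1)
    handle.seek(first)
    chunk = handle.read(last - first + 1)
    # Drop the spaces, newlines and position numbers between the bases.
    return bytes(c for c in chunk if chr(c).isalpha()).decode("ascii")


def format_origin(seq):
    """Write seq in GenBank ORIGIN-section layout (used here to build test data)."""
    lines = []
    for i in range(0, len(seq), LINE_BASES):
        blocks = [seq[j:j + 10] for j in range(i, min(i + LINE_BASES, len(seq)), 10)]
        lines.append(str(i + 1).rjust(9) + "".join(" " + b for b in blocks))
    return "\n".join(lines) + "\n"


demo_seq = "gatc" * 50
demo = io.BytesIO(b"ORIGIN\n" + format_origin(demo_seq).encode() + b"//\n")
demo_start = len(b"ORIGIN\n")  # where the first sequence line begins
```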

Likewise, this makes sense for GTF/GFF/GFF3 where you
would index the features, and also if present index the
embedded FASTA sequence at the end of the file. Clearly
handling this would ideally build on Lenna and Brad's work
with the underlying parser.

With what I have in mind, there are two technical sides to
this. First, the index format (binning strategies etc) for
which we should review tabix and BAM's indexing and its
planned replacement CSI (able to handle longer references).
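On the binning side, BAM's .bai and tabix indexes use a hierarchical binning scheme, and CSI generalises it with two parameters, min_shift and depth. Below is a Python transcription of the reg2bin/reg2bins pseudocode given in the SAM/CSI specifications. With min_shift=14 and depth=5 it reproduces the BAI bins (16 kb smallest bin, 2^29 bp addressable); raising depth is how CSI reaches longer references.

```python
def reg2bin(beg, end, min_shift=14, depth=5):
    """Smallest bin fully containing the 0-based, half-open interval [beg, end)."""
    end -= 1
    shift = min_shift
    offset = ((1 << depth * 3) - 1) // 7  # index of the first bin at the finest level
    level = depth
    while level > 0:
        if beg >> shift == end >> shift:
            return offset + (beg >> shift)
        level -= 1
        shift += 3
        offset -= 1 << level * 3
    return 0


def reg2bins(beg, end, min_shift=14, depth=5):
    """All bins that may hold features overlapping [beg, end), coarsest level first."""
    end -= 1
    bins = []
    shift = min_shift + depth * 3
    offset = 0
    for level in range(depth + 1):
        bins.extend(range(offset + (beg >> shift), offset + (end >> shift) + 1))
        offset += 1 << level * 3
        shift -= 3
    return bins
```

An index then maps each bin number to the file offsets of the records stored in it, and a region query only visits the bins returned by `reg2bins`.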

Second, to avoid code duplication, this would mean some
re-factoring of the existing parser code to ensure that if
a record is loaded in full via the traditional API, it would
go through the same code as if it were loaded via the new
lazy loading approach. Potentially the existing parsers
could optionally also become lazy loaders (contingent
on this requiring ownership of the file handle as it will
use seek and tell to move the file pointer). That in theory
could make our parsers much faster (depending on the
overheads) for tasks where only a minority of the data
is ever used. I've had some fun chats with Pjotr Prins
from BioRuby about this at a CodeFest/BOSC meeting.
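To illustrate the "one code path" point, here is a minimal FASTA-only sketch. All names are hypothetical and this is not how Biopython's parsers are organised internally: a single `_parse_record` function is used both by an eager `parse()` iterator and by a `LazyRecord` proxy that stores only a file offset. The proxy calls seek(), which is why it needs exclusive use of the handle.

```python
import io


def _parse_record(handle):
    """Parse one FASTA record starting at the handle's current position.

    This is the single code path shared by the eager and lazy APIs below.
    """
    title = handle.readline().rstrip(b"\r\n")[1:].decode()
    chunks = []
    while True:
        pos = handle.tell()
        line = handle.readline()
        if not line or line.startswith(b">"):
            handle.seek(pos)  # leave the next header for the next call
            break
        chunks.append(line.strip())
    return title, b"".join(chunks).decode()


def parse(handle):
    """Eager iteration: parse every record, one after another."""
    while True:
        pos = handle.tell()
        if not handle.readline():
            return
        handle.seek(pos)
        yield _parse_record(handle)


def record_offsets(handle):
    """Index pass: remember only where each record starts."""
    handle.seek(0)
    offsets = []
    while True:
        pos = handle.tell()
        line = handle.readline()
        if not line:
            return offsets
        if line.startswith(b">"):
            offsets.append(pos)


class LazyRecord:
    """Hold an offset; parse via the shared function on first attribute access."""

    def __init__(self, handle, offset):
        self._handle = handle
        self._offset = offset
        self._parsed = None

    def _load(self):
        if self._parsed is None:
            self._handle.seek(self._offset)  # moves the shared file pointer
            self._parsed = _parse_record(self._handle)
        return self._parsed

    @property
    def id(self):
        return self._load()[0].split()[0]

    @property
    def seq(self):
        return self._load()[1]


EXAMPLE = b">a first\nAC\nGT\n>b second\nTTT\n"
```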

Brad and Lenna, I've CC'd you explicitly as I'm guessing
from the GFF work you are most likely to have considered
some of these issues.

Does this sound like something worth exploring further,
and worth proposing as an outline GSoC project? I think
it would be quite a challenging project - but like last year,
it is something I would like to try myself if I had the time.

Regards,

Peter


From p.j.a.cock at googlemail.com  Thu Mar 21 17:01:51 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 21 Mar 2013 17:01:51 +0000
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
Message-ID: <CAKVJ-_5fX+dFL1CJH47cjsz2gEU7HGi83OZyb3O2iCDQUiSwww@mail.gmail.com>

On Wed, Mar 13, 2013 at 6:32 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
>
> I like Michiel's idea, and I'll suggest two more:
>
> 1. Codon alignment & analysis:

Already up on the wiki :)

>
> 2. Phylo enhancements:
> 2a. Tree drawing:
> - A proper draw_unrooted function to perform radial layout, with an optional
> "iterations" argument to use Felsenstein's Equal Daylight algorithm -- I
> feel this layout approach is neglected in most libraries.
> - Better matplotlib/pylab integration, so the plot components can be tweaked
> using matplotlib functions.
> - Other common layout approaches, e.g. circular.
> 2b. A "Phylo.consensus" module:
> - strict consensus, like Bio.Nexus already implements.
> - other consensus methods, time permitting.
> 2c. A "Phylo.distance" module:
> - Robinson-Foulds distance -- though others might be working on this
> already.
> 2d. Simple tree inference:
> - Straightforward algorithms exist for neighbor-joining and parsimony tree
> estimation. For small alignments (and perhaps medium-sized ones with PyPy),
> it would be nice to run these without an external program, e.g. to construct
> a guide tree for another algorithm or quickly view a phylogenetic clustering
> of sequences.

One more idea for a sub-task?

2e. Using multiple trees for bootstrapping a master tree. Take the master
tree: each edge defines a partition of the leaves, which can be used as a
dictionary key (e.g. as a binary representation). Then for each of the
bootstrap runs, look at each edge, compute the key for that split of the
leaves, and increment its count. At the end, you have a dictionary of
counts, which are the branch bootstrap supports.

I wrote that once in Python some time back, and used it to take a set
of bootstrap trees generated on a cluster and give the support values
to the master tree.
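The edge-hashing idea can be sketched in a few lines. This is not the code referred to above. It assumes trees are written as nested tuples of leaf names (a hypothetical stand-in for Bio.Phylo objects) and that every replicate has the same leaf set as the master. Each split is stored as an integer bitmask of the side that does not contain the alphabetically first leaf, so both orientations of an edge map to the same key.

```python
from collections import Counter


def leaf_names(tree):
    """Leaves of a tree written as nested tuples of leaf-name strings."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaf_names(child)]


def splits(tree, index):
    """Canonical bitmask for every non-trivial leaf bipartition (edge) in tree."""
    full = (1 << len(index)) - 1
    found = set()

    def walk(node):
        if isinstance(node, str):
            return 1 << index[node]
        mask = 0
        for child in node:
            mask |= walk(child)
        # Both sides of an edge describe the same split; keep the side
        # without leaf 0 so that both orientations hash identically.
        canon = full ^ mask if mask & 1 else mask
        if bin(canon).count("1") > 1 and bin(full ^ canon).count("1") > 1:
            found.add(canon)
        return mask

    walk(tree)
    return found


def bootstrap_support(master, replicates):
    """For each edge of master, count how many replicate trees contain it."""
    names = sorted(leaf_names(master))
    index = {name: i for i, name in enumerate(names)}
    counts = Counter()
    for tree in replicates:
        counts.update(splits(tree, index))
    return {
        frozenset(n for i, n in enumerate(names) if split >> i & 1): counts[split]
        for split in splits(master, index)
    }
```

For the master tree `((("A", "B"), "C"), ("D", "E"))`, the edge separating A,B from C,D,E is reported under the key `frozenset({"C", "D", "E"})`.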

>
> Any interest in either of these? Shall I add them to the wiki?
>

They both seem worth posting on the wiki, although we may not have
enough mentors for both to go ahead :(

Peter


From p.j.a.cock at googlemail.com  Thu Mar 21 16:55:30 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 21 Mar 2013 16:55:30 +0000
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
Message-ID: <CAKVJ-_7CNkGiKS2=ipQu5EzmKifeeOMEqfa7X2yuW+SOkVQfQw@mail.gmail.com>

On Wed, Mar 13, 2013 at 6:32 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> I like Michiel's idea, and I'll suggest two more:
>
> 1. Codon alignment & analysis:
> - PAL2NAL-style conversion of unaligned nucleic acid sequences and a protein
> sequence alignment to a codon alignment. (Previously discussed)

e.g. https://github.com/peterjc/picobio/blob/master/align/align_back_trans.py

> - dN/dS and the related functions needed to calculate it.
> - Possible AlignIO or MultipleSeqAlignment tweaks to take full advantage of
> codon alignments, including validation (testing for frame shifts etc.)

http://biopython.org/wiki/Google_Summer_of_Code#Codon_alignment_and_analysis

I see you've started fleshing this idea out on the wiki, which is great.
Right now it seems a little on the lightweight side - or is that deliberate
(to see if a student can take this idea and come up with a solid
project proposal in this area)? Things like model selection might
be a fun extension - I can think of a local expert who would be
great to get involved on the science side if he's interested.
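The core of a PAL2NAL-style conversion is threading each ungapped coding sequence onto its gapped protein row, one codon per residue and "---" per gap. This is a minimal sketch with a hypothetical `back_translate` helper. It assumes plain strings and a nucleotide length of exactly three times the number of residues (no stop codon, frame shift or untranslated flank). It does not check that each codon actually translates to its residue.

```python
def back_translate(aligned_protein, nucleotides, gap="-"):
    """Return the codon-aligned row for one gapped protein and its ungapped CDS."""
    residues = len(aligned_protein) - aligned_protein.count(gap)
    if len(nucleotides) != 3 * residues:
        raise ValueError(
            "expected %i bases for %i residues, got %i"
            % (3 * residues, residues, len(nucleotides))
        )
    codons = iter([nucleotides[i:i + 3] for i in range(0, len(nucleotides), 3)])
    # Each residue consumes the next codon; each gap becomes a gap codon.
    return "".join(gap * 3 if aa == gap else next(codons) for aa in aligned_protein)
```

A fuller version would verify each codon, for example with Bio.Seq's translate(), and report mismatches such as frame shifts.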

Alternatively this could include doing some more general work
on the alignment object - for instance per-column-annotation
for things like a consensus sequence - or an array-of-char
implementation as an alternative to the list-of-SeqRecords
we have now (with its poor column access speed).

Peter


From p.j.a.cock at googlemail.com  Thu Mar 21 17:29:44 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 21 Mar 2013 17:29:44 +0000
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CADEGkF6s6vbTeutQrx6g_V1yCMFeKKCtouPKc_CgZyf52hd1PA@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<CADEGkF6s6vbTeutQrx6g_V1yCMFeKKCtouPKc_CgZyf52hd1PA@mail.gmail.com>
Message-ID: <CAKVJ-_6Na2QWDKaVO5wK+QDKTv=Vh-6PGi33hmShz-qErVRkSg@mail.gmail.com>

On Tue, Feb 12, 2013 at 6:29 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi everyone,
>
> It's more or less a 'low hanging fruit', but I've been thinking
> perhaps it may be useful if we have our own interface to the HMMER3
> online service? The corresponding SearchIO parsers may be written for
> this as well (they return different formats for which we haven't any
> parsers currently).

Worth adding to the projects list here (or filing an enhancement bug)
http://biopython.org/wiki/Active_projects#Project_ideas - but not
enough to base a whole GSoC project around.

> And I think there are more things being worked on, not yet mentioned
> in the wiki:
>
> 1. Porting our docs to Sphinx[1]
> 2. Converting some/all of the print and compare tests to unit tests.
> For example, our Bio.Seq's tests are still print and compare tests.
>
> regards,
> Bow
>
> [1] See the original feature request here:
> https://redmine.open-bio.org/issues/3221
> https://redmine.open-bio.org/issues/3220
> https://redmine.open-bio.org/issues/3219

I don't think a purely documentation-focused project is eligible
for GSoC. But both ideas make sense separately from GSoC.

Regards,

Peter


From p.j.a.cock at googlemail.com  Thu Mar 21 17:36:24 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 21 Mar 2013 17:36:24 +0000
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAKVJ-_45Y559rqS=A4Ho-NAAMdsjUk1JA5JDVwidhdRLJC8rHQ@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
	<CABHxouWGVH6R=pXTVp0yHAm62Gfu+sKSNrhLR+g3ydvoeO02bw@mail.gmail.com>
	<CAKVJ-_74KnM9H5O1vb2hAwTm3HUN2u70NY2KZ_EGDYhm9Jc4zA@mail.gmail.com>
	<CAKVJ-_45Y559rqS=A4Ho-NAAMdsjUk1JA5JDVwidhdRLJC8rHQ@mail.gmail.com>
Message-ID: <CAKVJ-_7VQ3AS=A-VJ9XxzDTwx0qvrCBEQdKmJwHf8NyXXu3aLg@mail.gmail.com>

On Thu, Mar 21, 2013 at 4:29 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Mar 21, 2013 at 4:11 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>
>> Right now we need to put this list of ideas on the wiki page (ready
>> for combining into the OBF page which will be shown to Google
>> to make our case for taking part in the GSoC 2013 program).
>> http://biopython.org/wiki/Google_Summer_of_Code
>>
>> If any of you as a potential mentor want to put up an outline
>> proposal, even better.
>>
>
> I've been wondering about potential GSoC projects which I'd
> be interested in mentoring (or co-mentoring), and thus far I've
> only got one outline idea.
>
> I'm interested in taking the Bio.SeqIO.index(...) / index_db(...)
> functionality (which does whole record parsing on demand)
> and extending this with lazy-loading or lazy-parsing (which
> has precedent in our BioSQL wrappers). For example, with
> whole genome FASTA files you may never need to load the
> entire sequence, but using an index system like tabix (or
> even actually using a tabix index) Biopython could provide
> a lazy-loading Seq object which extracts only the sequence
> region of interest on demand.
>
> The same idea applies to richer file formats too, like EMBL
> and GenBank. ...
>
> Likewise, this makes sense for GTF/GFF/GFF3 ...

P.S. An example use case, http://www.biostars.org/p/64363/

Part of this work could include enhancements to the SeqRecord
handling of SeqFeatures - offering more than just the current
simple list - for example lookup by ID, dbxref, or position. That
would be nice to have now with the current in-memory parsers.
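Here is a sketch of what richer feature lookup could look like. It uses a hypothetical `Feature` stand-in with 0-based, half-open start/end rather than Biopython's SeqFeature, whose positions live under `feature.location`. IDs and dbxrefs become dictionary lookups. Overlap queries use bisect over features sorted by start, bounded by the longest feature length.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical stand-in for a SeqFeature (0-based, half-open location)."""
    id: str
    start: int
    end: int
    dbxrefs: list = field(default_factory=list)


class FeatureIndex:
    """Lookup by ID, by dbxref, or by overlapping position."""

    def __init__(self, features):
        self.by_id = {f.id: f for f in features}
        self.by_dbxref = {}
        for f in features:
            for ref in f.dbxrefs:
                self.by_dbxref.setdefault(ref, []).append(f)
        self._sorted = sorted(features, key=lambda f: f.start)
        self._starts = [f.start for f in self._sorted]
        self._max_len = max((f.end - f.start for f in features), default=0)

    def overlapping(self, start, end):
        """Features overlapping [start, end), in order of start position."""
        # No overlapping feature can start before start - max_len
        # or at/after end, so only that slice needs checking.
        lo = bisect.bisect_left(self._starts, start - self._max_len)
        hi = bisect.bisect_left(self._starts, end)
        return [f for f in self._sorted[lo:hi] if f.end > start]
```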

An old but still relevant example use case:
http://www.warwick.ac.uk/go/peter_cock/python/genbank/#indexing_features

Regards,

Peter


From eric.talevich at gmail.com  Thu Mar 21 17:42:19 2013
From: eric.talevich at gmail.com (Eric Talevich)
Date: Thu, 21 Mar 2013 13:42:19 -0400
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAKVJ-_7CNkGiKS2=ipQu5EzmKifeeOMEqfa7X2yuW+SOkVQfQw@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
	<CAKVJ-_7CNkGiKS2=ipQu5EzmKifeeOMEqfa7X2yuW+SOkVQfQw@mail.gmail.com>
Message-ID: <CAMC681k8VoX3pe+XxXF1P8nSSR5tuMpVooKJtQEdsFksL9m2=w@mail.gmail.com>

On Thu, Mar 21, 2013 at 12:55 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Wed, Mar 13, 2013 at 6:32 PM, Eric Talevich <eric.talevich at gmail.com>
> wrote:
> > I like Michiel's idea, and I'll suggest two more:
> >
> > 1. Codon alignment & analysis:
> > - PAL2NAL-style conversion of unaligned nucleic acid sequences and a protein
> > sequence alignment to a codon alignment. (Previously discussed)
>
> e.g. https://github.com/peterjc/picobio/blob/master/align/align_back_trans.py


Well, check you out. Would you be interested in mentoring this project?


> > - dN/dS and the related functions needed to calculate it.
> > - Possible AlignIO or MultipleSeqAlignment tweaks to take full advantage of
> > codon alignments, including validation (testing for frame shifts etc.)
>
>
> http://biopython.org/wiki/Google_Summer_of_Code#Codon_alignment_and_analysis
>
> I see you've started fleshing this idea out on the wiki, which is great.
> Right now it seems a little on the lightweight side - or is that deliberate
> (to see if a student can take this idea and come up with a solid
> project proposal in this area)? Things like model selection might
> be a fun extension - I can think of a local expert who would be
> great to get involved on the science side if he's interested.
>

I put up a quick sketch to avoid locking the wiki page for too long, but
also deliberately left it vague to see where the applicants take it. Model
selection would be cool, I added it. Local expert, also great.


> Alternatively this could include doing some more general work
> on the alignment object - for instance per-column-annotation
> for things like a consensus sequence - or an array-of-char
> implementation as an alternative to the list-of-SeqRecords
> we have now (with its poor column access speed).
>
> Peter
>

I wonder if that's something we could just do incrementally -- change the
MultipleSeqAlignment class to store a list-of-lists-of-chars (or
list-of-strings), a list of SeqRecord-like husks (all the annotations, but
without the Seq itself) for each row, a list of column annotations, and a
single alphabet for the whole alignment.
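Eric's proposed layout could look roughly like the following. This is a sketch only: every class and attribute name is hypothetical, and the alphabet is reduced to a plain string placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class RowHusk:
    """SeqRecord-like annotations for one row, without the sequence itself."""
    id: str
    description: str = ""
    annotations: dict = field(default_factory=dict)


@dataclass
class SketchAlignment:
    """Rows as plain strings, with row and column annotations kept separately."""
    rows: list              # one string per sequence, all the same length
    husks: list             # one RowHusk per row
    alphabet: str = "generic"  # placeholder for a single shared alphabet
    column_annotations: dict = field(default_factory=dict)  # name -> per-column values

    def __post_init__(self):
        if len({len(r) for r in self.rows}) > 1:
            raise ValueError("all rows must have the same length")
        if len(self.husks) != len(self.rows):
            raise ValueError("need exactly one husk per row")

    def column(self, i):
        """Column i as a string, top row first."""
        return "".join(row[i] for row in self.rows)
```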

How do you suppose the speed of that would compare to the current
list-of-SeqRecords, and also to that of a wrapped NumPy matrix? Would it be
a significant enough speed improvement to justify both replacing the
current implementation, and to make the NumPy approach less tempting (given
PyPy's progress toward including a compliant implementation)?
Alternatively, we could post a GSoC project for creating a separate
TurboAlignment class/module based on NumPy which would be mostly
interchangeable and interconvertible with the pure-Python version in the
Biopython core.

Speaking of which, should we also post the idea of storing sequences as an
efficient byte array, BioJava-style?

-Eric


From p.j.a.cock at googlemail.com  Thu Mar 21 17:59:10 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 21 Mar 2013 17:59:10 +0000
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAMC681k8VoX3pe+XxXF1P8nSSR5tuMpVooKJtQEdsFksL9m2=w@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
	<CAKVJ-_7CNkGiKS2=ipQu5EzmKifeeOMEqfa7X2yuW+SOkVQfQw@mail.gmail.com>
	<CAMC681k8VoX3pe+XxXF1P8nSSR5tuMpVooKJtQEdsFksL9m2=w@mail.gmail.com>
Message-ID: <CAKVJ-_60s40p0PNihxayxFEWRg+jx=KbxRsOO5z=AXO0hgq8Mw@mail.gmail.com>

On Thu, Mar 21, 2013 at 5:42 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> On Thu, Mar 21, 2013 at 12:55 PM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>>
>> On Wed, Mar 13, 2013 at 6:32 PM, Eric Talevich <eric.talevich at gmail.com>
>> wrote:
>> > I like Michiel's idea, and I'll suggest two more:
>> >
>> > 1. Codon alignment & analysis:
>> > - PAL2NAL-style conversion of unaligned nucleic acid sequences and a protein
>> > sequence alignment to a codon alignment. (Previously discussed)
>>
>> e.g. https://github.com/peterjc/picobio/blob/master/align/align_back_trans.py
>
> Well, check you out. Would you be interested in mentoring this project?
>

If I'm not primary mentor on another project, I'd be open to co-mentoring
something on the alignment side.

>> > - dN/dS and the related functions needed to calculate it.
>> > - Possible AlignIO or MultipleSeqAlignment tweaks to take full advantage of
>> > codon alignments, including validation (testing for frame shifts etc.)
>>
>>
>> http://biopython.org/wiki/Google_Summer_of_Code#Codon_alignment_and_analysis
>>
>> I see you've started fleshing this idea out on the wiki, which is great.
>> Right now it seems a little on the lightweight side - or is that deliberate
>> (to see if a student can take this idea and come up with a solid
>> project proposal in this area)? Things like model selection might
>> be a fun extension - I can think of a local expert who would be
>> great to get involved on the science side if he's interested.
>
>
> I put up a quick sketch to avoid locking the wiki page for too long, but
> also deliberately left it vague to see where the applicants take it. Model
> selection would be cool, I added it. Local expert, also great.

If he's available and willing, yes. I've not mentioned this to him
yet so no promises - the idea only occurred to me while writing
that email ;)

>>
>> Alternatively this could include doing some more general work
>> on the alignment object - for instance per-column-annotation
>> for things like a consensus sequence - or an array-of-char
>> implementation as an alternative to the list-of-SeqRecords
>> we have now (with its poor column access speed).
>>
>> Peter
>
>
> I wonder if that's something we could just do incrementally -- change the
> MultipleSeqAlignment class to store a list-of-lists-of-chars (or
> list-of-strings), a list of SeqRecord-like husks (all the annotations, but
> without the Seq itself) for each row, a list of column annotations, and a
> single alphabet for the whole alignment.
>
> How do you suppose the speed of that would compare to the current
> list-of-SeqRecords, and also to that of a wrapped NumPy matrix? Would it be
> a significant enough speed improvement to justify both replacing the current
> implementation, and to make the NumPy approach less tempting (given PyPy's
> progress toward including a compliant implementation)? Alternatively, we
> could post a GSoC project for creating a separate TurboAlignment
> class/module based on NumPy which would be mostly interchangeable and
> interconvertible with the pure-Python version in the Biopython core.

When I said array-of-char I did have NumPy in mind, and PyPy does now
cope with two or more dimensional arrays in NumPyPy. Note that NumPy
handles both row and column orientated arrays with a simple class init
option, so this can easily be set up to favour row or column access.
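The row versus column orientation corresponds to the `order` argument of `numpy.array` ("C" for row-major, "F" for column-major). This is a small sketch, assuming NumPy is installed and the alignment is plain ASCII.

```python
import numpy as np

rows = ["ACGT-", "ACGTA", "TCGTA"]

# One byte per cell (dtype "S1"). order="F" lays the array out column by
# column, so each alignment column is a contiguous block of memory.
by_column = np.array([[c.encode() for c in row] for row in rows], dtype="S1", order="F")
by_row = np.ascontiguousarray(by_column)  # same data, row-major copy

first_column = by_column[:, 0].tobytes()  # b"AAT"
gaps_per_column = (by_column == b"-").sum(axis=0)
```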

Last time I did anything with the alignment object where column access
was a bottleneck (calculating mutual information between columns), I
just loaded all the columns into memory as a list of strings, and computed
on that. It worked very nicely.
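The "transpose once, then work on column strings" approach might look like the following. The mutual-information function is a simple plug-in estimate in bits that treats gaps as an ordinary symbol and applies no small-sample correction. It is an illustration, not the code described above.

```python
from collections import Counter
from math import log2


def columns(rows):
    """Transpose equal-length row strings into a list of column strings."""
    return ["".join(col) for col in zip(*rows)]


def mutual_information(col_a, col_b):
    """Plug-in mutual information (bits) between two alignment columns."""
    n = len(col_a)
    joint = Counter(zip(col_a, col_b))
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    # sum over observed pairs of p(a,b) * log2(p(a,b) / (p(a) * p(b)))
    return sum(
        (c / n) * log2(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
```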

> Speaking of which, should we also post the idea of storing sequences as an
> efficient byte array, BioJava-style?

I'd wondered about that (in combination with the discussion about strict
alphabet checking), but is there enough for a whole GSoC project?
Related to this one could look at something with k-mer hashes...
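One compact representation for DNA that ties both ideas together is 2 bits per base. The sketch below uses it to compute rolling k-mer integers. It makes no claim about how BioJava actually stores sequences, and the `kmer_hashes` name is hypothetical.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_hashes(seq, k):
    """Yield (start, value) for each k-mer, packed 2 bits per base.

    k-mers containing a character other than A/C/G/T are skipped.
    """
    mask = (1 << (2 * k)) - 1
    value = 0
    valid = 0  # consecutive valid bases ending at the current position
    for i, base in enumerate(seq.upper()):
        code = BASE_CODE.get(base)
        if code is None:
            value = 0
            valid = 0
            continue
        value = ((value << 2) | code) & mask  # shift in the new base, drop the oldest
        valid += 1
        if valid >= k:
            yield i - k + 1, value
```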

(It's good to see lots of possible project ideas bouncing around)

Peter


From chapmanb at 50mail.com  Fri Mar 22 12:48:34 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Fri, 22 Mar 2013 08:48:34 -0400
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAKVJ-_45Y559rqS=A4Ho-NAAMdsjUk1JA5JDVwidhdRLJC8rHQ@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
	<CABHxouWGVH6R=pXTVp0yHAm62Gfu+sKSNrhLR+g3ydvoeO02bw@mail.gmail.com>
	<CAKVJ-_74KnM9H5O1vb2hAwTm3HUN2u70NY2KZ_EGDYhm9Jc4zA@mail.gmail.com>
	<CAKVJ-_45Y559rqS=A4Ho-NAAMdsjUk1JA5JDVwidhdRLJC8rHQ@mail.gmail.com>
Message-ID: <87zjxvsiql.fsf@fastmail.fm>


Peter;

> I've been wondering about potential GSoC projects which I'd
> be interested in mentoring (or co-mentoring), and thus far I've
> only got one outline idea.
>
> I'm interested in taking the Bio.SeqIO.index(...) / index_db(...)
> functionality (which does whole record parsing on demand)
> and extending this with lazy-loading or lazy-parsing (which
> has precedent in our BioSQL wrappers). For example, with
> whole genome FASTA files you may never need to load the
> entire sequence, but using an index system like tabix (or
> even actually using a tabix index) Biopython could provide
> a lazy-loading Seq object which extracts only the sequence
> region of interest on demand.

This sounds incredibly useful. It's definitely worthwhile writing up if
you'll have time this summer to mentor it.

> Likewise, this makes sense for GTF/GFF/GFF3 where you
> would index the features, and also if present index the
> embedded FASTA sequence at the end of the file.

I'm cc'ing Ryan, who has been thinking about similar work as part of
gffutils. We're planning now on an approach that takes the BCBio.GFF
parsing and rolls it into gffutils so we can parse, index in a SQLite
database and expose as Biopython objects. Here is some initial
discussion and planning:

https://github.com/daler/gffutils/issues/2
https://docs.google.com/document/d/15l_yZ_pge22ETw-pz2g4NWRAUAccmr1MYPmqXbj1Jl8/edit?usp=sharing

Brad


From dalerr at niddk.nih.gov  Fri Mar 22 16:20:45 2013
From: dalerr at niddk.nih.gov (Ryan Dale)
Date: Fri, 22 Mar 2013 12:20:45 -0400
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <87zjxvsiql.fsf@fastmail.fm>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
	<1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>
	<CAMC681kHT8j6bK_byXzjr4F5T5kVdcW6AkU=cgszjx=rokoNEg@mail.gmail.com>
	<CABHxouWGVH6R=pXTVp0yHAm62Gfu+sKSNrhLR+g3ydvoeO02bw@mail.gmail.com>
	<CAKVJ-_74KnM9H5O1vb2hAwTm3HUN2u70NY2KZ_EGDYhm9Jc4zA@mail.gmail.com>
	<CAKVJ-_45Y559rqS=A4Ho-NAAMdsjUk1JA5JDVwidhdRLJC8rHQ@mail.gmail.com>
	<87zjxvsiql.fsf@fastmail.fm>
Message-ID: <514C84DD.9070306@niddk.nih.gov>

Hi Brad & Peter -


On 03/22/2013 08:48 AM, Brad Chapman wrote:
> Peter;
>
>> I've been wondering about potential GSoC projects which I'd
>> be interested in mentoring (or co-mentoring), and thus far I've
>> only got one outline idea.
>>
>> I'm interested in taking the Bio.SeqIO.index(...) / index_db(...)
>> functionality (which does whole record parsing on demand)
>> and extending this with lazy-loading or lazy-parsing (which
>> has precedent in our BioSQL wrappers). For example, with
>> whole genome FASTA files you may never need to load the
>> entire sequence, but using an index system like tabix (or
>> even actually using a tabix index) Biopython could provide
>> a lazy-loading Seq object which extracts only the sequence
>> region of interest on demand.
> This sounds incredibly useful. It's definitely worthwhile writing up if
> you'll have time this summer to mentor it.

Agreed - a general, lazy-loading/lazy-parsing, indexed mechanism for
accessing data in annotation-like file formats would be fantastic.

>> Likewise, this makes sense for GTF/GFF/GFF3 where you
>> would index the features, and also if present index the
>> embedded FASTA sequence at the end of the file.
> I'm cc'ing Ryan, who has been thinking about similar work as part of
> gffutils. We're planning now on an approach that takes the BCBio.GFF
> parsing and rolls it into gffutils so we can parse, index in a SQLite
> database and expose as Biopython objects. Here is some initial
> discussion and planning:
>
> https://github.com/daler/gffutils/issues/2
> https://docs.google.com/document/d/15l_yZ_pge22ETw-pz2g4NWRAUAccmr1MYPmqXbj1Jl8/edit?usp=sharing

As Peter pointed out on the GitHub issues page, what he has in mind is 
more general than just GFF/GTF, and I see gffutils as extending upon a 
specific subset of the functionality he proposes.

For example, there are common use-cases that I think make sense for a 
GFF/GTF-only library (say, adding new annotations for introns, as 
inferred from the isoform + exon annotations) that might not be readily 
generalizable to all annotation-like file formats. But if this general 
indexing approach were already available, then gffutils could just be a 
wrapper around that, adding the specific GFF/GTF functionality as 
another layer.
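The intron example comes down to taking the gaps between consecutive exons of one transcript. This is a minimal sketch, assuming the exons have already been grouped by transcript (for example via their Parent attribute) and use GFF's 1-based inclusive coordinates. It neither uses nor describes gffutils' own API.

```python
def infer_introns(exons):
    """Return intron (start, end) pairs implied by one transcript's exons.

    exons: (start, end) pairs, 1-based inclusive, in any order.
    """
    ordered = sorted(exons)
    introns = []
    for (_, end1), (start2, _) in zip(ordered, ordered[1:]):
        if start2 > end1 + 1:  # at least one base between the exons
            introns.append((end1 + 1, start2 - 1))
    return introns
```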

Then again . . . currently gffutils imports GFF data into a sqlite3 
database, so data are persistent and both read/write.  For the 
intron-inferring example, we simply add new records to the db, but with 
an indexing approach, the file would presumably have to be re-indexed 
before reading again.  So how you'd like to use your GFF files 
(read-only vs read/write) would influence which strategy you'd choose.
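The read/write point is easy to see with Python's built-in sqlite3 module: a newly inserted row shows up in region queries immediately, whereas a file-offset index would need rebuilding. The schema below is a hypothetical minimal one, not gffutils' actual schema.

```python
import sqlite3


def make_db(path=":memory:"):
    """Create a tiny feature table with an index on (seqid, start, stop)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE features (id TEXT PRIMARY KEY, seqid TEXT, "
        "featuretype TEXT, start INTEGER, stop INTEGER, parent TEXT)"
    )
    conn.execute("CREATE INDEX region_idx ON features (seqid, start, stop)")
    return conn


def add_feature(conn, fid, seqid, featuretype, start, stop, parent=None):
    """Insert one feature; it is queryable at once, with no re-indexing step."""
    conn.execute(
        "INSERT INTO features VALUES (?, ?, ?, ?, ?, ?)",
        (fid, seqid, featuretype, start, stop, parent),
    )
    conn.commit()


def in_region(conn, seqid, start, stop):
    """IDs of features overlapping the 1-based inclusive region [start, stop]."""
    rows = conn.execute(
        "SELECT id FROM features WHERE seqid = ? AND start <= ? AND stop >= ? "
        "ORDER BY start, id",
        (seqid, stop, start),
    )
    return [row[0] for row in rows]
```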

So I think there's actually smaller-than-expected overlap between 
gffutils and Peter's general indexing idea, and in the context of GSoC, 
I'm not sure you'd have to take gffutils into account.  But gffutils 
would certainly benefit from general indexing, especially when 
retrieving sequences for features!

-ryan


From mjldehoon at yahoo.com  Tue Mar 26 13:21:35 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 26 Mar 2013 06:21:35 -0700 (PDT)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CAJ9sUYNQMu1x-4vx0gqQX4=hmAAro7GE=Lr2cPOgL7kr792O6g@mail.gmail.com>
Message-ID: <1364304095.69042.YahooMailClassic@web164002.mail.gq1.yahoo.com>

Hi all,


Speaking of which, we have a Biopython Structural Bioinformatics FAQ (i.e. how to use the Bio.PDB module) on the Biopython website with additional information on Bio.PDB, including some information on things that are not in the main Biopython Tutorial. Perhaps this is a good time to integrate this FAQ into the main documentation?


We could also update it a bit because it's been a while and there are some different things here and there. And additions too.


I went over the Biopython Structural Bioinformatics FAQ and integrated it into the main Biopython tutorial; see

biopython.org/DIST/docs/tutorial/Tutorial-dev.html
or
biopython.org/DIST/docs/tutorial/Tutorial-dev.pdf

Though I think everything is there, it may be good if somebody more experienced with Bio.PDB were to look it over to see if it still makes sense.

In addition, I converted the Biopython Structural Bioinformatics FAQ to our wiki format and added it to our wiki documentation; see
http://biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ
This wiki now contains the exact same information (except for some minor updates/fixes) as the PDF with the Biopython Structural Bioinformatics FAQ that we have on the Biopython website.

I guess with this we can remove the lyx/tex source code of the Biopython Structural Bioinformatics FAQ from the git repository, as well as the PDF from the Biopython website. Any objections?

Best,
-Michiel.


From p.j.a.cock at googlemail.com  Tue Mar 26 13:53:52 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 26 Mar 2013 13:53:52 +0000
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <1364304095.69042.YahooMailClassic@web164002.mail.gq1.yahoo.com>
References: <CAJ9sUYNQMu1x-4vx0gqQX4=hmAAro7GE=Lr2cPOgL7kr792O6g@mail.gmail.com>
	<1364304095.69042.YahooMailClassic@web164002.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_7n6RzA6e5ZV2Mn0BtyrcsD--MxSa6f3U8rLEYM9E6DLw@mail.gmail.com>

On Tue, Mar 26, 2013 at 1:21 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> I guess with this we can remove the lyx/tex source code of the Biopython Structural Bioinformatics FAQ from the git repository, as well as the PDF from the Biopython website. Any objections?
>

Good work Michiel :)

I would suggest making a final revision to the Biopython Structural
Bioinformatics FAQ to explain that this document is now obsolete, and where
the information has moved to. Commit that to git, and put the final PDF online
replacing the current version. That way anyone looking at the PDF online (or
the git history) will have a clear route to finding the current information.

Thanks,

Peter


From anaryin at gmail.com  Tue Mar 26 13:54:55 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 26 Mar 2013 14:54:55 +0100
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CAKVJ-_7n6RzA6e5ZV2Mn0BtyrcsD--MxSa6f3U8rLEYM9E6DLw@mail.gmail.com>
References: <CAJ9sUYNQMu1x-4vx0gqQX4=hmAAro7GE=Lr2cPOgL7kr792O6g@mail.gmail.com>
	<1364304095.69042.YahooMailClassic@web164002.mail.gq1.yahoo.com>
	<CAKVJ-_7n6RzA6e5ZV2Mn0BtyrcsD--MxSa6f3U8rLEYM9E6DLw@mail.gmail.com>
Message-ID: <CAJ9sUYN5CBMYrpXpK=pB_v5tVbT6mjLNs_9oZ+R+-FR6RWwa-A@mail.gmail.com>

Great work!

I'll go over it in the next few days.


2013/3/26 Peter Cock <p.j.a.cock at googlemail.com>

> On Tue, Mar 26, 2013 at 1:21 PM, Michiel de Hoon <mjldehoon at yahoo.com>
> wrote:
> >
> > I guess with this we can remove the lyx/tex source code of the Biopython
> > Structural Bioinformatics FAQ from the git repository, as well as the PDF
> > from the Biopython website. Any objections?
> >
>
> Good work Michiel :)
>
> I would suggest making a final revision to the Biopython Structural
> Bioinformatics FAQ to explain that this document is now obsolete, and where
> the information has moved to. Commit that to git, and put the final PDF online
> replacing the current version. That way anyone looking at the PDF online (or
> the git history) will have a clear route to finding the current information.
>
> Thanks,
>
> Peter
>


From lara.vignotto at gmail.com  Wed Mar 27 14:09:50 2013
From: lara.vignotto at gmail.com (Lara Vignotto)
Date: Wed, 27 Mar 2013 15:09:50 +0100
Subject: [Biopython-dev] [GSoC] Further info about Codon alignment idea
Message-ID: <CAJnvgQhTTioX4+pZOmf7=XbuRBhfUEwjSXsQZQmzcZwkRcT+MQ@mail.gmail.com>

Hello,
I'm a student from Italy. I'm attending the first year of Biotechnology at
the University of Udine, and I'm interested in the Codon alignment and
analysis project proposed for the Google Summer of Code 2013.
Since I would like to know if I have got the skills required to contribute,
can you tell me more about the project?

Regards,
Lara Vignotto


From 88whacko at gmail.com  Thu Mar 28 10:39:07 2013
From: 88whacko at gmail.com (Andrea Rizzi)
Date: Thu, 28 Mar 2013 11:39:07 +0100
Subject: [Biopython-dev] New contributor
In-Reply-To: <CAKVJ-_43AKSoGRPhW1M6vYnbk+HedYeTLEKUxo5VGT9A12u6BA@mail.gmail.com>
References: <CANKZbvM0tveKZL1dm8RRphh4gyPiDcuHpbLkm1k37TKvqaibpA@mail.gmail.com>
	<1363185895.3324.YahooMailClassic@web164005.mail.gq1.yahoo.com>
	<CANKZbvNmQy25k9D3PtbGf+sLNZRQ3K8_K9qGXdnhvoV_J1w5nA@mail.gmail.com>
	<CAKVJ-_43AKSoGRPhW1M6vYnbk+HedYeTLEKUxo5VGT9A12u6BA@mail.gmail.com>
Message-ID: <CANKZbvPB549up+O7Q4fGmmOX2EBshYtxBbRA6jmqrMsrVkD7yg@mail.gmail.com>

Thank you for the great feedback Peter.

I'll write a test case for Bio.Alphabet, then, since I couldn't find any.
When it's ready I'll open a pull request.

Thank you again!
Andrea


2013/3/21 Peter Cock <p.j.a.cock at googlemail.com>

> On Wed, Mar 20, 2013 at 6:10 PM, Andrea Rizzi <88whacko at gmail.com> wrote:
> > Thank you for your welcome Michiel!
> >
> > I will looking for a good project to work on in the next few days and I
> > will let you know soon. Meanwhile I've started to read some code to
> become
> > familiar with the modules and I bumped into few small bugs concerning the
> > Seq objects, in particular I found:
> >
> > 1) a duplicated test method name (one test in test_Seq_objs.py wasn't
> > performed);
> > 2) an error in Alphabet._case_less().
>
> Well spotted - changes applied to the master, thanks.
>
> > I've also expanded the documentation a little, and I've replaced the
> > tostring() method with the suggested str() in a function of
> > MutableSeq. The branch is located here
> >
> > https://github.com/andrrizzi/biopython/tree/seq-branch
> >
> > I'm not sure if it is more comfortable for you to merge this kind of
> > commits from a git branch or it is more advisable to open a ticket and
> > create a patch. Anyway if you think this small commits may be useful,
> feel
> > free to use them.
>
> If you're happy on GitHub, a pull request is simplest. I've looked
> at these changes one by one and applied and/or commented
> on them.
>
> (We're debating moving our issue tracker from RedMine to
> GitHub, which would make things a little easier in future).
>
> Thank you!
>
> Peter
>
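A note on the first bug above: when a unittest.TestCase class defines two methods with the same name, the second def simply rebinds the name in the class namespace, so the first test is never collected or run and nothing warns you. The sketch below shows the effect with a made-up test class (ExampleSeqTests and its methods are hypothetical, not taken from test_Seq_objs.py) and adds a small, assumed helper, duplicate_methods(), that uses the standard ast module to list method names defined more than once in each class of a source file.

```python
import ast
import unittest
from collections import Counter


class ExampleSeqTests(unittest.TestCase):
    """Hypothetical test class illustrating the duplicate-name problem."""

    def test_upper(self):
        self.assertEqual("acgt".upper(), "ACGT")

    # Same method name again: this def rebinds test_upper in the class
    # namespace, so the first test above is silently discarded.
    def test_upper(self):  # noqa: F811
        self.assertEqual("ACGT".lower(), "acgt")


def duplicate_methods(source):
    """Map each class in `source` to the method names it defines more than once."""
    found = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            counts = Counter(
                item.name
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            )
            repeated = sorted(name for name, n in counts.items() if n > 1)
            if repeated:
                found[node.name] = repeated
    return found


# Small sample source to scan (hypothetical).
SAMPLE = '''
class ExampleSeqTests:
    def test_upper(self):
        pass

    def test_upper(self):
        pass

    def test_lower(self):
        pass
'''

if __name__ == "__main__":
    # Only one "test_upper" survives, so unittest collects a single test.
    print(unittest.TestLoader().getTestCaseNames(ExampleSeqTests))
    print(duplicate_methods(SAMPLE))
```

Pointing duplicate_methods() at the text of a test file such as test_Seq_objs.py would report this kind of mistake; pyflakes/flake8 also flag it as a redefinition (F811).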


From p.j.a.cock at googlemail.com  Thu Mar 28 13:39:57 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 28 Mar 2013 13:39:57 +0000
Subject: [Biopython-dev] [GSoC] Further info about Codon alignment idea
In-Reply-To: <CAJnvgQhTTioX4+pZOmf7=XbuRBhfUEwjSXsQZQmzcZwkRcT+MQ@mail.gmail.com>
References: <CAJnvgQhTTioX4+pZOmf7=XbuRBhfUEwjSXsQZQmzcZwkRcT+MQ@mail.gmail.com>
Message-ID: <CAKVJ-_7LfAmbnLvUXtLaS+KmQ+3MXtVf5MLWcHbEWxmmfvSNQA@mail.gmail.com>

On Wed, Mar 27, 2013 at 2:09 PM, Lara Vignotto <lara.vignotto at gmail.com> wrote:
> Hello,
> I'm a student from Italy. I'm attending the first year of Biotechnology at
> the University of Udine, and I'm interested in the Codon alignment and
> analysis project proposed for the Google Summer of Code 2013.
> Since I would like to know whether I have the skills required to contribute,
> can you tell me more about the project?
>
> Regards,
> Lara Vignotto

Hi Lara,

Welcome and thank you for your interest in taking part in GSoC 2013.

The background discussion to the outline idea on the wiki was here:
http://lists.open-bio.org/pipermail/biopython-dev/2013-March/010449.html
http://lists.open-bio.org/pipermail/biopython-dev/2013-March/010471.html
http://lists.open-bio.org/pipermail/biopython-dev/2013-March/010474.html
http://lists.open-bio.org/pipermail/biopython-dev/2013-March/010475.html
(I think that was all the posts - check the archive to be sure).

The text of the wiki is hopefully enough to spark your interest - what
we'd really like to see is a student intrigued by the idea and driven
to expand the topic into a full project proposal. If, for example, your
current course work included some phylogenetics, that might help
give you perspective on what is useful and worth adding to
Biopython. You should probably also have a look at the NESCent
GSoC project ideas if it is the phylogenetic side that really interests
you - in previous years Biopython has mentored GSoC students
with NESCent:
http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2013

You would also need to be competent with Python - although if you
also know and love Perl or Ruby (etc) there might be a mentor
willing to supervise a related project with BioPerl or BioRuby -
that's good too from the wider OBF and Bio* perspective.

For tree traversal, some background reading on things like
breadth-first search and other algorithms for 'walking' the
tree would be a good idea (see also the Python os.path
module for 'walking' a file system tree).
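To make the traversal idea concrete, here is a minimal sketch of breadth-first and depth-first (pre-order) walks. It assumes a toy tree stored as a plain dict mapping each node to its list of children; this is not the Bio.Phylo API, just an illustration of the two orders. The file-system analogue today is os.walk in the os module (Python 2 also had the callback-based os.path.walk).

```python
import os
from collections import deque


def breadth_first(tree, root):
    """Yield nodes level by level, using a FIFO queue.

    `tree` maps each node to a list of its children (a toy stand-in for a
    phylogenetic tree, assumed for illustration).
    """
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(tree.get(node, []))


def depth_first(tree, root):
    """Yield nodes in pre-order, using an explicit LIFO stack."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(tree.get(node, [])))


def count_files(top):
    """The same kind of walk over a directory tree, via os.walk."""
    return sum(len(filenames) for _, _, filenames in os.walk(top))


toy_tree = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"]}

if __name__ == "__main__":
    print(list(breadth_first(toy_tree, "root")))  # root, A, B, A1, A2, B1
    print(list(depth_first(toy_tree, "root")))    # root, A, A1, A2, B, B1
```

The only difference between the two walks is the container: a queue gives breadth-first order, a stack gives depth-first order.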

I'm sure there will be other technical things to learn about and use,
depending on where a GSoC project based on this idea went.

Did that help? Is there something more specific I can try to
answer?

Regards,

Peter


From p.j.a.cock at googlemail.com  Thu Mar 28 15:44:11 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 28 Mar 2013 15:44:11 +0000
Subject: [Biopython-dev] Fwd: [biopython] Custom GenBank locus length (#171)
In-Reply-To: <biopython/biopython/pull/171@github.com>
References: <biopython/biopython/pull/171@github.com>
Message-ID: <CAKVJ-_5PEWquDm0_kjC7E=jusOwWL2ZVW066je1YTfUcWWrGdQ@mail.gmail.com>

For those not getting the pull request emails from GitHub,

---------- Forwarded message ----------
From: Marco Galardini <notifications at github.com>
Date: Thu, Mar 28, 2013 at 3:19 PM
Subject: [biopython] Custom GenBank locus length (#171)
To: biopython/biopython <biopython at noreply.github.com>


Instead of an exception, raise a warning, so the file is saved and the
user can decide whether to correct the error.

I don't know if this is good practice, but I have some GenBank files
provided by the JGI/DOE with locus names longer than 16 characters, so I
guess that giving the user a warning instead of a complete
failure could be better.
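The exception-versus-warning choice described in this pull request can be sketched with the standard warnings module. check_locus_name() and its strict flag are hypothetical names, not Biopython's actual InsdcIO code, and the 16-character limit is taken from the text above.

```python
import warnings

# Limit taken from the pull request text; assumed for illustration.
MAX_LOCUS_NAME = 16


def check_locus_name(name, strict=False):
    """Hypothetical helper: validate a locus name before writing GenBank.

    With strict=True an over-long name is an error; otherwise the user is
    warned and the name is passed through so the file can still be saved.
    """
    if len(name) <= MAX_LOCUS_NAME:
        return name
    message = "Locus name %r is %i characters long (limit %i)" % (
        name, len(name), MAX_LOCUS_NAME)
    if strict:
        raise ValueError(message)
    warnings.warn(message)  # a UserWarning unless a category is given
    return name
```

As the follow-up message notes, relaxing this one check may not be enough on its own if later line-length checks in the writer still fail.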

________________________________

You can merge this Pull Request by running

  git pull https://github.com/mgalardini/biopython patch-1

Or view, comment on, or merge it at:

  https://github.com/biopython/biopython/pull/171

Commit Summary

Custom GenBank locus length

File Changes

M Bio/SeqIO/InsdcIO.py (4)

Patch Links:

https://github.com/biopython/biopython/pull/171.patch
https://github.com/biopython/biopython/pull/171.diff


From marco.galardini at unifi.it  Thu Mar 28 15:54:38 2013
From: marco.galardini at unifi.it (Marco Galardini)
Date: Thu, 28 Mar 2013 16:54:38 +0100
Subject: [Biopython-dev] Fwd: [biopython] Custom GenBank locus length
 (#171)
In-Reply-To: <CAKVJ-_5PEWquDm0_kjC7E=jusOwWL2ZVW066je1YTfUcWWrGdQ@mail.gmail.com>
References: <biopython/biopython/pull/171@github.com>
	<CAKVJ-_5PEWquDm0_kjC7E=jusOwWL2ZVW066je1YTfUcWWrGdQ@mail.gmail.com>
Message-ID: <515467BE.7090105@unifi.it>

Good afternoon everyone,

Actually, I have been testing a bit more and some other changes may be
needed (sorry about that, this is my first change to the Biopython code).
The assertions on the line lengths still fail, so my guess is that
it's probably not a good idea to try to write a GenBank file with
unusual identifiers (even if they are from JGI!).

Marco

On 03/28/2013 04:44 PM, Peter Cock wrote:
> For those not getting the pull request emails from GitHub,
>
> ---------- Forwarded message ----------
> From: Marco Galardini <notifications at github.com>
> Date: Thu, Mar 28, 2013 at 3:19 PM
> Subject: [biopython] Custom GenBank locus length (#171)
> To: biopython/biopython <biopython at noreply.github.com>
>
>
> Instead of an exception, raise a warning, so the file is saved and the
> user can decide whether to correct the error.
>
> I don't know if this is good practice, but I have some GenBank files
> provided by the JGI/DOE with locus names longer than 16 characters, so I
> guess that giving the user a warning instead of a complete
> failure could be better.
>
> ________________________________
>
> You can merge this Pull Request by running
>
>    git pull https://github.com/mgalardini/biopython patch-1
>
> Or view, comment on, or merge it at:
>
>    https://github.com/biopython/biopython/pull/171
>
> Commit Summary
>
> Custom GenBank locus length
>
> File Changes
>
> M Bio/SeqIO/InsdcIO.py (4)
>
> Patch Links:
>
> https://github.com/biopython/biopython/pull/171.patch
> https://github.com/biopython/biopython/pull/171.diff
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev


-- 
-------------------------------------------------
Marco Galardini, PhD
Dipartimento di Biologia
Via Madonna del Piano, 6 - 50019 Sesto Fiorentino (FI)

e-mail: marco.galardini at unifi.it
www: http://www.unifi.it/dblage/CMpro-v-p-51.html
phone:  +39 055 4574737
mobile: +39 340 2808041
-------------------------------------------------


From p.j.a.cock at googlemail.com  Thu Mar 28 18:00:38 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 28 Mar 2013 18:00:38 +0000
Subject: [Biopython-dev] stdout/stderr handling oddity
Message-ID: <CAKVJ-_65RBkCh-5tV+rmEJmv29FKPW8YhiNdVS5g3WT8weXWkA@mail.gmail.com>

Hi all,

While looking at the BWA wrapper from Saket Choudhary
https://github.com/biopython/biopython/pull/167 and the
associated enhancement to the __call__ functionality of the
command line wrapper base class, I wrote a couple of unit
tests - which have left me a little puzzled:

https://github.com/biopython/biopython/commit/3f5d4c442424a7ca33ae0bafa60c840e80ae2fda

Could a few of you try running this test_Application.py
file and confirm it works as is, and try uncommenting the
two problem test cases?

(I'm curious if the echo test works as intended on a plain
Windows machine without cygwin installed - I hope so).

Unless anyone else can explain this, I think the next step
is a simple test program which produces predictable
output on both stdout and stderr, just in case this is due
to there being no stderr output in these tests.

e.g. Print the integers 1, 2, 3, 4, ... up to some sensible limit,
like 20, with non-primes on stdout and primes on
stderr.
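A minimal sketch of the test program suggested above, assuming it would be saved as a standalone script and run by the command line wrapper under test. The names is_prime, emit and split_streams are illustrative only. Each line is flushed as it is written so the child process's output does not sit in a buffer.

```python
import io
import sys


def is_prime(n):
    """Trial division; adequate for the small limits used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def emit(limit=20, out=None, err=None):
    """Write 1..limit, one per line: primes to `err`, the rest to `out`."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    for n in range(1, limit + 1):
        stream = err if is_prime(n) else out
        stream.write("%i\n" % n)
        stream.flush()


def split_streams(limit=20):
    """Run emit() against in-memory buffers and return (stdout, stderr) text."""
    out, err = io.StringIO(), io.StringIO()
    emit(limit, out, err)
    return out.getvalue(), err.getvalue()


if __name__ == "__main__":
    emit(int(sys.argv[1]) if len(sys.argv) > 1 else 20)
```

Because the expected text on each stream is known in advance, a wrapper test can compare exactly what it captured (or what should have been discarded) against split_streams().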

Peter


From arklenna at gmail.com  Thu Mar 28 20:54:11 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Thu, 28 Mar 2013 16:54:11 -0400
Subject: [Biopython-dev] stdout/stderr handling oddity
In-Reply-To: <CAKVJ-_65RBkCh-5tV+rmEJmv29FKPW8YhiNdVS5g3WT8weXWkA@mail.gmail.com>
References: <CAKVJ-_65RBkCh-5tV+rmEJmv29FKPW8YhiNdVS5g3WT8weXWkA@mail.gmail.com>
Message-ID: <CALfq9tK9Mm7faKA6Cx2_ikV7thCPtLxRfFiwbkHGwYjOnMpf9w@mail.gmail.com>

Hi Peter,

On Mac OS X, opening os.devnull with mode 'w' on lines 418 and 422 of the
Application __init__.py causes the tests to pass for me.

Lenna
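A sketch of the general fix Lenna describes, not the actual Bio.Application code: open() defaults to mode 'r', and handing a read-only file object to subprocess as a child's stdout or stderr means the child's writes fail with "Bad file descriptor" (EBADF). Opening os.devnull with mode 'w' avoids that; on Python 3.3+ subprocess.DEVNULL does the same job. The helper name run_discarding_stderr() is hypothetical.

```python
import os
import subprocess
import sys


def run_discarding_stderr(cmd):
    """Run cmd, capture its stdout as text, and throw away its stderr.

    os.devnull must be opened for writing ("w"); opened with the default
    "r" mode, the child's writes to that handle fail with EBADF.
    """
    with open(os.devnull, "w") as devnull:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=devnull,
                                universal_newlines=True)
        out, _ = proc.communicate()
    return proc.returncode, out


if __name__ == "__main__":
    child = [sys.executable, "-c",
             "import sys; print('to stdout'); sys.stderr.write('to stderr\\n')"]
    print(run_discarding_stderr(child))
```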


From saketkc at gmail.com  Thu Mar 28 20:57:54 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Fri, 29 Mar 2013 02:27:54 +0530
Subject: [Biopython-dev] stdout/stderr handling oddity
In-Reply-To: <CALfq9tK9Mm7faKA6Cx2_ikV7thCPtLxRfFiwbkHGwYjOnMpf9w@mail.gmail.com>
References: <CAKVJ-_65RBkCh-5tV+rmEJmv29FKPW8YhiNdVS5g3WT8weXWkA@mail.gmail.com>
	<CALfq9tK9Mm7faKA6Cx2_ikV7thCPtLxRfFiwbkHGwYjOnMpf9w@mail.gmail.com>
Message-ID: <CAEDHeis+xQXYhHVe6ziTZ-59Xe6hMGKMZZ5fvB7VNnNvmzaAtQ@mail.gmail.com>

Yes.
And the reason is this:
http://stackoverflow.com/questions/2368967/bad-file-descriptor-error

On 29 March 2013 02:24, Lenna Peterson <arklenna at gmail.com> wrote:
> Hi Peter,
>
> On Mac OS X, opening os.devnull with mode 'w' on lines 418 and 422 of the
> Application __init__.py causes the tests to pass for me.
>
> Lenna


From saketkc at gmail.com  Thu Mar 28 21:00:00 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Fri, 29 Mar 2013 02:30:00 +0530
Subject: [Biopython-dev] stdout/stderr handling oddity
In-Reply-To: <CAEDHeis+xQXYhHVe6ziTZ-59Xe6hMGKMZZ5fvB7VNnNvmzaAtQ@mail.gmail.com>
References: <CAKVJ-_65RBkCh-5tV+rmEJmv29FKPW8YhiNdVS5g3WT8weXWkA@mail.gmail.com>
	<CALfq9tK9Mm7faKA6Cx2_ikV7thCPtLxRfFiwbkHGwYjOnMpf9w@mail.gmail.com>
	<CAEDHeis+xQXYhHVe6ziTZ-59Xe6hMGKMZZ5fvB7VNnNvmzaAtQ@mail.gmail.com>
Message-ID: <CAEDHeisqqY2TcQz5wGsDfduMuhGjs61VbBffZx7Mb0+3nseVYA@mail.gmail.com>

Forgot to add : Tested on Ubuntu 12.04

On 29 March 2013 02:27, Saket Choudhary <saketkc at gmail.com> wrote:
> Yes.
> And the reason is this:
> http://stackoverflow.com/questions/2368967/bad-file-descriptor-error
>
> On 29 March 2013 02:24, Lenna Peterson <arklenna at gmail.com> wrote:
>> Hi Peter,
>>
>> On Mac OS X, opening os.devnull with mode 'w' on lines 418 and 422 of the
>> Application __init__.py causes the tests to pass for me.
>>
>> Lenna


From p.j.a.cock at googlemail.com  Thu Mar 28 22:11:11 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 28 Mar 2013 22:11:11 +0000
Subject: [Biopython-dev] stdout/stderr handling oddity
In-Reply-To: <CAEDHeis+xQXYhHVe6ziTZ-59Xe6hMGKMZZ5fvB7VNnNvmzaAtQ@mail.gmail.com>
References: <CAKVJ-_65RBkCh-5tV+rmEJmv29FKPW8YhiNdVS5g3WT8weXWkA@mail.gmail.com>
	<CALfq9tK9Mm7faKA6Cx2_ikV7thCPtLxRfFiwbkHGwYjOnMpf9w@mail.gmail.com>
	<CAEDHeis+xQXYhHVe6ziTZ-59Xe6hMGKMZZ5fvB7VNnNvmzaAtQ@mail.gmail.com>
Message-ID: <CAKVJ-_794tzTdsmBiYfcxs0y_5ksjK5rC2qKBUrrUUogwJhLzA@mail.gmail.com>

> On 29 March 2013 02:24, Lenna Peterson <arklenna at gmail.com> wrote:
>> Hi Peter,
>>
>> On Mac OS X, opening os.devnull with mode 'w' on lines 418 and 422 of the
>> Application __init__.py causes the tests to pass for me.
>>
>> Lenna

On Thu, Mar 28, 2013 at 8:57 PM, Saket Choudhary <saketkc at gmail.com> wrote:
> Yes.
> And the reason is this:
> http://stackoverflow.com/questions/2368967/bad-file-descriptor-error
>

Thank you both - I am kicking myself now - maybe I should have
taken another sick day this week instead of returning to work? ;)

Fixed:
https://github.com/biopython/biopython/commit/bba2acbf3d690ad7b99e94ac8ead6763b1d05ab8

I guess no one had bothered to use this option to send stderr
to /dev/null - or if they had, they never reported this error. The only
thing which puzzles me is why this worked for stdout. Odd.

Cheers,

Peter


From p.j.a.cock at googlemail.com  Fri Mar 29 11:54:33 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 29 Mar 2013 11:54:33 +0000
Subject: [Biopython-dev] Fwd: [biopython] Fix Biopython installation with
	pip (#172)
In-Reply-To: <biopython/biopython/pull/172@github.com>
References: <biopython/biopython/pull/172@github.com>
Message-ID: <CAKVJ-_6RBuZiET0bVpWohPj9WOj5Dnt0aX2uR3WJhPr-RxmegA@mail.gmail.com>

Hi Brad,

This sounds sensible in principle - it just needs some hands-on testing
on various systems - any volunteers who use pip and virtualenvs?

Thanks,

Peter

---------- Forwarded message ----------
From: Brad Chapman <notifications at github.com>
Date: Fri, Mar 29, 2013 at 11:47 AM
Subject: [biopython] Fix Biopython installation with pip (#172)
To: biopython/biopython <biopython at noreply.github.com>


Hi all;
This is yet another take on making Biopython install nicely with pip
in virtual environments. This avoids adding numpy as an explicit
dependency and instead uses it if present or skips it if not.

The problem with the previous install_requires approach is that pip
doesn't build and install all requirements before setting up
Biopython, so Biopython will fail with a missing numpy error.
Additionally, our old approach drags in numpy and so creates a
heavyweight dependency for isolated environments.

The new approach requires users to explicitly install numpy if needed
but doesn't penalize them if it's not present.
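The "use numpy if present, skip it if not" pattern might look roughly like this in a setup.py. This is a sketch under assumptions, not the code in pull request #172: numpy_available(), extensions_to_build() and the module names are hypothetical placeholders, and a real setup.py would build Extension objects (with numpy.get_include() as an include directory) rather than plain strings.

```python
def numpy_available():
    """Return True if numpy can be imported in the current environment."""
    try:
        import numpy  # noqa: F401
    except ImportError:
        return False
    return True


# Hypothetical extension module names, for illustration only.
ALWAYS_BUILT = ["example.cpairwise"]
NEEDS_NUMPY = ["example.ckmeans"]


def extensions_to_build(have_numpy):
    """Always build the core extensions; add numpy-based ones only if possible."""
    names = list(ALWAYS_BUILT)
    if have_numpy:
        names.extend(NEEDS_NUMPY)
    return names


if __name__ == "__main__":
    print(extensions_to_build(numpy_available()))
```

The design point is that a missing numpy changes what gets built instead of aborting the install, so pip never has to resolve numpy before running setup.py.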

I submitted as a pull request for documentation and feedback from
anyone. If y'all agree, merge away. Thanks,
Brad

________________________________

You can merge this Pull Request by running

  git pull https://github.com/chapmanb/biopython master

Or view, comment on, or merge it at:

  https://github.com/biopython/biopython/pull/172

Commit Summary

Improve Biopython installation with pip: avoid including numpy as
dependency when automated. Instead explicitly avoid needing numpy
installed to continue
Add helpful comment on pip dependency management

File Changes

M setup.py (38)

Patch Links:

https://github.com/biopython/biopython/pull/172.patch
https://github.com/biopython/biopython/pull/172.diff