From mjldehoon at yahoo.com  Fri Feb  1 00:35:18 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 31 Jan 2013 21:35:18 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CAKVJ-_6RR-gZ+oCmW+N2hBnutvk7OvwxiFDym_y5NZ+KGi0Sow@mail.gmail.com>
Message-ID: <1359696918.16306.YahooMailClassic@web164005.mail.gq1.yahoo.com>

Hi Peter, Bow,

> > I'm OK with using the setUp and tearDown arguments to
> > doctest.DocTestSuite to do the directory magic, but
> > keeping the test files
> > under Tests/.
> 
> As a more elegant version of the Bio._utils.run_doctest()
> function?

Exactly. Bow, do you want to give this approach a try? (Assuming that there are no further objections from the other developers).

Best,
-Michiel.

From w.arindrarto at gmail.com  Fri Feb  1 05:29:59 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 1 Feb 2013 11:29:59 +0100
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359696918.16306.YahooMailClassic@web164005.mail.gq1.yahoo.com>
References: <CAKVJ-_6RR-gZ+oCmW+N2hBnutvk7OvwxiFDym_y5NZ+KGi0Sow@mail.gmail.com>
	<1359696918.16306.YahooMailClassic@web164005.mail.gq1.yahoo.com>
Message-ID: <CADEGkF5q1oVwimfNAqrHPDvGvHa8JWOsWAeEoAEpZWzGyYGxgw@mail.gmail.com>

Hi Michiel, Peter, everyone,

>> > I'm OK with using the setUp and tearDown arguments to
>> > doctest.DocTestSuite to do the directory magic, but
>> > keeping the test files
>> > under Tests/.
>>
>> As a more elegant version of the Bio._utils.run_doctest()
>> function?
>
> Exactly. Bow, do you want to give this approach a try? (Assuming that there are no further objections from the other developers).

Just to be clear, we are:

* changing every module's doctest file paths to relative paths (with
respect to the module's location),
* replacing the run_doctest() import with a simpler doctest import and
`doctest.testmod()` in each module that has such doctests,
* resorting to setUp and tearDown in the DocTestSuite in
`run_tests.py` so that each module / submodule can find its test
files,
* and moving all string functions from Bio._utils to Bio.Phylo and
Bio.SearchIO, so that we can remove Bio._utils,

right?
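
A rough sketch of the third point, for illustration only: `doctest.DocTestSuite`
accepts `setUp` and `tearDown` callables (each is passed the DocTest object),
so the directory switch could live there. The helper name
`make_doctest_suite` and its `data_dir` argument are hypothetical and not
part of Biopython; `data_dir` is assumed to point at the Tests/ folder.

```python
import doctest
import os


def make_doctest_suite(module, data_dir):
    """Build a unittest suite from module's doctests, run from data_dir.

    Hypothetical helper for illustration; data_dir would be Tests/.
    """
    saved_dirs = []

    def set_up(test):
        # Called before each doctest: remember where we were, then move.
        saved_dirs.append(os.getcwd())
        os.chdir(data_dir)

    def tear_down(test):
        # Called after each doctest, pass or fail: restore the directory.
        os.chdir(saved_dirs.pop())

    return doctest.DocTestSuite(module, setUp=set_up, tearDown=tear_down)
```

With something like this in `run_tests.py`, the doctests themselves could
keep short paths such as "Fasta/sweetpea.nu".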

I'd be happy to give this a shot if everyone feels the same :).

Regards,
Bow

From p.j.a.cock at googlemail.com  Fri Feb  1 06:07:22 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 11:07:22 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CADEGkF5q1oVwimfNAqrHPDvGvHa8JWOsWAeEoAEpZWzGyYGxgw@mail.gmail.com>
References: <CAKVJ-_6RR-gZ+oCmW+N2hBnutvk7OvwxiFDym_y5NZ+KGi0Sow@mail.gmail.com>
	<1359696918.16306.YahooMailClassic@web164005.mail.gq1.yahoo.com>
	<CADEGkF5q1oVwimfNAqrHPDvGvHa8JWOsWAeEoAEpZWzGyYGxgw@mail.gmail.com>
Message-ID: <CAKVJ-_4ZYT37J2aA=JSm0WGMe50gfH6u4G2o6eLTBsEaT9q6LA@mail.gmail.com>

On Fri, Feb 1, 2013 at 10:29 AM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Michiel, Peter, everyone,
>
>>> > I'm OK with using the setUp and tearDown arguments to
>>> > doctest.DocTestSuite to do the directory magic, but
>>> > keeping the test files
>>> > under Tests/.
>>>
>>> As a more elegant version of the Bio._utils.run_doctest()
>>> function?
>>
>> Exactly. Bow, do you want to give this approach a try?
>> (Assuming that there are no further objections from the other developers).
>
> Just to be clear, we are:
>
> * changing all module's doctest file path to use relative paths (with
> respect to the module's location),
> * replacing the run_doctest() import with a simpler doctest import and
> `doctest.testmod()` in each module having this doctest
> * resorting to setUp and tearDown in the DocTestSuite in
> `run_tests.py` so that each module / submodule can find their test
> files

That wasn't my understanding - I thought we were just talking about
making the Bio._utils.run_doctest() use setUp and tearDown to
take care of the path changes (although I'm not sure if that will
actually be any shorter - we'd find out).

> * and refactoring all string functions in Bio._utils to Bio.Phylo and
> Bio.SearchIO, so that we can remove Bio._utils,

I'm not particularly bothered either way on this. Having misc utilities
like this under Bio.Phylo or Bio.SearchIO makes it clear where they
are used, and makes it easier to compartmentalise functionality.

Regards,

Peter

From mjldehoon at yahoo.com  Fri Feb  1 06:23:15 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 1 Feb 2013 03:23:15 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CADEGkF5q1oVwimfNAqrHPDvGvHa8JWOsWAeEoAEpZWzGyYGxgw@mail.gmail.com>
Message-ID: <1359717795.82942.YahooMailClassic@web164001.mail.gq1.yahoo.com>

Hi Bow,

Yes, that is correct.
Responding to Peter's email: Peter, do you agree with this approach?

Best,
-Michiel.


From p.j.a.cock at googlemail.com  Fri Feb  1 06:51:16 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 11:51:16 +0000
Subject: [Biopython-dev] Trie with_prefix doesn't work as expected
In-Reply-To: <CAKVJ-_7i4+MY6OSYkm+_tqbV_ndwCvmGc=nW0gMG9PnoEobyGA@mail.gmail.com>
References: <CAEe6yUEX_xbawFkP74-yKO+eTKo_mAU9SGO-+0PoRwU0mA8=vw@mail.gmail.com>
	<CAKVJ-_5XvS899Oi7=3oDMW713oHnz6nNvEx5C-rRMUa=aashvQ@mail.gmail.com>
	<CAKVJ-_7i4+MY6OSYkm+_tqbV_ndwCvmGc=nW0gMG9PnoEobyGA@mail.gmail.com>
Message-ID: <CAKVJ-_7dyse3-+n9kL8+P9hJysCDqS4q_dg7MKTkhfxOrbxo9A@mail.gmail.com>

On Thu, Jan 31, 2013 at 11:38 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> Thanks to Jeff Chang for a very speedy fix (sent as an attachment off list),
> which I have applied to the repository:
> https://github.com/biopython/biopython/commit/cd7cc7174fd4b0607381e9c58f6ae0d17cca8f74
>
> I've also added a unit test based on Kevin's example:
> https://github.com/biopython/biopython/commit/efc289c8fe2e78ad12481973e42554fa40f2ea0a
>
> Thank you for reporting this Kevin.
>
> Peter
>
> P.S. Nice to hear from you again Jeff :)
>
> I think your last commit was before we moved from CVS to git, please
> let us know if you want commit access on github.

Thanks again to Kevin for another test case, and to Jeff for another quick
code fix where a trie key exceeded the MAX_KEY_LENGTH buffer:

https://github.com/biopython/biopython/commit/31909c8725d5cfbfba2096b7c15ef7afeaf20a5b

Peter

From redmine at redmine.open-bio.org  Fri Feb  1 06:51:51 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Fri, 1 Feb 2013 11:51:51 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15070.20130201115151@redmine.open-bio.org>


Issue #3395 has been updated by Peter Cock.


Kevin Wu reported a related issue, which we discussed with Jeff Chang (off list), where a key in the trie exceeded 1000 bytes (the original value of MAX_KEY_LENGTH). See:
http://lists.open-bio.org/pipermail/biopython-dev/2013-February/010284.html
https://github.com/biopython/biopython/commit/31909c8725d5cfbfba2096b7c15ef7afeaf20a5b

(Ideally we could give a specific ValueError exception here, but nevertheless the current print message is an improvement)
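
A minimal sketch of what a specific ValueError could look like, written in
Python purely for illustration. The real check lives in the C trie code;
the function name `check_key` is hypothetical, and treating 1000 as an
inclusive limit is an assumption based on the wording above.

```python
MAX_KEY_LENGTH = 1000  # original limit quoted in the report above


def check_key(key, max_length=MAX_KEY_LENGTH):
    """Raise ValueError if key is too long for the trie's key buffer."""
    if len(key) > max_length:
        raise ValueError(
            "trie key is %d characters long, limit is %d"
            % (len(key), max_length)
        )
    return key
```

A caller would then see the offending length in the traceback rather than
a generic printed message.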
----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395

Author: Michał Nowotka
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


Imagine I have Biopython trie:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'w')
tr = trie.trie()
# fill in the trie
trie.save(f, tr)  # pass the trie instance, not the module

Now /tmp/trie.dat.gz is about 50MB. Let's try to read it:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'r')
tr = trie.load(f)

Unfortunately I'm getting a meaningless error saying:
"loading failed for some reason"

Any hints?




From p.j.a.cock at googlemail.com  Fri Feb  1 07:14:49 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 12:14:49 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359717795.82942.YahooMailClassic@web164001.mail.gq1.yahoo.com>
References: <CADEGkF5q1oVwimfNAqrHPDvGvHa8JWOsWAeEoAEpZWzGyYGxgw@mail.gmail.com>
	<1359717795.82942.YahooMailClassic@web164001.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_4ngvs=95MwciGZUtv3Q4R+Anw4-=XLgNPJQbBsqM0b-g@mail.gmail.com>

>Wibowo Arindrarto <w.arindrarto at gmail.com> wrote:
>> Just to be clear, we are:
>>
>> * changing all module's doctest file path to use relative
>> paths (with
>> respect to the module's location),
>> * replacing the run_doctest() import with a simpler doctest
>> import and
>> `doctest.testmod()` in each module having this doctest
>> * resorting to setUp and tearDown in the DocTestSuite in
>> `run_tests.py` so that each module / submodule can find
>> their test
>> files
>> * and refactoring all string functions in Bio._utils to
>> Bio.Phylo and
>> Bio.SearchIO, so that we can remove Bio._utils,
>>
>> right?
>>
>> I'd be happy to give this a shot if everyone feels the same
>> :).
>>

On Fri, Feb 1, 2013 at 11:23 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi Bow,
>
> Yes, that is correct.
> Responding to Peter's email: Peter, do you agree with this approach?
>
> Best,
> -Michiel.

No. I think we have misunderstood each other on the doctest
discussion :(

If we keep the test files under Tests/ (and I think that is best)
then for example look at this doctest in Bio/SeqRecord.py

        >>> from Bio import SeqIO
        >>> record = SeqIO.read(open("Fasta/sweetpea.nu"),"fasta")
        >>> len(record)
        309

That is currently written to assume it is run from the Tests/
folder. If we write this assuming it is in the Bio/ folder where
the Python file SeqRecord.py lives, it becomes:

        >>> from Bio import SeqIO
        >>> record = SeqIO.read(open("../Tests/Fasta/sweetpea.nu"),"fasta")
        >>> len(record)
        309

I think a beginner would find that more confusing. It is also longer
and we already have trouble with some lines exceeding 80 chars.

Ideally there would be a nice way for doctests to specify the folder,
and then we could use a simple filename like "sweetpea.nu" with
no directories at all. But I don't think that is possible without us
making the testing infrastructure even more complicated.

--

If we want to get rid of Bio._utils.run_doctest() (and the whole of
the file Bio/_utils.py) then I would prefer reverting to the old situation
prior to adding the Bio._utils.run_doctest() helper function.

If the repetitive code snippets to run the doctests of a module are a
problem, they can be shortened to something less flexible. For example,
Bio/SeqRecord.py could use something very short like this:

if __name__ == "__main__":
    import doctest
    import os
    assert os.path.isfile("Fasta/sweetpea.nu"), "Run from Tests/ folder"
    doctest.testmod(verbose=2)

Or, as I suggested before, we can remove these development
convenience hooks completely?

--

On the subject of the string functions in Bio/_utils.py, I have no
objection to moving them back under Bio.SearchIO and/or
Bio.Phylo - which has advantages in terms of modularity (a
good thing for preventing accidental side effects).

Regards,

Peter

From mjldehoon at yahoo.com  Fri Feb  1 08:54:46 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 1 Feb 2013 05:54:46 -0800 (PST)
Subject: [Biopython-dev] Namespace for online resources?
Message-ID: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com>

Hi Lenna,

> Regarding point (2), is your primary concern namespace clutter or
> importing efficiency? 

Regarding point (2), my primary concern is that a Bio.WWW module would group together modules that don't have much in common with each other. I agree to your point that the category of internet access is more fundamental than the category of parsers. But still, which modules should then go into a Bio.WWW module? Any module whose sole purpose is to use the internet (that would exclude Bio.Entrez)? Any module whose main purpose is to use the internet? This would be unclear; for example, Bio.Entrez may or may not fall in that category, depending on how you use the module. Any module whose functionality includes internet access? Then if one day we add access to the JASPAR database over the internet to Bio.Motif, it would have to move to Bio.WWW.

Currently most modules are organized by theme (Bio.Seq, Bio.Motif, Bio.Cluster, Bio.Phylo, Bio.Entrez, etc.). For each theme, we have one module, one chapter in the documentation, one set of unit tests, one set of doctests, which I think is a huge advantage both in terms of clarity and in terms of user experience.

Best,
-Michiel.

--- On Wed, 1/30/13, Lenna Peterson <arklenna at gmail.com> wrote:

From: Lenna Peterson <arklenna at gmail.com>
Subject: Re: [Biopython-dev] Namespace for online resources?
To: "Michiel de Hoon" <mjldehoon at yahoo.com>
Cc: "Biopython-Dev Mailing List" <biopython-dev at biopython.org>
Date: Wednesday, January 30, 2013, 12:10 PM

Michiel,
You raise an excellent point that separating the modules in this way will complicate doctests.
Regarding point (2), is your primary concern namespace clutter or importing efficiency?

I still maintain that the category of internet access is more fundamental than the category of parsers. For point (1), if every database is accessed using a WWW submodule, a user will know to look there.

Obviously moving everything would be a lot of work...
Cheers,
Lenna


On Tue, Jan 29, 2013 at 9:00 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

Bio.WWW was one of those modules that seem a good idea at first, but then failed to gain general acceptance. There are three problems with Bio.WWW:


1) From the module name, it's not clear what you would find in it. For example, if you want to access the Entrez database, would you first look in Bio.Entrez or in Bio.WWW? Similarly for TAIR: Would you look for it in Bio.TAIR, or in Bio.WWW?


2) The modules in Bio.WWW don't have much to do with each other, except that they access the internet. But any given user probably is mainly interested in Entrez, or ExPASy, or some other database, not in all of them at the same time.


3) The flip side of this is that a user accessing e.g. ExPASy would have to import both Bio.WWW and Bio.ExPASy to be able to use ExPASy. Doctests get more complicated also, as they would span more than one module. Here is an example from Bio.Entrez that accesses the database, and then parses the results:


>>> from Bio import Entrez
>>> Entrez.email = "Your.Name.Here@example.org"
>>> handle = Entrez.einfo() # or esearch, efetch, ...
>>> record = Entrez.read(handle)
>>> handle.close()


The ultimate question is whether we organize the code in Biopython by their functionality from a user perspective, or by the kind of things they do? Almost all of Biopython is organized according to the former. For example, we don't have a Bio.Parsers module for all the parsers; similarly, we don't have Bio.WWW for internet access.


Best,

-Michiel.


--- On Tue, 1/29/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> From: Peter Cock <p.j.a.cock at googlemail.com>
> Subject: Re: [Biopython-dev] Namespace for online resources?
> To: "Wibowo Arindrarto" <w.arindrarto at gmail.com>
> Cc: "Biopython-Dev Mailing List" <biopython-dev at biopython.org>
> Date: Tuesday, January 29, 2013, 4:11 PM
> On Tue, Jan 29, 2013 at 9:03 PM,
> Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > On Tue, Jan 29, 2013 at 7:52 PM, Wibowo Arindrarto
> > <w.arindrarto at gmail.com> wrote:
> >> Hi everyone,
> >>
> >> Why was Bio.WWW deprecated in the first place?
> >>
> >
> > The flippant answer is everything under Bio.WWW was moved
> > or deprecated:
> > http://lists.open-bio.org/pipermail/biopython-dev/2008-July/004059.html
> >
> > I'm trying to identify the discussions prior to that covering the moves:
> >
> > Bio.WWW.ExPASy -> Bio.ExPASy
> > Bio.WWW.InterPro -> Bio.InterPro
> > Bio.WWW.NCBI -> Bio.Entrez
> > Bio.WWW.SCOP -> Bio.SCOP
>
> Probably this thread,
> http://lists.open-bio.org/pipermail/biopython-dev/2007-November/003241.html
>
> Also a bit more background on the NCBI Entrez side:
> http://lists.open-bio.org/pipermail/biopython-dev/2008-February/003423.html
>
> Peter



From p.j.a.cock at googlemail.com  Fri Feb  1 09:14:56 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 14:14:56 +0000
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com>
References: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_6U+8m5DJYJmoKeoS86jP1z9d6ODS66kGfT7AHp9jBhbA@mail.gmail.com>

On Fri, Feb 1, 2013 at 1:54 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi Lenna,
>
>> Regarding point (2), is your primary concern namespace clutter or
>> importing efficiency?
>
> Regarding point (2), my primary concern is that a Bio.WWW module would
> group together modules that don't have much in common with each other. I
> agree to your point that the category of internet access is more fundamental
> than the category of parsers. But still, which modules should then go into a
> Bio.WWW module? Any module whose sole purpose is to use the internet (that
> would exclude Bio.Entrez)? Any module whose main purpose is to use the
> internet? This would be unclear; for example, Bio.Entrez may or may not fall
> in that category, depending on how you use the module. Any module whose
> functionality includes internet access? Then if one day we add access to the
> JASPAR database over the internet to Bio.Motif, it would have to move to
> Bio.WWW.
>
> Currently most modules are organized by theme (Bio.Seq, Bio.Motif,
> Bio.Cluster, Bio.Phylo, Bio.Entrez, etc.). For each theme, we have one
> module, one chapter in the documentation, one test of unit tests, one set of
> doctests, which I think is a huge advantage both in terms of clarity and in
> terms of user experience.

Also with the theme approach, most (if not all) the themes are likely to
have some online resources (databases or remote APIs). On those
grounds it makes sense to keep online motif functionality (like weblogo)
under Bio.Motif, and so on.

People leaning for a Bio.WWW grouping: Bow, Lenna, Kevin
(which could be a big disruption with lots of code relocation)

People leaning against a Bio.WWW grouping: Michiel, Peter (me)
(which would also be the status quo, so no disruption).

In the specific case of Kevin's TAIR code for fetching Arabidopsis sequences,
Bio.TAIR (lower case?) is consistent with current usage. Somewhere under
Bio.Seq* also seems sensible to me, as I wrote at the start of this thread.

Regards,

Peter

From mjldehoon at yahoo.com  Fri Feb  1 09:12:38 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 1 Feb 2013 06:12:38 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CAKVJ-_4ngvs=95MwciGZUtv3Q4R+Anw4-=XLgNPJQbBsqM0b-g@mail.gmail.com>
Message-ID: <1359727958.56029.YahooMailClassic@web164001.mail.gq1.yahoo.com>

Hi Peter,

As we misunderstood each other, let me try once to make the case for putting test files in Bio/*. If I fail to convince you, let's either go back to the situation before Bio._utils, or remove the "if __name__ == '__main__':" stuff altogether.

First of all, if we use "if __name__ == '__main__':" to run the docstring tests, then those tests should pass if a user executes the script. Otherwise, we have installed some code that makes no sense outside of the distribution. This is also a problem with the
assert os.path.isfile("Fasta/sweetpea.nu"), "Run from Tests/ folder"
solution, as after installation there is no Tests/ folder any more.

Suppose we make a subdirectory Examples in each module that uses docstring tests which need some data files, and put the data files in the Examples subdirectory. The docstring tests are supposed to be simple (full testing is done by the unittests), so the example data files can be tiny.

The docstring tests can then use
>>> record = SeqIO.read(open("Examples/sweetpea.nu"),"fasta")
which is simple enough.
The unit tests can switch to the appropriate directory when running the docstring tests.
A user, finding the example in the docstring tests, can try out the example directly, since the data file is provided together with the relevant module.
And since the data file is in the subdirectory Examples/, there is still some separation between the code and the data.
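
As a sketch of how code could locate such a per-module Examples/ folder
regardless of the current directory (the helper name `example_path` is
hypothetical, and whether Examples/ would be installed alongside the
module is an open assumption):

```python
import os


def example_path(module_file, filename):
    """Return the path to filename in the Examples/ folder next to module_file.

    Hypothetical helper; module_file would normally be the module's __file__.
    """
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(module_dir, "Examples", filename)
```

For instance, calling example_path(__file__, "sweetpea.nu") inside
Bio/SeqRecord.py would point at Bio/Examples/sweetpea.nu.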

Best,
-Michiel.



From p.j.a.cock at googlemail.com  Fri Feb  1 09:32:46 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 14:32:46 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359727958.56029.YahooMailClassic@web164001.mail.gq1.yahoo.com>
References: <CAKVJ-_4ngvs=95MwciGZUtv3Q4R+Anw4-=XLgNPJQbBsqM0b-g@mail.gmail.com>
	<1359727958.56029.YahooMailClassic@web164001.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_4amfbk6tDLva0eW1TqixF4wNSSoEqx+yE8xJNzT-UYHg@mail.gmail.com>

On Fri, Feb 1, 2013 at 2:12 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi Peter,
>
> As we misunderstood each other, let me try once to make the case for
> putting test files in Bio/*. If I fail to convince you, let's either go back
> to the situation before Bio._utils, or remove the "if __name__ ==
> '__main__':" stuff altogether.

I'm not convinced about putting test files under Bio/* so let's revert
the use of the helper function Bio._utils.run_doctest(), and if you
wish proceed with removing Bio/_utils.py as well.

Shall I go ahead and revert 8b59d89bb4e282192ddee751e24ceef4afa63528
then remove run_doctest and find_test_dir from Bio/_utils.py now?

> First of all, if we use "if __name__ == '__main__':" to run the docstring
> tests, then those tests should pass if a user executes the script.
> Otherwise, we have installed some code that makes no sense outside of the
> distribution. This is also a problem with the
> os.path.isfile("Fasta/sweetpea.nu"), "Run from Tests/ folder"
> solution, as after installation there is no Tests/ folder any more.

That is a good point, this has always been a weakness of the __main__
hook to run the doctests.

> Suppose we make a subdirectory Examples in each module that uses docstring
> tests which need some data files, and put the data files in the Examples
> subdirectory. The docstring tests are supposed to be simple (full testing is
> done by the unittests), so the example data files can be tiny.
>
> The docstring tests can then use
> >>> record = SeqIO.read(open("Examples/sweetpea.nu"),"fasta")
> which is simple enough.
> The unit tests can switch to the appropriate directory when running the
> docstring tests.
> A user, finding the example in the docstring tests, can try out the
> example directly, since the data file is provided together with the relevant
> module.
> And since the data file is in the subdirectory Examples/, there is still
> some separation between the code and the data.

Did you envision installing the examples subdirectories next to the code
under site-packages? Technically that is doable, but I'm not sure if that
is considered good practice (does anyone know the relevant Debian
policies for example - they're quite keen on this kind of thing?).

I much prefer the simplicity of having all the test files in one place
(under the Tests/ folder) especially as things like simple FASTA files
get used in doctests and unittests for multiple different areas of
Biopython.

Regards,

Peter

From p.j.a.cock at googlemail.com  Fri Feb  1 09:56:02 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 14:56:02 +0000
Subject: [Biopython-dev] Bio.Motif update
In-Reply-To: <1359730386.17784.YahooMailClassic@web164005.mail.gq1.yahoo.com>
References: <CAKVJ-_5X3eoXEnqD7yfTGFW1Saxr7rMe-WcbCofmCqdu_yq6KA@mail.gmail.com>
	<1359730386.17784.YahooMailClassic@web164005.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_7OPbA5pJXF6KX3D3Zk369thAbK0iUi2DMND4XYhpZxbQ@mail.gmail.com>

On Fri, Feb 1, 2013 at 2:53 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi Peter and all,
>
> --- On Tue, 1/29/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> We need to say something about this in the NEWS file too.
>
> Done.
>
>> I think it would make sense to add a PendingDeprecationWarning
>> to Bio.Motif now.
>
> Done.

Thanks.

>> Also, if you feel the new Bio.motifs API isn't quite
>> settled yet, adding the new BiopythonExperimentalWarning to
>> that makes sense.
>
> I don't expect big changes in the API, so I think we can do without the
> BiopythonExperimentalWarning. Also we should avoid the situation
> that Bio.Motif gives a DeprecationWarning, and Bio.motifs gives a
> BiopythonExperimentalWarning.

Agreed.

>> (And once this is settled, I think we can schedule the
>> release)
>
> We should also check whether we can remove deprecated modules,
> or deprecate modules that are currently declared obsolete. Or has
> somebody done that already?

I went over the list in the DEPRECATED file last month, but a second
check would be a good idea.

Peter

From mjldehoon at yahoo.com  Fri Feb  1 09:53:06 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 1 Feb 2013 06:53:06 -0800 (PST)
Subject: [Biopython-dev] Bio.Motif update
In-Reply-To: <CAKVJ-_5X3eoXEnqD7yfTGFW1Saxr7rMe-WcbCofmCqdu_yq6KA@mail.gmail.com>
Message-ID: <1359730386.17784.YahooMailClassic@web164005.mail.gq1.yahoo.com>

Hi Peter and all,

--- On Tue, 1/29/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> We need to say something about this in the NEWS file too.

Done.

> I think it would make sense to add a PendingDeprecationWarning
> to Bio.Motif now.

Done.
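
For readers unfamiliar with the mechanism, a minimal sketch of how a module
can announce a pending deprecation with the standard `warnings` module. The
function name and message wording are illustrative assumptions, not the
text actually committed to Bio.Motif.

```python
import warnings


def warn_pending_deprecation(old_module, new_module):
    """Emit a PendingDeprecationWarning pointing users at new_module."""
    warnings.warn(
        "%s is likely to be deprecated in a future release; "
        "please consider using %s instead." % (old_module, new_module),
        PendingDeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller
    )
```

Note that Python ignores PendingDeprecationWarning by default, so users
only see it if they enable it (for example with the -W command line option).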

> Also, if you feel the new Bio.motifs API isn't quite
> settled yet, adding the new BiopythonExperimentalWarning to
> that makes sense.

I don't expect big changes in the API, so I think we can do without the BiopythonExperimentalWarning. Also we should avoid the situation that Bio.Motif gives a DeprecationWarning, and Bio.motifs gives a BiopythonExperimentalWarning.

> (And once this is settled, I think we can schedule the
> release)

We should also check whether we can remove deprecated modules, or deprecate modules that are currently declared obsolete. Or has somebody done that already?

Best,
-Michiel

From p.j.a.cock at googlemail.com  Fri Feb  1 10:03:24 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 15:03:24 +0000
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
Message-ID: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>

Hello all,

I think we're overdue for a Biopython release now, and I would
like to do this next week. There are still plenty more additions
and enhancements waiting in the wings, but right now I just
want any remaining bug fixes addressed.

Are there any release blocking issues?

Thanks,

Peter

From w.arindrarto at gmail.com  Fri Feb  1 10:29:09 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 1 Feb 2013 16:29:09 +0100
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
Message-ID: <CADEGkF5xTKcvPtTUJsVfFXphOrS3k8BHf8_foLeovShA2gjHJg@mail.gmail.com>

Hi Peter,

> I think we're overdue for a Biopython release now, and I would
> like to do this next week. There are still plenty more additions
> and enhancements waiting in the wings, but right now I just
> want any remaining bug fixes addressed.
>
> Are there any release blocking issues?

There's still one bug for Bio.SearchIO that I would prefer to be fixed
(https://redmine.open-bio.org/issues/3400). Is it possible to wait a
few more days (no later than next week I hope) to sort this bug out?

Also, since this is our first release with the
BiopythonExperimentalWarning, I was thinking maybe we can include some
modules that have been in the waiting line. One that I can think of
right now is Andrew's MafIO (re: the recent mention as well).
Considering that some people have started using it, maybe we can
release it under a BiopythonExperimentalWarning.

And later down the line, perhaps we can include Brad's GTF/GFF parser
(seeing that this is already included in the wiki, maybe it's a good
time to start considering where to put it)? Brad, if you don't mind,
perhaps we can start working on this as well.

Regards,
Bow

From p.j.a.cock at googlemail.com  Fri Feb  1 10:40:03 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 15:40:03 +0000
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <CADEGkF5xTKcvPtTUJsVfFXphOrS3k8BHf8_foLeovShA2gjHJg@mail.gmail.com>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
	<CADEGkF5xTKcvPtTUJsVfFXphOrS3k8BHf8_foLeovShA2gjHJg@mail.gmail.com>
Message-ID: <CAKVJ-_5UvtrCVkHVYqDdw7xwbX_h7cqPMKsV9UAHJWJY1NM7Mw@mail.gmail.com>

On Fri, Feb 1, 2013 at 3:29 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Peter,
>
>> I think we're overdue for a Biopython release now, and I would
>> like to do this next week. There are still plenty more additions
>> and enhancements waiting in the wings, but right now I just
>> want any remaining bug fixes addressed.
>>
>> Are there any release blocking issues?
>
> There's still one bug for Bio.SearchIO that I would prefer to be fixed
> (https://redmine.open-bio.org/issues/3400). Is it possible to wait a
> few more days (no later than next week I hope) to sort this bug out?

A few days sure - but that is a small enough issue (and in a clearly
marked 'here be dragons experimental code' section) that I don't think
it should delay the whole release.

> Also, since this is our first release with the
> BiopythonExperimentalWarning, I was thinking maybe we can include some
> modules that have been in the waiting line. One that I can think of
> right now is Andrew's MafIO (re: the recent mention as well).
> Considering that some people have started using it, maybe we can
> release it under a BiopythonExperimentalWarning.
>
> And later down the line, perhaps we can include Brad's GTF/GFF parser
> (seeing that this is already included in the wiki, maybe it's a good
> time to start considering where to put it)? Brad, if you don't mind,
> perhaps we can start working on this as well.

Both examples of things I would like to do *after* shipping
Biopython 1.61, which I feel is already overdue.

Regards,

Peter

From mjldehoon at yahoo.com  Fri Feb  1 10:39:15 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 1 Feb 2013 07:39:15 -0800 (PST)
Subject: [Biopython-dev] Bio.Motif update
In-Reply-To: <CAKVJ-_7OPbA5pJXF6KX3D3Zk369thAbK0iUi2DMND4XYhpZxbQ@mail.gmail.com>
Message-ID: <1359733155.26451.YahooMailClassic@web164006.mail.gq1.yahoo.com>

--- On Fri, 2/1/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> I went over the list in the DEPRECATED file last month, but
> a second check would be a good idea.

The following were declared obsolete in Biopython 1.60, and can in principle be declared deprecated in Biopython 1.61:

----------
Bio/Blast/Applications.py:
BlastallCommandline
BlastpgpCommandline
RpsBlastCommandline

Bio/Blast/NCBIStandalone.py overall, and specifically:
blastall
blastpgp
rpsblast

Bio/ParserSupport.py overall

Bio/PDB/AbstractPropertyMap.py:
The has_key function in class AbstractPropertyMap

Bio/PDB/FragmentMapper.py:
The has_key function in class FragmentMapper

Bio/UniGene/UniGene.py overall

In BioSQL/BioSeqDatabase.py:
  class DBServer:
     remove_database
  class BioSeqDatabase:
     get_all_primary_ids
     get_Seq_by_primary_id

-----------

These functions were deprecated in Biopython 1.59 or earlier, and could be removed for Biopython 1.61:

Bio/Align/__init__.py:
  class MultipleSeqAlignment:
     get_column
     add_sequence

Bio/Align/Generic.py:
  class Alignment overall
    get_all_seqs
    get_seq_by_num

Bio/File.py:
  class StringHandle

Bio/Graphics/GenomeDiagram/_AbstractDrawer.py:
  class AbstractDrawer:
    _set_xcentre, _set_ycentre

Bio/Graphics/GenomeDiagram/_Graph.py:
  class GraphData:
    _set_centre

Bio/ParserSupport.py:
  SGMLStrippingConsumer

Bio/Seq.py:
  class Seq:
     .data property

Bio/SeqIO/SffIO.py:
  _sff_read_roche_index_xml
 
--------------------

The tostring() method of the class Seq in Bio/Seq.py:
Can we declare this obsolete?

-Michiel

From w.arindrarto at gmail.com  Fri Feb  1 10:47:14 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 1 Feb 2013 16:47:14 +0100
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <510BE201.4090002@biotech.uni-tuebingen.de>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
	<CADEGkF5xTKcvPtTUJsVfFXphOrS3k8BHf8_foLeovShA2gjHJg@mail.gmail.com>
	<510BE201.4090002@biotech.uni-tuebingen.de>
Message-ID: <CADEGkF6kbXTsXcZXBKHxAM0yV8XO96d8FNSjFhMNen1pw8HYNw@mail.gmail.com>

Hi Peter, Kai,


>> There's still one bug for Bio.SearchIO that I would prefer to be
>> fixed (https://redmine.open-bio.org/issues/3400). Is it possible to
>> wait a few more days (no later than next week I hope) to sort this
>> bug out?
>
> Sorry for letting this slip for so long, but I never got around to
> writing an actual test case.
>
> Bow, did we agree to use optionalcascade for now and then maybe
> refactor? I'm pretty confident the code works as-is, all the BioPython
> issues I've been running into with my production site have been in the
> GenBank/EMBL parsers. :)

Yes, we did :). I meant to do the optionalcascade refactor so the code
is more maintainable (and to prevent a corner case bug). But in
general, the optionalcascade fix looks to be fine. And for code marked
with the BiopythonExperimentalWarning, having a fix without test cases
seems better than no fix at all.

Peter, if you're fine with Kai's fix, I think we can mark this bug
solved and go on with the release. I'll add the test cases and
refactor the code later on.

Regards,
Bow

From p.j.a.cock at googlemail.com  Fri Feb  1 10:51:07 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 15:51:07 +0000
Subject: [Biopython-dev] Bio.Motif update
In-Reply-To: <1359733155.26451.YahooMailClassic@web164006.mail.gq1.yahoo.com>
References: <CAKVJ-_7OPbA5pJXF6KX3D3Zk369thAbK0iUi2DMND4XYhpZxbQ@mail.gmail.com>
	<1359733155.26451.YahooMailClassic@web164006.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_7XZ1F9aq7b+DAKyDCr95rAtRBw7aNFqjPksdQJCzJAdw@mail.gmail.com>

On Fri, Feb 1, 2013 at 3:39 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> --- On Fri, 2/1/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> I went over the list in the DEPRECATED file last month, but
>> a second check would be a good idea.
>
> The following were declared obsolete in Biopython 1.60, and can
> in principle be declared deprecated in Biopython 1.61:
>
> ----------
> Bio/Blast/Applications.py:
> BlastallCommandline
> BlastpgpCommandline
> RpsBlastCommandline

My impression is that there is still a sizeable group of people using
blastall and the rest of legacy BLAST as it is mature reliable code,
while BLAST+ still has some rough edges. But as the NCBI themselves
have now stopped updating legacy BLAST, perhaps the time has come.

So if you want, deprecating the legacy BLAST wrappers seems OK.

> Bio/Blast/NCBIStandalone.py overall, and specifically:
> blastall
> blastpgp
> rpsblast

Given the SearchIO use of the plain text BLAST parser, I think we
agreed to leave that as is in the short term.

For the command line calling functions blastall, blastpgp & rpsblast,
the same applies as for BlastallCommandline, BlastpgpCommandline
and RpsBlastCommandline (above).

> Bio/ParserSupport.py overall

Given the SearchIO use of the plain text BLAST parser which uses
this, I think we agreed to leave that as is in the short term.

> Bio/PDB/AbstractPropertyMap.py:
> The has_key function in class AbstractPropertyMap
>
> Bio/PDB/FragmentMapper.py:
> The has_key function in class FragmentMapper

The Python dict lost the has_key function in Python 3, so it does
make sense to proceed with those deprecations.
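
As a rough sketch of what deprecating a has_key method can look like: the class below is a hypothetical stand-in (it is not the real AbstractPropertyMap or FragmentMapper), and the warning text is an assumption. The old method keeps working but warns, and forwards to the `in` operator that works on both Python 2 and Python 3.

```python
import warnings


class PropertyMapSketch(object):
    """Hypothetical stand-in for a dict-like property map."""

    def __init__(self, data):
        self._data = dict(data)

    def __contains__(self, key):
        # Supports the preferred spelling: key in obj
        return key in self._data

    def has_key(self, key):
        """Deprecated spelling; forwards to the ``in`` operator."""
        warnings.warn(
            "has_key is deprecated; use 'key in obj' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return key in self


pmap = PropertyMapSketch({("A", 10): 0.5})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = pmap.has_key(("A", 10))  # still works, but warns
new_style = ("A", 10) in pmap  # preferred form, no warning
```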

> Bio/UniGene/UniGene.py overall
>

Yes, ready to deprecate.

> In BioSQL/BioSeqDatabase.py:
>   class DBServer:
>      remove_database
>   class BioSeqDatabase:
>      get_all_primary_ids
>      get_Seq_by_primary_id

Yes, ready to deprecate.

Thanks,

Peter

From p.j.a.cock at googlemail.com  Fri Feb  1 11:02:33 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 16:02:33 +0000
Subject: [Biopython-dev] Bugzilla --> RedMine --> GitHub issues?
Message-ID: <CAKVJ-_7U8HW4wa657oEYsR=vC=+cXV1O1nREps118O6F1uYjTQ@mail.gmail.com>

On Fri, Feb 1, 2013 at 3:40 PM, Kai Blin
<kai.blin at biotech.uni-tuebingen.de> wrote:
>
> PS: I'd have replied on the bug tracker, but for some reason I can't
> log in again, even after resetting the password. For some reason
> redmine hates me.
>

Biopython used to use Bugzilla, at http://bugzilla.open-bio.org/
(it was left as a read only legacy listing, but it broke last year when
the old server started to die and isn't really worth fixing).

This was moved over to RedMine, along with all the other OBF
projects. This does have some git integration, but I'm not that
taken with it - and it is yet another service for the OBF team
to maintain.

What do people think of moving over to using GitHub issues?
This would link in very well with pull requests and makes linking
to commits much simpler too. One potential issue is if and how
we could have bug reports sent to the biopython-dev mailing list
(something we touched on recently for pull requests).

A full automated move could be possible (NumPy did this), but I
think a gradual move would be fine - stop filing new issues on
RedMine and use GitHub issues in future. There are only about
100 issues open at the moment anyway, and a manual migration
would also be a good way to review some of the older tickets.

Thoughts?,

Peter

From p.j.a.cock at googlemail.com  Fri Feb  1 11:04:10 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 16:04:10 +0000
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <CADEGkF6kbXTsXcZXBKHxAM0yV8XO96d8FNSjFhMNen1pw8HYNw@mail.gmail.com>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
	<CADEGkF5xTKcvPtTUJsVfFXphOrS3k8BHf8_foLeovShA2gjHJg@mail.gmail.com>
	<510BE201.4090002@biotech.uni-tuebingen.de>
	<CADEGkF6kbXTsXcZXBKHxAM0yV8XO96d8FNSjFhMNen1pw8HYNw@mail.gmail.com>
Message-ID: <CAKVJ-_7hzYHSc_xr5WJH2VyGQ=puzbrfkZB-qjghfhJv30T_-Q@mail.gmail.com>

On Fri, Feb 1, 2013 at 3:47 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Peter, Kai,
>
>
>>> There's still one bug for Bio.SearchIO that I would prefer to be
>>> fixed (https://redmine.open-bio.org/issues/3400). Is it possible to
>>> wait a few more days (no later than next week I hope) to sort this
>>> bug out?
>>
>> Sorry for letting this slip for so long, but I never got around to
>> writing an actual test case.
>>
>> Bow, did we agree to use optionalcascade for now and then maybe
>> refactor? I'm pretty confident the code works as-is, all the BioPython
>> issues I've been running into with my production site have been in the
>> GenBank/EMBL parsers. :)
>
> Yes, we did :). I meant to do the optionalcascade refactor so the code
> is more maintainable (and to prevent a corner case bug). But in
> general, the optionalcascade fix looks to be fine. And for code marked
> with the BiopythonExperimentalWarning, having a fix without test cases
> seems better than no fix at all.

That sounds OK for now.

> Peter, if you're fine with Kai's fix, I think we can mark this bug
> solved and go on with the release. I'll add the test cases and
> refactor the code later on.

You mean this patch from https://redmine.open-bio.org/issues/3400 ?:
https://redmine.open-bio.org/attachments/1754/0001-SearchIO-Add-optionalcascade-getter-setter-to-allow-.patch

I can apply that if you want.

Peter

From redmine at redmine.open-bio.org  Fri Feb  1 11:04:20 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Fri, 1 Feb 2013 16:04:20 +0000
Subject: [Biopython-dev] [Biopython - Bug #3405] (New) to_networkx converts
	weights as string
Message-ID: <redmine.issue-3405.20130201160420@redmine.open-bio.org>


Issue #3405 has been reported by Aleksey Kladov.

----------------------------------------
Bug #3405: to_networkx converts weights as string
https://redmine.open-bio.org/issues/3405

Author: Aleksey Kladov
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


in the file /Bio/Phylo/_utils.py in the method add_edge(graph, n1, n2) there is a line

<pre> graph.add_edge(n1, n2, weight=str(n2.branch_length or 1.0)) </pre>


It's strange, because if weights are strings, then you are unable to find shortest paths due to

<pre>TypeError: unsupported operand type(s) for +: 'int' and 'str'</pre>
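
A minimal sketch of why string weights fail and how a numeric weight avoids the problem. It deliberately does not use networkx: `path_length` is a hypothetical stand-in for the way a shortest-path search adds up edge weights, and `edge_weight` is a suggested replacement for the `str(...)` call, not the fix actually committed.

```python
def path_length(weights):
    """Add up edge weights the way a shortest-path search accumulates them."""
    total = 0
    for weight in weights:
        total = total + weight
    return total


def edge_weight(branch_length):
    """Numeric weight, defaulting to 1.0 when no branch length is set."""
    return float(branch_length or 1.0)


branch_lengths = [0.25, None, 2.0]

# String weights, as produced by str(...) in the reported line.
string_weights = [str(b or 1.0) for b in branch_lengths]
try:
    path_length(string_weights)
    error = None
except TypeError as exc:
    error = str(exc)

# Numeric weights add up without error.
total = path_length([edge_weight(b) for b in branch_lengths])
print(error)
print(total)
```

Note that `branch_length or 1.0` also turns a branch length of exactly 0.0 into 1.0, which matches the behaviour of the reported line.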


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From kai.blin at biotech.uni-tuebingen.de  Fri Feb  1 10:40:49 2013
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Fri, 01 Feb 2013 16:40:49 +0100
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <CADEGkF5xTKcvPtTUJsVfFXphOrS3k8BHf8_foLeovShA2gjHJg@mail.gmail.com>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
	<CADEGkF5xTKcvPtTUJsVfFXphOrS3k8BHf8_foLeovShA2gjHJg@mail.gmail.com>
Message-ID: <510BE201.4090002@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2013-02-01 16:29, Wibowo Arindrarto wrote:

> There's still one bug for Bio.SearchIO that I would prefer to be
> fixed (https://redmine.open-bio.org/issues/3400). Is it possible to
> wait a few more days (no later than next week I hope) to sort this
> bug out?

Sorry for letting this slip for so long, but I never got around to
writing an actual test case.

Bow, did we agree to use optionalcascade for now and then maybe
refactor? I'm pretty confident the code works as-is, all the BioPython
issues I've been running into with my production site have been in the
GenBank/EMBL parsers. :)

Cheers,
Kai

PS: I'd have replied on the bug tracker, but for some reason I can't
log in again, even after resetting the password. For some reason
redmine hates me.


- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJRC+IBAAoJEKM5lwBiwTTPlisH/1QSF+4jIx2jKycRCys1NPMj
6YwFTdKoGmIDYjEB+qge5PKNIHplN3EsGz6l4bRYZiWbqTlyvb5IUPHgwxFRigXg
VuSnR/k8faSLNuGJpoFezLmZ0yJoLslXztCUJ+HbWXB02K9uzYXovRg8AtfHlnOu
Qd9aNbyX/nzFrsayllTvYy9ZxcQNCH5Lrgm+EWMkuBptcMdBLjqSGkov5iE2g1bV
ItHacrQUPJXVIAMTXW9mSy3AXzTqjOjqfBwXsthLSyXHEv8ppcnIi4bmVX+XS//n
4vc+LdaxzgkENaw4P+60bikkFqey/GFoxaIzLACh4HFupRAjK+6NaUzGYPSEQXM=
=efd0
-----END PGP SIGNATURE-----

From p.j.a.cock at googlemail.com  Fri Feb  1 11:25:56 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 16:25:56 +0000
Subject: [Biopython-dev] Bugzilla --> RedMine --> GitHub issues?
In-Reply-To: <CAKVJ-_7U8HW4wa657oEYsR=vC=+cXV1O1nREps118O6F1uYjTQ@mail.gmail.com>
References: <CAKVJ-_7U8HW4wa657oEYsR=vC=+cXV1O1nREps118O6F1uYjTQ@mail.gmail.com>
Message-ID: <CAKVJ-_7zag-sKzmUE7W-dg86Nqkk4YcBMtr66N2ZZ2OhJgR7Fg@mail.gmail.com>

On Fri, Feb 1, 2013 at 4:02 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> What do people think of moving over to using GitHub issues?
> This would link in very well with pull requests and makes linking
> to commits much simpler too. One potential issue is if and how
> we could have bug reports sent to the biopython-dev mailing list
> (something we touched on recently for pull requests).
>

I've filed an issue for that (I couldn't find any open issue like it):
https://github.com/gitlabhq/gitlabhq/issues/2884

Peter

From kai.blin at biotech.uni-tuebingen.de  Fri Feb  1 11:27:13 2013
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Fri, 01 Feb 2013 17:27:13 +0100
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <510BEAEF.4070108@biotech.uni-tuebingen.de>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
	<CADEGkF5xTKcvPtTUJsVfFXphOrS3k8BHf8_foLeovShA2gjHJg@mail.gmail.com>
	<510BE201.4090002@biotech.uni-tuebingen.de>
	<CADEGkF6kbXTsXcZXBKHxAM0yV8XO96d8FNSjFhMNen1pw8HYNw@mail.gmail.com>
	<CAKVJ-_7hzYHSc_xr5WJH2VyGQ=puzbrfkZB-qjghfhJv30T_-Q@mail.gmail.com>
	<510BEAEF.4070108@biotech.uni-tuebingen.de>
Message-ID: <510BECE1.4020306@biotech.uni-tuebingen.de>

On 2013-02-01 17:18, Kai Blin wrote:

> That's not quite it. Let me update my bug3400 branch and submit a
> pull request. Will be ready in a minute.

https://github.com/biopython/biopython/pull/150

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From kai.blin at biotech.uni-tuebingen.de  Fri Feb  1 11:18:55 2013
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Fri, 01 Feb 2013 17:18:55 +0100
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <CAKVJ-_7hzYHSc_xr5WJH2VyGQ=puzbrfkZB-qjghfhJv30T_-Q@mail.gmail.com>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
	<CADEGkF5xTKcvPtTUJsVfFXphOrS3k8BHf8_foLeovShA2gjHJg@mail.gmail.com>
	<510BE201.4090002@biotech.uni-tuebingen.de>
	<CADEGkF6kbXTsXcZXBKHxAM0yV8XO96d8FNSjFhMNen1pw8HYNw@mail.gmail.com>
	<CAKVJ-_7hzYHSc_xr5WJH2VyGQ=puzbrfkZB-qjghfhJv30T_-Q@mail.gmail.com>
Message-ID: <510BEAEF.4070108@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2013-02-01 17:04, Peter Cock wrote:

Hi Peter,

>> Peter, if you're fine with Kai's fix, I think we can mark this
>> bug solved and go on with the release. I'll add the test cases
>> and refactor the code later on.
> 
> You mean this patch from https://redmine.open-bio.org/issues/3400
> ?: 
> https://redmine.open-bio.org/attachments/1754/0001-SearchIO-Add-optionalcascade-getter-setter-to-allow-.patch
>
>  I can apply that if you want.

That's not quite it. Let me update my bug3400 branch and submit a pull
request. Will be ready in a minute.

Cheers,
Kai

- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJRC+rvAAoJEKM5lwBiwTTPYH4H+QGiY5cyN7tFjT2RZGN28Pp8
2t/RbW9bYakVqKHtZR6xXu4QF48jCmHkkER0cMvDuKcWrjko/xAWSGuNqWK59rHe
b7t9CgGywYC9KdhPih+pG5HzKuc9ZP1c2unK/e+c+y8rrFZTUoB1e2AbGqzg163S
qplu0RGv8kSOMXmGVFNj+iZ/AJnN735Tp5gfzFHfudS13kzfqW+Mq1+DlSG1GOwM
Y99kc6Uc5WFHmHME4pDdlLBGyKVd+9LlQnTeApBjWnBDcRBMyXI0HIck6Bw64swH
BvPIz2yq3PEnhvgI0v0A9lO1xR0Yj9wGQGr8XGPLq0UHh0W0O0P1I8YbMCVHkPg=
=kCtp
-----END PGP SIGNATURE-----

From p.j.a.cock at googlemail.com  Fri Feb  1 11:50:57 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 16:50:57 +0000
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
Message-ID: <CAKVJ-_5tQZ_H=ewFUyvOSi=aawBxC7hXvqxhbsxFqHVuCOKbWw@mail.gmail.com>

On Fri, Feb 1, 2013 at 3:03 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hello all,
>
> I think we're overdue for a Biopython release now, and I would
> like to do this next week. There are still plenty more additions
> and enhancements waiting in the wings, but right now I just
> want any remaining bug fixes addressed.
>
> Are there any release blocking issues?
>
> Thanks,
>
> Peter

I won't have time to look at it today, but the BLAST+ wrappers
need updating for the BLAST 2.2.27+ release, e.g. new arg
frame_shift_penalty (checked via test_NCBI_BLAST_tools.py).

Any volunteers? This should be a small job...
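
To make the nature of the job concrete, here is a toy sketch of a wrapper that only accepts option names it knows about, so a new BLAST+ argument has to be added to its table before it can be used. This is not the Bio.Blast.Applications code: `build_command`, the option sets, and the choice of blastx as the program are all assumptions for illustration; only the option name frame_shift_penalty comes from this message.

```python
def build_command(program, allowed_options, **options):
    """Build an argument list, rejecting option names the wrapper does not know."""
    args = [program]
    for name in sorted(options):
        if name not in allowed_options:
            raise ValueError("Option %r is not valid for %s" % (name, program))
        args.extend(["-" + name, str(options[name])])
    return args


# Hypothetical option table before and after the update.
old_options = {"query", "db", "evalue"}
new_options = old_options | {"frame_shift_penalty"}

try:
    build_command("blastx", old_options, query="in.fasta", frame_shift_penalty=5)
    rejected = False
except ValueError:
    rejected = True  # the outdated table does not know the new argument

cmd = build_command("blastx", new_options, query="in.fasta", frame_shift_penalty=5)
print(" ".join(cmd))
```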

Peter

From w.arindrarto at gmail.com  Fri Feb  1 12:37:57 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 1 Feb 2013 18:37:57 +0100
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <CAKVJ-_5tQZ_H=ewFUyvOSi=aawBxC7hXvqxhbsxFqHVuCOKbWw@mail.gmail.com>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
	<CAKVJ-_5tQZ_H=ewFUyvOSi=aawBxC7hXvqxhbsxFqHVuCOKbWw@mail.gmail.com>
Message-ID: <CADEGkF74OKWSSKHMEV8825_C64j4LqLNqqUCYyYz=OQ8zMS45A@mail.gmail.com>

Hi Peter,

>> I think we're overdue for a Biopython release now, and I would
>> like to do this next week. There are still plenty more additions
>> and enhancements waiting in the wings, but right now I just
>> want any remaining bug fixes addressed.
>>
>> Are there any release blocking issues?
>>
>> Thanks,
>>
>> Peter
>
> I won't have time to look at it today, but the BLAST+ wrappers
> need updating for the BLAST 2.2.27+ release, e.g. new arg
> frame_shift_penalty (checked via test_NCBI_BLAST_tools.py).
>
> Any volunteers? This should be a small job...

I've submitted a pull request here:
https://github.com/biopython/biopython/pull/151

From w.arindrarto at gmail.com  Fri Feb  1 12:43:23 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 1 Feb 2013 18:43:23 +0100
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <CADEGkF74OKWSSKHMEV8825_C64j4LqLNqqUCYyYz=OQ8zMS45A@mail.gmail.com>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
	<CAKVJ-_5tQZ_H=ewFUyvOSi=aawBxC7hXvqxhbsxFqHVuCOKbWw@mail.gmail.com>
	<CADEGkF74OKWSSKHMEV8825_C64j4LqLNqqUCYyYz=OQ8zMS45A@mail.gmail.com>
Message-ID: <CADEGkF4AK5npXVtD1LEV=OZFebiDn1O7Yd+qZxdtpPVuer43pg@mail.gmail.com>

> Hi Peter,
>
>>> I think we're overdue for a Biopython release now, and I would
>>> like to do this next week. There are still plenty more additions
>>> and enhancements waiting in the wings, but right now I just
>>> want any remaining bug fixes addressed.
>>>
>>> Are there any release blocking issues?
>>>
>>> Thanks,
>>>
>>> Peter
>>
>> I won't have time to look at it today, but the BLAST+ wrappers
>> need updating for the BLAST 2.2.27+ release, e.g. new arg
>> frame_shift_penalty (checked via test_NCBI_BLAST_tools.py).
>>
>> Any volunteers? This should be a small job...
>
> I've submitted a pull request here:
> https://github.com/biopython/biopython/pull/151

Whoops, sorry for sending an incomplete mail. I wanted to add that
test_NCBI_BLAST_tools.py doesn't correctly detect my BLAST
installation (even though I have it). I had to comment out the
"Install BLAST+ ..." notice and the rpsblast test (for some reason it
keeps saying I don't have rpsblast either, even though I do). Anyway,
these changes are not in the pull request, just something I did when
writing this fix.

Could you confirm that the fixes are OK?

Hope that helps,
Bow

From w.arindrarto at gmail.com  Fri Feb  1 12:48:09 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 1 Feb 2013 18:48:09 +0100
Subject: [Biopython-dev] Bugzilla --> RedMine --> GitHub issues?
In-Reply-To: <CAKVJ-_7U8HW4wa657oEYsR=vC=+cXV1O1nREps118O6F1uYjTQ@mail.gmail.com>
References: <CAKVJ-_7U8HW4wa657oEYsR=vC=+cXV1O1nREps118O6F1uYjTQ@mail.gmail.com>
Message-ID: <CADEGkF6LAVmLd1SmVp5UNexaWe5irzxD+9NHm2kAvR7r0KmxXA@mail.gmail.com>

>> PS: I'd have replied on the bug tracker, but for some reason I can't
>> log in again, even after resetting the password. For some reason
>> redmine hates me.
>>
>
> Biopython used to use Bugzilla, at http://bugzilla.open-bio.org/
> (it was left as a read only legacy listing, but it broke last year when
> the old server started to die and isn't really worth fixing).
>
> This was moved over to RedMine, along with all the other OBF
> projects. This does have some git integration, but I'm not that
> taken with it - and it is yet another service for the OBF team
> to maintain.
>
> What do people think of moving over to using GitHub issues?
> This would link in very well with pull requests and makes linking
> to commits much simpler too. One potential issue is if and how
> we could have bug reports sent to the biopython-dev mailing list
> (something we touched on recently for pull requests).
>
> A full automated move could be possible (NumPy did this), but I
> think a gradual move would be fine - stop filing new issues on
> RedMine and use GitHub issues in future. There are only about
> 100 issues open at the moment anyway, and a manual migration
> would also be a good way to review some of the older tickets.
>
> Thoughts?,

Moving to GitHub sounds good to me. I'd prefer if we go over the
issues manually (removing the obsolete ones and keeping the current
ones).

As for sending bug reports to the mailing list, could we perhaps
create our own custom hooks? E.g. any time a pull request is issued, an
email would be sent (see https://github.com/github/github-services and
http://developer.github.com/v3/repos/hooks/#create-a-hook)
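
A rough sketch of what registering such a hook might involve. It only prepares the HTTP request and never sends it. The endpoint and payload fields (`name`, `active`, `events`, `config`) are as I understand them from the GitHub v3 hooks documentation linked above and should be checked against the current docs; the target URL and token are placeholders, and `build_hook_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_hook_request(owner, repo, target_url, token):
    """Prepare (but do not send) a request that would register a webhook."""
    payload = {
        "name": "web",
        "active": True,
        "events": ["pull_request"],  # notify on pull request activity only
        "config": {"url": target_url, "content_type": "json"},
    }
    return urllib.request.Request(
        "https://api.github.com/repos/%s/%s/hooks" % (owner, repo),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "token " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_hook_request(
    "biopython", "biopython", "https://example.org/notify", "PLACEHOLDER_TOKEN"
)
body = json.loads(request.data.decode("utf-8"))
print(request.get_method(), request.full_url)
print(body["events"])
```

The service at the target URL would still need to turn each notification into an email to the list; that part is not sketched here.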

Regards,
Bow

From arklenna at gmail.com  Fri Feb  1 14:05:18 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Fri, 1 Feb 2013 14:05:18 -0500
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <CAKVJ-_6U+8m5DJYJmoKeoS86jP1z9d6ODS66kGfT7AHp9jBhbA@mail.gmail.com>
References: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_6U+8m5DJYJmoKeoS86jP1z9d6ODS66kGfT7AHp9jBhbA@mail.gmail.com>
Message-ID: <CALfq9tLe9jbQg8CYkaOaX92-tw3Pgpx6feDnyYLc_eBYYB8rCg@mail.gmail.com>

On Fri, Feb 1, 2013 at 9:14 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

>
> People leaning for a Bio.WWW grouping: Bow, Lenna, Kevin
> (which could be a big disruption with lots of code relocation)
>
> People leaning against a Bio.WWW grouping: Michiel, Peter (me)
> (which would also be the status quo, so no disruption).
>
>
I concede that the potential benefit of refactoring to separate WWW is
outweighed both by potential downsides and the disruption and effort
involved.

> In the specific case of Kevin's TAIR code for fetching Arabidopsis sequences,
> Bio.TAIR (lower case?) is consistent with current usage. Somewhere under
> Bio.Seq* also seems sensible to me, as I wrote at the start of this thread.
>
>
Populating the top level namespace with a submodule for each web-only
service has the risk of creating too many submodules. Bio.Seq* makes sense,
because the TAIR code pulls data into a Seq. Web services that connect to a
single biopython representation can be organized under that submodule. Web
services that return multiple types of information (e.g. Entrez) are big
enough to logically comprise their own submodule.

Is my interpretation of the biopython classification scheme more or less
correct?

Cheers,

Lenna

From p.j.a.cock at googlemail.com  Fri Feb  1 16:00:10 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 21:00:10 +0000
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <CALfq9tLe9jbQg8CYkaOaX92-tw3Pgpx6feDnyYLc_eBYYB8rCg@mail.gmail.com>
References: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_6U+8m5DJYJmoKeoS86jP1z9d6ODS66kGfT7AHp9jBhbA@mail.gmail.com>
	<CALfq9tLe9jbQg8CYkaOaX92-tw3Pgpx6feDnyYLc_eBYYB8rCg@mail.gmail.com>
Message-ID: <CAKVJ-_5od2F8o2-27abO3sLtJvUT7eOEhShDNcNCnYXJKDcKVQ@mail.gmail.com>

On Fri, Feb 1, 2013 at 7:05 PM, Lenna Peterson <arklenna at gmail.com> wrote:
> On Fri, Feb 1, 2013 at 9:14 AM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>>
>>
>> People leaning for a Bio.WWW grouping: Bow, Lenna, Kevin
>> (which could be a big disruption with lots of code relocation)
>>
>> People leaning against a Bio.WWW grouping: Michiel, Peter (me)
>> (which would also be the status quo, so no disruption).
>>
>
> I concede that the potential benefit of refactoring to separate WWW is
> outweighed both by potential downsides and the disruption and effort
> involved.
>
>> In the specific case of Kevin's TAIR code for fetching Arabidopsis sequences,
>> Bio.TAIR (lower case?) is consistent with current usage. Somewhere under
>> Bio.Seq* also seems sensible to me, as I wrote at the start of this
>> thread.
>>
>
> Populating the top level namespace with a submodule for each web-only
> service has the risk of creating too many submodules. Bio.Seq* makes sense,
> because the TAIR code pulls data into a Seq. Web services that connect to a
> single biopython representation can be organized under that submodule. Web
> services that return multiple types of information (e.g. Entrez) are big
> enough to logically comprise their own submodule.
>
> Is my interpretation of the biopython classification scheme more or less
> correct?

Yes that sounds about right :)

Of course, the historical muddle of Bio.Seq* is something we've talked
about addressing recently - see this thread from October,
http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html

Peter

From natemsutton at yahoo.com  Fri Feb  1 16:54:42 2013
From: natemsutton at yahoo.com (Nate Sutton)
Date: Fri, 1 Feb 2013 13:54:42 -0800 (PST)
Subject: [Biopython-dev] New BioPython member
In-Reply-To: <CAKVJ-_4HQC5V59V=jy4cpkqAAs-8FxtbAsjph3tWXwRAMFAMyQ@mail.gmail.com>
References: <1359494577.29159.YahooMailNeo@web122606.mail.ne1.yahoo.com>
	<CAKVJ-_4HQC5V59V=jy4cpkqAAs-8FxtbAsjph3tWXwRAMFAMyQ@mail.gmail.com>
Message-ID: <1359755682.16563.YahooMailNeo@web122605.mail.ne1.yahoo.com>

Thanks for the welcome! Also, I looked briefly through the code in the files you wrote about and I see the command line app wrapping components you described. I appreciate the advice about how to do the wrapper, and I am glad to know that this pattern of command line app wrapping is consistent with code in other places in BioPython. Thanks for the other advice, including about possibly asking for guidance. I'll just give it a shot and hopefully things go smoothly, but as this is my first BioPython coding I appreciate the support.

Thanks,

Nate


________________________________
 From: Peter Cock <p.j.a.cock at googlemail.com>
To: Nate Sutton <natemsutton at yahoo.com> 
Cc: "biopython-dev at lists.open-bio.org" <biopython-dev at lists.open-bio.org> 
Sent: Wednesday, January 30, 2013 2:31 AM
Subject: Re: [Biopython-dev] New BioPython member
 
On Tue, Jan 29, 2013 at 9:22 PM, Nate Sutton <natemsutton at yahoo.com> wrote:
> Dear all,
>
> I just recently joined the BioPython developers group and am
> looking forward to contributing to BioPython! I have worked for a while
> in programming, genetics, and biology and have
> a m.s. in Biomedical Informatics. After
> talking with some fellow contributors I have decided to try working on
> https://redmine.open-bio.org/issues/3360 but I will also work on writing
> some documentation on examples from the
> cookbook, especially if I am stuck on the bug. If anyone wants to work on
> the same things, I'd be glad to hear that, I
> may be slow on the work because I am still learning Python after coming
> from other languages.
>
> -Nate

Hi Nate, and welcome.

Eric is in charge of the Bio.Phylo module, but within that the
command line application wrappers under Bio.Phylo.Applications
follow a pattern used elsewhere in Biopython.

To add a wrapper for fasttree http://www.microbesonline.org/fasttree/
have a look at the existing wrappers for PHYML and RAXML, defined in
Bio/Phylo/Applications/_Phyml.py and Bio/Phylo/Applications/_Raxml.py
(leading underscores mean private modules in Python), which are
exposed to the user via Bio/Phylo/Applications/__init__.py

In this case, I'd suggest putting the new wrapper in a new file,
Bio/Phylo/Applications/_fastree.py
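
As a rough illustration of the wrapper pattern described above, here is a
minimal, stdlib-only sketch. It is not the Bio.Application API used by the
existing _Phyml.py and _Raxml.py wrappers. The class name, the option names
(nt, gtr, out) and the file names are assumptions made up for this example,
and the flags should be checked against the fasttree documentation.

```python
import shlex


class FastTreeCommandlineSketch:
    """Hypothetical wrapper that only builds a command line string."""

    # Option name -> flag. Both tables are illustrative assumptions.
    SWITCHES = {"nt": "-nt", "gtr": "-gtr"}   # on/off flags
    VALUES = {"out": "-out"}                  # flags that take a value

    def __init__(self, cmd="fasttree", input=None, **options):
        self.cmd = cmd
        self.input = input
        self.options = {}
        for name, value in options.items():
            # Reject misspelled options early instead of passing them on.
            if name not in self.SWITCHES and name not in self.VALUES:
                raise ValueError("Unknown option: %s" % name)
            self.options[name] = value

    def __str__(self):
        parts = [self.cmd]
        for name in sorted(self.options):
            value = self.options[name]
            if name in self.SWITCHES:
                if value:
                    parts.append(self.SWITCHES[name])
            else:
                parts.extend([self.VALUES[name], shlex.quote(str(value))])
        if self.input is not None:
            # The input alignment goes last as a positional argument.
            parts.append(shlex.quote(self.input))
        return " ".join(parts)


cmdline = FastTreeCommandlineSketch(input="aln.fasta", nt=True, out="tree.nwk")
print(cmdline)
```

The point of the pattern is that option names are validated in Python and
the command line is assembled in one place, so the resulting string can be
inspected before anything is run.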

Other similar wrappers exist under Bio.Emboss, Bio.Align, etc.

Don't be shy about asking for guidance on this, or git and github.
Ultimately what I'm hoping you'll be able to do is take a fork (personal
copy of the repository) on GitHub, create a new fasttree branch,
commit your enhancements, and make a pull request. If that's
all too much for now, simply writing the new file and letting us
do the git side would be fine.

Regards,

Peter

From k.d.murray.91 at gmail.com  Fri Feb  1 18:59:57 2013
From: k.d.murray.91 at gmail.com (Kevin Murray)
Date: Sat, 2 Feb 2013 10:59:57 +1100
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <CAKVJ-_5od2F8o2-27abO3sLtJvUT7eOEhShDNcNCnYXJKDcKVQ@mail.gmail.com>
References: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_6U+8m5DJYJmoKeoS86jP1z9d6ODS66kGfT7AHp9jBhbA@mail.gmail.com>
	<CALfq9tLe9jbQg8CYkaOaX92-tw3Pgpx6feDnyYLc_eBYYB8rCg@mail.gmail.com>
	<CAKVJ-_5od2F8o2-27abO3sLtJvUT7eOEhShDNcNCnYXJKDcKVQ@mail.gmail.com>
Message-ID: <CAH80STUm4y_8vZXA-UZUVw5mWGwKYYtrp25_gB8drRLosLnpeg@mail.gmail.com>

Hi All,

How about this:
In the vein of Lenna's last email, we create a module WebSeq (or Seq.Web,
or whatever), containing modules whose sole purpose is to get sequences
(Seq/SeqRecord objects) from an internet database. This would, I think,
provide a good balance between a messy top-level domain full of modules
like Bio.tair, and the absolutism of having anything vaguely web related in
a single WWW module. It should also provide the unified theme per module
which Michiel talks of, and unit/doctests should be fine, as no modules
will be split (simply moved in their entirety from Bio.x to Bio.WebSeq.x).

From a quick look, the only candidate (apart from TAIR) for a shift is
TogoWS, and even then I'm not sure, as TogoWS isn't used just for Seqs
(and does not return them).

Regards
Kevin Murray



From mjldehoon at yahoo.com  Fri Feb  1 20:36:03 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 1 Feb 2013 17:36:03 -0800 (PST)
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <CAH80STUm4y_8vZXA-UZUVw5mWGwKYYtrp25_gB8drRLosLnpeg@mail.gmail.com>
Message-ID: <1359768963.25565.YahooMailClassic@web164004.mail.gq1.yahoo.com>

In principle I am OK with this, but is TAIR only used for sequences? Or is it possible / likely that in the future we may want to add other functionality to TAIR? Anyway, if TAIR is predominantly used for sequences, then Bio.Seq.Web is a good option I think.

Best,
-Michiel.


From k.d.murray.91 at gmail.com  Sat Feb  2 01:00:34 2013
From: k.d.murray.91 at gmail.com (Kevin Murray)
Date: Sat, 2 Feb 2013 17:00:34 +1100
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <1359768963.25565.YahooMailClassic@web164004.mail.gq1.yahoo.com>
References: <CAH80STUm4y_8vZXA-UZUVw5mWGwKYYtrp25_gB8drRLosLnpeg@mail.gmail.com>
	<1359768963.25565.YahooMailClassic@web164004.mail.gq1.yahoo.com>
Message-ID: <CAH80STXNGY8mxs2Zr4SF5rTaBCzH0b1bHvMJqfpLRJ-Y0sQLSA@mail.gmail.com>

Michiel,
TAIR (http://www.arabidopsis.org/) is primarily a sequence repository. I
have no intention to extend it beyond that, and any other features would
not be easily scriptable, or would be pointless to include in Biopython.

Regards
Kevin Murray



From eric.talevich at gmail.com  Sat Feb  2 17:29:57 2013
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sat, 2 Feb 2013 17:29:57 -0500
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <CAKVJ-_6U+8m5DJYJmoKeoS86jP1z9d6ODS66kGfT7AHp9jBhbA@mail.gmail.com>
References: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_6U+8m5DJYJmoKeoS86jP1z9d6ODS66kGfT7AHp9jBhbA@mail.gmail.com>
Message-ID: <CAMC681=rosLhOPyi3bmaUaj+WC_M573VY4YPvmvdjwu1YsTiLw@mail.gmail.com>

On Fri, Feb 1, 2013 at 9:14 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Fri, Feb 1, 2013 at 1:54 PM, Michiel de Hoon <mjldehoon at yahoo.com>
> wrote:
> > Hi Lenna,
> >
> >> Regarding point (2), is your primary concern namespace clutter or
> >> importing efficiency?
> >
> > Regarding point (2), my primary concern is that a Bio.WWW module would
> > group together modules that don't have much in common with each other. I
> > agree with your point that the category of internet access is more
> > fundamental than the category of parsers. But still, which modules should
> > then go into a Bio.WWW module? Any module whose sole purpose is to use
> > the internet (that would exclude Bio.Entrez)? Any module whose main
> > purpose is to use the internet? This would be unclear; for example,
> > Bio.Entrez may or may not fall in that category, depending on how you use
> > the module. Any module whose functionality includes internet access? Then
> > if one day we add access to the JASPAR database over the internet to
> > Bio.Motif, it would have to move to Bio.WWW.
> >
> > Currently most modules are organized by theme (Bio.Seq, Bio.Motif,
> > Bio.Cluster, Bio.Phylo, Bio.Entrez, etc.). For each theme, we have one
> > module, one chapter in the documentation, one set of unit tests, one set
> > of doctests, which I think is a huge advantage both in terms of clarity
> > and in terms of user experience.
>
> Also with the theme approach, most (if not all) the themes are likely to
> have some online resources (databases or remote APIs). On those
> grounds it makes sense to keep online motif functionality (like weblogo)
> under Bio.Motif, and so on.
>

I agree.
From an engineering perspective, it's usually best to organize code around
data types. (To be clear: think classes and structures, not ints and
strings.) The SeqIO, AlignIO, SearchIO, Phylo, Motif, PDB, etc. modules
each have a core data type that serves as the "theme" for the sub-package.
Within the sub-package we can have modules for different file formats, data
transformations/manipulations, web servers, and command-line program
wrappers, and keep all the interdependencies within the same small region
of the code base. Since most users will not read the documentation in its
entirety (if at all), this also makes it easier to look up how to do things
with the data type in question.

The core data type for a WWW module would be a network handle, I suppose --
but that's already part of the Python standard library.

I've suggested before that we can justify the current placement of
sequence-related modules at the top level, rather than under a new "Seq"
sub-package, by considering sequences to be the default/implicit data type.
As we've covered, many online resources can serve up several different data
types, although sequences are probably the most common. In terms of
namespace clutter, perhaps I've gotten too used to R, but I don't think
we've reached the point where the number of modules and functions visible
from the top level harms the user experience.


> In the specific case of Kevin's TAIR code for fetch Arabidopsis sequences,
> Bio.TAIR (lower case?) is consistent with current usage. Somewhere under
> Bio.Seq* also seems sensible to me, as I wrote at the start of this thread.
>

Bio.TAIR or Bio.Seq.TAIR or perhaps Bio.Seq.WWW.TAIR seem sensible to me,
too. No preference on casing.

-Eric

From p.j.a.cock at googlemail.com  Mon Feb  4 07:01:49 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 4 Feb 2013 12:01:49 +0000
Subject: [Biopython-dev] Deprecations for Biopython 1.61 release;
	Was: Bio.Motif update
Message-ID: <CAKVJ-_5MiJvfGE8RvMWpRpqK1_GQcfGCCUYBmV_NfE5vwhF7iQ@mail.gmail.com>

On Fri, Feb 1, 2013 at 3:39 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> --- On Fri, 2/1/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> I went over the list in the DEPRECATED file last month, but
>> a second check would be a good idea.
>
> The following were declared obsolete in Biopython 1.60, and can
> in principle be declared deprecated in Biopython 1.61:
>
> ----------
> Bio/Blast/Applications.py:
> BlastallCommandline
> BlastpgpCommandline
> RpsBlastCommandline
>
> Bio/Blast/NCBIStandalone.py overall, and specifically:
> blastall
> blastpgp
> rpsblast
>
> Bio/ParserSupport.py overall
>
> Bio/PDB/AbstractPropertyMap.py:
> The has_key function in class AbstractPropertyMap
>
> Bio/PDB/FragmentMapper.py:
> The has_key function in class FragmentMapper
>
> Bio/UniGene/UniGene.py overall
>
> In BioSQL/BioSeqDatabase.py:
>   class DBServer:
>      remove_database
>   class BioSeqDatabase:
>      get_all_primary_ids
>      get_Seq_by_primary_id
>
> -----------
>
> These functions were deprecated in Biopython 1.59 or earlier, and could be removed for Biopython 1.61:
>
> Bio/Align/__init__.py:
>   class MultipleSeqAlignment:
>      get_column
>      add_sequence
>
> Bio/Align/Generic.py:
>   class Alignment overall
>     get_all_seqs
>     get_seq_by_num
>
> Bio/File.py:
>   class StringHandle
>
> Bio/Graphics/GenomeDiagram/_AbstractDrawer.py:
>   class AbstractDrawer:
>     _set_xcentre, _set_ycentre
>
> Bio/Graphics/GenomeDiagram/_Graph.py:
>   class GraphData:
>     _set_centre
>
> Bio/ParserSupport.py:
>   SGMLStrippingConsumer
>
> Bio/Seq.py:
>   class Seq:
>      .data property
>
> Bio/SeqIO/SffIO.py:
>   _sff_read_roche_index_xml
>
> --------------------
>
> The tostring() method of the class Seq in Bio/Seq.py:
> Can we declare this obsolete?
>
> -Michiel

Bio/SeqIO/SffIO.py function _sff_read_roche_index_xml done:
https://github.com/biopython/biopython/commit/567464d9a5f8b87ec48e95bae127b86463bd4da1

Bio/File.py and Bio/ParserSupport.py bits done:
https://github.com/biopython/biopython/commit/63997ea0afa5f7f6cac5c1b036d56416b04edb2a

GenomeDiagram centre setters done:
https://github.com/biopython/biopython/commit/2424c5ca36cdf4348b54bafdae444a91a6457288

Peter
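
For readers unfamiliar with the deprecation step in this list, here is a
minimal sketch of how a method such as has_key can be kept working while
warning callers to switch to the "in" operator. The class names and the
warning class below are hypothetical stand-ins, not the actual Bio.PDB code
or Biopython's own warning class.

```python
import warnings


class SketchDeprecationWarning(DeprecationWarning):
    """Hypothetical stand-in for a project-specific warning class."""


class PropertyMapSketch:
    """Hypothetical mapping-like class; not Bio.PDB.AbstractPropertyMap."""

    def __init__(self, mapping):
        self._mapping = dict(mapping)

    def __contains__(self, key):
        # Preferred spelling: "key in obj".
        return key in self._mapping

    def has_key(self, key):
        # Deprecated spelling: still works, but warns the caller.
        warnings.warn(
            "has_key is deprecated; use 'key in obj' instead",
            SketchDeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        return key in self


pm = PropertyMapSketch({("A", 10): 0.5})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    found = pm.has_key(("A", 10))

print(found, [w.category.__name__ for w in caught])
```

The old call keeps returning the same answer, so existing scripts continue
to run until the method is removed in a later release.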

From ben at bendmorris.com  Mon Feb  4 10:17:36 2013
From: ben at bendmorris.com (Ben Morris)
Date: Mon, 4 Feb 2013 10:17:36 -0500
Subject: [Biopython-dev] Support for NeXML and RDF trees in Bio.Phylo
In-Reply-To: <CAMC681ndsaK0J0iR==O7djsG1KQxdbp6TWd7sgGDVySP2OHuSA@mail.gmail.com>
References: <CAAzEd5AvRgkr=UYmqwHPH+cBYXCS+5yLHs=bHjCDxN1rY_aGFg@mail.gmail.com>
	<CAMC681=OrHJmfEbxWz=8-qzo2rEVJaqFeqgihiAMVi6No7GBCw@mail.gmail.com>
	<CAAzEd5Bz5xvc2Bz80Ru+FbUbJK-WnAjfvLv70SfkPZup89NGRQ@mail.gmail.com>
	<CAMC681ndsaK0J0iR==O7djsG1KQxdbp6TWd7sgGDVySP2OHuSA@mail.gmail.com>
Message-ID: <CAAzEd5CWOJ57YHEAw2-LXBVJ6oc_XHiU3fqJMLHj_jwS9edVhg@mail.gmail.com>

On Fri, Jan 18, 2013 at 8:20 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> On Fri, Dec 28, 2012 at 10:50 AM, Ben Morris <ben at bendmorris.com> wrote:
>>
>> On Tue, Dec 25, 2012 at 2:18 AM, Eric Talevich <eric.talevich at gmail.com>
>> wrote:
>> >
>> > On Mon, Dec 24, 2012 at 8:58 AM, Ben Morris <ben at bendmorris.com> wrote:
>> >>
>> >> Hi all,
>> >>
>> >> I've implemented support for two new phylogenetic tree formats: NeXML
>> >> and
>> >> RDF (conforming to the Comparative Data Analysis Ontology).
>> >>
>> >> I noticed that NeXML support was planned, but I didn't see anyone
>> >> working
>> >> on it on GitHub and the feature request hadn't been updated in about a
>> >> year, so I went ahead and implemented a simple version. At first I
>> >> tried
>> >> the generateDS.py approach, but the generated writer doesn't give very
>> >> much
>> >> control over the output, so I ended up writing my own parser/writer
>> >> using
>> >> ElementTree.
>> >>
>> >> As for the RDF/CDAO format, AFAIK this is not a format that's supported
>> >> by
>> >> any other phylogenetic libraries, so I'm not sure how useful this is to
>> >> everyone else. It provides a simple, standards-compliant format that
>> >> can be
>> >> imported to a triple store and supports annotation. We'll be using it
>> >> at
>> >> NESCent so I wanted to make it available to everyone else as well. The
>> >> parser and writer require the Redlands Python bindings.
>> >>
>> >> The code is available in my fork of Biopython,
>> >>
>> >>     https://github.com/bendmorris/biopython
>> >>
>> >> under branches "cdao" and "nexml." I'd love to get everyone's thoughts
>> >> and
>> >> see if these contributions would be a good fit for the Biopython
>> >> project.
>> >
>> >
>> >
>> > Thanks for letting us know! I'll try it out soonish. Looking at the code
>> > on your nexml branch, I have a few comments:
>> >
>> > - The parser uses ElementTree.parse rather than iterparse, so in its
>> > current state it would not be able to parse massive files (those larger than
>> > available RAM). Worth fixing eventually?
>>
>> Great point. I rewrote it to use iterparse instead.
>>
>> > - The parser creates Newick.Tree and Newick.Clade objects, which is
>> > nearly correct in my opinion. I would suggest subclassing BaseTree.Tree and
>> > BaseTree.Clade to create NeXML-specific Tree and Clade classes, even if you
>> > don't have any additional attributes to attach to those classes at the
>> > moment. (These would go in a new file NeXML.py, similar to PhyloXML.py and
>> > PhyloXMLIO.py.)
>>
>> Went ahead and did this as well.
>
>
> Thanks! Sorry for the pace of this, I'm in the midst of a dissertation.
>
>
>> > - The 'confidence' or 'confidences' attribute isn't used (for e.g.
>> > bootstrap support values). Does NeXML define it?
>>
>> Not that I'm aware of, but I'm not sure. I searched
>> http://nexml.org/nexml/html/doc/schema-1/ and didn't find anything.
>> I'm going to ask some people who know more about this than I do.
>
>
> I would like for Bio.Phylo's I/O modules to be able to successfully
> round-trip a file from Newick to phyloXML to NeXML and back to Newick
> without losing support values. I found these two examples of how to add this
> data to a NeXML document by referencing CDAO:
> https://www.nescent.org/wg_evoinfo/NeXML_Test_Files#Bootstraps_represented_using_the_.22meta.22_tag
> https://www.nescent.org/wg_evoinfo/NeXML_Test_Files#Bootstraps_represented_without_new_tags_or_elements
>
> That's the standard way to store bootstrap supports in NeXML (Hilmar
> confirms). How do your NeXML and CDAO modules interact, if at all? Would the
> CDAO modules be useful to properly support NeXML metadata like
> support/confidence values, or would it be simpler to just hard-code the few
> tags we're specifically interested in?
>
> Relatedly, those look like good test files. I see you've started writing
> NeXML unit tests already; if you would like help with any of this, just let
> me know.
>
> -Eric
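
On the parse versus iterparse point raised earlier in this thread, here is a
minimal sketch of streaming a file with ElementTree.iterparse. The toy XML
and its element names (trees, tree, node) are made up for illustration and
are not the NeXML schema. The function name is likewise an assumption.

```python
import io
import xml.etree.ElementTree as ET

# Toy document standing in for a large tree file (not real NeXML).
TOY_XML = b"""<trees>
  <tree id="t1"><node id="n1"/><node id="n2"/></tree>
  <tree id="t2"><node id="n3"/></tree>
</trees>"""


def iter_tree_summaries(handle):
    """Yield (tree id, node count) for one tree at a time."""
    for event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == "tree":
            yield elem.get("id"), len(elem.findall("node"))
            # Release this tree's children once it has been summarised.
            elem.clear()


summaries = list(iter_tree_summaries(io.BytesIO(TOY_XML)))
print(summaries)
```

Because each tree is handled on its "end" event and then cleared, the whole
document never has to be held in memory as a fully populated element tree.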


No worries! I just returned from a NESCent-sponsored hackathon where
we used BioPython as part of a Virtuoso-backed RDF treestore
(https://github.com/phylotastic/rdf-treestore). Now that I'm back,
I'll work on the bootstrap support values and annotations for NeXML as
I have time.

I think it's probably much easier to just hard-code specific tags for
now. The CDAO module can convert the more readable CDAO prefix names
to OBO numeric identifiers (e.g. cdao:has_Root -> obo:CDAO_0000148)
but other than that I don't see a good way for them to interact.

I gave a short demo of Bio.Phylo at the hackathon, and people were
very impressed. We had some issues with Newick and Nexus parsing as
well, so I'll open issues on the bug tracker.

~Ben

From redmine at redmine.open-bio.org  Mon Feb  4 10:20:38 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 4 Feb 2013 15:20:38 +0000
Subject: [Biopython-dev] [Biopython - Bug #3407] (New) Handling of bootstrap
	support values in Bio.Phylo Newick parser
Message-ID: <redmine.issue-3407.20130204152038@redmine.open-bio.org>


Issue #3407 has been reported by Ben Morris.

----------------------------------------
Bug #3407: Handling of bootstrap support values in Bio.Phylo Newick parser
https://redmine.open-bio.org/issues/3407

Author: Ben Morris
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


This was reported to me by Arlin Stoltzfus (quote):


"There is a description of Newick here: 

  http://evolution.genetics.washington.edu/phylip/newicktree.html

and a BNF here: 

  http://evolution.genetics.washington.edu/phylip/newick_doc.html

Note that this allows square-bracketed comments. 

Bootstrap values commonly are represented in 2 ways, one of which is wrong.  The wrong way to represent bootstrap values is to present them as internal node labels.   Labels for internal nodes are given as follows: 

   ((( human: 0.1, chimp:0.1 ) primates: 0.2, (rat:0.1, mouse:0.1) rodents:0.2), cat:0.3 )

where "primates" and "rodents" are internal node labels.  They go between the right paren and the (optional) colon and distance.  If you put numbers in the label position, a graphic renderer may place them on the nodes, which is why some people represent bootstrap values this way. 

However, the preferred way to represent bootstrap values is to make them syntactic comments (enclosed in square brackets) placed after all other node information, i.e., after the optional colon & branch length.   Both examples are shown here: 

((raccoon:19.19959,bear:6.80041)50:0.84600,((sea_lion:11.99700, seal:12.00300)100:7.52973,((monkey:100.85930,cat:47.14069)80:20.59201, weasel:18.87953)75:2.09460)50:3.87382,dog:25.46154);
or
((raccoon:19.19959,bear:6.80041):0.84600[50],((sea_lion:11.99700, seal:12.00300):7.52973[100],((monkey:100.85930,cat:47.14069):20.59201[80], weasel:18.87953):2.09460[75]):3.87382[50],dog:25.46154);

I recommend that you only support the second version, and treat the first version as a case of internal node labels.  

Arlin
-------
Arlin Stoltzfus (arlin at umd.edu)
Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST
IBBR, 9600 Gudelsky Drive, Rockville, MD, 20850
tel: 240 314 6208; web: www.molevol.org"
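
To make the two representations concrete, here is a small regex-based sketch
that pulls the values out of each of the example strings quoted above. It is
not the Bio.Phylo Newick parser and does not handle quoted labels or nested
comments. The function and pattern names are assumptions for this example.

```python
import re

# The two example trees from the report, split only for line length.
LABEL_STYLE = (
    "((raccoon:19.19959,bear:6.80041)50:0.84600,((sea_lion:11.99700, "
    "seal:12.00300)100:7.52973,((monkey:100.85930,cat:47.14069)80:20.59201, "
    "weasel:18.87953)75:2.09460)50:3.87382,dog:25.46154);"
)
COMMENT_STYLE = (
    "((raccoon:19.19959,bear:6.80041):0.84600[50],((sea_lion:11.99700, "
    "seal:12.00300):7.52973[100],((monkey:100.85930,cat:47.14069):20.59201[80], "
    "weasel:18.87953):2.09460[75]):3.87382[50],dog:25.46154);"
)

# Text sitting between a closing parenthesis and the next structural character.
INTERNAL_LABEL = re.compile(r"\)\s*([^\s():,;\[\]]+)")
# A square-bracketed comment directly after a branch length.
TRAILING_COMMENT = re.compile(r":\s*[0-9.eE+-]+\s*\[([^\]]*)\]")


def internal_labels(newick):
    """Return internal node labels (first representation)."""
    return INTERNAL_LABEL.findall(newick)


def trailing_comments(newick):
    """Return bracketed comments after branch lengths (second representation)."""
    return TRAILING_COMMENT.findall(newick)


print(internal_labels(LABEL_STYLE))
print(trailing_comments(COMMENT_STYLE))
```

Both calls recover the same five values, which shows why a parser has to
decide explicitly whether a number in the label position is a support value
or just a node name.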


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Mon Feb  4 10:26:31 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 4 Feb 2013 15:26:31 +0000
Subject: [Biopython-dev] [Biopython - Bug #3408] (New) Parsing of Nexus
	files generated by TreeBase fails (Bio.Phylo)
Message-ID: <redmine.issue-3408.20130204152631@redmine.open-bio.org>


Issue #3408 has been reported by Ben Morris.

----------------------------------------
Bug #3408: Parsing of Nexus files generated by TreeBase fails (Bio.Phylo)
https://redmine.open-bio.org/issues/3408

Author: Ben Morris
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


Steps to reproduce: 

Pick a tree on TreeBase (e.g. http://treebase.org/treebase-web/search/study/trees.html?id=12003 or http://treebase.org/treebase-web/search/study/trees.html?id=1029) and click on "download reconstructed NEXUS file."

Attempt to parse the file using Bio.Phylo.read.

Exception:

<pre>Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/_io.py", line 62, in read
    tree = tree_gen.next()
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/_io.py", line 50, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NexusIO.py", line 39, in parse
    nex = Nexus.Nexus(handle)
  File "/usr/lib/python2.7/site-packages/Bio/Nexus/Nexus.py", line 572, in __init__
    self.read(input)
  File "/usr/lib/python2.7/site-packages/Bio/Nexus/Nexus.py", line 623, in read
    self._parse_nexus_block(title, contents)
  File "/usr/lib/python2.7/site-packages/Bio/Nexus/Nexus.py", line 664, in _parse_nexus_block
    getattr(self,'_'+line.command)(line.options)
AttributeError: 'Nexus' object has no attribute '_link'
</pre>


DendroPy is able to parse the same files.
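
The traceback above ends in a getattr(self, '_' + line.command) lookup, so any block command without a matching handler method (here "link") raises AttributeError. A minimal sketch of that dispatch pattern with a tolerant fallback follows. The class and method names are hypothetical and this is not the Bio.Nexus code, just an illustration of one way such a lookup could avoid failing on unknown commands:

```python
class CommandDispatcher:
    """Toy dispatcher; hypothetical, not Bio.Nexus.Nexus."""

    def __init__(self):
        self.unknown = []  # commands seen without a handler
        self.titles = []

    def _title(self, options):
        self.titles.append(options)

    def dispatch(self, command, options):
        # Look up "_<command>"; fall back instead of raising AttributeError.
        handler = getattr(self, "_" + command.lower(), None)
        if handler is None:
            self.unknown.append(command)
            return False
        handler(options)
        return True


d = CommandDispatcher()
handled_title = d.dispatch("TITLE", "Tb7504")
handled_link = d.dispatch("LINK", "TAXA = Taxa1")
```

Whether unknown commands should be skipped silently, logged, or still treated as errors is a design decision for the parser maintainers.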




From p.j.a.cock at googlemail.com  Mon Feb  4 11:49:07 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 4 Feb 2013 16:49:07 +0000
Subject: [Biopython-dev] Deprecations for Biopython 1.61 release;
	Was: Bio.Motif update
In-Reply-To: <CAKVJ-_5MiJvfGE8RvMWpRpqK1_GQcfGCCUYBmV_NfE5vwhF7iQ@mail.gmail.com>
References: <CAKVJ-_5MiJvfGE8RvMWpRpqK1_GQcfGCCUYBmV_NfE5vwhF7iQ@mail.gmail.com>
Message-ID: <CAKVJ-_5KcSA2Sk4djieEcAP+RyJB7Ek44PSjJV=iykQRcdVeGQ@mail.gmail.com>

On Mon, Feb 4, 2013 at 12:01 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Fri, Feb 1, 2013 at 3:39 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>> --- On Fri, 2/1/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>> I went over the list in the DEPRECATED file last month, but
>>> a second check would be a good idea.
>>
>> The following were declared obsolete in Biopython 1.60, and can
>> in principle be declared deprecated in Biopython 1.61:
>>
>> ----------
>> Bio/Blast/Applications.py:
>> BlastallCommandline
>> BlastpgpCommandline
>> RpsBlastCommandline
>>
>> Bio/Blast/NCBIStandalone.py overall, and specifically:
>> blastall
>> blastpgp
>> rpsblast
>>
>> Bio/ParserSupport.py overall
>>
>> Bio/PDB/AbstractPropertyMap.py:
>> The has_key function in class AbstractPropertyMap
>>
>> Bio/PDB/FragmentMapper.py:
>> The has_key function in class FragmentMapper
>>
>> Bio/UniGene/UniGene.py overall
>>
>> In BioSQL/BioSeqDatabase.py:
>>   class DBServer:
>>      remove_database
>>   class BioSeqDatabase:
>>      get_all_primary_ids
>>      get_Seq_by_primary_id
>>
>> -----------
>>
>> These functions were deprecated in Biopython 1.59 or earlier, and could be removed for Biopython 1.61:
>>
>> Bio/Align/__init__.py:
>>   class MultipleSeqAlignment:
>>      get_column
>>      add_sequence
>>
>> Bio/Align/Generic.py:
>>   class Alignment overall
>>     get_all_seqs
>>     get_seq_by_num
>>
>> Bio/File.py:
>>   class StringHandle
>>
>> Bio/Graphics/GenomeDiagram/_AbstractDrawer.py:
>>   class AbstractDrawer:
>>     _set_xcentre, _set_ycentre
>>
>> Bio/Graphics/GenomeDiagram/_Graph.py:
>>   class GraphData:
>>     _set_centre
>>
>> Bio/ParserSupport.py:
>>   SGMLStrippingConsumer
>>
>> Bio/Seq.py:
>>   class Seq:
>>      .data property
>>
>> Bio/SeqIO/SffIO.py:
>>   _sff_read_roche_index_xml
>>
>> --------------------
>>
>> The tostring() method of the class Seq in Bio/Seq.py:
>> Can we declare this obsolete?
>>
>> -Michiel
>
> Bio/SeqIO/SffIO.py function _sff_read_roche_index_xml done:
> https://github.com/biopython/biopython/commit/567464d9a5f8b87ec48e95bae127b86463bd4da1
>
> Bio/File.py and Bio/ParserSupport.py bits done:
> https://github.com/biopython/biopython/commit/63997ea0afa5f7f6cac5c1b036d56416b04edb2a
>
> GenomeDiagram centre setters done:
> https://github.com/biopython/biopython/commit/2424c5ca36cdf4348b54bafdae444a91a6457288

Michiel already did most of the others,
https://github.com/biopython/biopython/commit/1b2025bee868b0282b913690a999833d13598ea4

I've just removed the Seq object's deprecated data property:
https://github.com/biopython/biopython/commit/e3cf12a1bf28c1cd52e4b5492fb1cd76731b486b

For the Seq object's tostring() method, let's review Bow's pull request
after this release? https://github.com/biopython/biopython/pull/137
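
For readers unfamiliar with the obsolete, deprecated, removed cycle discussed in this thread, here is a minimal sketch of the "deprecated" stage for something like has_key. The PropertyMap class is a hypothetical stand-in, not the Bio.PDB code; only the standard library warnings module is used:

```python
import warnings


class PropertyMap:
    """Hypothetical stand-in for a dict-like map; not Bio.PDB code."""

    def __init__(self, data):
        self._data = dict(data)

    def __contains__(self, key):
        # Preferred spelling: `key in mapping`.
        return key in self._data

    def has_key(self, key):
        # Old spelling still works, but is flagged for removal.
        warnings.warn(
            "has_key is deprecated; use 'key in mapping' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return key in self


pm = PropertyMap({"A": 1})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_result = pm.has_key("A")
```

Passing stacklevel=2 makes the warning point at the caller's line rather than at the library internals, which is usually what the user needs to see.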

Regards,

Peter

From p.j.a.cock at googlemail.com  Mon Feb  4 12:26:44 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 4 Feb 2013 17:26:44 +0000
Subject: [Biopython-dev] Bio.Motif update
In-Reply-To: <1359730386.17784.YahooMailClassic@web164005.mail.gq1.yahoo.com>
References: <CAKVJ-_5X3eoXEnqD7yfTGFW1Saxr7rMe-WcbCofmCqdu_yq6KA@mail.gmail.com>
	<1359730386.17784.YahooMailClassic@web164005.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_7EVLdBwX=Gjzdpdo3abj39dPnkytdeNHtkFedzbHMK7w@mail.gmail.com>

On Fri, Feb 1, 2013 at 2:53 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi Peter and all,
>
> --- On Tue, 1/29/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> We need to say something about this in the NEWS file too.
>
> Done.
>
>> I think it would make sense to add a PendingDeprecationWarning
>> to Bio.Motif now.
>
> Done.
>
>> Also, if you feel the new Bio.motifs API isn't quite
>> settled yet, adding the new BiopythonExperimentalWarning to
>> that makes sense.
>
> I don't expect big changes in the API, so I think we can do without the
> BiopythonExperimentalWarning. Also we should avoid the situation
> that Bio.Motif gives a DeprecationWarning, and Bio.motifs gives a
> BiopythonExperimentalWarning.
>
>> (And once this is settled, I think we can schedule the
>> release)

Hi Michiel,

Rather than having two (very similar) chapters in the Tutorial for
the old Bio.Motif and new Bio.motifs modules, I've downgraded
the old chapter to just a section of the new chapter:

https://github.com/biopython/biopython/commit/ee5cccf6bc661befc924cb7fc2a422c07f3eeee1

There is still a lot of redundant content - would you be able to
shorten this? Or can we just cut it and refer anyone interested
to the tutorial shipped with Biopython 1.60 instead?

I think a summary of the differences would be more useful, to help people
convert from the old module to the new motifs module.

Also, what is the point of the Bio.motifs.create function? Is there
a reason not to initialise a Motif object directly?

Thanks,

Peter

From p.j.a.cock at googlemail.com  Mon Feb  4 12:57:42 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 4 Feb 2013 17:57:42 +0000
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
Message-ID: <CAKVJ-_6D5kAWcQn1+1=mqVBQB+WOn1AXzk4qYNW0SB7PueWadQ@mail.gmail.com>

On Fri, Feb 1, 2013 at 3:03 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hello all,
>
> I think we're overdue for a Biopython release now, and I would
> like to do this next week. There are still plenty more additions
> and enhancements waiting in the wings, but right now I just
> want any remaining bug fixes addressed.
>
> Are there any release blocking issues?
>
> Thanks,
>
> Peter

Hi all,

I've posted the current tutorial as HTML and PDF online [*],
http://biopython.org/DIST/docs/tutorial/Tutorial-dev.html
http://biopython.org/DIST/docs/tutorial/Tutorial-dev.pdf

It would be great to have you all re-read chapters you've
contributed to or are familiar with - and fix or report any
more typos etc.

Note that some of the embedded examples in the LaTeX
source are now tested via doctest using test_Tutorial.py,
so if you do make some local edits run that before you
commit them.
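
As a rough illustration of what testing embedded examples via doctest means, the sketch below runs the ">>>" examples found in a block of text using only the standard library. The text snippet and names are made up for illustration, and this does not reproduce how test_Tutorial.py extracts examples from the LaTeX source:

```python
import doctest

# Hypothetical stand-in for a snippet of tutorial text.
TUTORIAL_TEXT = """
Some prose before the example.

>>> seq = "AGTACACTGGT"
>>> len(seq)
11
>>> seq.count("A")
3
"""

parser = doctest.DocTestParser()
# Arguments: text, globals, test name, filename, starting line number.
test = parser.get_doctest(TUTORIAL_TEXT, {}, "tutorial_snippet", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)  # named tuple: (failed, attempted)
```

If an edit to the text changes an expected output without changing the code, results.failed becomes non-zero and the mismatch is printed.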

Thanks,

Peter

[*] Those URLs used to be updated nightly, something I've
not yet restored since the website was moved from the old
OBF hardware to an Amazon cloud server. The simplest
option here would be to install latex on the server...

From redmine at redmine.open-bio.org  Mon Feb  4 13:14:19 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 4 Feb 2013 18:14:19 +0000
Subject: [Biopython-dev] [Biopython - Bug #3409] (New) Newick parser fails
	to parse Greengenes tree (Bio.Phylo)
Message-ID: <redmine.issue-3409.20130204181419@redmine.open-bio.org>


Issue #3409 has been reported by Ben Morris.

----------------------------------------
Bug #3409: Newick parser fails to parse Greengenes tree (Bio.Phylo)
https://redmine.open-bio.org/issues/3409

Author: Ben Morris
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


The file is available here: http://www.evoio.org/wg/evoio/images/f/f9/Greengenes2011.txt (9.2 MB)

The problem may be related to the use of single-quoted node labels which sometimes contain parentheses, e.g. <pre>'p__Fusobacteria; c__Fusobacteria (class); o__Fusobacteriales; f__Fusobacteriaceae':0.11021</pre>

Exception:

<pre>  ...
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NewickIO.py", line 110, in _parse_subtree
    clade.clades = [self._parse_subtree(st) for st in subtrees]
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NewickIO.py", line 110, in _parse_subtree
    clade.clades = [self._parse_subtree(st) for st in subtrees]
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NewickIO.py", line 110, in _parse_subtree
    clade.clades = [self._parse_subtree(st) for st in subtrees]
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NewickIO.py", line 110, in _parse_subtree
    clade.clades = [self._parse_subtree(st) for st in subtrees]
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NewickIO.py", line 87, in _parse_subtree
    raise NewickError("Parentheses do not match in (sub)tree: " + text)
Bio.Phylo.NewickIO.NewickError: Parentheses do not match in (sub)tree: 139839:0.04507):0.02429
</pre>

Other Newick parsers (ete and dendropy) are able to parse this file.
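
If the quoted labels are indeed the cause, one possible approach is to tokenize the Newick string so that a single-quoted label is consumed as one token before any parentheses are counted. The sketch below is only an illustration under that assumption (function names are hypothetical, and it is not the Bio.Phylo.NewickIO code). It also assumes the usual Newick convention that a doubled single quote inside a quoted label stands for a literal quote:

```python
import re

# Quoted labels are matched first so parentheses inside them stay literal.
TOKEN = re.compile(r"'(?:[^']|'')*'|[(),:;]|[^(),:;'\s]+")


def tokenize(newick):
    """Split a Newick string into structural characters, labels and numbers."""
    return TOKEN.findall(newick)


def parens_balanced(newick):
    """Count only structural parentheses, ignoring those inside quotes."""
    depth = 0
    for tok in tokenize(newick):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


tree = "(A:0.1,'c__Fusobacteria (class); o__Fusobacteriales':0.11021);"
tokens = tokenize(tree)
```

A parser that splits on raw "(" and ")" characters would see the parentheses inside the label as structure, which is consistent with the "Parentheses do not match" error reported above.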




From mjldehoon at yahoo.com  Mon Feb  4 23:01:26 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 4 Feb 2013 20:01:26 -0800 (PST)
Subject: [Biopython-dev] Bio.Motif update
In-Reply-To: <CAKVJ-_7EVLdBwX=Gjzdpdo3abj39dPnkytdeNHtkFedzbHMK7w@mail.gmail.com>
Message-ID: <1360036886.33220.YahooMailClassic@web164006.mail.gq1.yahoo.com>

Hi Peter,

--- On Mon, 2/4/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Rather than having two (very similar) chapters in the
> Tutorial for the old Bio.Motif and new Bio.motifs modules,
> I've downgraded the old chapter to just a section of
> the new chapter:
... 
> There is still a lot of redundant content - would you be
> able to shorten this?

I think it's OK if it is redundant. Anyway the chapter on the older Bio.Motif will be removed a few releases later.

> I think a summary of the differences would be more useful,
> to help people convert from the old module to the new
> motifs module.

Maybe, but for me it doesn't have a high priority. It's easier to understand the new chapter on Bio.motifs.
 
> Also, what is the point of the Bio.motifs.create function?
> Is there a reason not to initialise a Motif object directly?

There are two ways to initialize a Motif: either to specify the alignment from which the motif is created, or directly from a position-weight matrix. This can be a bit confusing. To separate the two, the Bio.motifs.create function only initializes a Motif from an alignment; some of the motif parsers initialize a Motif from a position-weight matrix. 
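
To make the two construction paths concrete, here is a small sketch with a toy Motif class and a create function. All names and the internal representation are hypothetical simplifications (a plain counts matrix rather than a real position-weight matrix), not the actual Bio.motifs implementation:

```python
class Motif:
    """Toy motif; hypothetical, not the Bio.motifs.Motif class."""

    def __init__(self, alphabet, counts):
        self.alphabet = alphabet
        self.counts = counts  # letter -> list of per-column counts
        self.length = len(next(iter(counts.values())))


def create(instances, alphabet="ACGT"):
    """Build a Motif from equal-length aligned sequences."""
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("all instances must have the same length")
    counts = {letter: [0] * length for letter in alphabet}
    for seq in instances:
        for i, letter in enumerate(seq):
            counts[letter][i] += 1
    return Motif(alphabet, counts)


# Path 1: from an alignment, via the factory function.
m1 = create(["TACAA", "TACGC", "TACAC"])
# Path 2: directly from a matrix, as a file parser might do.
m2 = Motif("ACGT", {"A": [0, 3], "C": [3, 0], "G": [0, 0], "T": [0, 0]})
```

Keeping the alignment path in a separate function means the class constructor has a single meaning, which is the separation described above.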

Best,
-Michiel.


From p.j.a.cock at googlemail.com  Tue Feb  5 07:32:47 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 12:32:47 +0000
Subject: [Biopython-dev] KEGG enhancements
Message-ID: <CAKVJ-_5MYuN310EuS4uOeVcNQZxRdPshh-GALGQWx1aj25eS-w@mail.gmail.com>

Hi all,

We have a couple of new pull requests for KEGG enhancements,
which we can look at after the imminent Biopython 1.61 release
goes out this week.

Kevin's working on the REST API,
https://github.com/biopython/biopython/pull/152

Leighton's working on KGML and graphics,
https://github.com/biopython/biopython/pull/153

There is a tiny bit of online access code in Leighton's code
which can probably be changed to use Kevin's work - I've
not had time to examine the overlap yet.

Peter


---------- Forwarded message ----------
From: kevin <notifications at github.com>
Date: Mon, Feb 4, 2013 at 8:03 PM
Subject: [biopython] Add KEGG API Querying Support (#152)
To: biopython/biopython <biopython at noreply.github.com>


This adds support to query KEGG's REST API
(http://www.kegg.jp/kegg/docs/keggapi.html) along with simple tests
which ensure that the correct url is hit and documentation in the
cookbook.
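
As an illustration of what a test that "ensures the correct url is hit" might check, the sketch below builds KEGG REST URLs without any network access. The function name, the constant names and the use of the http://rest.kegg.jp base URL are assumptions made for this example, and it is not the API from pull request #152:

```python
KEGG_BASE = "http://rest.kegg.jp"  # assumed base URL of the KEGG REST service
OPERATIONS = {"info", "list", "find", "get", "conv", "link"}


def kegg_url(operation, *arguments):
    """Return the REST URL for an operation; performs no network access."""
    if operation not in OPERATIONS:
        raise ValueError("unknown KEGG operation: %r" % operation)
    if not arguments:
        raise ValueError("at least one argument is required")
    return "/".join([KEGG_BASE, operation] + list(arguments))


url = kegg_url("get", "hsa:10458", "aaseq")
```

Separating URL construction from the actual download keeps the unit tests fast and usable offline.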

This has been discussed on the mailing list in the following thread:
http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009981.html.

________________________________

You can merge this Pull Request by running

  git pull https://github.com/kevinwuhoo/biopython master

Or view, comment on, or merge it at:

  https://github.com/biopython/biopython/pull/152

Commit Summary

Added a KEGG API Wrapper
Forgot copyright
Added a general parser and a KEGG section in the tutorial.
Updated querying code and corresponding tests.
Updated documentation to reflect changes in KEGG module.

File Changes

M Bio/KEGG/__init__.py (196)
M Doc/Tutorial.tex (88)
M Tests/output/test_KEGG (41)
M Tests/test_KEGG.py (159)

Patch Links:

https://github.com/biopython/biopython/pull/152.patch
https://github.com/biopython/biopython/pull/152.diff

From p.j.a.cock at googlemail.com  Tue Feb  5 07:33:52 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 12:33:52 +0000
Subject: [Biopython-dev] KEGG enhancements
In-Reply-To: <CAKVJ-_5MYuN310EuS4uOeVcNQZxRdPshh-GALGQWx1aj25eS-w@mail.gmail.com>
References: <CAKVJ-_5MYuN310EuS4uOeVcNQZxRdPshh-GALGQWx1aj25eS-w@mail.gmail.com>
Message-ID: <CAKVJ-_6ncVoF2OXmr6M-81JJQ42wJK-oYJuTBrG1XUx0R0Us6Q@mail.gmail.com>

On Tue, Feb 5, 2013 at 12:32 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi all,
>
> We have a couple of new pull requests for KEGG enhancements,
> which we can look at after the imminent Biopython 1.61 release
> goes out this week.
>
> Kevin's working on the REST API,
> https://github.com/biopython/biopython/pull/152
>
> Leighton's working on KGML and graphics,

Sorry, the correct URL, https://github.com/biopython/biopython/pull/155

Details below,

Peter

---------- Forwarded message ----------
From: Leighton Pritchard <notifications at github.com>
Date: Tue, Feb 5, 2013 at 12:28 PM
Subject: [biopython] KGML files (#155)
To: biopython/biopython <biopython at noreply.github.com>


As we discussed - not an ideal pull request (rebasing added the recent
Biopython changes to the KEGG branch, rather than what was expected),
but if it's workable, here's the code in a way that doesn't seem to
break Biopython ;)

L.

________________________________

You can merge this Pull Request by running

  git pull https://github.com/widdowquinn/biopython kegg

Or view, comment on, or merge it at:

  https://github.com/biopython/biopython/pull/155

Commit Summary

First addition of KGML module (with tests)
Moved Bio.KGML to Bio.KEGG.KGML and split KGML tests
Modified comments to indicate TODO
Removed accidentally-committed files
Fix typo in error message
Fix typo in blastall wrapper
Add new Blast 2.2.27+ arguments to wrappers
Ignore new blastx arguments if testing with old BLAST+
BLAST 2.2.27+ dropped -frame_shift_penalty argument
Remove deprecated Bio.File.StringHandle and SGMLStripper
Remove centre setters, add explicit deprecation warning to getters.
Clarify docstrings of deprecated BLAST functions.
Avoid ResourceWarning: unclosed file in these doctests
Close handle in this doctest
Remove the deprecated Seq object's data property
Remove duplicated section labels in Tutorial (in repeated Motifs text)
Downgrade Bio.Motif chapter to a section at the end of the Bio.motifs chapter
Fix a typo
Clarify docstring for obsolete Bio.Motif module
Explain Bio.motifs replaces Bio.Motif in its docstring
Update date in Tutorial
Fix 2 typos.
Add links to SearchIO tutorial files
Update SearchIO tutorial language style
Add links to SearchIO documentation pages
Tutorial specific example files have previously gone under Doc/examples
Update paths in tutorial after moving example files

File Changes

M Bio/Blast/Applications.py (36)
M Bio/Blast/NCBIStandalone.py (21)
M Bio/File.py (65)
M Bio/Graphics/GenomeDiagram/_AbstractDrawer.py (30)
M Bio/Graphics/GenomeDiagram/_Graph.py (14)
A Bio/Graphics/KGML_vis.py (422)
A Bio/KEGG/KGML/KGML_parser.py (184)
A Bio/KEGG/KGML/KGML_pathway.py (766)
A Bio/KEGG/KGML/KGML_scrape.py (109)
A Bio/KEGG/KGML/__init__.py (15)
M Bio/Motif/__init__.py (13)
M Bio/ParserSupport.py (34)
M Bio/Seq.py (33)
M Bio/SeqIO/SffIO.py (1)
M Bio/SeqRecord.py (6)
M Bio/motifs/__init__.py (7)
M DEPRECATED (8)
M Doc/Tutorial.tex (164)
A Doc/examples/my_blast.xml (0)
A Doc/examples/my_blat.psl (0)
A Tests/KEGG/ko01100.kgml (17805)
A Tests/KEGG/ko01100.xml (25176)
A Tests/KEGG/ko01100_mod_original.pdf (98)
A Tests/KEGG/ko01100_original.pdf (98)
A Tests/KEGG/ko01120.xml (11425)
A Tests/KEGG/ko03070.kgml (249)
A Tests/KEGG/ko03070.xml (413)
A Tests/KEGG/ko03070_mod_original.pdf (113)
A Tests/KEGG/ko03070_original.pdf (113)
A Tests/KEGG/map01100.png (0)
A Tests/KEGG/map03070.png (0)
D Tests/Tutorial/README.txt (9)
M Tests/test_File.py (13)
A Tests/test_KGML_graphics.py (138)
A Tests/test_KGML_nographics.py (99)
A Tests/test_KGML_online.py (68)
M Tests/test_NCBI_BLAST_tools.py (9)
M Tests/test_ParserSupport.py (9)
M setup.py (1)

Patch Links:

https://github.com/biopython/biopython/pull/155.patch
https://github.com/biopython/biopython/pull/155.diff

From p.j.a.cock at googlemail.com  Tue Feb  5 07:36:55 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 12:36:55 +0000
Subject: [Biopython-dev] KEGG enhancements
In-Reply-To: <CAKVJ-_6ncVoF2OXmr6M-81JJQ42wJK-oYJuTBrG1XUx0R0Us6Q@mail.gmail.com>
References: <CAKVJ-_5MYuN310EuS4uOeVcNQZxRdPshh-GALGQWx1aj25eS-w@mail.gmail.com>
	<CAKVJ-_6ncVoF2OXmr6M-81JJQ42wJK-oYJuTBrG1XUx0R0Us6Q@mail.gmail.com>
Message-ID: <CAKVJ-_43EXOwqAR86WsSxjGUgZK0-w8Fb2V5um-0hQY-tt5NOw@mail.gmail.com>

On Tue, Feb 5, 2013 at 12:33 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Feb 5, 2013 at 12:32 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> Hi all,
>>
>> We have a couple of new pull requests for KEGG enhancements,
>> which we can look at after the imminent Biopython 1.61 release
>> goes out this week.
>>
>> Kevin's working on the REST API,
>> https://github.com/biopython/biopython/pull/152
>>
>> Leighton's working on KGML and graphics,
>
> Sorry, the correct URL, https://github.com/biopython/biopython/pull/155
>
> Details below,

See also Leighton's blog posts about this work (with pictures):
http://armchairbiology.blogspot.co.uk/2013/01/keggwatch-part-i.html
http://armchairbiology.blogspot.co.uk/2013/02/keggwatch-part-ii.html
http://armchairbiology.blogspot.co.uk/2013/02/keggwatch-part-iii.html

Regards,

Peter

From p.j.a.cock at googlemail.com  Tue Feb  5 08:55:20 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 13:55:20 +0000
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <CAKVJ-_6D5kAWcQn1+1=mqVBQB+WOn1AXzk4qYNW0SB7PueWadQ@mail.gmail.com>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
	<CAKVJ-_6D5kAWcQn1+1=mqVBQB+WOn1AXzk4qYNW0SB7PueWadQ@mail.gmail.com>
Message-ID: <CAKVJ-_76PY9oLykHwqGHgJhs1GxTwOiabMQx+r5+PebQbMWtUQ@mail.gmail.com>

Hi all,

I'm going to try and do the release this afternoon, so
no commits to the master branch until further notice
please.

Thanks,

Peter

From p.j.a.cock at googlemail.com  Tue Feb  5 09:49:20 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 14:49:20 +0000
Subject: [Biopython-dev] Biopython 1.61 release
Message-ID: <CAKVJ-_4qO-Be-yYwzYUmo3xf-Gm++XfZ4ErRUX0rMOWeheVSBw@mail.gmail.com>

On Tue, Feb 5, 2013 at 1:55 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi all,
>
> I'm going to try and do the release this afternoon, so
> no commits to the master branch until further notice
> please.
>
> Thanks,
>
> Peter

The release is in progress...

The Windows installers are on the website for some quick
pre-announcement testing. If anyone spots an issue, please
email me ASAP: http://biopython.org/DIST/

Last time we put 'beta' in the Python 3.2 installer to emphasise
this was still not quite ready for prime time. Should we do that
again? How comfortable are we all about encouraging more
use under Python 3?

Thanks,

Peter

From p.j.a.cock at googlemail.com  Tue Feb  5 13:14:24 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 18:14:24 +0000
Subject: [Biopython-dev] Biopython 1.61 release
In-Reply-To: <CAKVJ-_4qO-Be-yYwzYUmo3xf-Gm++XfZ4ErRUX0rMOWeheVSBw@mail.gmail.com>
References: <CAKVJ-_4qO-Be-yYwzYUmo3xf-Gm++XfZ4ErRUX0rMOWeheVSBw@mail.gmail.com>
Message-ID: <CAKVJ-_7acjqqkagjTn5KZKudveMT=YDqCgVN45hxOjqYo6goYw@mail.gmail.com>

On Tue, Feb 5, 2013 at 2:49 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Feb 5, 2013 at 1:55 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> Hi all,
>>
>> I'm going to try and do the release this afternoon, so
>> no commits to the master branch until further notice
>> please.
>>
>> Thanks,
>>
>> Peter
>
> The release is in progress...
>
> The Windows installers are on the website for some quick
> pre-announcement testing. If anyone spots an issue, please
> email me ASAP: http://biopython.org/DIST/
>
> Last time we put 'beta' in the Python 3.2 installer to emphasise
> this was still not quite ready for prime time. Should we do that
> again? How comfortable are we all about encouraging more
> use under Python 3?

I'm planning to do the same in terms of putting beta in the
Windows installer for Python 3.2.

After some trouble, I now have the epydoc API files updated
(a manual refresh might be needed to see the changes):
http://biopython.org/DIST/docs/api/

Bow - the `backtick` markup doesn't do anything in epydoc, but
perhaps for the next release we can turn the SearchIO markup
into restructuredtext instead?

I think last time I didn't have the docutils dependency installed
in order for epydoc to try and parse the restructuredtext (used
in Bio.Phylo). Running epydoc also showed a few more epydoc
formatting errors, fixed in git - I will now regenerate the installers,
and tag this in git etc.

Peter

From w.arindrarto at gmail.com  Tue Feb  5 13:22:46 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 5 Feb 2013 19:22:46 +0100
Subject: [Biopython-dev] Biopython 1.61 release
In-Reply-To: <CAKVJ-_7acjqqkagjTn5KZKudveMT=YDqCgVN45hxOjqYo6goYw@mail.gmail.com>
References: <CAKVJ-_4qO-Be-yYwzYUmo3xf-Gm++XfZ4ErRUX0rMOWeheVSBw@mail.gmail.com>
	<CAKVJ-_7acjqqkagjTn5KZKudveMT=YDqCgVN45hxOjqYo6goYw@mail.gmail.com>
Message-ID: <CADEGkF53YZkDXFZ2SMifAckbZponntf5cevuu3sSbCKiKH7r=w@mail.gmail.com>

Hi Peter,

> Bow - the `backtick` markup doesn't do anything in epydoc, but
> perhaps for the next release we can turn the SearchIO markup
> into restructuredtext instead?
>
> I think last time I didn't have the docutils dependency installed
> in order for epydoc to try and parse the restructuredtext (used
> in Bio.Phylo). Running epydoc also showed a few more epydoc
> formatting errors, fixed in git - I will now regenerate the installers,
> and tag this in git etc.

Hmm... IIRC, I did write the entire SearchIO doc using reStructuredText
markup; in hindsight probably not wise since we still rely on epydoc.
Using rSt for the next release sounds good.

On a related note, do we have any solid plans to move out of epydoc
(and into Sphinx?) for the next release?

regards,
Bow

From p.j.a.cock at googlemail.com  Tue Feb  5 13:30:29 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 18:30:29 +0000
Subject: [Biopython-dev] Biopython 1.61 release
In-Reply-To: <CADEGkF53YZkDXFZ2SMifAckbZponntf5cevuu3sSbCKiKH7r=w@mail.gmail.com>
References: <CAKVJ-_4qO-Be-yYwzYUmo3xf-Gm++XfZ4ErRUX0rMOWeheVSBw@mail.gmail.com>
	<CAKVJ-_7acjqqkagjTn5KZKudveMT=YDqCgVN45hxOjqYo6goYw@mail.gmail.com>
	<CADEGkF53YZkDXFZ2SMifAckbZponntf5cevuu3sSbCKiKH7r=w@mail.gmail.com>
Message-ID: <CAKVJ-_6xHY9G5eE=Bb0mab5K+FF7cseUqwXqEgJhkwLA01DYZQ@mail.gmail.com>

On Tue, Feb 5, 2013 at 6:22 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Peter,
>
>> Bow - the `backtick` markup doesn't do anything in epydoc, but
>> perhaps for the next release we can turn the SearchIO markup
>> into restructuredtext instead?
>>
>> I think last time I didn't have the docutils dependency installed
>> in order for epydoc to try and parse the restructuredtext (used
>> in Bio.Phylo). Running epydoc also showed a few more epydoc
>> formatting errors, fixed in git - I will now regenerate the installers,
>> and tag this in git etc.
>
> Hmm... IIRC, I did write the entire SearchIO doc using reStructuredText
> markup; in hindsight probably not wise since we still rely on epydoc.
> Using rSt for the next release sounds good.

Using reStructuredText (like Eric did with Bio.Phylo) would have
been (and is) fine, however you had __docformat__ = 'epytext en'
in the file.

> On a related note, do we have any solid plans to move out of epydoc
> (and into Sphinx?) for the next release?

Not yet - but moving all the docstrings to reStructuredText is a
very good step towards that, and a chance to review/update
all the plain text docstrings in particular to look nicer and be
more consistent.

Peter

From p.j.a.cock at googlemail.com  Tue Feb  5 13:57:58 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 18:57:58 +0000
Subject: [Biopython-dev] Biopython 1.61 release
In-Reply-To: <CAKVJ-_7acjqqkagjTn5KZKudveMT=YDqCgVN45hxOjqYo6goYw@mail.gmail.com>
References: <CAKVJ-_4qO-Be-yYwzYUmo3xf-Gm++XfZ4ErRUX0rMOWeheVSBw@mail.gmail.com>
	<CAKVJ-_7acjqqkagjTn5KZKudveMT=YDqCgVN45hxOjqYo6goYw@mail.gmail.com>
Message-ID: <CAKVJ-_5ZE1EAg61o5-jiKEvuGsJcaAO60+jBKLE-g6JSZTsqVA@mail.gmail.com>

Hi all,

The Biopython 1.61 release files are live, http://biopython.org/DIST/
and this is tagged on GitHub now, i.e. this commit:
https://github.com/biopython/biopython/commit/d372e59b3d9147cd9855feb6e3b90ff523f539b5

I've not yet pushed this to PyPI, nor done the announcement.

If anyone would like to write a draft based on the NEWS file
and the previous announcements during the next hour or two,
that would be great. Otherwise I'll do this after dinner...

Thanks,

Peter

From p.j.a.cock at googlemail.com  Tue Feb  5 16:30:45 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 21:30:45 +0000
Subject: [Biopython-dev] Biopython 1.61 release
In-Reply-To: <CAKVJ-_5ZE1EAg61o5-jiKEvuGsJcaAO60+jBKLE-g6JSZTsqVA@mail.gmail.com>
References: <CAKVJ-_4qO-Be-yYwzYUmo3xf-Gm++XfZ4ErRUX0rMOWeheVSBw@mail.gmail.com>
	<CAKVJ-_7acjqqkagjTn5KZKudveMT=YDqCgVN45hxOjqYo6goYw@mail.gmail.com>
	<CAKVJ-_5ZE1EAg61o5-jiKEvuGsJcaAO60+jBKLE-g6JSZTsqVA@mail.gmail.com>
Message-ID: <CAKVJ-_70VMtqdOFpQe6m9ue0mgbOxOyyhG4kwy77i+PuSM-vEQ@mail.gmail.com>

On Tue, Feb 5, 2013 at 6:57 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi all,
>
> The Biopython 1.61 release files are live, http://biopython.org/DIST/
> and this is tagged on GitHub now, i.e. this commit:
> https://github.com/biopython/biopython/commit/d372e59b3d9147cd9855feb6e3b90ff523f539b5
>
> I've not yet pushed this to PyPI, nor done the announcement.
>
> If anyone would like to write a draft based on the NEWS file
> and the previous announcements during the next hour or two,
> that would be great. Otherwise I'll do this after dinner...
>
> Thanks,
>
> Peter

Draft text below, based heavily on the NEWS file - any comments?

I'll post the new Tutorial online now, and then update the
Downloads page on the wiki before posting this.

Peter

--

Biopython 1.61 released

Source distributions and Windows installers for Biopython 1.61 are now
available from the downloads page on the Biopython website and from
the Python Package Index (PyPI).

The updated Biopython Tutorial and Cookbook is online (PDF).

Platforms/Deployment

We currently support Python 2.5, 2.6 and 2.7 and also test under
Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython
2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C
extensions). We are still encouraging early adopters to help test on
these platforms, and have included a "beta" installer for Python 3.2
(and Python 3.3 too follow soon) under 32-bit Windows.

Please note we are phasing out support for Python 2.5. We will
continue support for at least one further release (Biopython 1.62).
This could be extended given feedback from our users. Focusing on
Python 2.6 and 2.7 only will make writing Python 3 compatible code
easier.

Features

GenomeDiagram has three new sigils (shapes to illustrate features).
OCTO shows an octagonal shape, like the existing BOX sigil but with
the corners cut off. JAGGY shows a box with jagged edges at the start
and end, intended for things like NNNNN regions in draft genomes.
Finally BIGARROW is like the existing ARROW sigil but is drawn
straddling the axis. This is useful for drawing vertically compact
figures where you do not have overlapping genes.

New module Bio.Graphics.ColorSpiral can generate colors along a spiral
path through HSV color space. This can be used to make arbitrary
"rainbow" scales, for example to color features or cross-links on a
GenomeDiagram figure.
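
To give a feel for the idea of sampling colors along a spiral in HSV space, here is a rough standard-library sketch in which the hue winds around the color circle while the brightness climbs. The function name and parameters are invented for this example, and it is not the algorithm used by Bio.Graphics.ColorSpiral:

```python
import colorsys


def spiral_colors(n, start_hue=0.0, turns=0.8, v_start=0.5, v_end=1.0):
    """Return n RGB tuples sampled along a spiral through HSV space."""
    colors = []
    for i in range(n):
        t = i / max(n - 1, 1)  # position along the path, 0..1
        hue = (start_hue + turns * t) % 1.0  # angle winds around the hue circle
        value = v_start + (v_end - v_start) * t  # brightness climbs steadily
        colors.append(colorsys.hsv_to_rgb(hue, 1.0, value))
    return colors


palette = spiral_colors(5)
```

Because both hue and brightness change along the path, neighbouring colors differ in more than one way, which tends to make them easier to tell apart than a plain hue ramp.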

The Bio.SeqIO module now supports reading sequences from PDB files in
two different ways. The "pdb-atom" format determines the sequence as
it appears in the structure based on the atom coordinate section of
the file (via Bio.PDB, so NumPy is currently required for this).
Alternatively, you can use the "pdb-seqres" format to read the
complete protein sequence as it is listed in the PDB header, if
available.

The Bio.SeqUtils module now has a seq1 function to turn a sequence
using three letter amino acid codes into one using the more common one
letter codes. This acts as the inverse of the existing seq3 function.
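
The relationship between the two conversions can be sketched in plain Python as below. The function names, the handling of unknown codes and the restriction to the 20 standard amino acids are simplifications chosen for this example; this is not the Bio.SeqUtils implementation of seq1 or seq3:

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def three_to_one(seq, unknown="X"):
    """Convert concatenated three letter codes to one letter codes."""
    codes = [seq[i:i + 3].capitalize() for i in range(0, len(seq), 3)]
    return "".join(THREE_TO_ONE.get(code, unknown) for code in codes)


def one_to_three(seq, unknown="Xaa"):
    """Convert one letter codes to concatenated three letter codes."""
    return "".join(ONE_TO_THREE.get(letter.upper(), unknown) for letter in seq)


short = three_to_one("MetAlaIleValMetGlyArgTrp")
```

For sequences made only of standard residues the two functions are inverses of each other; unknown codes break the round trip because they collapse to a placeholder.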

The multiple-sequence-alignment object used by Bio.AlignIO etc now
supports an annotation dictionary. Additional support for per-column
annotation is planned, with addition and splicing to work like that
for the SeqRecord per-letter annotation.

The Bio.Motif module has been updated and reorganized. To allow for a
clean deprecation of the old code, the new motif code is stored in a
new module Bio.motifs, and a PendingDeprecationWarning was added to
Bio.Motif.

Experimental Code - SearchIO

This release also includes Bow's Google Summer of Code work writing a
unified parsing framework for NCBI BLAST (assorted formats including
tabular and XML), HMMER, BLAT, and other sequence searching tools.
This is currently available with the new BiopythonExperimentalWarning
to indicate that this is still somewhat experimental. We're bundling
it with the main release to get more public feedback, but with the big
warning that the API is likely to change. In fact, even the current
name of Bio.SearchIO may change since unless you are familiar with
BioPerl its purpose isn't immediately clear.
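
For users wondering how a dedicated warning category helps, the sketch below shows the general pattern with a hypothetical ExperimentalWarning class standing in for BiopythonExperimentalWarning. Because the category is its own class, early adopters can silence it selectively without hiding other warnings:

```python
import warnings


class ExperimentalWarning(Warning):
    """Hypothetical category; stands in for BiopythonExperimentalWarning."""


def load_experimental_module():
    # Code that would run when the experimental module is imported.
    warnings.warn("This API is experimental and may change.", ExperimentalWarning)
    return "loaded"


# Users who accept the risk can silence just this category.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.simplefilter("ignore", ExperimentalWarning)
    status = load_experimental_module()
```

The second simplefilter call is inserted ahead of the first, so the "ignore" rule for the experimental category takes precedence over the general "always" rule.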

Contributors

Brandon Invergo
Bryan Lunt (first contribution)
Christian Brueffer (first contribution)
David Cain
Eric Talevich
Grace Yeo (first contribution)
Jeffrey Chang
Jingping Li (first contribution)
Kai Blin (first contribution)
Leighton Pritchard
Lenna Peterson
Lucas Sinclair (first contribution)
Michiel de Hoon
Nick Semenkovich (first contribution)
Peter Cock
Robert Ernst (first contribution)
Tiago Antao
Wibowo "Bow" Arindrarto


From p.j.a.cock at googlemail.com  Tue Feb  5 16:42:06 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 21:42:06 +0000
Subject: [Biopython-dev] Biopython 1.61 release
In-Reply-To: <CALfq9tLHAjX0pjLX_nB+FPkv3X5tJcp+nWRxVEoSc8J1TYhAqg@mail.gmail.com>
References: <CAKVJ-_4qO-Be-yYwzYUmo3xf-Gm++XfZ4ErRUX0rMOWeheVSBw@mail.gmail.com>
	<CAKVJ-_7acjqqkagjTn5KZKudveMT=YDqCgVN45hxOjqYo6goYw@mail.gmail.com>
	<CAKVJ-_5ZE1EAg61o5-jiKEvuGsJcaAO60+jBKLE-g6JSZTsqVA@mail.gmail.com>
	<CAKVJ-_70VMtqdOFpQe6m9ue0mgbOxOyyhG4kwy77i+PuSM-vEQ@mail.gmail.com>
	<CALfq9tLHAjX0pjLX_nB+FPkv3X5tJcp+nWRxVEoSc8J1TYhAqg@mail.gmail.com>
Message-ID: <CAKVJ-_5MCMT0NwP+5FdN+6K=GwVhfcOkNVpVGF_3Y1+1ej2=ew@mail.gmail.com>

On Tue, Feb 5, 2013 at 9:34 PM, Lenna Peterson <arklenna at gmail.com> wrote:
> Hi Peter,
>
> Looks great. Very small typo: in the last sentence of the paragraph about
> platforms, "Python 3.3 too follow" should be "Python 3.3 to follow".

Thanks Lenna :)

I didn't make an installer for Python 3.3 this afternoon, but I will
tomorrow having heard back from the NumPy 1.7 release manager
that there shouldn't be any problems from compiling against their
release candidate:

http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065369.html

On a related point, NumPy are looking at whether they can include
pre-compiled installers for 64bit Windows - once that happens
(and it may have to wait until NumPy 1.8), we will need to look
at this too:

http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065339.html

Peter

From p.j.a.cock at googlemail.com  Tue Feb  5 17:05:25 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 22:05:25 +0000
Subject: [Biopython-dev] Biopython 1.61 released
Message-ID: <CAKVJ-_6jxFnV8HozDT8sc7xey88y8hXkW8dQUSw0yZDO-q00FA@mail.gmail.com>

Dear Biopythoneers,

Source distributions and Windows installers for Biopython 1.61 are now
available from the downloads page on the Biopython website and from
the Python Package Index (PyPI).

The updated Biopython Tutorial and Cookbook is online (PDF).

Platforms/Deployment:

We currently support Python 2.5, 2.6 and 2.7 and also test under
Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython
2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C
extensions). We are still encouraging early adopters to help test on
these platforms, and have included a "beta" installer for Python 3.2
(and Python 3.3 to follow soon) under 32-bit Windows.

Please note we are phasing out support for Python 2.5. We will
continue support for at least one further release (Biopython 1.62).
This could be extended given feedback from our users. Focusing on
Python 2.6 and 2.7 only will make writing Python 3 compatible code
easier.

New Features:

GenomeDiagram has three new sigils (shapes to illustrate features).
OCTO shows an octagonal shape, like the existing BOX sigil but with
the corners cut off. JAGGY shows a box with jagged edges at the start
and end, intended for things like NNNNN regions in draft genomes.
Finally BIGARROW is like the existing ARROW sigil but is drawn
straddling the axis. This is useful for drawing vertically compact
figures where you do not have overlapping genes.

New module Bio.Graphics.ColorSpiral can generate colors along a spiral
path through HSV color space. This can be used to make arbitrary
"rainbow" scales, for example to color features or cross-links on a
GenomeDiagram figure.
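
As a rough illustration of the idea (not the Bio.Graphics.ColorSpiral
code itself), colors along a spiral in HSV space can be sketched with
the standard library alone. The function name and its parameters below
are hypothetical: hue rotates around the color wheel while saturation
and value drift between two end points.

```python
import colorsys


def spiral_colors(n, turns=1.0, start_hue=0.0, sat=(0.3, 1.0), val=(1.0, 0.6)):
    """Return n RGB tuples sampled along a spiral through HSV space.

    Hypothetical helper for illustration only; it is not the
    Bio.Graphics.ColorSpiral API.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    colors = []
    for i in range(n):
        # Fraction of the way along the spiral (0.0 for the first color).
        t = i / (n - 1) if n > 1 else 0.0
        # The hue angle advances by turns/n per step, so it never wraps
        # back exactly onto the starting hue when turns == 1.
        hue = (start_hue + turns * i / n) % 1.0
        # Saturation and value move linearly between their end points,
        # which is what turns the circle into a spiral.
        s = sat[0] + (sat[1] - sat[0]) * t
        v = val[0] + (val[1] - val[0]) * t
        colors.append(colorsys.hsv_to_rgb(hue, s, v))
    return colors


palette = spiral_colors(5)
```

Each tuple holds red, green and blue components between 0 and 1, which
is a form most plotting libraries accept.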

The Bio.SeqIO module now supports reading sequences from PDB files in
two different ways. The "pdb-atom" format determines the sequence as
it appears in the structure based on the atom coordinate section of
the file (via Bio.PDB, so NumPy is currently required for this).
Alternatively, you can use the "pdb-seqres" format to read the
complete protein sequence as it is listed in the PDB header, if
available.
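
To make the header-based route concrete, here is a minimal sketch of
collecting residue names from SEQRES records using only the fixed
column layout of the PDB format (chain identifier in column 12,
residue names from column 20). This is not the Bio.SeqIO parser; the
function name and the example records are made up for illustration.

```python
def read_seqres(lines):
    """Collect three-letter residue names per chain from SEQRES records.

    Hypothetical helper, not the Bio.SeqIO "pdb-seqres" implementation.
    """
    chains = {}
    for line in lines:
        if not line.startswith("SEQRES"):
            continue  # skip every other record type
        chain_id = line[11]           # column 12 (1-based) is the chain ID
        residues = line[19:].split()  # residue names start at column 20
        chains.setdefault(chain_id, []).extend(residues)
    return chains


# Illustrative records only, laid out in PDB column positions.
EXAMPLE = [
    "HEADER    EXAMPLE",
    "SEQRES   1 A   21  GLY ILE VAL GLU GLN CYS CYS THR SER ILE CYS SER LEU",
    "SEQRES   2 A   21  TYR GLN LEU GLU ASN TYR CYS ASN",
]

chains = read_seqres(EXAMPLE)
```

The result maps each chain identifier to its list of residue names,
which could then be converted to one-letter codes.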

The Bio.SeqUtils module now has a seq1 function to turn a sequence
using three-letter amino acid codes into one using the more common
one-letter codes. This acts as the inverse of the existing seq3 function.
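
With Biopython installed the functions are imported from Bio.SeqUtils
(seq1 and seq3). The conversion itself can be sketched without
Biopython as below. The table and the two function names are
hypothetical stand-ins, and the handling of unknown codes is an
assumption rather than a description of what seq1 does.

```python
# Standard amino acids plus "Ter" for a stop codon.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def three_to_one(seq, unknown="X"):
    """Convert a string of three-letter codes to one-letter codes."""
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of three")
    codes = (seq[i:i + 3].capitalize() for i in range(0, len(seq), 3))
    return "".join(THREE_TO_ONE.get(code, unknown) for code in codes)


def one_to_three(seq, unknown="Xaa"):
    """Convert one-letter codes back to three-letter codes."""
    return "".join(ONE_TO_THREE.get(letter.upper(), unknown) for letter in seq)


short = three_to_one("MetAlaIleValMetGlyArgTrpLysGlyAlaArgTer")
```

Running the result back through one_to_three gives the original
string, which is the sense in which the two functions are inverses.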

The multiple-sequence-alignment object used by Bio.AlignIO etc now
supports an annotation dictionary. Additional support for per-column
annotation is planned, with addition and splicing to work like that
for the SeqRecord per-letter annotation.

The Bio.Motif module has been updated and reorganized. To allow for a
clean deprecation of the old code, the new motif code is stored in a
new module Bio.motifs, and a PendingDeprecationWarning was added to
Bio.Motif.
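
For readers unfamiliar with this kind of staged deprecation, the
pattern looks roughly like the sketch below. The function name is
invented for illustration and is not part of Bio.Motif. Note that
Python hides PendingDeprecationWarning by default, so you need to
enable it (for example with a warnings filter) to see it.

```python
import warnings


def legacy_parse(text):
    """Stand-in for an old API that is scheduled for deprecation."""
    warnings.warn(
        "legacy_parse is pending deprecation; please move to the new module",
        PendingDeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this function
    )
    return text.upper()


# Capture warnings explicitly, since this category is ignored by default.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = legacy_parse("acgt")
```

The old code keeps working, and anyone who turns warnings on gets
advance notice before a full DeprecationWarning is introduced.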

Experimental Code - SearchIO:

This release also includes Bow's Google Summer of Code work writing a
unified parsing framework for NCBI BLAST (assorted formats including
tabular and XML), HMMER, BLAT, and other sequence searching tools.
This is currently available with the new BiopythonExperimentalWarning
to indicate that this is still somewhat experimental. We're bundling
it with the main release to get more public feedback, but with the big
warning that the API is likely to change. In fact, even the current
name of Bio.SearchIO may change since unless you are familiar with
BioPerl its purpose isn't immediately clear.

Contributors:

Brandon Invergo
Bryan Lunt (first contribution)
Christian Brueffer (first contribution)
David Cain
Eric Talevich
Grace Yeo (first contribution)
Jeffrey Chang
Jingping Li (first contribution)
Kai Blin (first contribution)
Leighton Pritchard
Lenna Peterson
Lucas Sinclair (first contribution)
Michiel de Hoon
Nick Semenkovich (first contribution)
Peter Cock
Robert Ernst (first contribution)
Tiago Antao
Wibowo "Bow" Arindrarto

Thank you all.

Release announcement here (RSS feed available):
http://news.open-bio.org/news/2013/02/biopython-1-61-released/

P.S. You can follow @Biopython on Twitter
https://twitter.com/Biopython


From p.j.a.cock at googlemail.com  Tue Feb  5 17:38:32 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 22:38:32 +0000
Subject: [Biopython-dev] More 'fun' with GenBank
In-Reply-To: <50FD0F2B.1080606@biotech.uni-tuebingen.de>
References: <50F57BC5.7020607@biotech.uni-tuebingen.de>
	<CAKVJ-_6HpOr+ph9o8Dygu6O0LM=6wHV8wD1tHsZA3CrO32B37Q@mail.gmail.com>
	<CALfq9tK0ZA3wRZCPQ4DHyOd8+n2raAFb6z3Zf-nkZrUBLAy+8Q@mail.gmail.com>
	<CAKVJ-_77kdYzXv_Q_KVqy9jWSNJSgU+PWdVB-DzxdF8TKwUAGg@mail.gmail.com>
	<50F66496.8000109@biotech.uni-tuebingen.de>
	<CAKVJ-_5Tj+POzmYLvHx_nScjE6x9A-HgQPRQ_Ec_Bu1VGGjH6Q@mail.gmail.com>
	<50FD0F2B.1080606@biotech.uni-tuebingen.de>
Message-ID: <CAKVJ-_60M6aOVgJMU6dNjXD_3B_Ooe7XqU+KW1tYp6zKuhsdaA@mail.gmail.com>

On Mon, Jan 21, 2013 at 9:49 AM, Kai Blin
<kai.blin at biotech.uni-tuebingen.de> wrote:
>
>> Kai - would you mind retesting with f_loc5 (the rebased branch)?
>
> The location of the feature that caused trouble for me still looks
> correct. I'm currently running some more sequences, but I'm pretty
> confident that the code will work just fine. The tests I added to the
> genbank parser code for all the problem cases I had pass, after all. :)
>
>> Everyone - does it seem sensible to include this now, ready for the
>> upcoming release (*)? Or perhaps just after the release?
>
> I'd prefer having this in the next release if possible, but of course
> if the release after that is coming up within a reasonable time frame,
> that would work as well.
>
> Cheers,
> Kai

Unless anyone objects, I will apply the (rebased) version of this
f_loc4 / f_loc5 branch later this week (now that Biopython 1.61
is out).

This replaces the SeqFeature use of sub_features with a new
CompoundLocation which I think is a far more natural way to
handle join locations in EMBL/GenBank files.

Also, it means we can offer parsing of GenBank/EMBL style
location lines into (Compound)Location objects directly :)
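
To give a feel for what parsing a location line involves, here is a
small stand-alone sketch. It is not the code on the branch: the
function name is hypothetical, it returns plain tuples rather than
location objects, and it only handles simple ranges, join(...) and
complement(...), with no fuzzy positions or remote references.

```python
import re

_RANGE = re.compile(r"^(\d+)\.\.(\d+)$")


def parse_location(text):
    """Parse a simple GenBank/EMBL location into (start, end, strand) parts.

    Positions are converted from 1-based inclusive (file convention) to
    0-based half-open (Python slicing convention). Hypothetical helper.
    """
    text = text.strip()
    strand = 1
    if text.startswith("complement(") and text.endswith(")"):
        strand = -1
        text = text[len("complement("):-1]
    if text.startswith("join(") and text.endswith(")"):
        pieces = text[len("join("):-1].split(",")
    else:
        pieces = [text]
    parts = []
    for piece in pieces:
        match = _RANGE.match(piece.strip())
        if match is None:
            raise ValueError("unsupported location: %r" % piece)
        start, end = int(match.group(1)), int(match.group(2))
        # Parts are kept in the order written in the file; this sketch
        # does not reorder them for the reverse strand.
        parts.append((start - 1, end, strand))
    return parts


exons = parse_location("join(12..78,134..202)")
```

A single feature then carries one compound location made of several
parts, instead of a list of child features.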

Regards,

Peter

From w.arindrarto at gmail.com  Tue Feb  5 19:03:52 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 6 Feb 2013 01:03:52 +0100
Subject: [Biopython-dev] [Biopython] Biopython 1.61 released
In-Reply-To: <CAKVJ-_6jxFnV8HozDT8sc7xey88y8hXkW8dQUSw0yZDO-q00FA@mail.gmail.com>
References: <CAKVJ-_6jxFnV8HozDT8sc7xey88y8hXkW8dQUSw0yZDO-q00FA@mail.gmail.com>
Message-ID: <CADEGkF4-H0cc2zC245gaK3AbN8kZRyByD0xe5o8RfX1patj-qA@mail.gmail.com>

Hi Peter,

Thanks for doing the release! It feels exciting to see SearchIO code
finally live in the distributions :). Hopefully this will result in
more feedback (and then more improvements ~ likewise for the whole
Biopython as well).

Also, thank you as well to everyone who has criticized / commented /
contributed code to the module :).

cheers,
Bow


From mjldehoon at yahoo.com  Tue Feb  5 20:03:30 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 5 Feb 2013 17:03:30 -0800 (PST)
Subject: [Biopython-dev] [Biopython-announce] Biopython 1.61 released
In-Reply-To: <CAKVJ-_6jxFnV8HozDT8sc7xey88y8hXkW8dQUSw0yZDO-q00FA@mail.gmail.com>
Message-ID: <1360112610.67186.YahooMailClassic@web164006.mail.gq1.yahoo.com>

Thanks Peter!
Great to see this new code out.

Best,
-Michiel.


From mjldehoon at yahoo.com  Tue Feb  5 20:07:53 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 5 Feb 2013 17:07:53 -0800 (PST)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
Message-ID: <1360112873.79741.YahooMailClassic@web164005.mail.gq1.yahoo.com>

With Biopython 1.61 now out, perhaps this is a good time to tackle Bio.PDB.mmCIF? This module uses flex to generate the parser; I would like to replace this with a plain C module, or perhaps with a pure-Python parser. This issue was previously discussed here:

http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004466.html

Or is anybody else already looking at this module?

Best,
-Michiel.

From arklenna at gmail.com  Tue Feb  5 20:31:16 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Tue, 5 Feb 2013 20:31:16 -0500
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <1360112873.79741.YahooMailClassic@web164005.mail.gq1.yahoo.com>
References: <1360112873.79741.YahooMailClassic@web164005.mail.gq1.yahoo.com>
Message-ID: <CALfq9tLhN9Lc4HnQkrQ40eMrr=7=XjGSay=mczb6QERHWtXmUw@mail.gmail.com>

Hi Michiel,

I worked on that a bit early last year. See thread on this bug:

https://redmine.open-bio.org/issues/2619

Namely, I determined that the flex headers aren't required to compile the
flex-generated C, which is a great start.

I also started work on a PLY-based pure Python reimplementation. Pull
request here:

https://github.com/biopython/biopython/pull/33

I haven't looked at this code in quite a long time! Let me know if you have
any questions about what I did and I will do my best to remember...

Cheers,

Lenna


On Tue, Feb 5, 2013 at 8:07 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> With Biopython 1.61 now out, perhaps this is a good time to tackle
> Bio.PDB.mmCIF? This module uses flex to generate the parser; I would like
> to replace this with a plain C module, or perhaps with a pure-Python
> parser. This issue was previously discussed here:
>
> http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004466.html
>
> Or is anybody else already looking at this module?
>
> Best,
> -Michiel.

From kieran.mace at gmail.com  Tue Feb  5 21:05:19 2013
From: kieran.mace at gmail.com (Kieran Mace)
Date: Tue, 5 Feb 2013 18:05:19 -0800
Subject: [Biopython-dev] [Biopython-announce] Biopython 1.61 released
In-Reply-To: <1360112610.67186.YahooMailClassic@web164006.mail.gq1.yahoo.com>
References: <1360112610.67186.YahooMailClassic@web164006.mail.gq1.yahoo.com>
Message-ID: <EBCA5B2B-B62D-4D1D-A21D-B1CF64E5F6DF@gmail.com>

Hi.  

I'm wondering if the MafIO module is going to be included in this release?

-Kieran



From p.j.a.cock at googlemail.com  Wed Feb  6 03:37:05 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 6 Feb 2013 08:37:05 +0000
Subject: [Biopython-dev] Biopython 1.61 released
In-Reply-To: <EBCA5B2B-B62D-4D1D-A21D-B1CF64E5F6DF@gmail.com>
References: <1360112610.67186.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<EBCA5B2B-B62D-4D1D-A21D-B1CF64E5F6DF@gmail.com>
Message-ID: <CAKVJ-_6SOm1j0-273iaEqGH46aiXsCA1-Mb7mmOQCfmjjsgk0w@mail.gmail.com>

On Wednesday, February 6, 2013, Kieran Mace wrote:

> Hi.
>
> I'm wondering if the MafIO module is going to be included in this release?
>
> -Kieran


I'm not promising but I would hope so. There is some
work to be done first with locations and start/end
information in the SeqRecord.

See also the CompoundLocation discussion.

Peter

From mjldehoon at yahoo.com  Wed Feb  6 03:36:26 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 6 Feb 2013 00:36:26 -0800 (PST)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CALfq9tLhN9Lc4HnQkrQ40eMrr=7=XjGSay=mczb6QERHWtXmUw@mail.gmail.com>
Message-ID: <1360139786.53882.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Hi Lenna,

Thanks for your reply.
Are you planning to continue your work on the PLY-based mmCIF parser?

Best,
-Michiel



From redmine at redmine.open-bio.org  Wed Feb  6 16:39:04 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 6 Feb 2013 21:39:04 +0000
Subject: [Biopython-dev] [Biopython - Bug #3411] (New) Bio.Entrez.efetch
	does not respect the API docs / spec on HTTP verb use (GET vs. POST)
Message-ID: <redmine.issue-3411.20130206213904@redmine.open-bio.org>


Issue #3411 has been reported by Tom McCoy.

----------------------------------------
Bug #3411: Bio.Entrez.efetch does not respect the API docs / spec on HTTP verb use (GET vs. POST)
https://redmine.open-bio.org/issues/3411

Author: Tom McCoy
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


"Either a single UID or a comma-delimited list of UIDs may be provided. All of the UIDs must be from the database specified by db. There is no set maximum for the number of UIDs that can be passed to EFetch, *but if more than about 200 UIDs are to be provided, the request should be made using the HTTP POST method*."
-- http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch

Entrez.efetch uses this API endpoint via GET regardless of the number of UIDs supplied. The attached patch corrects this behavior.
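
The behavior described above could be sketched as follows. This is not
the attached patch: the helper names and the threshold parameter are
hypothetical, the URL is assumed to be the current E-utilities efetch
address, and the request object is only built, never sent. The ids
argument may be a comma-delimited string or a list, so it is
normalised before counting.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint for illustration.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def normalise_ids(ids):
    """Return a list of ID strings from a comma-delimited string or a list."""
    if isinstance(ids, str):
        return [part.strip() for part in ids.split(",") if part.strip()]
    return [str(item) for item in ids]


def build_efetch_request(db, ids, post_threshold=200, **params):
    """Build a GET request for short ID lists and a POST request otherwise."""
    id_list = normalise_ids(ids)
    query = dict(params, db=db, id=",".join(id_list))
    encoded = urlencode(query)
    if len(id_list) > post_threshold:
        # Supplying a request body makes urllib use the POST method.
        return Request(EFETCH_URL, data=encoded.encode("ascii"))
    return Request(EFETCH_URL + "?" + encoded)


small = build_efetch_request("pubmed", "1,2,3", retmode="xml")
large = build_efetch_request("pubmed", list(range(1, 302)))
```

Here small.get_method() reports GET and large.get_method() reports
POST, since the second call passes more than 200 identifiers.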


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Thu Feb  7 05:20:30 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 7 Feb 2013 10:20:30 +0000
Subject: [Biopython-dev] [Biopython - Bug #3411] Bio.Entrez.efetch does not
	respect the API docs / spec on HTTP verb use (GET vs. POST)
References: <redmine.issue-3411.20130206213904@redmine.open-bio.org>
Message-ID: <redmine.journal-15092.20130207102030@redmine.open-bio.org>


Issue #3411 has been updated by Peter Cock.

Assignee set to Biopython Dev Mailing List

I don't recall that guideline being in the earlier requirements/documentation when Bio.Entrez was first written, but the fix proposed looks sensible.

(Note - do we need to worry about the ids being a string or a list at that point, and therefore how to count the entries?)

P.S. Resetting assignee to default of the dev mailing list.
----------------------------------------
Bug #3411: Bio.Entrez.efetch does not respect the API docs / spec on HTTP verb use (GET vs. POST)
https://redmine.open-bio.org/issues/3411

Author: Tom McCoy
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: 
Target version: 
URL: 


"Either a single UID or a comma-delimited list of UIDs may be provided. All of the UIDs must be from the database specified by db. There is no set maximum for the number of UIDs that can be passed to EFetch, *but if more than about 200 UIDs are to be provided, the request should be made using the HTTP POST method*."
-- http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch

Entrez.efetch uses this API endpoint via GET regardless of the number of UIDs supplied. The attached patch corrects this behavior.




From p.j.a.cock at googlemail.com  Thu Feb  7 06:33:25 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 7 Feb 2013 11:33:25 +0000
Subject: [Biopython-dev] Biopython 1.61 released
In-Reply-To: <CAKVJ-_6jxFnV8HozDT8sc7xey88y8hXkW8dQUSw0yZDO-q00FA@mail.gmail.com>
References: <CAKVJ-_6jxFnV8HozDT8sc7xey88y8hXkW8dQUSw0yZDO-q00FA@mail.gmail.com>
Message-ID: <CAKVJ-_7bXcXkxFQ9Xx0W3CDwd_QzhYiRKpsFHGL9n5YoSFDtXQ@mail.gmail.com>

On Tue, Feb 5, 2013 at 10:05 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Dear Biopythoneers,
>
> Source distributions and Windows installers for Biopython 1.61 are now
> available from the downloads page on the Biopython website and from
> the Python Package Index (PyPI).
>
> The updated Biopython Tutorial and Cookbook is online (PDF).
>
> Platforms/Deployment:
>
> We currently support Python 2.5, 2.6 and 2.7 and also test under
> Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython
> 2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C
> extensions). We are still encouraging early adopters to help test on
> these platforms, and have included a "beta" installer for Python 3.2
> (and Python 3.3 to follow soon) under 32-bit Windows.

For those of you wanting to try Biopython on Python 3.3 on Windows,
there is now an installer for Biopython 1.61 built against NumPy 1.7.0rc2.

NumPy 1.7 is their first release to support Python 3.3, and the
official release is expected to be near-identical to this second
release candidate, see:
http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065384.html

Regards,

Peter


From p.j.a.cock at googlemail.com  Thu Feb  7 06:53:40 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 7 Feb 2013 11:53:40 +0000
Subject: [Biopython-dev] [Biopython] Fwd: Bug in bgzf module
In-Reply-To: <CANJ6P8LT9VoanMszVO=aEeFHYD5AzDTJUoDxoEKyJuUxJ6Dx4g@mail.gmail.com>
References: <CANJ6P8KTPF0DCoOGvFfVAXQkwJtZezncpr4HDDTYn4HAQJjUnQ@mail.gmail.com>
	<CANJ6P8LBkbR89pROYfka4P82TFAPvdLSOiwjEr3gxNgxx=wghw@mail.gmail.com>
	<CAKVJ-_48X3YGXN7ky+LmtQf8YFyscm6e0wtJWZs2ZM8yLyj3Bg@mail.gmail.com>
	<CANJ6P8J7PBXSQngbLJ2QKkFHEpFWwaQ8opiQRBPtu01eiUK2KQ@mail.gmail.com>
	<CAKVJ-_6tvf3U3MJp0O2Cd6sPsgoMP_yaQtWnkG3yro5vsoXneA@mail.gmail.com>
	<CANJ6P8LT9VoanMszVO=aEeFHYD5AzDTJUoDxoEKyJuUxJ6Dx4g@mail.gmail.com>
Message-ID: <CAKVJ-_7=k-MNcJoVkQuuV4fsSK5rSsjsMnMUxnsj65WQkR6fGQ@mail.gmail.com>

On Wed, Feb 6, 2013 at 10:35 PM, Petra Kubincová
<petra.kubincova at gmail.com> wrote:
> Hi Peter,
>
> based on your unit test for tell method I've created this:
> http://dl.dropbox.com/u/...
> I hope it's at least partially usable.
>
> Regards,
> Petra

Thanks, I turned that into this commit:
https://github.com/biopython/biopython/commit/194bda7cd4bc292b37fd219f1f95a19e1316ac5a

That led me to notice a special case with offsets on a block
boundary, see this fix and test:
https://github.com/biopython/biopython/commit/fef7659dacaf93ddeb6270103d8ded6fb89414b7

Peter


From p.j.a.cock at googlemail.com  Thu Feb  7 08:30:31 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 7 Feb 2013 13:30:31 +0000
Subject: [Biopython-dev] More 'fun' with GenBank
In-Reply-To: <CAKVJ-_60M6aOVgJMU6dNjXD_3B_Ooe7XqU+KW1tYp6zKuhsdaA@mail.gmail.com>
References: <50F57BC5.7020607@biotech.uni-tuebingen.de>
	<CAKVJ-_6HpOr+ph9o8Dygu6O0LM=6wHV8wD1tHsZA3CrO32B37Q@mail.gmail.com>
	<CALfq9tK0ZA3wRZCPQ4DHyOd8+n2raAFb6z3Zf-nkZrUBLAy+8Q@mail.gmail.com>
	<CAKVJ-_77kdYzXv_Q_KVqy9jWSNJSgU+PWdVB-DzxdF8TKwUAGg@mail.gmail.com>
	<50F66496.8000109@biotech.uni-tuebingen.de>
	<CAKVJ-_5Tj+POzmYLvHx_nScjE6x9A-HgQPRQ_Ec_Bu1VGGjH6Q@mail.gmail.com>
	<50FD0F2B.1080606@biotech.uni-tuebingen.de>
	<CAKVJ-_60M6aOVgJMU6dNjXD_3B_Ooe7XqU+KW1tYp6zKuhsdaA@mail.gmail.com>
Message-ID: <CAKVJ-_7LMS6mvgA6cP06To7-46pw-ApqiXVU6MNuWHcw+=VtHw@mail.gmail.com>

On Tue, Feb 5, 2013 at 10:38 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Mon, Jan 21, 2013 at 9:49 AM, Kai Blin
> <kai.blin at biotech.uni-tuebingen.de> wrote:
>>
>>> Kai - would you mind retesting with f_loc5 (the rebased branch)?
>>
>> The location of the feature that caused trouble for me still looks
>> correct. I'm currently running some more sequences, but I'm pretty
>> confident that the code will work just fine. The tests I added to the
>> genbank parser code for all the problem cases I had pass, after all. :)
>>
>>> Everyone - does it seem sensible to include this now, ready for the
>>> upcoming release (*)? Or perhaps just after the release?
>>
>> I'd prefer having this in the next release if possible, but of course
>> if the release after that is coming up within a reasonable time frame,
>> that would work as well.
>>
>> Cheers,
>> Kai
>
> Unless anyone objects, I will apply the (rebased) version of this
> f_loc4 / f_loc5 branch later this week (now that Biopython 1.61
> is out).
>
> This replaces the SeqFeature use of sub_features with a new
> CompoundLocation which I think is a far more natural way to
> handle join locations in EMBL/GenBank files.
>
> Also, it means we can offer parsing of GenBank/EMBL style
> location lines into (Compound)Location objects directly :)
>
> Regards,
>
> Peter

Applied to master,
https://github.com/biopython/biopython/commit/e5ff9e48e315924d59348c013ab082d6f155d18b
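
For anyone unfamiliar with join locations, here is a rough pure-Python sketch of the idea of turning a GenBank/EMBL style location string into a list of parts. It deliberately does not use the new CompoundLocation class or any Biopython API; parse_location is a hypothetical name, and the sketch assumes only plain "start..end" ranges, optionally wrapped in join(...), with no complement, fuzzy positions or remote references.

```python
import re


def parse_location(text):
    """Return a list of (start, end) pairs in Python slicing style.

    Hypothetical helper: handles only "a..b" and "join(a..b,c..d,...)".
    """
    text = text.strip()
    if text.startswith("join(") and text.endswith(")"):
        text = text[len("join("):-1]
    parts = []
    for piece in text.split(","):
        match = re.fullmatch(r"(\d+)\.\.(\d+)", piece.strip())
        if match is None:
            raise ValueError("unsupported location: %r" % piece)
        # GenBank positions are one-based and inclusive, so only the
        # start needs adjusting to get Python slicing coordinates.
        parts.append((int(match.group(1)) - 1, int(match.group(2))))
    return parts
```

A simple location gives a single part, while a join gives one part per range, which is the natural fit for a compound location.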

Peter

From kai.blin at biotech.uni-tuebingen.de  Thu Feb  7 09:47:37 2013
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 07 Feb 2013 15:47:37 +0100
Subject: [Biopython-dev] More 'fun' with GenBank
In-Reply-To: <CAKVJ-_7LMS6mvgA6cP06To7-46pw-ApqiXVU6MNuWHcw+=VtHw@mail.gmail.com>
References: <50F57BC5.7020607@biotech.uni-tuebingen.de>
	<CAKVJ-_6HpOr+ph9o8Dygu6O0LM=6wHV8wD1tHsZA3CrO32B37Q@mail.gmail.com>
	<CALfq9tK0ZA3wRZCPQ4DHyOd8+n2raAFb6z3Zf-nkZrUBLAy+8Q@mail.gmail.com>
	<CAKVJ-_77kdYzXv_Q_KVqy9jWSNJSgU+PWdVB-DzxdF8TKwUAGg@mail.gmail.com>
	<50F66496.8000109@biotech.uni-tuebingen.de>
	<CAKVJ-_5Tj+POzmYLvHx_nScjE6x9A-HgQPRQ_Ec_Bu1VGGjH6Q@mail.gmail.com>
	<50FD0F2B.1080606@biotech.uni-tuebingen.de>
	<CAKVJ-_60M6aOVgJMU6dNjXD_3B_Ooe7XqU+KW1tYp6zKuhsdaA@mail.gmail.com>
	<CAKVJ-_7LMS6mvgA6cP06To7-46pw-ApqiXVU6MNuWHcw+=VtHw@mail.gmail.com>
Message-ID: <5113BE89.3050303@biotech.uni-tuebingen.de>

On 2013-02-07 14:30, Peter Cock wrote:

Hi Peter,

> Applied to master, 
> https://github.com/biopython/biopython/commit/e5ff9e48e315924d59348c013ab082d6f155d18b

Thanks for that.

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From arklenna at gmail.com  Thu Feb  7 13:21:37 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Thu, 7 Feb 2013 13:21:37 -0500
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <1360139786.53882.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <CALfq9tLhN9Lc4HnQkrQ40eMrr=7=XjGSay=mczb6QERHWtXmUw@mail.gmail.com>
	<1360139786.53882.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <CALfq9tJenMObZGHU6EgEo5oUd7=1K-NWsC_cN06HCdJkE=Et3A@mail.gmail.com>

Hi Michiel,

If there are well-defined problems with the PLY parser, I can work on
fixing them. I am not currently working with mmCIF so I am not in the best
position to evaluate where and how the parser needs to be improved.

I am working with X-ray PDB files and I am not sure if my collaborators are
familiar with mmCIF. I have not dealt with NMR files of any type, either.

Cheers,

Lenna


On Wed, Feb 6, 2013 at 3:36 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> Hi Lenna,
>
> Thanks for your reply.
> Are you planning to continue your work on the PLY-based mmCIF parser?
>
> Best,
> -Michiel
>
> --- On *Tue, 2/5/13, Lenna Peterson <arklenna at gmail.com>* wrote:
>
>
> From: Lenna Peterson <arklenna at gmail.com>
> Subject: Re: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> Cc: "BioPython-Dev Mailing List" <biopython-dev at biopython.org>
> Date: Tuesday, February 5, 2013, 8:31 PM
>
>
> Hi Michiel,
>
> I worked on that a bit early last year. See thread on this bug:
>
> https://redmine.open-bio.org/issues/2619
>
> Namely, I determined that the flex headers aren't required to compile the
> flex-generated C, which is a great start.
>
> I also started work on a PLY-based pure Python reimplementation. Pull
> request here:
>
> https://github.com/biopython/biopython/pull/33
>
> I haven't looked at this code in quite a long time! Let me know if you
> have any questions about what I did and I will do my best to remember...
>
> Cheers,
>
> Lenna
>
>
> On Tue, Feb 5, 2013 at 8:07 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> With Biopython 1.61 now out, perhaps this is a good time to tackle
> Bio.PDB.mmCIF? This module uses flex to generate the parser; I would like
> to replace this with a plain C module, or perhaps with a pure-Python
> parser. This issue was previously discussed here:
>
> http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004466.html
>
> Or is anybody else already looking at this module?
>
> Best,
> -Michiel.
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>
>
>

From anaryin at gmail.com  Thu Feb  7 13:25:37 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Thu, 7 Feb 2013 19:25:37 +0100
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CALfq9tJenMObZGHU6EgEo5oUd7=1K-NWsC_cN06HCdJkE=Et3A@mail.gmail.com>
References: <CALfq9tLhN9Lc4HnQkrQ40eMrr=7=XjGSay=mczb6QERHWtXmUw@mail.gmail.com>
	<1360139786.53882.YahooMailClassic@web164003.mail.gq1.yahoo.com>
	<CALfq9tJenMObZGHU6EgEo5oUd7=1K-NWsC_cN06HCdJkE=Et3A@mail.gmail.com>
Message-ID: <CAJ9sUYPddfAm-Zp7Wr7y2k1=pqj3wcY5E68Ah1W2P+sL_ojSPQ@mail.gmail.com>

Hi,

In our NMR lab I am pretty sure mmCIF files are not even known.. How widely
used is the format in x-ray labs? I have never seen it outside this mailing
list to be honest.

Best,

João


From p.j.a.cock at googlemail.com  Fri Feb  8 10:21:46 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 8 Feb 2013 15:21:46 +0000
Subject: [Biopython-dev] Fwd: [biopython] Newick parser (#156)
In-Reply-To: <biopython/biopython/pull/156@github.com>
References: <biopython/biopython/pull/156@github.com>
Message-ID: <CAKVJ-_5q-n8qvpumBc6OZkAaNDLV4o41LBr+WsWOzGxP0SZV8A@mail.gmail.com>

Eric,

Could you take a look at this please?

Thanks,

Peter

---------- Forwarded message ----------
From: Ben Morris <notifications at github.com>
Date: Fri, Feb 8, 2013 at 3:12 PM
Subject: [biopython] Newick parser (#156)
To: biopython/biopython <biopython at noreply.github.com>


In light of three issues with the Newick parser:

https://redmine.open-bio.org/issues/3409
https://redmine.open-bio.org/issues/3386
https://redmine.open-bio.org/issues/3407

this is a rewrite of the parser from scratch. It supports quoted node
labels and can handle support values either as they were previously handled
or from square-bracketed comments, as requested by Arlin. Additionally,
it's consistently quite fast:

[image: newick_parse_times]<https://f.cloud.github.com/assets/544977/139616/fac0df38-71fe-11e2-91a8-a95ba7c6340b.png>

The unit tests still pass with these changes, and I'm now able to parse
trees that previously raised exceptions.
------------------------------
You can merge this Pull Request by running

  git pull https://github.com/bendmorris/biopython newick

Or view, comment on, or merge it at:

  https://github.com/biopython/biopython/pull/156
Commit Summary

   - A more efficient implementation of a Newick parser (linear time vs.
   quadratic) that makes only a single pass over the text and handles quoted
   labels correctly.
   - Implementing support values and fixing issue when external parentheses
   are missing.
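
To illustrate what a single left-to-right pass with quoted labels can look like, here is a small tokenizer sketch. It is not the code from this pull request; the token pattern and the tokenize name are assumptions made for this example, and it only splits the text into tokens rather than building a tree.

```python
import re

# One alternative per kind of token, tried at each position in turn.
_NEWICK_TOKEN = re.compile(
    r"[(),;:]"                # structural characters
    r"|'(?:[^']|'')*'"        # quoted label ('' is an escaped quote)
    r"|\[[^\]]*\]"            # square-bracketed comment
    r"|[^\s(),;:'\[\]]+"      # unquoted label or number
)


def tokenize(newick):
    """Split a Newick string into tokens in one scan over the text."""
    return [match.group(0) for match in _NEWICK_TOKEN.finditer(newick)]
```

Because quoted labels and bracketed comments are matched as whole tokens, parentheses or commas inside them are not mistaken for tree structure.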

File Changes

   - *M* Bio/Phylo/NewickIO.py<https://github.com/biopython/biopython/pull/156/files#diff-0>(198)

Patch Links:

   - https://github.com/biopython/biopython/pull/156.patch
   - https://github.com/biopython/biopython/pull/156.diff

From mjldehoon at yahoo.com  Fri Feb  8 20:42:23 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 8 Feb 2013 17:42:23 -0800 (PST)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CALfq9tJenMObZGHU6EgEo5oUd7=1K-NWsC_cN06HCdJkE=Et3A@mail.gmail.com>
Message-ID: <1360374143.25311.YahooMailClassic@web164004.mail.gq1.yahoo.com>

Hi Lenna,

--- On Thu, 2/7/13, Lenna Peterson <arklenna at gmail.com> wrote:
> If there are well-defined problems with the PLY parser, I can work on
> fixing them. I am not currently working with mmCIF so I am not in the
> best position to evaluate where and how the parser needs to be improved.

I don't know of any problems with the PLY parser, but since it relies on PLY, it would add another dependency to Biopython. On the other hand, a pure-Python solution may be preferable, as it's easier to maintain and runs with Jython. The C implementation is considerably faster, but I doubt that it really matters since the Python (PLY) parser seems to be fast enough.

I see three options then:
1) Remove the lex stuff from lex.yy.c, and optionally convert the remaining C code to Python.
2) Remove the PLY dependency from the PLY-based parser.
3) Write a new pure-Python parser from scratch.

I'm guessing that 1) will be the most straightforward. Other opinions?
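
As a rough idea of how little code the lexing part of a pure-Python approach might need, here is a sketch that splits one mmCIF data line into values using only the re module. This is not taken from the flex grammar or from the PLY-based parser; tokenize_line is a hypothetical name, and the sketch assumes the usual rule that a quote only closes a value when followed by whitespace or the end of the line. It ignores comments and semicolon-delimited multi-line text fields.

```python
import re

# A quoted value ends at a quote followed by whitespace or end of line,
# so an apostrophe inside an unquoted value such as O5' is kept as-is.
_TOKEN = re.compile(r"""'(.*?)'(?=\s|$)|"(.*?)"(?=\s|$)|(\S+)""")


def tokenize_line(line):
    """Split a single mmCIF line into values, stripping enclosing quotes."""
    tokens = []
    for match in _TOKEN.finditer(line):
        # Exactly one of the three groups is set for each match.
        tokens.append(next(g for g in match.groups() if g is not None))
    return tokens
```

Whether something this simple is fast enough compared to the C code would of course need measuring.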

Best,
-Michiel.


--- On Thu, 2/7/13, Lenna Peterson <arklenna at gmail.com> wrote:
If there are well-defined problems with the PLY parser, I can work on fixing them. I am not currently working with mmCIF so I am not in the best position to evaluate where and how the parser needs to be improved. 


I am working with X-ray PDB files and I am not sure if my collaborators are familiar with mmCIF. I have not dealt with NMR files of any type, either.
Cheers,

Lenna

On Wed, Feb 6, 2013 at 3:36 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

Hi Lenna,

Thanks for your reply.

Are you planning to continue your work on the PLY-based mmCIF parser?

Best,
-Michiel

--- On Tue, 2/5/13, Lenna Peterson <arklenna at gmail.com> wrote:


From: Lenna Peterson <arklenna at gmail.com>
Subject: Re: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)

To: "Michiel de Hoon" <mjldehoon at yahoo.com>
Cc: "BioPython-Dev Mailing List" <biopython-dev at biopython.org>

Date: Tuesday, February 5, 2013, 8:31 PM

Hi Michiel,
I worked on that a bit early last year. See thread on this bug:
https://redmine.open-bio.org/issues/2619


Namely, I determined that the flex headers aren't required to compile the flex-generated C, which is a great start.
I also started work on a PLY-based pure Python reimplementation. Pull request here:


https://github.com/biopython/biopython/pull/33

I haven't looked at this code in quite a long time! Let me know if you have any questions about what I did and I will do my best to remember...


Cheers,
Lenna

On Tue, Feb 5, 2013 at 8:07 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:


With Biopython 1.61 now out, perhaps this is a good time to tackle Bio.PDB.mmCIF? This module uses flex to generate the parser; I would like to replace this with a plain C module, or perhaps with a pure-Python parser. This issue was previously discussed here:


http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004466.html


Or is anybody else already looking at this module?


Best,

-Michiel.



From redmine at redmine.open-bio.org  Sat Feb  9 03:22:31 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Sat, 9 Feb 2013 08:22:31 +0000
Subject: [Biopython-dev] [Biopython - Bug #3411] Bio.Entrez.efetch does not
	respect the API docs / spec on HTTP verb use (GET vs. POST)
References: <redmine.issue-3411.20130206213904@redmine.open-bio.org>
Message-ID: <redmine.journal-15095.20130209082231@redmine.open-bio.org>


Issue #3411 has been updated by Michiel de Hoon.


Fixed (using slightly different code); see revision f1836165.
----------------------------------------
Bug #3411: Bio.Entrez.efetch does not respect the API docs / spec on HTTP verb use (GET vs. POST)
https://redmine.open-bio.org/issues/3411

Author: Tom McCoy
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: 
Target version: 
URL: 


"Either a single UID or a comma-delimited list of UIDs may be provided. All of the UIDs must be from the database specified by db. There is no set maximum for the number of UIDs that can be passed to EFetch, *but if more than about 200 UIDs are to be provided, the request should be made using the HTTP POST method*."
-- http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch

Entrez.efetch uses this API endpoint via GET regardless of the number of UIDs supplied. The attached patch corrects this behavior.




From p.j.a.cock at googlemail.com  Sat Feb  9 06:53:31 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 9 Feb 2013 11:53:31 +0000
Subject: [Biopython-dev] Deprecating Bio.Index?
Message-ID: <CAKVJ-_67V+7u47d4shLbkC79tdj78SU6H2Q_SGZ8ztYNOcmhUg@mail.gmail.com>

Hello all,

Does anyone still use Bio.Index? I don't think any of Biopython
itself does nowadays, so perhaps we can deprecate this?

https://github.com/biopython/biopython/blob/master/Bio/Index.py

(We should of course ask on the main list first just in case)

Regards,

Peter

From colin.aibn at gmail.com  Sat Feb  9 08:06:13 2013
From: colin.aibn at gmail.com (Colin Archer)
Date: Sat, 9 Feb 2013 23:06:13 +1000
Subject: [Biopython-dev] SearchIO HSP indexing
Message-ID: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>

Hi everyone,
                  I have a question about the implementation of
high-scoring segment pairs (HSPs) in SearchIO. I currently have a BLAST
output file in XML format I am parsing and this is one of the hits (removed
the alignment details to save space):

        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_id>gnl|BL_ORD_ID|111</Hit_id>
          <Hit_def>ref|NC_007779|:125695-127587</Hit_def>
          <Hit_accession>111</Hit_accession>
          <Hit_len>1893</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>3352.79</Hsp_bit-score>
              <Hsp_score>1815</Hsp_score>
              <Hsp_evalue>0</Hsp_evalue>
              <Hsp_query-from>1</Hsp_query-from>
              <Hsp_query-to>1893</Hsp_query-to>
              <Hsp_hit-from>1</Hsp_hit-from>
              <Hsp_hit-to>1893</Hsp_hit-to>
              <Hsp_query-frame>1</Hsp_query-frame>
              <Hsp_hit-frame>1</Hsp_hit-frame>
              <Hsp_identity>1867</Hsp_identity>
              <Hsp_positive>1867</Hsp_positive>
              <Hsp_gaps>0</Hsp_gaps>
            </Hsp>
            <Hsp>
              <Hsp_num>2</Hsp_num>
              <Hsp_bit-score>399.997</Hsp_bit-score>
              <Hsp_score>216</Hsp_score>
              <Hsp_evalue>2.88061e-111</Hsp_evalue>
              <Hsp_query-from>331</Hsp_query-from>
              <Hsp_query-to>881</Hsp_query-to>
              <Hsp_hit-from>22</Hsp_hit-from>
              <Hsp_hit-to>581</Hsp_hit-to>
              <Hsp_query-frame>1</Hsp_query-frame>
              <Hsp_hit-frame>1</Hsp_hit-frame>
              <Hsp_identity>452</Hsp_identity>
              <Hsp_positive>452</Hsp_positive>
              <Hsp_gaps>19</Hsp_gaps>
              <Hsp_align-len>565</Hsp_align-len>
            </Hsp>

Using Hsp1 as an example, the query and hit starts ("Hsp_query-from" and
"Hsp_hit-from") are both 1 in the XML but when I access the Hsp objects from
the BlastResult, both values are equal to 0:

>>> blast_record[0][0].query_start
0
>>> blast_record[0][0].hit_start
0

However, when I access the end objects for the query and hit, the result
isn't 1892 (zero based 1893) but 1893:

>>> blast_record[0][0].query_end
1893
>>> blast_record[0][0].hit_end
1893

Is this correct? I find it a little confusing that one result is zero-based
and the other one-based.

Thanks
Colin

From p.j.a.cock at googlemail.com  Sat Feb  9 08:16:43 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 9 Feb 2013 13:16:43 +0000
Subject: [Biopython-dev] SearchIO HSP indexing
In-Reply-To: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
References: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
Message-ID: <CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>

On Sat, Feb 9, 2013 at 1:06 PM, Colin Archer <colin.aibn at gmail.com> wrote:
> Hi everyone,
>                   I have a question about the implementation of
> high-scoring segment pairs (HSPs) in SearchIO. I currently have a BLAST
> output file in XML format I am parsing and this is one of the hits (removed
> the alignment details to save space):
>
>         <Hit>
>           <Hit_num>1</Hit_num>
>           <Hit_id>gnl|BL_ORD_ID|111</Hit_id>
>           <Hit_def>ref|NC_007779|:125695-127587</Hit_def>
>           <Hit_accession>111</Hit_accession>
>           <Hit_len>1893</Hit_len>
>           <Hit_hsps>
>             <Hsp>
>               <Hsp_num>1</Hsp_num>
>               <Hsp_bit-score>3352.79</Hsp_bit-score>
>               <Hsp_score>1815</Hsp_score>
>               <Hsp_evalue>0</Hsp_evalue>
>               <Hsp_query-from>1</Hsp_query-from>
>               <Hsp_query-to>1893</Hsp_query-to>
>               <Hsp_hit-from>1</Hsp_hit-from>
>               <Hsp_hit-to>1893</Hsp_hit-to>
>               <Hsp_query-frame>1</Hsp_query-frame>
>               <Hsp_hit-frame>1</Hsp_hit-frame>
>               <Hsp_identity>1867</Hsp_identity>
>               <Hsp_positive>1867</Hsp_positive>
>               <Hsp_gaps>0</Hsp_gaps>
>             </Hsp>
>             <Hsp>
>               <Hsp_num>2</Hsp_num>
>               <Hsp_bit-score>399.997</Hsp_bit-score>
>               <Hsp_score>216</Hsp_score>
>               <Hsp_evalue>2.88061e-111</Hsp_evalue>
>               <Hsp_query-from>331</Hsp_query-from>
>               <Hsp_query-to>881</Hsp_query-to>
>               <Hsp_hit-from>22</Hsp_hit-from>
>               <Hsp_hit-to>581</Hsp_hit-to>
>               <Hsp_query-frame>1</Hsp_query-frame>
>               <Hsp_hit-frame>1</Hsp_hit-frame>
>               <Hsp_identity>452</Hsp_identity>
>               <Hsp_positive>452</Hsp_positive>
>               <Hsp_gaps>19</Hsp_gaps>
>               <Hsp_align-len>565</Hsp_align-len>
>             </Hsp>
>
> Using Hsp1 as an example, the query and hit starts ("Hsp_query-from" and
> "Hsp_hit-from") are both 1 in the XML but when I access the Hsp objects from
> the BlastResult, both values are equal to 0:
>
>>>> blast_record[0][0].query_start
> 0
>>>> blast_record[0][0].hit_start
> 0
>
> However, when I access the end objects for the query and hit, the result
> isn't 1892 (zero based 1893) but 1893:
>
>>>> blast_record[0][0].query_end
> 1893
>>>> blast_record[0][0].hit_end
> 1893
>
> Is this correct? I find it a little confusing that one result is zero-based
> and the other one-based.
>
> Thanks
> Colin

Hi Colin,

The SearchIO positions like elsewhere in Biopython should be
using Python style counting. Looking at this one:

               <Hsp_hit-from>1</Hsp_hit-from>
               <Hsp_hit-to>1893</Hsp_hit-to>

That is like a GenBank/EMBL location 1..1893 which in Python string
slicing is [0:1893], so the start has -1 but the end is unchanged. The
nice thing is the length is 1893 and is given as the difference of the
Python slicing style end and start.
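
As a quick sketch of the conversion described above (the function name blast_to_slice is made up for this example and is not part of SearchIO):

```python
def blast_to_slice(pos_from, pos_to):
    """Convert one-based inclusive BLAST coordinates to Python slicing style.

    Hypothetical helper: only the start is shifted, the end is unchanged.
    """
    return pos_from - 1, pos_to


# First HSP in the example: Hsp_hit-from 1, Hsp_hit-to 1893
start, end = blast_to_slice(1, 1893)
hit_seq = "N" * 1893  # stand-in for the real hit sequence
aligned = hit_seq[start:end]  # slicing with these values covers the whole hit
```

Here start and end come out as 0 and 1893, matching what SearchIO reports, and end - start gives the length of the aligned region directly.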

Perhaps we need to work on the help text? Any suggestions?

Thanks,

Peter

From colin.aibn at gmail.com  Sat Feb  9 08:54:42 2013
From: colin.aibn at gmail.com (Colin Archer)
Date: Sat, 9 Feb 2013 23:54:42 +1000
Subject: [Biopython-dev] SearchIO HSP indexing
In-Reply-To: <CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>
References: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
	<CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>
Message-ID: <CAF++dEejrr_bRNoTPBPFwVsSYjuEGwtMMiJGrHvCpaEnnOxV1Q@mail.gmail.com>

Hi Peter,
             Thanks for getting back to me so quickly.

I'm curious about the benefits of having these values in Python string
slicing format? I haven't come across this very often, I'm used to seeing
values systematically zero or one-based.

Would it be easier to keep the range variables hit_range and hit_range_all
in slicing format and the start and end variables in sequence position
format so that they represent the actual BLAST results?

I had a look at some of the code and I can't see the slicing format
mentioned anywhere (Hsp.py, Hit.py, or blast_xml.py). It would probably be
helpful to explain the values in Hsp.py as a ** mark on hsp_start, hsp_end,
query_start, and query_end so that if people are interested they can have a
look at the files and see what they mean.

Thanks
Colin


On Sat, Feb 9, 2013 at 11:16 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Sat, Feb 9, 2013 at 1:06 PM, Colin Archer <colin.aibn at gmail.com> wrote:
> > Hi everyone,
> >                   I have a question about the implementation of
> > high-scoring segment pairs (HSPs) in SearchIO. I currently have a BLAST
> > output file in XML format I am parsing and this is one of the hits
> (removed
> > the alignment details to save space):
> >
> >         <Hit>
> >           <Hit_num>1</Hit_num>
> >           <Hit_id>gnl|BL_ORD_ID|111</Hit_id>
> >           <Hit_def>ref|NC_007779|:125695-127587</Hit_def>
> >           <Hit_accession>111</Hit_accession>
> >           <Hit_len>1893</Hit_len>
> >           <Hit_hsps>
> >             <Hsp>
> >               <Hsp_num>1</Hsp_num>
> >               <Hsp_bit-score>3352.79</Hsp_bit-score>
> >               <Hsp_score>1815</Hsp_score>
> >               <Hsp_evalue>0</Hsp_evalue>
> >               <Hsp_query-from>1</Hsp_query-from>
> >               <Hsp_query-to>1893</Hsp_query-to>
> >               <Hsp_hit-from>1</Hsp_hit-from>
> >               <Hsp_hit-to>1893</Hsp_hit-to>
> >               <Hsp_query-frame>1</Hsp_query-frame>
> >               <Hsp_hit-frame>1</Hsp_hit-frame>
> >               <Hsp_identity>1867</Hsp_identity>
> >               <Hsp_positive>1867</Hsp_positive>
> >               <Hsp_gaps>0</Hsp_gaps>
> >             </Hsp>
> >             <Hsp>
> >               <Hsp_num>2</Hsp_num>
> >               <Hsp_bit-score>399.997</Hsp_bit-score>
> >               <Hsp_score>216</Hsp_score>
> >               <Hsp_evalue>2.88061e-111</Hsp_evalue>
> >               <Hsp_query-from>331</Hsp_query-from>
> >               <Hsp_query-to>881</Hsp_query-to>
> >               <Hsp_hit-from>22</Hsp_hit-from>
> >               <Hsp_hit-to>581</Hsp_hit-to>
> >               <Hsp_query-frame>1</Hsp_query-frame>
> >               <Hsp_hit-frame>1</Hsp_hit-frame>
> >               <Hsp_identity>452</Hsp_identity>
> >               <Hsp_positive>452</Hsp_positive>
> >               <Hsp_gaps>19</Hsp_gaps>
> >               <Hsp_align-len>565</Hsp_align-len>
> >             </Hsp>
> >
> > Using Hsp1 as an example, the query and hit starts ("Hsp_query-from" and
> > "Hsp_hit-from") are both 1 in the XML but when I access the Hsp objects
> from
> > the BlastResult, both values are equal to 0:
> >
> >>>> blast_record[0][0].query_start
> > 0
> >>>> blast_record[0][0].hit_start
> > 0
> >
> > However, when I access the end objects for the query and hit, the result
> > isn't 1892 (zero based 1893) but 1893:
> >
> >>>> blast_record[0][0].query_end
> > 1893
> >>>> blast_record[0][0].hit_end
> > 1893
> >
> > Is this correct? I find it a little confusing that one result is
> zero-based
> > and the other one-based.
> >
> > Thanks
> > Colin
>
> Hi Colin,
>
> The SearchIO positions like elsewhere in Biopython should be
> using Python style counting. Looking at this one:
>
>                <Hsp_hit-from>1</Hsp_hit-from>
>                <Hsp_hit-to>1893</Hsp_hit-to>
>
> That is like a GenBank/EMBL location 1..1893 which in Python string
> slicing is [0:1893], so the start has -1 but the end is unchanged. The
> nice thing is the length is 1893 and is given as the difference of the
> Python slicing style end and start.
>
> Perhaps we need to work on the help text? Any suggestions?
>
> Thanks,
>
> Peter

From p.j.a.cock at googlemail.com  Sat Feb  9 09:30:26 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 9 Feb 2013 14:30:26 +0000
Subject: [Biopython-dev] SearchIO HSP indexing
In-Reply-To: <CAF++dEejrr_bRNoTPBPFwVsSYjuEGwtMMiJGrHvCpaEnnOxV1Q@mail.gmail.com>
References: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
	<CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>
	<CAF++dEejrr_bRNoTPBPFwVsSYjuEGwtMMiJGrHvCpaEnnOxV1Q@mail.gmail.com>
Message-ID: <CAKVJ-_4r6HAic5epn=kkyhrpZ6jHnd=r7HmXNZmqXGSFjF_Kdw@mail.gmail.com>

On Sat, Feb 9, 2013 at 1:54 PM, Colin Archer <colin.aibn at gmail.com> wrote:
> Hi Peter,
>              Thanks for getting back to me so quickly.
>

Thank you - the main reason for including SearchIO in Biopython 1.61
as 'experimental code' is to get wider testing and feedback (hopefully
an approach that will work well and we can use this more in future for
other new code).

> I'm curious about the benefits of having these values in Python string
> slicing format? I haven't come across this very often, I'm used to seeing
> values systematically zero or one-based.

Once you're used to Python slicing it becomes very natural.

> Would it be easier to keep the range variables hit_range and hit_range_all
> in slicing format and the start and end variables in sequence position
> format so that they represent the actual BLAST results?

One reason for this is to be consistent across all the formats supported
in SearchIO, and since Biopython is a Python library following Python
norms seems most natural.

> I had a look at some of the code and I can't see the slicing format
> mentioned anywhere (Hsp.py, Hit.py, or blast_xml.py). It would probably be
> helpful to explain the values in Hsp.py as a ** mark on hsp_start, hsp_end,
> query_start, and query_end so that if people are interested they can have a
> look at the files and see what they mean.
>
> Thanks
> Colin

OK, so some clarification with examples in the docstrings is needed.
How about the Tutorial chapter?

Thanks,

Peter

From chapmanb at 50mail.com  Sat Feb  9 09:43:26 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sat, 09 Feb 2013 09:43:26 -0500
Subject: [Biopython-dev] SearchIO HSP indexing
In-Reply-To: <CAKVJ-_4r6HAic5epn=kkyhrpZ6jHnd=r7HmXNZmqXGSFjF_Kdw@mail.gmail.com>
References: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
	<CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>
	<CAF++dEejrr_bRNoTPBPFwVsSYjuEGwtMMiJGrHvCpaEnnOxV1Q@mail.gmail.com>
	<CAKVJ-_4r6HAic5epn=kkyhrpZ6jHnd=r7HmXNZmqXGSFjF_Kdw@mail.gmail.com>
Message-ID: <87a9rdy2cx.fsf@fastmail.fm>


Colin;

>> I'm curious about the benefits of having these values in Python string
>> slicing format? I haven't come across this very often, I'm used to seeing
>> values systematically zero or one-based.

To clarify further, in addition to Peter's response: the 0-based
half-open and 1-based closed systems are the two systems you're
referring to. Python, like most programming languages, uses the 0-based
half-open indexing approach, which is what SearchIO converts to.
Aaron has a nice response on BioStars which explains the differences in
more detail:

http://www.biostars.org/p/6373/#6377
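
As a rough illustration of the two conventions (a minimal sketch; the
helper names are made up for this example and are not part of SearchIO):

```python
def closed_to_half_open(start, end):
    """Convert a 1-based closed interval to a 0-based half-open one."""
    # Only the start shifts; the inclusive 1-based end equals the
    # exclusive 0-based end.
    return start - 1, end


def half_open_to_closed(start, end):
    """Convert a 0-based half-open interval back to 1-based closed."""
    return start + 1, end


seq = "ACGTACGTAC"

# A BLAST-style report: residues 3 through 6 inclusive (1-based, closed).
start, end = closed_to_half_open(3, 6)

fragment = seq[start:end]  # plain Python slicing, no +1/-1 needed
length = end - start       # number of residues covered
```

With half-open coordinates the length is simply end - start, and the
values can be passed straight to a Python slice.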

Brad

From colin.aibn at gmail.com  Sat Feb  9 10:18:33 2013
From: colin.aibn at gmail.com (Colin Archer)
Date: Sun, 10 Feb 2013 01:18:33 +1000
Subject: [Biopython-dev] SearchIO HSP indexing
In-Reply-To: <87a9rdy2cx.fsf@fastmail.fm>
References: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
	<CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>
	<CAF++dEejrr_bRNoTPBPFwVsSYjuEGwtMMiJGrHvCpaEnnOxV1Q@mail.gmail.com>
	<CAKVJ-_4r6HAic5epn=kkyhrpZ6jHnd=r7HmXNZmqXGSFjF_Kdw@mail.gmail.com>
	<87a9rdy2cx.fsf@fastmail.fm>
Message-ID: <CAF++dEc0-+QQXxPgiHf9WY=6oK74WjzW9od2E=q_A_xvK67PQQ@mail.gmail.com>

Interesting commentary from Edsger Dijkstra as well:

http://www.cs.utexas.edu/~EWD/ewd08xx/EWD831.PDF

If possible, I would definitely add some of these links to either the
tutorial or the code.

Colin


On Sun, Feb 10, 2013 at 12:43 AM, Brad Chapman <chapmanb at 50mail.com> wrote:

>
> Colin;
>
> >> I'm curious about the benefits of having these values in Python string
> >> slicing format? I haven't come across this very often, I'm used to
> seeing
> >> values systematically zero or one-based.
>
> To clarify further in addition to Peter's response, the 0-based
> half-open and 1-based closed systems are the two systems you're
> referring to. Python, and most programming languages, use the 0-based
> half open indexing approach which is what SearchIO is converting to.
> Aaron has a nice response on BioStars while explains the differences in
> more details:
>
> http://www.biostars.org/p/6373/#6377
>
> Brad
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>

From Markus.Piotrowski at ruhr-uni-bochum.de  Sat Feb  9 10:12:12 2013
From: Markus.Piotrowski at ruhr-uni-bochum.de (Markus Piotrowski)
Date: 9 Feb 2013 16:12:12 +0100
Subject: [Biopython-dev] SearchIO HSP indexing
In-Reply-To: <CAKVJ-_4r6HAic5epn=kkyhrpZ6jHnd=r7HmXNZmqXGSFjF_Kdw@mail.gmail.com>
References: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
	<CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>
	<CAF++dEejrr_bRNoTPBPFwVsSYjuEGwtMMiJGrHvCpaEnnOxV1Q@mail.gmail.com>
	<CAKVJ-_4r6HAic5epn=kkyhrpZ6jHnd=r7HmXNZmqXGSFjF_Kdw@mail.gmail.com>
Message-ID: <739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de>

Mmh, at least Bio.Blast.NCBIXML returns the exact values given in the 
xml result. So query_start and sbjct_start (BTW, not hit_start) return 
the values from <Hsp_query-from> and <Hsp_hit-from>.
Thus, my first guess would be that a search function that can return an 
entity 'query_start' will return the value that is written in the file.

Markus

On 2013-02-09 15:30, Peter Cock wrote:
> On Sat, Feb 9, 2013 at 1:54 PM, Colin Archer <colin.aibn at gmail.com> 
> wrote:
>> Hi Peter,
>>              Thanks for getting back to me so quickly.
>>
>
> Thank you - the main reason for including SearchIO in Biopython 1.61
> as 'experimental code' is to get wider testing and feedback 
> (hopefully
> an approach that will work well and we can use this more in future 
> for
> other new code).
>
>> I'm curious about the benefits of having these values in Python 
>> string
>> slicing format? I haven't come across this very often, I'm used to 
>> seeing
>> values systematically zero or one-based.
>
> Once you're used to Python slicing it becomes very natural.
>
>> Would it be easier to keep the range variables hit_range and 
>> hit_range_all
>> in slicing format and the start and end variables in sequence 
>> position
>> format so that they represent the actual BLAST results?
>
> One reason for this is to be consistent across all the formats 
> supported
> in SearchIO, and since Biopython is a Python library following Python
> norms seems most natural.
>
>> I had a look at some of the code and I can't see the slicing format
>> mentioned anywhere (Hsp.py, Hit.py, or blast_xml.py). It would 
>> probably be
>> helpful to explain the values in Hsp.py as a ** mark on hsp_start, 
>> hsp_end,
>> query_start, and query_end so that if people are interested they can 
>> have a
>> look at the files and see what they mean.
>>
>> Thanks
>> Colin
>
> OK, so some clarification with examples in the docstrings is needed.
> How about the Tutorial chapter?
>
> Thanks,
>
> Peter

From colin.aibn at gmail.com  Sat Feb  9 10:19:26 2013
From: colin.aibn at gmail.com (Colin Archer)
Date: Sun, 10 Feb 2013 01:19:26 +1000
Subject: [Biopython-dev] Fwd:  SearchIO HSP indexing
In-Reply-To: <CAF++dEeHsJJDfYkQs93hmb8mfATfNJs4dX_4pn8=jECEEF0wUQ@mail.gmail.com>
References: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
	<CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>
	<CAF++dEejrr_bRNoTPBPFwVsSYjuEGwtMMiJGrHvCpaEnnOxV1Q@mail.gmail.com>
	<CAKVJ-_4r6HAic5epn=kkyhrpZ6jHnd=r7HmXNZmqXGSFjF_Kdw@mail.gmail.com>
	<CAF++dEeHsJJDfYkQs93hmb8mfATfNJs4dX_4pn8=jECEEF0wUQ@mail.gmail.com>
Message-ID: <CAF++dEd5vRbsX6PC8Ng8SbEqLEe90=+EWn45F1Jn+wG7VzV-Cg@mail.gmail.com>

> Hi Peter,

>  >              Thanks for getting back to me so quickly.
> >
>
> Thank you - the main reason for including SearchIO in Biopython 1.61
> as 'experimental code' is to get wider testing and feedback (hopefully
> an approach that will work well and we can use this more in future for
> other new code).
>
>
I've been using it for a couple of months now and I definitely prefer it over
the existing parser.


>  > I'm curious about the benefits of having these values in Python string
> > slicing format? I haven't come across this very often, I'm used to seeing
> > values systematically zero or one-based.
>
> Once you're used to Python slicing it becomes very natural.
>
>
> > Would it be easier to keep the range variables hit_range and hit_range_all
> > in slicing format and the start and end variables in sequence position
> > format so that they represent the actual BLAST results?
>
> One reason for this is to be consistent across all the formats supported
> in SearchIO, and since Biopython is a Python library following Python
> norms seems most natural.
>
> > I had a look at some of the code and I can't see the slicing format
> > mentioned anywhere (Hsp.py, Hit.py, or blast_xml.py). It would probably
> be
> > helpful to explain the values in Hsp.py as a ** mark on hsp_start,
> hsp_end,
> > query_start, and query_end so that if people are interested they can
> have a
> > look at the files and see what they mean.
> >
> > Thanks
> > Colin
>
> OK, so some clarification with examples in the docstrings is needed.
> How about the Tutorial chapter?
>
I would definitely add comments to the Hsp.py file, and if there is a
tutorial that people use, I would also update that, as it would be the
first place most people would look.

I was wondering if there was any code in SearchIO to align high-scoring
segment pairs against the same hit? I see the fragmentation code but that
seems specific to BLAT results and when I look at the HSPFragments in the
QueryResult object it does not seem to combine multiple HSPs against the
same hit even if they are not overlapping.

Thanks
Colin

From p.j.a.cock at googlemail.com  Sat Feb  9 10:36:34 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 9 Feb 2013 15:36:34 +0000
Subject: [Biopython-dev] SearchIO HSP indexing
In-Reply-To: <739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de>
References: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
	<CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>
	<CAF++dEejrr_bRNoTPBPFwVsSYjuEGwtMMiJGrHvCpaEnnOxV1Q@mail.gmail.com>
	<CAKVJ-_4r6HAic5epn=kkyhrpZ6jHnd=r7HmXNZmqXGSFjF_Kdw@mail.gmail.com>
	<739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de>
Message-ID: <CAKVJ-_7_9=ft16QT4dQY+VRfh5KAybZMr_H6v2Pqi9iAviW1RA@mail.gmail.com>

> On 2013-02-09 15:30, Peter Cock wrote:
>> One reason for this is to be consistent across all the formats supported
>> in SearchIO, and since Biopython is a Python library following Python
>> norms seems most natural.

On Sat, Feb 9, 2013 at 3:12 PM, Markus Piotrowski
<Markus.Piotrowski at ruhr-uni-bochum.de> wrote:
> Mmh, at least Bio.Blast.NCBIXML returns the exact values given in the xml
> result.

Yes, the old Bio.Blast parsers do not try to convert the coordinates.
Given that they only handled BLAST output, that was a justifiable
option. With Bio.SearchIO we're not just modelling BLAST output,
though: it covers multiple formats with different conventions.

Peter

From w.arindrarto at gmail.com  Sat Feb  9 11:56:46 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Sat, 9 Feb 2013 17:56:46 +0100
Subject: [Biopython-dev] SearchIO HSP indexing
In-Reply-To: <CAKVJ-_7_9=ft16QT4dQY+VRfh5KAybZMr_H6v2Pqi9iAviW1RA@mail.gmail.com>
References: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
	<CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>
	<CAF++dEejrr_bRNoTPBPFwVsSYjuEGwtMMiJGrHvCpaEnnOxV1Q@mail.gmail.com>
	<CAKVJ-_4r6HAic5epn=kkyhrpZ6jHnd=r7HmXNZmqXGSFjF_Kdw@mail.gmail.com>
	<739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de>
	<CAKVJ-_7_9=ft16QT4dQY+VRfh5KAybZMr_H6v2Pqi9iAviW1RA@mail.gmail.com>
Message-ID: <CADEGkF5HuG_goa0thibN4sFbq7W5RBeD6VV-FUrQGmGySdzOSg@mail.gmail.com>

Hi everyone,

Colin, thanks for the feedback! Peter has explained the rationale
behind the decision, so I would just like to add that there is indeed
an explanation of this behavior in the tutorial
(http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc106) and
the code (https://github.com/biopython/biopython/blob/master/Bio/SearchIO/__init__.py#L100).
I do admit that the explanation in the code could be made clearer with
some comments in hsp.py ~ which I can add :).

As for your point about the alignment code:

> I was wondering if there was any code in SearchIO to align high-scoring
> segment pairs against the same hit? I see the fragmentation code but that
> seems specific to BLAT results and when I look at the HSPFragments in the
> QueryResult object it does not seem to combine multiple HSPs against the
> same hit even if they are not overlapping.

SearchIO relies on BLAST to do this ~ which has already grouped each
HSP aligning to the same database sequence in one group (all of which
is accessible through the Hit object). I've always assumed that if two
HSPs came from the same database entry (Hit), they are grouped into
one Hit by BLAST, regardless of whether they overlap or not. Have you
seen any results from BLAST that show otherwise?

cheers,
Bow

From arklenna at gmail.com  Sat Feb  9 12:14:01 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Sat, 9 Feb 2013 12:14:01 -0500
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <1360374143.25311.YahooMailClassic@web164004.mail.gq1.yahoo.com>
References: <CALfq9tJenMObZGHU6EgEo5oUd7=1K-NWsC_cN06HCdJkE=Et3A@mail.gmail.com>
	<1360374143.25311.YahooMailClassic@web164004.mail.gq1.yahoo.com>
Message-ID: <CALfq9tKESkfdBfBFY7RWMF0tpo+cPaX7pcrwQg-OVzmNZvtxTA@mail.gmail.com>

On Fri, Feb 8, 2013 at 8:42 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> Hi Lenna,
>
>
> --- On *Thu, 2/7/13, Lenna Peterson <arklenna at gmail.com>* wrote:
> > If there are well-defined problems with the PLY parser, I can work on
> > fixing them. I am not currently working with mmCIF so I am not in the
> > best position to evaluate where and how the parser needs to be improved.
>
> I don't know of any problems with the PLY parser, but since it relies on
> PLY, it would add another dependency to Biopython.
>


> On the other hand, a pure-Python solution may be preferable, as it's
> easier to maintain and runs with Jython.
>


As far as I can tell, PLY works with Jython; see the discussion in this thread:
http://permalink.gmane.org/gmane.comp.python.ply/402

Not sure about PyPy. One option would be to deploy the PLY parser for
non-CPython platforms and tell those users to manually install PLY if they
want to use mmCIF. Not ideal, but is that preferred to an explicit dependency?


>
> I see three options then:
> 1) Remove the lex stuff from lex.yy.c, and optionally convert the
> remaining C code to Python.
>

As is, the C compiles cross platform with no dependencies. There is nothing
but lex stuff in lex.yy.c - I'm not quite sure what you mean here.


> 2) Remove the PLY dependency from the PLY-based parser.
> 3) Write a new pure-Python parser from scratch.
>
>
I'm not sure whether there is an appreciable difference between options 2
and 3.

Cheers,

Lenna

From mjldehoon at yahoo.com  Sat Feb  9 22:55:37 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 9 Feb 2013 19:55:37 -0800 (PST)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CALfq9tL7caRXcM1BmW-ksphdDP6E=J0bYE-x-rmYVbHtV+CCsA@mail.gmail.com>
Message-ID: <1360468537.28338.YahooMailClassic@web164004.mail.gq1.yahoo.com>

Hi Lenna,

>--- On Sat, 2/9/13, Lenna Peterson <lennalenna at gmail.com> wrote:
> > 1) Remove the lex stuff from lex.yy.c, and optionally convert the remaining
> >  C code to Python.

> As is, the C compiles cross platform with no dependencies. There is nothing
> but lex stuff in lex.yy.c - I'm not quite sure what you mean here.

Currently lex.yy.c contains lots of code that is generated automatically by lex but is not actually needed for the mmCIF parser. I was thinking of removing those parts and cleaning up the remainder so that the code is understandable (allowing us to fix any bugs, or to convert it to pure Python).

Best,
-Michiel


From colin.aibn at gmail.com  Sun Feb 10 02:28:36 2013
From: colin.aibn at gmail.com (Colin Archer)
Date: Sun, 10 Feb 2013 17:28:36 +1000
Subject: [Biopython-dev] SearchIO HSP indexing
In-Reply-To: <CADEGkF5HuG_goa0thibN4sFbq7W5RBeD6VV-FUrQGmGySdzOSg@mail.gmail.com>
References: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
	<CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>
	<CAF++dEejrr_bRNoTPBPFwVsSYjuEGwtMMiJGrHvCpaEnnOxV1Q@mail.gmail.com>
	<CAKVJ-_4r6HAic5epn=kkyhrpZ6jHnd=r7HmXNZmqXGSFjF_Kdw@mail.gmail.com>
	<739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de>
	<CAKVJ-_7_9=ft16QT4dQY+VRfh5KAybZMr_H6v2Pqi9iAviW1RA@mail.gmail.com>
	<CADEGkF5HuG_goa0thibN4sFbq7W5RBeD6VV-FUrQGmGySdzOSg@mail.gmail.com>
Message-ID: <CAF++dEd2rejB-kq3u2YPL-pDvoWYJYH9HwnD4RHJtcqRB3KD7Q@mail.gmail.com>

On Sun, Feb 10, 2013 at 2:56 AM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:

> Hi everyone,
>
> Colin, thanks for the feedback! Peter has explained the rationale
> behind the decision, so I would like to add that there has been indeed
> an explanation of this behavior in the tutorial
> (http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc106) and
> the code (
> https://github.com/biopython/biopython/blob/master/Bio/SearchIO/__init__.py#L100
> ).
> I do admit that the explanation in the code could be made clearer with
> some comments in hsp.py ~ which I can add :).
>
> As for your point about the alignment code:
>
> > I was wondering if there was any code in SearchIO to align high-scoring
> > segment pairs against the same hit? I see the fragmentation code but that
> > seems specific to BLAT results and when I look at the HSPFragments in the
> > QueryResult object it does not seem to combine multiple HSPs against the
> > same hit even if they are not overlapping.
>
> SearchIO relies on BLAST to do this ~ which has already grouped each
> HSP aligning to the same database sequence in one group (all of which
> is accessible through the Hit object). I've always assumed that if two
> HSPs came from the same database entry (Hit), they are grouped into
> one Hit by BLAST, regardless of whether they overlap or not. Have you
> seen any results from BLAST that shows otherwise?
>
>
I have a couple of examples where BLAST doesn't combine the HSPs as you
would expect. It seems to mainly occur because the HSP alignments overlap
and combining them would mean including more gaps in each HSP. For
example, *ftsK* in *E. coli* (ftsK.blast) or *aceF* in *E. coli* (aceF.blast).
In the second case, the first HSP spans the entire query and there are two
additional HSPs that are overlapped by it.

I know that BioPerl tries to align/tile (in Bio::Search::BlastUtils) the
HSPs somewhat when required but some people are hesitant to use their
method in certain situations (e.g., with tblastn results that overestimate
some of the metrics). They also implement additional functionality so that
the user could do a complete Smith-Waterman alignment if they wanted to.

Thanks
Colin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aceF.blast
Type: application/octet-stream
Size: 12124 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20130210/34bb5bfb/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ftsK.blast
Type: application/octet-stream
Size: 18537 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20130210/34bb5bfb/attachment-0003.obj>

From w.arindrarto at gmail.com  Sun Feb 10 10:31:51 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Sun, 10 Feb 2013 16:31:51 +0100
Subject: [Biopython-dev] SearchIO HSP indexing
In-Reply-To: <CAF++dEd2rejB-kq3u2YPL-pDvoWYJYH9HwnD4RHJtcqRB3KD7Q@mail.gmail.com>
References: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
	<CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>
	<CAF++dEejrr_bRNoTPBPFwVsSYjuEGwtMMiJGrHvCpaEnnOxV1Q@mail.gmail.com>
	<CAKVJ-_4r6HAic5epn=kkyhrpZ6jHnd=r7HmXNZmqXGSFjF_Kdw@mail.gmail.com>
	<739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de>
	<CAKVJ-_7_9=ft16QT4dQY+VRfh5KAybZMr_H6v2Pqi9iAviW1RA@mail.gmail.com>
	<CADEGkF5HuG_goa0thibN4sFbq7W5RBeD6VV-FUrQGmGySdzOSg@mail.gmail.com>
	<CAF++dEd2rejB-kq3u2YPL-pDvoWYJYH9HwnD4RHJtcqRB3KD7Q@mail.gmail.com>
Message-ID: <CADEGkF7F=oUSwvr1WrqdZuqB8w7eR25c9hytO4nWPyuU52=tvw@mail.gmail.com>

Hi Colin,

>> As for your point about the alignment code:
>>
>> > I was wondering if there was any code in SearchIO to align high-scoring
>> > segment pairs against the same hit? I see the fragmentation code but
>> > that
>> > seems specific to BLAT results and when I look at the HSPFragments in
>> > the
>> > QueryResult object it does not seem to combine multiple HSPs against the
>> > same hit even if they are not overlapping.
>>
>> SearchIO relies on BLAST to do this ~ which has already grouped each
>> HSP aligning to the same database sequence in one group (all of which
>> is accessible through the Hit object). I've always assumed that if two
>> HSPs came from the same database entry (Hit), they are grouped into
>> one Hit by BLAST, regardless of whether they overlap or not. Have you
>> seen any results from BLAST that shows otherwise?
>>
>
> I have a couple of examples where BLAST doesn't combine the HSPs as you
> would expect. It seems to mainly occur because the HSP alignments overlap
> and to combine them would mean including more gaps in each hsp. For example,
> ftsK in E. coli (ftsK.blast) or aceF in E. coli (aceF.blast). In the second
> case, the first HSP spans the entire query and there are two additional HSPs
> that are overlapped by it.
>
> I know that BioPerl tries to align/tile (in Bio::Search::BlastUtils) the
> HSPs somewhat when required but some people are hesitant to use their method
> in certain situations (e.g., with tblastn results that overestimate some of
> the metrics). They also implement additional functionality so that the user
> could do a complete smith-waterman alignment if they wanted to.

Thanks for including the files!

At the moment, no, SearchIO doesn't have any code to 'assemble'/'tile'
overlapping HSPs. The fragment bits you're seeing in the BLAT parser
is simply the name we use to refer to noncontiguous blocks inside a
reported HSP.

We may be able to add some functions to return the intervals for such
overlapping HSPs, given a Hit object. But I'm a bit hesitant to go
further than that (i.e. to the point where we merge the statistics of
each HSP and assign them to the assembled HSP). This is mostly because
such assembly seems very specific to the program's statistics and
format (BLAST's merge would be different from BLAT's, and BLAST XML's
merge may be different from tabular BLAST's). If anything, perhaps these
functions deserve their own space in SearchUtils (taking parallels
from Bio.SeqIO and Bio.SeqUtils)?
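
Something along these lines is what I have in mind for the interval
part. This is only a sketch: merge_intervals is a hypothetical name,
nothing like it exists in SearchIO yet, and it assumes the ranges are
plain (start, end) tuples in the 0-based half-open convention (e.g.
collected from the HSPs of one Hit). No statistics are merged.

```python
def merge_intervals(ranges):
    """Merge overlapping or abutting half-open (start, end) intervals.

    Hypothetical helper: returns a sorted list of non-overlapping tuples.
    """
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps (or directly abuts) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


# Made-up query ranges for four HSPs against the same hit: the first
# spans most of the query and overlaps the next two.
hsp_ranges = [(0, 630), (120, 200), (580, 700), (900, 950)]

tiled = merge_intervals(hsp_ranges)
covered = sum(end - start for start, end in tiled)
```

Here tiled comes out as [(0, 700), (900, 950)], so 750 query positions
are covered in total. Because the ranges are half-open, two intervals
that merely touch (one ends where the next starts) are joined as well.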

regards,
Bow

From redmine at redmine.open-bio.org  Sun Feb 10 17:13:20 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Sun, 10 Feb 2013 22:13:20 +0000
Subject: [Biopython-dev] [Biopython - Bug #3412] (New) Bad URL in docs for
	module NCBIWWW.qblast
Message-ID: <redmine.issue-3412.20130210221320@redmine.open-bio.org>


Issue #3412 has been reported by Vincent Davis.

----------------------------------------
Bug #3412: Bad URL in docs for module NCBIWWW.qblast
https://redmine.open-bio.org/issues/3412

Author: Vincent Davis
Status: New
Priority: Low
Assignee: Biopython Dev Mailing List
Category: Documentation
Target version: 
URL: 


At the bottom of help(NCBIWWW.qblast) is a link to http://www.ncbi.nlm.nih.gov/BLAST/blast_overview.html
This link is not valid.


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Sun Feb 10 17:40:21 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Sun, 10 Feb 2013 22:40:21 +0000
Subject: [Biopython-dev] [Biopython - Bug #3412] (Resolved) Bad URL in docs
	for module NCBIWWW.qblast
References: <redmine.issue-3412.20130210221320@redmine.open-bio.org>
Message-ID: <redmine.journal-15096.20130210224021@redmine.open-bio.org>


Issue #3412 has been updated by Peter Cock.

Status changed from New to Resolved
% Done changed from 0 to 100

The NCBI seem to have broken that link, and if they did set up a redirect for a while, it has stopped working now.

I'll use http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html instead I think,
https://github.com/biopython/biopython/commit/ae84cc8cb828e868883c75a980fcd83585c338f8

Thanks!
----------------------------------------
Bug #3412: Bad URL in docs for module NCBIWWW.qblast
https://redmine.open-bio.org/issues/3412

Author: Vincent Davis
Status: Resolved
Priority: Low
Assignee: Biopython Dev Mailing List
Category: Documentation
Target version: 
URL: 


At the bottom of help(NCBIWWW.qblast) is a link to http://www.ncbi.nlm.nih.gov/BLAST/blast_overview.html
This link is not valid.


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From eric.talevich at gmail.com  Sun Feb 10 21:11:54 2013
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sun, 10 Feb 2013 21:11:54 -0500
Subject: [Biopython-dev] New Newick parser in Bio.Phylo
Message-ID: <CAMC681mfd3g_w+g_otu2WO6rP48TJSmVUu5ZrN8vc0PPqzzdvg@mail.gmail.com>

Hi Ben,

I've noticed a couple new characteristics of the Newick parser that I had
questions about.

1. There is no longer a way to tell the parser to treat internal node
labels as confidence values. Lots of files in the wild do record the
support values here, including those generated by RAxML, PhyML, FastTree
and MrBayes, so I'd like to restore this option, and perhaps make it the
default. I think the condition is:

if not (self.values_are_confidence or self.comments_are_confidence or
current_clade.is_terminal()): # parse confidence from node label

Is there an easy way to add this option to the parser? I'm trying to get
this to work in the "else" clause in parse_tree, where unquoted node labels
are handled.


2. Confidence values are required to be between 0.0 and 1.0. Also, support
values recorded as integers are treated as percentages and divided by 100
automatically. The phyloXML spec doesn't have this range requirement. RAxML
scales bootstraps to 100, but PhyML records the raw number of supporting
bootstrap runs (e.g. supports out of 1000 if there were 1000 bootstrap
replicates). So, I'd prefer to leave the confidence values as they are,
requiring only that they be numeric. Thoughts?


Thanks,
Eric

From ben at bendmorris.com  Sun Feb 10 21:39:24 2013
From: ben at bendmorris.com (Ben Morris)
Date: Sun, 10 Feb 2013 21:39:24 -0500
Subject: [Biopython-dev] New Newick parser in Bio.Phylo
In-Reply-To: <CAMC681mfd3g_w+g_otu2WO6rP48TJSmVUu5ZrN8vc0PPqzzdvg@mail.gmail.com>
References: <CAMC681mfd3g_w+g_otu2WO6rP48TJSmVUu5ZrN8vc0PPqzzdvg@mail.gmail.com>
Message-ID: <CAAzEd5Bz4bSDAPGq+Hyz9AYPqfsswPWsPig2LRs7sAKoqOL_Xw@mail.gmail.com>

On Sun, Feb 10, 2013 at 9:11 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> Hi Ben,
>
> I've noticed a couple new characteristics of the Newick parser that I had
> questions about.
>
> 1. There is no longer a way to tell the parser to treat internal node labels
> as confidence values. Lots of files in the wild do record the support values
> here, including those generated by RAxML, PhyML, FastTree and MrBayes, so
> I'd like to restore this option, and perhaps make it the default. I think
> the condition is:
>
> if not (self.values_are_confidence or self.comments_are_confidence or
> current_clade.is_terminal()): # parse confidence from node label
>
> Is there an easy way to add this option to the parser? I'm trying to get
> this to work in the "else" clause in parse_tree, where unquoted node labels
> are handled.
>
>
> 2. Confidence values are required to be between 0.0 and 1.0. Also, support
> values recorded as integers are treated as percentages and divided by 100
> automatically. The phyloXML spec doesn't have this range requirement. RAxML
> scales bootstraps to 100, but PhyML records the raw number of supporting
> bootstrap runs (e.g. supports out of 1000 if there were 1000 bootstrap
> replicates). So, I'd prefer to leave the confidence values as they are,
> requiring only that they be numeric. Thoughts?
>
>
> Thanks,
> Eric

1. One issue is that current_clade.is_terminal() will always be true
at that point because current_clade's children haven't been parsed
yet. Putting the check in the "process_clade" function (which is
called when the closing paren is hit, and therefore all children
should have been parsed) should fix this.

So, if values_are_confidence and comments_are_confidence are both
false and a node label is numeric, it should be treated as confidence,
and clade.name should be set to None - is that correct?

2. This should be as simple as removing current lines 123-127.
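
For point 1, the rule I'm describing could be sketched roughly like
this. It is a standalone illustration, not the actual parser code:
split_label and its arguments are hypothetical, and it assumes the
terminal/internal status is already known (i.e. it would be called
after the clade's children have been parsed).

```python
def split_label(label, is_terminal):
    """Return (name, confidence) for an unquoted Newick node label.

    Hypothetical helper: a numeric label on an internal node is read as
    a support value and the name is dropped; anything else stays a name.
    """
    if is_terminal or label is None:
        return label, None
    try:
        # Keep the value as given (no rescaling to the 0.0-1.0 range).
        return None, float(label)
    except ValueError:
        return label, None


internal = split_label("95", is_terminal=False)
leaf = split_label("95", is_terminal=True)
named = split_label("cladeA", is_terminal=False)
raw_count = split_label("1000", is_terminal=False)
```

So an internal node labelled "95" gives (None, 95.0), while a leaf with
the same label keeps "95" as its name, and a PhyML-style raw count such
as "1000" is left as 1000.0.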

~Ben

From eric.talevich at gmail.com  Sun Feb 10 22:30:47 2013
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sun, 10 Feb 2013 22:30:47 -0500
Subject: [Biopython-dev] New Newick parser in Bio.Phylo
In-Reply-To: <CAAzEd5Bz4bSDAPGq+Hyz9AYPqfsswPWsPig2LRs7sAKoqOL_Xw@mail.gmail.com>
References: <CAMC681mfd3g_w+g_otu2WO6rP48TJSmVUu5ZrN8vc0PPqzzdvg@mail.gmail.com>
	<CAAzEd5Bz4bSDAPGq+Hyz9AYPqfsswPWsPig2LRs7sAKoqOL_Xw@mail.gmail.com>
Message-ID: <CAMC681kYUxFpcxLKST-1qRuyfj4sWS1kdQSquYJ6en2r=9tTEg@mail.gmail.com>

On Sun, Feb 10, 2013 at 9:39 PM, Ben Morris <ben at bendmorris.com> wrote:

> On Sun, Feb 10, 2013 at 9:11 PM, Eric Talevich <eric.talevich at gmail.com>
> wrote:
> > Hi Ben,
> >
> > I've noticed a couple new characteristics of the Newick parser that I had
> > questions about.
> >
> > 1. There is no longer a way to tell the parser to treat internal node
> labels
> > as confidence values. Lots of files in the wild do record the support
> values
> > here, including those generated by RAxML, PhyML, FastTree and MrBayes, so
> > I'd like to restore this option, and perhaps make it the default. I think
> > the condition is:
> >
> > if not (self.values_are_confidence or self.comments_are_confidence or
> > current_clade.is_terminal()): # parse confidence from node label
> >
> > Is there an easy way to add this option to the parser? I'm trying to get
> > this to work in the "else" clause in parse_tree, where unquoted node
> labels
> > are handled.
> >
> >
> > 2. Confidence values are required to be between 0.0 and 1.0. Also,
> support
> > values recorded as integers are treated as percentages and divided by 100
> > automatically. The phyloXML spec doesn't have this range requirement.
> RAxML
> > scales bootstraps to 100, but PhyML records the raw number of supporting
> > bootstrap runs (e.g. supports out of 1000 if there were 1000 bootstrap
> > replicates). So, I'd prefer to leave the confidence values as they are,
> > requiring only that they be numeric. Thoughts?
> >
> >
> > Thanks,
> > Eric
>
> 1. One issue is that current_clade.is_terminal() will always be true
> at that point because current_clade's children haven't been parsed
> yet. Putting the check in the "process_clade" function (which is
> called when the closing paren is hit, and therefore all children
> should have been parsed) should fix this.
>
> So, if values_are_confidence and comments_are_confidence are both
> false and a node label is numeric, it should be treated as confidence,
> and clade.name should be set to None - is that correct?
>
> 2. This should be as simple as removing current lines 123-127.
>
> ~Ben
>


Thanks. Here's #2:
https://github.com/biopython/biopython/commit/0aee549e72fe5dcf9bcea239d29780706500922a

I agree with your assessment of #1, but haven't been able to get it working
yet. I'm leaving Bug #3407 open for now:
https://redmine.open-bio.org/issues/3407

From ben at bendmorris.com  Sun Feb 10 23:04:45 2013
From: ben at bendmorris.com (Ben Morris)
Date: Sun, 10 Feb 2013 23:04:45 -0500
Subject: [Biopython-dev] New Newick parser in Bio.Phylo
In-Reply-To: <CAMC681kYUxFpcxLKST-1qRuyfj4sWS1kdQSquYJ6en2r=9tTEg@mail.gmail.com>
References: <CAMC681mfd3g_w+g_otu2WO6rP48TJSmVUu5ZrN8vc0PPqzzdvg@mail.gmail.com>
	<CAAzEd5Bz4bSDAPGq+Hyz9AYPqfsswPWsPig2LRs7sAKoqOL_Xw@mail.gmail.com>
	<CAMC681kYUxFpcxLKST-1qRuyfj4sWS1kdQSquYJ6en2r=9tTEg@mail.gmail.com>
Message-ID: <CAAzEd5AAdeM0nyQOSkYwQJMaEPb67s=DfVsae3b-bULs-zkdCQ@mail.gmail.com>

On Sun, Feb 10, 2013 at 10:30 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> On Sun, Feb 10, 2013 at 9:39 PM, Ben Morris <ben at bendmorris.com> wrote:
>>
>> On Sun, Feb 10, 2013 at 9:11 PM, Eric Talevich <eric.talevich at gmail.com>
>> wrote:
>> > Hi Ben,
>> >
>> > I've noticed a couple new characteristics of the Newick parser that I
>> > had
>> > questions about.
>> >
>> > 1. There is no longer a way to tell the parser to treat internal node
>> > labels
>> > as confidence values. Lots of files in the wild do record the support
>> > values
>> > here, including those generated by RAxML, PhyML, FastTree and MrBayes,
>> > so
>> > I'd like to restore this option, and perhaps make it the default. I
>> > think
>> > the condition is:
>> >
>> > if not (self.values_are_confidence or self.comments_are_confidence or
>> > current_clade.is_terminal()): # parse confidence from node label
>> >
>> > Is there an easy way to add this option to the parser? I'm trying to get
>> > this to work in the "else" clause in parse_tree, where unquoted node
>> > labels
>> > are handled.
>> >
>> >
>> > 2. Confidence values are required to be between 0.0 and 1.0. Also,
>> > support
>> > values recorded as integers are treated as percentages and divided by
>> > 100
>> > automatically. The phyloXML spec doesn't have this range requirement.
>> > RAxML
>> > scales bootstraps to 100, but PhyML records the raw number of supporting
>> > bootstrap runs (e.g. supports out of 1000 if there were 1000 bootstrap
>> > replicates). So, I'd prefer to leave the confidence values as they are,
>> > requiring only that they be numeric. Thoughts?
>> >
>> >
>> > Thanks,
>> > Eric
>>
>> 1. One issue is that current_clade.is_terminal() will always be true
>> at that point because current_clade's children haven't been parsed
>> yet. Putting the check in the "process_clade" function (which is
>> called when the closing paren is hit, and therefore all children
>> should have been parsed) should fix this.
>>
>> So, if values_are_confidence and comments_are_confidence are both
>> false and a node label is numeric, it should be treated as confidence,
>> and clade.name should be set to None - is that correct?
>>
>> 2. This should be as simple as removing current lines 123-127.
>>
>> ~Ben
>
>
>
> Thanks. Here's #2:
> https://github.com/biopython/biopython/commit/0aee549e72fe5dcf9bcea239d29780706500922a
>
> I agree with your assessment of #1, but haven't been able to get it working
> yet. I'm leaving Bug #3407 open for now:
> https://redmine.open-bio.org/issues/3407
>

I think this should do it:

https://github.com/bendmorris/biopython/commit/b430f27ff908f07d8ab59bec48429947f0028d63

I also updated the test case to make sure this is working correctly
and changed the default value of comments_are_confidences from True to
False.
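
For anyone following the thread, the rule we discussed can be sketched
roughly as below. This is only an illustration, not the code in the commit:
the Clade class is a minimal stand-in rather than the real Bio.Phylo class,
and the helper name apply_label_as_confidence is made up for this example.
It assumes the helper is called once all children of a clade are attached.

```python
class Clade(object):
    """Minimal stand-in for a tree node (NOT the real Bio.Phylo class)."""

    def __init__(self, name=None, clades=None):
        self.name = name
        self.confidence = None
        self.clades = clades or []

    def is_terminal(self):
        return not self.clades


def apply_label_as_confidence(clade, values_are_confidence=False,
                              comments_are_confidence=False):
    """Hypothetical helper: move a numeric internal label into confidence.

    Intended to run after the clade's children have been parsed (at the
    closing parenthesis), so that is_terminal() gives a reliable answer.
    """
    if values_are_confidence or comments_are_confidence:
        return clade  # confidence comes from elsewhere; leave the label alone
    if clade.is_terminal() or clade.name is None:
        return clade  # leaf labels are always names
    try:
        value = float(clade.name)
    except ValueError:
        return clade  # non-numeric label: keep it as the name
    clade.confidence = value
    clade.name = None
    return clade


inner = apply_label_as_confidence(Clade("95", [Clade("A"), Clade("B")]))
leaf = apply_label_as_confidence(Clade("42"))
named = apply_label_as_confidence(Clade("root", [Clade("C")]))
```

Note that the value is stored as-is (95.0 here), with no rescaling to the
0.0-1.0 range, in line with point 2 above.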

If that looks correct, feel free to pull.

~Ben

From eric.talevich at gmail.com  Sun Feb 10 23:20:20 2013
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sun, 10 Feb 2013 23:20:20 -0500
Subject: [Biopython-dev] New Newick parser in Bio.Phylo
In-Reply-To: <CAAzEd5AAdeM0nyQOSkYwQJMaEPb67s=DfVsae3b-bULs-zkdCQ@mail.gmail.com>
References: <CAMC681mfd3g_w+g_otu2WO6rP48TJSmVUu5ZrN8vc0PPqzzdvg@mail.gmail.com>
	<CAAzEd5Bz4bSDAPGq+Hyz9AYPqfsswPWsPig2LRs7sAKoqOL_Xw@mail.gmail.com>
	<CAMC681kYUxFpcxLKST-1qRuyfj4sWS1kdQSquYJ6en2r=9tTEg@mail.gmail.com>
	<CAAzEd5AAdeM0nyQOSkYwQJMaEPb67s=DfVsae3b-bULs-zkdCQ@mail.gmail.com>
Message-ID: <CAMC681k3M5QspHJAGHxnZRLWzO46QAvBKpGe=0oFn0i9jYf9wQ@mail.gmail.com>

On Sun, Feb 10, 2013 at 11:04 PM, Ben Morris <ben at bendmorris.com> wrote:

> On Sun, Feb 10, 2013 at 10:30 PM, Eric Talevich <eric.talevich at gmail.com>
> wrote:
> > On Sun, Feb 10, 2013 at 9:39 PM, Ben Morris <ben at bendmorris.com> wrote:
> >>
> >> On Sun, Feb 10, 2013 at 9:11 PM, Eric Talevich <eric.talevich at gmail.com
> >
> >> wrote:
> >> > Hi Ben,
> >> >
> >> > I've noticed a couple new characteristics of the Newick parser that I
> >> > had
> >> > questions about.
> >> >
> >> > 1. There is no longer a way to tell the parser to treat internal node
> >> > labels
> >> > as confidence values. Lots of files in the wild do record the support
> >> > values
> >> > here, including those generated by RAxML, PhyML, FastTree and MrBayes,
> >> > so
> >> > I'd like to restore this option, and perhaps make it the default. I
> >> > think
> >> > the condition is:
> >> >
> >> > if not (self.values_are_confidence or self.comments_are_confidence or
> >> > current_clade.is_terminal()): # parse confidence from node label
> >> >
> >> > Is there an easy way to add this option to the parser? I'm trying to
> get
> >> > this to work in the "else" clause in parse_tree, where unquoted node
> >> > labels
> >> > are handled.
> >> >
> >> >
> >> > 2. Confidence values are required to be between 0.0 and 1.0. Also,
> >> > support
> >> > values recorded as integers are treated as percentages and divided by
> >> > 100
> >> > automatically. The phyloXML spec doesn't have this range requirement.
> >> > RAxML
> >> > scales bootstraps to 100, but PhyML records the raw number of
> supporting
> >> > bootstrap runs (e.g. supports out of 1000 if there were 1000 bootstrap
> >> > replicates). So, I'd prefer to leave the confidence values as they
> are,
> >> > requiring only that they be numeric. Thoughts?
> >> >
> >> >
> >> > Thanks,
> >> > Eric
> >>
> >> 1. One issue is that current_clade.is_terminal() will always be true
> >> at that point because current_clade's children haven't been parsed
> >> yet. Putting the check in the "process_clade" function (which is
> >> called when the closing paren is hit, and therefore all children
> >> should have been parsed) should fix this.
> >>
> >> So, if values_are_confidence and comments_are_confidence are both
> >> false and a node label is numeric, it should be treated as confidence,
> >> and clade.name should be set to None - is that correct?
> >>
> >> 2. This should be as simple as removing current lines 123-127.
> >>
> >> ~Ben
> >
> >
> >
> > Thanks. Here's #2:
> >
> https://github.com/biopython/biopython/commit/0aee549e72fe5dcf9bcea239d29780706500922a
> >
> > I agree with your assessment of #1, but haven't been able to get it
> working
> > yet. I'm leaving Bug #3407 open for now:
> > https://redmine.open-bio.org/issues/3407
> >
>
> I think this should do it:
>
>
> https://github.com/bendmorris/biopython/commit/b430f27ff908f07d8ab59bec48429947f0028d63
>
> I also updated the test case to make sure this is working correctly
> and changed the default value of comments_are_confidences from True to
> False.
>
> If that looks correct, feel free to pull.
>
> ~Ben
>

Works for me, thanks! I cherry-picked it here:
https://github.com/biopython/biopython/commit/f382f550f49f73301663ad949a6c1e40f5d71c0c

From p.j.a.cock at googlemail.com  Mon Feb 11 06:46:20 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 11 Feb 2013 11:46:20 +0000
Subject: [Biopython-dev] Dropping Python 2.5 and Jython 2.5 support?
In-Reply-To: <CAKVJ-_5M55Lky6qg+DohH-QQ1JBBqueCZ1GCzcsYscQ1YSu6UQ@mail.gmail.com>
References: <CAKVJ-_60vgH_T8eqrfYKBScyRVNUApmFM32_4D+uTLABLy19Ng@mail.gmail.com>
	<CAKVJ-_5M55Lky6qg+DohH-QQ1JBBqueCZ1GCzcsYscQ1YSu6UQ@mail.gmail.com>
Message-ID: <CAKVJ-_7SHN5vPP1y-W2vHYB1TbpCFdtrm+5pnxqajbXQAOxDCw@mail.gmail.com>

On Mon, Jan 7, 2013 at 6:55 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> My only significant concern is for Jython users, since this will also
> mean dropping support for Jython 2.5 (which implements the
> Python 2.5 language). The replacement Jython 2.7 is still only
> at the alpha release stage.

Good news for Jython fans: although originally expected last year,
a beta of Jython 2.7 has now been released (it supports the
same language features as CPython 2.7):

http://fwierzbicki.blogspot.co.uk/2013/02/jython-27-beta1-released.html

Hopefully the Biopython unit tests will all be fine under this... and
if so that is good news for phasing out support of Python 2.5.

Regards,

Peter

From tiagoantao at gmail.com  Mon Feb 11 06:50:10 2013
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Mon, 11 Feb 2013 11:50:10 +0000
Subject: [Biopython-dev] Dropping Python 2.5 and Jython 2.5 support?
In-Reply-To: <CAKVJ-_7SHN5vPP1y-W2vHYB1TbpCFdtrm+5pnxqajbXQAOxDCw@mail.gmail.com>
References: <CAKVJ-_60vgH_T8eqrfYKBScyRVNUApmFM32_4D+uTLABLy19Ng@mail.gmail.com>
	<CAKVJ-_5M55Lky6qg+DohH-QQ1JBBqueCZ1GCzcsYscQ1YSu6UQ@mail.gmail.com>
	<CAKVJ-_7SHN5vPP1y-W2vHYB1TbpCFdtrm+5pnxqajbXQAOxDCw@mail.gmail.com>
Message-ID: <CAA9RGEPv5G3Eh9Gv72XPEsQB3QvUZjr2R9thwU7C4Yg_GNj+cg@mail.gmail.com>

On Mon, Feb 11, 2013 at 11:46 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Good news for Jython fans, although originally expected last year,
> they have now released a beta of Jython 2.7 (which supports the
> same language features as C Python 2.7):


I am going to set up buildbot for this now. I will set up my slave first.
If you have any slaves that you want to add this to, please tell me.

Tiago


-- 
"Liberty for wolves is death to the lambs" - Isaiah Berlin

From saketkc at gmail.com  Tue Feb 12 04:51:54 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Tue, 12 Feb 2013 15:21:54 +0530
Subject: [Biopython-dev] BWA Wrapper
Message-ID: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>

Hi,

I am writing a bwa wrapper for Biopython. I have in fact got the "index"
option working. However, I have a concern:

bwa has these options :


bwa index -a bwtsw database.fasta

bwa aln database.fasta short_read.fastq > aln_sa.sai

bwa samse database.fasta aln_sa.sai short_read.fastq > aln.sam

bwa sampe database.fasta aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln.sam

bwa bwasw database.fasta long_read.fastq > aln.sam


If you read the documentation here<http://bio-bwa.sourceforge.net/bwa.shtml>,
you will see that "-r" is an option of the "aln" command as well as the
"samse" command. In the former it is of type INT and in the latter of type
STR. I am not sure how this can be handled in the wrapper,
because I also plan to implement a checker_function. One way is to make a
new class, let's say "BwaAlignCommand", which takes care of all the options
of the "aln" command, and separately implement another class, say
"BwaSamseCommand", for all the options of the "samse" command.
But I am not sure if that is indeed the correct way of addressing the
problem.


Any pointers on this issue?


Thanks


Saket Choudhary

Undergraduate Student

IIT Bombay,India

From p.j.a.cock at googlemail.com  Tue Feb 12 12:38:46 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 12 Feb 2013 17:38:46 +0000
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
Message-ID: <CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>

On Tue, Feb 12, 2013 at 9:51 AM, Saket Choudhary <saketkc at gmail.com> wrote:
> Hi,
>
> I am writing a bwa wrapper for bio-python. I have infact got the "index"
>  option working. However I have a concern:
>
> bwa has these options :
>
> bwa index -a bwtsw database.fasta
>
> bwa aln database.fasta short_read.fastq > aln_sa.sai
>
> bwa samse database.fasta aln_sa.sai short_read.fastq > aln.sam
>
> bwa sampe database.fasta aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln.sam
>
> bwa bwasw database.fasta long_read.fastq > aln.sam
>
>
> If you read the documentation here<http://bio-bwa.sourceforge.net/bwa.shtml>,
> you will see that  "-r" is an option with "aln" command as well as the
> "samse" command. In the former it is of type INT and in the latter of type
> STR. Now I am not sure how can this be taken care of in the wrapper,
> because I also plan to implement a checker_function.  One way is to make a
> new class, lets say BwaAlignCommand which will take care of all options
> inside the "aln" command and separately implement another class say
> "BwaSamseCommand", and implement all the options of the "samse" command.
> But I am not sure if that is indeed the correct way of addressing the
> problem.
>
>
> Any pointers on this issue ?

I would treat "bwa samse", "bwa sampe", "bwa ..." as separate tools and
write a wrapper class for each of them. This would probably fit under the
Bio.Sequencing.Applications namespace.
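
A rough sketch of the "one class per subcommand" idea, in plain Python so it
runs on its own. All class names and the checkers dictionary are hypothetical
and are not the Bio.Application API; the INT/STR types for "-r" are simply
taken from the message above and have not been checked against the bwa manual.

```python
class BwaSubcommand(object):
    """Hypothetical base class: builds 'bwa <subcommand> [options] <args>'."""

    subcommand = None
    checkers = {}  # option letter -> function returning True for valid values

    def __init__(self, *args, **options):
        self.args = list(args)
        self.options = {}
        for flag, value in options.items():
            if flag not in self.checkers:
                raise ValueError("bwa %s has no option -%s"
                                 % (self.subcommand, flag))
            if not self.checkers[flag](value):
                raise ValueError("bad value %r for bwa %s -%s"
                                 % (value, self.subcommand, flag))
            self.options[flag] = value

    def __str__(self):
        parts = ["bwa", self.subcommand]
        for flag in sorted(self.options):
            parts.extend(["-" + flag, str(self.options[flag])])
        parts.extend(self.args)
        return " ".join(parts)


class BwaAlnCommand(BwaSubcommand):
    subcommand = "aln"
    checkers = {"r": lambda value: isinstance(value, int)}  # assumed INT


class BwaSamseCommand(BwaSubcommand):
    subcommand = "samse"
    checkers = {"r": lambda value: isinstance(value, str)}  # assumed STR


aln_cmd = str(BwaAlnCommand("database.fasta", "short_read.fastq", r=5))
samse_cmd = str(BwaSamseCommand("database.fasta", "aln_sa.sai",
                                "short_read.fastq", r="group1"))
```

Because each subcommand has its own class, the same option letter can carry a
different type check in each one without any special casing.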

Peter

From p.j.a.cock at googlemail.com  Tue Feb 12 12:51:15 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 12 Feb 2013 17:51:15 +0000
Subject: [Biopython-dev] Project ideas for GSoC (or other student projects)
Message-ID: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>

Hello all,

Google recently confirmed they will be running Google Summer of Code 2013,
and we (Biopython and the other Bio* projects) would hope to be accepted again
under the Open Bioinformatics Foundation as in previous years:
http://lists.open-bio.org/pipermail/gsoc/2013/000196.html

It would be great to start coming up with potential project ideas, both larger
pieces of work suitable for GSoC but also smaller tasks for other project
students, or 'low hanging fruit' for potential contributors to cut
their teeth on.

See also http://biopython.org/wiki/Active_projects and the ideas list there.

Regards,

Peter

From w.arindrarto at gmail.com  Tue Feb 12 13:29:02 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 12 Feb 2013 19:29:02 +0100
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
Message-ID: <CADEGkF6s6vbTeutQrx6g_V1yCMFeKKCtouPKc_CgZyf52hd1PA@mail.gmail.com>

Hi everyone,

It's more or less a 'low hanging fruit', but I've been thinking
perhaps it may be useful if we have our own interface to the HMMER3
online service? The corresponding SearchIO parsers may be written for
this as well (they return different formats for which we haven't any
parsers currently).

And I think there are more things being worked on, not yet mentioned
in the wiki:

1. Porting our docs to Sphinx[1]
2. Converting some/all of the print and compare tests to unit tests.
For example, our Bio.Seq's tests are still print and compare tests.

regards,
Bow

[1] See the original feature request here:
https://redmine.open-bio.org/issues/3221
https://redmine.open-bio.org/issues/3220
https://redmine.open-bio.org/issues/3219

From eric.talevich at gmail.com  Tue Feb 12 15:00:11 2013
From: eric.talevich at gmail.com (Eric Talevich)
Date: Tue, 12 Feb 2013 15:00:11 -0500
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
Message-ID: <CAMC681nJpArQEGLynZ7JP8FsgZ84ZwW+mPK=P1+NqOkc+fr=2w@mail.gmail.com>

On Tue, Feb 12, 2013 at 12:51 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Hello all,
>
> Google recently confirmed they will be running Google Summer of Code 2013,
> and we (Biopython and the other Bio* projects) would hope to be accepted
> again
> under the Open Bioinformatics Foundation as in previous years:
> http://lists.open-bio.org/pipermail/gsoc/2013/000196.html
>
> It would be great to start coming up with potential project ideas, both
> larger
> pieces of work suitable for GSoC but also smaller tasks for other project
> students, or 'low hanging fruit' for potential contributors to cut
> their teeth on.
>

One interesting GSoC project would be to implement support for phylogenetic
placements. The programs pplacer and EPA (part of RAxML) can place sequence
reads from metagenomic samples onto a reference phylogeny:
http://matsen.fhcrc.org/pplacer/
http://sysbio.oxfordjournals.org/content/60/3/291

The output format of those programs has been standardized as something I
suppose we could call the "jplace" format:
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0031009
http://arxiv.org/abs/1201.3397

It's based on JSON and Newick, with a small extension to Newick that
shouldn't be too hard to support. The GSoC project would be to implement a
parser for this and implement querying as well as integration with the rest
of Bio.Phylo to some reasonable extent. I would be available to mentor this.

In terms of low-hanging fruit, there are some small but important functions
that could be added to Bio.Phylo. My top three: Robinson-Foulds distance,
majority-rules consensus, draw an unrooted tree using Felsenstein's Equal
Daylight algorithm (which starts by computing the layout for a radial tree).
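
To give a feel for the size of the first item, here is a minimal sketch of the
Robinson-Foulds distance for unrooted trees. It assumes trees are given as
nested tuples of leaf-name strings rather than Bio.Phylo objects, and the
function names are made up for this example.

```python
def nontrivial_splits(tree):
    """Return (leaves, splits) for a tree given as nested tuples of strings.

    Each split is stored as the side that does NOT contain a fixed reference
    leaf, so the same bipartition always gets the same representation.
    """
    clades = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*[visit(child) for child in node])
        clades.add(leaves)
        return leaves

    all_leaves = visit(tree)
    reference = min(all_leaves)
    splits = set()
    for clade in clades:
        side = all_leaves - clade if reference in clade else clade
        # Skip trivial splits (a single leaf versus everything else).
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
    return all_leaves, splits


def robinson_foulds(tree_a, tree_b):
    """Count the splits present in exactly one of the two trees."""
    leaves_a, splits_a = nontrivial_splits(tree_a)
    leaves_b, splits_b = nontrivial_splits(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must have the same set of leaves")
    return len(splits_a ^ splits_b)


t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
t3 = ("A", ("B", ("C", "D")))  # same unrooted topology as t1
d12 = robinson_foulds(t1, t2)
d13 = robinson_foulds(t1, t3)
```

A version for Bio.Phylo would collect the leaf sets from the tree's clades
instead of nested tuples, but the split comparison would stay the same.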

-Eric

From saketkc at gmail.com  Tue Feb 12 15:45:46 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Wed, 13 Feb 2013 02:15:46 +0530
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
Message-ID: <CAEDHeivOFNv9r5FH_F92=oUqXDzXJ5T073V9NxQM6v=L+wCSGw@mail.gmail.com>

Hi,

I was thinking of a synteny viewer along the lines of
GSV<http://cas-bioinfo.cas.unt.edu/gsv/homepage.php>, if
that makes sense.

Saket

On 12 February 2013 23:21, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Hello all,
>
> Google recently confirmed they will be running Google Summer of Code 2013,
> and we (Biopython and the other Bio* projects) would hope to be accepted
> again
> under the Open Bioinformatics Foundation as in previous years:
> http://lists.open-bio.org/pipermail/gsoc/2013/000196.html
>
> It would be great to start coming up with potential project ideas, both
> larger
> pieces of work suitable for GSoC but also smaller tasks for other project
> students, or 'low hanging fruit' for potential contributors to cut
> their teeth on.
>
> See also http://biopython.org/wiki/Active_projects and the ideas list
> there.
>
> Regards,
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>

From sefakilic at gmail.com  Tue Feb 12 18:18:17 2013
From: sefakilic at gmail.com (=?UTF-8?B?U2VmYSBLxLFsxLHDpw==?=)
Date: Tue, 12 Feb 2013 18:18:17 -0500
Subject: [Biopython-dev] Fwd: Fast instance search of motif in a sequence
In-Reply-To: <CAMfh4tc8vsC258o5Zc1WkgOPDeAf23ohFVmrpouCON=kB=b=SA@mail.gmail.com>
References: <CAMfh4tc8vsC258o5Zc1WkgOPDeAf23ohFVmrpouCON=kB=b=SA@mail.gmail.com>
Message-ID: <CAMfh4tdAaH01z-DyHEXaHjZy92Q8jAffje5_e4DBNS7WFibWFA@mail.gmail.com>

Hi all,

I am working on comparative genomics and I frequently use Motif module of
Biopython. One of the most frequent operations that I do is to build a
motif out of sites and search a sequence to find instances that are similar
to the motif [Bio.Motif._Motif.search_instances()].

The problem is that the sequence in which instances are searched is huge.
Mostly it is the genome sequence itself, with its reverse complement. For
example, scanning the E. coli genome plus its reverse complement with a motif
of length ~20 takes almost a minute on my machine.

To make it faster, I implemented a C version of it with a Python interface
so that you can call it from Python. It is pretty fast: it takes about 2.5
seconds.

Current implementation can be found at:

https://github.com/sefakilic/yassi

If anyone is interested and it is appropriate, I would like to modify the
current implementation and integrate it into Biopython.

Thanks!

Sefa Kilic

From mjldehoon at yahoo.com  Tue Feb 12 21:06:33 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 12 Feb 2013 18:06:33 -0800 (PST)
Subject: [Biopython-dev] Fwd: Fast instance search of motif in a sequence
In-Reply-To: <CAMfh4tdAaH01z-DyHEXaHjZy92Q8jAffje5_e4DBNS7WFibWFA@mail.gmail.com>
Message-ID: <1360721193.98373.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Hi Sefa,

Bio.Motif._Motif.search_instances() searches for exact instances of a motif, but it looks like your code searches for motifs based on its PSSM score. Then, isn't it the same as the current code in Bio/Motif/_pwm.c (or Bio/motifs/_pwm.c)?
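
To illustrate the distinction, here is a small pure-Python sketch of the two
kinds of search. It does not use Bio.Motif or the _pwm.c code at all; the
function names, the pseudocount and the uniform background of 0.25 are
assumptions chosen only for this example.

```python
import math


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Log-odds (base 2) score per position and base, from aligned sites."""
    pssm = []
    for i in range(len(sites[0])):
        column = {}
        for base in "ACGT":
            count = sum(1 for site in sites if site[i] == base)
            freq = (count + pseudocount) / (len(sites) + 4 * pseudocount)
            column[base] = math.log(freq / background, 2)
        pssm.append(column)
    return pssm


def search_exact(sites, sequence):
    """Positions where the sequence is identical to one of the sites."""
    length = len(sites[0])
    wanted = set(sites)
    return [i for i in range(len(sequence) - length + 1)
            if sequence[i:i + length] in wanted]


def search_pssm(pssm, sequence, threshold):
    """(position, score) pairs for windows scoring at least threshold."""
    length = len(pssm)
    hits = []
    for i in range(len(sequence) - length + 1):
        score = sum(pssm[j][sequence[i + j]] for j in range(length))
        if score >= threshold:
            hits.append((i, score))
    return hits


sites = ["TACAA", "TACGC", "TACAC"]
sequence = "GGTACAAGGTACACTTTACGA"
exact_hits = search_exact(sites, sequence)
pssm_hits = search_pssm(build_pssm(sites), sequence, threshold=3.5)
```

In this toy example the exact search finds only the two windows that are
literally among the sites, while the score-based search also reports the
similar window TACGA near the end of the sequence.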

Best,
-Michiel.

--- On Tue, 2/12/13, Sefa Kılıç <sefakilic at gmail.com> wrote:

> From: Sefa Kılıç <sefakilic at gmail.com>
> Subject: [Biopython-dev] Fwd: Fast instance search of motif in a sequence
> To: biopython-dev at biopython.org
> Date: Tuesday, February 12, 2013, 6:18 PM
> Hi all,
> 
> I am working on comparative genomics and I frequently use
> Motif module of
> Biopython. One of the most frequent operations that I do is
> to build a
> motif out of sites and search a sequence to find instances
> that are similar
> to the motif [Bio.Motif._Motif.search_instances()].
> 
> The problem is that the sequence that instances are searched
> is huge.
> Mostly it is the genome sequence itself, with its reverse
> complement. For
> example, scanning the E.coli genome + its reverse complement
> with a motif
> of length ~20 takes almost a minute in my machine.
> 
> To make it faster, I implemented a C version of it and a
> Python interface
> so that you can call it from Python. It is pretty fast, it
> takes about ~2.5
> seconds.
> 
> Current implementation can be found at:
> 
> https://github.com/sefakilic/yassi
> 
> If anyone is interested and it is appropriate, I would like
> to modify the
> current implementation and integrate it into Biopython.
> 
> Thanks!
> 
> Sefa Kilic


From mjldehoon at yahoo.com  Tue Feb 12 21:08:26 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 12 Feb 2013 18:08:26 -0800 (PST)
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
Message-ID: <1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>

It would be great to have better support for microarray analysis in Biopython. Something like lumi/limma in R. Perhaps this is an option for the GSoC?

Best,
-Michiel.

--- On Tue, 2/12/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> From: Peter Cock <p.j.a.cock at googlemail.com>
> Subject: [Biopython-dev] Project ideas for GSoC (or other student projects)
> To: "Biopython-Dev Mailing List" <biopython-dev at biopython.org>
> Date: Tuesday, February 12, 2013, 12:51 PM
> Hello all,
> 
> Google recently confirmed they will be running Google Summer
> of Code 2013,
> and we (Biopython and the other Bio* projects) would hope to
> be accepted again
> under the Open Bioinformatics Foundation as in previous
> years:
> http://lists.open-bio.org/pipermail/gsoc/2013/000196.html
> 
> It would be great to start coming up with potential project
> ideas, both larger
> pieces of work suitable for GSoC but also smaller tasks for
> other project
> students, or 'low hanging fruit' for potential contributors
> to cut
> their teeth on.
> 
> See also http://biopython.org/wiki/Active_projects
> and the ideas list there.
> 
> Regards,
> 
> Peter

From sefakilic at gmail.com  Tue Feb 12 21:40:12 2013
From: sefakilic at gmail.com (=?UTF-8?B?U2VmYSBLxLFsxLHDpw==?=)
Date: Tue, 12 Feb 2013 21:40:12 -0500
Subject: [Biopython-dev] Fwd: Fast instance search of motif in a sequence
In-Reply-To: <1360721193.98373.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <CAMfh4tdAaH01z-DyHEXaHjZy92Q8jAffje5_e4DBNS7WFibWFA@mail.gmail.com>
	<1360721193.98373.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <CAMfh4teE2cK7HJP3KOVSq12erWmgMbd6nSkHvkD22YM4eL88Yg@mail.gmail.com>

Hi Michiel,

Thanks for the reply. It seems that _pwm.c does the same thing, as you
said. I missed that part of the code. However, it seems that it is not
mentioned in the tutorial and it might be useful to mention it there.

Anyway, re-implementing it was good practice. Thank you!

Sefa Kilic


On Tue, Feb 12, 2013 at 9:06 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> Hi Sefa,
>
> Bio.Motif._Motif.search_instances() searches for exact instances of a
> motif, but it looks like your code searches for motifs based on its PSSM
> score. Then, isn't it the same as the current code in Bio/Motif/_pwm.c (or
> Bio/motifs/_pwm.c)?
>
> Best,
> -Michiel.
>
> --- On Tue, 2/12/13, Sefa Kılıç <sefakilic at gmail.com> wrote:
>
> > From: Sefa Kılıç <sefakilic at gmail.com>
> > Subject: [Biopython-dev] Fwd: Fast instance search of motif in a sequence
> > To: biopython-dev at biopython.org
> > Date: Tuesday, February 12, 2013, 6:18 PM
> > Hi all,
> >
> > I am working on comparative genomics and I frequently use
> > Motif module of
> > Biopython. One of the most frequent operations that I do is
> > to build a
> > motif out of sites and search a sequence to find instances
> > that are similar
> > to the motif [Bio.Motif._Motif.search_instances()].
> >
> > The problem is that the sequence that instances are searched
> > is huge.
> > Mostly it is the genome sequence itself, with its reverse
> > complement. For
> > example, scanning the E.coli genome + its reverse complement
> > with a motif
> > of length ~20 takes almost a minute in my machine.
> >
> > To make it faster, I implemented a C version of it and a
> > Python interface
> > so that you can call it from Python. It is pretty fast, it
> > takes about ~2.5
> > seconds.
> >
> > Current implementation can be found at:
> >
> > https://github.com/sefakilic/yassi
> >
> > If anyone is interested and it is appropriate, I would like
> > to modify the
> > current implementation and integrate it into Biopython.
> >
> > Thanks!
> >
> > Sefa Kilic
>


From saketkc at gmail.com  Thu Feb 14 11:02:21 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Thu, 14 Feb 2013 21:32:21 +0530
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
Message-ID: <CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>

There's one more issue that I have run into. Consider the following command,
whose output is written by redirecting it to a file called aln_sa.sai:


bwa aln database.fasta short_read.fastq > aln_sa.sai

Now if we look at the _call method
here<https://github.com/saketkc/biopython/blob/master/Bio/Application/__init__.py#L372>,
it takes a boolean as its input for stdout. So should I modify this so that
it can take 'stdout' as an opened file instance, which I can pass when
invoking my BwaAlnCommandLine object as follows:

a=BwaAlnCommandLine()
b=a(stdout=open("aln_sa.sai","wb"))


On 12 February 2013 23:08, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Tue, Feb 12, 2013 at 9:51 AM, Saket Choudhary <saketkc at gmail.com>
> wrote:
> > Hi,
> >
> > I am writing a bwa wrapper for bio-python. I have infact got the "index"
> >  option working. However I have a concern:
> >
> > bwa has these options :
> >
> > bwa index -a bwtsw database.fasta
> >
> > bwa aln database.fasta short_read.fastq > aln_sa.sai
> >
> > bwa samse database.fasta aln_sa.sai short_read.fastq > aln.sam
> >
> > bwa sampe database.fasta aln_sa1.sai aln_sa2.sai read1.fq read2.fq >
> aln.sam
> >
> > bwa bwasw database.fasta long_read.fastq > aln.sam
> >
> >
> > If you read the documentation here<
> http://bio-bwa.sourceforge.net/bwa.shtml>,
> > you will see that  "-r" is an option with "aln" command as well as the
> > "samse" command. In the former it is of type INT and in the latter of
> type
> > STR. Now I am not sure how can this be taken care of in the wrapper,
> > because I also plan to implement a checker_function.  One way is to make
> a
> > new class, lets say BwaAlignCommand which will take care of all options
> > inside the "aln" command and separately implement another class say
> > "BwaSamseCommand", and implement all the options of the "samse" command.
> > But I am not sure if that is indeed the correct way of addressing the
> > problem.
> >
> >
> > Any pointers on this issue ?
>
> I would treat "bwa samse", "bwa sampe", "bwa ..." as separate tools and
> write a wrapper class for each of them. This would probably fit under the
> Bio.Sequencing.Applications namespace.
>
> Peter
>

From p.j.a.cock at googlemail.com  Thu Feb 14 11:19:59 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 14 Feb 2013 16:19:59 +0000
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
Message-ID: <CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>

On Thu, Feb 14, 2013 at 4:02 PM, Saket Choudhary <saketkc at gmail.com> wrote:
> Theres one more issue that I have run into . Consider the following command
> , the outout generated is written by piping it to a file called aln_sa.sai,
>
> bwa aln database.fasta short_read.fastq > aln_sa.sai
>
> Now if we look into the _call method here , it takes as its inout a boolean
> for stdout. So should I modify this so that it can take 'stdout' as on
> opened file  instance which I can invoke while unvoking my BwaAlnCommandLine
> functions as follwos:
>
> a=BwaAlnCommandLine()
> b=a(stdout=open("aln_sa.sai","wb"))

Is that possible?

For complex use of subprocess and pipes, we've previously recommended that
the user handle this explicitly themselves: just use str() on the command
line wrapper object to get 'bwa aln database.fasta short_read.fastq' in this
case. There are some examples in the Tutorial with (multiple sequence)
alignment tools.
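
Something along these lines, as a rough sketch using only the standard
library. The helper name run_to_file is made up for this example, and the
command string below simply stands in for what str() on the wrapper object
would return.

```python
import shlex
import subprocess


def run_to_file(cmd_args, output_path):
    """Run a command with stdout sent to output_path; return the exit code."""
    with open(output_path, "wb") as handle:
        return subprocess.call(cmd_args, stdout=handle)


# Stand-in for str(wrapper_object); split it into an argument list so that
# no shell is needed for the redirection.
cmd = "bwa aln database.fasta short_read.fastq"
args = shlex.split(cmd)

# With bwa installed, this would be the equivalent of "... > aln_sa.sai":
# return_code = run_to_file(args, "aln_sa.sai")
```

This keeps the redirection in the caller's hands, so the wrapper itself does
not need to know about output files.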

Peter

From saketkc at gmail.com  Thu Feb 14 12:04:04 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Thu, 14 Feb 2013 22:34:04 +0530
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
Message-ID: <CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>

I was thinking of adding one more parameter to the _call function, let's say
'stdout_to_filepath'.
If this is set, then I add one more if condition
here<https://github.com/saketkc/biopython/blob/master/Bio/Application/__init__.py#L415>
to set the stdout as


stdout_arg = open(stdout_to_filepath, "w")

I tried it and it did work, but I am not sure whether this fits the coding
standards and can be incorporated in the Biopython codebase?

Thanks

Saket


From saketkc at gmail.com  Thu Feb 14 13:52:31 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Fri, 15 Feb 2013 00:22:31 +0530
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
Message-ID: <CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>

In short, am I allowed to play with this extra parameter thing as per
the code standards of Biopython?


From arklenna at gmail.com  Thu Feb 14 14:43:18 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Thu, 14 Feb 2013 14:43:18 -0500
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
	<CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
Message-ID: <CALfq9tKJPk0bYuUnKUbW70Xhnu1HY8r2o80RpTeKAAq1vZh7yg@mail.gmail.com>

>
> On 14 February 2013 22:34, Saket Choudhary <saketkc at gmail.com> wrote:
> > I was thinking of adding one more parameter to the _call function,
> > let's say 'stdout_to_filepath'.
> > If this is set, then I add one more if condition here to set the
> > stdout as
> >
> > stdout_arg = open(stdout_to_filepath, "w")
> >


What's wrong with accepting the stdout string that the current
implementation provides and explicitly writing it to your file?

Cheers,

Lenna

From arklenna at gmail.com  Thu Feb 14 14:52:54 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Thu, 14 Feb 2013 14:52:54 -0500
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <1360468537.28338.YahooMailClassic@web164004.mail.gq1.yahoo.com>
References: <CALfq9tL7caRXcM1BmW-ksphdDP6E=J0bYE-x-rmYVbHtV+CCsA@mail.gmail.com>
	<1360468537.28338.YahooMailClassic@web164004.mail.gq1.yahoo.com>
Message-ID: <CALfq9tKCWrQQRAB60ZAfP0Ld7jov9CB1FXT8G1x2gBg8WWDYfA@mail.gmail.com>

On Sat, Feb 9, 2013 at 10:55 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

>
>
> Currently lex.yy.c contains lots of code that is generated automatically
> by lex but is not actually needed for the mmCIF parser. I was thinking to
> remove those parts, and to clean up the remainder so that the code is
> understandable (allowing us to fix any bugs, or to convert it to pure
> Python).
>

Whoops, failed to reply all. Sorry for the double email, Michiel.

---

But generated C is by definition not understandable or debuggable. The only
function of lex.yy.c is to tokenize the mmCIF input.

All of the communication to Python is handled by MMCIFlexmodule.c, which is
70 lines and a header with 3 statements. In parallel to the PLY version, I
rewrote the C to be object-oriented, which pushed it to 101 lines.

Cheers,

Lenna

From p.j.a.cock at googlemail.com  Thu Feb 14 15:33:37 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 14 Feb 2013 20:33:37 +0000
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CALfq9tKJPk0bYuUnKUbW70Xhnu1HY8r2o80RpTeKAAq1vZh7yg@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
	<CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
	<CALfq9tKJPk0bYuUnKUbW70Xhnu1HY8r2o80RpTeKAAq1vZh7yg@mail.gmail.com>
Message-ID: <CAKVJ-_6q3fHd4o-M1LKLxwr7Ab-6w5LW_f5=fbcCLGbUdw6_MQ@mail.gmail.com>

On Thu, Feb 14, 2013 at 7:43 PM, Lenna Peterson <arklenna at gmail.com> wrote:
>
> What's wrong with accepting the stdout string that the current
> implementation provides and explicitly writing it to your file?
>

That is only a good idea for short output, say up to a few kb.

With bwa (and samtools etc), quite often the output defaults
to (or only goes to) stdout - and can be very large. It can also
be binary rather than text, which is an additional complication
with Python 2 vs Python 3 (byte strings versus unicode strings).

See http://bio-bwa.sourceforge.net/bwa.shtml
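
To illustrate the size and bytes-versus-text concern, here is a sketch that
copies a child's stdout to disk in fixed-size binary chunks, so only one
chunk is ever held in memory. It assumes Python 3; the function name
stream_stdout and the 64 KiB default are illustrative choices, not anything
in Biopython:

```python
import subprocess
import sys


def stream_stdout(args, out_path, chunk_size=64 * 1024):
    """Copy a child's stdout to out_path in fixed-size binary chunks.

    Returns the number of bytes written; raises RuntimeError if the
    child exits with a non-zero status.
    """
    total = 0
    proc = subprocess.Popen(args, stdout=subprocess.PIPE)
    with open(out_path, "wb") as handle:
        while True:
            chunk = proc.stdout.read(chunk_size)  # bytes, never decoded
            if not chunk:
                break
            handle.write(chunk)
            total += len(chunk)
    proc.stdout.close()
    if proc.wait() != 0:
        raise RuntimeError("command failed with exit code %d" % proc.returncode)
    return total
```

Because the data stays as bytes from pipe to file, the same code should
behave the same way for text and binary output.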

Peter

From p.j.a.cock at googlemail.com  Thu Feb 14 15:38:59 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 14 Feb 2013 20:38:59 +0000
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
	<CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
Message-ID: <CAKVJ-_7MRu5tSSZ+S0dsOzpc=ojyC0oOxmg4e+A1Or5phfTz8w@mail.gmail.com>

On Thu, Feb 14, 2013 at 6:52 PM, Saket Choudhary <saketkc at gmail.com> wrote:
> In short, am I allowed to play with this extra parameter thing as per
> the code standards of Biopython?

If you can come up with a nice extension to the current interface
for the application wrapper's __call__ method, which is backward
compatible, then we could be convinced.

One idea would be stdout=True and stderr=True are treated as
subprocess.PIPE (as now), and a false value would continue
to mean don't capture the output (send it to /dev/null), but a
(non-empty) string argument could be interpreted as a filename
instead. You might be able to accept a handle, but I'm not sure
if all Python handles would work or not here - it requires some
careful cross platform testing.
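
The idea above could be sketched roughly as follows. This is only an
illustration: resolve_stream is a hypothetical helper, not existing
Bio.Application code, and it assumes Python 3.3+ for subprocess.DEVNULL.
Handle arguments are deliberately rejected here, since that case needs the
cross-platform testing mentioned above:

```python
import subprocess


def resolve_stream(arg):
    """Map a stdout/stderr argument to (subprocess target, handle to close).

    True             -> capture via a pipe (the current behaviour)
    non-empty string -> treated as a filename, opened for binary writing
    false value      -> output is discarded
    """
    if arg is True:
        return subprocess.PIPE, None
    if isinstance(arg, str) and arg:
        handle = open(arg, "wb")
        return handle, handle  # caller must close this after the run
    if not arg:
        return subprocess.DEVNULL, None
    raise TypeError("expected a bool or a filename, got %r" % (arg,))
```

Existing calls that pass True or False would keep their meaning, which is
the backward compatibility requirement.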

Peter

From mjldehoon at yahoo.com  Fri Feb 15 21:46:00 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 15 Feb 2013 18:46:00 -0800 (PST)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CALfq9tKCWrQQRAB60ZAfP0Ld7jov9CB1FXT8G1x2gBg8WWDYfA@mail.gmail.com>
Message-ID: <1360982760.4805.YahooMailClassic@web164004.mail.gq1.yahoo.com>

Hi Lenna,

Maybe we are confusing each other..
I am looking for a solution that (a) doesn't introduce new dependencies, (b) is pure-Python so it can run on Jython, and (c) if that is not possible and we do need to use C, then that C code should be understandable so that it can be debugged if necessary.

I was suggesting to clean up lex.yy.c so that we can at least achieve (c). The alternative is to start from the PLY-based parser and remove the dependency on PLY.

Best,
-Michiel.


From saketkc at gmail.com  Sat Feb 16 02:08:46 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Sat, 16 Feb 2013 12:38:46 +0530
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAKVJ-_7MRu5tSSZ+S0dsOzpc=ojyC0oOxmg4e+A1Or5phfTz8w@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
	<CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
	<CAKVJ-_7MRu5tSSZ+S0dsOzpc=ojyC0oOxmg4e+A1Or5phfTz8w@mail.gmail.com>
Message-ID: <CAEDHeivazvXaaCo2ZFP8sR3BOfv5tgEj6Bf7f_UZ6g_x-v9RZQ@mail.gmail.com>

On 15 February 2013 02:08, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Feb 14, 2013 at 6:52 PM, Saket Choudhary <saketkc at gmail.com> wrote:
>> In short, am I allowed to play with this extra parameter thing as per
>> the code standards of Biopython?
>
> If you can come up with a nice extension to the current interface
> for the application wrapper's __call__ method, which is backward
> compatible, then we could be convinced.
>
> One idea would be stdout=True and stderr=True are treated as
> subprocess.PIPE (as now), and a false value would continue
> to mean don't capture the output (send it to /dev/null), but a
> (non-empty) string argument could be interpreted as a filename
> instead. You might be able to accept a handle, but I'm not sure
> if all Python handles would work or not here - it requires some
> careful cross platform testing.
>
> Peter


Hi everyone,

I have pushed the wrapper to
https://github.com/saketkc/biopython/tree/bwa_wrapper

Should I send a pull request? I am in the middle of my university
mid-semester examinations, so this is not completely tested. I
need to perform some more tests with more parameters after I am done
with my examinations next week.


I would like to hear comments or have it code-reviewed, since this is
the first time I am contributing to Biopython and I might have missed
some of the coding practices being followed.

Thanks

Saket

From p.j.a.cock at googlemail.com  Sat Feb 16 05:42:50 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 16 Feb 2013 10:42:50 +0000
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <1360982760.4805.YahooMailClassic@web164004.mail.gq1.yahoo.com>
References: <CALfq9tKCWrQQRAB60ZAfP0Ld7jov9CB1FXT8G1x2gBg8WWDYfA@mail.gmail.com>
	<1360982760.4805.YahooMailClassic@web164004.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_7Py6yQm=MGnjoa+cE596a7z48HXpukg3XWzkRPjsF-Bg@mail.gmail.com>

On Sat, Feb 16, 2013 at 2:46 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi Lenna,
>
> Maybe we are confusing each other..
> I am looking for a solution that (a) doesn't introduce new dependencies,

+1

> (b) is pure-Python so it can run on Jython,

+1 And on PyPy (which to me is more interesting than Jython) etc.

> and (c) if that is not possible and we do need to use C, then that C code
> should be understandable so that it can be debugged if necessary.
>
> I was suggesting to clean up lex.yy.c so that we can at least achieve (c).

This does mean we essentially give up on ever regenerating the lex.yy.c
file again - could that be a problem if Flex itself changes much?

> The alternative is to start from the PLY-based parser and remove the
> dependency on PLY.
>
> Best,
> -Michiel.

Peter

From saketkc at gmail.com  Sat Feb 16 06:48:43 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Sat, 16 Feb 2013 17:18:43 +0530
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAEDHeivazvXaaCo2ZFP8sR3BOfv5tgEj6Bf7f_UZ6g_x-v9RZQ@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
	<CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
	<CAKVJ-_7MRu5tSSZ+S0dsOzpc=ojyC0oOxmg4e+A1Or5phfTz8w@mail.gmail.com>
	<CAEDHeivazvXaaCo2ZFP8sR3BOfv5tgEj6Bf7f_UZ6g_x-v9RZQ@mail.gmail.com>
Message-ID: <CAEDHeisW+ymrW-Bb5XcgAm1mSPbDWMY7evhvyspVFxzTptW2Pg@mail.gmail.com>

Oops. Apparently I had forgotten to 'git add' the _bwa.py file. Committed now:

https://github.com/saketkc/biopython/commit/062aabf8f31a522929957f4bcd3f7a932f3bdf23


From mjldehoon at yahoo.com  Sat Feb 16 07:09:22 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 16 Feb 2013 04:09:22 -0800 (PST)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CAKVJ-_7Py6yQm=MGnjoa+cE596a7z48HXpukg3XWzkRPjsF-Bg@mail.gmail.com>
Message-ID: <1361016562.57361.YahooMailClassic@web164001.mail.gq1.yahoo.com>

--- On Sat, 2/16/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> This does mean we essentially give up on ever regenerating
> the lex.yy.c file again - could that be a problem if Flex
> itself changes much?

The lex.yy.c file was generated by Flex, but otherwise it's independent of it. It doesn't #include Flex's header files, and we don't link it to the Flex libraries. So we can do with it whatever we want.

We may find though that a stripped-down version of lex.yy.c will be rather trivial, and converting it to Python may be straightforward.
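
As a rough illustration of how small a pure-Python tokenizer could be, here
is a sketch. It is not derived from lex.yy.c or from the PLY-based parser;
the name tokenize_line and the regular expression are assumptions made for
this example, and multi-line semicolon-delimited text fields are
deliberately left out:

```python
import re

# One token per match: a comment, a quoted string, or a bare word.
# A closing quote only counts if followed by whitespace or end of line.
_TOKEN = re.compile(r"""
    \s*(?:
        (?P<comment>\#.*)
      | '(?P<squote>.*?)'(?=\s|$)
      | "(?P<dquote>.*?)"(?=\s|$)
      | (?P<bare>\S+)
    )""", re.VERBOSE)


def tokenize_line(line):
    """Split one line of mmCIF-like text into a list of token strings."""
    tokens = []
    line = line.rstrip()
    pos = 0
    while pos < len(line):
        match = _TOKEN.match(line, pos)
        if match is None:
            break
        pos = match.end()
        if match.group("comment") is not None:
            break  # the rest of the line is a comment
        for kind in ("squote", "dquote", "bare"):
            if match.group(kind) is not None:
                tokens.append(match.group(kind))
                break
    return tokens
```

For example, tokenize_line("_atom_site.id 'a b' x # note") gives the
three tokens _atom_site.id, a b and x.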

Best,
-Michiel.

From tiagoantao at gmail.com  Mon Feb 18 08:57:15 2013
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 18 Feb 2013 13:57:15 +0000
Subject: [Biopython-dev] Support for BioSQL on Java/Jython
Message-ID: <CAA9RGEOfOfWRihvJmEn9heK3g3UHdF605sQpea9NKF1AFKwqwQ@mail.gmail.com>

Dear All,

I have implemented a set of changes to allow for BioSQL support in Jython.
Features:

1. Totally transparent in terms of API. Indeed, the existing BioSQL tests
work out of the box.

2. Support for MySQL and PostgreSQL.

3. No sqlite3 support. This library (standard in CPython) does not exist
in Jython.
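
For anyone wanting to guard their own code against the missing sqlite3
module, a small sketch follows. The two function names are hypothetical
and are not taken from the commits mentioned below; the sketch relies
only on the standard platform module and an import check:

```python
import platform


def running_on_jython():
    """Return True when the interpreter reports itself as Jython."""
    return platform.python_implementation() == "Jython"


def sqlite3_available():
    """Return True if the sqlite3 module can be imported here."""
    try:
        import sqlite3  # noqa: F401  (only testing that the import works)
    except ImportError:
        return False
    return True
```

Checking for the module directly, rather than for the interpreter name,
also covers CPython builds that were compiled without SQLite.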


You can find the changes here:
https://github.com/tiagoantao/biopython/commits/master
(top two commits)

Comments appreciated. If there is no opposition, I will commit these soon
(after incorporating feedback) to the main repo.

-- 
"Liberty for wolves is death to the lambs" - Isaiah Berlin

From p.j.a.cock at googlemail.com  Mon Feb 18 12:44:30 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 18 Feb 2013 17:44:30 +0000
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAEDHeisW+ymrW-Bb5XcgAm1mSPbDWMY7evhvyspVFxzTptW2Pg@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
	<CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
	<CAKVJ-_7MRu5tSSZ+S0dsOzpc=ojyC0oOxmg4e+A1Or5phfTz8w@mail.gmail.com>
	<CAEDHeivazvXaaCo2ZFP8sR3BOfv5tgEj6Bf7f_UZ6g_x-v9RZQ@mail.gmail.com>
	<CAEDHeisW+ymrW-Bb5XcgAm1mSPbDWMY7evhvyspVFxzTptW2Pg@mail.gmail.com>
Message-ID: <CAKVJ-_5vn3p_iiOYTpnMs6W8qF-3DFwgzO2-HqL6=W-mHByaTA@mail.gmail.com>

> On 16 February 2013 12:38, Saket Choudhary <saketkc at gmail.com> wrote:
>> HI Everyone,
>>
>> I have pushed the wrapper to
>> https://github.com/saketkc/biopython/tree/bwa_wrapper
>>
>> Should I send a pull request ? I am in the middle of my University
>> mid-semester examinations and hence this is not completely tested. I
>> need to perform some more tests with more parameters after I am done
>> with my examinations the next week.
>>
>>
>> I would like to hear comments or have it code-reviewed, since this is
>> the first time I am contributing to biopython and I might have missed
>> out on some of the coding practices being followed.
>>
>> Thanks
>>
>> Saket


On Sat, Feb 16, 2013 at 11:48 AM, Saket Choudhary <saketkc at gmail.com> wrote:
> Oops. Apparently I had forgotten to 'git add' the _bwa.py . Committed now :
>
> https://github.com/saketkc/biopython/commit/062aabf8f31a522929957f4bcd3f7a932f3bdf23
>

This looks sensible. I think if we are going to extend the __call__ interface
to allow stdout to be a filename, then we should do the same for stderr
as well. Also this needs to be explained in the docstring (and perhaps
also the Tutorial somewhere).

Separately some simple unit tests for the wrapper would be good too
(which can be as much work as the original code itself), and would
be beneficial for cross-platform testing.

Thanks,

Peter

From tiagoantao at gmail.com  Tue Feb 19 06:42:22 2013
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 19 Feb 2013 11:42:22 +0000
Subject: [Biopython-dev] Jython (non-existing) docs
Message-ID: <CAA9RGEM3PE_CfEONjr-F_SUe4t+J1oEo2fG-2dnGFDRNma7pjg@mail.gmail.com>

Hi,

I had a cursory look at the documentation for installing Biopython under
Jython and there seems to be none. If it is OK, I would like to extend the
documentation to cover Jython.

-- 
"Liberty for wolves is death to the lambs" - Isaiah Berlin

From p.j.a.cock at googlemail.com  Tue Feb 19 07:01:15 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 19 Feb 2013 12:01:15 +0000
Subject: [Biopython-dev] Jython (non-existing) docs
In-Reply-To: <CAA9RGEM3PE_CfEONjr-F_SUe4t+J1oEo2fG-2dnGFDRNma7pjg@mail.gmail.com>
References: <CAA9RGEM3PE_CfEONjr-F_SUe4t+J1oEo2fG-2dnGFDRNma7pjg@mail.gmail.com>
Message-ID: <CAKVJ-_5dTKO9=ZXbFWtKeUN1c7Dpr1wveLBY1Uh=sOJtbfOQ_w@mail.gmail.com>

On Tue, Feb 19, 2013 at 11:42 AM, Tiago Antão <tiagoantao at gmail.com> wrote:
> Hi,
>
> I had a cursory look at the documentation for installing Biopython under
> Jython and there seems to be none. If it is OK, I would extend the
> documentation to cover Jython

I just use the usual mantra:

jython setup.py build
jython setup.py test
jython setup.py install

Perhaps there are pitfalls I'm not aware of?

(Updating Doc/install/Installation.tex is still a good idea though)

Peter


From tiagoantao at gmail.com  Tue Feb 19 07:02:18 2013
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 19 Feb 2013 12:02:18 +0000
Subject: [Biopython-dev] Jython (non-existing) docs
In-Reply-To: <CAKVJ-_5dTKO9=ZXbFWtKeUN1c7Dpr1wveLBY1Uh=sOJtbfOQ_w@mail.gmail.com>
References: <CAA9RGEM3PE_CfEONjr-F_SUe4t+J1oEo2fG-2dnGFDRNma7pjg@mail.gmail.com>
	<CAKVJ-_5dTKO9=ZXbFWtKeUN1c7Dpr1wveLBY1Uh=sOJtbfOQ_w@mail.gmail.com>
Message-ID: <CAA9RGENVf=5yTr-k60iq-5L2EbYaVdwSjB6=Lky6cTPbtZK6wA@mail.gmail.com>

On Tue, Feb 19, 2013 at 12:01 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Perhaps there are pitfalls I'm not aware of?
>
>
JDBC driver for the new BioSQL code ;)

Tiago


-- 
"Liberty for wolves is death to the lambs" - Isaiah Berlin

From p.j.a.cock at googlemail.com  Tue Feb 19 07:07:39 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 19 Feb 2013 12:07:39 +0000
Subject: [Biopython-dev] Jython (non-existing) docs
In-Reply-To: <CAA9RGENVf=5yTr-k60iq-5L2EbYaVdwSjB6=Lky6cTPbtZK6wA@mail.gmail.com>
References: <CAA9RGEM3PE_CfEONjr-F_SUe4t+J1oEo2fG-2dnGFDRNma7pjg@mail.gmail.com>
	<CAKVJ-_5dTKO9=ZXbFWtKeUN1c7Dpr1wveLBY1Uh=sOJtbfOQ_w@mail.gmail.com>
	<CAA9RGENVf=5yTr-k60iq-5L2EbYaVdwSjB6=Lky6cTPbtZK6wA@mail.gmail.com>
Message-ID: <CAKVJ-_6grdoyjBJFi-R6pt69M2A8uduZM8gNMPgDCP=2N3wnSQ@mail.gmail.com>

On Tue, Feb 19, 2013 at 12:02 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
>
> On Tue, Feb 19, 2013 at 12:01 PM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>>
>> Perhaps there are pitfalls I'm not aware of?
>>
>
> JDBC driver for the new BioSQL code ;)
>
> Tiago

Good answer :)

Yes, advice on installing optional dependencies like that makes sense.

Peter


From saketkc at gmail.com  Tue Feb 19 08:15:56 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Tue, 19 Feb 2013 18:45:56 +0530
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAKVJ-_5vn3p_iiOYTpnMs6W8qF-3DFwgzO2-HqL6=W-mHByaTA@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
	<CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
	<CAKVJ-_7MRu5tSSZ+S0dsOzpc=ojyC0oOxmg4e+A1Or5phfTz8w@mail.gmail.com>
	<CAEDHeivazvXaaCo2ZFP8sR3BOfv5tgEj6Bf7f_UZ6g_x-v9RZQ@mail.gmail.com>
	<CAEDHeisW+ymrW-Bb5XcgAm1mSPbDWMY7evhvyspVFxzTptW2Pg@mail.gmail.com>
	<CAKVJ-_5vn3p_iiOYTpnMs6W8qF-3DFwgzO2-HqL6=W-mHByaTA@mail.gmail.com>
Message-ID: <CAEDHeitQA0JQ=vuRKEohgRwkRxLqs+NuV9dAYguhrc1-eYnCsQ@mail.gmail.com>

On 18 February 2013 23:14, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> > On 16 February 2013 12:38, Saket Choudhary <saketkc at gmail.com> wrote:
> >> HI Everyone,
> >>
> >> I have pushed the wrapper to
> >> https://github.com/saketkc/biopython/tree/bwa_wrapper
> >>
> >> Should I send a pull request ? I am in the middle of my University
> >> mid-semester examinations and hence this is not completely tested. I
> >> need to perform some more tests with more parameters after I am done
> >> with my examinations the next week.
> >>
> >>
> >> I would like to hear comments or have it code-reviewed, since this is
> >> the first time I am contributing to biopython and I might have missed
> >> out on some of the coding practices being followed.
> >>
> >> Thanks
> >>
> >> Saket
>
>
> On Sat, Feb 16, 2013 at 11:48 AM, Saket Choudhary <saketkc at gmail.com>
> wrote:
> > Oops. Apparently I had forgotten to 'git add' the _bwa.py . Committed
> now :
> >
> >
> https://github.com/saketkc/biopython/commit/062aabf8f31a522929957f4bcd3f7a932f3bdf23
> >
>
> This looks sensible. I think if we are going to extend the __call__
> interface
> to allow stdout to be a filename, then we should do the same for stderr
> as well. Also this needs to be explained in the docstring (and perhaps
> also the Tutorial somewhere).
>
> Separately some simple unit tests for the wrapper would be good too
> (which can be as much work as the original code itself), and would
> be beneficial for cross-platform testing.
>
> Thanks,
>
> Peter
>

Thanks Peter.

I will add that. Any pointers to a good reference test_aba.py file in the
Tests/ directory for writing unit tests for this?

I have worked on BDD before but unit tests are new to me, so it may take
some time. I plan to finish it in the coming week, once my university
examinations are done.

Thanks

Saket

From p.j.a.cock at googlemail.com  Tue Feb 19 09:25:40 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 19 Feb 2013 14:25:40 +0000
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAEDHeitQA0JQ=vuRKEohgRwkRxLqs+NuV9dAYguhrc1-eYnCsQ@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
	<CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
	<CAKVJ-_7MRu5tSSZ+S0dsOzpc=ojyC0oOxmg4e+A1Or5phfTz8w@mail.gmail.com>
	<CAEDHeivazvXaaCo2ZFP8sR3BOfv5tgEj6Bf7f_UZ6g_x-v9RZQ@mail.gmail.com>
	<CAEDHeisW+ymrW-Bb5XcgAm1mSPbDWMY7evhvyspVFxzTptW2Pg@mail.gmail.com>
	<CAKVJ-_5vn3p_iiOYTpnMs6W8qF-3DFwgzO2-HqL6=W-mHByaTA@mail.gmail.com>
	<CAEDHeitQA0JQ=vuRKEohgRwkRxLqs+NuV9dAYguhrc1-eYnCsQ@mail.gmail.com>
Message-ID: <CAKVJ-_40jy-NS5csf55Q_ds-DdGRpaH18rZxZXdcypNEijpcJA@mail.gmail.com>

On Tue, Feb 19, 2013 at 1:15 PM, Saket Choudhary <saketkc at gmail.com> wrote:
>
> Thanks Peter.
>
> I will add that. Any pointers to what would be a good reference test_aba.py
> file in Tests/ directory for writing unit tests for this ?
>
> I have worked on BDD before but Unit Tests are new for me, so it may take
> some time.I plan to finish it the coming week once my university
> examinations are done
>
> Thanks
>
> Saket

There's a chapter in the Tutorial about our test framework. In this
case existing command line tool wrappers are the best reference,
e.g. test_Emboss.py or test_Muscle.py

Also if you want to use doctests and have them included in the
test suite, add the module to the list in Tests/run_tests.py - however
this does not handle optional dependencies (other than NumPy).
Therefore all the application wrapper doctests to date have carefully
avoided actually invoking the command line; most print the string
representation instead. This allows us to check that the example
use cases should run (and catches silly errors in the examples,
like a typo in an argument name).
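
A sketch of that doctest pattern is shown here with a stand-in class, so
that it runs without Biopython or bwa installed. ExampleCommandline is
hypothetical; a real wrapper would subclass the application wrapper base
class and validate its arguments:

```python
import doctest


class ExampleCommandline(object):
    """Stand-in for a command line wrapper (illustration only).

    The doctest builds the command and prints it, but never runs it:

    >>> cmd = ExampleCommandline("bwa", "aln", "database.fasta", "short_read.fastq")
    >>> print(cmd)
    bwa aln database.fasta short_read.fastq
    """

    def __init__(self, *args):
        self.args = args

    def __str__(self):
        # The string form is what would be handed to the shell.
        return " ".join(self.args)


if __name__ == "__main__":
    doctest.testmod()
```

Since the external tool is never invoked, the doctest passes on machines
where the tool is not installed.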

Thanks,

Peter

From p.j.a.cock at googlemail.com  Sun Feb 24 07:42:47 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 24 Feb 2013 12:42:47 +0000
Subject: [Biopython-dev] [Biopython] blastdbcmd
In-Reply-To: <5127B8D1.5090705@usp.br>
References: <5127A44E.2030403@usp.br>
	<CAKVJ-_4LosCsm4940My0Y5O6L45a-NqxUa6sziUK-wkKm51mJA@mail.gmail.com>
	<5127B8D1.5090705@usp.br>
Message-ID: <CAKVJ-_69XXfAZpqwMZAPoDmR1Wh=SL54-=uNUysaVK3tWiPt6w@mail.gmail.com>

Great - let us know on the list if you have any questions.

Peter

On Fri, Feb 22, 2013 at 6:28 PM, Frederico Moraes Ferreira
<ferreirafm at usp.br> wrote:
> Hi Peter,
> Yes, I meant a Biopython Blast application for blastdbcmd.
> Thanks for the link.
> Best,
> Fred
>
> On 22-02-2013 14:23, Peter Cock wrote:
>
>> On Fri, Feb 22, 2013 at 5:01 PM, Frederico Moraes Ferreira
>> <ferreirafm at usp.br>  wrote:
>>>
>>> Hi there Biopythoneers,
>>> As far as I know, there isn't a blastdbcmd submodule in Biopython.
>>> So I've been writing the IDs of the BLAST-matched sequences to a file,
>>> fetching them all with a subprocess and reading them with SeqIO
>>> afterwards. In some cases, however, I miss a blastdbcmd parser to make
>>> things easy. How are you guys dealing with this?
>>> Fred
>>
>> Are you talking about a command line wrapper for blastdbcmd, to go in
>> Bio/Blast/Applications.py? That seems a good idea.
>>
>> Personally I find the blastdbcmd tool quite handicapped due to the
>> introduction of generated sequence identifiers, and rarely use it:
>>
>> http://blastedbio.blogspot.co.uk/2012/10/my-ids-not-good-enough-for-ncbi-blast.html
>>
>> Instead I would use Bio.SeqIO to index the FASTA file used for the
>> database, and get the sequences that way.
>>
>> Peter
>>
>
> --
> Dr. Frederico Moraes Ferreira
> Universidade de São Paulo
> Faculdade de Medicina
> Instituto do Coração - Imunologia
> Av. Dr. Enéas de Carvalho Aguiar, 44
> 05403-900     São Paulo - SP
> Brasil
>
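
A minimal sketch of the indexing approach suggested in this thread (index
the FASTA file behind the database, then pull sequences by ID), written with
only the standard library so it runs without Biopython. In Biopython itself,
Bio.SeqIO.index("db.fasta", "fasta") gives dictionary-like access by record
ID. The helper names build_fasta_index and fetch_sequence, the file name
and the sequences below are assumptions made for illustration only.

```python
import os
import tempfile


def build_fasta_index(path):
    """Map each record ID to the byte offset of its '>' header line."""
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                fields = line[1:].split(None, 1)
                if fields:  # skip headers with no ID
                    index[fields[0].decode()] = offset
    return index


def fetch_sequence(path, index, record_id):
    """Return the sequence of one record as a plain string."""
    parts = []
    with open(path, "rb") as handle:
        handle.seek(index[record_id])
        handle.readline()  # skip the header line
        for line in handle:
            if line.startswith(b">"):
                break  # reached the next record
            parts.append(line.strip().decode())
    return "".join(parts)


# Demo with a throwaway two-record FASTA file (made-up sequences).
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "db.fasta")
with open(demo_path, "w") as handle:
    handle.write(">seq1 first record\nACGT\nACGT\n")
    handle.write(">seq2 second record\nGGGTTT\nAA\n")

index = build_fasta_index(demo_path)
hit_ids = ["seq2"]  # e.g. IDs collected from BLAST hits
sequences = {rid: fetch_sequence(demo_path, index, rid) for rid in hit_ids}
```

Only the offsets are held in memory, so the same pattern scales to large
database files; the IDs looked up must match the IDs in the original FASTA
file rather than any identifiers generated by BLAST.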


From anaryin at gmail.com  Tue Feb 26 11:14:52 2013
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Tue, 26 Feb 2013 17:14:52 +0100
Subject: [Biopython-dev] Slight suggestion for PDBIO
Message-ID: <CAJ9sUYOjsWE-E2YbO_Gx7ApK8jnp=WeaivZY7udndkWu9tMV8w@mail.gmail.com>

Hello all,

I've modified slightly PDBIO to allow writing of any object of our PDB
representation. Right now it accepts only Models or Structures (IIRC) and
sometimes it's useful to have only a chain or a residue written. I've added
a layer of code that builds the "missing" parts using StructureBuilder.
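
To illustrate the idea of building the "missing" parts, here is a rough,
self-contained sketch. It does not use Bio.PDB or StructureBuilder and is
not the code in the branch: the Entity class, the default IDs and
wrap_to_structure() are all hypothetical stand-ins that only mimic the
structure/model/chain/residue/atom hierarchy.

```python
# Hypothetical, simplified stand-ins for the Bio.PDB hierarchy levels:
# S = structure, M = model, C = chain, R = residue, A = atom.
LEVELS = ["S", "M", "C", "R", "A"]

# Placeholder IDs used when a parent level has to be invented (assumed values).
DEFAULT_IDS = {"S": "pdb", "M": 0, "C": "A", "R": (" ", 1, " ")}


class Entity:
    """Minimal tree node; not the real Bio.PDB Entity class."""

    def __init__(self, level, id):
        self.level = level
        self.id = id
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)


def wrap_to_structure(entity):
    """Return a structure-level wrapper around entity.

    Missing parent levels are created on the fly. If the entity still has
    real ancestors, their IDs are reused for the wrappers.
    """
    top = entity
    original_parent = entity.parent
    while top.level != "S":
        parent_level = LEVELS[LEVELS.index(top.level) - 1]
        if original_parent is not None:
            parent_id = original_parent.id
            original_parent = original_parent.parent
        else:
            parent_id = DEFAULT_IDS[parent_level]
        wrapper = Entity(parent_level, parent_id)
        # Append directly so the original entity's parent link is untouched.
        wrapper.children.append(top)
        top = wrapper
    return top


# A residue that belongs to chain "B", where the chain has no model above it.
chain_b = Entity("C", "B")
residue = Entity("R", (" ", 10, " "))
chain_b.add(residue)
structure = wrap_to_structure(residue)

# A residue with no chain at all gets placeholder parents throughout.
orphan = Entity("R", (" ", 5, " "))
orphan_structure = wrap_to_structure(orphan)
```

A writer that only knows how to walk a full structure can then be handed
the wrapper, while the original object is left as it was.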

I pushed it to a branch in my github account:

https://github.com/JoaoRodrigues/biopython/tree/pdbio

I've been using it for a while now so often I completely forgot about it..
Only noticed when I changed computers and the version there could not
handle this. So I guess it should be solid enough.

Cheers,

João


From eric.talevich at gmail.com  Tue Feb 26 14:35:56 2013
From: eric.talevich at gmail.com (Eric Talevich)
Date: Tue, 26 Feb 2013 14:35:56 -0500
Subject: [Biopython-dev] Slight suggestion for PDBIO
In-Reply-To: <CAJ9sUYOjsWE-E2YbO_Gx7ApK8jnp=WeaivZY7udndkWu9tMV8w@mail.gmail.com>
References: <CAJ9sUYOjsWE-E2YbO_Gx7ApK8jnp=WeaivZY7udndkWu9tMV8w@mail.gmail.com>
Message-ID: <CAMC681kvoAoV=zDSzxv1B=+0JCgYKP1Km4VGTucGc=zzkpEaEg@mail.gmail.com>

On Tue, Feb 26, 2013 at 11:14 AM, João Rodrigues <anaryin at gmail.com> wrote:

> Hello all,
>
> I've modified slightly PDBIO to allow writing of any object of our PDB
> representation. Right now it accepts only Models or Structures (IIRC) and
> sometimes it's useful to have only a chain or a residue written. I've added
> a layer of code that builds the "missing" parts using StructureBuilder.
>
> I pushed it to a branch in my github account:
>
> https://github.com/JoaoRodrigues/biopython/tree/pdbio
>
> I've been using it for a while now so often I completely forgot about it..
> Only noticed when I changed computers and the version there could not
> handle this. So I guess it should be solid enough.
>
>
Awesome. I support the idea. Could you do a pull request, so TravisCI runs
the tests automatically?

-Eric


From anaryin at gmail.com  Tue Feb 26 14:39:08 2013
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Tue, 26 Feb 2013 20:39:08 +0100
Subject: [Biopython-dev] Slight suggestion for PDBIO
In-Reply-To: <CAMC681kvoAoV=zDSzxv1B=+0JCgYKP1Km4VGTucGc=zzkpEaEg@mail.gmail.com>
References: <CAJ9sUYOjsWE-E2YbO_Gx7ApK8jnp=WeaivZY7udndkWu9tMV8w@mail.gmail.com>
	<CAMC681kvoAoV=zDSzxv1B=+0JCgYKP1Km4VGTucGc=zzkpEaEg@mail.gmail.com>
Message-ID: <CAJ9sUYMFccQaFqZOX8=gP=B8MzOHao1LqUVE7mt1H2CwMMOAAg@mail.gmail.com>

There's some discussion about some implementation details:

https://github.com/JoaoRodrigues/biopython/commit/cd86f3c8f4216d59440f4eaf8ac3ba2ab05d8eb4

What does everyone else think?

Thanks for the input btw. Should I make a test too? I reckon it would be a
good thing to add?


2013/2/26 Eric Talevich <eric.talevich at gmail.com>

> On Tue, Feb 26, 2013 at 11:14 AM, João Rodrigues <anaryin at gmail.com> wrote:
>
>> Hello all,
>>
>> I've modified slightly PDBIO to allow writing of any object of our PDB
>> representation. Right now it accepts only Models or Structures (IIRC) and
>> sometimes it's useful to have only a chain or a residue written. I've
>> added
>> a layer of code that builds the "missing" parts using StructureBuilder.
>>
>> I pushed it to a branch in my github account:
>>
>> https://github.com/JoaoRodrigues/biopython/tree/pdbio
>>
>> I've been using it for a while now so often I completely forgot about it..
>> Only noticed when I changed computers and the version there could not
>> handle this. So I guess it should be solid enough.
>>
>>
> Awesome. I support the idea. Could you do a pull request, so TravisCI runs
> the tests automatically?
>
> -Eric
>


From davidjosephcain at gmail.com  Tue Feb 26 14:47:32 2013
From: davidjosephcain at gmail.com (David Cain)
Date: Tue, 26 Feb 2013 14:47:32 -0500
Subject: [Biopython-dev] Slight suggestion for PDBIO
In-Reply-To: <CAJ9sUYMFccQaFqZOX8=gP=B8MzOHao1LqUVE7mt1H2CwMMOAAg@mail.gmail.com>
References: <CAJ9sUYOjsWE-E2YbO_Gx7ApK8jnp=WeaivZY7udndkWu9tMV8w@mail.gmail.com>
	<CAMC681kvoAoV=zDSzxv1B=+0JCgYKP1Km4VGTucGc=zzkpEaEg@mail.gmail.com>
	<CAJ9sUYMFccQaFqZOX8=gP=B8MzOHao1LqUVE7mt1H2CwMMOAAg@mail.gmail.com>
Message-ID: <CAPyP4uKGXTDqJ74ejZJuo6OrD+s7LnmUQvaSRjSz04dhD0RqZQ@mail.gmail.com>

I failed to mention this sooner, but I'm an enthusiastic proponent of what
you've done. Your new set_structure() would  be immensely helpful to me, as
I've been using some workarounds to achieve the functionality you've
implemented.

Personally, I think a unit test would be really helpful in ensuring
chain-less residues and the like will save appropriately.


David Cain
+1 (339) 222 4452


On Tue, Feb 26, 2013 at 2:39 PM, João Rodrigues <anaryin at gmail.com> wrote:

> There's some discussion about some implementation details:
>
>
> https://github.com/JoaoRodrigues/biopython/commit/cd86f3c8f4216d59440f4eaf8ac3ba2ab05d8eb4
>
> What does everyone else think?
>
> Thanks for the input btw. Should I make a test too? I reckon it would be a
> good thing to add?
>
>
> 2013/2/26 Eric Talevich <eric.talevich at gmail.com>
>
> > On Tue, Feb 26, 2013 at 11:14 AM, João Rodrigues <anaryin at gmail.com>
> > wrote:
> >
> >> Hello all,
> >>
> >> I've modified slightly PDBIO to allow writing of any object of our PDB
> >> representation. Right now it accepts only Models or Structures (IIRC)
> and
> >> sometimes it's useful to have only a chain or a residue written. I've
> >> added
> >> a layer of code that builds the "missing" parts using StructureBuilder.
> >>
> >> I pushed it to a branch in my github account:
> >>
> >> https://github.com/JoaoRodrigues/biopython/tree/pdbio
> >>
> >> I've been using it for a while now so often I completely forgot about
> it..
> >> Only noticed when I changed computers and the version there could not
> >> handle this. So I guess it should be solid enough.
> >>
> >>
> > Awesome. I support the idea. Could you do a pull request, so TravisCI
> runs
> > the tests automatically?
> >
> > -Eric
> >
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


From p.j.a.cock at googlemail.com  Tue Feb 26 16:45:00 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 26 Feb 2013 21:45:00 +0000
Subject: [Biopython-dev] Slight suggestion for PDBIO
In-Reply-To: <CAPyP4uKGXTDqJ74ejZJuo6OrD+s7LnmUQvaSRjSz04dhD0RqZQ@mail.gmail.com>
References: <CAJ9sUYOjsWE-E2YbO_Gx7ApK8jnp=WeaivZY7udndkWu9tMV8w@mail.gmail.com>
	<CAMC681kvoAoV=zDSzxv1B=+0JCgYKP1Km4VGTucGc=zzkpEaEg@mail.gmail.com>
	<CAJ9sUYMFccQaFqZOX8=gP=B8MzOHao1LqUVE7mt1H2CwMMOAAg@mail.gmail.com>
	<CAPyP4uKGXTDqJ74ejZJuo6OrD+s7LnmUQvaSRjSz04dhD0RqZQ@mail.gmail.com>
Message-ID: <CAKVJ-_7r5fRR3WsqXZu4Hn_7=hRDUU4q_7PwjmrCjnGmTu6NWw@mail.gmail.com>

On Tue, Feb 26, 2013 at 2:39 PM, João Rodrigues <anaryin at gmail.com> wrote:
>> Should I make a test too? I reckon it would be a good thing to add?
>>

On Tue, Feb 26, 2013 at 7:47 PM, David Cain <davidjosephcain at gmail.com> wrote:
> ...
>
> Personally, I think a unit test would be really helpful in ensuring
> chain-less residues and the like will save appropriately.

Absolutely, +1 on adding a test or two for this new functionality.

And if there is anywhere in the Tutorial or docstrings which would
benefit from mentioning this too, could you update that too please?

Thanks,

Peter


From anaryin at gmail.com  Wed Feb 27 04:25:26 2013
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Wed, 27 Feb 2013 10:25:26 +0100
Subject: [Biopython-dev] Slight suggestion for PDBIO
In-Reply-To: <CAKVJ-_7r5fRR3WsqXZu4Hn_7=hRDUU4q_7PwjmrCjnGmTu6NWw@mail.gmail.com>
References: <CAJ9sUYOjsWE-E2YbO_Gx7ApK8jnp=WeaivZY7udndkWu9tMV8w@mail.gmail.com>
	<CAMC681kvoAoV=zDSzxv1B=+0JCgYKP1Km4VGTucGc=zzkpEaEg@mail.gmail.com>
	<CAJ9sUYMFccQaFqZOX8=gP=B8MzOHao1LqUVE7mt1H2CwMMOAAg@mail.gmail.com>
	<CAPyP4uKGXTDqJ74ejZJuo6OrD+s7LnmUQvaSRjSz04dhD0RqZQ@mail.gmail.com>
	<CAKVJ-_7r5fRR3WsqXZu4Hn_7=hRDUU4q_7PwjmrCjnGmTu6NWw@mail.gmail.com>
Message-ID: <CAJ9sUYN__8-8V773GRU5MqXVQtK+Dm=hAEWbaNCUXcGAV0jqXA@mail.gmail.com>

I'll have a look at the tutorial later (I think it is in the Bio.PDB FAQ).

The whitespace issue is solved I think. What are the rules exactly? Sorry
if I'm at a bit of a loss here..

I added tests for the save functions (a full structure and a single
residue) as well as one for the chainless residue. I added the suggestion
from David to keep the id in the parent if there is one.

I reverted the commit and added the same (- the whitespace) and another
with tests. If it looks ok, I'll make a pull request (if I can find the
button, never did that..).

Cheers,

João


2013/2/26 Peter Cock <p.j.a.cock at googlemail.com>

> On Tue, Feb 26, 2013 at 2:39 PM, João Rodrigues <anaryin at gmail.com> wrote:
> >> Should I make a test too? I reckon it would be a good thing to add?
> >>
>
> On Tue, Feb 26, 2013 at 7:47 PM, David Cain <davidjosephcain at gmail.com>
> wrote:
> > ...
> >
> > Personally, I think a unit test would be really helpful in ensuring
> > chain-less residues and the like will save appropriately.
>
> Absolutely, +1 on adding a test or two for this new functionality.
>
> And if there is anywhere in the Tutorial or docstrings which would
> benefit from mentioning this too, could you update that too please?
>
> Thanks,
>
> Peter
>


From p.j.a.cock at googlemail.com  Wed Feb 27 11:34:42 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 27 Feb 2013 16:34:42 +0000
Subject: [Biopython-dev] Slight suggestion for PDBIO
In-Reply-To: <CAJ9sUYN__8-8V773GRU5MqXVQtK+Dm=hAEWbaNCUXcGAV0jqXA@mail.gmail.com>
References: <CAJ9sUYOjsWE-E2YbO_Gx7ApK8jnp=WeaivZY7udndkWu9tMV8w@mail.gmail.com>
	<CAMC681kvoAoV=zDSzxv1B=+0JCgYKP1Km4VGTucGc=zzkpEaEg@mail.gmail.com>
	<CAJ9sUYMFccQaFqZOX8=gP=B8MzOHao1LqUVE7mt1H2CwMMOAAg@mail.gmail.com>
	<CAPyP4uKGXTDqJ74ejZJuo6OrD+s7LnmUQvaSRjSz04dhD0RqZQ@mail.gmail.com>
	<CAKVJ-_7r5fRR3WsqXZu4Hn_7=hRDUU4q_7PwjmrCjnGmTu6NWw@mail.gmail.com>
	<CAJ9sUYN__8-8V773GRU5MqXVQtK+Dm=hAEWbaNCUXcGAV0jqXA@mail.gmail.com>
Message-ID: <CAKVJ-_6_QKoVXB=5YKK_w_063snmWORp4LMjLCYEbi5+iV1tzg@mail.gmail.com>

On Wed, Feb 27, 2013 at 9:25 AM, João Rodrigues <anaryin at gmail.com> wrote:
> I'll have a look at the tutorial later (I think it is in the Bio.PDB FAQ).
>
> The whitespace issue is solved I think. What are the rules exactly? Sorry if
> I'm at a bit of a loss here..

PEP8, http://www.python.org/dev/peps/pep-0008/#blank-lines

(Currently an aspiration for the Biopython code, rather than a strict
requirement)

> I added tests for the save functions (a full structure and a single residue)
> as well as one for the chainless residue. I added the suggestion from David
> to keep the id in the parent if there is one.
>
> I reverted the commit and added the same (- the whitespace) and another with
> tests. If it looks ok, I'll make a pull request (if I can find the button,
> never did that..).

GitHub have made it quite easy, but the first time is always the hardest.
Good luck, and if you get stuck we can try to help or just pull the commits
in directly from your fork.

Thanks,

Peter


From anaryin at gmail.com  Wed Feb 27 11:41:45 2013
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Wed, 27 Feb 2013 17:41:45 +0100
Subject: [Biopython-dev] Slight suggestion for PDBIO
In-Reply-To: <CAKVJ-_6_QKoVXB=5YKK_w_063snmWORp4LMjLCYEbi5+iV1tzg@mail.gmail.com>
References: <CAJ9sUYOjsWE-E2YbO_Gx7ApK8jnp=WeaivZY7udndkWu9tMV8w@mail.gmail.com>
	<CAMC681kvoAoV=zDSzxv1B=+0JCgYKP1Km4VGTucGc=zzkpEaEg@mail.gmail.com>
	<CAJ9sUYMFccQaFqZOX8=gP=B8MzOHao1LqUVE7mt1H2CwMMOAAg@mail.gmail.com>
	<CAPyP4uKGXTDqJ74ejZJuo6OrD+s7LnmUQvaSRjSz04dhD0RqZQ@mail.gmail.com>
	<CAKVJ-_7r5fRR3WsqXZu4Hn_7=hRDUU4q_7PwjmrCjnGmTu6NWw@mail.gmail.com>
	<CAJ9sUYN__8-8V773GRU5MqXVQtK+Dm=hAEWbaNCUXcGAV0jqXA@mail.gmail.com>
	<CAKVJ-_6_QKoVXB=5YKK_w_063snmWORp4LMjLCYEbi5+iV1tzg@mail.gmail.com>
Message-ID: <CAJ9sUYNkNB0Hcna3PDOCB6GnDLJ2ZZQB0xKHcK6_eTCwBGg01Q@mail.gmail.com>

Ok, done I guess: https://github.com/biopython/biopython/pull/165/files

Thanks for all the input!


2013/2/27 Peter Cock <p.j.a.cock at googlemail.com>

> On Wed, Feb 27, 2013 at 9:25 AM, João Rodrigues <anaryin at gmail.com> wrote:
> > I'll have a look at the tutorial later (I think it is in the Bio.PDB
> FAQ).
> >
> > The whitespace issue is solved I think. What are the rules exactly?
> Sorry if
> > I'm at a bit of a loss here..
>
> PEP8, http://www.python.org/dev/peps/pep-0008/#blank-lines
>
> (Currently an aspiration for the Biopython code, rather than a strict
> requirement)
>
> > I added tests for the save functions (a full structure and a single
> residue)
> > as well as one for the chainless residue. I added the suggestion from
> David
> > to keep the id in the parent if there is one.
> >
> > I reverted the commit and added the same (- the whitespace) and another
> with
> > tests. If it looks ok, I'll make a pull request (if I can find the
> button,
> > never did that..).
>
> GitHub have made it quite easy, but the first time is always the hardest.
> Good luck, and if you get stuck we can try to help or just pull the commits
> in directly from your fork.
>
> Thanks,
>
> Peter
>


From p.j.a.cock at googlemail.com  Wed Feb 27 17:32:35 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 27 Feb 2013 22:32:35 +0000
Subject: [Biopython-dev] Fwd: [Numpy-discussion] [ANN] SciPy2013: Call for
	abstracts
In-Reply-To: <CAOzk5QcVrMu+7chbRggKX8i+bWVAc4cxQ5Nrvd-32J55REXwMg@mail.gmail.com>
References: <CAOzk5QcVrMu+7chbRggKX8i+bWVAc4cxQ5Nrvd-32J55REXwMg@mail.gmail.com>
Message-ID: <CAKVJ-_5-a2FiZ_+rCX16MuDRe4-6B3Jk112XAtNvuTjsbaRKfA@mail.gmail.com>

The new bioinformatics mini-symposium this year makes SciPy 2013
especially interesting.

Peter

---------- Forwarded message ----------
From: *Jonathan Rocher*
Date: Wednesday, February 27, 2013
Subject: [Numpy-discussion] [ANN] SciPy2013: Call for abstracts
To: SciPy Users List <scipy-user at scipy.org>, numfocus at googlegroups.com,
Discussion of Numerical Python <numpy-discussion at scipy.org>


[Apologies for cross-posts]

Dear all,

The annual SciPy Conference (Scientific Computing with
Python) <http://conference.scipy.org/scipy2013/about.php> allows
participants from academic, commercial, and governmental organizations to
showcase their latest projects, learn from skilled users and developers,
and collaborate on code development. *The deadline for abstract submissions
is March 20th, 2013. *

Submissions are welcome that address general Scientific Computing with
Python, one of the two special themes for this year's conference (machine
learning & reproducible science), or the domain-specific
mini-symposia <http://conference.scipy.org/scipy2013/about.php> held
during the conference (Meteorology, climatology, and atmospheric and
oceanic science, Astronomy and astrophysics, Medical imaging,
Bio-informatics).

Please submit your abstract at the SciPy 2013 website abstract submission
form <http://conference.scipy.org/scipy2013/speaking_submission.php>.
Abstracts will be accepted for posters or presentations. Optional papers to
be published in the conference proceedings will be requested following
abstract submission. This year the proceedings will be made available prior
to the conference to help attendees navigate the conference.

We look forward to an exciting and interesting set of talks, posters, and
discussions and hope to see you at the conference.
The SciPy 2013 Program Committee Chairs

Matt McCormick, Kitware, Inc.
Katy Huff, University of Wisconsin-Madison and Argonne National Laboratory

From redmine at redmine.open-bio.org  Wed Feb 27 20:53:22 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 28 Feb 2013 01:53:22 +0000
Subject: [Biopython-dev] [Biopython - Bug #3419] (New) Bio.SearchIO.FastaIO
Message-ID: <redmine.issue-3419.20130228015322@redmine.open-bio.org>


Issue #3419 has been reported by Jason Stajich.

----------------------------------------
Bug #3419: Bio.SearchIO.FastaIO
https://redmine.open-bio.org/issues/3419

Author: Jason Stajich
Status: New
Priority: Low
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


The strand of the translated sequence (query or subject depending on the analysis) is lost for tfastxy and fastx/y reports.

from Bio import SearchIO
qresults = SearchIO.parse('test.FASTY.out','fasta-m10')
for qresult in qresults:
    for hit in qresult:
        for hsp in hit.hsps:
                print qresult.id, " ", hit.id, " ", \
                hsp.query_start, "..",hsp.query_end, " ", hsp.query_strand, " ", \
                hsp.hit_start, "..",hsp.hit_end, " ", hsp.hit_strand


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Thu Feb 28 02:08:50 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 28 Feb 2013 07:08:50 +0000
Subject: [Biopython-dev] [Biopython - Bug #3419] Bio.SearchIO.FastaIO
References: <redmine.issue-3419.20130228015322@redmine.open-bio.org>
Message-ID: <redmine.journal-15112.20130228070850@redmine.open-bio.org>


Issue #3419 has been updated by Wibowo Arindrarto.


Hi Jason,

Thanks for the report :). Do you have an example file handy which I can try to include in our test suite? The FASTA parser was not tested using [t]fast[y|x], so there may be some lines / cases which the parser couldn't handle.
----------------------------------------
Bug #3419: Bio.SearchIO.FastaIO
https://redmine.open-bio.org/issues/3419

Author: Jason Stajich
Status: New
Priority: Low
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


The strand of the translated sequence (query or subject depending on the analysis) is lost for tfastxy and fastx/y reports.

from Bio import SearchIO
qresults = SearchIO.parse('test.FASTY.out','fasta-m10')
for qresult in qresults:
    for hit in qresult:
        for hsp in hit.hsps:
                print qresult.id, " ", hit.id, " ", \
                hsp.query_start, "..",hsp.query_end, " ", hsp.query_strand, " ", \
                hsp.hit_start, "..",hsp.hit_end, " ", hsp.hit_strand


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Thu Feb 28 02:38:20 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 28 Feb 2013 07:38:20 +0000
Subject: [Biopython-dev] [Biopython - Bug #3419] Bio.SearchIO.FastaIO
References: <redmine.issue-3419.20130228015322@redmine.open-bio.org>
Message-ID: <redmine.journal-15113.20130228073820@redmine.open-bio.org>


Issue #3419 has been updated by Jason Stajich.

File bll0026-vs-94.tfasty added

Here is a -m 10 report.

I made this local patch to get it to report the strands, but this is not quite right because you actually don't have a strand for the query which is the protein.

diff --git a/Bio/SearchIO/FastaIO.py b/Bio/SearchIO/FastaIO.py
index ca08797..794efb8 100644
--- a/Bio/SearchIO/FastaIO.py
+++ b/Bio/SearchIO/FastaIO.py
@@ -197,7 +197,7 @@ def _set_hsp_seqs(hsp, parsed, program):
         # set seq and alphabet
         setattr(hsp.fragment, seq_type, parsed[seq_type]['seq'])
 
-        if alphabet is not generic_protein:
+        if alphabet is not generic_protein or 'tfast' in program:
             # get strand from coordinate; start <= end is plus
             # start > end is minus
             if start <= end:

In BioPerl I solved this by writing explicit code for the TBLASTN/TFAST[XY] and BLASTX/FAST[XY] situations, which then knew whether the query or subject was translated DNA with a strand or input DNA.
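
A rough sketch of that explicit per-program handling, in plain Python and
independent of the actual FastaIO code. The function names, the lower-case
program names and the use of 0 to mean "protein, no strand" are assumptions
for illustration; only the rule "start <= end is plus, start > end is minus"
comes from the patch above.

```python
def strand_from_coords(start, end):
    """Plus strand (1) if start <= end, otherwise minus strand (-1)."""
    return 1 if start <= end else -1


def infer_strands(program, query_coords, hit_coords):
    """Return (query_strand, hit_strand) for a translated FASTA search.

    Assumed convention: 1 = plus, -1 = minus, 0 = protein (no strand),
    None = not decided by this sketch.
    """
    name = program.lower()
    if name.startswith("tfast"):
        # tfastx / tfasty: protein query against a translated DNA database,
        # so only the hit has a meaningful strand.
        return 0, strand_from_coords(*hit_coords)
    if name.startswith(("fastx", "fasty")):
        # fastx / fasty: translated DNA query against a protein database,
        # so only the query has a meaningful strand.
        return strand_from_coords(*query_coords), 0
    return None, None


# Made-up coordinates: hit on the minus strand in a tfasty search.
tfasty_strands = infer_strands("tfasty", (1, 100), (900, 601))
# Made-up coordinates: query on the minus strand in a fastx search.
fastx_strands = infer_strands("fastx", (300, 1), (5, 104))
```

Deciding by program first avoids assigning a strand to the protein side,
which is the problem noted with the one-line patch.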
----------------------------------------
Bug #3419: Bio.SearchIO.FastaIO
https://redmine.open-bio.org/issues/3419

Author: Jason Stajich
Status: New
Priority: Low
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


The strand of the translated sequence (query or subject depending on the analysis) is lost for tfastxy and fastx/y reports.

from Bio import SearchIO
qresults = SearchIO.parse('test.FASTY.out','fasta-m10')
for qresult in qresults:
    for hit in qresult:
        for hsp in hit.hsps:
                print qresult.id, " ", hit.id, " ", \
                hsp.query_start, "..",hsp.query_end, " ", hsp.query_strand, " ", \
                hsp.hit_start, "..",hsp.hit_end, " ", hsp.hit_strand


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From chapmanb at 50mail.com  Thu Feb 28 21:25:42 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 28 Feb 2013 21:25:42 -0500
Subject: [Biopython-dev] Coming soon: BOSC/Broad Hackathon,
	SciPy Bioinformatics, BOSC Codefest
Message-ID: <87lia7ua8p.fsf@fastmail.fm>


Hi all; 
There are some upcoming coding events and conferences of interest to open source
biology programmers:

- BOSC/Broad Interoperability Hackathon -- This is a two day coding session at
  the Broad Institute in Cambridge, MA on April 7-8 focused on improving tool
  interoperability.
  
  Sign up and details: http://j.mp/XJT6ew
  
- SciPy 2013 -- The Scientific Python conference is June 26-27 in Austin and has
  a Bioinformatics mini-symposium this year. They're doing some great work like
  IPython, NumPy, SciPy and scikit-learn; and this is a nice opportunity to reach a
  new set of like minded programmers and expand the open source bioinformatics
  community.
  
  Bioinformatics mini-symposia: http://j.mp/Z4xxXB
  Abstract details: http://conference.scipy.org/scipy2013/about.php
  
- Codefest at the Bioinformatics Open Source Conference -- This year BOSC is
  taking place in Berlin from July 19-20 and we'll have a two day coding session
  before the conference. This is the 4th year of Codefests and they've proven to
  be a productive and fun time to work collectively on open source projects.

  Sign up and details: http://www.open-bio.org/wiki/Codefest_2013
  BOSC conference: http://www.open-bio.org/wiki/BOSC_2013

Here are the key dates for the events and abstracts:

March   20, 2013: SciPy abstracts due
April  7-8, 2013: BOSC/Broad Interoperability Hackathon, Cambridge, MA
April   12, 2013: BOSC abstracts due
June 24-29, 2013: SciPy in Austin, TX
July 17-18, 2013: Codefest 2013, Berlin
July 19-20, 2013: BOSC 2013, Berlin

Looking forward to seeing everyone this spring and summer for plenty of fun
science and code,
Brad

From chapmanb at 50mail.com  Thu Feb 28 21:36:34 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 28 Feb 2013 21:36:34 -0500
Subject: [Biopython-dev] [ANN] SciPy2013: Call for abstracts
In-Reply-To: <CAKVJ-_5-a2FiZ_+rCX16MuDRe4-6B3Jk112XAtNvuTjsbaRKfA@mail.gmail.com>
References: <CAOzk5QcVrMu+7chbRggKX8i+bWVAc4cxQ5Nrvd-32J55REXwMg@mail.gmail.com>
	<CAKVJ-_5-a2FiZ_+rCX16MuDRe4-6B3Jk112XAtNvuTjsbaRKfA@mail.gmail.com>
Message-ID: <87ppzjsv65.fsf@fastmail.fm>


Peter;
Thanks for sending this out. I'm helping with the organization of the
SciPy bioinformatics session thanks to Peter's recommendation and wrote
up a little bit about the types of abstracts that would fit well with
the overall theme of SciPy:

http://j.mp/Z4xxXB

This is a great chance to connect with another open source scientific
community so definitely send in an abstract if this is of interest; the
deadline is coming up next month: March 20th. Austin also has awesome
music and barbecue in addition to science and hacking so lots of reasons
to attend,
Brad


> The new bioinformatics mini-symposium this year makes SciPy 2013
> especially interesting.
>
> Peter
>
> ---------- Forwarded message ----------
> From: *Jonathan Rocher*
> Date: Wednesday, February 27, 2013
> Subject: [Numpy-discussion] [ANN] SciPy2013: Call for abstracts
> To: SciPy Users List <scipy-user at scipy.org>, numfocus at googlegroups.com,
> Discussion of Numerical Python <numpy-discussion at scipy.org>
>
>
> [Apologies for cross-posts]
>
> Dear all,
>
> The annual SciPy Conference (Scientific Computing with
> Python) <http://conference.scipy.org/scipy2013/about.php> allows
> participants from academic, commercial, and governmental organizations to
> showcase their latest projects, learn from skilled users and developers,
> and collaborate on code development. *The deadline for abstract submissions
> is March 20th, 2013. *
>
> Submissions are welcome that address general Scientific Computing with
> Python, one of the two special themes for this year's conference (machine
> learning & reproducible science), or the domain-specific
> mini-symposia <http://conference.scipy.org/scipy2013/about.php> held
> during the conference (Meteorology, climatology, and atmospheric and
> oceanic science, Astronomy and astrophysics, Medical imaging,
> Bio-informatics).
>
> Please submit your abstract at the SciPy 2013 website abstract submission
> form <http://conference.scipy.org/scipy2013/speaking_submission.php>.
> Abstracts will be accepted for posters or presentations. Optional papers to
> be published in the conference proceedings will be requested following
> abstract submission. This year the proceedings will be made available prior
> to the conference to help attendees navigate the conference.
>
> We look forward to an exciting and interesting set of talks, posters, and
> discussions and hope to see you at the conference.
> The SciPy 2013 Program Committee Chairs
>
> Matt McCormick, Kitware, Inc.
> Katy Huff, University of Wisconsin-Madison and Argonne National Laboratory
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From mjldehoon at yahoo.com  Fri Feb  1 05:35:18 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 31 Jan 2013 21:35:18 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CAKVJ-_6RR-gZ+oCmW+N2hBnutvk7OvwxiFDym_y5NZ+KGi0Sow@mail.gmail.com>
Message-ID: <1359696918.16306.YahooMailClassic@web164005.mail.gq1.yahoo.com>

Hi Peter, Bow,

> > I'm OK with using the setUp and tearDown arguments to
> > doctest.DocTestSuite to do the directory magic, but
> keeping the test files
> > under Tests/.
> 
> As a more elegant version of the Bio._utils.run_doctest()
> function?

Exactly. Bow, do you want to give this approach a try? (Assuming that there are no further objections from the other developers).

Best,
-Michiel.


From w.arindrarto at gmail.com  Fri Feb  1 10:29:59 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 1 Feb 2013 11:29:59 +0100
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359696918.16306.YahooMailClassic@web164005.mail.gq1.yahoo.com>
References: <CAKVJ-_6RR-gZ+oCmW+N2hBnutvk7OvwxiFDym_y5NZ+KGi0Sow@mail.gmail.com>
	<1359696918.16306.YahooMailClassic@web164005.mail.gq1.yahoo.com>
Message-ID: <CADEGkF5q1oVwimfNAqrHPDvGvHa8JWOsWAeEoAEpZWzGyYGxgw@mail.gmail.com>

Hi Michiel, Peter, everyone,

>> > I'm OK with using the setUp and tearDown arguments to
>> > doctest.DocTestSuite to do the directory magic, but
>> keeping the test files
>> > under Tests/.
>>
>> As a more elegant version of the Bio._utils.run_doctest()
>> function?
>
> Exactly. Bow, do you want to give this approach a try? (Assuming that there are no further objections from the other developers).

Just to be clear, we are:

* changing all module's doctest file path to use relative paths (with
respect to the module's location),
* replacing the run_doctest() import with a simpler doctest import and
`doctest.testmod()` in each module having this doctest
* resorting to setUp and tearDown in the DocTestSuite in
`run_tests.py` so that each module / submodule can find their test
files
* and refactoring all string functions in Bio._utils to Bio.Phylo and
Bio.SearchIO, so that we can remove Bio._utils,

right?

I'd be happy to give this a shot if everyone feels the same :).

Regards,
Bow


From p.j.a.cock at googlemail.com  Fri Feb  1 11:07:22 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 11:07:22 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CADEGkF5q1oVwimfNAqrHPDvGvHa8JWOsWAeEoAEpZWzGyYGxgw@mail.gmail.com>
References: <CAKVJ-_6RR-gZ+oCmW+N2hBnutvk7OvwxiFDym_y5NZ+KGi0Sow@mail.gmail.com>
	<1359696918.16306.YahooMailClassic@web164005.mail.gq1.yahoo.com>
	<CADEGkF5q1oVwimfNAqrHPDvGvHa8JWOsWAeEoAEpZWzGyYGxgw@mail.gmail.com>
Message-ID: <CAKVJ-_4ZYT37J2aA=JSm0WGMe50gfH6u4G2o6eLTBsEaT9q6LA@mail.gmail.com>

On Fri, Feb 1, 2013 at 10:29 AM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Michiel, Peter, everyone,
>
>>> > I'm OK with using the setUp and tearDown arguments to
>>> > doctest.DocTestSuite to do the directory magic, but
>>> > keeping the test files
>>> > under Tests/.
>>>
>>> As a more elegant version of the Bio._utils.run_doctest()
>>> function?
>>
>> Exactly. Bow, do you want to give this approach a try?
>> (Assuming that there are no further objections from the other developers).
>
> Just to be clear, we are:
>
> * changing all module's doctest file path to use relative paths (with
> respect to the module's location),
> * replacing the run_doctest() import with a simpler doctest import and
> `doctest.testmod()` in each module having this doctest
> * resorting to setUp and tearDown in the DocTestSuite in
> `run_tests.py` so that each module / submodule can find their test
> files

That wasn't my understanding - I thought we were just talking about
making the Bio._utils.run_doctest() use setUp and tearDown to
take care of the path changes (although I'm not sure if that will
actually be any shorter - we'd find out).
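
For reference, a minimal sketch of what the setUp/tearDown approach could
look like with the standard doctest module. The temporary directory stands
in for Tests/, and the "demo" module, the file name and the helper names
are invented for this example; this is not the Bio._utils.run_doctest()
code.

```python
import doctest
import os
import tempfile
import types
import unittest

# Stand-in for the Tests/ directory that holds the example files.
TESTS_DIR = tempfile.mkdtemp()
with open(os.path.join(TESTS_DIR, "example.txt"), "w") as handle:
    handle.write("hello\n")

# Hypothetical module whose docstring opens a file by relative path.
demo = types.ModuleType("demo")
demo.__doc__ = """
>>> with open("example.txt") as handle:
...     print(handle.read().strip())
hello
"""

_saved = {}


def set_up(test):
    # Remember the current directory, then move to where the test files are.
    _saved["cwd"] = os.getcwd()
    os.chdir(TESTS_DIR)


def tear_down(test):
    # Put the working directory back, whether or not the doctest passed.
    os.chdir(_saved["cwd"])


suite = doctest.DocTestSuite(demo, setUp=set_up, tearDown=tear_down)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The directory change lives in one place, so the docstrings themselves can
keep short relative paths.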

> * and refactoring all string functions in Bio._utils to Bio.Phylo and
> Bio.SearchIO, so that we can remove Bio._utils,

I'm not particularly bothered either way on this. Having misc utilities
like this under Bio.Phylo or Bio.SearchIO makes it clear where they
are used, and makes it easier to compartmentalise functionality.

Regards,

Peter


From mjldehoon at yahoo.com  Fri Feb  1 11:23:15 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 1 Feb 2013 03:23:15 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CADEGkF5q1oVwimfNAqrHPDvGvHa8JWOsWAeEoAEpZWzGyYGxgw@mail.gmail.com>
Message-ID: <1359717795.82942.YahooMailClassic@web164001.mail.gq1.yahoo.com>

Hi Bow,

Yes, that is correct.
Responding to Peter's email: Peter, do you agree with this approach?

Best,
-Michiel.

--- On Fri, 2/1/13, Wibowo Arindrarto <w.arindrarto at gmail.com> wrote:

> From: Wibowo Arindrarto <w.arindrarto at gmail.com>
> Subject: Re: [Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone
> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> Cc: "Peter Cock" <p.j.a.cock at googlemail.com>, "BioPython-Dev Mailing List" <biopython-dev at biopython.org>
> Date: Friday, February 1, 2013, 5:29 AM
> Hi Michiel, Peter, everyone,
> 
> >> > I'm OK with using the setUp and tearDown
> arguments to
> >> > doctest.DocTestSuite to do the directory
> magic, but
> >> keeping the test files
> >> > under Tests/.
> >>
> >> As a more elegant version of the
> Bio._utils.run_doctest()
> >> function?
> >
> > Exactly. Bow, do you want to give this approach a try?
> (Assuming that there are no further objections from the
> other developers).
> 
> Just to be clear, we are:
> 
> * changing all module's doctest file path to use relative
> paths (with
> respect to the module's location),
> * replacing the run_doctest() import with a simpler doctest
> import and
> `doctest.testmod()` in each module having this doctest
> * resorting to setUp and tearDown in the DocTestSuite in
> `run_tests.py` so that each module / submodule can find
> their test
> files
> * and refactoring all string functions in Bio._utils to
> Bio.Phylo and
> Bio.SearchIO, so that we can remove Bio._utils,
> 
> right?
> 
> I'd be happy to give this a shot if everyone feels the same
> :).
> 
> Regards,
> Bow
> 


From p.j.a.cock at googlemail.com  Fri Feb  1 11:51:16 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 11:51:16 +0000
Subject: [Biopython-dev] Trie with_prefix doesn't work as expected
In-Reply-To: <CAKVJ-_7i4+MY6OSYkm+_tqbV_ndwCvmGc=nW0gMG9PnoEobyGA@mail.gmail.com>
References: <CAEe6yUEX_xbawFkP74-yKO+eTKo_mAU9SGO-+0PoRwU0mA8=vw@mail.gmail.com>
	<CAKVJ-_5XvS899Oi7=3oDMW713oHnz6nNvEx5C-rRMUa=aashvQ@mail.gmail.com>
	<CAKVJ-_7i4+MY6OSYkm+_tqbV_ndwCvmGc=nW0gMG9PnoEobyGA@mail.gmail.com>
Message-ID: <CAKVJ-_7dyse3-+n9kL8+P9hJysCDqS4q_dg7MKTkhfxOrbxo9A@mail.gmail.com>

On Thu, Jan 31, 2013 at 11:38 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> Thanks to Jeff Chang for a very speedy fix (sent as an attachment off list),
> which I have applied to the repository:
> https://github.com/biopython/biopython/commit/cd7cc7174fd4b0607381e9c58f6ae0d17cca8f74
>
> I've also added a unit test based on Kevin's example:
> https://github.com/biopython/biopython/commit/efc289c8fe2e78ad12481973e42554fa40f2ea0a
>
> Thank you for reporting this Kevin.
>
> Peter
>
> P.S. Nice to hear from you again Jeff :)
>
> I think your last commit was before we moved from CVS to git, please
> let us know if you want commit access on github.

Thanks again to Kevin for another test case, and to Jeff for another quick
code fix where a trie key exceeded the MAX_KEY_LENGTH buffer:

https://github.com/biopython/biopython/commit/31909c8725d5cfbfba2096b7c15ef7afeaf20a5b

Peter


From redmine at redmine.open-bio.org  Fri Feb  1 11:51:51 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Fri, 1 Feb 2013 11:51:51 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15070.20130201115151@redmine.open-bio.org>


Issue #3395 has been updated by Peter Cock.


Kevin Wu reported a related issue, which we discussed with Jeff Chang (off list), where a key in the trie exceeded 1000 bytes (the original value of MAX_KEY_LENGTH). See:
http://lists.open-bio.org/pipermail/biopython-dev/2013-February/010284.html
https://github.com/biopython/biopython/commit/31909c8725d5cfbfba2096b7c15ef7afeaf20a5b

(Ideally we could give a specific ValueError exception here, but nevertheless the current print message is an improvement)
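
A Python-level illustration of the ValueError idea follows. This is a
minimal sketch only: the real check lives in the C code of the trie
module, and check_key_length is a hypothetical helper, not part of
Bio.trie.

```python
MAX_KEY_LENGTH = 1000  # the original limit quoted above


def check_key_length(key, limit=MAX_KEY_LENGTH):
    """Return key unchanged, or raise ValueError if it exceeds limit bytes."""
    if len(key) > limit:
        raise ValueError(
            "trie key is %d bytes long, the limit is %d" % (len(key), limit)
        )
    return key
```

A caller would then see a specific exception naming the offending
length rather than a generic failure message.
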
----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395

Author: Michał Nowotka
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


Imagine I have Biopython trie:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'w')
tr = trie.trie()
# fill in the trie
trie.save(f, tr)

Now /tmp/trie.dat.gz is about 50MB. Let's try to read it:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'r')
tr = trie.load(f)

Unfortunately I'm getting a meaningless error saying:
"loading failed for some reason"

Any hints?


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From p.j.a.cock at googlemail.com  Fri Feb  1 12:14:49 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 12:14:49 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359717795.82942.YahooMailClassic@web164001.mail.gq1.yahoo.com>
References: <CADEGkF5q1oVwimfNAqrHPDvGvHa8JWOsWAeEoAEpZWzGyYGxgw@mail.gmail.com>
	<1359717795.82942.YahooMailClassic@web164001.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_4ngvs=95MwciGZUtv3Q4R+Anw4-=XLgNPJQbBsqM0b-g@mail.gmail.com>

>Wibowo Arindrarto <w.arindrarto at gmail.com> wrote:
>> Just to be clear, we are:
>>
>> * changing all module's doctest file path to use relative
>> paths (with
>> respect to the module's location),
>> * replacing the run_doctest() import with a simpler doctest
>> import and
>> `doctest.testmod()` in each module having this doctest
>> * resorting to setUp and tearDown in the DocTestSuite in
>> `run_tests.py` so that each module / submodule can find
>> their test
>> files
>> * and refactoring all string functions in Bio._utils to
>> Bio.Phylo and
>> Bio.SearchIO, so that we can remove Bio._utils,
>>
>> right?
>>
>> I'd be happy to give this a shot if everyone feels the same
>> :).
>>

On Fri, Feb 1, 2013 at 11:23 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi Bow,
>
> Yes, that is correct.
> Responding to Peter's email: Peter, do you agree with this approach?
>
> Best,
> -Michiel.

No. I think we have misunderstood each other on the doctest
discussion :(

If we keep the test files under Tests/ (and I think that is best)
then for example look at this doctest in Bio/SeqRecord.py

        >>> from Bio import SeqIO
        >>> record = SeqIO.read(open("Fasta/sweetpea.nu"),"fasta")
        >>> len(record)
        309

That is currently written to assume it is run from the Tests/
folder. If we write this assuming it is in the Bio/ folder where
the Python file SeqRecord.py lives, it becomes:

        >>> from Bio import SeqIO
        >>> record = SeqIO.read(open("../Tests/Fasta/sweetpea.nu"),"fasta")
        >>> len(record)
        309

I think a beginner would find that more confusing. It is also longer
and we already have trouble with some lines exceeding 80 chars.

Ideally there would be a nice way for doctests to specify the folder,
and then we could use a simple filename like "sweetpea.nu" with
no directories at all. But I don't think that is possible without us
making the testing infrastructure even more complicated.
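
To make the trade-off concrete, one way to let doctests use a bare
filename would be to inject a small path helper into their namespace.
This is a sketch under assumptions: run_module_doctests and data_file
are hypothetical names, and the doctests would have to be rewritten to
call data_file("sweetpea.nu").

```python
import doctest
import os


def run_module_doctests(module, data_dir):
    """Run module's doctests with a data_file() helper in their namespace."""

    def data_file(name):
        # Resolve a bare filename against the shared test data folder.
        return os.path.join(data_dir, name)

    # extraglobs adds names to the globals each doctest is executed with.
    return doctest.testmod(module, extraglobs={"data_file": data_file})
```

The cost is exactly the extra infrastructure mentioned above: a reader
copying the example into an interpreter would not have data_file
defined.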

--

If we want to get rid of Bio._utils.run_doctest() (and the whole of
the file Bio/_utils.py) then I would prefer reverting to the old situation
prior to adding the Bio._utils.run_doctest() helper function.

If the repetitive code snippets to run the doctests of a module are a
problem, they can be shortened to something less flexible. For example,
Bio/SeqRecord.py could use something very short like this:

if __name__ == "__main__":
    import doctest
    import os
    assert os.path.isfile("Fasta/sweetpea.nu"), "Run from Tests/ folder"
    doctest.testmod(verbose=2)

Or, as I suggested before, we can remove these development
convenience hooks completely?

--

On the subject of the string functions in Bio/_utils.py, I have no
objection to moving them back under Bio.SearchIO and/or
Bio.Phylo - which has advantages in terms of modularity (a
good thing for preventing accidental side effects).

Regards,

Peter


From mjldehoon at yahoo.com  Fri Feb  1 13:54:46 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 1 Feb 2013 05:54:46 -0800 (PST)
Subject: [Biopython-dev] Namespace for online resources?
Message-ID: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com>

Hi Lenna,

> Regarding point (2), is your primary concern namespace clutter or
> importing efficiency? 

Regarding point (2), my primary concern is that a Bio.WWW module would group together modules that don't have much in common with each other. I agree with your point that the category of internet access is more fundamental than the category of parsers. But still, which modules should then go into a Bio.WWW module? Any module whose sole purpose is to use the internet (that would exclude Bio.Entrez)? Any module whose main purpose is to use the internet? This would be unclear; for example, Bio.Entrez may or may not fall in that category, depending on how you use the module. Any module whose functionality includes internet access? Then if one day we add access to the JASPAR database over the internet to Bio.Motif, it would have to move to Bio.WWW.

Currently most modules are organized by theme (Bio.Seq, Bio.Motif, Bio.Cluster, Bio.Phylo, Bio.Entrez, etc.). For each theme, we have one module, one chapter in the documentation, one set of unit tests, one set of doctests, which I think is a huge advantage both in terms of clarity and in terms of user experience.

Best,
-Michiel.

--- On Wed, 1/30/13, Lenna Peterson <arklenna at gmail.com> wrote:

From: Lenna Peterson <arklenna at gmail.com>
Subject: Re: [Biopython-dev] Namespace for online resources?
To: "Michiel de Hoon" <mjldehoon at yahoo.com>
Cc: "Biopython-Dev Mailing List" <biopython-dev at biopython.org>
Date: Wednesday, January 30, 2013, 12:10 PM

Michiel,
You raise an excellent point that separating the modules in this way will complicate doctests.
Regarding point (2), is your primary concern namespace clutter or importing efficiency?

I still maintain that the category of internet access is more fundamental than the category of parsers. For point (1), if every database is accessed using a WWW submodule, a user will know to look there.

Obviously moving everything would be a lot of work...
Cheers,
Lenna


On Tue, Jan 29, 2013 at 9:00 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

Bio.WWW was one of those modules that seem a good idea at first, but then failed to gain general acceptance. There are three problems with Bio.WWW:

1) From the module name, it's not clear what you would find in it. For example, if you want to access the Entrez database, would you first look in Bio.Entrez or in Bio.WWW? Similarly for TAIR: Would you look for it in Bio.TAIR, or in Bio.WWW?

2) The modules in Bio.WWW don't have much to do with each other, except that they access the internet. But any given user probably is mainly interested in Entrez, or ExPASy, or some other database, not in all of them at the same time.

3) The flip side of this is that a user accessing e.g. ExPASy would have to import both Bio.WWW and Bio.ExPASy to be able to use ExPASy. Doctests get more complicated also, as they would span more than one module. Here is an example from Bio.Entrez that accesses the database, and then parses the results:

>>> from Bio import Entrez
>>> Entrez.email = "Your.Name.Here@example.org"
>>> handle = Entrez.einfo() # or esearch, efetch, ...
>>> record = Entrez.read(handle)
>>> handle.close()

The ultimate question is whether we organize the code in Biopython by their functionality from a user perspective, or by the kind of things they do? Almost all of Biopython is organized according to the former. For example, we don't have a Bio.Parsers module for all the parsers; similarly, we don't have Bio.WWW for internet access.

Best,
-Michiel.


--- On Tue, 1/29/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> From: Peter Cock <p.j.a.cock at googlemail.com>
> Subject: Re: [Biopython-dev] Namespace for online resources?
> To: "Wibowo Arindrarto" <w.arindrarto at gmail.com>
> Cc: "Biopython-Dev Mailing List" <biopython-dev at biopython.org>
> Date: Tuesday, January 29, 2013, 4:11 PM
> On Tue, Jan 29, 2013 at 9:03 PM,
> Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > On Tue, Jan 29, 2013 at 7:52 PM, Wibowo Arindrarto
> > <w.arindrarto at gmail.com> wrote:
> >> Hi everyone,
> >>
> >> Why was Bio.WWW deprecated in the first place?
> >>
> >
> > The flippant answer is everything under Bio.WWW was moved
> > or deprecated:
> > http://lists.open-bio.org/pipermail/biopython-dev/2008-July/004059.html
> >
> > I'm trying to identify the discussions prior to that covering the moves:
> >
> > Bio.WWW.ExPASy -> Bio.ExPASy
> > Bio.WWW.InterPro -> Bio.InterPro
> > Bio.WWW.NCBI -> Bio.Entrez
> > Bio.WWW.SCOP -> Bio.SCOP
>
> Probably this thread,
> http://lists.open-bio.org/pipermail/biopython-dev/2007-November/003241.html
>
> Also a bit more background on the NCBI Entrez side:
> http://lists.open-bio.org/pipermail/biopython-dev/2008-February/003423.html
>
> Peter



From p.j.a.cock at googlemail.com  Fri Feb  1 14:14:56 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 14:14:56 +0000
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com>
References: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_6U+8m5DJYJmoKeoS86jP1z9d6ODS66kGfT7AHp9jBhbA@mail.gmail.com>

On Fri, Feb 1, 2013 at 1:54 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi Lenna,
>
>> Regarding point (2), is your primary concern namespace clutter or
>> importing efficiency?
>
> Regarding point (2), my primary concern is that a Bio.WWW module would
> group together modules that don't have much in common with each other. I
> agree with your point that the category of internet access is more fundamental
> than the category of parsers. But still, which modules should then go into a
> Bio.WWW module? Any module whose sole purpose is to use the internet (that
> would exclude Bio.Entrez)? Any module whose main purpose is to use the
> internet? This would be unclear; for example, Bio.Entrez may or may not fall
> in that category, depending on how you use the module. Any module whose
> functionality includes internet access? Then if one day we add access to the
> JASPAR database over the internet to Bio.Motif, it would have to move to
> Bio.WWW.
>
> Currently most modules are organized by theme (Bio.Seq, Bio.Motif,
> Bio.Cluster, Bio.Phylo, Bio.Entrez, etc.). For each theme, we have one
> module, one chapter in the documentation, one set of unit tests, one set of
> doctests, which I think is a huge advantage both in terms of clarity and in
> terms of user experience.

Also with the theme approach, most (if not all) the themes are likely to
have some online resources (databases or remote APIs). On those
grounds it makes sense to keep online motif functionality (like weblogo)
under Bio.Motif, and so on.

People leaning for a Bio.WWW grouping: Bow, Lenna, Kevin
(which could be a big disruption with lots of code relocation)

People leaning against a Bio.WWW grouping: Michiel, Peter (me)
(which would also be the status quo, so no disruption).

In the specific case of Kevin's TAIR code for fetching Arabidopsis sequences,
Bio.TAIR (lower case?) is consistent with current usage. Somewhere under
Bio.Seq* also seems sensible to me, as I wrote at the start of this thread.

Regards,

Peter


From mjldehoon at yahoo.com  Fri Feb  1 14:12:38 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 1 Feb 2013 06:12:38 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <CAKVJ-_4ngvs=95MwciGZUtv3Q4R+Anw4-=XLgNPJQbBsqM0b-g@mail.gmail.com>
Message-ID: <1359727958.56029.YahooMailClassic@web164001.mail.gq1.yahoo.com>

Hi Peter,

As we misunderstood each other, let me try once to make the case for putting test files in Bio/*. If I fail to convince you, let's either go back to the situation before Bio._utils, or remove the "if __name__ == '__main__':" stuff altogether.

First of all, if we use "if __name__ == '__main__':" to run the docstring tests, then those tests should pass if a user executes the script. Otherwise, we have installed some code that makes no sense outside of the distribution. This is also a problem with the
assert os.path.isfile("Fasta/sweetpea.nu"), "Run from Tests/ folder"
solution, as after installation there is no Tests/ folder any more.

Suppose we make a subdirectory Examples in each module that uses docstring tests which need some data files, and put the data files in the Examples subdirectory. The docstring tests are supposed to be simple (full testing is done by the unittests), so the example data files can be tiny.

The docstring tests can then use
>>> record = SeqIO.read(open("Examples/sweetpea.nu"),"fasta")
which is simple enough.
The unit tests can switch to the appropriate directory when running the docstring tests.
A user, finding the example in the docstring tests, can try out the example directly, since the data file is provided together with the relevant module.
And since the data file is in the subdirectory Examples/, there is still some separation between the code and the data.
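
As an illustration of how such a per-module Examples/ folder could be
located independently of the current directory, here is a minimal
sketch. The helper name example_path and the <package>/Examples/ layout
are assumptions for the example, not existing Biopython code.

```python
import os


def example_path(module_file, name):
    """Return the path of an example data file stored beside a module.

    Assumes the hypothetical layout <package>/Examples/<name>.
    """
    here = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(here, "Examples", name)
```

Inside Bio/SeqRecord.py this would be called as
example_path(__file__, "sweetpea.nu"), which would resolve to the same
file both in the source tree and after installation, provided the
Examples folders are installed alongside the code.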

Best,
-Michiel.

--- On Fri, 2/1/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> From: Peter Cock <p.j.a.cock at googlemail.com>
> Subject: Re: [Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone
> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> Cc: "Wibowo Arindrarto" <w.arindrarto at gmail.com>, "BioPython-Dev Mailing List" <biopython-dev at biopython.org>
> Date: Friday, February 1, 2013, 7:14 AM
> >Wibowo Arindrarto <w.arindrarto at gmail.com>
> wrote:
> >> Just to be clear, we are:
> >>
> >> * changing all module's doctest file path to use
> relative
> >> paths (with
> >> respect to the module's location),
> >> * replacing the run_doctest() import with a simpler
> doctest
> >> import and
> >> `doctest.testmod()` in each module having this
> doctest
> >> * resorting to setUp and tearDown in the
> DocTestSuite in
> >> `run_tests.py` so that each module / submodule can
> find
> >> their test
> >> files
> >> * and refactoring all string functions in
> Bio._utils to
> >> Bio.Phylo and
> >> Bio.SearchIO, so that we can remove Bio._utils,
> >>
> >> right?
> >>
> >> I'd be happy to give this a shot if everyone feels
> the same
> >> :).
> >>
> 
> On Fri, Feb 1, 2013 at 11:23 AM, Michiel de Hoon <mjldehoon at yahoo.com>
> wrote:
> > Hi Bow,
> >
> > Yes, that is correct.
> > Responding to Peter's email: Peter, do you agree with
> this approach?
> >
> > Best,
> > -Michiel.
> 
> No. I think we have misunderstood each other on the doctest
> discussion :(
> 
> If we keep the test files under Tests/ (and I think that is
> best)
> then for example look at this doctest in Bio/SeqRecord.py
> 
>         >>> from Bio import SeqIO
>         >>> record = SeqIO.read(open("Fasta/sweetpea.nu"),"fasta")
>         >>> len(record)
>         309
> 
> That is currently written to assume it is run from the
> Tests/
> folder. If we write this assuming it is in the Bio/ folder
> where
> the Python file SeqRecord.py lives, it becomes:
> 
>         >>> from Bio import SeqIO
>         >>> record = SeqIO.read(open("../Tests/Fasta/sweetpea.nu"),"fasta")
>         >>> len(record)
>         309
> 
> I think a beginner would find that more confusing. It is
> also longer
> and we already have trouble with some lines exceeding 80
> chars.
> 
> Ideally there would be a nice way for doctests to specify
> the folder,
> and then we could use a simple filename like "sweetpea.nu"
> with
> no directories at all. But I don't think that is possible
> without us
> making the testing infrastructure even more complicated.
> 
> --
> 
> If we want to get rid of Bio._utils.run_doctest() (and the
> whole of
> the file Bio/_utils.py) then I would prefer reverting to the
> old situation
> prior to adding the Bio._utils.run_doctest() helper
> function.
> 
> If the repetitive code snippets to run the doctests of a
> module are a
> problem it can be shortened to something less flexible, for
> example
> in Bio/SeqRecord.py could use something very short like
> this:
> 
> if __name__ == "__main__":
>     assert os.path.isfile("Fasta/sweetpea.nu"), "Run from Tests/ folder"
>     import doctest
>     doctest.testmod(verbose=2)
> 
> Or, as I suggested before, we can remove these development
> convenience hooks completely?
> 
> --
> 
> On the subject of the string functions in Bio/_utils.py, I
> have no
> objection to moving them back under Bio.SearchIO and/or
> Bio.Phylo - which has advantages in terms of modularity (a
> good thing for preventing accidental side effects).
> 
> Regards,
> 
> Peter
> 


From p.j.a.cock at googlemail.com  Fri Feb  1 14:32:46 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 14:32:46 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport,
	Bio.Blast.NCBIStandalone
In-Reply-To: <1359727958.56029.YahooMailClassic@web164001.mail.gq1.yahoo.com>
References: <CAKVJ-_4ngvs=95MwciGZUtv3Q4R+Anw4-=XLgNPJQbBsqM0b-g@mail.gmail.com>
	<1359727958.56029.YahooMailClassic@web164001.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_4amfbk6tDLva0eW1TqixF4wNSSoEqx+yE8xJNzT-UYHg@mail.gmail.com>

On Fri, Feb 1, 2013 at 2:12 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi Peter,
>
> As we misunderstood each other, let me try once to make the case for
> putting test files in Bio/*. If I fail to convince you, let's either go back
> to the situation before Bio._utils, or remove the "if __name__ ==
> '__main__':" stuff altogether.

I'm not convinced about putting test files under Bio/* so let's revert
the use of the helper function Bio._utils.run_doctest(), and if you
wish proceed with removing Bio/_utils.py as well.

Shall I go ahead and revert 8b59d89bb4e282192ddee751e24ceef4afa63528
then remove run_doctest and find_test_dir from Bio/_utils.py now?

> First of all, if we use "if __name__ == '__main__':" to run the docstring
> tests, then those tests should pass if a user executes the script.
> Otherwise, we have installed some code that makes no sense outside of the
> distribution. This is also a problem with the
> os.path.isfile("Fasta/sweetpea.nu"), "Run from Tests/ folder"
> solution, as after installation there is no Tests/ folder any more.

That is a good point, this has always been a weakness of the __main__
hook to run the doctests.

> Suppose we make a subdirectory Examples in each module that uses docstring
> tests which need some data files, and put the data files in the Examples
> subdirectory. The docstring tests are supposed to be simple (full testing is
> done by the unittests), so the example data files can be tiny.
>
> The docstring tests can then use
> >>> record = SeqIO.read(open("Examples/sweetpea.nu"),"fasta")
> which is simple enough.
> The unit tests can switch to the appropriate directory when running the
> docstring tests.
> A user, finding the example in the docstring tests, can try out the
> example directly, since the data file is provided together with the relevant
> module.
> And since the data file is in the subdirectory Examples/, there is still
> some separation between the code and the data.

Did you envision installing the examples subdirectories next to the code
under site-packages? Technically that is doable, but I'm not sure if that
is considered good practice (does anyone know the relevant Debian
policies for example - they're quite keen on this kind of thing?).

I much prefer the simplicity of having all the test files in one place
(under the Tests/ folder) especially as things like simple FASTA files
get used in doctests and unittests for multiple different areas of
Biopython.

Regards,

Peter


From p.j.a.cock at googlemail.com  Fri Feb  1 14:56:02 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 14:56:02 +0000
Subject: [Biopython-dev] Bio.Motif update
In-Reply-To: <1359730386.17784.YahooMailClassic@web164005.mail.gq1.yahoo.com>
References: <CAKVJ-_5X3eoXEnqD7yfTGFW1Saxr7rMe-WcbCofmCqdu_yq6KA@mail.gmail.com>
	<1359730386.17784.YahooMailClassic@web164005.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_7OPbA5pJXF6KX3D3Zk369thAbK0iUi2DMND4XYhpZxbQ@mail.gmail.com>

On Fri, Feb 1, 2013 at 2:53 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi Peter and all,
>
> --- On Tue, 1/29/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> We need to say something about this in the NEWS file too.
>
> Done.
>
>> I think it would make sense to add a PendingDeprecationWarning
>> to Bio.Motif now.
>
> Done.

Thanks.

>> Also, if you feel the new Bio.motifs API isn't quite
>> settled yet, adding the new BiopythonExperimentalWarning to
>> that makes sense.
>
> I don't expect big changes in the API, so I think we can do without the
> BiopythonExperimentalWarning. Also we should avoid the situation
> that Bio.Motif gives a DeprecationWarning, and Bio.motifs gives a
> BiopythonExperimentalWarning.

Agreed.

>> (And once this is settled, I think we can schedule the
>> release)
>
> We should also check whether we can remove deprecated modules,
> or deprecate modules that are currently declared obsolete. Or has
> somebody done that already?

I went over the list in the DEPRECATED file last month, but a second
check would be a good idea.

Peter


From mjldehoon at yahoo.com  Fri Feb  1 14:53:06 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 1 Feb 2013 06:53:06 -0800 (PST)
Subject: [Biopython-dev] Bio.Motif update
In-Reply-To: <CAKVJ-_5X3eoXEnqD7yfTGFW1Saxr7rMe-WcbCofmCqdu_yq6KA@mail.gmail.com>
Message-ID: <1359730386.17784.YahooMailClassic@web164005.mail.gq1.yahoo.com>

Hi Peter and all,

--- On Tue, 1/29/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> We need to say something about this in the NEWS file too.

Done.

> I think it would make sense to add a PendingDeprecationWarning
> to Bio.Motif now.

Done.

> Also, if you feel the new Bio.motifs API isn't quite
> settled yet, adding the new BiopythonExperimentalWarning to
> that makes sense.

I don't expect big changes in the API, so I think we can do without the BiopythonExperimentalWarning. Also we should avoid the situation that Bio.Motif gives a DeprecationWarning, and Bio.motifs gives a BiopythonExperimentalWarning.
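
For readers unfamiliar with the mechanism, a pending deprecation is
typically raised with the standard warnings module, roughly as in the
sketch below. The helper name warn_pending_deprecation and the message
wording are hypothetical and not the text used in Bio.Motif.

```python
import warnings


def warn_pending_deprecation(old_name, new_name):
    """Warn that old_name is on its way out in favour of new_name."""
    warnings.warn(
        "%s is being replaced by %s and will be deprecated in a future release."
        % (old_name, new_name),
        PendingDeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this helper
    )
```

PendingDeprecationWarning is normally hidden by Python's default
warning filters, so only users who enable warnings would see it at
this stage.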

> (And once this is settled, I think we can schedule the
> release)

We should also check whether we can remove deprecated modules, or deprecate modules that are currently declared obsolete. Or has somebody done that already?

Best,
-Michiel


From p.j.a.cock at googlemail.com  Fri Feb  1 15:03:24 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 15:03:24 +0000
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
Message-ID: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>

Hello all,

I think we're overdue for a Biopython release now, and I would
like to do this next week. There are still plenty more additions
and enhancements waiting in the wings, but right now I just
want any remaining bug fixes addressed.

Are there any release blocking issues?

Thanks,

Peter


From w.arindrarto at gmail.com  Fri Feb  1 15:29:09 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 1 Feb 2013 16:29:09 +0100
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
Message-ID: <CADEGkF5xTKcvPtTUJsVfFXphOrS3k8BHf8_foLeovShA2gjHJg@mail.gmail.com>

Hi Peter,

> I think we're overdue for a Biopython release now, and I would
> like to do this next week. There are still plenty more additions
> and enhancements waiting in the wings, but right now I just
> want any remaining bug fixes addressed.
>
> Are there any release blocking issues?

There's still one bug for Bio.SearchIO that I would prefer to be fixed
(https://redmine.open-bio.org/issues/3400). Is it possible to wait a
few more days (no later than next week I hope) to sort this bug out?

Also, since this is our first release with the
BiopythonExperimentalWarning, I was thinking maybe we can include some
modules that have been in the waiting line. One that I can think of
right now is Andrew's MafIO (re: the recent mention as well).
Considering that some people have started using it, maybe we can
release it under a BiopythonExperimentalWarning.

And later down the line, perhaps we can include Brad's GTF/GFF parser
(seeing that this is already included in the wiki, maybe it's a good
time to start considering where to put it)? Brad, if you don't mind,
perhaps we can start working on this as well.

Regards,
Bow


From p.j.a.cock at googlemail.com  Fri Feb  1 15:40:03 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 15:40:03 +0000
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <CADEGkF5xTKcvPtTUJsVfFXphOrS3k8BHf8_foLeovShA2gjHJg@mail.gmail.com>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
	<CADEGkF5xTKcvPtTUJsVfFXphOrS3k8BHf8_foLeovShA2gjHJg@mail.gmail.com>
Message-ID: <CAKVJ-_5UvtrCVkHVYqDdw7xwbX_h7cqPMKsV9UAHJWJY1NM7Mw@mail.gmail.com>

On Fri, Feb 1, 2013 at 3:29 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Peter,
>
>> I think we're overdue for a Biopython release now, and I would
>> like to do this next week. There are still plenty more additions
>> and enhancements waiting in the wings, but right now I just
>> want any remaining bug fixes addressed.
>>
>> Are there any release blocking issues?
>
> There's still one bug for Bio.SearchIO that I would prefer to be fixed
> (https://redmine.open-bio.org/issues/3400). Is it possible to wait a
> few more days (no later than next week I hope) to sort this bug out?

A few days sure - but that is a small enough issue (and in a clearly
marked 'here be dragons experimental code' section) that I don't think
it should delay the whole release.

> Also, since this is our first release with the
> BiopythonExperimentalWarning, I was thinking maybe we can include some
> modules that have been in the waiting line. One that I can think of
> right now is Andrew's MafIO (re: the recent mention as well).
> Considering that some people have started using it, maybe we can
> release it under a BiopythonExperimentalWarning.
>
> And later down the line, perhaps we can include Brad's GTF/GFF parser
> (seeing that this is already included in the wiki, maybe it's a good
> time to start considering where to put it)? Brad, if you don't mind,
> perhaps we can start working on this as well.

Both examples of things I would like to do *after* shipping
Biopython 1.61, which I feel is already overdue.

Regards,

Peter


From mjldehoon at yahoo.com  Fri Feb  1 15:39:15 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 1 Feb 2013 07:39:15 -0800 (PST)
Subject: [Biopython-dev] Bio.Motif update
In-Reply-To: <CAKVJ-_7OPbA5pJXF6KX3D3Zk369thAbK0iUi2DMND4XYhpZxbQ@mail.gmail.com>
Message-ID: <1359733155.26451.YahooMailClassic@web164006.mail.gq1.yahoo.com>

--- On Fri, 2/1/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> I went over the list in the DEPRECATED file last month, but
> a second check would be a good idea.

The following were declared obsolete in Biopython 1.60, and can in principle be declared deprecated in Biopython 1.61:

----------
Bio/Blast/Applications.py:
BlastallCommandline
BlastpgpCommandline
RpsBlastCommandline

Bio/Blast/NCBIStandalone.py overall, and specifically:
blastall
blastpgp
rpsblast

Bio/ParserSupport.py overall

Bio/PDB/AbstractPropertyMap.py:
The has_key function in class AbstractPropertyMap

Bio/PDB/FragmentMapper.py:
The has_key function in class FragmentMapper

Bio/UniGene/UniGene.py overall

In BioSQL/BioSeqDatabase.py:
  class DBServer:
     remove_database
  class BioSeqDatabase:
     get_all_primary_ids
     get_Seq_by_primary_id

-----------

These functions were deprecated in Biopython 1.59 or earlier, and could be removed for Biopython 1.61:

Bio/Align/__init__.py:
  class MultipleSeqAlignment:
     get_column
     add_sequence

Bio/Align/Generic.py:
  class Alignment overall
    get_all_seqs
    get_seq_by_num

Bio/File.py:
  class StringHandle

Bio/Graphics/GenomeDiagram/_AbstractDrawer.py:
  class AbstractDrawer:
    _set_xcentre, _set_ycentre

Bio/Graphics/GenomeDiagram/_Graph.py:
  class GraphData:
    _set_centre

Bio/ParserSupport.py:
  SGMLStrippingConsumer

Bio/Seq.py:
  class Seq:
     .data property

Bio/SeqIO/SffIO.py:
  _sff_read_roche_index_xml
 
--------------------

The tostring() method of the class Seq in Bio/Seq.py:
Can we declare this obsolete?

-Michiel


From w.arindrarto at gmail.com  Fri Feb  1 15:47:14 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 1 Feb 2013 16:47:14 +0100
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <510BE201.4090002@biotech.uni-tuebingen.de>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
	<CADEGkF5xTKcvPtTUJsVfFXphOrS3k8BHf8_foLeovShA2gjHJg@mail.gmail.com>
	<510BE201.4090002@biotech.uni-tuebingen.de>
Message-ID: <CADEGkF6kbXTsXcZXBKHxAM0yV8XO96d8FNSjFhMNen1pw8HYNw@mail.gmail.com>

Hi Peter, Kai,


>> There's still one bug for Bio.SearchIO that I would prefer to be
>> fixed (https://redmine.open-bio.org/issues/3400). Is it possible to
>> wait a few more days (no later than next week I hope) to sort this
>> bug out?
>
> Sorry for letting this slip for so long, but I never got around to
> writing an actual test case.
>
> Bow, did we agree to use optionalcascade for now and then maybe
> refactor? I'm pretty confident the code works as-is, all the BioPython
> issues I've been running into with my production site have been in the
> GenBank/EMBL parsers. :)

Yes, we did :). I meant to do the optionalcascade refactor so the code
is more maintainable (and to prevent a corner case bug). But in
general, the optionalcascade fix looks to be fine. And for code marked
with the BiopythonExperimentalWarning, having a fix without test cases
seems better than no fix at all.

Peter, if you're fine with Kai's fix, I think we can mark this bug
solved and go on with the release. I'll add the test cases and
refactor the code later on.

Regards,
Bow


From p.j.a.cock at googlemail.com  Fri Feb  1 15:51:07 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 15:51:07 +0000
Subject: [Biopython-dev] Bio.Motif update
In-Reply-To: <1359733155.26451.YahooMailClassic@web164006.mail.gq1.yahoo.com>
References: <CAKVJ-_7OPbA5pJXF6KX3D3Zk369thAbK0iUi2DMND4XYhpZxbQ@mail.gmail.com>
	<1359733155.26451.YahooMailClassic@web164006.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_7XZ1F9aq7b+DAKyDCr95rAtRBw7aNFqjPksdQJCzJAdw@mail.gmail.com>

On Fri, Feb 1, 2013 at 3:39 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> --- On Fri, 2/1/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> I went over the list in the DEPRECATED file last month, but
>> a second check would be a good idea.
>
> The following were declared obsolete in Biopython 1.60, and can
> in principle be declared deprecated in Biopython 1.61:
>
> ----------
> Bio/Blast/Applications.py:
> BlastallCommandline
> BlastpgpCommandline
> RpsBlastCommandline

My impression is that there is still a sizeable group of people using
blastall and the rest of legacy BLAST as it is mature reliable code,
while BLAST+ still has some rough edges. But as the NCBI themselves
have now stopped updating legacy BLAST, perhaps the time has come.

So if you want, deprecating the legacy BLAST wrappers seems OK.

> Bio/Blast/NCBIStandalone.py overall, and specifically:
> blastall
> blastpgp
> rpsblast

Given the SearchIO use of the plain text BLAST parser, I think we
agreed to leave that as is in the short term.

For the command line calling functions blastall, blastpgp & rpsblast,
the same applies as for BlastallCommandline, BlastpgpCommandline
and RpsBlastCommandline (above).

> Bio/ParserSupport.py overall

Given the SearchIO use of the plain text BLAST parser which uses
this, I think we agreed to leave that as is in the short term.

> Bio/PDB/AbstractPropertyMap.py:
> The has_key function in class AbstractPropertyMap
>
> Bio/PDB/FragmentMapper.py:
> The has_key function in class FragmentMapper

The Python dict lost the has_key function in Python 3, so it does
make sense to proceed with those deprecations.
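
For illustration, a minimal sketch of how such a has_key deprecation could
look. The PropertyMap class here is a hypothetical stand-in, not the actual
Bio.PDB code, and it uses the standard DeprecationWarning rather than any
Biopython-specific warning class:

```python
import warnings


class PropertyMap(object):
    """Hypothetical stand-in for a dict-like class such as AbstractPropertyMap."""

    def __init__(self, data):
        self._data = dict(data)

    def __contains__(self, key):
        # Supports the "key in mapping" idiom, which works on Python 2 and 3.
        return key in self._data

    def has_key(self, key):
        # Deprecated spelling: warn the caller, then delegate to __contains__.
        warnings.warn(
            "has_key is deprecated; use 'key in mapping' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return key in self


prop_map = PropertyMap({"A": 1.0})

# Record warnings so the deprecation can be inspected rather than printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = prop_map.has_key("A")

new_style = "A" in prop_map
```

Both spellings give the same answer; only the old one triggers a warning,
so existing user code keeps working for a release or two.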

> Bio/UniGene/UniGene.py overall
>

Yes, ready to deprecate.

> In BioSQL/BioSeqDatabase.py:
>   class DBServer:
>      remove_database
>   class BioSeqDatabase:
>      get_all_primary_ids
>      get_Seq_by_primary_id

Yes, ready to deprecate.

Thanks,

Peter


From p.j.a.cock at googlemail.com  Fri Feb  1 16:02:33 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 16:02:33 +0000
Subject: [Biopython-dev] Bugzilla --> RedMine --> GitHub issues?
Message-ID: <CAKVJ-_7U8HW4wa657oEYsR=vC=+cXV1O1nREps118O6F1uYjTQ@mail.gmail.com>

On Fri, Feb 1, 2013 at 3:40 PM, Kai Blin
<kai.blin at biotech.uni-tuebingen.de> wrote:
>
> PS: I'd have replied on the bug tracker, but for some reason I can't
> log in again, even after resetting the password. For some reason
> redmine hates me.
>

Biopython used to use Bugzilla, at http://bugzilla.open-bio.org/
(it was left as a read only legacy listing, but it broke last year when
the old server started to die and isn't really worth fixing).

This was moved over to RedMine, along with all the other OBF
projects. This does have some git integration, but I'm not that
taken with it - and it is yet another service for the OBF team
to maintain.

What do people think of moving over to using GitHub issues?
This would link in very well with pull requests and makes linking
to commits much simpler too. One potential issue is if and how
we could have bug reports sent to the biopython-dev mailing list
(something we touched on recently for pull requests).

A full automated move could be possible (NumPy did this), but I
think a gradual move would be fine - stop filing new issues on
RedMine and use GitHub issues in future. There are only about
100 issues open at the moment anyway, and a manual migration
would also be a good way to review some of the older tickets.

Thoughts?,

Peter


From p.j.a.cock at googlemail.com  Fri Feb  1 16:04:10 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 16:04:10 +0000
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <CADEGkF6kbXTsXcZXBKHxAM0yV8XO96d8FNSjFhMNen1pw8HYNw@mail.gmail.com>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
	<CADEGkF5xTKcvPtTUJsVfFXphOrS3k8BHf8_foLeovShA2gjHJg@mail.gmail.com>
	<510BE201.4090002@biotech.uni-tuebingen.de>
	<CADEGkF6kbXTsXcZXBKHxAM0yV8XO96d8FNSjFhMNen1pw8HYNw@mail.gmail.com>
Message-ID: <CAKVJ-_7hzYHSc_xr5WJH2VyGQ=puzbrfkZB-qjghfhJv30T_-Q@mail.gmail.com>

On Fri, Feb 1, 2013 at 3:47 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Peter, Kai,
>
>
>>> There's still one bug for Bio.SearchIO that I would prefer to be
>>> fixed (https://redmine.open-bio.org/issues/3400). Is it possible to
>>> wait a few more days (no later than next week I hope) to sort this
>>> bug out?
>>
>> Sorry for letting this slip for so long, but I never got around to
>> writing an actual test case.
>>
>> Bow, did we agree to use optionalcascade for now and then maybe
>> refactor? I'm pretty confident the code works as-is, all the BioPython
>> issues I've been running into with my production site have been in the
>> GenBank/EMBL parsers. :)
>
> Yes, we did :). I meant to do the optionalcascade refactor so the code
> is more maintainable (and to prevent a corner case bug). But in
> general, the optionalcascade fix looks to be fine. And for code marked
> with the BiopythonExperimentalWarning, having a fix without test cases
> seems better than no fix at all.

That sounds OK for now.

> Peter, if you're fine with Kai's fix, I think we can mark this bug
> solved and go on with the release. I'll add the test cases and
> refactor the code later on.

You mean this patch from https://redmine.open-bio.org/issues/3400 ?:
https://redmine.open-bio.org/attachments/1754/0001-SearchIO-Add-optionalcascade-getter-setter-to-allow-.patch

I can apply that if you want.

Peter


From redmine at redmine.open-bio.org  Fri Feb  1 16:04:20 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Fri, 1 Feb 2013 16:04:20 +0000
Subject: [Biopython-dev] [Biopython - Bug #3405] (New) to_networkx converts
	weights as string
Message-ID: <redmine.issue-3405.20130201160420@redmine.open-bio.org>


Issue #3405 has been reported by Aleksey Kladov.

----------------------------------------
Bug #3405: to_networkx converts weights as string
https://redmine.open-bio.org/issues/3405

Author: Aleksey Kladov
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


in the file /Bio/Phylo/_utils.py in the method add_edge(graph, n1, n2) there is a line

<pre> graph.add_edge(n1, n2, weight=str(n2.branch_length or 1.0)) </pre>


It's strange, because if weights are strings, then you are unable to find shortest paths due to

<pre>TypeError: unsupported operand type(s) for +: 'int' and 'str'</pre>
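
A small sketch of the failure and one possible fix, using plain Python
instead of networkx. The Clade class and the edge_weight helper are
hypothetical stand-ins, and the real fix in Bio.Phylo may differ:

```python
class Clade(object):
    """Hypothetical minimal stand-in for a tree node with a branch length."""

    def __init__(self, branch_length=None):
        self.branch_length = branch_length


def edge_weight(clade):
    # Keep the weight numeric; fall back to 1.0 when no branch length is set
    # (same "or 1.0" fallback as the line quoted in the report).
    return float(clade.branch_length or 1.0)


clades = [Clade(0.5), Clade(None), Clade(2)]

# String weights, as produced by str(n2.branch_length or 1.0).
string_weights = [str(c.branch_length or 1.0) for c in clades]
try:
    sum(string_weights)  # sum() starts from the int 0, so 0 + "0.5" fails
    string_error = None
except TypeError as exc:
    string_error = str(exc)

# Numeric weights can be added up, as a shortest-path search needs to do.
total = sum(edge_weight(c) for c in clades)
```

Any algorithm that accumulates path lengths hits the same TypeError with
string weights, which is why numeric weights seem the safer default.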




From kai.blin at biotech.uni-tuebingen.de  Fri Feb  1 15:40:49 2013
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Fri, 01 Feb 2013 16:40:49 +0100
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <CADEGkF5xTKcvPtTUJsVfFXphOrS3k8BHf8_foLeovShA2gjHJg@mail.gmail.com>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
	<CADEGkF5xTKcvPtTUJsVfFXphOrS3k8BHf8_foLeovShA2gjHJg@mail.gmail.com>
Message-ID: <510BE201.4090002@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2013-02-01 16:29, Wibowo Arindrarto wrote:

> There's still one bug for Bio.SearchIO that I would prefer to be
> fixed (https://redmine.open-bio.org/issues/3400). Is it possible to
> wait a few more days (no later than next week I hope) to sort this
> bug out?

Sorry for letting this slip for so long, but I never got around to
writing an actual test case.

Bow, did we agree to use optionalcascade for now and then maybe
refactor? I'm pretty confident the code works as-is, all the BioPython
issues I've been running into with my production site have been in the
GenBank/EMBL parsers. :)

Cheers,
Kai

PS: I'd have replied on the bug tracker, but for some reason I can't
log in again, even after resetting the password. For some reason
redmine hates me.


- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJRC+IBAAoJEKM5lwBiwTTPlisH/1QSF+4jIx2jKycRCys1NPMj
6YwFTdKoGmIDYjEB+qge5PKNIHplN3EsGz6l4bRYZiWbqTlyvb5IUPHgwxFRigXg
VuSnR/k8faSLNuGJpoFezLmZ0yJoLslXztCUJ+HbWXB02K9uzYXovRg8AtfHlnOu
Qd9aNbyX/nzFrsayllTvYy9ZxcQNCH5Lrgm+EWMkuBptcMdBLjqSGkov5iE2g1bV
ItHacrQUPJXVIAMTXW9mSy3AXzTqjOjqfBwXsthLSyXHEv8ppcnIi4bmVX+XS//n
4vc+LdaxzgkENaw4P+60bikkFqey/GFoxaIzLACh4HFupRAjK+6NaUzGYPSEQXM=
=efd0
-----END PGP SIGNATURE-----


From p.j.a.cock at googlemail.com  Fri Feb  1 16:25:56 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 16:25:56 +0000
Subject: [Biopython-dev] Bugzilla --> RedMine --> GitHub issues?
In-Reply-To: <CAKVJ-_7U8HW4wa657oEYsR=vC=+cXV1O1nREps118O6F1uYjTQ@mail.gmail.com>
References: <CAKVJ-_7U8HW4wa657oEYsR=vC=+cXV1O1nREps118O6F1uYjTQ@mail.gmail.com>
Message-ID: <CAKVJ-_7zag-sKzmUE7W-dg86Nqkk4YcBMtr66N2ZZ2OhJgR7Fg@mail.gmail.com>

On Fri, Feb 1, 2013 at 4:02 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> What do people think of moving over to using GitHub issues?
> This would link in very well with pull requests and makes linking
> to commits much simpler too. One potential issue is if and how
> we could have bug reports sent to the biopython-dev mailing list
> (something we touched on recently for pull requests).
>

I've filed an issue for that (I couldn't find any open issue like it):
https://github.com/gitlabhq/gitlabhq/issues/2884

Peter


From kai.blin at biotech.uni-tuebingen.de  Fri Feb  1 16:27:13 2013
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Fri, 01 Feb 2013 17:27:13 +0100
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <510BEAEF.4070108@biotech.uni-tuebingen.de>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
	<CADEGkF5xTKcvPtTUJsVfFXphOrS3k8BHf8_foLeovShA2gjHJg@mail.gmail.com>
	<510BE201.4090002@biotech.uni-tuebingen.de>
	<CADEGkF6kbXTsXcZXBKHxAM0yV8XO96d8FNSjFhMNen1pw8HYNw@mail.gmail.com>
	<CAKVJ-_7hzYHSc_xr5WJH2VyGQ=puzbrfkZB-qjghfhJv30T_-Q@mail.gmail.com>
	<510BEAEF.4070108@biotech.uni-tuebingen.de>
Message-ID: <510BECE1.4020306@biotech.uni-tuebingen.de>

On 2013-02-01 17:18, Kai Blin wrote:

> That's not quite it. Let me update my bug3400 branch and submit a
> pull request. Will be ready in a minute.

https://github.com/biopython/biopython/pull/150

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From kai.blin at biotech.uni-tuebingen.de  Fri Feb  1 16:18:55 2013
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Fri, 01 Feb 2013 17:18:55 +0100
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <CAKVJ-_7hzYHSc_xr5WJH2VyGQ=puzbrfkZB-qjghfhJv30T_-Q@mail.gmail.com>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
	<CADEGkF5xTKcvPtTUJsVfFXphOrS3k8BHf8_foLeovShA2gjHJg@mail.gmail.com>
	<510BE201.4090002@biotech.uni-tuebingen.de>
	<CADEGkF6kbXTsXcZXBKHxAM0yV8XO96d8FNSjFhMNen1pw8HYNw@mail.gmail.com>
	<CAKVJ-_7hzYHSc_xr5WJH2VyGQ=puzbrfkZB-qjghfhJv30T_-Q@mail.gmail.com>
Message-ID: <510BEAEF.4070108@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2013-02-01 17:04, Peter Cock wrote:

Hi Peter,

>> Peter, if you're fine with Kai's fix, I think we can mark this
>> bug solved and go on with the release. I'll add the test cases
>> and refactor the code later on.
> 
> You mean this patch from https://redmine.open-bio.org/issues/3400
> ?: 
> https://redmine.open-bio.org/attachments/1754/0001-SearchIO-Add-optionalcascade-getter-setter-to-allow-.patch
>
>  I can apply that if you want.

That's not quite it. Let me update my bug3400 branch and submit a pull
request. Will be ready in a minute.

Cheers,
Kai

- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJRC+rvAAoJEKM5lwBiwTTPYH4H+QGiY5cyN7tFjT2RZGN28Pp8
2t/RbW9bYakVqKHtZR6xXu4QF48jCmHkkER0cMvDuKcWrjko/xAWSGuNqWK59rHe
b7t9CgGywYC9KdhPih+pG5HzKuc9ZP1c2unK/e+c+y8rrFZTUoB1e2AbGqzg163S
qplu0RGv8kSOMXmGVFNj+iZ/AJnN735Tp5gfzFHfudS13kzfqW+Mq1+DlSG1GOwM
Y99kc6Uc5WFHmHME4pDdlLBGyKVd+9LlQnTeApBjWnBDcRBMyXI0HIck6Bw64swH
BvPIz2yq3PEnhvgI0v0A9lO1xR0Yj9wGQGr8XGPLq0UHh0W0O0P1I8YbMCVHkPg=
=kCtp
-----END PGP SIGNATURE-----


From p.j.a.cock at googlemail.com  Fri Feb  1 16:50:57 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 16:50:57 +0000
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
Message-ID: <CAKVJ-_5tQZ_H=ewFUyvOSi=aawBxC7hXvqxhbsxFqHVuCOKbWw@mail.gmail.com>

On Fri, Feb 1, 2013 at 3:03 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hello all,
>
> I think we're overdue for a Biopython release now, and I would
> like to do this next week. There are still plenty more additions
> and enhancements waiting in the wings, but right now I just
> want any remaining bug fixes addressed.
>
> Are there any release blocking issues?
>
> Thanks,
>
> Peter

I won't have time to look at it today, but the BLAST+ wrappers
need updating for the BLAST 2.2.27+ release, e.g. new arg
frame_shift_penalty (checked via test_NCBI_BLAST_tools.py).

Any volunteers? This should be a small job...

Peter


From w.arindrarto at gmail.com  Fri Feb  1 17:37:57 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 1 Feb 2013 18:37:57 +0100
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <CAKVJ-_5tQZ_H=ewFUyvOSi=aawBxC7hXvqxhbsxFqHVuCOKbWw@mail.gmail.com>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
	<CAKVJ-_5tQZ_H=ewFUyvOSi=aawBxC7hXvqxhbsxFqHVuCOKbWw@mail.gmail.com>
Message-ID: <CADEGkF74OKWSSKHMEV8825_C64j4LqLNqqUCYyYz=OQ8zMS45A@mail.gmail.com>

Hi Peter,

>> I think we're overdue for a Biopython release now, and I would
>> like to do this next week. There are still plenty more additions
>> and enhancements waiting in the wings, but right now I just
>> want any remaining bug fixes addressed.
>>
>> Are there any release blocking issues?
>>
>> Thanks,
>>
>> Peter
>
> I won't have time to look at it today, but the BLAST+ wrappers
> need updating for the BLAST 2.2.27+ release, e.g. new arg
> frame_shift_penalty (checked via test_NCBI_BLAST_tools.py).
>
> Any volunteers? This should be a small job...

I've submitted a pull request here:
https://github.com/biopython/biopython/pull/151


From w.arindrarto at gmail.com  Fri Feb  1 17:43:23 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 1 Feb 2013 18:43:23 +0100
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <CADEGkF74OKWSSKHMEV8825_C64j4LqLNqqUCYyYz=OQ8zMS45A@mail.gmail.com>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
	<CAKVJ-_5tQZ_H=ewFUyvOSi=aawBxC7hXvqxhbsxFqHVuCOKbWw@mail.gmail.com>
	<CADEGkF74OKWSSKHMEV8825_C64j4LqLNqqUCYyYz=OQ8zMS45A@mail.gmail.com>
Message-ID: <CADEGkF4AK5npXVtD1LEV=OZFebiDn1O7Yd+qZxdtpPVuer43pg@mail.gmail.com>

> Hi Peter,
>
>>> I think we're overdue for a Biopython release now, and I would
>>> like to do this next week. There are still plenty more additions
>>> and enhancements waiting in the wings, but right now I just
>>> want any remaining bug fixes addressed.
>>>
>>> Are there any release blocking issues?
>>>
>>> Thanks,
>>>
>>> Peter
>>
>> I won't have time to look at it today, but the BLAST+ wrappers
>> need updating for the BLAST 2.2.27+ release, e.g. new arg
>> frame_shift_penalty (checked via test_NCBI_BLAST_tools.py).
>>
>> Any volunteers? This should be a small job...
>
> I've submitted a pull request here:
> https://github.com/biopython/biopython/pull/151

Whoops, sorry for sending an incomplete mail ~ I wanted to add that
test_NCBI_BLAST_tools.py doesn't correctly detect my BLAST
installation (even though I have it). I had to comment out the
"Install BLAST+ ..." notice and the rpsblast test (for some reason it
keeps saying I don't have rpsblast too, even though I do). Anyway,
these are not in the pull request, just something I did when writing
this fix.

Could you confirm that the fixes are OK?

Hope that helps,
Bow


From w.arindrarto at gmail.com  Fri Feb  1 17:48:09 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 1 Feb 2013 18:48:09 +0100
Subject: [Biopython-dev] Bugzilla --> RedMine --> GitHub issues?
In-Reply-To: <CAKVJ-_7U8HW4wa657oEYsR=vC=+cXV1O1nREps118O6F1uYjTQ@mail.gmail.com>
References: <CAKVJ-_7U8HW4wa657oEYsR=vC=+cXV1O1nREps118O6F1uYjTQ@mail.gmail.com>
Message-ID: <CADEGkF6LAVmLd1SmVp5UNexaWe5irzxD+9NHm2kAvR7r0KmxXA@mail.gmail.com>

>> PS: I'd have replied on the bug tracker, but for some reason I can't
>> log in again, even after resetting the password. For some reason
>> redmine hates me.
>>
>
> Biopython used to use Bugzilla, at http://bugzilla.open-bio.org/
> (it was left as a read only legacy listing, but it broke last year when
> the old server started to die and isn't really worth fixing).
>
> This was moved over to RedMine, along with all the other OBF
> projects. This does have some git integration, but I'm not that
> taken with it - and it is yet another service for the OBF team
> to maintain.
>
> What do people think of moving over to using GitHub issues?
> This would link in very well with pull requests and makes linking
> to commits much simpler too. One potential issue is if and how
> we could have bug reports sent to the biopython-dev mailing list
> (something we touched on recently for pull requests).
>
> A full automated move could be possible (NumPy did this), but I
> think a gradual move would be fine - stop filing new issues on
> RedMine and use GitHub issues in future. There are only about
> 100 issues open at the moment anyway, and a manual migration
> would also be a good way to review some of the older tickets.
>
> Thoughts?,

Moving to GitHub sounds good to me. I'd prefer if we go over the
issues manually (removing the obsolete ones and keeping the current
ones).

As for sending the bug reports to the mailing list, could we perhaps
create our own custom hooks? e.g. anytime a pull request is issued, an
email would be sent (see https://github.com/github/github-services and
http://developer.github.com/v3/repos/hooks/#create-a-hook)
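
As a rough sketch of what such a hook registration might look like: the
snippet below only builds the request path and JSON body for the
create-a-hook call and does not send anything. The receiver URL is a
placeholder, the build_hook_request helper is hypothetical, and the field
names are my reading of the v3 API docs linked above, so please check them
before relying on this:

```python
import json


def build_hook_request(owner, repo, receiver_url, events):
    """Return the (path, JSON body) pair for a create-a-hook request."""
    path = "/repos/{0}/{1}/hooks".format(owner, repo)
    body = {
        "name": "web",  # generic webhook that POSTs to our own URL
        "active": True,
        "events": list(events),  # which events trigger a notification
        "config": {"url": receiver_url, "content_type": "json"},
    }
    return path, json.dumps(body, sort_keys=True)


# Placeholder receiver; a small script there would forward to the list.
path, body = build_hook_request(
    "biopython",
    "biopython",
    "https://example.org/notify",
    ["pull_request", "issues"],
)
```

The receiving end would still need a small service that turns each payload
into an email to biopython-dev, which is not shown here.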

Regards,
Bow


From arklenna at gmail.com  Fri Feb  1 19:05:18 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Fri, 1 Feb 2013 14:05:18 -0500
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <CAKVJ-_6U+8m5DJYJmoKeoS86jP1z9d6ODS66kGfT7AHp9jBhbA@mail.gmail.com>
References: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_6U+8m5DJYJmoKeoS86jP1z9d6ODS66kGfT7AHp9jBhbA@mail.gmail.com>
Message-ID: <CALfq9tLe9jbQg8CYkaOaX92-tw3Pgpx6feDnyYLc_eBYYB8rCg@mail.gmail.com>

On Fri, Feb 1, 2013 at 9:14 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

>
> People leaning for a Bio.WWW grouping: Bow, Lenna, Kevin
> (which could be a big disruption with lots of code relocation)
>
> People leaning against a Bio.WWW grouping: Michiel, Peter (me)
> (which would also be the status quo, so no disruption).
>
>
I concede that the potential benefit of refactoring to separate WWW is
outweighed both by potential downsides and the disruption and effort
involved.

> In the specific case of Kevin's TAIR code for fetching Arabidopsis sequences,
> Bio.TAIR (lower case?) is consistent with current usage. Somewhere under
> Bio.Seq* also seems sensible to me, as I wrote at the start of this thread.
>
>
Populating the top level namespace with a submodule for each web-only
service has the risk of creating too many submodules. Bio.Seq* makes sense,
because the TAIR code pulls data into a Seq. Web services that connect to a
single biopython representation can be organized under that submodule. Web
services that return multiple types of information (e.g. Entrez) are big
enough to logically comprise their own submodule.

Is my interpretation of the biopython classification scheme more or less
correct?

Cheers,

Lenna


From p.j.a.cock at googlemail.com  Fri Feb  1 21:00:10 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 21:00:10 +0000
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <CALfq9tLe9jbQg8CYkaOaX92-tw3Pgpx6feDnyYLc_eBYYB8rCg@mail.gmail.com>
References: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_6U+8m5DJYJmoKeoS86jP1z9d6ODS66kGfT7AHp9jBhbA@mail.gmail.com>
	<CALfq9tLe9jbQg8CYkaOaX92-tw3Pgpx6feDnyYLc_eBYYB8rCg@mail.gmail.com>
Message-ID: <CAKVJ-_5od2F8o2-27abO3sLtJvUT7eOEhShDNcNCnYXJKDcKVQ@mail.gmail.com>

On Fri, Feb 1, 2013 at 7:05 PM, Lenna Peterson <arklenna at gmail.com> wrote:
> On Fri, Feb 1, 2013 at 9:14 AM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>>
>>
>> People leaning for a Bio.WWW grouping: Bow, Lenna, Kevin
>> (which could be a big disruption with lots of code relocation)
>>
>> People leaning against a Bio.WWW grouping: Michiel, Peter (me)
>> (which would also be the status quo, so no disruption).
>>
>
> I concede that the potential benefit of refactoring to separate WWW is
> outweighed both by potential downsides and the disruption and effort
> involved.
>
>> In the specific case of Kevin's TAIR code for fetching Arabidopsis sequences,
>> Bio.TAIR (lower case?) is consistent with current usage. Somewhere under
>> Bio.Seq* also seems sensible to me, as I wrote at the start of this
>> thread.
>>
>
> Populating the top level namespace with a submodule for each web-only
> service has the risk of creating too many submodules. Bio.Seq* makes sense,
> because the TAIR code pulls data into a Seq. Web services that connect to a
> single biopython representation can be organized under that submodule. Web
> services that return multiple types of information (e.g. Entrez) are big
> enough to logically comprise their own submodule.
>
> Is my interpretation of the biopython classification scheme more or less
> correct?

Yes that sounds about right :)

Of course, the historical muddle of Bio.Seq* is something we've talked
about addressing recently - see this thread from October,
http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html

Peter


From natemsutton at yahoo.com  Fri Feb  1 21:54:42 2013
From: natemsutton at yahoo.com (Nate Sutton)
Date: Fri, 1 Feb 2013 13:54:42 -0800 (PST)
Subject: [Biopython-dev] New BioPython member
In-Reply-To: <CAKVJ-_4HQC5V59V=jy4cpkqAAs-8FxtbAsjph3tWXwRAMFAMyQ@mail.gmail.com>
References: <1359494577.29159.YahooMailNeo@web122606.mail.ne1.yahoo.com>
	<CAKVJ-_4HQC5V59V=jy4cpkqAAs-8FxtbAsjph3tWXwRAMFAMyQ@mail.gmail.com>
Message-ID: <1359755682.16563.YahooMailNeo@web122605.mail.ne1.yahoo.com>

Thanks for the welcome! Also, I looked briefly through the code in the files you wrote about and I see the command line app wrapping components you described. I appreciate the advice about how to do the wrapper and I am glad to know of that pattern of command line app wrapping that is consistent with code in other places of BioPython. Thanks for the other advice including possibly asking for guidance. I'll just give it a shot and hopefully things go smoothly, but it being my first BioPython coding I appreciate the support.

Thanks,

Nate


________________________________
 From: Peter Cock <p.j.a.cock at googlemail.com>
To: Nate Sutton <natemsutton at yahoo.com> 
Cc: "biopython-dev at lists.open-bio.org" <biopython-dev at lists.open-bio.org> 
Sent: Wednesday, January 30, 2013 2:31 AM
Subject: Re: [Biopython-dev] New BioPython member
 
On Tue, Jan 29, 2013 at 9:22 PM, Nate Sutton <natemsutton at yahoo.com> wrote:
> Dear all,
>
> I just recently joined the BioPython developers group and am
> looking forward to contributing to BioPython! I have worked for a while
> in programming, genetics, and biology and have
> an M.S. in Biomedical Informatics. After
> talking with some fellow contributors I have decided to try working on
> https://redmine.open-bio.org/issues/3360 but I will also work on writing
> some documentation on examples from the
> cookbook, especially if I am stuck on the bug. If anyone wants to work on
> the same things, I'd be glad to hear that. I
> may be slow on the work because I am still learning Python after coming
> from other languages.
>
> -Nate

Hi Nate, and welcome.

Eric is in charge of the Bio.Phylo module, but within that the
command line application wrappers under Bio.Phylo.Applications
follow a pattern used elsewhere in Biopython.

To add a wrapper for fasttree http://www.microbesonline.org/fasttree/
have a look at the existing wrappers for PHYML and RAXML, defined in
Bio/Phylo/Applications/_Phyml.py and Bio/Phylo/Applications/_Raxml.py
(leading underscores mean private modules in Python), which are
exposed to the user via Bio/Phylo/Applications/__init__.py

In this case, I'd suggest putting the new wrapper in a new file,
Bio/Phylo/Applications/_fastree.py

Other similar wrappers exist under Bio.Emboss, Bio.Align, etc.
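
To give a feel for the general idea, here is a self-contained sketch of a
wrapper that stores options and renders a command line string. The
FastTreeCommandSketch class is hypothetical and deliberately does not use
Biopython's own base classes; the executable name and the small set of
switches (-nt, -gtr) are assumptions chosen for illustration only:

```python
class FastTreeCommandSketch(object):
    """Hypothetical wrapper: stores options and renders a command line."""

    # Maps Python keyword names to command-line switches.
    _switches = {"nt": "-nt", "gtr": "-gtr"}

    def __init__(self, cmd="fasttree", input_file=None, **switches):
        unknown = set(switches) - set(self._switches)
        if unknown:
            # Reject typos early instead of passing them to the tool.
            raise ValueError(
                "Unknown option(s): %s" % ", ".join(sorted(unknown))
            )
        self.cmd = cmd
        self.input_file = input_file
        self.switches = switches

    def __str__(self):
        parts = [self.cmd]
        # Sort names so the rendered command line is reproducible.
        for name in sorted(self.switches):
            if self.switches[name]:
                parts.append(self._switches[name])
        if self.input_file is not None:
            parts.append(self.input_file)
        return " ".join(parts)


cline = FastTreeCommandSketch(input_file="alignment.fasta", nt=True, gtr=True)
command = str(cline)
```

The existing _Phyml.py and _Raxml.py wrappers remain the real reference for
how parameters are declared and validated in Biopython itself.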

Don't be shy about asking for guidance on this, or git and github.
Ultimately what I'm hoping you'll be able to do is take a fork (a personal
copy of the repository) on GitHub, create a new fasttree branch,
commit your enhancements, and make a pull request. If that's
all too much for now, simply writing the new file and letting us
do the git side would be fine.

Regards,

Peter


From k.d.murray.91 at gmail.com  Fri Feb  1 23:59:57 2013
From: k.d.murray.91 at gmail.com (Kevin Murray)
Date: Sat, 2 Feb 2013 10:59:57 +1100
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <CAKVJ-_5od2F8o2-27abO3sLtJvUT7eOEhShDNcNCnYXJKDcKVQ@mail.gmail.com>
References: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_6U+8m5DJYJmoKeoS86jP1z9d6ODS66kGfT7AHp9jBhbA@mail.gmail.com>
	<CALfq9tLe9jbQg8CYkaOaX92-tw3Pgpx6feDnyYLc_eBYYB8rCg@mail.gmail.com>
	<CAKVJ-_5od2F8o2-27abO3sLtJvUT7eOEhShDNcNCnYXJKDcKVQ@mail.gmail.com>
Message-ID: <CAH80STUm4y_8vZXA-UZUVw5mWGwKYYtrp25_gB8drRLosLnpeg@mail.gmail.com>

Hi All,

How about this:
In the vein of Lenna's last email, we create a module WebSeq (or Seq.Web,
or whatever), containing modules whose sole purpose is to get sequences
(Seq/SeqRecord objects) from an internet database. This would, I think,
provide a good balance between a messy top-level namespace full of modules
like Bio.tair, and the absolutism of having anything vaguely web-related in
a single WWW module. It should also provide the unified theme per module
which Michiel talks of, and unit/doctests should be fine, as no modules
will be split (simply moved in their entirety from Bio.x to Bio.WebSeq.x).

From a quick look, the only candidate (apart from TAIR) for a shift is
TogoWS, and even then I'm not sure, as TogoWS isn't used just for Seqs
(and does not return them).

Regards
Kevin Murray


On 2 February 2013 08:00, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Fri, Feb 1, 2013 at 7:05 PM, Lenna Peterson <arklenna at gmail.com> wrote:
> > On Fri, Feb 1, 2013 at 9:14 AM, Peter Cock <p.j.a.cock at googlemail.com>
> > wrote:
> >>
> >>
> >> People leaning for a Bio.WWW grouping: Bow, Lenna, Kevin
> >> (which could be a big disruption with lots of code relocation)
> >>
> >> People leaning against a Bio.WWW grouping: Michiel, Peter (me)
> >> (which would also be the status quo, so no disruption).
> >>
> >
> > I concede that the potential benefit of refactoring to separate WWW is
> > outweighed both by potential downsides and the disruption and effort
> > involved.
> >
> >> In the specific case of Kevin's TAIR code for fetch Arabidopsis
> sequences,
> >> Bio.TAIR (lower case?) is consistent with current usage. Somewhere under
> >> Bio.Seq* also seems sensible to me, as I wrote at the start of this
> >> thread.
> >>
> >
> > Populating the top level namespace with a submodule for each web-only
> > service has the risk of creating too many submodules. Bio.Seq* makes
> sense,
> > because the TAIR code pulls data into a Seq. Web services that connect
> to a
> > single biopython representation can be organized under that submodule.
> Web
> > services that return multiple types of information (e.g. Entrez) are big
> > enough to logically comprise their own submodule.
> >
> > Is my interpretation of the biopython classification scheme more or less
> > correct?
>
> Yes that sounds about right :)
>
> Of course, the historical muddle of Bio.Seq* is something we've talked
> about addressing recently - see this thread from October,
> http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html
>
> Peter


From mjldehoon at yahoo.com  Sat Feb  2 01:36:03 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 1 Feb 2013 17:36:03 -0800 (PST)
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <CAH80STUm4y_8vZXA-UZUVw5mWGwKYYtrp25_gB8drRLosLnpeg@mail.gmail.com>
Message-ID: <1359768963.25565.YahooMailClassic@web164004.mail.gq1.yahoo.com>

In principle I am OK with this, but is TAIR only used for sequences? Or is it possible / likely that in the future we may want to add other functionality to TAIR? Anyway, if TAIR is predominantly used for sequences, then Bio.Seq.Web is a good option I think.

Best,
-Michiel.



From k.d.murray.91 at gmail.com  Sat Feb  2 06:00:34 2013
From: k.d.murray.91 at gmail.com (Kevin Murray)
Date: Sat, 2 Feb 2013 17:00:34 +1100
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <1359768963.25565.YahooMailClassic@web164004.mail.gq1.yahoo.com>
References: <CAH80STUm4y_8vZXA-UZUVw5mWGwKYYtrp25_gB8drRLosLnpeg@mail.gmail.com>
	<1359768963.25565.YahooMailClassic@web164004.mail.gq1.yahoo.com>
Message-ID: <CAH80STXNGY8mxs2Zr4SF5rTaBCzH0b1bHvMJqfpLRJ-Y0sQLSA@mail.gmail.com>

Michiel,
TAIR (http://www.arabidopsis.org/) is primarily a sequence repository. I
have no intention to extend it beyond that, and any other features would
not be easily scriptable, or would be pointless to include in Biopython.

Regards
Kevin Murray


On 2 February 2013 12:36, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> In principle I am OK with this, but is TAIR only used for sequences? Or is
> it possible / likely that in the future we may want to add other
> functionality to TAIR? Anyway, if TAIR is predominantly used for sequences,
> then Bio.Seq.Web is a good option I think.
>
> Best,
> -Michiel.


From eric.talevich at gmail.com  Sat Feb  2 22:29:57 2013
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sat, 2 Feb 2013 17:29:57 -0500
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <CAKVJ-_6U+8m5DJYJmoKeoS86jP1z9d6ODS66kGfT7AHp9jBhbA@mail.gmail.com>
References: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_6U+8m5DJYJmoKeoS86jP1z9d6ODS66kGfT7AHp9jBhbA@mail.gmail.com>
Message-ID: <CAMC681=rosLhOPyi3bmaUaj+WC_M573VY4YPvmvdjwu1YsTiLw@mail.gmail.com>

On Fri, Feb 1, 2013 at 9:14 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Fri, Feb 1, 2013 at 1:54 PM, Michiel de Hoon <mjldehoon at yahoo.com>
> wrote:
> > Hi Lenna,
> >
> >> Regarding point (2), is your primary concern namespace clutter or
> >> importing efficiency?
> >
> > Regarding point (2), my primary concern is that a Bio.WWW module would
> > group together modules that don't have much in common with each other. I
> > agree to your point that the category of internet access is more
> fundamental
> > than the category of parsers. But still, which modules should then go
> into a
> > Bio.WWW module? Any module whose sole purpose is to use the internet
> (that
> > would exclude Bio.Entrez)? Any module whose main purpose is to use the
> > internet? This would be unclear; for example, Bio.Entrez may or may not
> fall
> > in that category, depending on how you use the module. Any module whose
> > functionality includes internet access? Then if one day we add access to
> the
> > JASPAR database over the internet to Bio.Motif, it would have to move to
> > Bio.WWW.
> >
> > Currently most modules are organized by theme (Bio.Seq, Bio.Motif,
> > Bio.Cluster, Bio.Phylo, Bio.Entrez, etc.). For each theme, we have one
> > module, one chapter in the documentation, one set of unit tests, one
> set of
> > doctests, which I think is a huge advantage both in terms of clarity and
> in
> > terms of user experience.
>
> Also with the theme approach, most (if not all) the themes are likely to
> have some online resources (databases or remote APIs). On those
> grounds it makes sense to keep online motif functionality (like weblogo)
> under Bio.Motif, and so on.
>

I agree.
From an engineering perspective, it's usually best to organize code around
data types. (To be clear: think classes and structures, not ints and
strings.) The SeqIO, AlignIO, SearchIO, Phylo, Motif, PDB, etc. modules
each have a core data type that serves as the "theme" for the sub-package.
Within the sub-package we can have modules for different file formats, data
transformations/manipulations, web servers, and command-line program
wrappers, and keep all the interdependencies within the same small region
of the code base. Since most users will not read the documentation in its
entirety (if at all), this also makes it easier to look up how to do things
with the data type in question.

The core data type for a WWW module would be a network handle, I suppose --
but that's already part of the Python standard library.

I've suggested before that we can justify the current placement of
sequence-related modules at the top level, rather than under a new "Seq"
sub-package, by considering sequences to be the default/implicit data type.
As we've covered, many online resources can serve up several different data
types, although sequences are probably the most common. In terms of
namespace clutter, perhaps I've gotten too used to R, but I don't think
we've reached the point where the number of modules and functions visible
from the top level harms the user experience.


> In the specific case of Kevin's TAIR code for fetch Arabidopsis sequences,
> Bio.TAIR (lower case?) is consistent with current usage. Somewhere under
> Bio.Seq* also seems sensible to me, as I wrote at the start of this thread.
>

Bio.TAIR or Bio.Seq.TAIR or perhaps Bio.Seq.WWW.TAIR seem sensible to me,
too. No preference on casing.

-Eric


From p.j.a.cock at googlemail.com  Mon Feb  4 12:01:49 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 4 Feb 2013 12:01:49 +0000
Subject: [Biopython-dev] Deprecations for Biopython 1.61 release;
	Was: Bio.Motif update
Message-ID: <CAKVJ-_5MiJvfGE8RvMWpRpqK1_GQcfGCCUYBmV_NfE5vwhF7iQ@mail.gmail.com>

On Fri, Feb 1, 2013 at 3:39 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> --- On Fri, 2/1/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> I went over the list in the DEPRECATED file last month, but
>> a second check would be a good idea.
>
> The following were declared obsolete in Biopython 1.60, and can
> in principle be declared deprecated in Biopython 1.61:
>
> ----------
> Bio/Blast/Applications.py:
> BlastallCommandline
> BlastpgpCommandline
> RpsBlastCommandline
>
> Bio/Blast/NCBIStandalone.py overall, and specifically:
> blastall
> blastpgp
> rpsblast
>
> Bio/ParserSupport.py overall
>
> Bio/PDB/AbstractPropertyMap.py:
> The has_key function in class AbstractPropertyMap
>
> Bio/PDB/FragmentMapper.py:
> The has_key function in class FragmentMapper
>
> Bio/UniGene/UniGene.py overall
>
> In BioSQL/BioSeqDatabase.py:
>   class DBServer:
>      remove_database
>   class BioSeqDatabase:
>      get_all_primary_ids
>      get_Seq_by_primary_id
>
> -----------
>
> These functions were deprecated in Biopython 1.59 or earlier, and could be removed for Biopython 1.61:
>
> Bio/Align/__init__.py:
>   class MultipleSeqAlignment:
>      get_column
>      add_sequence
>
> Bio/Align/Generic.py:
>   class Alignment overall
>     get_all_seqs
>     get_seq_by_num
>
> Bio/File.py:
>   class StringHandle
>
> Bio/Graphics/GenomeDiagram/_AbstractDrawer.py:
>   class AbstractDrawer:
>     _set_xcentre, _set_ycentre
>
> Bio/Graphics/GenomeDiagram/_Graph.py:
>   class GraphData:
>     _set_centre
>
> Bio/ParserSupport.py:
>   SGMLStrippingConsumer
>
> Bio/Seq.py:
>   class Seq:
>      .data property
>
> Bio/SeqIO/SffIO.py:
>   _sff_read_roche_index_xml
>
> --------------------
>
> The tostring() method of the class Seq in Bio/Seq.py:
> Can we declare this obsolete?
>
> -Michiel

Bio/SeqIO/SffIO.py function _sff_read_roche_index_xml done:
https://github.com/biopython/biopython/commit/567464d9a5f8b87ec48e95bae127b86463bd4da1

Bio/File.py and Bio/ParserSupport.py bits done:
https://github.com/biopython/biopython/commit/63997ea0afa5f7f6cac5c1b036d56416b04edb2a

GenomeDiagram centre setters done:
https://github.com/biopython/biopython/commit/2424c5ca36cdf4348b54bafdae444a91a6457288
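
For anyone new to the obsolete, then deprecated, then removed progression
used in the list above, a deprecated method typically keeps working but
emits a warning that points at its replacement. The following is a minimal
sketch using only the standard library. The PropertyMap class and the
ExampleDeprecationWarning class are hypothetical stand-ins, not the actual
Bio.PDB code or Biopython's own warning class.

```python
import warnings


class ExampleDeprecationWarning(DeprecationWarning):
    """Stand-in for a project-specific deprecation warning class."""


class PropertyMap:
    """Hypothetical mapping-like class, used only for illustration."""

    def __init__(self, data):
        self._data = dict(data)

    def __contains__(self, key):
        # Preferred spelling: "key in mapping"
        return key in self._data

    def has_key(self, key):
        """Deprecated alias; warns, then defers to the 'in' operator."""
        warnings.warn(
            "has_key is deprecated; use 'key in mapping' instead",
            ExampleDeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return key in self


pm = PropertyMap({"A": 1})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    found = pm.has_key("A")

print(found, [str(w.message) for w in caught])
```

Removing the method in a later release is then just a matter of deleting
has_key once users have had a release or two to see the warning.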

Peter


From ben at bendmorris.com  Mon Feb  4 15:17:36 2013
From: ben at bendmorris.com (Ben Morris)
Date: Mon, 4 Feb 2013 10:17:36 -0500
Subject: [Biopython-dev] Support for NeXML and RDF trees in Bio.Phylo
In-Reply-To: <CAMC681ndsaK0J0iR==O7djsG1KQxdbp6TWd7sgGDVySP2OHuSA@mail.gmail.com>
References: <CAAzEd5AvRgkr=UYmqwHPH+cBYXCS+5yLHs=bHjCDxN1rY_aGFg@mail.gmail.com>
	<CAMC681=OrHJmfEbxWz=8-qzo2rEVJaqFeqgihiAMVi6No7GBCw@mail.gmail.com>
	<CAAzEd5Bz5xvc2Bz80Ru+FbUbJK-WnAjfvLv70SfkPZup89NGRQ@mail.gmail.com>
	<CAMC681ndsaK0J0iR==O7djsG1KQxdbp6TWd7sgGDVySP2OHuSA@mail.gmail.com>
Message-ID: <CAAzEd5CWOJ57YHEAw2-LXBVJ6oc_XHiU3fqJMLHj_jwS9edVhg@mail.gmail.com>

On Fri, Jan 18, 2013 at 8:20 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> On Fri, Dec 28, 2012 at 10:50 AM, Ben Morris <ben at bendmorris.com> wrote:
>>
>> On Tue, Dec 25, 2012 at 2:18 AM, Eric Talevich <eric.talevich at gmail.com>
>> wrote:
>> >
>> > On Mon, Dec 24, 2012 at 8:58 AM, Ben Morris <ben at bendmorris.com> wrote:
>> >>
>> >> Hi all,
>> >>
>> >> I've implemented support for two new phylogenetic tree formats: NeXML
>> >> and
>> >> RDF (conforming to the Comparative Data Analysis Ontology).
>> >>
>> >> I noticed that NeXML support was planned, but I didn't see anyone
>> >> working
>> >> on it on GitHub and the feature request hadn't been updated in about a
>> >> year, so I went ahead and implemented a simple version. At first I
>> >> tried
>> >> the generateDS.py approach, but the generated writer doesn't give very
>> >> much
>> >> control over the output, so I ended up writing my own parser/writer
>> >> using
>> >> ElementTree.
>> >>
>> >> As for the RDF/CDAO format, AFAIK this is not a format that's supported
>> >> by
>> >> any other phylogenetic libraries, so I'm not sure how useful this is to
>> >> everyone else. It provides a simple, standards-compliant format that
>> >> can be
>> >> imported to a triple store and supports annotation. We'll be using it
>> >> at
>> >> NESCent so I wanted to make it available to everyone else as well. The
>> >> parser and writer require the Redlands Python bindings.
>> >>
>> >> The code is available in my fork of Biopython,
>> >>
>> >>     https://github.com/bendmorris/biopython
>> >>
>> >> under branches "cdao" and "nexml." I'd love to get everyone's thoughts
>> >> and
>> >> see if these contributions would be a good fit for the Biopython
>> >> project.
>> >
>> >
>> >
>> > Thanks for letting us know! I'll try it out soonish. Looking at the code
>> > on your nexml branch, I have a few comments:
>> >
>> > - The parser uses ElementTree.parse rather than iterparse, so in its
>> > current state it would not be able to parse massive files (those larger than
>> > available RAM). Worth fixing eventually?
>>
>> Great point. I rewrote it to use iterparse instead.
>>
>> > - The parser creates Newick.Tree and Newick.Clade objects, which is
>> > nearly correct in my opinion. I would suggest subclassing BaseTree.Tree and
>> > BaseTree.Clade to create NeXML-specific Tree and Clade classes, even if you
>> > don't have any additional attributes to attach to those classes at the
>> > moment. (These would go in a new file NeXML.py, similar to PhyloXML.py and
>> > PhyloXMLIO.py.)
>>
>> Went ahead and did this as well.
>
>
> Thanks! Sorry for the pace of this, I'm in the midst of a dissertation.
>
>
>> > - The 'confidence' or 'confidences' attribute isn't used (for e.g.
>> > bootstrap support values). Does NeXML define it?
>>
>> Not that I'm aware of, but I'm not sure. I searched
>> http://nexml.org/nexml/html/doc/schema-1/ and didn't find anything.
>> I'm going to ask some people who know more about this than I do.
>
>
> I would like for Bio.Phylo's I/O modules to be able to successfully
> round-trip a file from Newick to phyloXML to NeXML and back to Newick
> without losing support values. I found these two examples of how to add this
> data to a NeXML document by referencing CDAO:
> https://www.nescent.org/wg_evoinfo/NeXML_Test_Files#Bootstraps_represented_using_the_.22meta.22_tag
> https://www.nescent.org/wg_evoinfo/NeXML_Test_Files#Bootstraps_represented_without_new_tags_or_elements
>
> That's the standard way to store bootstrap supports in NeXML (Hilmar
> confirms). How do your NeXML and CDAO modules interact, if at all? Would the
> CDAO modules be useful to properly support NeXML metadata like
> support/confidence values, or would it be simpler to just hard-code the few
> tags we're specifically interested in?
>
> Relatedly, those look like good test files. I see you've started writing
> NeXML unit tests already; if you would like help with any of this, just let
> me know.
>
> -Eric


No worries! I just returned from a NESCent-sponsored hackathon where
we used Biopython as part of a Virtuoso-backed RDF treestore
(https://github.com/phylotastic/rdf-treestore). Now that I'm back,
I'll work on the bootstrap support values and annotations for NeXML as
I have time.

I think it's probably much easier to just hard-code specific tags for
now. The CDAO module can convert the more readable CDAO prefix names
to OBO numeric identifiers (e.g. cdao:has_Root -> obo:CDAO_0000148)
but other than that I don't see a good way for them to interact.
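
The prefix conversion mentioned above amounts to a table lookup. This is
only a sketch of the idea, not the CDAO module's code: the function name
to_obo and the table name CDAO_TO_OBO are made up here, and the table holds
just the single mapping given in this message.

```python
# Only the has_Root entry comes from the message above; a real table
# would need to be filled in from the CDAO ontology itself.
CDAO_TO_OBO = {
    "cdao:has_Root": "obo:CDAO_0000148",
}


def to_obo(name, table=CDAO_TO_OBO):
    """Return the OBO numeric identifier for a cdao: name, or the name unchanged."""
    return table.get(name, name)


print(to_obo("cdao:has_Root"))
```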

I gave a short demo of Bio.Phylo at the hackathon, and people were
very impressed. We had some issues with Newick and Nexus parsing as
well, so I'll open issues on the bug tracker.

~Ben


From redmine at redmine.open-bio.org  Mon Feb  4 15:20:38 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 4 Feb 2013 15:20:38 +0000
Subject: [Biopython-dev] [Biopython - Bug #3407] (New) Handling of bootstrap
	support values in Bio.Phylo Newick parser
Message-ID: <redmine.issue-3407.20130204152038@redmine.open-bio.org>


Issue #3407 has been reported by Ben Morris.

----------------------------------------
Bug #3407: Handling of bootstrap support values in Bio.Phylo Newick parser
https://redmine.open-bio.org/issues/3407

Author: Ben Morris
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


This was reported to me by Arlin Stoltzfus (quote):


"There is a description of Newick here: 

  http://evolution.genetics.washington.edu/phylip/newicktree.html

and a BNF here: 

  http://evolution.genetics.washington.edu/phylip/newick_doc.html

Note that this allows square-bracketed comments. 

Bootstrap values commonly are represented in 2 ways, one of which is wrong.  The wrong way to represent bootstrap values is to present them as internal node labels.   Labels for internal nodes are given as follows: 

   ((( human: 0.1, chimp:0.1 ) primates: 0.2, (rat:0.1, mouse:0.1) rodents:0.2), cat:0.3 )

where "primates" and "rodents" are internal node labels.  They go between the right paren and the (optional) colon and distance.  If you put numbers in the label position, a graphic renderer may place them on the nodes, which is why some people represent bootstrap values this way. 

However, the preferred way to represent bootstrap values is to make them syntactic comments (enclosed in square brackets) placed after all other node information, i.e., after the optional colon & branch length.   Both examples are shown here: 

((raccoon:19.19959,bear:6.80041)50:0.84600,((sea_lion:11.99700, seal:12.00300)100:7.52973,((monkey:100.85930,cat:47.14069)80:20.59201, weasel:18.87953)75:2.09460)50:3.87382,dog:25.46154);
or
((raccoon:19.19959,bear:6.80041):0.84600[50],((sea_lion:11.99700, seal:12.00300):7.52973[100],((monkey:100.85930,cat:47.14069):20.59201[80], weasel:18.87953):2.09460[75]):3.87382[50],dog:25.46154);

I recommend that you only support the second version, and treat the first version as a case of internal node labels.  

Arlin
-------
Arlin Stoltzfus (arlin at umd.edu)
Fellow, IBBR; Adj. Assoc. Prof., UMCP; Research Biologist, NIST
IBBR, 9600 Gudelsky Drive, Rockville, MD, 20850
tel: 240 314 6208; web: www.molevol.org"
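
To make the difference between the two representations concrete, the sketch
below pulls the support values out of both example strings with regular
expressions. It is an illustration only, not a Newick parser and not how
Bio.Phylo handles these files; the variable and pattern names are made up
here, and the patterns assume plain numeric values exactly as in the two
examples quoted above.

```python
import re

# First form from the report: support values in the internal node label position.
labels_style = (
    "((raccoon:19.19959,bear:6.80041)50:0.84600,((sea_lion:11.99700, "
    "seal:12.00300)100:7.52973,((monkey:100.85930,cat:47.14069)80:20.59201, "
    "weasel:18.87953)75:2.09460)50:3.87382,dog:25.46154);"
)
# Second form: support values as square-bracketed comments after the branch length.
comment_style = (
    "((raccoon:19.19959,bear:6.80041):0.84600[50],((sea_lion:11.99700, "
    "seal:12.00300):7.52973[100],((monkey:100.85930,cat:47.14069):20.59201[80], "
    "weasel:18.87953):2.09460[75]):3.87382[50],dog:25.46154);"
)

# A number directly after ")" and before ":" (internal node label position).
LABEL_RE = re.compile(r"\)([0-9.]+):[0-9.]+")
# A bracketed number directly after the branch length of an internal node.
COMMENT_RE = re.compile(r"\):[0-9.]+\[([0-9.]+)\]")

from_labels = [float(x) for x in LABEL_RE.findall(labels_style)]
from_comments = [float(x) for x in COMMENT_RE.findall(comment_style)]

print(from_labels)
print(from_comments)
```

Both lists come out in the order the internal nodes close in the string. In
the first form a parser cannot tell a numeric node name from a support value
without guessing, which is the ambiguity the report describes.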


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Mon Feb  4 15:26:31 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 4 Feb 2013 15:26:31 +0000
Subject: [Biopython-dev] [Biopython - Bug #3408] (New) Parsing of Nexus
	files generated by TreeBase fails (Bio.Phylo)
Message-ID: <redmine.issue-3408.20130204152631@redmine.open-bio.org>


Issue #3408 has been reported by Ben Morris.

----------------------------------------
Bug #3408: Parsing of Nexus files generated by TreeBase fails (Bio.Phylo)
https://redmine.open-bio.org/issues/3408

Author: Ben Morris
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


Steps to reproduce: 

Pick a tree on TreeBase (e.g. http://treebase.org/treebase-web/search/study/trees.html?id=12003 or http://treebase.org/treebase-web/search/study/trees.html?id=1029) and click on "download reconstructed NEXUS file."

Attempt to parse the file using Bio.Phylo.read.

Exception:

<pre>Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/_io.py", line 62, in read
    tree = tree_gen.next()
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/_io.py", line 50, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NexusIO.py", line 39, in parse
    nex = Nexus.Nexus(handle)
  File "/usr/lib/python2.7/site-packages/Bio/Nexus/Nexus.py", line 572, in __init__
    self.read(input)
  File "/usr/lib/python2.7/site-packages/Bio/Nexus/Nexus.py", line 623, in read
    self._parse_nexus_block(title, contents)
  File "/usr/lib/python2.7/site-packages/Bio/Nexus/Nexus.py", line 664, in _parse_nexus_block
    getattr(self,'_'+line.command)(line.options)
AttributeError: 'Nexus' object has no attribute '_link'
</pre>


DendroPy is able to parse the same files.
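
The last frame of the traceback shows the cause: commands are dispatched by
looking up a method named "_" plus the command, and there is no _link
method. One possible way to tolerate unknown commands is to give getattr a
default and skip what is not supported, as in the standalone sketch below.
MiniNexus is a hypothetical stand-in rather than the Bio.Nexus code, the
command and option strings are made up, and whether skipping is the right
behaviour is a design decision for the maintainers.

```python
import warnings


class MiniNexus:
    """Hypothetical stand-in for a block parser (not the Bio.Nexus code)."""

    def __init__(self):
        self.title = None
        self.skipped = []

    def _title(self, options):
        self.title = options

    def handle(self, command, options):
        """Dispatch to a _<command> method; warn and skip if there is none."""
        method = getattr(self, "_" + command.lower(), None)
        if method is None:
            warnings.warn("Skipping unsupported command: %s" % command)
            self.skipped.append(command)
            return
        method(options)


nex = MiniNexus()
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # keep the example output quiet
    nex.handle("TITLE", "Example_block")
    nex.handle("LINK", "TAXA = Taxa1")

print(nex.title, nex.skipped)
```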


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From p.j.a.cock at googlemail.com  Mon Feb  4 16:49:07 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 4 Feb 2013 16:49:07 +0000
Subject: [Biopython-dev] Deprecations for Biopython 1.61 release;
	Was: Bio.Motif update
In-Reply-To: <CAKVJ-_5MiJvfGE8RvMWpRpqK1_GQcfGCCUYBmV_NfE5vwhF7iQ@mail.gmail.com>
References: <CAKVJ-_5MiJvfGE8RvMWpRpqK1_GQcfGCCUYBmV_NfE5vwhF7iQ@mail.gmail.com>
Message-ID: <CAKVJ-_5KcSA2Sk4djieEcAP+RyJB7Ek44PSjJV=iykQRcdVeGQ@mail.gmail.com>

On Mon, Feb 4, 2013 at 12:01 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Bio/SeqIO/SffIO.py function _sff_read_roche_index_xml done:
> https://github.com/biopython/biopython/commit/567464d9a5f8b87ec48e95bae127b86463bd4da1
>
> Bio/File.py and Bio/ParserSupport.py bits done:
> https://github.com/biopython/biopython/commit/63997ea0afa5f7f6cac5c1b036d56416b04edb2a
>
> GenomeDiagram centre setters done:
> https://github.com/biopython/biopython/commit/2424c5ca36cdf4348b54bafdae444a91a6457288

Michiel already did most of the others,
https://github.com/biopython/biopython/commit/1b2025bee868b0282b913690a999833d13598ea4

I've just removed the Seq object's deprecated data property:
https://github.com/biopython/biopython/commit/e3cf12a1bf28c1cd52e4b5492fb1cd76731b486b

For the Seq object's tostring() method, let's review Bow's pull request
after this release? https://github.com/biopython/biopython/pull/137

Regards,

Peter


From p.j.a.cock at googlemail.com  Mon Feb  4 17:26:44 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 4 Feb 2013 17:26:44 +0000
Subject: [Biopython-dev] Bio.Motif update
In-Reply-To: <1359730386.17784.YahooMailClassic@web164005.mail.gq1.yahoo.com>
References: <CAKVJ-_5X3eoXEnqD7yfTGFW1Saxr7rMe-WcbCofmCqdu_yq6KA@mail.gmail.com>
	<1359730386.17784.YahooMailClassic@web164005.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_7EVLdBwX=Gjzdpdo3abj39dPnkytdeNHtkFedzbHMK7w@mail.gmail.com>

On Fri, Feb 1, 2013 at 2:53 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi Peter and all,
>
> --- On Tue, 1/29/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> We need to say something about this in the NEWS file too.
>
> Done.
>
>> I think it would make sense to add a PendingDeprecationWarning
>> to Bio.Motif now.
>
> Done.
>
>> Also, if you feel the new Bio.motifs API isn't quite
>> settled yet, adding the new BiopythonExperimentalWarning to
>> that makes sense.
>
> I don't expect big changes in the API, so I think we can do without the
> BiopythonExperimentalWarning. Also we should avoid the situation
> that Bio.Motif gives a DeprecationWarning, and Bio.motifs gives a
> BiopythonExperimentalWarning.
>
>> (And once this is settled, I think we can schedule the
>> release)

Hi Michiel,

Rather than having two (very similar) chapters in the Tutorial for
the old Bio.Motif and new Bio.motifs modules, I've downgraded
the old chapter to just a section of the new chapter:

https://github.com/biopython/biopython/commit/ee5cccf6bc661befc924cb7fc2a422c07f3eeee1

There is still a lot of redundant content - would you be able to
shorten this? Or can we just cut it and refer anyone interested
to the tutorial shipped with Biopython 1.60 instead?

I think a summary of the differences would be more useful, to help people
convert from the old module to the new motifs module.

Also, what is the point of the Bio.motifs.create function? Is there
a reason not to initialise a Motif object directly?

Thanks,

Peter


From p.j.a.cock at googlemail.com  Mon Feb  4 17:57:42 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 4 Feb 2013 17:57:42 +0000
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
Message-ID: <CAKVJ-_6D5kAWcQn1+1=mqVBQB+WOn1AXzk4qYNW0SB7PueWadQ@mail.gmail.com>

On Fri, Feb 1, 2013 at 3:03 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hello all,
>
> I think we're overdue for a Biopython release now, and I would
> like to do this next week. There are still plenty more additions
> and enhancements waiting in the wings, but right now I just
> want any remaining bug fixes addressed.
>
> Are there any release blocking issues?
>
> Thanks,
>
> Peter

Hi all,

I've posted the current tutorial as HTML and PDF online [*],
http://biopython.org/DIST/docs/tutorial/Tutorial-dev.html
http://biopython.org/DIST/docs/tutorial/Tutorial-dev.pdf

It would be great to have you all re-read chapters you've
contributed to or are familiar with - and fix or report any
more typos etc.

Note that some of the embedded examples in the LaTeX
source are now tested via doctest using test_Tutorial.py,
so if you do make some local edits run that before you
commit them.
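
As a rough illustration of the idea (this is not the actual
test_Tutorial.py code, and the snippet text is made up), the standard
library doctest module can run interpreter-style examples that have
been pulled out of any text source:

```python
import doctest

# A made-up example in the style of the Tutorial's embedded sessions.
TUTORIAL_SNIPPET = '''
>>> seq = "AGTACACTGGT"
>>> len(seq)
11
>>> seq.count("A")
3
'''

# Parse the text into a DocTest object, then run it.
parser = doctest.DocTestParser()
test = parser.get_doctest(TUTORIAL_SNIPPET, {}, "tutorial_snippet", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)  # TestResults(failed=..., attempted=...)
```

If a local edit changes what an example prints, the failed count
becomes non-zero, which is the kind of breakage worth catching before
committing.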

Thanks,

Peter

[*] Those URLs used to be updated nightly, something I've
not yet restored since the website was moved from the old
OBF hardware to an Amazon cloud server. The simplest
option here would be to install latex on the server...


From redmine at redmine.open-bio.org  Mon Feb  4 18:14:19 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 4 Feb 2013 18:14:19 +0000
Subject: [Biopython-dev] [Biopython - Bug #3409] (New) Newick parser fails
	to parse Greengenes tree (Bio.Phylo)
Message-ID: <redmine.issue-3409.20130204181419@redmine.open-bio.org>


Issue #3409 has been reported by Ben Morris.

----------------------------------------
Bug #3409: Newick parser fails to parse Greengenes tree (Bio.Phylo)
https://redmine.open-bio.org/issues/3409

Author: Ben Morris
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


The file is available here: http://www.evoio.org/wg/evoio/images/f/f9/Greengenes2011.txt (9.2 MB)

The problem may be related to the use of single-quoted node labels which sometimes contain parentheses, e.g. <pre>'p__Fusobacteria; c__Fusobacteria (class); o__Fusobacteriales; f__Fusobacteriaceae':0.11021</pre>

Exception:

<pre>  ...
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NewickIO.py", line 110, in _parse_subtree
    clade.clades = [self._parse_subtree(st) for st in subtrees]
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NewickIO.py", line 110, in _parse_subtree
    clade.clades = [self._parse_subtree(st) for st in subtrees]
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NewickIO.py", line 110, in _parse_subtree
    clade.clades = [self._parse_subtree(st) for st in subtrees]
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NewickIO.py", line 110, in _parse_subtree
    clade.clades = [self._parse_subtree(st) for st in subtrees]
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NewickIO.py", line 87, in _parse_subtree
    raise NewickError("Parentheses do not match in (sub)tree: " + text)
Bio.Phylo.NewickIO.NewickError: Parentheses do not match in (sub)tree: 139839:0.04507):0.02429
</pre>

Other Newick parsers (ete and dendropy) are able to parse this file.
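
To illustrate the suspected cause (a sketch only, not the NewickIO
code and not a proposed patch): if quoted labels are not treated as
single tokens, any parenthesis inside them looks like tree structure.
The tokenizer below is a hypothetical example that keeps single-quoted
labels intact:

```python
import re

# Token pattern: quoted labels first, so their contents are never split.
TOKEN = re.compile(r"""
    '(?:[^']|'')*'     # single-quoted label; a doubled quote is an escape
  | [(),:;]            # structural characters
  | [^(),:;'\s]+       # unquoted label or branch length
""", re.VERBOSE)


def tokenize(newick):
    """Split a Newick string into tokens, keeping quoted labels whole."""
    return TOKEN.findall(newick)


tree = ("(('p__Fusobacteria; c__Fusobacteria (class)':0.11021,"
        "139839:0.04507):0.02429,x:0.1);")
tokens = tokenize(tree)

naive_open = tree.count("(")      # counts the "(" inside the label too
real_open = tokens.count("(")     # only structural parentheses
```

Here the naive count sees three opening parentheses while the tree
really has two, which is the sort of mismatch that could lead to a
"Parentheses do not match" error.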


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From mjldehoon at yahoo.com  Tue Feb  5 04:01:26 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 4 Feb 2013 20:01:26 -0800 (PST)
Subject: [Biopython-dev] Bio.Motif update
In-Reply-To: <CAKVJ-_7EVLdBwX=Gjzdpdo3abj39dPnkytdeNHtkFedzbHMK7w@mail.gmail.com>
Message-ID: <1360036886.33220.YahooMailClassic@web164006.mail.gq1.yahoo.com>

Hi Peter,

--- On Mon, 2/4/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Rather than having two (very similar) chapters in the
> Tutorial for the old Bio.Motif and new Bio.motifs modules,
> I've downgraded the old chapter to just a section of
> the new chapter:
... 
> There is still a lot of redundant content - would you be
> able to shorten this?

I think it's OK if it is redundant. Anyway the chapter on the older Bio.Motif will be removed a few releases later.

> I think a summary of the differences would be more useful,
> to help people convert from the old module to the new
> motifs module.

Maybe, but for me it doesn't have a high priority. It's easier to understand the new chapter on Bio.motifs.
 
> Also, what is the point of the Bio.motifs.create function?
> Is there a reason not to initialise a Motif object directly?

There are two ways to initialize a Motif: either to specify the alignment from which the motif is created, or directly from a position-weight matrix. This can be a bit confusing. To separate the two, the Bio.motifs.create function only initializes a Motif from an alignment; some of the motif parsers initialize a Motif from a position-weight matrix. 
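
A minimal sketch of that separation, using hypothetical names (ToyMotif
and this create function are illustrations, not the Bio.motifs code,
and plain letter counts stand in for a position-weight matrix):

```python
class ToyMotif(object):
    """Hypothetical motif storing per-position letter counts."""

    def __init__(self, alphabet, instances=None, counts=None):
        # Exactly one of the two data sources must be given.
        if (instances is None) == (counts is None):
            raise ValueError("Specify exactly one of instances or counts")
        self.alphabet = alphabet
        if counts is None:
            length = len(instances[0])
            counts = dict((letter, [0] * length) for letter in alphabet)
            for instance in instances:
                for position, letter in enumerate(instance):
                    counts[letter][position] += 1
        self.counts = counts


def create(instances, alphabet="ACGT"):
    """User-facing helper: build a motif from aligned instances only."""
    return ToyMotif(alphabet, instances=instances)


motif = create(["TACAA", "TACGC", "TACAC"])
```

The user-facing function has one obvious meaning, while a parser that
already holds a matrix can call the constructor with counts directly.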

Best,
-Michiel.


From p.j.a.cock at googlemail.com  Tue Feb  5 12:32:47 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 12:32:47 +0000
Subject: [Biopython-dev] KEGG enhancements
Message-ID: <CAKVJ-_5MYuN310EuS4uOeVcNQZxRdPshh-GALGQWx1aj25eS-w@mail.gmail.com>

Hi all,

We have a couple of new pull requests for KEGG enhancements,
which we can look at after the imminent Biopython 1.61 release
goes out this week.

Kevin's working on the REST API,
https://github.com/biopython/biopython/pull/152

Leighton's working on KGML and graphics,
https://github.com/biopython/biopython/pull/153

There is a tiny bit of online access code in Leighton's code
which can probably be changed to use Kevin's work - I've
not had time to examine the overlap yet.

Peter


---------- Forwarded message ----------
From: kevin <notifications at github.com>
Date: Mon, Feb 4, 2013 at 8:03 PM
Subject: [biopython] Add KEGG API Querying Support (#152)
To: biopython/biopython <biopython at noreply.github.com>


This adds support to query KEGG's REST API
(http://www.kegg.jp/kegg/docs/keggapi.html) along with simple tests
which ensure that the correct url is hit and documentation in the
cookbook.
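
As a rough sketch of what "the correct url is hit" could mean in a
test, the helper below just assembles a request URL without any
network access. The function name is hypothetical (it is not taken
from the pull request), and the rest.kegg.jp base URL is an assumption
based on KEGG's public documentation:

```python
# Assumed base URL for KEGG's REST service; not quoted from the pull request.
KEGG_REST_BASE = "http://rest.kegg.jp"


def build_kegg_url(operation, *arguments):
    """Join an operation and its arguments into a KEGG REST URL (sketch)."""
    return "/".join([KEGG_REST_BASE, operation] + list(arguments))


list_url = build_kegg_url("list", "pathway", "hsa")
get_url = build_kegg_url("get", "hsa:10458")
```

Keeping URL construction separate from the download step makes it
possible to check the URL offline.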

This has been discussed on the mailing list in the following thread:
http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009981.html.

________________________________

You can merge this Pull Request by running

  git pull https://github.com/kevinwuhoo/biopython master

Or view, comment on, or merge it at:

  https://github.com/biopython/biopython/pull/152

Commit Summary

Added a KEGG API Wrapper
Forgot copyright
Added a general parser and a KEGG section in the tutorial.
Updated querying code and corresponding tests.
Updated documentation to reflect changes in KEGG module.

File Changes

M Bio/KEGG/__init__.py (196)
M Doc/Tutorial.tex (88)
M Tests/output/test_KEGG (41)
M Tests/test_KEGG.py (159)

Patch Links:

https://github.com/biopython/biopython/pull/152.patch
https://github.com/biopython/biopython/pull/152.diff


From p.j.a.cock at googlemail.com  Tue Feb  5 12:33:52 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 12:33:52 +0000
Subject: [Biopython-dev] KEGG enhancements
In-Reply-To: <CAKVJ-_5MYuN310EuS4uOeVcNQZxRdPshh-GALGQWx1aj25eS-w@mail.gmail.com>
References: <CAKVJ-_5MYuN310EuS4uOeVcNQZxRdPshh-GALGQWx1aj25eS-w@mail.gmail.com>
Message-ID: <CAKVJ-_6ncVoF2OXmr6M-81JJQ42wJK-oYJuTBrG1XUx0R0Us6Q@mail.gmail.com>

On Tue, Feb 5, 2013 at 12:32 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi all,
>
> We have a couple of new pull requests for KEGG enhancements,
> which we can look at after the imminent Biopython 1.61 release
> goes out this week.
>
> Kevin's working on the REST API,
> https://github.com/biopython/biopython/pull/152
>
> Leighton's working on KGML and graphics,

Sorry, the correct URL, https://github.com/biopython/biopython/pull/155

Details below,

Peter

---------- Forwarded message ----------
From: Leighton Pritchard <notifications at github.com>
Date: Tue, Feb 5, 2013 at 12:28 PM
Subject: [biopython] KGML files (#155)
To: biopython/biopython <biopython at noreply.github.com>


As we discussed - not an ideal pull request (rebasing added the recent
Biopython changes to the KEGG branch, rather than what was expected),
but if it's workable, here's the code in a way that doesn't seem to
break Biopython ;)

L.

________________________________

You can merge this Pull Request by running

  git pull https://github.com/widdowquinn/biopython kegg

Or view, comment on, or merge it at:

  https://github.com/biopython/biopython/pull/155

Commit Summary

First addition of KGML module (with tests)
Moved Bio.KGML to Bio.KEGG.KGML and split KGML tests
Modified comments to indicate TODO
Removed accidentally-committed files
Fix typo in error message
Fix typo in blastall wrapper
Add new Blast 2.2.27+ arguments to wrappers
Ignore new blastx arguments if testing with old BLAST+
BLAST 2.2.27+ dropped -frame_shift_penalty argument
Remove deprecated Bio.File.StringHandle and SGMLStripper
Remove centre setters, add explicit deprecation warning to getters.
Clarify docstrings of deprecated BLAST functions.
Avoid ResourceWarning: unclosed file in these doctests
Close handle in this doctest
Remove the deprecated Seq object's data property
Remove duplicated section labels in Tutorial (in repeated Motifs text)
Downgrade Bio.Motif chapter to a section at the end of the Bio.motifs chapter
Fix a typo
Clarify docstring for obsolete Bio.Motif module
Explain Bio.motifs replaces Bio.Motif in its docstring
Update date in Tutorial
Fix 2 typos.
Add links to SearchIO tutorial files
Update SearchIO tutorial language style
Add links to SearchIO documentation pages
Tutorial specific example files have previously gone under Doc/examples
Update paths in tutorial after moving example files

File Changes

M Bio/Blast/Applications.py (36)
M Bio/Blast/NCBIStandalone.py (21)
M Bio/File.py (65)
M Bio/Graphics/GenomeDiagram/_AbstractDrawer.py (30)
M Bio/Graphics/GenomeDiagram/_Graph.py (14)
A Bio/Graphics/KGML_vis.py (422)
A Bio/KEGG/KGML/KGML_parser.py (184)
A Bio/KEGG/KGML/KGML_pathway.py (766)
A Bio/KEGG/KGML/KGML_scrape.py (109)
A Bio/KEGG/KGML/__init__.py (15)
M Bio/Motif/__init__.py (13)
M Bio/ParserSupport.py (34)
M Bio/Seq.py (33)
M Bio/SeqIO/SffIO.py (1)
M Bio/SeqRecord.py (6)
M Bio/motifs/__init__.py (7)
M DEPRECATED (8)
M Doc/Tutorial.tex (164)
A Doc/examples/my_blast.xml (0)
A Doc/examples/my_blat.psl (0)
A Tests/KEGG/ko01100.kgml (17805)
A Tests/KEGG/ko01100.xml (25176)
A Tests/KEGG/ko01100_mod_original.pdf (98)
A Tests/KEGG/ko01100_original.pdf (98)
A Tests/KEGG/ko01120.xml (11425)
A Tests/KEGG/ko03070.kgml (249)
A Tests/KEGG/ko03070.xml (413)
A Tests/KEGG/ko03070_mod_original.pdf (113)
A Tests/KEGG/ko03070_original.pdf (113)
A Tests/KEGG/map01100.png (0)
A Tests/KEGG/map03070.png (0)
D Tests/Tutorial/README.txt (9)
M Tests/test_File.py (13)
A Tests/test_KGML_graphics.py (138)
A Tests/test_KGML_nographics.py (99)
A Tests/test_KGML_online.py (68)
M Tests/test_NCBI_BLAST_tools.py (9)
M Tests/test_ParserSupport.py (9)
M setup.py (1)

Patch Links:

https://github.com/biopython/biopython/pull/155.patch
https://github.com/biopython/biopython/pull/155.diff


From p.j.a.cock at googlemail.com  Tue Feb  5 12:36:55 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 12:36:55 +0000
Subject: [Biopython-dev] KEGG enhancements
In-Reply-To: <CAKVJ-_6ncVoF2OXmr6M-81JJQ42wJK-oYJuTBrG1XUx0R0Us6Q@mail.gmail.com>
References: <CAKVJ-_5MYuN310EuS4uOeVcNQZxRdPshh-GALGQWx1aj25eS-w@mail.gmail.com>
	<CAKVJ-_6ncVoF2OXmr6M-81JJQ42wJK-oYJuTBrG1XUx0R0Us6Q@mail.gmail.com>
Message-ID: <CAKVJ-_43EXOwqAR86WsSxjGUgZK0-w8Fb2V5um-0hQY-tt5NOw@mail.gmail.com>

On Tue, Feb 5, 2013 at 12:33 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Feb 5, 2013 at 12:32 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> Hi all,
>>
>> We have a couple of new pull requests for KEGG enhancements,
>> which we can look at after the imminent Biopython 1.61 release
>> goes out this week.
>>
>> Kevin's working on the REST API,
>> https://github.com/biopython/biopython/pull/152
>>
>> Leighton's working on KGML and graphics,
>
> Sorry, the correct URL, https://github.com/biopython/biopython/pull/155
>
> Details below,

See also Leighton's blog posts about this work (with pictures):
http://armchairbiology.blogspot.co.uk/2013/01/keggwatch-part-i.html
http://armchairbiology.blogspot.co.uk/2013/02/keggwatch-part-ii.html
http://armchairbiology.blogspot.co.uk/2013/02/keggwatch-part-iii.html

Regards,

Peter


From p.j.a.cock at googlemail.com  Tue Feb  5 13:55:20 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 13:55:20 +0000
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <CAKVJ-_6D5kAWcQn1+1=mqVBQB+WOn1AXzk4qYNW0SB7PueWadQ@mail.gmail.com>
References: <CAKVJ-_7mvg2COukJQimyxUKRCmjPo65P4+p53SUbXZRDGqHLbQ@mail.gmail.com>
	<CAKVJ-_6D5kAWcQn1+1=mqVBQB+WOn1AXzk4qYNW0SB7PueWadQ@mail.gmail.com>
Message-ID: <CAKVJ-_76PY9oLykHwqGHgJhs1GxTwOiabMQx+r5+PebQbMWtUQ@mail.gmail.com>

Hi all,

I'm going to try and do the release this afternoon, so
no commits to the master branch until further notice
please.

Thanks,

Peter


From p.j.a.cock at googlemail.com  Tue Feb  5 14:49:20 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 14:49:20 +0000
Subject: [Biopython-dev] Biopython 1.61 release
Message-ID: <CAKVJ-_4qO-Be-yYwzYUmo3xf-Gm++XfZ4ErRUX0rMOWeheVSBw@mail.gmail.com>

On Tue, Feb 5, 2013 at 1:55 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi all,
>
> I'm going to try and do the release this afternoon, so
> no commits to the master branch until further notice
> please.
>
> Thanks,
>
> Peter

The release is in progress...

The Windows installers are on the website for some quick
pre-announcement testing. If anyone spots an issue, please
email me ASAP: http://biopython.org/DIST/

Last time we put 'beta' in the Python 3.2 installer to emphasise
this was still not quite ready for prime time. Should we do that
again? How comfortable are we all about encouraging more
use under Python 3?

Thanks,

Peter


From p.j.a.cock at googlemail.com  Tue Feb  5 18:14:24 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 18:14:24 +0000
Subject: [Biopython-dev] Biopython 1.61 release
In-Reply-To: <CAKVJ-_4qO-Be-yYwzYUmo3xf-Gm++XfZ4ErRUX0rMOWeheVSBw@mail.gmail.com>
References: <CAKVJ-_4qO-Be-yYwzYUmo3xf-Gm++XfZ4ErRUX0rMOWeheVSBw@mail.gmail.com>
Message-ID: <CAKVJ-_7acjqqkagjTn5KZKudveMT=YDqCgVN45hxOjqYo6goYw@mail.gmail.com>

On Tue, Feb 5, 2013 at 2:49 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Feb 5, 2013 at 1:55 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> Hi all,
>>
>> I'm going to try and do the release this afternoon, so
>> no commits to the master branch until further notice
>> please.
>>
>> Thanks,
>>
>> Peter
>
> The release is in progress...
>
> The Windows installers are on the website for some quick
> pre-announcement testing. If anyone spots an issue, please
> email me ASAP: http://biopython.org/DIST/
>
> Last time we put 'beta' in the Python 3.2 installer to emphasise
> this was still not quite ready for prime time. Should we do that
> again? How comfortable are we all about encouraging more
> use under Python 3?

I'm planning to do the same in terms of putting beta in the
Windows installer for Python 3.2.

After some trouble, I now have the epydoc API files updated
(a manual refresh might be needed to see the changes):
http://biopython.org/DIST/docs/api/

Bow - the `backtick` markup doesn't do anything in epydoc, but
perhaps for the next release we can turn the SearchIO markup
into restructuredtext instead?

I think last time I didn't have the docutils dependency installed
in order for epydoc to try and parse the restructuredtext (used
in Bio.Phylo). Running epydoc also showed a few more epydoc
formatting errors, fixed in git - I will now regenerate the installers,
and tag this in git etc.

Peter


From w.arindrarto at gmail.com  Tue Feb  5 18:22:46 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 5 Feb 2013 19:22:46 +0100
Subject: [Biopython-dev] Biopython 1.61 release
In-Reply-To: <CAKVJ-_7acjqqkagjTn5KZKudveMT=YDqCgVN45hxOjqYo6goYw@mail.gmail.com>
References: <CAKVJ-_4qO-Be-yYwzYUmo3xf-Gm++XfZ4ErRUX0rMOWeheVSBw@mail.gmail.com>
	<CAKVJ-_7acjqqkagjTn5KZKudveMT=YDqCgVN45hxOjqYo6goYw@mail.gmail.com>
Message-ID: <CADEGkF53YZkDXFZ2SMifAckbZponntf5cevuu3sSbCKiKH7r=w@mail.gmail.com>

Hi Peter,

> Bow - the `backtick` markup doesn't do anything in epydoc, but
> perhaps for the next release we can turn the SearchIO markup
> into restructuredtext instead?
>
> I think last time I didn't have the docutils dependency installed
> in order for epydoc to try and parse the restructuredtext (used
> in Bio.Phylo). Running epydoc also showed a few more epydoc
> formatting errors, fixed in git - I will now regenerate the installers,
> and tag this in git etc.

Hmm..IIRC, I did write the entire SearchIO doc using reStructuredText
markup; in hindsight probably not wise since we still rely on epydoc.
Using reST for the next release sounds good.

On a related note, do we have any solid plans to move out of epydoc
(and into Sphinx?) for the next release?

regards,
Bow


From p.j.a.cock at googlemail.com  Tue Feb  5 18:30:29 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 18:30:29 +0000
Subject: [Biopython-dev] Biopython 1.61 release
In-Reply-To: <CADEGkF53YZkDXFZ2SMifAckbZponntf5cevuu3sSbCKiKH7r=w@mail.gmail.com>
References: <CAKVJ-_4qO-Be-yYwzYUmo3xf-Gm++XfZ4ErRUX0rMOWeheVSBw@mail.gmail.com>
	<CAKVJ-_7acjqqkagjTn5KZKudveMT=YDqCgVN45hxOjqYo6goYw@mail.gmail.com>
	<CADEGkF53YZkDXFZ2SMifAckbZponntf5cevuu3sSbCKiKH7r=w@mail.gmail.com>
Message-ID: <CAKVJ-_6xHY9G5eE=Bb0mab5K+FF7cseUqwXqEgJhkwLA01DYZQ@mail.gmail.com>

On Tue, Feb 5, 2013 at 6:22 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Peter,
>
>> Bow - the `backtick` markup doesn't do anything in epydoc, but
>> perhaps for the next release we can turn the SearchIO markup
>> into restructuredtext instead?
>>
>> I think last time I didn't have the docutils dependency installed
>> in order for epydoc to try and parse the restructuredtext (used
>> in Bio.Phylo). Running epydoc also showed a few more epydoc
>> formatting errors, fixed in git - I will now regenerate the installers,
>> and tag this in git etc.
>
> Hmm..IIRC, I did write the entire SearchIO doc using reStructuredText
> markup; in hindsight probably not wise since we still rely on epydoc.
> Using rSt for the next release sounds good.

Using reStructuredText (like Eric did with Bio.Phylo) would have
been (and is) fine, however you had __docformat__ = 'epytext en'
in the file.

> On a related note, do we have any solid plans to move out of epydoc
> (and into Sphinx?) for the next release?

Not yet - but moving all the docstrings to reStructuredText is a
very good step towards that, and a chance to review/update
all the plain text docstrings in particular to look nicer and be
more consistent.

Peter


From p.j.a.cock at googlemail.com  Tue Feb  5 18:57:58 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 18:57:58 +0000
Subject: [Biopython-dev] Biopython 1.61 release
In-Reply-To: <CAKVJ-_7acjqqkagjTn5KZKudveMT=YDqCgVN45hxOjqYo6goYw@mail.gmail.com>
References: <CAKVJ-_4qO-Be-yYwzYUmo3xf-Gm++XfZ4ErRUX0rMOWeheVSBw@mail.gmail.com>
	<CAKVJ-_7acjqqkagjTn5KZKudveMT=YDqCgVN45hxOjqYo6goYw@mail.gmail.com>
Message-ID: <CAKVJ-_5ZE1EAg61o5-jiKEvuGsJcaAO60+jBKLE-g6JSZTsqVA@mail.gmail.com>

Hi all,

The Biopython 1.61 release files are live, http://biopython.org/DIST/
and this is tagged on GitHub now, i.e. this commit:
https://github.com/biopython/biopython/commit/d372e59b3d9147cd9855feb6e3b90ff523f539b5

I've not yet pushed this to PyPI, nor done the announcement.

If anyone would like to write a draft based on the NEWS file
and the previous announcements during the next hour or two,
that would be great. Otherwise I'll do this after dinner...

Thanks,

Peter


From p.j.a.cock at googlemail.com  Tue Feb  5 21:30:45 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 21:30:45 +0000
Subject: [Biopython-dev] Biopython 1.61 release
In-Reply-To: <CAKVJ-_5ZE1EAg61o5-jiKEvuGsJcaAO60+jBKLE-g6JSZTsqVA@mail.gmail.com>
References: <CAKVJ-_4qO-Be-yYwzYUmo3xf-Gm++XfZ4ErRUX0rMOWeheVSBw@mail.gmail.com>
	<CAKVJ-_7acjqqkagjTn5KZKudveMT=YDqCgVN45hxOjqYo6goYw@mail.gmail.com>
	<CAKVJ-_5ZE1EAg61o5-jiKEvuGsJcaAO60+jBKLE-g6JSZTsqVA@mail.gmail.com>
Message-ID: <CAKVJ-_70VMtqdOFpQe6m9ue0mgbOxOyyhG4kwy77i+PuSM-vEQ@mail.gmail.com>

On Tue, Feb 5, 2013 at 6:57 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi all,
>
> The Biopython 1.61 release files are live, http://biopython.org/DIST/
> and this is tagged on GitHub now, i.e. this commit:
> https://github.com/biopython/biopython/commit/d372e59b3d9147cd9855feb6e3b90ff523f539b5
>
> I've not yet pushed this to PyPI, nor done the announcement.
>
> If anyone would like to write a draft based on the NEWS file
> and the previous announcements during the next hour or two,
> that would be great. Otherwise I'll do this after dinner...
>
> Thanks,
>
> Peter

Draft text below, based heavily on the NEWS file - any comments?

I'll post the new Tutorial online now, and then update the
Downloads page on the wiki before posting this.

Peter

--

Biopython 1.61 released

Source distributions and Windows installers for Biopython 1.61 are now
available from the downloads page on the Biopython website and from
the Python Package Index (PyPI).

The updated Biopython Tutorial and Cookbook is online (PDF).

Platforms/Deployment

We currently support Python 2.5, 2.6 and 2.7 and also test under
Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython
2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C
extensions). We are still encouraging early adopters to help test on
these platforms, and have included a "beta" installer for Python 3.2
(and Python 3.3 too follow soon) under 32-bit Windows.

Please note we are phasing out support for Python 2.5. We will
continue support for at least one further release (Biopython 1.62).
This could be extended given feedback from our users. Focusing on
Python 2.6 and 2.7 only will make writing Python 3 compatible code
easier.

Features

GenomeDiagram has three new sigils (shapes to illustrate features).
OCTO shows an octagonal shape, like the existing BOX sigil but with
the corners cut off. JAGGY shows a box with jagged edges at the start
and end, intended for things like NNNNN regions in draft genomes.
Finally BIGARROW is like the existing ARROW sigil but is drawn
straddling the axis. This is useful for drawing vertically compact
figures where you do not have overlapping genes.

New module Bio.Graphics.ColorSpiral can generate colors along a spiral
path through HSV color space. This can be used to make arbitrary
"rainbow" scales, for example to color features or cross-links on a
GenomeDiagram figure.

The Bio.SeqIO module now supports reading sequences from PDB files in
two different ways. The "pdb-atom" format determines the sequence as
it appears in the structure based on the atom coordinate section of
the file (via Bio.PDB, so NumPy is currently required for this).
Alternatively, you can use the "pdb-seqres" format to read the
complete protein sequence as it is listed in the PDB header, if
available.

The Bio.SeqUtils module now has a seq1 function to turn a sequence
using three letter amino acid codes into one using the more common one
letter codes. This acts as the inverse of the existing seq3 function.

The multiple-sequence-alignment object used by Bio.AlignIO etc now
supports an annotation dictionary. Additional support for per-column
annotation is planned, with addition and splicing to work like that
for the SeqRecord per-letter annotation.

The Bio.Motif module has been updated and reorganized. To allow for a
clean deprecation of the old code, the new motif code is stored in a
new module Bio.motifs, and a PendingDeprecationWarning was added to
Bio.Motif.

Experimental Code - SearchIO

This release also includes Bow's Google Summer of Code work writing a
unified parsing framework for NCBI BLAST (assorted formats including
tabular and XML), HMMER, BLAT, and other sequence searching tools.
This is currently available with the new BiopythonExperimentalWarning
to indicate that this is still somewhat experimental. We're bundling
it with the main release to get more public feedback, but with the big
warning that the API is likely to change. In fact, even the current
name of Bio.SearchIO may change since unless you are familiar with
BioPerl its purpose isn't immediately clear.

Contributors

Brandon Invergo
Bryan Lunt (first contribution)
Christian Brueffer (first contribution)
David Cain
Eric Talevich
Grace Yeo (first contribution)
Jeffrey Chang
Jingping Li (first contribution)
Kai Blin (first contribution)
Leighton Pritchard
Lenna Peterson
Lucas Sinclair (first contribution)
Michiel de Hoon
Nick Semenkovich (first contribution)
Peter Cock
Robert Ernst (first contribution)
Tiago Antao
Wibowo "Bow" Arindrarto


From p.j.a.cock at googlemail.com  Tue Feb  5 21:42:06 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 21:42:06 +0000
Subject: [Biopython-dev] Biopython 1.61 release
In-Reply-To: <CALfq9tLHAjX0pjLX_nB+FPkv3X5tJcp+nWRxVEoSc8J1TYhAqg@mail.gmail.com>
References: <CAKVJ-_4qO-Be-yYwzYUmo3xf-Gm++XfZ4ErRUX0rMOWeheVSBw@mail.gmail.com>
	<CAKVJ-_7acjqqkagjTn5KZKudveMT=YDqCgVN45hxOjqYo6goYw@mail.gmail.com>
	<CAKVJ-_5ZE1EAg61o5-jiKEvuGsJcaAO60+jBKLE-g6JSZTsqVA@mail.gmail.com>
	<CAKVJ-_70VMtqdOFpQe6m9ue0mgbOxOyyhG4kwy77i+PuSM-vEQ@mail.gmail.com>
	<CALfq9tLHAjX0pjLX_nB+FPkv3X5tJcp+nWRxVEoSc8J1TYhAqg@mail.gmail.com>
Message-ID: <CAKVJ-_5MCMT0NwP+5FdN+6K=GwVhfcOkNVpVGF_3Y1+1ej2=ew@mail.gmail.com>

On Tue, Feb 5, 2013 at 9:34 PM, Lenna Peterson <arklenna at gmail.com> wrote:
> Hi Peter,
>
> Looks great. Very small typo: in the last sentence of the paragraph about
> platforms, "Python 3.3 too follow" should be "Python 3.3 to follow".

Thanks Lenna :)

I didn't make an installer for Python 3.3 this afternoon, but I will
tomorrow having heard back from the NumPy 1.7 release manager
that there shouldn't be any problems from compiling against their
release candidate:

http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065369.html

On a related point, NumPy are looking at whether they can include
pre-compiled installers for 64bit Windows - once that happens
(and it may have to wait until NumPy 1.8), we will need to look
at this too:

http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065339.html

Peter


From p.j.a.cock at googlemail.com  Tue Feb  5 22:05:25 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 22:05:25 +0000
Subject: [Biopython-dev] Biopython 1.61 released
Message-ID: <CAKVJ-_6jxFnV8HozDT8sc7xey88y8hXkW8dQUSw0yZDO-q00FA@mail.gmail.com>

Dear Biopythoneers,

Source distributions and Windows installers for Biopython 1.61 are now
available from the downloads page on the Biopython website and from
the Python Package Index (PyPI).

The updated Biopython Tutorial and Cookbook is online (PDF).

Platforms/Deployment:

We currently support Python 2.5, 2.6 and 2.7 and also test under
Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython
2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C
extensions). We are still encouraging early adopters to help test on
these platforms, and have included a "beta" installer for Python 3.2
(and Python 3.3 to follow soon) under 32-bit Windows.

Please note we are phasing out support for Python 2.5. We will
continue support for at least one further release (Biopython 1.62).
This could be extended given feedback from our users. Focusing on
Python 2.6 and 2.7 only will make writing Python 3 compatible code
easier.

New Features:

GenomeDiagram has three new sigils (shapes to illustrate features).
OCTO shows an octagonal shape, like the existing BOX sigil but with
the corners cut off. JAGGY shows a box with jagged edges at the start
and end, intended for things like NNNNN regions in draft genomes.
Finally BIGARROW is like the existing ARROW sigil but is drawn
straddling the axis. This is useful for drawing vertically compact
figures where you do not have overlapping genes.

New module Bio.Graphics.ColorSpiral can generate colors along a spiral
path through HSV color space. This can be used to make arbitrary
"rainbow" scales, for example to color features or cross-links on a
GenomeDiagram figure.
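
To give a feel for the idea, here is a minimal sketch using only the
standard library colorsys module. It is not the Bio.Graphics.ColorSpiral
API; the function name and parameters are hypothetical. The hue
advances around the color wheel while the brightness rises, tracing a
spiral through HSV space:

```python
import colorsys


def spiral_colors(count, start_value=0.4, end_value=1.0, turns=1.0):
    """Return `count` RGB tuples sampled along a spiral in HSV space."""
    colors = []
    for index in range(count):
        fraction = index / float(count)
        hue = (fraction * turns) % 1.0  # angle around the color wheel
        value = start_value + (end_value - start_value) * fraction
        colors.append(colorsys.hsv_to_rgb(hue, 1.0, value))
    return colors


palette = spiral_colors(5)
```

Each tuple holds red, green and blue components between 0 and 1, so
the colors are easy to tell apart even when many are requested.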

The Bio.SeqIO module now supports reading sequences from PDB files in
two different ways. The "pdb-atom" format determines the sequence as
it appears in the structure based on the atom coordinate section of
the file (via Bio.PDB, so NumPy is currently required for this).
Alternatively, you can use the "pdb-seqres" format to read the
complete protein sequence as it is listed in the PDB header, if
available.

The Bio.SeqUtils module now has a seq1 function to turn a sequence
using three letter amino acid codes into one using the more common one
letter codes. This acts as the inverse of the existing seq3 function.
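
As an illustration of the inverse relationship only, here is a toy
sketch with a deliberately partial lookup table. The toy_seq1 and
toy_seq3 names are hypothetical and this is not the Bio.SeqUtils
implementation:

```python
# Partial table for illustration; the real module covers all amino acids.
THREE_TO_ONE = {"Ala": "A", "Ile": "I", "Met": "M", "Val": "V", "Ter": "*"}
ONE_TO_THREE = dict((one, three) for three, one in THREE_TO_ONE.items())


def toy_seq1(three_letter_seq, unknown="X"):
    """Turn a string like 'MetAlaIle' into one-letter codes."""
    codes = [three_letter_seq[i:i + 3]
             for i in range(0, len(three_letter_seq), 3)]
    return "".join(THREE_TO_ONE.get(code.capitalize(), unknown)
                   for code in codes)


def toy_seq3(one_letter_seq, unknown="Xaa"):
    """Turn a string like 'MAI' into three-letter codes."""
    return "".join(ONE_TO_THREE.get(letter, unknown)
                   for letter in one_letter_seq)


short_form = toy_seq1("MetAlaIleVal")
round_trip = toy_seq3(short_form)
```

Converting to one-letter codes and back recovers the original string
for any residue present in the table.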

The multiple-sequence-alignment object used by Bio.AlignIO etc now
supports an annotation dictionary. Additional support for per-column
annotation is planned, with addition and splicing to work like that
for the SeqRecord per-letter annotation.

The Bio.Motif module has been updated and reorganized. To allow for a
clean deprecation of the old code, the new motif code is stored in a
new module Bio.motifs, and a PendingDeprecationWarning was added to
Bio.Motif.
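
For readers unfamiliar with this kind of staged deprecation, here is a
small standard-library sketch of the mechanism. The function
import_old_motif_module is a hypothetical stand-in for whatever the old
module does on import; only the warnings module calls are real.

```python
import warnings


def import_old_motif_module():
    """Stand-in for importing a module that is pending deprecation."""
    warnings.warn(
        "This module is pending deprecation; please use the new module.",
        PendingDeprecationWarning,
        stacklevel=2,
    )
    return "old API"


# PendingDeprecationWarning is ignored by Python's default filters, so
# it has to be switched on explicitly to be seen.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    import_old_motif_module()

messages = [
    str(w.message)
    for w in caught
    if issubclass(w.category, PendingDeprecationWarning)
]
```

Because the warning is silent by default, existing scripts keep running
unchanged, while developers who enable warnings get an early notice.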

Experimental Code - SearchIO:

This release also includes Bow's Google Summer of Code work writing a
unified parsing framework for NCBI BLAST (assorted formats including
tabular and XML), HMMER, BLAT, and other sequence searching tools.
This is currently available with the new BiopythonExperimentalWarning
to indicate that this is still somewhat experimental. We're bundling
it with the main release to get more public feedback, but with the big
warning that the API is likely to change. In fact, even the current
name of Bio.SearchIO may change since unless you are familiar with
BioPerl its purpose isn't immediately clear.

Contributors:

Brandon Invergo
Bryan Lunt (first contribution)
Christian Brueffer (first contribution)
David Cain
Eric Talevich
Grace Yeo (first contribution)
Jeffrey Chang
Jingping Li (first contribution)
Kai Blin (first contribution)
Leighton Pritchard
Lenna Peterson
Lucas Sinclair (first contribution)
Michiel de Hoon
Nick Semenkovich (first contribution)
Peter Cock
Robert Ernst (first contribution)
Tiago Antao
Wibowo "Bow" Arindrarto

Thank you all.

Release announcement here (RSS feed available):
http://news.open-bio.org/news/2013/02/biopython-1-61-released/

P.S. You can follow @Biopython on Twitter
https://twitter.com/Biopython


From p.j.a.cock at googlemail.com  Tue Feb  5 22:38:32 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 22:38:32 +0000
Subject: [Biopython-dev] More 'fun' with GenBank
In-Reply-To: <50FD0F2B.1080606@biotech.uni-tuebingen.de>
References: <50F57BC5.7020607@biotech.uni-tuebingen.de>
	<CAKVJ-_6HpOr+ph9o8Dygu6O0LM=6wHV8wD1tHsZA3CrO32B37Q@mail.gmail.com>
	<CALfq9tK0ZA3wRZCPQ4DHyOd8+n2raAFb6z3Zf-nkZrUBLAy+8Q@mail.gmail.com>
	<CAKVJ-_77kdYzXv_Q_KVqy9jWSNJSgU+PWdVB-DzxdF8TKwUAGg@mail.gmail.com>
	<50F66496.8000109@biotech.uni-tuebingen.de>
	<CAKVJ-_5Tj+POzmYLvHx_nScjE6x9A-HgQPRQ_Ec_Bu1VGGjH6Q@mail.gmail.com>
	<50FD0F2B.1080606@biotech.uni-tuebingen.de>
Message-ID: <CAKVJ-_60M6aOVgJMU6dNjXD_3B_Ooe7XqU+KW1tYp6zKuhsdaA@mail.gmail.com>

On Mon, Jan 21, 2013 at 9:49 AM, Kai Blin
<kai.blin at biotech.uni-tuebingen.de> wrote:
>
>> Kai - would you mind retesting with f_loc5 (the rebased branch)?
>
> The location of the feature that caused trouble for me still looks
> correct. I'm currently running some more sequences, but I'm pretty
> confident that the code will work just fine. The tests I added to the
> genbank parser code for all the problem cases I had pass, after all. :)
>
>> Everyone - does it seem sensible to include this now, ready for the
>> upcoming release (*)? Or perhaps just after the release?
>
> I'd prefer having this in the next release if possible, but of course
> if the release after that is coming up within a reasonable time frame,
> that would work as well.
>
> Cheers,
> Kai

Unless anyone objects, I will apply the (rebased) version of this
f_loc4 / f_loc5 branch later this week (now that Biopython 1.61
is out).

This replaces the SeqFeature use of sub_features with a new
CompoundLocation which I think is a far more natural way to
handle join locations in EMBL/GenBank files.

Also, it means we can offer parsing of GenBank/EMBL style
location lines into (Compound)Location objects directly :)
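
To make the idea concrete, here is a small standard-library sketch of
turning a simple location string into a list of parts. It is not the
Biopython parser and it ignores fuzzy positions, references to other
records and part ordering on the reverse strand; the function name
parse_location and its return shape are assumptions for illustration.

```python
import re

_SPAN = re.compile(r"^(\d+)\.\.(\d+)$")


def parse_location(text):
    """Parse 'a..b', 'join(a..b,c..d)' or 'complement(...)' around either.

    Returns (parts, strand): parts is a list of (start, end) pairs using
    0-based, end-exclusive coordinates, and strand is +1 or -1.
    """
    text = text.strip()
    strand = 1
    if text.startswith("complement(") and text.endswith(")"):
        strand = -1
        text = text[len("complement("):-1]
    if text.startswith("join(") and text.endswith(")"):
        pieces = text[len("join("):-1].split(",")
    else:
        pieces = [text]
    parts = []
    for piece in pieces:
        match = _SPAN.match(piece.strip())
        if match is None:
            raise ValueError("Unsupported location: %r" % piece)
        start, end = int(match.group(1)), int(match.group(2))
        parts.append((start - 1, end))  # 1-based inclusive -> 0-based half-open
    return parts, strand


exons, strand = parse_location("join(1..10,20..30)")
```

Keeping the parts together in one object, rather than as separate
sub-features, means the whole join can be handled as a single location.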

Regards,

Peter


From w.arindrarto at gmail.com  Wed Feb  6 00:03:52 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 6 Feb 2013 01:03:52 +0100
Subject: [Biopython-dev] [Biopython] Biopython 1.61 released
In-Reply-To: <CAKVJ-_6jxFnV8HozDT8sc7xey88y8hXkW8dQUSw0yZDO-q00FA@mail.gmail.com>
References: <CAKVJ-_6jxFnV8HozDT8sc7xey88y8hXkW8dQUSw0yZDO-q00FA@mail.gmail.com>
Message-ID: <CADEGkF4-H0cc2zC245gaK3AbN8kZRyByD0xe5o8RfX1patj-qA@mail.gmail.com>

Hi Peter,

Thanks for doing the release! It feels exciting to see SearchIO code
finally live in the distributions :). Hopefully this will result in
more feedback (and then more improvements ~ likewise for the whole
Biopython as well).

Also, thank you as well to everyone who has criticized / commented /
contributed code to the module :).

cheers,
Bow


From mjldehoon at yahoo.com  Wed Feb  6 01:03:30 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 5 Feb 2013 17:03:30 -0800 (PST)
Subject: [Biopython-dev] [Biopython-announce] Biopython 1.61 released
In-Reply-To: <CAKVJ-_6jxFnV8HozDT8sc7xey88y8hXkW8dQUSw0yZDO-q00FA@mail.gmail.com>
Message-ID: <1360112610.67186.YahooMailClassic@web164006.mail.gq1.yahoo.com>

Thanks Peter!
Great to see this new code out.

Best,
-Michiel.


From mjldehoon at yahoo.com  Wed Feb  6 01:07:53 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 5 Feb 2013 17:07:53 -0800 (PST)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
Message-ID: <1360112873.79741.YahooMailClassic@web164005.mail.gq1.yahoo.com>

With Biopython 1.61 now out, perhaps this is a good time to tackle Bio.PDB.mmCIF? This module uses flex to generate the parser; I would like to replace this with a plain C module, or perhaps with a pure-Python parser. This issue was previously discussed here:

http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004466.html

Or is anybody else already looking at this module?

Best,
-Michiel.


From arklenna at gmail.com  Wed Feb  6 01:31:16 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Tue, 5 Feb 2013 20:31:16 -0500
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <1360112873.79741.YahooMailClassic@web164005.mail.gq1.yahoo.com>
References: <1360112873.79741.YahooMailClassic@web164005.mail.gq1.yahoo.com>
Message-ID: <CALfq9tLhN9Lc4HnQkrQ40eMrr=7=XjGSay=mczb6QERHWtXmUw@mail.gmail.com>

Hi Michiel,

I worked on that a bit early last year. See thread on this bug:

https://redmine.open-bio.org/issues/2619

Namely, I determined that the flex headers aren't required to compile the
flex-generated C, which is a great start.

I also started work on a PLY-based pure Python reimplementation. Pull
request here:

https://github.com/biopython/biopython/pull/33

I haven't looked at this code in quite a long time! Let me know if you have
any questions about what I did and I will do my best to remember...

Cheers,

Lenna


On Tue, Feb 5, 2013 at 8:07 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> With Biopython 1.61 now out, perhaps this is a good time to tackle
> Bio.PDB.mmCIF? This module uses flex to generate the parser; I would like
> to replace this with a plain C module, or perhaps with a pure-Python
> parser. This issue was previously discussed here:
>
> http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004466.html
>
> Or is anybody else already looking at this module?
>
> Best,
> -Michiel.


From kieran.mace at gmail.com  Wed Feb  6 02:05:19 2013
From: kieran.mace at gmail.com (Kieran Mace)
Date: Tue, 5 Feb 2013 18:05:19 -0800
Subject: [Biopython-dev] [Biopython-announce] Biopython 1.61 released
In-Reply-To: <1360112610.67186.YahooMailClassic@web164006.mail.gq1.yahoo.com>
References: <1360112610.67186.YahooMailClassic@web164006.mail.gq1.yahoo.com>
Message-ID: <EBCA5B2B-B62D-4D1D-A21D-B1CF64E5F6DF@gmail.com>

Hi.  

I'm wondering if the MafIO module is going to be included in this release?

-Kieran


From p.j.a.cock at googlemail.com  Wed Feb  6 08:37:05 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 6 Feb 2013 08:37:05 +0000
Subject: [Biopython-dev] Biopython 1.61 released
In-Reply-To: <EBCA5B2B-B62D-4D1D-A21D-B1CF64E5F6DF@gmail.com>
References: <1360112610.67186.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<EBCA5B2B-B62D-4D1D-A21D-B1CF64E5F6DF@gmail.com>
Message-ID: <CAKVJ-_6SOm1j0-273iaEqGH46aiXsCA1-Mb7mmOQCfmjjsgk0w@mail.gmail.com>

On Wednesday, February 6, 2013, Kieran Mace wrote:

> Hi.
>
> I'm wondering if the MafIO module is going to be included in this release?
>
> -Kieran


I'm not promising but I would hope so. There is some
work to be done first with locations and start/end
information in the SeqRecord.

See also the CompoundLocation discussion.

Peter


From mjldehoon at yahoo.com  Wed Feb  6 08:36:26 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 6 Feb 2013 00:36:26 -0800 (PST)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CALfq9tLhN9Lc4HnQkrQ40eMrr=7=XjGSay=mczb6QERHWtXmUw@mail.gmail.com>
Message-ID: <1360139786.53882.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Hi Lenna,

Thanks for your reply.
Are you planning to continue your work on the PLY-based mmCIF parser?

Best,
-Michiel


From redmine at redmine.open-bio.org  Wed Feb  6 21:39:04 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 6 Feb 2013 21:39:04 +0000
Subject: [Biopython-dev] [Biopython - Bug #3411] (New) Bio.Entrez.efetch
	does not respect the API docs / spec on HTTP verb use (GET vs. POST)
Message-ID: <redmine.issue-3411.20130206213904@redmine.open-bio.org>


Issue #3411 has been reported by Tom McCoy.

----------------------------------------
Bug #3411: Bio.Entrez.efetch does not respect the API docs / spec on HTTP verb use (GET vs. POST)
https://redmine.open-bio.org/issues/3411

Author: Tom McCoy
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


"Either a single UID or a comma-delimited list of UIDs may be provided. All of the UIDs must be from the database specified by db. There is no set maximum for the number of UIDs that can be passed to EFetch, *but if more than about 200 UIDs are to be provided, the request should be made using the HTTP POST method*."
-- http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch

Entrez.efetch uses this API endpoint via GET regardless of the number of UIDs supplied. The attached patch corrects this behavior.
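
The rule quoted above could be sketched as follows, in Python 3 with
only the standard library. This is not the attached patch and not the
Bio.Entrez code: the helper names, the exact threshold of 200 and the
endpoint URL are assumptions for illustration, and nothing is sent over
the network.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint and cut-off; the NCBI text only says "about 200".
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
POST_THRESHOLD = 200


def count_ids(ids):
    """Count UIDs given either a comma-delimited string or a list."""
    if isinstance(ids, str):
        return len([uid for uid in ids.split(",") if uid.strip()])
    return len(ids)


def build_efetch_request(db, ids, **params):
    """Build (but do not send) a request, using POST for long UID lists."""
    id_string = ids if isinstance(ids, str) else ",".join(str(i) for i in ids)
    query = urlencode(dict(params, db=db, id=id_string))
    if count_ids(ids) > POST_THRESHOLD:
        # Supplying a body makes urllib issue a POST.
        return Request(EFETCH_URL, data=query.encode("ascii"))
    return Request(EFETCH_URL + "?" + query)


small = build_efetch_request("nucleotide", ["186972394", "160418"], rettype="fasta")
large = build_efetch_request("nucleotide", list(range(1, 501)), rettype="fasta")
```

Here small.get_method() gives "GET" and large.get_method() gives "POST".
Counting through one helper keeps the behaviour the same whether the
caller passes a single string or a list of identifiers.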


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Thu Feb  7 10:20:30 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 7 Feb 2013 10:20:30 +0000
Subject: [Biopython-dev] [Biopython - Bug #3411] Bio.Entrez.efetch does not
	respect the API docs / spec on HTTP verb use (GET vs. POST)
References: <redmine.issue-3411.20130206213904@redmine.open-bio.org>
Message-ID: <redmine.journal-15092.20130207102030@redmine.open-bio.org>


Issue #3411 has been updated by Peter Cock.

Assignee set to Biopython Dev Mailing List

I don't recall that guideline being in the earlier requirements/documentation when Bio.Entrez was first written, but the fix proposed looks sensible.

(Note - do we need to worry about the ids being a string or a list at that point, and therefore how to count the entries?)

P.S. Resetting assignee to default of the dev mailing list.
----------------------------------------
Bug #3411: Bio.Entrez.efetch does not respect the API docs / spec on HTTP verb use (GET vs. POST)
https://redmine.open-bio.org/issues/3411

Author: Tom McCoy
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: 
Target version: 
URL: 




From p.j.a.cock at googlemail.com  Thu Feb  7 11:33:25 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 7 Feb 2013 11:33:25 +0000
Subject: [Biopython-dev] Biopython 1.61 released
In-Reply-To: <CAKVJ-_6jxFnV8HozDT8sc7xey88y8hXkW8dQUSw0yZDO-q00FA@mail.gmail.com>
References: <CAKVJ-_6jxFnV8HozDT8sc7xey88y8hXkW8dQUSw0yZDO-q00FA@mail.gmail.com>
Message-ID: <CAKVJ-_7bXcXkxFQ9Xx0W3CDwd_QzhYiRKpsFHGL9n5YoSFDtXQ@mail.gmail.com>

On Tue, Feb 5, 2013 at 10:05 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Dear Biopythoneers,
>
> Source distributions and Windows installers for Biopython 1.61 are now
> available from the downloads page on the Biopython website and from
> the Python Package Index (PyPI).
>
> The updated Biopython Tutorial and Cookbook is online (PDF).
>
> Platforms/Deployment:
>
> We currently support Python 2.5, 2.6 and 2.7 and also test under
> Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython
> 2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C
> extensions). We are still encouraging early adopters to help test on
> these platforms, and have included a "beta" installer for Python 3.2
> (and Python 3.3 to follow soon) under 32-bit Windows.

For those of you wanting to try Biopython on Python 3.3 on Windows,
there is now an installer for Biopython 1.61 built against NumPy 1.7.0rc2.

NumPy 1.7 is their first release to support Python 3.3, and the
official release is expected to be near-identical to this second
release candidate, see:
http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065384.html

Regards,

Peter


From p.j.a.cock at googlemail.com  Thu Feb  7 11:53:40 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 7 Feb 2013 11:53:40 +0000
Subject: [Biopython-dev] [Biopython] Fwd: Bug in bgzf module
In-Reply-To: <CANJ6P8LT9VoanMszVO=aEeFHYD5AzDTJUoDxoEKyJuUxJ6Dx4g@mail.gmail.com>
References: <CANJ6P8KTPF0DCoOGvFfVAXQkwJtZezncpr4HDDTYn4HAQJjUnQ@mail.gmail.com>
	<CANJ6P8LBkbR89pROYfka4P82TFAPvdLSOiwjEr3gxNgxx=wghw@mail.gmail.com>
	<CAKVJ-_48X3YGXN7ky+LmtQf8YFyscm6e0wtJWZs2ZM8yLyj3Bg@mail.gmail.com>
	<CANJ6P8J7PBXSQngbLJ2QKkFHEpFWwaQ8opiQRBPtu01eiUK2KQ@mail.gmail.com>
	<CAKVJ-_6tvf3U3MJp0O2Cd6sPsgoMP_yaQtWnkG3yro5vsoXneA@mail.gmail.com>
	<CANJ6P8LT9VoanMszVO=aEeFHYD5AzDTJUoDxoEKyJuUxJ6Dx4g@mail.gmail.com>
Message-ID: <CAKVJ-_7=k-MNcJoVkQuuV4fsSK5rSsjsMnMUxnsj65WQkR6fGQ@mail.gmail.com>

On Wed, Feb 6, 2013 at 10:35 PM, Petra Kubincová
<petra.kubincova at gmail.com> wrote:
> Hi Peter,
>
> based on your unit test for tell method I've created this:
> http://dl.dropbox.com/u/...
> I hope it's at least partially usable.
>
> Regards,
> Petra

Thanks, I turned that into this commit:
https://github.com/biopython/biopython/commit/194bda7cd4bc292b37fd219f1f95a19e1316ac5a

That led me to notice a special case with offsets on a block
boundary, see this fix and test:
https://github.com/biopython/biopython/commit/fef7659dacaf93ddeb6270103d8ded6fb89414b7
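
For background on why block boundaries need care, here is a sketch of
BGZF virtual offset arithmetic as described in the SAM/BAM
specification. It only shows the packing scheme, not the fix in the
commit above, and the helper names pack_voffset and unpack_voffset are
hypothetical.

```python
def pack_voffset(block_start, within_block):
    """Combine a block's file offset and an offset inside its data.

    The upper 48 bits hold the start of the compressed block in the file
    and the lower 16 bits hold the position within the decompressed block.
    """
    if not 0 <= within_block < 2 ** 16:
        raise ValueError("within-block offset must fit in 16 bits")
    if not 0 <= block_start < 2 ** 48:
        raise ValueError("block start must fit in 48 bits")
    return (block_start << 16) | within_block


def unpack_voffset(voffset):
    """Split a virtual offset back into (block_start, within_block)."""
    return voffset >> 16, voffset & 0xFFFF


voffset = pack_voffset(100, 5)
```

One consequence of this scheme is that the end of one block and the
start of the next describe the same position in the decompressed data
but have different virtual offsets, so code comparing or reporting
offsets has to pick one of the two consistently.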

Peter


From p.j.a.cock at googlemail.com  Thu Feb  7 13:30:31 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 7 Feb 2013 13:30:31 +0000
Subject: [Biopython-dev] More 'fun' with GenBank
In-Reply-To: <CAKVJ-_60M6aOVgJMU6dNjXD_3B_Ooe7XqU+KW1tYp6zKuhsdaA@mail.gmail.com>
References: <50F57BC5.7020607@biotech.uni-tuebingen.de>
	<CAKVJ-_6HpOr+ph9o8Dygu6O0LM=6wHV8wD1tHsZA3CrO32B37Q@mail.gmail.com>
	<CALfq9tK0ZA3wRZCPQ4DHyOd8+n2raAFb6z3Zf-nkZrUBLAy+8Q@mail.gmail.com>
	<CAKVJ-_77kdYzXv_Q_KVqy9jWSNJSgU+PWdVB-DzxdF8TKwUAGg@mail.gmail.com>
	<50F66496.8000109@biotech.uni-tuebingen.de>
	<CAKVJ-_5Tj+POzmYLvHx_nScjE6x9A-HgQPRQ_Ec_Bu1VGGjH6Q@mail.gmail.com>
	<50FD0F2B.1080606@biotech.uni-tuebingen.de>
	<CAKVJ-_60M6aOVgJMU6dNjXD_3B_Ooe7XqU+KW1tYp6zKuhsdaA@mail.gmail.com>
Message-ID: <CAKVJ-_7LMS6mvgA6cP06To7-46pw-ApqiXVU6MNuWHcw+=VtHw@mail.gmail.com>

On Tue, Feb 5, 2013 at 10:38 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Mon, Jan 21, 2013 at 9:49 AM, Kai Blin
> <kai.blin at biotech.uni-tuebingen.de> wrote:
>>
>>> Kai - would you mind retesting with f_loc5 (the rebased branch)?
>>
>> The location of the feature that caused trouble for me still looks
>> correct. I'm currently running some more sequences, but I'm pretty
>> confident that the code will work just fine. The tests I added to the
>> genbank parser code for all the problem cases I had pass, after all. :)
>>
>>> Everyone - does it seem sensible to include this now, ready for the
>>> upcoming release (*)? Or perhaps just after the release?
>>
>> I'd prefer having this in the next release if possible, but of course
>> if the release after that is coming up within a reasonable time frame,
>> that would work as well.
>>
>> Cheers,
>> Kai
>
> Unless anyone objects, I will apply the (rebased) version of this
> f_loc4 / f_loc5 branch later this week (now that Biopython 1.61
> is out).
>
> This replaces the SeqFeature use of sub_features with a new
> CompoundLocation which I think is a far more natural way to
> handle join locations in EMBL/GenBank files.
>
> Also, it means we can offer parsing of GenBank/EMBL style
> location lines into (Compound)Location objects directly :)
>
> Regards,
>
> Peter

Applied to master,
https://github.com/biopython/biopython/commit/e5ff9e48e315924d59348c013ab082d6f155d18b

Peter


From kai.blin at biotech.uni-tuebingen.de  Thu Feb  7 14:47:37 2013
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 07 Feb 2013 15:47:37 +0100
Subject: [Biopython-dev] More 'fun' with GenBank
In-Reply-To: <CAKVJ-_7LMS6mvgA6cP06To7-46pw-ApqiXVU6MNuWHcw+=VtHw@mail.gmail.com>
References: <50F57BC5.7020607@biotech.uni-tuebingen.de>
	<CAKVJ-_6HpOr+ph9o8Dygu6O0LM=6wHV8wD1tHsZA3CrO32B37Q@mail.gmail.com>
	<CALfq9tK0ZA3wRZCPQ4DHyOd8+n2raAFb6z3Zf-nkZrUBLAy+8Q@mail.gmail.com>
	<CAKVJ-_77kdYzXv_Q_KVqy9jWSNJSgU+PWdVB-DzxdF8TKwUAGg@mail.gmail.com>
	<50F66496.8000109@biotech.uni-tuebingen.de>
	<CAKVJ-_5Tj+POzmYLvHx_nScjE6x9A-HgQPRQ_Ec_Bu1VGGjH6Q@mail.gmail.com>
	<50FD0F2B.1080606@biotech.uni-tuebingen.de>
	<CAKVJ-_60M6aOVgJMU6dNjXD_3B_Ooe7XqU+KW1tYp6zKuhsdaA@mail.gmail.com>
	<CAKVJ-_7LMS6mvgA6cP06To7-46pw-ApqiXVU6MNuWHcw+=VtHw@mail.gmail.com>
Message-ID: <5113BE89.3050303@biotech.uni-tuebingen.de>

On 2013-02-07 14:30, Peter Cock wrote:

Hi Peter,

> Applied to master, 
> https://github.com/biopython/biopython/commit/e5ff9e48e315924d59348c013ab082d6f155d18b

Thanks for that.

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From arklenna at gmail.com  Thu Feb  7 18:21:37 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Thu, 7 Feb 2013 13:21:37 -0500
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <1360139786.53882.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <CALfq9tLhN9Lc4HnQkrQ40eMrr=7=XjGSay=mczb6QERHWtXmUw@mail.gmail.com>
	<1360139786.53882.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <CALfq9tJenMObZGHU6EgEo5oUd7=1K-NWsC_cN06HCdJkE=Et3A@mail.gmail.com>

Hi Michiel,

If there are well-defined problems with the PLY parser, I can work on
fixing them. I am not currently working with mmCIF so I am not in the best
position to evaluate where and how the parser needs to be improved.

I am working with X-ray PDB files and I am not sure if my collaborators are
familiar with mmCIF. I have not dealt with NMR files of any type, either.

Cheers,

Lenna


On Wed, Feb 6, 2013 at 3:36 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> Hi Lenna,
>
> Thanks for your reply.
> Are you planning to continue your work on the PLY-based mmCIF parser?
>
> Best,
> -Michiel
>
> --- On *Tue, 2/5/13, Lenna Peterson <arklenna at gmail.com>* wrote:
>
>
> From: Lenna Peterson <arklenna at gmail.com>
> Subject: Re: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> Cc: "BioPython-Dev Mailing List" <biopython-dev at biopython.org>
> Date: Tuesday, February 5, 2013, 8:31 PM
>
>
> Hi Michiel,
>
> I worked on that a bit early last year. See thread on this bug:
>
> https://redmine.open-bio.org/issues/2619
>
> Namely, I determined that the flex headers aren't required to compile the
> flex-generated C, which is a great start.
>
> I also started work on a PLY-based pure Python reimplementation. Pull
> request here:
>
> https://github.com/biopython/biopython/pull/33
>
> I haven't looked at this code in quite a long time! Let me know if you
> have any questions about what I did and I will do my best to remember...
>
> Cheers,
>
> Lenna
>
>
> On Tue, Feb 5, 2013 at 8:07 PM, Michiel de Hoon <mjldehoon at yahoo.com>
> wrote:
>
> With Biopython 1.61 now out, perhaps this is a good time to tackle
> Bio.PDB.mmCIF? This module uses flex to generate the parser; I would like
> to replace this with a plain C module, or perhaps with a pure-Python
> parser. This issue was previously discussed here:
>
> http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004466.html
>
> Or is anybody else already looking at this module?
>
> Best,
> -Michiel.
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>
>
>


From anaryin at gmail.com  Thu Feb  7 18:25:37 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Thu, 7 Feb 2013 19:25:37 +0100
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CALfq9tJenMObZGHU6EgEo5oUd7=1K-NWsC_cN06HCdJkE=Et3A@mail.gmail.com>
References: <CALfq9tLhN9Lc4HnQkrQ40eMrr=7=XjGSay=mczb6QERHWtXmUw@mail.gmail.com>
	<1360139786.53882.YahooMailClassic@web164003.mail.gq1.yahoo.com>
	<CALfq9tJenMObZGHU6EgEo5oUd7=1K-NWsC_cN06HCdJkE=Et3A@mail.gmail.com>
Message-ID: <CAJ9sUYPddfAm-Zp7Wr7y2k1=pqj3wcY5E68Ah1W2P+sL_ojSPQ@mail.gmail.com>

Hi,

In our NMR lab I am pretty sure mmCIF files are not even known. How widely
used is the format in X-ray labs? I have never seen it outside this mailing
list, to be honest.

Best,

João


From p.j.a.cock at googlemail.com  Fri Feb  8 15:21:46 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 8 Feb 2013 15:21:46 +0000
Subject: [Biopython-dev] Fwd: [biopython] Newick parser (#156)
In-Reply-To: <biopython/biopython/pull/156@github.com>
References: <biopython/biopython/pull/156@github.com>
Message-ID: <CAKVJ-_5q-n8qvpumBc6OZkAaNDLV4o41LBr+WsWOzGxP0SZV8A@mail.gmail.com>

Eric,

Could you take a look at this please?

Thanks,

Peter

---------- Forwarded message ----------
From: Ben Morris <notifications at github.com>
Date: Fri, Feb 8, 2013 at 3:12 PM
Subject: [biopython] Newick parser (#156)
To: biopython/biopython <biopython at noreply.github.com>


In light of three issues with the Newick parser:

https://redmine.open-bio.org/issues/3409
https://redmine.open-bio.org/issues/3386
https://redmine.open-bio.org/issues/3407

this is a rewrite of the parser from scratch. It supports quoted node
labels and can handle support values either as they were previously handled
or from square-bracketed comments, as requested by Arlin. Additionally,
it's consistently quite fast:

[image: newick_parse_times]<https://f.cloud.github.com/assets/544977/139616/fac0df38-71fe-11e2-91a8-a95ba7c6340b.png>

The unit tests still pass with these changes, and I'm now able to parse
trees that previously raised exceptions.
------------------------------
You can merge this Pull Request by running

  git pull https://github.com/bendmorris/biopython newick

Or view, comment on, or merge it at:

  https://github.com/biopython/biopython/pull/156
Commit Summary

   - A more efficient implementation of a Newick parser (linear time vs.
   quadratic) that makes only a single pass over the text and handles quoted
   labels correctly.
   - Implementing support values and fixing issue when external parentheses
   are missing.
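
For readers unfamiliar with the single-pass idea in the summary above, here
is a minimal sketch of a Newick tokenizer that scans the text once and keeps
quoted labels intact. It is not the code from this pull request; the names
TOKEN_RE and tokenize_newick are made up for illustration, and the treatment
of '' as an escaped quote is an assumption about the Newick dialect.

```python
import re

# Hypothetical tokenizer sketch, not the NewickIO implementation.
# Quoted labels are tried first so punctuation inside quotes is not split.
TOKEN_RE = re.compile(
    r"""
    '(?:[^']|'')*'        # single-quoted label; a doubled quote is an escape
  | \[[^\]]*\]            # square-bracketed comment, e.g. a support value
  | [(),:;]               # structural characters
  | [^\s(),:;\[\]']+      # unquoted label or number
    """,
    re.VERBOSE,
)


def tokenize_newick(text):
    """Return the tokens of a Newick string in one left-to-right scan."""
    return [match.group() for match in TOKEN_RE.finditer(text)]


tokens = tokenize_newick("(A:0.1,'B (x)':0.2)[90];")
```

Each character is consumed by exactly one alternative, so the scan stays
linear in the length of the input; a parser would then walk the token list
once to build the tree.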

File Changes

   - *M* Bio/Phylo/NewickIO.py <https://github.com/biopython/biopython/pull/156/files#diff-0> (198)

Patch Links:

   - https://github.com/biopython/biopython/pull/156.patch
   - https://github.com/biopython/biopython/pull/156.diff


From mjldehoon at yahoo.com  Sat Feb  9 01:42:23 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 8 Feb 2013 17:42:23 -0800 (PST)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CALfq9tJenMObZGHU6EgEo5oUd7=1K-NWsC_cN06HCdJkE=Et3A@mail.gmail.com>
Message-ID: <1360374143.25311.YahooMailClassic@web164004.mail.gq1.yahoo.com>

Hi Lenna,

--- On Thu, 2/7/13, Lenna Peterson <arklenna at gmail.com> wrote:
> If there are well-defined problems with the PLY parser, I can work on
> fixing them. I am not currently working with mmCIF so I am not in the
> best position to evaluate where and how the parser needs to be improved.
 

I don't know of any problems with the PLY parser, but since it relies on PLY, it would add another dependency to Biopython. On the other hand, a pure-Python solution may be preferable, as it's easier to maintain and runs with Jython. The C implementation is considerably faster, but I doubt that it really matters since the Python (PLY) parser seems to be fast enough.

I see three options then:
1) Remove the lex stuff from lex.yy.c, and optionally convert the remaining C code to Python.
2) Remove the PLY dependency from the PLY-based parser.
3) Write a new pure-Python parser from scratch.

I'm guessing that 1) will be the most straightforward. Other opinions?

Best,
-Michiel.




From redmine at redmine.open-bio.org  Sat Feb  9 08:22:31 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Sat, 9 Feb 2013 08:22:31 +0000
Subject: [Biopython-dev] [Biopython - Bug #3411] Bio.Entrez.efetch does not
	respect the API docs / spec on HTTP verb use (GET vs. POST)
References: <redmine.issue-3411.20130206213904@redmine.open-bio.org>
Message-ID: <redmine.journal-15095.20130209082231@redmine.open-bio.org>


Issue #3411 has been updated by Michiel de Hoon.


Fixed (using slightly different code); see revision f1836165.
----------------------------------------
Bug #3411: Bio.Entrez.efetch does not respect the API docs / spec on HTTP verb use (GET vs. POST)
https://redmine.open-bio.org/issues/3411

Author: Tom McCoy
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: 
Target version: 
URL: 


"Either a single UID or a comma-delimited list of UIDs may be provided. All of the UIDs must be from the database specified by db. There is no set maximum for the number of UIDs that can be passed to EFetch, *but if more than about 200 UIDs are to be provided, the request should be made using the HTTP POST method*."
-- http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch

Entrez.efetch uses this API endpoint via GET regardless of the number of UIDs supplied. The attached patch corrects this behavior.
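
As a rough illustration of the rule quoted above (this is neither the
attached patch nor the code committed in revision f1836165), a request
builder could switch from GET to POST once the UID list passes the
threshold. The function name build_efetch_request, the constant
POST_THRESHOLD and the endpoint URL are assumptions made for this sketch,
and nothing is sent over the network:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed E-utilities endpoint; adjust if your setup differs.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
# "About 200 UIDs" in the NCBI text; the exact cut-off is a judgment call.
POST_THRESHOLD = 200


def build_efetch_request(db, ids, **params):
    """Build (but do not send) an EFetch request, using POST for long UID lists."""
    id_list = [str(uid) for uid in ids]
    query = dict(params, db=db, id=",".join(id_list))
    encoded = urlencode(query)
    if len(id_list) > POST_THRESHOLD:
        # Supplying a request body makes urllib issue a POST.
        return Request(EFETCH_URL, data=encoded.encode("ascii"))
    # Short lists fit comfortably in the query string of a GET.
    return Request(EFETCH_URL + "?" + encoded)


short_request = build_efetch_request("pubmed", [1, 2, 3], retmode="xml")
long_request = build_efetch_request("pubmed", range(300), retmode="xml")
```

Here short_request.get_method() reports GET and long_request.get_method()
reports POST, which is the behaviour the NCBI documentation asks for.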


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From p.j.a.cock at googlemail.com  Sat Feb  9 11:53:31 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 9 Feb 2013 11:53:31 +0000
Subject: [Biopython-dev] Deprecating Bio.Index?
Message-ID: <CAKVJ-_67V+7u47d4shLbkC79tdj78SU6H2Q_SGZ8ztYNOcmhUg@mail.gmail.com>

Hello all,

Does anyone still use Bio.Index? I don't think any of Biopython
itself does nowadays, so perhaps we can deprecate this?

https://github.com/biopython/biopython/blob/master/Bio/Index.py

(We should of course ask on the main list first just in case)

Regards,

Peter


From colin.aibn at gmail.com  Sat Feb  9 13:06:13 2013
From: colin.aibn at gmail.com (Colin Archer)
Date: Sat, 9 Feb 2013 23:06:13 +1000
Subject: [Biopython-dev] SearchIO HSP indexing
Message-ID: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>

Hi everyone,
                  I have a question about the implementation of
high-scoring segment pairs (HSPs) in SearchIO. I currently have a BLAST
output file in XML format I am parsing and this is one of the hits (removed
the alignment details to save space):

        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_id>gnl|BL_ORD_ID|111</Hit_id>
          <Hit_def>ref|NC_007779|:125695-127587</Hit_def>
          <Hit_accession>111</Hit_accession>
          <Hit_len>1893</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>3352.79</Hsp_bit-score>
              <Hsp_score>1815</Hsp_score>
              <Hsp_evalue>0</Hsp_evalue>
              <Hsp_query-from>1</Hsp_query-from>
              <Hsp_query-to>1893</Hsp_query-to>
              <Hsp_hit-from>1</Hsp_hit-from>
              <Hsp_hit-to>1893</Hsp_hit-to>
              <Hsp_query-frame>1</Hsp_query-frame>
              <Hsp_hit-frame>1</Hsp_hit-frame>
              <Hsp_identity>1867</Hsp_identity>
              <Hsp_positive>1867</Hsp_positive>
              <Hsp_gaps>0</Hsp_gaps>
            </Hsp>
            <Hsp>
              <Hsp_num>2</Hsp_num>
              <Hsp_bit-score>399.997</Hsp_bit-score>
              <Hsp_score>216</Hsp_score>
              <Hsp_evalue>2.88061e-111</Hsp_evalue>
              <Hsp_query-from>331</Hsp_query-from>
              <Hsp_query-to>881</Hsp_query-to>
              <Hsp_hit-from>22</Hsp_hit-from>
              <Hsp_hit-to>581</Hsp_hit-to>
              <Hsp_query-frame>1</Hsp_query-frame>
              <Hsp_hit-frame>1</Hsp_hit-frame>
              <Hsp_identity>452</Hsp_identity>
              <Hsp_positive>452</Hsp_positive>
              <Hsp_gaps>19</Hsp_gaps>
              <Hsp_align-len>565</Hsp_align-len>
            </Hsp>

Using Hsp1 as an example, the query and hit starts ("Hsp_query-from" and
"Hsp_hit-from") are both 1 in the XML, but when I access the Hsp objects from
the BlastResult, both values are equal to 0:

>>> blast_record[0][0].query_start
0
>>> blast_record[0][0].hit_start
0

However, when I access the end objects for the query and hit, the result
isn't 1892 (zero based 1893) but 1893:

>>> blast_record[0][0].query_end
1893
>>> blast_record[0][0].hit_end
1893

Is this correct? I find it a little confusing that one result is zero-based
and the other one-based.

Thanks
Colin


From p.j.a.cock at googlemail.com  Sat Feb  9 13:16:43 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 9 Feb 2013 13:16:43 +0000
Subject: [Biopython-dev] SearchIO HSP indexing
In-Reply-To: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
References: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
Message-ID: <CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>

On Sat, Feb 9, 2013 at 1:06 PM, Colin Archer <colin.aibn at gmail.com> wrote:
> Hi everyone,
>                   I have a question about the implementation of
> high-scoring segment pairs (HSPs) in SearchIO. I currently have a BLAST
> output file in XML format I am parsing and this is one of the hits (removed
> the alignment details to save space):
>
>         <Hit>
>           <Hit_num>1</Hit_num>
>           <Hit_id>gnl|BL_ORD_ID|111</Hit_id>
>           <Hit_def>ref|NC_007779|:125695-127587</Hit_def>
>           <Hit_accession>111</Hit_accession>
>           <Hit_len>1893</Hit_len>
>           <Hit_hsps>
>             <Hsp>
>               <Hsp_num>1</Hsp_num>
>               <Hsp_bit-score>3352.79</Hsp_bit-score>
>               <Hsp_score>1815</Hsp_score>
>               <Hsp_evalue>0</Hsp_evalue>
>               <Hsp_query-from>1</Hsp_query-from>
>               <Hsp_query-to>1893</Hsp_query-to>
>               <Hsp_hit-from>1</Hsp_hit-from>
>               <Hsp_hit-to>1893</Hsp_hit-to>
>               <Hsp_query-frame>1</Hsp_query-frame>
>               <Hsp_hit-frame>1</Hsp_hit-frame>
>               <Hsp_identity>1867</Hsp_identity>
>               <Hsp_positive>1867</Hsp_positive>
>               <Hsp_gaps>0</Hsp_gaps>
>             </Hsp>
>             <Hsp>
>               <Hsp_num>2</Hsp_num>
>               <Hsp_bit-score>399.997</Hsp_bit-score>
>               <Hsp_score>216</Hsp_score>
>               <Hsp_evalue>2.88061e-111</Hsp_evalue>
>               <Hsp_query-from>331</Hsp_query-from>
>               <Hsp_query-to>881</Hsp_query-to>
>               <Hsp_hit-from>22</Hsp_hit-from>
>               <Hsp_hit-to>581</Hsp_hit-to>
>               <Hsp_query-frame>1</Hsp_query-frame>
>               <Hsp_hit-frame>1</Hsp_hit-frame>
>               <Hsp_identity>452</Hsp_identity>
>               <Hsp_positive>452</Hsp_positive>
>               <Hsp_gaps>19</Hsp_gaps>
>               <Hsp_align-len>565</Hsp_align-len>
>             </Hsp>
>
> Using Hsp1 as an example, the query and hit starts ("Hsp_query-from" and
> "Hsp_hit-from") are both 1 in the XML but when I access the Hsp objects from
> the BlastResult, both values are equal to 0:
>
>>>> blast_record[0][0].query_start
> 0
>>>> blast_record[0][0].hit_start
> 0
>
> However, when I access the end objects for the query and hit, the result
> isn't 1892 (zero based 1893) but 1893:
>
>>>> blast_record[0][0].query_end
> 1893
>>>> blast_record[0][0].hit_end
> 1893
>
> Is this correct? I find it a little confusing that one result is zero-based
> and the other one-based.
>
> Thanks
> Colin

Hi Colin,

The SearchIO positions like elsewhere in Biopython should be
using Python style counting. Looking at this one:

               <Hsp_hit-from>1</Hsp_hit-from>
               <Hsp_hit-to>1893</Hsp_hit-to>

That is like a GenBank/EMBL location 1..1893 which in Python string
slicing is [0:1893], so the start is reduced by one but the end is unchanged.
The nice thing is the length is 1893 and is given as the difference of the
Python slicing style end and start.
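
To make the arithmetic concrete, here is a small sketch using the numbers
from the XML quoted above. It does not call SearchIO; hsp_span is a
hypothetical helper written only to show the conversion being described.

```python
def hsp_span(xml_from, xml_to):
    """Convert a 1-based inclusive from/to pair into Python slice coordinates."""
    return xml_from - 1, xml_to


# from/to values copied from the two HSPs in the BLAST XML above.
xml_ranges = {
    "HSP 1 hit": (1, 1893),
    "HSP 2 query": (331, 881),
    "HSP 2 hit": (22, 581),
}

for name, (xml_from, xml_to) in xml_ranges.items():
    start, end = hsp_span(xml_from, xml_to)
    # end - start is the number of residues covered, with no "+ 1" needed.
    print(name, start, end, end - start)
```

The same start and end can be used directly as sequence[start:end] on the
hit or query sequence.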

Perhaps we need to work on the help text? Any suggestions?

Thanks,

Peter


From colin.aibn at gmail.com  Sat Feb  9 13:54:42 2013
From: colin.aibn at gmail.com (Colin Archer)
Date: Sat, 9 Feb 2013 23:54:42 +1000
Subject: [Biopython-dev] SearchIO HSP indexing
In-Reply-To: <CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>
References: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
	<CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>
Message-ID: <CAF++dEejrr_bRNoTPBPFwVsSYjuEGwtMMiJGrHvCpaEnnOxV1Q@mail.gmail.com>

Hi Peter,
             Thanks for getting back to me so quickly.

I'm curious about the benefits of having these values in Python string
slicing format? I haven't come across this very often, I'm used to seeing
values systematically zero or one-based.

Would it be easier to keep the range variables hit_range and hit_range_all
in slicing format and the start and end variables in sequence position
format so that they represent the actual BLAST results?

I had a look at some of the code and I can't see the slicing format
mentioned anywhere (Hsp.py, Hit.py, or blast_xml.py). It would probably be
helpful to explain the values in Hsp.py as a ** mark on hsp_start, hsp_end,
query_start, and query_end so that if people are interested they can have a
look at the files and see what they mean.

Thanks
Colin


On Sat, Feb 9, 2013 at 11:16 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Sat, Feb 9, 2013 at 1:06 PM, Colin Archer <colin.aibn at gmail.com> wrote:
> > Hi everyone,
> >                   I have a question about the implementation of
> > high-scoring segment pairs (HSPs) in SearchIO. I currently have a BLAST
> > output file in XML format I am parsing and this is one of the hits
> (removed
> > the alignment details to save space):
> >
> >         <Hit>
> >           <Hit_num>1</Hit_num>
> >           <Hit_id>gnl|BL_ORD_ID|111</Hit_id>
> >           <Hit_def>ref|NC_007779|:125695-127587</Hit_def>
> >           <Hit_accession>111</Hit_accession>
> >           <Hit_len>1893</Hit_len>
> >           <Hit_hsps>
> >             <Hsp>
> >               <Hsp_num>1</Hsp_num>
> >               <Hsp_bit-score>3352.79</Hsp_bit-score>
> >               <Hsp_score>1815</Hsp_score>
> >               <Hsp_evalue>0</Hsp_evalue>
> >               <Hsp_query-from>1</Hsp_query-from>
> >               <Hsp_query-to>1893</Hsp_query-to>
> >               <Hsp_hit-from>1</Hsp_hit-from>
> >               <Hsp_hit-to>1893</Hsp_hit-to>
> >               <Hsp_query-frame>1</Hsp_query-frame>
> >               <Hsp_hit-frame>1</Hsp_hit-frame>
> >               <Hsp_identity>1867</Hsp_identity>
> >               <Hsp_positive>1867</Hsp_positive>
> >               <Hsp_gaps>0</Hsp_gaps>
> >             </Hsp>
> >             <Hsp>
> >               <Hsp_num>2</Hsp_num>
> >               <Hsp_bit-score>399.997</Hsp_bit-score>
> >               <Hsp_score>216</Hsp_score>
> >               <Hsp_evalue>2.88061e-111</Hsp_evalue>
> >               <Hsp_query-from>331</Hsp_query-from>
> >               <Hsp_query-to>881</Hsp_query-to>
> >               <Hsp_hit-from>22</Hsp_hit-from>
> >               <Hsp_hit-to>581</Hsp_hit-to>
> >               <Hsp_query-frame>1</Hsp_query-frame>
> >               <Hsp_hit-frame>1</Hsp_hit-frame>
> >               <Hsp_identity>452</Hsp_identity>
> >               <Hsp_positive>452</Hsp_positive>
> >               <Hsp_gaps>19</Hsp_gaps>
> >               <Hsp_align-len>565</Hsp_align-len>
> >             </Hsp>
> >
> > Using Hsp1 as an example, the query and hit starts ("Hsp_query-from" and
> > "Hsp_hit-from") are both 1 in the XML but when I access the Hsp objects
> from
> > the BlastResult, both values are equal to 0:
> >
> >>>> blast_record[0][0].query_start
> > 0
> >>>> blast_record[0][0].hit_start
> > 0
> >
> > However, when I access the end objects for the query and hit, the result
> > isn't 1892 (zero based 1893) but 1893:
> >
> >>>> blast_record[0][0].query_end
> > 1893
> >>>> blast_record[0][0].hit_end
> > 1893
> >
> > Is this correct? I find it a little confusing that one result is
> zero-based
> > and the other one-based.
> >
> > Thanks
> > Colin
>
> Hi Colin,
>
> The SearchIO positions like elsewhere in Biopython should be
> using Python style counting. Looking at this one:
>
>                <Hsp_hit-from>1</Hsp_hit-from>
>                <Hsp_hit-to>1893</Hsp_hit-to>
>
> That is like a GenBank/EMBL location 1..1893 which in Python string
> slicing is [0:1893], so the start has -1 but the end is unchanged. The
> nice thing is the length is 1893 and is given as the difference of the
> Python slicing style end and start.
>
> Perhaps we need to work on the help text? Any suggestions?
>
> Thanks,
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


From p.j.a.cock at googlemail.com  Sat Feb  9 14:30:26 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 9 Feb 2013 14:30:26 +0000
Subject: [Biopython-dev] SearchIO HSP indexing
In-Reply-To: <CAF++dEejrr_bRNoTPBPFwVsSYjuEGwtMMiJGrHvCpaEnnOxV1Q@mail.gmail.com>
References: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
	<CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>
	<CAF++dEejrr_bRNoTPBPFwVsSYjuEGwtMMiJGrHvCpaEnnOxV1Q@mail.gmail.com>
Message-ID: <CAKVJ-_4r6HAic5epn=kkyhrpZ6jHnd=r7HmXNZmqXGSFjF_Kdw@mail.gmail.com>

On Sat, Feb 9, 2013 at 1:54 PM, Colin Archer <colin.aibn at gmail.com> wrote:
> Hi Peter,
>              Thanks for getting back to me so quickly.
>

Thank you - the main reason for including SearchIO in Biopython 1.61
as 'experimental code' is to get wider testing and feedback (hopefully
an approach that will work well and we can use this more in future for
other new code).

> I'm curious about the benefits of having these values in Python string
> slicing format? I haven't come across this very often, I'm used to seeing
> values systematically zero or one-based.

Once you're used to Python slicing it becomes very natural.

> Would it be easier to keep the range variables hit_range and hit_range_all
> in slicing format and the start and end variables in sequence position
> format so that they represent the actual BLAST results?

One reason for this is to be consistent across all the formats supported
in SearchIO, and since Biopython is a Python library following Python
norms seems most natural.

> I had a look at some of the code and I can't see the slicing format
> mentioned anywhere (Hsp.py, Hit.py, or blast_xml.py). It would probably be
> helpful to explain the values in Hsp.py as a ** mark on hsp_start, hsp_end,
> query_start, and query_end so that if people are interested they can have a
> look at the files and see what they mean.
>
> Thanks
> Colin

OK, so some clarification with examples in the docstrings is needed.
How about the Tutorial chapter?

Thanks,

Peter


From chapmanb at 50mail.com  Sat Feb  9 14:43:26 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sat, 09 Feb 2013 09:43:26 -0500
Subject: [Biopython-dev] SearchIO HSP indexing
In-Reply-To: <CAKVJ-_4r6HAic5epn=kkyhrpZ6jHnd=r7HmXNZmqXGSFjF_Kdw@mail.gmail.com>
References: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
	<CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>
	<CAF++dEejrr_bRNoTPBPFwVsSYjuEGwtMMiJGrHvCpaEnnOxV1Q@mail.gmail.com>
	<CAKVJ-_4r6HAic5epn=kkyhrpZ6jHnd=r7HmXNZmqXGSFjF_Kdw@mail.gmail.com>
Message-ID: <87a9rdy2cx.fsf@fastmail.fm>


Colin;

>> I'm curious about the benefits of having these values in Python string
>> slicing format? I haven't come across this very often, I'm used to seeing
>> values systematically zero or one-based.

To clarify further in addition to Peter's response, the 0-based
half-open and 1-based closed systems are the two systems you're
referring to. Python, and most programming languages, use the 0-based
half open indexing approach which is what SearchIO is converting to.
Aaron has a nice response on BioStars which explains the differences in
more detail:

http://www.biostars.org/p/6373/#6377
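
The two systems can be compared side by side in a few lines. This is only
an illustrative sketch with a toy sequence; the helper names
closed_to_half_open and half_open_to_closed are invented here and are not
part of Biopython.

```python
def closed_to_half_open(first, last):
    """1-based closed [first, last] -> 0-based half-open [start, end)."""
    return first - 1, last


def half_open_to_closed(start, end):
    """0-based half-open [start, end) -> 1-based closed [first, last]."""
    return start + 1, end


seq = "ACGTACGTAC"

# Bases 3..6 in 1-based closed notation are seq[2:6] in Python.
start, end = closed_to_half_open(3, 6)
fragment = seq[start:end]
length = end - start  # no "+ 1" correction needed

# Half-open intervals tile cleanly: [0, end) and [end, len(seq)) rebuild seq.
rebuilt = seq[:end] + seq[end:]

# Converting back recovers the numbers as a report file would print them.
first, last = half_open_to_closed(start, end)
```

Going back to the 1-based closed form is a one-line conversion, so values
matching the original report can always be recovered for display.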

Brad


From colin.aibn at gmail.com  Sat Feb  9 15:18:33 2013
From: colin.aibn at gmail.com (Colin Archer)
Date: Sun, 10 Feb 2013 01:18:33 +1000
Subject: [Biopython-dev] SearchIO HSP indexing
In-Reply-To: <87a9rdy2cx.fsf@fastmail.fm>
References: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
	<CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>
	<CAF++dEejrr_bRNoTPBPFwVsSYjuEGwtMMiJGrHvCpaEnnOxV1Q@mail.gmail.com>
	<CAKVJ-_4r6HAic5epn=kkyhrpZ6jHnd=r7HmXNZmqXGSFjF_Kdw@mail.gmail.com>
	<87a9rdy2cx.fsf@fastmail.fm>
Message-ID: <CAF++dEc0-+QQXxPgiHf9WY=6oK74WjzW9od2E=q_A_xvK67PQQ@mail.gmail.com>

Interesting commentary from Edsger Dijkstra as well:

http://www.cs.utexas.edu/~EWD/ewd08xx/EWD831.PDF

If possible, I would definitely add some of these links to either the
tutorial or the code.

Colin


On Sun, Feb 10, 2013 at 12:43 AM, Brad Chapman <chapmanb at 50mail.com> wrote:

>
> Colin;
>
> >> I'm curious about the benefits of having these values in Python string
> >> slicing format? I haven't come across this very often, I'm used to
> seeing
> >> values systematically zero or one-based.
>
> To clarify further in addition to Peter's response, the 0-based
> half-open and 1-based closed systems are the two systems you're
> referring to. Python, and most programming languages, use the 0-based
> half open indexing approach which is what SearchIO is converting to.
> Aaron has a nice response on BioStars which explains the differences in
> more detail:
>
> http://www.biostars.org/p/6373/#6377
>
> Brad
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


From Markus.Piotrowski at ruhr-uni-bochum.de  Sat Feb  9 15:12:12 2013
From: Markus.Piotrowski at ruhr-uni-bochum.de (Markus Piotrowski)
Date: 9 Feb 2013 16:12:12 +0100
Subject: [Biopython-dev] SearchIO HSP indexing
In-Reply-To: <CAKVJ-_4r6HAic5epn=kkyhrpZ6jHnd=r7HmXNZmqXGSFjF_Kdw@mail.gmail.com>
References: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
	<CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>
	<CAF++dEejrr_bRNoTPBPFwVsSYjuEGwtMMiJGrHvCpaEnnOxV1Q@mail.gmail.com>
	<CAKVJ-_4r6HAic5epn=kkyhrpZ6jHnd=r7HmXNZmqXGSFjF_Kdw@mail.gmail.com>
Message-ID: <739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de>

Mmh, at least Bio.Blast.NCBIXML returns the exact values given in the 
xml result. So query_start and sbjct_start (BTW, not hit_start) return 
the values from <Hsp_query-from> and <Hsp_hit-from>.
Thus, my first guess would be that a search function that can return an 
entity 'query_start' will return the value that is written in the file.

Markus

Am 2013-02-09 15:30, schrieb Peter Cock:
> On Sat, Feb 9, 2013 at 1:54 PM, Colin Archer <colin.aibn at gmail.com> 
> wrote:
>> Hi Peter,
>>              Thanks for getting back to me so quickly.
>>
>
> Thank you - the main reason for including SearchIO in Biopython 1.61
> as 'experimental code' is to get wider testing and feedback 
> (hopefully
> an approach that will work well and we can use this more in future 
> for
> other new code).
>
>> I'm curious about the benefits of having these values in Python 
>> string
>> slicing format? I haven't come across this very often, I'm used to 
>> seeing
>> values systematically zero or one-based.
>
> Once you're used to Python slicing it becomes very natural.
>
>> Would it be easier to keep the range variables hit_range and 
>> hit_range_all
>> in slicing format and the start and end variables in sequence 
>> position
>> format so that they represent the actual BLAST results?
>
> One reason for this is to be consistent across all the formats 
> supported
> in SearchIO, and since Biopython is a Python library following Python
> norms seems most natural.
>
>> I had a look at some of the code and I can't see the slicing format
>> mentioned anywhere (Hsp.py, Hit.py, or blast_xml.py). It would 
>> probably be
>> helpful to explain the values in Hsp.py as a ** mark on hsp_start, 
>> hsp_end,
>> query_start, and query_end so that if people are interested they can 
>> have a
>> look at the files and see what they mean.
>>
>> Thanks
>> Colin
>
> OK, so some clarification with examples in the docstrings is needed.
> How about the Tutorial chapter?
>
> Thanks,
>
> Peter


From colin.aibn at gmail.com  Sat Feb  9 15:19:26 2013
From: colin.aibn at gmail.com (Colin Archer)
Date: Sun, 10 Feb 2013 01:19:26 +1000
Subject: [Biopython-dev] Fwd:  SearchIO HSP indexing
In-Reply-To: <CAF++dEeHsJJDfYkQs93hmb8mfATfNJs4dX_4pn8=jECEEF0wUQ@mail.gmail.com>
References: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
	<CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>
	<CAF++dEejrr_bRNoTPBPFwVsSYjuEGwtMMiJGrHvCpaEnnOxV1Q@mail.gmail.com>
	<CAKVJ-_4r6HAic5epn=kkyhrpZ6jHnd=r7HmXNZmqXGSFjF_Kdw@mail.gmail.com>
	<CAF++dEeHsJJDfYkQs93hmb8mfATfNJs4dX_4pn8=jECEEF0wUQ@mail.gmail.com>
Message-ID: <CAF++dEd5vRbsX6PC8Ng8SbEqLEe90=+EWn45F1Jn+wG7VzV-Cg@mail.gmail.com>

> > Hi Peter,
> > Thanks for getting back to me so quickly.
> >
>
> Thank you - the main reason for including SearchIO in Biopython 1.61
> as 'experimental code' is to get wider testing and feedback (hopefully
> an approach that will work well and we can use this more in future for
> other new code).
>
>
I've been using it for a couple of months now and I definitely prefer it over
the existing parser.


>  > I'm curious about the benefits of having these values in Python string
> > slicing format? I haven't come across this very often, I'm used to seeing
> > values systematically zero or one-based.
>
> Once you're used to Python slicing it becomes very natural.
>
>
> > Would it be easier to keep the range variables hit_range and hit_range_all
> > in slicing format and the start and end variables in sequence position
> > format so that they represent the actual BLAST results?
>
> One reason for this is to be consistent across all the formats supported
> in SearchIO, and since Biopython is a Python library following Python
> norms seems most natural.
>
> > I had a look at some of the code and I can't see the slicing format
> > mentioned anywhere (Hsp.py, Hit.py, or blast_xml.py). It would probably be
> > helpful to explain the values in Hsp.py as a ** mark on hsp_start, hsp_end,
> > query_start, and query_end so that if people are interested they can have a
> > look at the files and see what they mean.
> >
> > Thanks
> > Colin
>
> OK, so some clarification with examples in the docstrings is needed.
> How about the Tutorial chapter?
>
I would definitely add comments to the Hsp.py file and if there is a
tutorial that people use, I would also update that as that would be the
first place most people would look.

I was wondering if there was any code in SearchIO to align high-scoring
segment pairs against the same hit? I see the fragmentation code but that
seems specific to BLAT results and when I look at the HSPFragments in the
QueryResult object it does not seem to combine multiple HSPs against the
same hit even if they are not overlapping.

Thanks
Colin


From p.j.a.cock at googlemail.com  Sat Feb  9 15:36:34 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 9 Feb 2013 15:36:34 +0000
Subject: [Biopython-dev] SearchIO HSP indexing
In-Reply-To: <739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de>
References: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
	<CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>
	<CAF++dEejrr_bRNoTPBPFwVsSYjuEGwtMMiJGrHvCpaEnnOxV1Q@mail.gmail.com>
	<CAKVJ-_4r6HAic5epn=kkyhrpZ6jHnd=r7HmXNZmqXGSFjF_Kdw@mail.gmail.com>
	<739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de>
Message-ID: <CAKVJ-_7_9=ft16QT4dQY+VRfh5KAybZMr_H6v2Pqi9iAviW1RA@mail.gmail.com>

> Am 2013-02-09 15:30, schrieb Peter Cock:
>> One reason for this is to be consistent across all the formats supported
>> in SearchIO, and since Biopython is a Python library following Python
>> norms seems most natural.

On Sat, Feb 9, 2013 at 3:12 PM, Markus Piotrowski
<Markus.Piotrowski at ruhr-uni-bochum.de> wrote:
> Mmh, at least Bio.Blast.NCBIXML returns the exact values given in the xml
> result.

Yes, the old Bio.Blast parsers do not try and convert the co-ordinates.
Given they were only handling BLAST output that was a justifiable
option. With Bio.SearchIO we're not just modelling BLAST output
though - it covers multiple formats with different conventions.

Peter
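
To make the coordinate conventions in this thread concrete, here is a
minimal sketch of the conversion between BLAST-style coordinates (1-based,
inclusive) and Python slice coordinates (0-based, half-open). The helper
names blast_to_slice and slice_to_blast are hypothetical and are not part
of Bio.SearchIO; the sketch also assumes forward-strand coordinates where
start <= end.

```python
def blast_to_slice(start, end):
    """Convert 1-based inclusive coordinates to a 0-based half-open pair.

    Hypothetical helper for illustration only; assumes start <= end.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Only the start shifts: the inclusive end of a 1-based range is
    # numerically equal to the exclusive end of the 0-based range.
    return start - 1, end


def slice_to_blast(start, end):
    """Convert a 0-based half-open pair back to 1-based inclusive."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return start + 1, end


# Made-up query sequence; positions 3..7 in BLAST terms are V, L, A, A, G.
query = "MKVLAAGIVGLCAQ"
start, end = blast_to_slice(3, 7)
aligned_region = query[start:end]   # usable directly as a Python slice
region_length = end - start         # no "+ 1" needed with half-open ranges
```

One practical consequence of the half-open form is that the length of a
region is simply end - start, and adjacent regions share a boundary value
instead of differing by one.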


From w.arindrarto at gmail.com  Sat Feb  9 16:56:46 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Sat, 9 Feb 2013 17:56:46 +0100
Subject: [Biopython-dev] SearchIO HSP indexing
In-Reply-To: <CAKVJ-_7_9=ft16QT4dQY+VRfh5KAybZMr_H6v2Pqi9iAviW1RA@mail.gmail.com>
References: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
	<CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>
	<CAF++dEejrr_bRNoTPBPFwVsSYjuEGwtMMiJGrHvCpaEnnOxV1Q@mail.gmail.com>
	<CAKVJ-_4r6HAic5epn=kkyhrpZ6jHnd=r7HmXNZmqXGSFjF_Kdw@mail.gmail.com>
	<739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de>
	<CAKVJ-_7_9=ft16QT4dQY+VRfh5KAybZMr_H6v2Pqi9iAviW1RA@mail.gmail.com>
Message-ID: <CADEGkF5HuG_goa0thibN4sFbq7W5RBeD6VV-FUrQGmGySdzOSg@mail.gmail.com>

Hi everyone,

Colin, thanks for the feedback! Peter has explained the rationale
behind the decision, so I would like to add that there has been indeed
an explanation of this behavior in the tutorial
(http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc106) and
the code (https://github.com/biopython/biopython/blob/master/Bio/SearchIO/__init__.py#L100).
I do admit that the explanation in the code could be made clearer with
some comments in hsp.py ~ which I can add :).

As for your point about the alignment code:

> I was wondering if there was any code in SearchIO to align high-scoring
> segment pairs against the same hit? I see the fragmentation code but that
> seems specific to BLAT results and when I look at the HSPFragments in the
> QueryResult object it does not seem to combine multiple HSPs against the
> same hit even if they are not overlapping.

SearchIO relies on BLAST to do this ~ which has already grouped each
HSP aligning to the same database sequence in one group (all of which
is accessible through the Hit object). I've always assumed that if two
HSPs came from the same database entry (Hit), they are grouped into
one Hit by BLAST, regardless of whether they overlap or not. Have you
seen any results from BLAST that show otherwise?

cheers,
Bow


From arklenna at gmail.com  Sat Feb  9 17:14:01 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Sat, 9 Feb 2013 12:14:01 -0500
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <1360374143.25311.YahooMailClassic@web164004.mail.gq1.yahoo.com>
References: <CALfq9tJenMObZGHU6EgEo5oUd7=1K-NWsC_cN06HCdJkE=Et3A@mail.gmail.com>
	<1360374143.25311.YahooMailClassic@web164004.mail.gq1.yahoo.com>
Message-ID: <CALfq9tKESkfdBfBFY7RWMF0tpo+cPaX7pcrwQg-OVzmNZvtxTA@mail.gmail.com>

On Fri, Feb 8, 2013 at 8:42 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> Hi Lenna,
>
>
> --- On *Thu, 2/7/13, Lenna Peterson <arklenna at gmail.com>* wrote:
> > If there are well-defined problems with the PLY parser, I can work on
> > fixing them. I am not currently working with mmCIF so I am not in the
> > best position to evaluate where and how the parser needs to be improved.
>
> I don't know of any problems with the PLY parser, but since it relies on
> PLY, it would add another dependency to Biopython.
>


> On the other hand, a pure-Python solution may be preferable, as it's
> easier to maintain and runs with Jython.
>


As far as I can tell, PLY works with Jython, discussion on this thread:
http://permalink.gmane.org/gmane.comp.python.ply/402

Not sure about pypy. One option would be to deploy the PLY parser for
non-CPython platforms and tell them to manually install PLY if they want to
use mmCIF. Not ideal, but is that preferred to an explicit dependency?


>
> I see three options then:
> 1) Remove the lex stuff from lex.yy.c, and optionally convert the
> remaining C code to Python.
>

As is, the C compiles cross platform with no dependencies. There is nothing
but lex stuff in lex.yy.c - I'm not quite sure what you mean here.


> 2) Remove the PLY dependency from the PLY-based parser.
> 3) Write a new pure-Python parser from scratch.
>
>
I'm not sure whether there is an appreciable difference between options 2
and 3.

Cheers,

Lenna


From mjldehoon at yahoo.com  Sun Feb 10 03:55:37 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 9 Feb 2013 19:55:37 -0800 (PST)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CALfq9tL7caRXcM1BmW-ksphdDP6E=J0bYE-x-rmYVbHtV+CCsA@mail.gmail.com>
Message-ID: <1360468537.28338.YahooMailClassic@web164004.mail.gq1.yahoo.com>

Hi Lenna,

>--- On Sat, 2/9/13, Lenna Peterson <lennalenna at gmail.com> wrote:
> > 1) Remove the lex stuff from lex.yy.c, and optionally convert the remaining
> > C code to Python.

> As is, the C compiles cross platform with no dependencies. There is nothing
> but lex stuff in lex.yy.c - I'm not quite sure what you mean here.

Currently lex.yy.c contains lots of code that is generated automatically by lex but is not actually needed for the mmCIF parser. I was thinking of removing those parts and cleaning up the remainder so that the code is understandable (allowing us to fix any bugs, or to convert it to pure Python).

Best,
-Michiel


From colin.aibn at gmail.com  Sun Feb 10 07:28:36 2013
From: colin.aibn at gmail.com (Colin Archer)
Date: Sun, 10 Feb 2013 17:28:36 +1000
Subject: [Biopython-dev] SearchIO HSP indexing
In-Reply-To: <CADEGkF5HuG_goa0thibN4sFbq7W5RBeD6VV-FUrQGmGySdzOSg@mail.gmail.com>
References: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
	<CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>
	<CAF++dEejrr_bRNoTPBPFwVsSYjuEGwtMMiJGrHvCpaEnnOxV1Q@mail.gmail.com>
	<CAKVJ-_4r6HAic5epn=kkyhrpZ6jHnd=r7HmXNZmqXGSFjF_Kdw@mail.gmail.com>
	<739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de>
	<CAKVJ-_7_9=ft16QT4dQY+VRfh5KAybZMr_H6v2Pqi9iAviW1RA@mail.gmail.com>
	<CADEGkF5HuG_goa0thibN4sFbq7W5RBeD6VV-FUrQGmGySdzOSg@mail.gmail.com>
Message-ID: <CAF++dEd2rejB-kq3u2YPL-pDvoWYJYH9HwnD4RHJtcqRB3KD7Q@mail.gmail.com>

On Sun, Feb 10, 2013 at 2:56 AM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:

> Hi everyone,
>
> Colin, thanks for the feedback! Peter has explained the rationale
> behind the decision, so I would like to add that there has been indeed
> an explanation of this behavior in the tutorial
> (http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc106) and
> the code (
> https://github.com/biopython/biopython/blob/master/Bio/SearchIO/__init__.py#L100
> ).
> I do admit that the explanation in the code could be made clearer with
> some comments in hsp.py ~ which I can add :).
>
> As for your point about the alignment code:
>
> > I was wondering if there was any code in SearchIO to align high-scoring
> > segment pairs against the same hit? I see the fragmentation code but that
> > seems specific to BLAT results and when I look at the HSPFragments in the
> > QueryResult object it does not seem to combine multiple HSPs against the
> > same hit even if they are not overlapping.
>
> SearchIO relies on BLAST to do this ~ which has already grouped each
> HSP aligning to the same database sequence in one group (all of which
> is accessible through the Hit object). I've always assumed that if two
> HSPs came from the same database entry (Hit), they are grouped into
> one Hit by BLAST, regardless of whether they overlap or not. Have you
> seen any results from BLAST that shows otherwise?
>
>
I have a couple of examples where BLAST doesn't combine the HSPs as you
would expect. It seems to mainly occur because the HSP alignments overlap
and to combine them would mean including more gaps in each hsp. For
example, *ftsK* in *E. coli* (ftsK.blast) or *aceF* in *E. coli* (aceF.blast).
In the second case, the first HSP spans the entire query and there are two
additional HSPs that are overlapped by it.

I know that BioPerl tries to align/tile (in Bio::Search::BlastUtils) the
HSPs somewhat when required but some people are hesitant to use their
method in certain situations (e.g., with tblastn results that overestimate
some of the metrics). They also implement additional functionality so that
the user could do a complete Smith-Waterman alignment if they wanted to.

Thanks
Colin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aceF.blast
Type: application/octet-stream
Size: 12124 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20130210/34bb5bfb/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ftsK.blast
Type: application/octet-stream
Size: 18537 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20130210/34bb5bfb/attachment-0005.obj>

From w.arindrarto at gmail.com  Sun Feb 10 15:31:51 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Sun, 10 Feb 2013 16:31:51 +0100
Subject: [Biopython-dev] SearchIO HSP indexing
In-Reply-To: <CAF++dEd2rejB-kq3u2YPL-pDvoWYJYH9HwnD4RHJtcqRB3KD7Q@mail.gmail.com>
References: <CAF++dEeM3YzHVsU1nfF7_yO-d7GH-JpdG0mcXLvwvtWrH4hPxQ@mail.gmail.com>
	<CAKVJ-_6+UGcN+VygcdqT-r2K6YCrF9HfEf+giiSmiVVqaqTY4Q@mail.gmail.com>
	<CAF++dEejrr_bRNoTPBPFwVsSYjuEGwtMMiJGrHvCpaEnnOxV1Q@mail.gmail.com>
	<CAKVJ-_4r6HAic5epn=kkyhrpZ6jHnd=r7HmXNZmqXGSFjF_Kdw@mail.gmail.com>
	<739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de>
	<CAKVJ-_7_9=ft16QT4dQY+VRfh5KAybZMr_H6v2Pqi9iAviW1RA@mail.gmail.com>
	<CADEGkF5HuG_goa0thibN4sFbq7W5RBeD6VV-FUrQGmGySdzOSg@mail.gmail.com>
	<CAF++dEd2rejB-kq3u2YPL-pDvoWYJYH9HwnD4RHJtcqRB3KD7Q@mail.gmail.com>
Message-ID: <CADEGkF7F=oUSwvr1WrqdZuqB8w7eR25c9hytO4nWPyuU52=tvw@mail.gmail.com>

Hi Colin,

>> As for your point about the alignment code:
>>
>> > I was wondering if there was any code in SearchIO to align high-scoring
>> > segment pairs against the same hit? I see the fragmentation code but
>> > that
>> > seems specific to BLAT results and when I look at the HSPFragments in
>> > the
>> > QueryResult object it does not seem to combine multiple HSPs against the
>> > same hit even if they are not overlapping.
>>
>> SearchIO relies on BLAST to do this ~ which has already grouped each
>> HSP aligning to the same database sequence in one group (all of which
>> is accessible through the Hit object). I've always assumed that if two
>> HSPs came from the same database entry (Hit), they are grouped into
>> one Hit by BLAST, regardless of whether they overlap or not. Have you
>> seen any results from BLAST that shows otherwise?
>>
>
> I have a couple of examples where BLAST doesn't combine the HSPs as you
> would expect. It seems to mainly occur because the HSP alignments overlap
> and to combine them would mean including more gaps in each hsp. For example,
> ftsK in E. coli (ftsK.blast) or aceF in E. coli (aceF.blast). In the second
> case, the first HSP spans the entire query and there are two additional HSPs
> that are overlapped by it.
>
> I know that BioPerl tries to align/tile (in Bio::Search::BlastUtils) the
> HSPs somewhat when required but some people are hesitant to use their method
> in certain situations (e.g., with tblastn results that overestimate some of
> the metrics). They also implement additional functionality so that the user
> could do a complete smith-waterman alignment if they wanted to.

Thanks for including the files!

At the moment, no, SearchIO doesn't have any code to 'assemble'/'tile'
overlapping HSPs. The fragments you're seeing in the BLAT parser
are simply what we call the noncontiguous blocks inside a
reported HSP.

We may be able to add some functions to return the intervals for such
overlapping HSPs, given a Hit object. But I'm a bit hesitant to go
further than that (i.e. to the point where we merge the statistics of
each HSP and assign them to the assembled HSP). This is mostly because
such assembly seems very specific to the program's statistics and
format (BLAST's merge would be different from BLAT? and BLAST XML's
merge may be different from tabular BLAST). If anything, perhaps these
functions deserve their own space in SearchUtils (taking parallels
from Bio.SeqIO and Bio.SeqUtils)?

regards,
Bow
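
As a rough illustration of the "return the intervals for such overlapping
HSPs" idea above, here is a minimal sketch that merges overlapping
coordinate ranges. The function merge_intervals is hypothetical and does
not exist in SearchIO; it assumes each HSP has already been reduced to a
(start, end) tuple in 0-based half-open form, and it deliberately leaves
scores and other statistics alone.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) pairs into a sorted list of spans.

    Hypothetical helper for illustration; coordinates are assumed to be
    0-based and half-open, so touching ranges such as (0, 10) and
    (10, 20) are joined into one span as well.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous span: extend it.
            previous_start, previous_end = merged[-1]
            merged[-1] = (previous_start, max(previous_end, end))
        else:
            merged.append((start, end))
    return merged


# Made-up ranges: one long HSP covering two shorter ones, plus a
# separate HSP further along the hit.
hsp_ranges = [(100, 250), (0, 630), (300, 420), (700, 800)]
covered = merge_intervals(hsp_ranges)
covered_length = sum(end - start for start, end in covered)
```

This only answers "which parts of the hit are covered"; combining
e-values or bit scores across the merged HSPs is a separate,
format-specific question.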


From redmine at redmine.open-bio.org  Sun Feb 10 22:13:20 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Sun, 10 Feb 2013 22:13:20 +0000
Subject: [Biopython-dev] [Biopython - Bug #3412] (New) Bad URL in docs for
	module NCBIWWW.qblast
Message-ID: <redmine.issue-3412.20130210221320@redmine.open-bio.org>


Issue #3412 has been reported by Vincent Davis.

----------------------------------------
Bug #3412: Bad URL in docs for module NCBIWWW.qblast
https://redmine.open-bio.org/issues/3412

Author: Vincent Davis
Status: New
Priority: Low
Assignee: Biopython Dev Mailing List
Category: Documentation
Target version: 
URL: 


At the bottom of "help(NCBIWWW.qblast)" is a link to http://www.ncbi.nlm.nih.gov/BLAST/blast_overview.html
This link is not valid.


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Sun Feb 10 22:40:21 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Sun, 10 Feb 2013 22:40:21 +0000
Subject: [Biopython-dev] [Biopython - Bug #3412] (Resolved) Bad URL in docs
	for module NCBIWWW.qblast
References: <redmine.issue-3412.20130210221320@redmine.open-bio.org>
Message-ID: <redmine.journal-15096.20130210224021@redmine.open-bio.org>


Issue #3412 has been updated by Peter Cock.

Status changed from New to Resolved
% Done changed from 0 to 100

The NCBI seem to have broken that link, and if they did set up a redirect for a while it has stopped now.

I'll use http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html instead I think,
https://github.com/biopython/biopython/commit/ae84cc8cb828e868883c75a980fcd83585c338f8

Thanks!
----------------------------------------
Bug #3412: Bad URL in docs for module NCBIWWW.qblast
https://redmine.open-bio.org/issues/3412

Author: Vincent Davis
Status: Resolved
Priority: Low
Assignee: Biopython Dev Mailing List
Category: Documentation
Target version: 
URL: 


At the bottom of "help(NCBIWWW.qblast)" is a link to http://www.ncbi.nlm.nih.gov/BLAST/blast_overview.html
This link is not valid.


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From eric.talevich at gmail.com  Mon Feb 11 02:11:54 2013
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sun, 10 Feb 2013 21:11:54 -0500
Subject: [Biopython-dev] New Newick parser in Bio.Phylo
Message-ID: <CAMC681mfd3g_w+g_otu2WO6rP48TJSmVUu5ZrN8vc0PPqzzdvg@mail.gmail.com>

Hi Ben,

I've noticed a couple new characteristics of the Newick parser that I had
questions about.

1. There is no longer a way to tell the parser to treat internal node
labels as confidence values. Lots of files in the wild do record the
support values here, including those generated by RAxML, PhyML, FastTree
and MrBayes, so I'd like to restore this option, and perhaps make it the
default. I think the condition is:

if not (self.values_are_confidence or self.comments_are_confidence or
current_clade.is_terminal()): # parse confidence from node label

Is there an easy way to add this option to the parser? I'm trying to get
this to work in the "else" clause in parse_tree, where unquoted node labels
are handled.


2. Confidence values are required to be between 0.0 and 1.0. Also, support
values recorded as integers are treated as percentages and divided by 100
automatically. The phyloXML spec doesn't have this range requirement. RAxML
scales bootstraps to 100, but PhyML records the raw number of supporting
bootstrap runs (e.g. supports out of 1000 if there were 1000 bootstrap
replicates). So, I'd prefer to leave the confidence values as they are,
requiring only that they be numeric. Thoughts?


Thanks,
Eric
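
A small worked example of point 2, showing why a fixed "divide by 100"
rule cannot be right for every program. The function support_fraction is
hypothetical and only for illustration; the numbers are made up, and the
scale (100 for a percentage, the replicate count for raw counts) is
assumed to be supplied by the user because it cannot be recovered from
the Newick file alone.

```python
def support_fraction(value, scale):
    """Express a support value as a fraction of the given scale.

    Hypothetical helper: scale is 100 for percentages, or the number of
    bootstrap replicates when raw counts are recorded.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return value / float(scale)


# A percentage-style value: 87 out of 100.
percent_style = support_fraction(87, 100)

# A raw count: 870 supporting replicates out of 1000.
count_style = support_fraction(870, 1000)

# Blindly dividing the raw count by 100 gives a value well above 1.0,
# which a strict 0.0-1.0 range check would then reject.
naive = 870 / 100.0
```

Keeping the value exactly as written in the file, and requiring only that
it be numeric, avoids having to guess the scale at parse time.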


From ben at bendmorris.com  Mon Feb 11 02:39:24 2013
From: ben at bendmorris.com (Ben Morris)
Date: Sun, 10 Feb 2013 21:39:24 -0500
Subject: [Biopython-dev] New Newick parser in Bio.Phylo
In-Reply-To: <CAMC681mfd3g_w+g_otu2WO6rP48TJSmVUu5ZrN8vc0PPqzzdvg@mail.gmail.com>
References: <CAMC681mfd3g_w+g_otu2WO6rP48TJSmVUu5ZrN8vc0PPqzzdvg@mail.gmail.com>
Message-ID: <CAAzEd5Bz4bSDAPGq+Hyz9AYPqfsswPWsPig2LRs7sAKoqOL_Xw@mail.gmail.com>

On Sun, Feb 10, 2013 at 9:11 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> Hi Ben,
>
> I've noticed a couple new characteristics of the Newick parser that I had
> questions about.
>
> 1. There is no longer a way to tell the parser to treat internal node labels
> as confidence values. Lots of files in the wild do record the support values
> here, including those generated by RAxML, PhyML, FastTree and MrBayes, so
> I'd like to restore this option, and perhaps make it the default. I think
> the condition is:
>
> if not (self.values_are_confidence or self.comments_are_confidence or
> current_clade.is_terminal()): # parse confidence from node label
>
> Is there an easy way to add this option to the parser? I'm trying to get
> this to work in the "else" clause in parse_tree, where unquoted node labels
> are handled.
>
>
> 2. Confidence values are required to be between 0.0 and 1.0. Also, support
> values recorded as integers are treated as percentages and divided by 100
> automatically. The phyloXML spec doesn't have this range requirement. RAxML
> scales bootstraps to 100, but PhyML records the raw number of supporting
> bootstrap runs (e.g. supports out of 1000 if there were 1000 bootstrap
> replicates). So, I'd prefer to leave the confidence values as they are,
> requiring only that they be numeric. Thoughts?
>
>
> Thanks,
> Eric

1. One issue is that current_clade.is_terminal() will always be true
at that point because current_clade's children haven't been parsed
yet. Putting the check in the "process_clade" function (which is
called when the closing paren is hit, and therefore all children
should have been parsed) should fix this.

So, if values_are_confidence and comments_are_confidence are both
false and a node label is numeric, it should be treated as confidence,
and clade.name should be set to None - is that correct?

2. This should be as simple as removing current lines 123-127.

~Ben
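
The rule described in point 1 can be sketched as below. This is not the
actual NewickIO code: split_node_label is a hypothetical stand-alone
function, and it assumes the caller already knows whether the clade is
internal (for example because the check runs after the closing
parenthesis, once all children have been parsed).

```python
def split_node_label(label, is_internal):
    """Return a (name, confidence) pair for a Newick node label.

    Hypothetical helper for illustration. A numeric label on an internal
    node is treated as a confidence value and the name is set to None;
    every other label is kept as the name. The value is not rescaled.
    """
    if is_internal and label is not None:
        try:
            return None, float(label)
        except ValueError:
            # Not numeric, so it is an ordinary name.
            pass
    return label, None


internal_numeric = split_node_label("95", True)
terminal_numeric = split_node_label("95", False)
internal_named = split_node_label("Mammalia", True)
```

Note that float() also accepts strings such as "inf" and "nan", so a
real implementation might want a stricter numeric test than this sketch
uses.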


From eric.talevich at gmail.com  Mon Feb 11 03:30:47 2013
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sun, 10 Feb 2013 22:30:47 -0500
Subject: [Biopython-dev] New Newick parser in Bio.Phylo
In-Reply-To: <CAAzEd5Bz4bSDAPGq+Hyz9AYPqfsswPWsPig2LRs7sAKoqOL_Xw@mail.gmail.com>
References: <CAMC681mfd3g_w+g_otu2WO6rP48TJSmVUu5ZrN8vc0PPqzzdvg@mail.gmail.com>
	<CAAzEd5Bz4bSDAPGq+Hyz9AYPqfsswPWsPig2LRs7sAKoqOL_Xw@mail.gmail.com>
Message-ID: <CAMC681kYUxFpcxLKST-1qRuyfj4sWS1kdQSquYJ6en2r=9tTEg@mail.gmail.com>

On Sun, Feb 10, 2013 at 9:39 PM, Ben Morris <ben at bendmorris.com> wrote:

> On Sun, Feb 10, 2013 at 9:11 PM, Eric Talevich <eric.talevich at gmail.com>
> wrote:
> > Hi Ben,
> >
> > I've noticed a couple new characteristics of the Newick parser that I had
> > questions about.
> >
> > 1. There is no longer a way to tell the parser to treat internal node
> labels
> > as confidence values. Lots of files in the wild do record the support
> values
> > here, including those generated by RAxML, PhyML, FastTree and MrBayes, so
> > I'd like to restore this option, and perhaps make it the default. I think
> > the condition is:
> >
> > if not (self.values_are_confidence or self.comments_are_confidence or
> > current_clade.is_terminal()): # parse confidence from node label
> >
> > Is there an easy way to add this option to the parser? I'm trying to get
> > this to work in the "else" clause in parse_tree, where unquoted node
> labels
> > are handled.
> >
> >
> > 2. Confidence values are required to be between 0.0 and 1.0. Also,
> support
> > values recorded as integers are treated as percentages and divided by 100
> > automatically. The phyloXML spec doesn't have this range requirement.
> RAxML
> > scales bootstraps to 100, but PhyML records the raw number of supporting
> > bootstrap runs (e.g. supports out of 1000 if there were 1000 bootstrap
> > replicates). So, I'd prefer to leave the confidence values as they are,
> > requiring only that they be numeric. Thoughts?
> >
> >
> > Thanks,
> > Eric
>
> 1. One issue is that current_clade.is_terminal() will always be true
> at that point because current_clade's children haven't been parsed
> yet. Putting the check in the "process_clade" function (which is
> called when the closing paren is hit, and therefore all children
> should have been parsed) should fix this.
>
> So, if values_are_confidence and comments_are_confidence are both
> false and a node label is numeric, it should be treated as confidence,
> and clade.name should be set to None - is that correct?
>
> 2. This should be as simple as removing current lines 123-127.
>
> ~Ben
>


Thanks. Here's #2:
https://github.com/biopython/biopython/commit/0aee549e72fe5dcf9bcea239d29780706500922a

I agree with your assessment of #1, but haven't been able to get it working
yet. I'm leaving Bug #3407 open for now:
https://redmine.open-bio.org/issues/3407


From ben at bendmorris.com  Mon Feb 11 04:04:45 2013
From: ben at bendmorris.com (Ben Morris)
Date: Sun, 10 Feb 2013 23:04:45 -0500
Subject: [Biopython-dev] New Newick parser in Bio.Phylo
In-Reply-To: <CAMC681kYUxFpcxLKST-1qRuyfj4sWS1kdQSquYJ6en2r=9tTEg@mail.gmail.com>
References: <CAMC681mfd3g_w+g_otu2WO6rP48TJSmVUu5ZrN8vc0PPqzzdvg@mail.gmail.com>
	<CAAzEd5Bz4bSDAPGq+Hyz9AYPqfsswPWsPig2LRs7sAKoqOL_Xw@mail.gmail.com>
	<CAMC681kYUxFpcxLKST-1qRuyfj4sWS1kdQSquYJ6en2r=9tTEg@mail.gmail.com>
Message-ID: <CAAzEd5AAdeM0nyQOSkYwQJMaEPb67s=DfVsae3b-bULs-zkdCQ@mail.gmail.com>

On Sun, Feb 10, 2013 at 10:30 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> On Sun, Feb 10, 2013 at 9:39 PM, Ben Morris <ben at bendmorris.com> wrote:
>>
>> On Sun, Feb 10, 2013 at 9:11 PM, Eric Talevich <eric.talevich at gmail.com>
>> wrote:
>> > Hi Ben,
>> >
>> > I've noticed a couple new characteristics of the Newick parser that I
>> > had
>> > questions about.
>> >
>> > 1. There is no longer a way to tell the parser to treat internal node
>> > labels
>> > as confidence values. Lots of files in the wild do record the support
>> > values
>> > here, including those generated by RAxML, PhyML, FastTree and MrBayes,
>> > so
>> > I'd like to restore this option, and perhaps make it the default. I
>> > think
>> > the condition is:
>> >
>> > if not (self.values_are_confidence or self.comments_are_confidence or
>> > current_clade.is_terminal()): # parse confidence from node label
>> >
>> > Is there an easy way to add this option to the parser? I'm trying to get
>> > this to work in the "else" clause in parse_tree, where unquoted node
>> > labels
>> > are handled.
>> >
>> >
>> > 2. Confidence values are required to be between 0.0 and 1.0. Also,
>> > support
>> > values recorded as integers are treated as percentages and divided by
>> > 100
>> > automatically. The phyloXML spec doesn't have this range requirement.
>> > RAxML
>> > scales bootstraps to 100, but PhyML records the raw number of supporting
>> > bootstrap runs (e.g. supports out of 1000 if there were 1000 bootstrap
>> > replicates). So, I'd prefer to leave the confidence values as they are,
>> > requiring only that they be numeric. Thoughts?
>> >
>> >
>> > Thanks,
>> > Eric
>>
>> 1. One issue is that current_clade.is_terminal() will always be true
>> at that point because current_clade's children haven't been parsed
>> yet. Putting the check in the "process_clade" function (which is
>> called when the closing paren is hit, and therefore all children
>> should have been parsed) should fix this.
>>
>> So, if values_are_confidence and comments_are_confidence are both
>> false and a node label is numeric, it should be treated as confidence,
>> and clade.name should be set to None - is that correct?
>>
>> 2. This should be as simple as removing current lines 123-127.
>>
>> ~Ben
>
>
>
> Thanks. Here's #2:
> https://github.com/biopython/biopython/commit/0aee549e72fe5dcf9bcea239d29780706500922a
>
> I agree with your assessment of #1, but haven't been able to get it working
> yet. I'm leaving Bug #3407 open for now:
> https://redmine.open-bio.org/issues/3407
>

I think this should do it:

https://github.com/bendmorris/biopython/commit/b430f27ff908f07d8ab59bec48429947f0028d63

I also updated the test case to make sure this is working correctly
and changed the default value of comments_are_confidence from True to
False.

If that looks correct, feel free to pull.

~Ben


From eric.talevich at gmail.com  Mon Feb 11 04:20:20 2013
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sun, 10 Feb 2013 23:20:20 -0500
Subject: [Biopython-dev] New Newick parser in Bio.Phylo
In-Reply-To: <CAAzEd5AAdeM0nyQOSkYwQJMaEPb67s=DfVsae3b-bULs-zkdCQ@mail.gmail.com>
References: <CAMC681mfd3g_w+g_otu2WO6rP48TJSmVUu5ZrN8vc0PPqzzdvg@mail.gmail.com>
	<CAAzEd5Bz4bSDAPGq+Hyz9AYPqfsswPWsPig2LRs7sAKoqOL_Xw@mail.gmail.com>
	<CAMC681kYUxFpcxLKST-1qRuyfj4sWS1kdQSquYJ6en2r=9tTEg@mail.gmail.com>
	<CAAzEd5AAdeM0nyQOSkYwQJMaEPb67s=DfVsae3b-bULs-zkdCQ@mail.gmail.com>
Message-ID: <CAMC681k3M5QspHJAGHxnZRLWzO46QAvBKpGe=0oFn0i9jYf9wQ@mail.gmail.com>

On Sun, Feb 10, 2013 at 11:04 PM, Ben Morris <ben at bendmorris.com> wrote:

> On Sun, Feb 10, 2013 at 10:30 PM, Eric Talevich <eric.talevich at gmail.com>
> wrote:
> > On Sun, Feb 10, 2013 at 9:39 PM, Ben Morris <ben at bendmorris.com> wrote:
> >>
> >> On Sun, Feb 10, 2013 at 9:11 PM, Eric Talevich <eric.talevich at gmail.com
> >
> >> wrote:
> >> > Hi Ben,
> >> >
> >> > I've noticed a couple new characteristics of the Newick parser that I
> >> > had
> >> > questions about.
> >> >
> >> > 1. There is no longer a way to tell the parser to treat internal node
> >> > labels
> >> > as confidence values. Lots of files in the wild do record the support
> >> > values
> >> > here, including those generated by RAxML, PhyML, FastTree and MrBayes,
> >> > so
> >> > I'd like to restore this option, and perhaps make it the default. I
> >> > think
> >> > the condition is:
> >> >
> >> > if not (self.values_are_confidence or self.comments_are_confidence or
> >> > current_clade.is_terminal()): # parse confidence from node label
> >> >
> >> > Is there an easy way to add this option to the parser? I'm trying to
> get
> >> > this to work in the "else" clause in parse_tree, where unquoted node
> >> > labels
> >> > are handled.
> >> >
> >> >
> >> > 2. Confidence values are required to be between 0.0 and 1.0. Also,
> >> > support
> >> > values recorded as integers are treated as percentages and divided by
> >> > 100
> >> > automatically. The phyloXML spec doesn't have this range requirement.
> >> > RAxML
> >> > scales bootstraps to 100, but PhyML records the raw number of
> supporting
> >> > bootstrap runs (e.g. supports out of 1000 if there were 1000 bootstrap
> >> > replicates). So, I'd prefer to leave the confidence values as they
> are,
> >> > requiring only that they be numeric. Thoughts?
> >> >
> >> >
> >> > Thanks,
> >> > Eric
> >>
> >> 1. One issue is that current_clade.is_terminal() will always be true
> >> at that point because current_clade's children haven't been parsed
> >> yet. Putting the check in the "process_clade" function (which is
> >> called when the closing paren is hit, and therefore all children
> >> should have been parsed) should fix this.
> >>
> >> So, if values_are_confidence and comments_are_confidence are both
> >> false and a node label is numeric, it should be treated as confidence,
> >> and clade.name should be set to None - is that correct?
> >>
> >> 2. This should be as simple as removing current lines 123-127.
> >>
> >> ~Ben
> >
> >
> >
> > Thanks. Here's #2:
> >
> https://github.com/biopython/biopython/commit/0aee549e72fe5dcf9bcea239d29780706500922a
> >
> > I agree with your assessment of #1, but haven't been able to get it
> working
> > yet. I'm leaving Bug #3407 open for now:
> > https://redmine.open-bio.org/issues/3407
> >
>
> I think this should do it:
>
>
> https://github.com/bendmorris/biopython/commit/b430f27ff908f07d8ab59bec48429947f0028d63
>
> I also updated the test case to make sure this is working correctly
> and changed the default value of comments_are_confidences from True to
> False.
>
> If that looks correct, feel free to pull.
>
> ~Ben
>

Works for me, thanks! I cherry-picked it here:
https://github.com/biopython/biopython/commit/f382f550f49f73301663ad949a6c1e40f5d71c0c
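
The rule agreed in this thread can be sketched as follows. This is only a
minimal illustration, not the code in the commits linked above: the Clade
class and the apply_label_as_confidence() helper are hypothetical stand-ins,
and the only names taken from the discussion are values_are_confidence,
comments_are_confidence and is_terminal(). The check runs after the closing
paren, when the children are already attached, and the numeric value is kept
as-is (no 0.0-1.0 range check, no division by 100).

```python
class Clade:
    """Hypothetical stand-in for a tree node; not Bio.Phylo's Clade."""

    def __init__(self, name=None, clades=None):
        self.name = name
        self.clades = clades or []
        self.confidence = None

    def is_terminal(self):
        # A node without children is a leaf.
        return not self.clades


def apply_label_as_confidence(clade, values_are_confidence=False,
                              comments_are_confidence=False):
    """Reinterpret a numeric internal-node label as a confidence value.

    Meant to be called once the clade's closing paren has been reached,
    so that is_terminal() reflects the parsed children.
    """
    if values_are_confidence or comments_are_confidence:
        return clade  # confidence comes from elsewhere; keep the label
    if clade.is_terminal() or clade.name is None:
        return clade  # leaf labels are always names
    try:
        value = float(clade.name)
    except ValueError:
        return clade  # a genuine, non-numeric internal node name
    clade.confidence = value  # stored unchanged, whatever its scale
    clade.name = None
    return clade
```

With this sketch, an internal node labelled "95" ends up with confidence 95.0
and no name, while a leaf labelled "42" keeps its name.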


From p.j.a.cock at googlemail.com  Mon Feb 11 11:46:20 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 11 Feb 2013 11:46:20 +0000
Subject: [Biopython-dev] Dropping Python 2.5 and Jython 2.5 support?
In-Reply-To: <CAKVJ-_5M55Lky6qg+DohH-QQ1JBBqueCZ1GCzcsYscQ1YSu6UQ@mail.gmail.com>
References: <CAKVJ-_60vgH_T8eqrfYKBScyRVNUApmFM32_4D+uTLABLy19Ng@mail.gmail.com>
	<CAKVJ-_5M55Lky6qg+DohH-QQ1JBBqueCZ1GCzcsYscQ1YSu6UQ@mail.gmail.com>
Message-ID: <CAKVJ-_7SHN5vPP1y-W2vHYB1TbpCFdtrm+5pnxqajbXQAOxDCw@mail.gmail.com>

On Mon, Jan 7, 2013 at 6:55 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> My only significant concern is for Jython users, since this will also
> mean dropping support for Jython 2.5 (which implements the
> Python 2.5 language). The replacement Jython 2.7 is still only
> at the alpha release stage.

Good news for Jython fans, although originally expected last year,
they have now released a beta of Jython 2.7 (which supports the
same language features as C Python 2.7):

http://fwierzbicki.blogspot.co.uk/2013/02/jython-27-beta1-released.html

Hopefully the Biopython unit tests will all be fine under this... and
if so that is good news for phasing out support of Python 2.5.

Regards,

Peter


From tiagoantao at gmail.com  Mon Feb 11 11:50:10 2013
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Mon, 11 Feb 2013 11:50:10 +0000
Subject: [Biopython-dev] Dropping Python 2.5 and Jython 2.5 support?
In-Reply-To: <CAKVJ-_7SHN5vPP1y-W2vHYB1TbpCFdtrm+5pnxqajbXQAOxDCw@mail.gmail.com>
References: <CAKVJ-_60vgH_T8eqrfYKBScyRVNUApmFM32_4D+uTLABLy19Ng@mail.gmail.com>
	<CAKVJ-_5M55Lky6qg+DohH-QQ1JBBqueCZ1GCzcsYscQ1YSu6UQ@mail.gmail.com>
	<CAKVJ-_7SHN5vPP1y-W2vHYB1TbpCFdtrm+5pnxqajbXQAOxDCw@mail.gmail.com>
Message-ID: <CAA9RGEPv5G3Eh9Gv72XPEsQB3QvUZjr2R9thwU7C4Yg_GNj+cg@mail.gmail.com>

On Mon, Feb 11, 2013 at 11:46 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Good news for Jython fans, although originally expected last year,
> they have now released a beta of Jython 2.7 (which supports the
> same language features as C Python 2.7):


I am going to set up buildbot now for this. I will set up my slave first.
If you have any slaves that you want to add this to, please tell me.

Tiago


-- 
"Liberty for wolves is death to the lambs" - Isaiah Berlin


From saketkc at gmail.com  Tue Feb 12 09:51:54 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Tue, 12 Feb 2013 15:21:54 +0530
Subject: [Biopython-dev] BWA Wrapper
Message-ID: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>

Hi,

I am writing a bwa wrapper for Biopython. I have in fact got the "index"
subcommand working. However, I have a concern:

bwa has these subcommands:


bwa index -a bwtsw database.fasta

bwa aln database.fasta short_read.fastq > aln_sa.sai

bwa samse database.fasta aln_sa.sai short_read.fastq > aln.sam

bwa sampe database.fasta aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln.sam

bwa bwasw database.fasta long_read.fastq > aln.sam


If you read the documentation here<http://bio-bwa.sourceforge.net/bwa.shtml>,
you will see that "-r" is an option of the "aln" command as well as the
"samse" command. In the former it is of type INT and in the latter of type
STR. I am not sure how this can be handled in the wrapper,
because I also plan to implement a checker_function. One way is to make a
new class, let's say BwaAlignCommand, which takes care of all the options
of the "aln" command, and separately implement another class, say
"BwaSamseCommand", with all the options of the "samse" command.
But I am not sure whether that is the correct way of addressing the
problem.


Any pointers on this issue?


Thanks


Saket Choudhary

Undergraduate Student

IIT Bombay,India


From p.j.a.cock at googlemail.com  Tue Feb 12 17:38:46 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 12 Feb 2013 17:38:46 +0000
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
Message-ID: <CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>

On Tue, Feb 12, 2013 at 9:51 AM, Saket Choudhary <saketkc at gmail.com> wrote:
> Hi,
>
> I am writing a bwa wrapper for bio-python. I have infact got the "index"
>  option working. However I have a concern:
>
> bwa has these options :
>
> bwa index -a bwtsw database.fasta
>
> bwa aln database.fasta short_read.fastq > aln_sa.sai
>
> bwa samse database.fasta aln_sa.sai short_read.fastq > aln.sam
>
> bwa sampe database.fasta aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln.sam
>
> bwa bwasw database.fasta long_read.fastq > aln.sam
>
>
> If you read the documentation here<http://bio-bwa.sourceforge.net/bwa.shtml>,
> you will see that  "-r" is an option with "aln" command as well as the
> "samse" command. In the former it is of type INT and in the latter of type
> STR. Now I am not sure how can this be taken care of in the wrapper,
> because I also plan to implement a checker_function.  One way is to make a
> new class, lets say BwaAlignCommand which will take care of all options
> inside the "aln" command and separately implement another class say
> "BwaSamseCommand", and implement all the options of the "samse" command.
> But I am not sure if that is indeed the correct way of addressing the
> problem.
>
>
> Any pointers on this issue ?

I would treat "bwa samse", "bwa sampe", "bwa ..." as separate tools and
write a wrapper class for each of them. This would probably fit under the
Bio.Sequencing.Applications namespace.

Peter
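
A rough sketch of the "one wrapper class per subcommand" idea, so that each
class can carry its own checker for an option such as "-r". Everything here
is hypothetical (BwaSubcommand, BwaAlnCommand, BwaSamseCommand and the
option_checkers table); it does not use Bio.Application, and the INT/STR
types for "-r" are taken from the message above rather than checked against
the bwa manual.

```python
class BwaSubcommand:
    """Hypothetical base class: one subclass per bwa subcommand."""

    subcommand = None
    # Maps an option letter to a checker function for this subcommand only.
    option_checkers = {}

    def __init__(self, *positional, **options):
        self.positional = list(positional)
        self.options = {}
        for flag, value in options.items():
            checker = self.option_checkers.get(flag)
            if checker is None:
                raise ValueError(
                    "unknown option -%s for bwa %s" % (flag, self.subcommand))
            if not checker(value):
                raise ValueError(
                    "invalid value %r for -%s in bwa %s"
                    % (value, flag, self.subcommand))
            self.options[flag] = value

    def __str__(self):
        # Build the command line; no shell quoting is attempted here.
        parts = ["bwa", self.subcommand]
        for flag in sorted(self.options):
            parts += ["-" + flag, str(self.options[flag])]
        parts += self.positional
        return " ".join(parts)


class BwaAlnCommand(BwaSubcommand):
    subcommand = "aln"
    # "-r" treated as an integer here, as described in the message above.
    option_checkers = {"r": lambda value: isinstance(value, int)}


class BwaSamseCommand(BwaSubcommand):
    subcommand = "samse"
    # The same letter, but a string for this subcommand.
    option_checkers = {"r": lambda value: isinstance(value, str)}
```

For example, str(BwaAlnCommand("database.fasta", "short_read.fastq", r=5))
gives 'bwa aln -r 5 database.fasta short_read.fastq', while passing r=5 to
BwaSamseCommand raises ValueError. Because the checkers live on the class,
the clash between the two meanings of "-r" never arises.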


From p.j.a.cock at googlemail.com  Tue Feb 12 17:51:15 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 12 Feb 2013 17:51:15 +0000
Subject: [Biopython-dev] Project ideas for GSoC (or other student projects)
Message-ID: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>

Hello all,

Google recently confirmed they will be running Google Summer of Code 2013,
and we (Biopython and the other Bio* projects) would hope to be accepted again
under the Open Bioinformatics Foundation as in previous years:
http://lists.open-bio.org/pipermail/gsoc/2013/000196.html

It would be great to start coming up with potential project ideas, both larger
pieces of work suitable for GSoC but also smaller tasks for other project
students, or 'low hanging fruit' for potential contributors to cut
their teeth on.

See also http://biopython.org/wiki/Active_projects and the ideas list there.

Regards,

Peter


From w.arindrarto at gmail.com  Tue Feb 12 18:29:02 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 12 Feb 2013 19:29:02 +0100
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
Message-ID: <CADEGkF6s6vbTeutQrx6g_V1yCMFeKKCtouPKc_CgZyf52hd1PA@mail.gmail.com>

Hi everyone,

It's more or less a 'low hanging fruit', but I've been thinking
perhaps it may be useful if we have our own interface to the HMMER3
online service? The corresponding SearchIO parsers may be written for
this as well (they return different formats for which we haven't any
parsers currently).

And I think there are more things being worked on, not yet mentioned
in the wiki:

1. Porting our docs to Sphinx[1]
2. Converting some/all of the print and compare tests to unit tests.
For example, our Bio.Seq's tests are still print and compare tests.

regards,
Bow

[1] See the original feature request here:
https://redmine.open-bio.org/issues/3221
https://redmine.open-bio.org/issues/3220
https://redmine.open-bio.org/issues/3219


From eric.talevich at gmail.com  Tue Feb 12 20:00:11 2013
From: eric.talevich at gmail.com (Eric Talevich)
Date: Tue, 12 Feb 2013 15:00:11 -0500
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
Message-ID: <CAMC681nJpArQEGLynZ7JP8FsgZ84ZwW+mPK=P1+NqOkc+fr=2w@mail.gmail.com>

On Tue, Feb 12, 2013 at 12:51 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Hello all,
>
> Google recently confirmed they will be running Google Summer of Code 2013,
> and we (Biopython and the other Bio* projects) would hope to be accepted
> again
> under the Open Bioinformatics Foundation as in previous years:
> http://lists.open-bio.org/pipermail/gsoc/2013/000196.html
>
> It would be great to start coming up with potential project ideas, both
> larger
> pieces of work suitable for GSoC but also smaller tasks for other project
> students, or 'low hanging fruit' for potential contributors to cut
> their teeth on.
>

One interesting GSoC project would be to implement support for phylogenetic
placements. The programs pplacer and EPA (part of RAxML) can place sequence
reads from metagenomic samples onto a reference phylogeny:
http://matsen.fhcrc.org/pplacer/
http://sysbio.oxfordjournals.org/content/60/3/291

The output format of those programs has been standardized as something I
suppose we could call the "jplace" format:
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0031009
http://arxiv.org/abs/1201.3397

It's based on JSON and Newick, with a small extension to Newick that
shouldn't be too hard to support. The GSoC project would be to implement a
parser for this and implement querying as well as integration with the rest
of Bio.Phylo to some reasonable extent. I would be available to mentor this.

In terms of low-hanging fruit, there are some small but important functions
that could be added to Bio.Phylo. My top three: Robinson-Foulds distance,
majority-rules consensus, draw an unrooted tree using Felsenstein's Equal
Daylight algorithm (which starts by computing the layout for a radial tree).

-Eric


From saketkc at gmail.com  Tue Feb 12 20:45:46 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Wed, 13 Feb 2013 02:15:46 +0530
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
References: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
Message-ID: <CAEDHeivOFNv9r5FH_F92=oUqXDzXJ5T073V9NxQM6v=L+wCSGw@mail.gmail.com>

Hi,

I was thinking of a synteny viewer along the lines of
GSV<http://cas-bioinfo.cas.unt.edu/gsv/homepage.php>, if
that makes sense.

Saket

On 12 February 2013 23:21, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Hello all,
>
> Google recently confirmed they will be running Google Summer of Code 2013,
> and we (Biopython and the other Bio* projects) would hope to be accepted
> again
> under the Open Bioinformatics Foundation as in previous years:
> http://lists.open-bio.org/pipermail/gsoc/2013/000196.html
>
> It would be great to start coming up with potential project ideas, both
> larger
> pieces of work suitable for GSoC but also smaller tasks for other project
> students, or 'low hanging fruit' for potential contributors to cut
> their teeth on.
>
> See also http://biopython.org/wiki/Active_projects and the ideas list
> there.
>
> Regards,
>
> Peter
>


From sefakilic at gmail.com  Tue Feb 12 23:18:17 2013
From: sefakilic at gmail.com (=?UTF-8?B?U2VmYSBLxLFsxLHDpw==?=)
Date: Tue, 12 Feb 2013 18:18:17 -0500
Subject: [Biopython-dev] Fwd: Fast instance search of motif in a sequence
In-Reply-To: <CAMfh4tc8vsC258o5Zc1WkgOPDeAf23ohFVmrpouCON=kB=b=SA@mail.gmail.com>
References: <CAMfh4tc8vsC258o5Zc1WkgOPDeAf23ohFVmrpouCON=kB=b=SA@mail.gmail.com>
Message-ID: <CAMfh4tdAaH01z-DyHEXaHjZy92Q8jAffje5_e4DBNS7WFibWFA@mail.gmail.com>

Hi all,

I am working on comparative genomics and I frequently use Motif module of
Biopython. One of the most frequent operations that I do is to build a
motif out of sites and search a sequence to find instances that are similar
to the motif [Bio.Motif._Motif.search_instances()].

The problem is that the sequence being searched for instances is huge.
Mostly it is the genome sequence itself, with its reverse complement. For
example, scanning the E. coli genome plus its reverse complement with a motif
of length ~20 takes almost a minute on my machine.

To make it faster, I implemented a C version of it and a Python interface
so that you can call it from Python. It is pretty fast: it takes about 2.5
seconds.

Current implementation can be found at:

https://github.com/sefakilic/yassi

If anyone is interested and it is appropriate, I would like to modify the
current implementation and integrate it into Biopython.

Thanks!

Sefa Kilic


From mjldehoon at yahoo.com  Wed Feb 13 02:06:33 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 12 Feb 2013 18:06:33 -0800 (PST)
Subject: [Biopython-dev] Fwd: Fast instance search of motif in a sequence
In-Reply-To: <CAMfh4tdAaH01z-DyHEXaHjZy92Q8jAffje5_e4DBNS7WFibWFA@mail.gmail.com>
Message-ID: <1360721193.98373.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Hi Sefa,

Bio.Motif._Motif.search_instances() searches for exact instances of a motif, but it looks like your code searches for motif instances based on their PSSM score. Then, isn't it the same as the current code in Bio/Motif/_pwm.c (or Bio/motifs/_pwm.c)?

Best,
-Michiel.
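
The distinction drawn above, exact instances versus PSSM-score hits, can be
illustrated with a small pure-Python sketch. The function names and the
list-of-dicts matrix layout are assumptions made for this example; this is
not the code in _pwm.c or in yassi, and it makes no attempt at speed.

```python
def exact_instances(sequence, instances):
    """Yield (position, site) for windows identical to a known site."""
    sites = set(instances)
    if not sites:
        return
    width = len(next(iter(sites)))  # assumes all sites share one length
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if window in sites:
            yield start, window


def pssm_scan(sequence, pssm, threshold):
    """Yield (position, score) for windows scoring at least threshold.

    pssm is a list with one dict per motif column, mapping each letter
    to its (e.g. log-odds) score at that column.
    """
    width = len(pssm)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        try:
            score = sum(column[letter] for column, letter in zip(pssm, window))
        except KeyError:
            continue  # skip windows with letters the matrix lacks, e.g. N
        if score >= threshold:
            yield start, score
```

The first function only reports windows that match a known site letter for
letter; the second reports any window whose summed column scores reach the
threshold, which is what "similar to the motif" needs.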

--- On Tue, 2/12/13, Sefa Kılıç <sefakilic at gmail.com> wrote:

> From: Sefa Kılıç <sefakilic at gmail.com>
> Subject: [Biopython-dev] Fwd: Fast instance search of motif in a sequence
> To: biopython-dev at biopython.org
> Date: Tuesday, February 12, 2013, 6:18 PM
> Hi all,
> 
> I am working on comparative genomics and I frequently use
> Motif module of
> Biopython. One of the most frequent operations that I do is
> to build a
> motif out of sites and search a sequence to find instances
> that are similar
> to the motif [Bio.Motif._Motif.search_instances()].
> 
> The problem is that the sequence that instances are searched
> is huge.
> Mostly it is the genome sequence itself, with its reverse
> complement. For
> example, scanning the E.coli genome + its reverse complement
> with a motif
> of length ~20 takes almost a minute in my machine.
> 
> To make it faster, I implemented a C version of it and a
> Python interface
> so that you can call it from Python. It is pretty fast, it
> takes about ~2.5
> seconds.
> 
> Current implementation can be found at:
> 
> https://github.com/sefakilic/yassi
> 
> If anyone is interested and it is appropriate, I would like
> to modify the
> current implementation and integrate it into Biopython.
> 
> Thanks!
> 
> Sefa Kilic
> 


From mjldehoon at yahoo.com  Wed Feb 13 02:08:26 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 12 Feb 2013 18:08:26 -0800 (PST)
Subject: [Biopython-dev] Project ideas for GSoC (or other student
	projects)
In-Reply-To: <CAKVJ-_5iGTERnUG58E5cBnDJxFwZsYrxDYVWct-mpERLsCy-zw@mail.gmail.com>
Message-ID: <1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com>

It would be great to have better support for microarray analysis in Biopython. Something like lumi/limma in R. Perhaps this is an option for the GSoC?

Best,
-Michiel.

--- On Tue, 2/12/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> From: Peter Cock <p.j.a.cock at googlemail.com>
> Subject: [Biopython-dev] Project ideas for GSoC (or other student projects)
> To: "Biopython-Dev Mailing List" <biopython-dev at biopython.org>
> Date: Tuesday, February 12, 2013, 12:51 PM
> Hello all,
> 
> Google recently confirmed they will be running Google Summer
> of Code 2013,
> and we (Biopython and the other Bio* projects) would hope to
> be accepted again
> under the Open Bioinformatics Foundation as in previous
> years:
> http://lists.open-bio.org/pipermail/gsoc/2013/000196.html
> 
> It would be great to start coming up with potential project
> ideas, both larger
> pieces of work suitable for GSoC but also smaller tasks for
> other project
> students, or 'low hanging fruit' for potential contributors
> to cut
> their teeth on.
> 
> See also http://biopython.org/wiki/Active_projects
> and the ideas list there.
> 
> Regards,
> 
> Peter
> 


From sefakilic at gmail.com  Wed Feb 13 02:40:12 2013
From: sefakilic at gmail.com (=?UTF-8?B?U2VmYSBLxLFsxLHDpw==?=)
Date: Tue, 12 Feb 2013 21:40:12 -0500
Subject: [Biopython-dev] Fwd: Fast instance search of motif in a sequence
In-Reply-To: <1360721193.98373.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <CAMfh4tdAaH01z-DyHEXaHjZy92Q8jAffje5_e4DBNS7WFibWFA@mail.gmail.com>
	<1360721193.98373.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <CAMfh4teE2cK7HJP3KOVSq12erWmgMbd6nSkHvkD22YM4eL88Yg@mail.gmail.com>

Hi Michiel,

Thanks for the reply. It seems that _pwm.c does the same thing, as you
said. I missed that part of the code. However, it does not seem to be
mentioned in the tutorial, and it might be useful to mention it there.

Anyway, re-implementing it was good practice. Thank you!

Sefa Kilic


On Tue, Feb 12, 2013 at 9:06 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> Hi Sefa,
>
> Bio.Motif._Motif.search_instances() searches for exact instances of a
> motif, but it looks like your code searches for motifs based on its PSSM
> score. Then, isn't it the same as the current code in Bio/Motif/_pwm.c (or
> Bio/motifs/_pwm.c)?
>
> Best,
> -Michiel.
>
> --- On Tue, 2/12/13, Sefa Kılıç <sefakilic at gmail.com> wrote:
>
> > From: Sefa Kılıç <sefakilic at gmail.com>
> > Subject: [Biopython-dev] Fwd: Fast instance search of motif in a sequence
> > To: biopython-dev at biopython.org
> > Date: Tuesday, February 12, 2013, 6:18 PM
> > Hi all,
> >
> > I am working on comparative genomics and I frequently use
> > Motif module of
> > Biopython. One of the most frequent operations that I do is
> > to build a
> > motif out of sites and search a sequence to find instances
> > that are similar
> > to the motif [Bio.Motif._Motif.search_instances()].
> >
> > The problem is that the sequence that instances are searched
> > is huge.
> > Mostly it is the genome sequence itself, with its reverse
> > complement. For
> > example, scanning the E.coli genome + its reverse complement
> > with a motif
> > of length ~20 takes almost a minute in my machine.
> >
> > To make it faster, I implemented a C version of it and a
> > Python interface
> > so that you can call it from Python. It is pretty fast, it
> > takes about ~2.5
> > seconds.
> >
> > Current implementation can be found at:
> >
> > https://github.com/sefakilic/yassi
> >
> > If anyone is interested and it is appropriate, I would like
> > to modify the
> > current implementation and integrate it into Biopython.
> >
> > Thanks!
> >
> > Sefa Kilic
> >
>


From saketkc at gmail.com  Thu Feb 14 16:02:21 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Thu, 14 Feb 2013 21:32:21 +0530
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
Message-ID: <CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>

There's one more issue that I have run into. Consider the following command,
where the output is written by redirecting it to a file called aln_sa.sai:


bwa aln database.fasta short_read.fastq > aln_sa.sai

Now if we look at the __call__ method
here<https://github.com/saketkc/biopython/blob/master/Bio/Application/__init__.py#L372>,
it takes a boolean for stdout as its input. So should I modify this so that
it can take 'stdout' as an opened file instance, which I can pass when
invoking my BwaAlnCommandLine object as follows:

a=BwaAlnCommandLine()
b=a(stdout=open("aln_sa.sai","wb"))


On 12 February 2013 23:08, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Tue, Feb 12, 2013 at 9:51 AM, Saket Choudhary <saketkc at gmail.com>
> wrote:
> > Hi,
> >
> > I am writing a bwa wrapper for bio-python. I have infact got the "index"
> >  option working. However I have a concern:
> >
> > bwa has these options :
> >
> > bwa index -a bwtsw database.fasta
> >
> > bwa aln database.fasta short_read.fastq > aln_sa.sai
> >
> > bwa samse database.fasta aln_sa.sai short_read.fastq > aln.sam
> >
> > bwa sampe database.fasta aln_sa1.sai aln_sa2.sai read1.fq read2.fq >
> aln.sam
> >
> > bwa bwasw database.fasta long_read.fastq > aln.sam
> >
> >
> > If you read the documentation here<
> http://bio-bwa.sourceforge.net/bwa.shtml>,
> > you will see that  "-r" is an option with "aln" command as well as the
> > "samse" command. In the former it is of type INT and in the latter of
> type
> > STR. Now I am not sure how can this be taken care of in the wrapper,
> > because I also plan to implement a checker_function.  One way is to make
> a
> > new class, lets say BwaAlignCommand which will take care of all options
> > inside the "aln" command and separately implement another class say
> > "BwaSamseCommand", and implement all the options of the "samse" command.
> > But I am not sure if that is indeed the correct way of addressing the
> > problem.
> >
> >
> > Any pointers on this issue ?
>
> I would treat "bwa samse", "bwa sampe", "bwa ..." as separate tools and
> write a wrapper class for each of them. This would probably fit under the
> Bio.Sequencing.Applications namespace.
>
> Peter
>


From p.j.a.cock at googlemail.com  Thu Feb 14 16:19:59 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 14 Feb 2013 16:19:59 +0000
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
Message-ID: <CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>

On Thu, Feb 14, 2013 at 4:02 PM, Saket Choudhary <saketkc at gmail.com> wrote:
> Theres one more issue that I have run into . Consider the following command
> , the outout generated is written by piping it to a file called aln_sa.sai,
>
> bwa aln database.fasta short_read.fastq > aln_sa.sai
>
> Now if we look into the _call method here , it takes as its inout a boolean
> for stdout. So should I modify this so that it can take 'stdout' as on
> opened file  instance which I can invoke while unvoking my BwaAlnCommandLine
> functions as follwos:
>
> a=BwaAlnCommandLine()
> b=a(stdout=open("aln_sa.sai","wb"))

Is that possible?

For complex use of subprocess and pipes, we've previously recommended
that the user handle this explicitly themselves: just use str() on the command
line wrapper object to get 'bwa aln database.fasta short_read.fastq' in this
case. There are some examples in the Tutorial with (multiple sequence)
alignment tools.

Peter
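
As a sketch of handling the redirection explicitly: take the command string
(for instance from str() on a wrapper object) and give subprocess an open
file as stdout. The helper name run_to_file() is made up for this example,
and splitting the string with shlex assumes a POSIX-style command line.

```python
import shlex
import subprocess


def run_to_file(command_line, output_path):
    """Run command_line, sending its stdout straight to output_path.

    Returns the exit status of the child process. stderr is left alone.
    """
    args = shlex.split(command_line)  # no shell involved, so no ">" here
    with open(output_path, "wb") as handle:
        return subprocess.call(args, stdout=handle)
```

Usage would then look like run_to_file(str(cmdline), "aln_sa.sai"), where
cmdline stands for whatever wrapper object builds the 'bwa aln ...' string.
The output never passes through Python, which matters for large files.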


From saketkc at gmail.com  Thu Feb 14 17:04:04 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Thu, 14 Feb 2013 22:34:04 +0530
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
Message-ID: <CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>

I was thinking of adding one more parameter to the __call__ method, let's say
'stdout_to_filepath'.
If this is set, then I add one more if condition
here<https://github.com/saketkc/biopython/blob/master/Bio/Application/__init__.py#L415>
to set the stdout as


stdout_arg = open(stdout_to_filepath, "w")

I tried it and it did work, but I am not sure whether this approach can be
incorporated into the Biopython codebase?

Thanks

Saket

On 14 February 2013 21:49, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Thu, Feb 14, 2013 at 4:02 PM, Saket Choudhary <saketkc at gmail.com>
> wrote:
> > Theres one more issue that I have run into . Consider the following
> command
> > , the outout generated is written by piping it to a file called
> aln_sa.sai,
> >
> > bwa aln database.fasta short_read.fastq > aln_sa.sai
> >
> > Now if we look into the _call method here , it takes as its inout a
> boolean
> > for stdout. So should I modify this so that it can take 'stdout' as on
> > opened file  instance which I can invoke while unvoking my
> BwaAlnCommandLine
> > functions as follwos:
> >
> > a=BwaAlnCommandLine()
> > b=a(stdout=open("aln_sa.sai","wb"))
>
> Is that possible?
>
> For complex use of subprocess and pipes, we've previously recommend
> the user handle this explicitly themselves, just use str() on the command
> line wrapper object to get 'bwa aln database.fasta short_read.fastq' in
> this
> case. There are some examples in the Tutorial with (multiple sequence)
> alignment tools.
>
> Peter
>


From saketkc at gmail.com  Thu Feb 14 18:52:31 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Fri, 15 Feb 2013 00:22:31 +0530
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
Message-ID: <CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>

In short, am I allowed to add this extra parameter under Biopython's
coding standards?

On 14 February 2013 22:34, Saket Choudhary <saketkc at gmail.com> wrote:
> I was thinking of adding one more parameter to the _call function, lets say
> 'stdout_to_filepath'.
> If this is set then I add one more if condition here  to set the stdout as
>
>
> stdout_arg = open(stdout_to_filepath, "w")
>
> I tried it and it did work, but I am not sure if it  this standard can be
> incorporated in the biopython codebase ?
>
> Thanks
>
> Saket
>
>
> On 14 February 2013 21:49, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>
>> On Thu, Feb 14, 2013 at 4:02 PM, Saket Choudhary <saketkc at gmail.com>
>> wrote:
>> > Theres one more issue that I have run into . Consider the following
>> > command
>> > , the outout generated is written by piping it to a file called
>> > aln_sa.sai,
>> >
>> > bwa aln database.fasta short_read.fastq > aln_sa.sai
>> >
>> > Now if we look into the _call method here , it takes as its inout a
>> > boolean
>> > for stdout. So should I modify this so that it can take 'stdout' as on
>> > opened file  instance which I can invoke while unvoking my
>> > BwaAlnCommandLine
>> > functions as follwos:
>> >
>> > a=BwaAlnCommandLine()
>> > b=a(stdout=open("aln_sa.sai","wb"))
>>
>> Is that possible?
>>
>> For complex use of subprocess and pipes, we've previously recommend
>> the user handle this explicitly themselves, just use str() on the command
>> line wrapper object to get 'bwa aln database.fasta short_read.fastq' in
>> this
>> case. There are some examples in the Tutorial with (multiple sequence)
>> alignment tools.
>>
>> Peter
>
>


From arklenna at gmail.com  Thu Feb 14 19:43:18 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Thu, 14 Feb 2013 14:43:18 -0500
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
	<CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
Message-ID: <CALfq9tKJPk0bYuUnKUbW70Xhnu1HY8r2o80RpTeKAAq1vZh7yg@mail.gmail.com>

> On 14 February 2013 22:34, Saket Choudhary <saketkc at gmail.com> wrote:
> > I was thinking of adding one more parameter to the _call function, let's
> > say 'stdout_to_filepath'.
> > If this is set then I add one more if condition here to set the stdout as
> >
> > stdout_arg = open(stdout_to_filepath, "w")
>


What's wrong with accepting the stdout string that the current
implementation provides and explicitly writing it to your file?

Cheers,

Lenna


From arklenna at gmail.com  Thu Feb 14 19:52:54 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Thu, 14 Feb 2013 14:52:54 -0500
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <1360468537.28338.YahooMailClassic@web164004.mail.gq1.yahoo.com>
References: <CALfq9tL7caRXcM1BmW-ksphdDP6E=J0bYE-x-rmYVbHtV+CCsA@mail.gmail.com>
	<1360468537.28338.YahooMailClassic@web164004.mail.gq1.yahoo.com>
Message-ID: <CALfq9tKCWrQQRAB60ZAfP0Ld7jov9CB1FXT8G1x2gBg8WWDYfA@mail.gmail.com>

On Sat, Feb 9, 2013 at 10:55 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

>
>
> Currently lex.yy.c contains lots of code that is generated automatically
> by lex but is not actually needed for the mmCIF parser. I was thinking to
> remove those parts, and to clean up the remainder so that the code is
> understandable (allowing us to fix any bugs, or to convert it to pure
> Python).
>

Whoops, failed to reply all. Sorry for the double email, Michiel.

---

But generated C is by definition not understandable or debuggable. The only
function of lex.yy.c is to tokenize the mmCIF input.

All of the communication to Python is handled by MMCIFlexmodule.c, which is
70 lines and a header with 3 statements. In parallel to the PLY version, I
rewrote the C to be object-oriented, which pushed it to 101 lines.

Cheers,

Lenna


From p.j.a.cock at googlemail.com  Thu Feb 14 20:33:37 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 14 Feb 2013 20:33:37 +0000
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CALfq9tKJPk0bYuUnKUbW70Xhnu1HY8r2o80RpTeKAAq1vZh7yg@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
	<CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
	<CALfq9tKJPk0bYuUnKUbW70Xhnu1HY8r2o80RpTeKAAq1vZh7yg@mail.gmail.com>
Message-ID: <CAKVJ-_6q3fHd4o-M1LKLxwr7Ab-6w5LW_f5=fbcCLGbUdw6_MQ@mail.gmail.com>

On Thu, Feb 14, 2013 at 7:43 PM, Lenna Peterson <arklenna at gmail.com> wrote:
>
> What's wrong with accepting the stdout string that the current
> implementation provides and explicitly writing it to your file?
>

That is only a good idea for short output, say up to a few kb.

With bwa (and samtools etc), quite often the output defaults
to (or only goes to) stdout - and can be very large. It can also
be binary rather than text, which is an additional complication
with Python 2 vs Python 3 (byte strings versus unicode strings).

See http://bio-bwa.sourceforge.net/bwa.shtml
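
To illustrate the point about large (and possibly binary) output, here is
a minimal sketch of sending a child's stdout straight to a file opened in
binary mode, so it is never held in memory as a Python string. The helper
name run_to_file and the stand-in child command are assumptions made for
the example; this is not part of Bio.Application.

```python
import os
import subprocess
import sys
import tempfile


def run_to_file(args, out_path):
    """Run args, sending the child's stdout straight to out_path."""
    # Binary mode avoids any bytes-versus-unicode decoding in Python itself.
    with open(out_path, "wb") as handle:
        # The child process writes directly to the file, so the output is
        # never held in memory as one large Python string.
        return subprocess.call(args, stdout=handle)


# Stand-in child process; a real call might instead use the words of
# 'bwa aln database.fasta short_read.fastq'.
child_args = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 100000)"]
out_path = os.path.join(tempfile.mkdtemp(), "aln_sa.sai")
return_code = run_to_file(child_args, out_path)
```

The same pattern should behave identically on Python 2 and Python 3,
because no decoding of the output happens on the Python side.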

Peter


From p.j.a.cock at googlemail.com  Thu Feb 14 20:38:59 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 14 Feb 2013 20:38:59 +0000
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
	<CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
Message-ID: <CAKVJ-_7MRu5tSSZ+S0dsOzpc=ojyC0oOxmg4e+A1Or5phfTz8w@mail.gmail.com>

On Thu, Feb 14, 2013 at 6:52 PM, Saket Choudhary <saketkc at gmail.com> wrote:
> In short , am I allowed to play with this extra parameter thing as per
> the code standards of biopython ?

If you can come up with a nice extension to the current interface
for the application wrapper's __call__ method, which is backward
compatible, then we could be convinced.

One idea would be stdout=True and stderr=True are treated as
subprocess.PIPE (as now), and a false value would continue
to mean don't capture the output (send it to /dev/null), but a
(non-empty) string argument could be interpreted as a filename
instead. You might be able to accept a handle, but I'm not sure
if all Python handles would work or not here - it requires some
careful cross platform testing.
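
A rough sketch of that idea, only to make the proposal concrete. The names
resolve_output and run are hypothetical, and this is not the current
__call__ implementation: a non-empty string is opened as a file, any other
true value means subprocess.PIPE, and a false value goes to the null
device.

```python
import os
import subprocess
import sys


def resolve_output(value):
    """Map a stdout/stderr argument to (target, handle_to_close)."""
    if isinstance(value, str) and value:
        handle = open(value, "wb")       # non-empty string: a filename
        return handle, handle
    if value:
        return subprocess.PIPE, None     # true value: capture, as now
    handle = open(os.devnull, "wb")      # false value: discard the output
    return handle, handle


def run(args, stdout=True, stderr=True):
    """Run args and return (return code, captured stdout, captured stderr)."""
    out_target, out_handle = resolve_output(stdout)
    err_target, err_handle = resolve_output(stderr)
    try:
        child = subprocess.Popen(args, stdout=out_target, stderr=err_target)
        # communicate() returns None for any stream that was not piped.
        out, err = child.communicate()
    finally:
        for handle in (out_handle, err_handle):
            if handle is not None:
                handle.close()
    return child.returncode, out, err


# Stand-in child process, used here instead of a real bwa call.
code, out, err = run([sys.executable, "-c", "print('hi')"], stderr=False)
```

Existing callers passing True or False would see no change in behaviour
under this scheme; accepting open handles is deliberately left out, since
that is the part needing cross-platform testing.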

Peter


From mjldehoon at yahoo.com  Sat Feb 16 02:46:00 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 15 Feb 2013 18:46:00 -0800 (PST)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CALfq9tKCWrQQRAB60ZAfP0Ld7jov9CB1FXT8G1x2gBg8WWDYfA@mail.gmail.com>
Message-ID: <1360982760.4805.YahooMailClassic@web164004.mail.gq1.yahoo.com>

Hi Lenna,

Maybe we are confusing each other..
I am looking for a solution that (a) doesn't introduce new dependencies, (b) is pure-Python so it can run on Jython, and (c) if that is not possible and we do need to use C, then that C code should be understandable so that it can be debugged if necessary.

I was suggesting to clean up lex.yy.c so that we can at least achieve (c). The alternative is to start from the PLY-based parser and remove the dependency on PLY.

Best,
-Michiel.

--- On Thu, 2/14/13, Lenna Peterson <arklenna at gmail.com> wrote:

> From: Lenna Peterson <arklenna at gmail.com>
> Subject: Re: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
> To: "BioPython-Dev Mailing List" <biopython-dev at biopython.org>
> Date: Thursday, February 14, 2013, 2:52 PM
> On Sat, Feb 9, 2013 at 10:55 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> > Currently lex.yy.c contains lots of code that is generated automatically
> > by lex but is not actually needed for the mmCIF parser. I was thinking to
> > remove those parts, and to clean up the remainder so that the code is
> > understandable (allowing us to fix any bugs, or to convert it to pure
> > Python).
>
> Whoops, failed to reply all. Sorry for the double email, Michiel.
>
> ---
>
> But generated C is by definition not understandable or debuggable. The only
> function of lex.yy.c is to tokenize the mmCIF input.
>
> All of the communication to Python is handled by MMCIFlexmodule.c, which is
> 70 lines and a header with 3 statements. In parallel to the PLY version, I
> rewrote the C to be object-oriented, which pushed it to 101 lines.
>
> Cheers,
>
> Lenna
> 


From saketkc at gmail.com  Sat Feb 16 07:08:46 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Sat, 16 Feb 2013 12:38:46 +0530
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAKVJ-_7MRu5tSSZ+S0dsOzpc=ojyC0oOxmg4e+A1Or5phfTz8w@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
	<CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
	<CAKVJ-_7MRu5tSSZ+S0dsOzpc=ojyC0oOxmg4e+A1Or5phfTz8w@mail.gmail.com>
Message-ID: <CAEDHeivazvXaaCo2ZFP8sR3BOfv5tgEj6Bf7f_UZ6g_x-v9RZQ@mail.gmail.com>

On 15 February 2013 02:08, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Feb 14, 2013 at 6:52 PM, Saket Choudhary <saketkc at gmail.com> wrote:
>> In short , am I allowed to play with this extra parameter thing as per
>> the code standards of biopython ?
>
> If you can come up with a nice extension to the current interface
> for the application wrapper's __call__ method, which is backward
> compatible, then we could be convinced.
>
> One idea would be stdout=True and stderr=True are treated as
> subprocess.PIPE (as now), and a false value would continue
> to mean don't capture the output (send it to /dev/null), but a
> (non-empty) string argument could be interpreted as a filename
> instead. You might be able to accept a handle, but I'm not sure
> if all Python handles would work or not here - it requires some
> careful cross platform testing.
>
> Peter


Hi everyone,

I have pushed the wrapper to
https://github.com/saketkc/biopython/tree/bwa_wrapper

Should I send a pull request? I am in the middle of my university
mid-semester examinations, so this is not completely tested. I
need to run some more tests with more parameters once my
examinations are over next week.


I would like to hear comments or have it code-reviewed, since this is
the first time I am contributing to Biopython and I might have missed
some of the coding practices being followed.

Thanks

Saket


From p.j.a.cock at googlemail.com  Sat Feb 16 10:42:50 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 16 Feb 2013 10:42:50 +0000
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <1360982760.4805.YahooMailClassic@web164004.mail.gq1.yahoo.com>
References: <CALfq9tKCWrQQRAB60ZAfP0Ld7jov9CB1FXT8G1x2gBg8WWDYfA@mail.gmail.com>
	<1360982760.4805.YahooMailClassic@web164004.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_7Py6yQm=MGnjoa+cE596a7z48HXpukg3XWzkRPjsF-Bg@mail.gmail.com>

On Sat, Feb 16, 2013 at 2:46 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi Lenna,
>
> Maybe we are confusing each other..
> I am looking for a solution that (a) doesn't introduce new dependencies,

+1

> (b) is pure-Python so it can run on Jython,

+1 And on PyPy (which to me is more interesting than Jython) etc.

> and (c) if that is not possible and we do need to use C, then that C code
> should be understandable so that it can be debugged if necessary.
>
> I was suggesting to clean up lex.yy.c so that we can at least achieve (c).

This does mean we essentially give up on ever regenerating the lex.yy.c
file again - could that be a problem if Flex itself changes much?

> The alternative is to start from the PLY-based parser and remove the
> dependency on PLY.
>
> Best,
> -Michiel.

Peter


From saketkc at gmail.com  Sat Feb 16 11:48:43 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Sat, 16 Feb 2013 17:18:43 +0530
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAEDHeivazvXaaCo2ZFP8sR3BOfv5tgEj6Bf7f_UZ6g_x-v9RZQ@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
	<CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
	<CAKVJ-_7MRu5tSSZ+S0dsOzpc=ojyC0oOxmg4e+A1Or5phfTz8w@mail.gmail.com>
	<CAEDHeivazvXaaCo2ZFP8sR3BOfv5tgEj6Bf7f_UZ6g_x-v9RZQ@mail.gmail.com>
Message-ID: <CAEDHeisW+ymrW-Bb5XcgAm1mSPbDWMY7evhvyspVFxzTptW2Pg@mail.gmail.com>

Oops. Apparently I had forgotten to 'git add' _bwa.py. Committed now:

https://github.com/saketkc/biopython/commit/062aabf8f31a522929957f4bcd3f7a932f3bdf23

On 16 February 2013 12:38, Saket Choudhary <saketkc at gmail.com> wrote:
> On 15 February 2013 02:08, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> On Thu, Feb 14, 2013 at 6:52 PM, Saket Choudhary <saketkc at gmail.com> wrote:
>>> In short , am I allowed to play with this extra parameter thing as per
>>> the code standards of biopython ?
>>
>> If you can come up with a nice extension to the current interface
>> for the application wrapper's __call__ method, which is backward
>> compatible, then we could be convinced.
>>
>> One idea would be stdout=True and stderr=True are treated as
>> subprocess.PIPE (as now), and a false value would continue
>> to mean don't capture the output (send it to /dev/null), but a
>> (non-empty) string argument could be interpreted as a filename
>> instead. You might be able to accept a handle, but I'm not sure
>> if all Python handles would work or not here - it requires some
>> careful cross platform testing.
>>
>> Peter
>
>
> HI Everyone,
>
> I have pushed the wrapper to
> https://github.com/saketkc/biopython/tree/bwa_wrapper
>
> Should I send a pull request ? I am in the middle of my University
> mid-semester examinations and hence this is not completely tested. I
> need to perform some more tests with more parameters after I am done
> with my examinations the next week.
>
>
> I would like to hear comments or have it code-reviewed, since this is
> the first time I am contributing to biopython and I might have missed
> out on some of the coding practices being followed.
>
> Thanks
>
> Saket


From mjldehoon at yahoo.com  Sat Feb 16 12:09:22 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 16 Feb 2013 04:09:22 -0800 (PST)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <CAKVJ-_7Py6yQm=MGnjoa+cE596a7z48HXpukg3XWzkRPjsF-Bg@mail.gmail.com>
Message-ID: <1361016562.57361.YahooMailClassic@web164001.mail.gq1.yahoo.com>

--- On Sat, 2/16/13, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> This does mean we essentially give up on ever regenerating
> the lex.yy.c file every again - could that be a problem if Flex
> itself changes much?

The lex.yy.c file was generated by Flex, but otherwise it's independent of it. It doesn't #include Flex's header files, and we don't link it to the Flex libraries. So we can do with it whatever we want.

We may find though that a stripped-down version of lex.yy.c will be rather trivial, and converting it to Python may be straightforward.
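
To give a feel for how small a pure-Python tokenizer could be, here is a
rough sketch. It is not derived from lex.yy.c; it assumes a simplified
subset of the mmCIF lexical rules (comments, ';'-delimited text fields,
quoted values and bare words), and the names TOKEN and tokenize are made
up for this example.

```python
import re

# One alternative per token type; at each position the first one that
# matches wins. Comments have no named group, so they can be skipped.
TOKEN = re.compile(r"""
    \#[^\n]*                          # comment running to the end of line
  | ^;(?P<text>[\s\S]*?)\n;           # text field between ';' at line starts
  | '(?P<single>[^\n]*?)'(?=\s|$)     # single-quoted value
  | "(?P<double>[^\n]*?)"(?=\s|$)     # double-quoted value
  | (?P<bare>\S+)                     # data name, keyword or unquoted value
""", re.VERBOSE | re.MULTILINE)


def tokenize(data):
    """Yield the token strings found in data, dropping comments."""
    for match in TOKEN.finditer(data):
        kind = match.lastgroup
        if kind is None:
            continue  # a comment matched; nothing to yield
        yield match.group(kind)


sample = (
    "data_test\n"
    "_entry.id 'my entry'\n"
    "# a comment\n"
    "_note\n"
    ";line one\n"
    "line two\n"
    ";\n"
)
tokens = list(tokenize(sample))
```

A real replacement would still need to be checked against the existing
mmCIF test files, since this sketch ignores corner cases of the format.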

Best,
-Michiel.


From tiagoantao at gmail.com  Mon Feb 18 13:57:15 2013
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Mon, 18 Feb 2013 13:57:15 +0000
Subject: [Biopython-dev] Support for BioSQL on Java/Jython
Message-ID: <CAA9RGEOfOfWRihvJmEn9heK3g3UHdF605sQpea9NKF1AFKwqwQ@mail.gmail.com>

Dear All,

I have implemented a set of changes to allow for BioSQL support in Jython.
Features:

1. Totally transparent in terms of API. Indeed, the existing BioSQL tests
work out of the box.

2. MySQL and PostgreSQL are supported.

3. No sqlite3 support. This library (standard in CPython) does not exist
in Jython.


You can find the changes here:
https://github.com/tiagoantao/biopython/commits/master
(top two commits)

Comments appreciated. If there is no opposition, I will commit these soon
(after incorporating feedback) to the main repo.

-- 
"Liberty for wolves is death to the lambs" - Isaiah Berlin


From p.j.a.cock at googlemail.com  Mon Feb 18 17:44:30 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 18 Feb 2013 17:44:30 +0000
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAEDHeisW+ymrW-Bb5XcgAm1mSPbDWMY7evhvyspVFxzTptW2Pg@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
	<CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
	<CAKVJ-_7MRu5tSSZ+S0dsOzpc=ojyC0oOxmg4e+A1Or5phfTz8w@mail.gmail.com>
	<CAEDHeivazvXaaCo2ZFP8sR3BOfv5tgEj6Bf7f_UZ6g_x-v9RZQ@mail.gmail.com>
	<CAEDHeisW+ymrW-Bb5XcgAm1mSPbDWMY7evhvyspVFxzTptW2Pg@mail.gmail.com>
Message-ID: <CAKVJ-_5vn3p_iiOYTpnMs6W8qF-3DFwgzO2-HqL6=W-mHByaTA@mail.gmail.com>

> On 16 February 2013 12:38, Saket Choudhary <saketkc at gmail.com> wrote:
>> HI Everyone,
>>
>> I have pushed the wrapper to
>> https://github.com/saketkc/biopython/tree/bwa_wrapper
>>
>> Should I send a pull request ? I am in the middle of my University
>> mid-semester examinations and hence this is not completely tested. I
>> need to perform some more tests with more parameters after I am done
>> with my examinations the next week.
>>
>>
>> I would like to hear comments or have it code-reviewed, since this is
>> the first time I am contributing to biopython and I might have missed
>> out on some of the coding practices being followed.
>>
>> Thanks
>>
>> Saket


On Sat, Feb 16, 2013 at 11:48 AM, Saket Choudhary <saketkc at gmail.com> wrote:
> Oops. Apparently I had forgotten to 'git add' the _bwa.py . Committed now :
>
> https://github.com/saketkc/biopython/commit/062aabf8f31a522929957f4bcd3f7a932f3bdf23
>

This looks sensible. I think if we are going to extend the __call__ interface
to allow stdout to be a filename, then we should do the same for stderr
as well. Also this needs to be explained in the docstring (and perhaps
also the Tutorial somewhere).

Separately some simple unit tests for the wrapper would be good too
(which can be as much work as the original code itself), and would
be beneficial for cross-platform testing.

Thanks,

Peter


From tiagoantao at gmail.com  Tue Feb 19 11:42:22 2013
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 19 Feb 2013 11:42:22 +0000
Subject: [Biopython-dev] Jython (non-existing) docs
Message-ID: <CAA9RGEM3PE_CfEONjr-F_SUe4t+J1oEo2fG-2dnGFDRNma7pjg@mail.gmail.com>

Hi,

I had a cursory look at the documentation for installing Biopython under
Jython and there seems to be none. If it is OK, I would extend the
documentation to cover Jython

-- 
"Liberty for wolves is death to the lambs" - Isaiah Berlin


From p.j.a.cock at googlemail.com  Tue Feb 19 12:01:15 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 19 Feb 2013 12:01:15 +0000
Subject: [Biopython-dev] Jython (non-existing) docs
In-Reply-To: <CAA9RGEM3PE_CfEONjr-F_SUe4t+J1oEo2fG-2dnGFDRNma7pjg@mail.gmail.com>
References: <CAA9RGEM3PE_CfEONjr-F_SUe4t+J1oEo2fG-2dnGFDRNma7pjg@mail.gmail.com>
Message-ID: <CAKVJ-_5dTKO9=ZXbFWtKeUN1c7Dpr1wveLBY1Uh=sOJtbfOQ_w@mail.gmail.com>

On Tue, Feb 19, 2013 at 11:42 AM, Tiago Antão <tiagoantao at gmail.com> wrote:
> Hi,
>
> I had a cursory look at the documentation for installing Biopython under
> Jython and there seems to be none. If it is OK, I would extend the
> documentation to cover Jython

I just use the usual mantra:

jython setup.py build
jython setup.py test
jython setup.py install

Perhaps there are pitfalls I'm not aware of?

(Updating Doc/install/Installation.tex is still a good idea though)

Peter


From tiagoantao at gmail.com  Tue Feb 19 12:02:18 2013
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 19 Feb 2013 12:02:18 +0000
Subject: [Biopython-dev] Jython (non-existing) docs
In-Reply-To: <CAKVJ-_5dTKO9=ZXbFWtKeUN1c7Dpr1wveLBY1Uh=sOJtbfOQ_w@mail.gmail.com>
References: <CAA9RGEM3PE_CfEONjr-F_SUe4t+J1oEo2fG-2dnGFDRNma7pjg@mail.gmail.com>
	<CAKVJ-_5dTKO9=ZXbFWtKeUN1c7Dpr1wveLBY1Uh=sOJtbfOQ_w@mail.gmail.com>
Message-ID: <CAA9RGENVf=5yTr-k60iq-5L2EbYaVdwSjB6=Lky6cTPbtZK6wA@mail.gmail.com>

On Tue, Feb 19, 2013 at 12:01 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Perhaps there are pitfalls I'm not aware of?
>
>
JDBC driver for the new BioSQL code ;)

Tiago


-- 
"Liberty for wolves is death to the lambs" - Isaiah Berlin


From p.j.a.cock at googlemail.com  Tue Feb 19 12:07:39 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 19 Feb 2013 12:07:39 +0000
Subject: [Biopython-dev] Jython (non-existing) docs
In-Reply-To: <CAA9RGENVf=5yTr-k60iq-5L2EbYaVdwSjB6=Lky6cTPbtZK6wA@mail.gmail.com>
References: <CAA9RGEM3PE_CfEONjr-F_SUe4t+J1oEo2fG-2dnGFDRNma7pjg@mail.gmail.com>
	<CAKVJ-_5dTKO9=ZXbFWtKeUN1c7Dpr1wveLBY1Uh=sOJtbfOQ_w@mail.gmail.com>
	<CAA9RGENVf=5yTr-k60iq-5L2EbYaVdwSjB6=Lky6cTPbtZK6wA@mail.gmail.com>
Message-ID: <CAKVJ-_6grdoyjBJFi-R6pt69M2A8uduZM8gNMPgDCP=2N3wnSQ@mail.gmail.com>

On Tue, Feb 19, 2013 at 12:02 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
>
> On Tue, Feb 19, 2013 at 12:01 PM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>>
>> Perhaps there are pitfalls I'm not aware of?
>>
>
> JDBC driver for the new BioSQL code ;)
>
> Tiago

Good answer :)

Yes, advice on installing optional dependencies like that makes sense.

Peter


From saketkc at gmail.com  Tue Feb 19 13:15:56 2013
From: saketkc at gmail.com (Saket Choudhary)
Date: Tue, 19 Feb 2013 18:45:56 +0530
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAKVJ-_5vn3p_iiOYTpnMs6W8qF-3DFwgzO2-HqL6=W-mHByaTA@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
	<CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
	<CAKVJ-_7MRu5tSSZ+S0dsOzpc=ojyC0oOxmg4e+A1Or5phfTz8w@mail.gmail.com>
	<CAEDHeivazvXaaCo2ZFP8sR3BOfv5tgEj6Bf7f_UZ6g_x-v9RZQ@mail.gmail.com>
	<CAEDHeisW+ymrW-Bb5XcgAm1mSPbDWMY7evhvyspVFxzTptW2Pg@mail.gmail.com>
	<CAKVJ-_5vn3p_iiOYTpnMs6W8qF-3DFwgzO2-HqL6=W-mHByaTA@mail.gmail.com>
Message-ID: <CAEDHeitQA0JQ=vuRKEohgRwkRxLqs+NuV9dAYguhrc1-eYnCsQ@mail.gmail.com>

On 18 February 2013 23:14, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> > On 16 February 2013 12:38, Saket Choudhary <saketkc at gmail.com> wrote:
> >> HI Everyone,
> >>
> >> I have pushed the wrapper to
> >> https://github.com/saketkc/biopython/tree/bwa_wrapper
> >>
> >> Should I send a pull request ? I am in the middle of my University
> >> mid-semester examinations and hence this is not completely tested. I
> >> need to perform some more tests with more parameters after I am done
> >> with my examinations the next week.
> >>
> >>
> >> I would like to hear comments or have it code-reviewed, since this is
> >> the first time I am contributing to biopython and I might have missed
> >> out on some of the coding practices being followed.
> >>
> >> Thanks
> >>
> >> Saket
>
>
> On Sat, Feb 16, 2013 at 11:48 AM, Saket Choudhary <saketkc at gmail.com>
> wrote:
> > Oops. Apparently I had forgotten to 'git add' the _bwa.py . Committed
> now :
> >
> >
> https://github.com/saketkc/biopython/commit/062aabf8f31a522929957f4bcd3f7a932f3bdf23
> >
>
> This looks sensible. I think if we are going to extend the __call__
> interface
> to allow stdout to be a filename, then we should do the same for stderr
> as well. Also this needs to be explained in the docstring (and perhaps
> also the Tutorial somewhere).
>
> Separately some simple unit tests for the wrapper would be good too
> (which can be as much work as the original code itself), and would
> be beneficial for cross-platform testing.
>
> Thanks,
>
> Peter
>

Thanks Peter.

I will add that. Any pointers to what would be a good reference test_aba.py
file in the Tests/ directory for writing unit tests for this?

I have worked with BDD before but unit tests are new to me, so it may take
some time. I plan to finish it in the coming week once my university
examinations are done.

Thanks

Saket


From p.j.a.cock at googlemail.com  Tue Feb 19 14:25:40 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 19 Feb 2013 14:25:40 +0000
Subject: [Biopython-dev] BWA Wrapper
In-Reply-To: <CAEDHeitQA0JQ=vuRKEohgRwkRxLqs+NuV9dAYguhrc1-eYnCsQ@mail.gmail.com>
References: <CAEDHeitb1cnbY169aWk9BmuZNAHMUJu6WuER6FR8Yc=nEfypeQ@mail.gmail.com>
	<CAKVJ-_7zOHYOz3L=SgzQZSuojPvGfysic9ef397yoB0XsQh_1w@mail.gmail.com>
	<CAEDHeivB1pBxabNHmNDtV-hKvzfQcQv2MC5kVtrEYrs86RYMHA@mail.gmail.com>
	<CAKVJ-_41osE0yVUWc7=b3Z3jq=hPB+QiCuoCZau-Zp8gTPnsvQ@mail.gmail.com>
	<CAEDHeivHNHEKDbPjMJf-xAntYULVFvU1+wzJD_ebGLvUBe31CQ@mail.gmail.com>
	<CAEDHeitrezkj4nrc4W4DFy0OyuVtWtdczFDJd+0e5=9GzB6yYg@mail.gmail.com>
	<CAKVJ-_7MRu5tSSZ+S0dsOzpc=ojyC0oOxmg4e+A1Or5phfTz8w@mail.gmail.com>
	<CAEDHeivazvXaaCo2ZFP8sR3BOfv5tgEj6Bf7f_UZ6g_x-v9RZQ@mail.gmail.com>
	<CAEDHeisW+ymrW-Bb5XcgAm1mSPbDWMY7evhvyspVFxzTptW2Pg@mail.gmail.com>
	<CAKVJ-_5vn3p_iiOYTpnMs6W8qF-3DFwgzO2-HqL6=W-mHByaTA@mail.gmail.com>
	<CAEDHeitQA0JQ=vuRKEohgRwkRxLqs+NuV9dAYguhrc1-eYnCsQ@mail.gmail.com>
Message-ID: <CAKVJ-_40jy-NS5csf55Q_ds-DdGRpaH18rZxZXdcypNEijpcJA@mail.gmail.com>

On Tue, Feb 19, 2013 at 1:15 PM, Saket Choudhary <saketkc at gmail.com> wrote:
>
> Thanks Peter.
>
> I will add that. Any pointers to what would be a good reference test_aba.py
> file in Tests/ directory for writing unit tests for this ?
>
> I have worked on BDD before but Unit Tests are new for me, so it may take
> some time.I plan to finish it the coming week once my university
> examinations are done
>
> Thanks
>
> Saket

There's a chapter in the Tutorial about our test framework. In this
case existing command line tool wrappers are the best reference,
e.g. test_Emboss.py or test_Muscle.py

Also if you want to use doctests and have them included in the
test suite, add the module to the list in Tests/run_tests.py - however
this does not handle optional dependencies (other than NumPy).
Therefore all the application wrapper doctests to date have carefully
avoided actually invoking the command line - and instead most
print the string representation instead. This allows us to check
the example use cases should run (and catches silly errors in
the examples like a typo in an argument name).
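
As a self-contained illustration of that doctest style, here is a sketch
using a toy class. ToyCommandline is a hypothetical stand-in and not one of
our wrappers; the point is only that the doctest prints the command string
and never runs the external tool.

```python
import doctest


class ToyCommandline(object):
    """Hypothetical stand-in for an application wrapper.

    >>> cline = ToyCommandline("bwa", "aln", "database.fasta", "short_read.fastq")
    >>> print(cline)
    bwa aln database.fasta short_read.fastq
    """

    def __init__(self, *words):
        self.words = words

    def __str__(self):
        # Building the string never touches the external program.
        return " ".join(self.words)


# Run the docstring examples explicitly so the outcome can be inspected.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    ToyCommandline.__doc__, {"ToyCommandline": ToyCommandline},
    "ToyCommandline", None, 0,
)
runner = doctest.DocTestRunner(verbose=False)
runner.run(test)
results = runner.summarize(verbose=False)
```

A typo in an argument name would surface as a failed example here, without
bwa having to be installed on the test machine.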

Thanks,

Peter


From p.j.a.cock at googlemail.com  Sun Feb 24 12:42:47 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 24 Feb 2013 12:42:47 +0000
Subject: [Biopython-dev] [Biopython] blastdbcmd
In-Reply-To: <5127B8D1.5090705@usp.br>
References: <5127A44E.2030403@usp.br>
	<CAKVJ-_4LosCsm4940My0Y5O6L45a-NqxUa6sziUK-wkKm51mJA@mail.gmail.com>
	<5127B8D1.5090705@usp.br>
Message-ID: <CAKVJ-_69XXfAZpqwMZAPoDmR1Wh=SL54-=uNUysaVK3tWiPt6w@mail.gmail.com>

Great - let us know on the list if you have any questions.

Peter

On Fri, Feb 22, 2013 at 6:28 PM, Frederico Moraes Ferreira
<ferreirafm at usp.br> wrote:
> Hi Peter,
> Yes, I meant a Biopython Blast application for blastdbcmd.
> Thanks for the link.
> Best,
> Fred
>
> Em 22-02-2013 14:23, Peter Cock escreveu:
>
>> On Fri, Feb 22, 2013 at 5:01 PM, Frederico Moraes Ferreira
>> <ferreirafm at usp.br>  wrote:
>>>
>>> Hi there Biopythoneers,
>>> As far as I know, there isn't a blastdbcmd submodule in Biopython. So
>>> I've been writing the BLAST matched sequence IDs to a file, fetching
>>> them all with a subprocess and reading them with SeqIO afterwards. In
>>> some cases, however, I miss a blastdbcmd parser to make things easier.
>>> How are you guys dealing with this?
>>> Best,
>>> Fred
>>
>> Are you talking about a command line wrapper for blastdbcmd, to go in
>> Bio/Blast/Applications.py? That seems a good idea.
>>
>> Personally I find the blastdbcmd tool quite handicapped due to the
>> introduction of generated sequence identifiers, and rarely use it:
>>
>> http://blastedbio.blogspot.co.uk/2012/10/my-ids-not-good-enough-for-ncbi-blast.html
>>
>> Instead I would use Bio.SeqIO to index the FASTA file used for the
>> database, and get the sequences that way.
>>
>> Peter
>>
>
> --
> Dr. Frederico Moraes Ferreira
> Universidade de São Paulo
> Faculdade de Medicina
> Instituto do Coração - Imunologia
> Av. Dr. Enéas de Carvalho Aguiar, 44
> 05403-900     São Paulo - SP
> Brasil
>


From anaryin at gmail.com  Tue Feb 26 16:14:52 2013
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Tue, 26 Feb 2013 17:14:52 +0100
Subject: [Biopython-dev] Slight suggestion for PDBIO
Message-ID: <CAJ9sUYOjsWE-E2YbO_Gx7ApK8jnp=WeaivZY7udndkWu9tMV8w@mail.gmail.com>

Hello all,

I've slightly modified PDBIO to allow writing any object in our PDB
representation. Right now it accepts only Models or Structures (IIRC), and
sometimes it's useful to write out only a chain or a residue. I've added
a layer of code that builds the "missing" parts using StructureBuilder.

I pushed it to a branch in my github account:

https://github.com/JoaoRodrigues/biopython/tree/pdbio

I've been using it for so long now that I completely forgot about it.
I only noticed when I changed computers and the version there could not
handle this. So I guess it should be solid enough.

Cheers,

João


From eric.talevich at gmail.com  Tue Feb 26 19:35:56 2013
From: eric.talevich at gmail.com (Eric Talevich)
Date: Tue, 26 Feb 2013 14:35:56 -0500
Subject: [Biopython-dev] Slight suggestion for PDBIO
In-Reply-To: <CAJ9sUYOjsWE-E2YbO_Gx7ApK8jnp=WeaivZY7udndkWu9tMV8w@mail.gmail.com>
References: <CAJ9sUYOjsWE-E2YbO_Gx7ApK8jnp=WeaivZY7udndkWu9tMV8w@mail.gmail.com>
Message-ID: <CAMC681kvoAoV=zDSzxv1B=+0JCgYKP1Km4VGTucGc=zzkpEaEg@mail.gmail.com>

On Tue, Feb 26, 2013 at 11:14 AM, João Rodrigues <anaryin at gmail.com> wrote:

> Hello all,
>
> I've modified slightly PDBIO to allow writing of any object of our PDB
> representation. Right now it accepts only Models or Structures (IIRC) and
> sometimes it's useful to have only a chain or a residue written. I've added
> a layer of code that builds the "missing" parts using StructureBuilder.
>
> I pushed it to a branch in my github account:
>
> https://github.com/JoaoRodrigues/biopython/tree/pdbio
>
> I've been using it for a while now so often I completely forgot about it..
> Only noticed when I changed computers and the version there could not
> handle this. So I guess it should be solid enough.
>
>
Awesome. I support the idea. Could you do a pull request, so TravisCI runs
the tests automatically?

-Eric


From anaryin at gmail.com  Tue Feb 26 19:39:08 2013
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Tue, 26 Feb 2013 20:39:08 +0100
Subject: [Biopython-dev] Slight suggestion for PDBIO
In-Reply-To: <CAMC681kvoAoV=zDSzxv1B=+0JCgYKP1Km4VGTucGc=zzkpEaEg@mail.gmail.com>
References: <CAJ9sUYOjsWE-E2YbO_Gx7ApK8jnp=WeaivZY7udndkWu9tMV8w@mail.gmail.com>
	<CAMC681kvoAoV=zDSzxv1B=+0JCgYKP1Km4VGTucGc=zzkpEaEg@mail.gmail.com>
Message-ID: <CAJ9sUYMFccQaFqZOX8=gP=B8MzOHao1LqUVE7mt1H2CwMMOAAg@mail.gmail.com>

There's some discussion about some implementation details:

https://github.com/JoaoRodrigues/biopython/commit/cd86f3c8f4216d59440f4eaf8ac3ba2ab05d8eb4

What does everyone else think?

Thanks for the input btw. Should I make a test too? I reckon it would be a
good thing to add?


2013/2/26 Eric Talevich <eric.talevich at gmail.com>

> On Tue, Feb 26, 2013 at 11:14 AM, João Rodrigues <anaryin at gmail.com> wrote:
>
>> Hello all,
>>
>> I've modified slightly PDBIO to allow writing of any object of our PDB
>> representation. Right now it accepts only Models or Structures (IIRC) and
>> sometimes it's useful to have only a chain or a residue written. I've
>> added
>> a layer of code that builds the "missing" parts using StructureBuilder.
>>
>> I pushed it to a branch in my github account:
>>
>> https://github.com/JoaoRodrigues/biopython/tree/pdbio
>>
>> I've been using it for a while now so often I completely forgot about it..
>> Only noticed when I changed computers and the version there could not
>> handle this. So I guess it should be solid enough.
>>
>>
> Awesome. I support the idea. Could you do a pull request, so TravisCI runs
> the tests automatically?
>
> -Eric
>


From davidjosephcain at gmail.com  Tue Feb 26 19:47:32 2013
From: davidjosephcain at gmail.com (David Cain)
Date: Tue, 26 Feb 2013 14:47:32 -0500
Subject: [Biopython-dev] Slight suggestion for PDBIO
In-Reply-To: <CAJ9sUYMFccQaFqZOX8=gP=B8MzOHao1LqUVE7mt1H2CwMMOAAg@mail.gmail.com>
References: <CAJ9sUYOjsWE-E2YbO_Gx7ApK8jnp=WeaivZY7udndkWu9tMV8w@mail.gmail.com>
	<CAMC681kvoAoV=zDSzxv1B=+0JCgYKP1Km4VGTucGc=zzkpEaEg@mail.gmail.com>
	<CAJ9sUYMFccQaFqZOX8=gP=B8MzOHao1LqUVE7mt1H2CwMMOAAg@mail.gmail.com>
Message-ID: <CAPyP4uKGXTDqJ74ejZJuo6OrD+s7LnmUQvaSRjSz04dhD0RqZQ@mail.gmail.com>

I failed to mention this sooner, but I'm an enthusiastic proponent of what
you've done. Your new set_structure() would be immensely helpful to me, as
I've been using some workarounds to achieve the functionality you've
implemented.

Personally, I think a unit test would be really helpful in ensuring
chain-less residues and the like will save appropriately.
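
As a toy illustration of the idea under discussion (this is not the Bio.PDB
API; the Entity class, the LEVELS list and the placeholder ids are all made up
for this sketch), building the "missing" parent levels around a lone residue
or chain might look roughly like this:

```python
# Hypothetical stand-in for a structure/model/chain/residue hierarchy.
# None of these names come from Bio.PDB; they only illustrate the idea.
LEVELS = ["structure", "model", "chain", "residue", "atom"]

# Placeholder ids used only when a parent level has to be invented.
PLACEHOLDER_IDS = {"structure": "pdb", "model": 0, "chain": "A", "residue": 1}


class Entity(object):
    def __init__(self, level, id):
        self.level = level
        self.id = id
        self.parent = None
        self.children = []

    def add(self, child):
        """Attach child to this entity and return the child."""
        child.parent = self
        self.children.append(child)
        return child


def wrap_in_structure(entity):
    """Return the top-level structure, creating any missing parents.

    Parents that already exist (and their ids) are kept as they are.
    """
    current = entity
    while current.level != "structure":
        if current.parent is None:
            parent_level = LEVELS[LEVELS.index(current.level) - 1]
            parent = Entity(parent_level, PLACEHOLDER_IDS[parent_level])
            parent.add(current)
        current = current.parent
    return current


def path_to(entity):
    """List (level, id) pairs from the structure down to entity."""
    path = []
    while entity is not None:
        path.append((entity.level, entity.id))
        entity = entity.parent
    return list(reversed(path))
```

A unit test for the chain-less case would then check that a bare residue ends
up under a full structure, and that a residue that already has a chain keeps
that chain's id.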


David Cain
+1 (339) 222 4452


On Tue, Feb 26, 2013 at 2:39 PM, João Rodrigues <anaryin at gmail.com> wrote:

> There's some discussion about some implementation details:
>
>
> https://github.com/JoaoRodrigues/biopython/commit/cd86f3c8f4216d59440f4eaf8ac3ba2ab05d8eb4
>
> What does everyone else think?
>
> Thanks for the input btw. Should I make a test too? I reckon it would be a
> good thing to add?
>
>
> 2013/2/26 Eric Talevich <eric.talevich at gmail.com>
>
> > On Tue, Feb 26, 2013 at 11:14 AM, João Rodrigues <anaryin at gmail.com> wrote:
> >
> >> Hello all,
> >>
> >> I've modified slightly PDBIO to allow writing of any object of our PDB
> >> representation. Right now it accepts only Models or Structures (IIRC) and
> >> sometimes it's useful to have only a chain or a residue written. I've
> >> added a layer of code that builds the "missing" parts using StructureBuilder.
> >>
> >> I pushed it to a branch in my github account:
> >>
> >> https://github.com/JoaoRodrigues/biopython/tree/pdbio
> >>
> >> I've been using it for a while now so often I completely forgot about it..
> >> Only noticed when I changed computers and the version there could not
> >> handle this. So I guess it should be solid enough.
> >>
> >>
> > Awesome. I support the idea. Could you do a pull request, so TravisCI runs
> > the tests automatically?
> >
> > -Eric
> >


From p.j.a.cock at googlemail.com  Tue Feb 26 21:45:00 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 26 Feb 2013 21:45:00 +0000
Subject: [Biopython-dev] Slight suggestion for PDBIO
In-Reply-To: <CAPyP4uKGXTDqJ74ejZJuo6OrD+s7LnmUQvaSRjSz04dhD0RqZQ@mail.gmail.com>
References: <CAJ9sUYOjsWE-E2YbO_Gx7ApK8jnp=WeaivZY7udndkWu9tMV8w@mail.gmail.com>
	<CAMC681kvoAoV=zDSzxv1B=+0JCgYKP1Km4VGTucGc=zzkpEaEg@mail.gmail.com>
	<CAJ9sUYMFccQaFqZOX8=gP=B8MzOHao1LqUVE7mt1H2CwMMOAAg@mail.gmail.com>
	<CAPyP4uKGXTDqJ74ejZJuo6OrD+s7LnmUQvaSRjSz04dhD0RqZQ@mail.gmail.com>
Message-ID: <CAKVJ-_7r5fRR3WsqXZu4Hn_7=hRDUU4q_7PwjmrCjnGmTu6NWw@mail.gmail.com>

On Tue, Feb 26, 2013 at 2:39 PM, João Rodrigues <anaryin at gmail.com> wrote:
>> Should I make a test too? I reckon it would be a good thing to add?
>>

On Tue, Feb 26, 2013 at 7:47 PM, David Cain <davidjosephcain at gmail.com> wrote:
> ...
>
> Personally, I think a unit test would be really helpful in ensuring
> chain-less residues and the like will save appropriately.

Absolutely, +1 on adding a test or two for this new functionality.

And if there is anywhere in the Tutorial or docstrings which would
benefit from mentioning this too, could you update that too please?

Thanks,

Peter


From anaryin at gmail.com  Wed Feb 27 09:25:26 2013
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Wed, 27 Feb 2013 10:25:26 +0100
Subject: [Biopython-dev] Slight suggestion for PDBIO
In-Reply-To: <CAKVJ-_7r5fRR3WsqXZu4Hn_7=hRDUU4q_7PwjmrCjnGmTu6NWw@mail.gmail.com>
References: <CAJ9sUYOjsWE-E2YbO_Gx7ApK8jnp=WeaivZY7udndkWu9tMV8w@mail.gmail.com>
	<CAMC681kvoAoV=zDSzxv1B=+0JCgYKP1Km4VGTucGc=zzkpEaEg@mail.gmail.com>
	<CAJ9sUYMFccQaFqZOX8=gP=B8MzOHao1LqUVE7mt1H2CwMMOAAg@mail.gmail.com>
	<CAPyP4uKGXTDqJ74ejZJuo6OrD+s7LnmUQvaSRjSz04dhD0RqZQ@mail.gmail.com>
	<CAKVJ-_7r5fRR3WsqXZu4Hn_7=hRDUU4q_7PwjmrCjnGmTu6NWw@mail.gmail.com>
Message-ID: <CAJ9sUYN__8-8V773GRU5MqXVQtK+Dm=hAEWbaNCUXcGAV0jqXA@mail.gmail.com>

I'll have a look at the tutorial later (I think the relevant part is the
Bio.PDB FAQ).

I think the whitespace issue is solved. What are the rules exactly? Sorry,
I'm at a bit of a loss here.

I added tests for the save functions (a full structure and a single
residue) as well as one for the chainless residue. I also added David's
suggestion to keep the id of the parent if there is one.

I reverted the commit, added the same changes again (minus the whitespace),
and added another commit with the tests. If it looks OK, I'll make a pull
request (if I can find the button; I've never done that).

Cheers,

João


2013/2/26 Peter Cock <p.j.a.cock at googlemail.com>

> On Tue, Feb 26, 2013 at 2:39 PM, João Rodrigues <anaryin at gmail.com> wrote:
> >> Should I make a test too? I reckon it would be a good thing to add?
> >>
>
> On Tue, Feb 26, 2013 at 7:47 PM, David Cain <davidjosephcain at gmail.com> wrote:
> > ...
> >
> > Personally, I think a unit test would be really helpful in ensuring
> > chain-less residues and the like will save appropriately.
>
> Absolutely, +1 on adding a test or two for this new functionality.
>
> And if there is anywhere in the Tutorial or docstrings which would
> benefit from mentioning this too, could you update that too please?
>
> Thanks,
>
> Peter
>


From p.j.a.cock at googlemail.com  Wed Feb 27 16:34:42 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 27 Feb 2013 16:34:42 +0000
Subject: [Biopython-dev] Slight suggestion for PDBIO
In-Reply-To: <CAJ9sUYN__8-8V773GRU5MqXVQtK+Dm=hAEWbaNCUXcGAV0jqXA@mail.gmail.com>
References: <CAJ9sUYOjsWE-E2YbO_Gx7ApK8jnp=WeaivZY7udndkWu9tMV8w@mail.gmail.com>
	<CAMC681kvoAoV=zDSzxv1B=+0JCgYKP1Km4VGTucGc=zzkpEaEg@mail.gmail.com>
	<CAJ9sUYMFccQaFqZOX8=gP=B8MzOHao1LqUVE7mt1H2CwMMOAAg@mail.gmail.com>
	<CAPyP4uKGXTDqJ74ejZJuo6OrD+s7LnmUQvaSRjSz04dhD0RqZQ@mail.gmail.com>
	<CAKVJ-_7r5fRR3WsqXZu4Hn_7=hRDUU4q_7PwjmrCjnGmTu6NWw@mail.gmail.com>
	<CAJ9sUYN__8-8V773GRU5MqXVQtK+Dm=hAEWbaNCUXcGAV0jqXA@mail.gmail.com>
Message-ID: <CAKVJ-_6_QKoVXB=5YKK_w_063snmWORp4LMjLCYEbi5+iV1tzg@mail.gmail.com>

On Wed, Feb 27, 2013 at 9:25 AM, João Rodrigues <anaryin at gmail.com> wrote:
> I'll have a look at the tutorial later (I think it is in the Bio.PDB FAQ).
>
> The whitespace issue is solved I think. What are the rules exactly? Sorry if
> I'm at a bit of a loss here..

PEP8, http://www.python.org/dev/peps/pep-0008/#blank-lines

(Currently an aspiration for the Biopython code, rather than a strict
requirement)
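
As a rough illustration of the blank-line part of PEP8 (the function and
class names are made up for this example): two blank lines around top-level
definitions, one blank line between methods, and blank lines used sparingly
inside a function to separate logical steps.

```python
def top_level_function():
    """Top-level definitions are surrounded by two blank lines."""
    return "top"


class Example(object):
    """Methods inside a class are separated by a single blank line."""

    def first(self):
        return 1

    def second(self):
        # A single blank line may separate logical steps inside a function.
        value = self.first()

        return value + 1
```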

> I added tests for the save functions (a full structure and a single residue)
> as well as one for the chainless residue. I added the suggestion from David
> to keep the id in the parent if there is one.
>
> I reverted the commit and added the same (- the whitespace) and another with
> tests. If it looks ok, I'll make a pull request (if I can find the button,
> never did that..).

GitHub have made it quite easy, but the first time is always the hardest.
Good luck, and if you get stuck we can try to help or just pull the commits
in directly from your fork.

Thanks,

Peter


From anaryin at gmail.com  Wed Feb 27 16:41:45 2013
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Wed, 27 Feb 2013 17:41:45 +0100
Subject: [Biopython-dev] Slight suggestion for PDBIO
In-Reply-To: <CAKVJ-_6_QKoVXB=5YKK_w_063snmWORp4LMjLCYEbi5+iV1tzg@mail.gmail.com>
References: <CAJ9sUYOjsWE-E2YbO_Gx7ApK8jnp=WeaivZY7udndkWu9tMV8w@mail.gmail.com>
	<CAMC681kvoAoV=zDSzxv1B=+0JCgYKP1Km4VGTucGc=zzkpEaEg@mail.gmail.com>
	<CAJ9sUYMFccQaFqZOX8=gP=B8MzOHao1LqUVE7mt1H2CwMMOAAg@mail.gmail.com>
	<CAPyP4uKGXTDqJ74ejZJuo6OrD+s7LnmUQvaSRjSz04dhD0RqZQ@mail.gmail.com>
	<CAKVJ-_7r5fRR3WsqXZu4Hn_7=hRDUU4q_7PwjmrCjnGmTu6NWw@mail.gmail.com>
	<CAJ9sUYN__8-8V773GRU5MqXVQtK+Dm=hAEWbaNCUXcGAV0jqXA@mail.gmail.com>
	<CAKVJ-_6_QKoVXB=5YKK_w_063snmWORp4LMjLCYEbi5+iV1tzg@mail.gmail.com>
Message-ID: <CAJ9sUYNkNB0Hcna3PDOCB6GnDLJ2ZZQB0xKHcK6_eTCwBGg01Q@mail.gmail.com>

Ok, done I guess: https://github.com/biopython/biopython/pull/165/files

Thanks for all the input!


2013/2/27 Peter Cock <p.j.a.cock at googlemail.com>

> On Wed, Feb 27, 2013 at 9:25 AM, João Rodrigues <anaryin at gmail.com> wrote:
> > I'll have a look at the tutorial later (I think it is in the Bio.PDB FAQ).
> >
> > The whitespace issue is solved I think. What are the rules exactly? Sorry if
> > I'm at a bit of a loss here..
>
> PEP8, http://www.python.org/dev/peps/pep-0008/#blank-lines
>
> (Currently an aspiration for the Biopython code, rather than a strict
> requirement)
>
> > I added tests for the save functions (a full structure and a single residue)
> > as well as one for the chainless residue. I added the suggestion from David
> > to keep the id in the parent if there is one.
> >
> > I reverted the commit and added the same (- the whitespace) and another with
> > tests. If it looks ok, I'll make a pull request (if I can find the button,
> > never did that..).
>
> GitHub have made it quite easy, but the first time is always the hardest.
> Good luck, and if you get stuck we can try to help or just pull the commits
> in directly from your fork.
>
> Thanks,
>
> Peter
>


From p.j.a.cock at googlemail.com  Wed Feb 27 22:32:35 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 27 Feb 2013 22:32:35 +0000
Subject: [Biopython-dev] Fwd: [Numpy-discussion] [ANN] SciPy2013: Call for
	abstracts
In-Reply-To: <CAOzk5QcVrMu+7chbRggKX8i+bWVAc4cxQ5Nrvd-32J55REXwMg@mail.gmail.com>
References: <CAOzk5QcVrMu+7chbRggKX8i+bWVAc4cxQ5Nrvd-32J55REXwMg@mail.gmail.com>
Message-ID: <CAKVJ-_5-a2FiZ_+rCX16MuDRe4-6B3Jk112XAtNvuTjsbaRKfA@mail.gmail.com>

The new bioinformatics mini-symposium this year makes SciPy 2013
especially interesting.

Peter

---------- Forwarded message ----------
From: *Jonathan Rocher*
Date: Wednesday, February 27, 2013
Subject: [Numpy-discussion] [ANN] SciPy2013: Call for abstracts
To: SciPy Users List <scipy-user at scipy.org>, numfocus at googlegroups.com,
Discussion of Numerical Python <numpy-discussion at scipy.org>


[Apologies for cross-posts]

Dear all,

The annual SciPy Conference (Scientific Computing with Python)
<http://conference.scipy.org/scipy2013/about.php> allows
participants from academic, commercial, and governmental organizations to
showcase their latest projects, learn from skilled users and developers,
and collaborate on code development. *The deadline for abstract submissions
is March 20th, 2013.*

Submissions are welcome that address general Scientific Computing with
Python, one of the two special themes for this year's conference (machine
learning & reproducible science), or the domain-specific mini-symposia
<http://conference.scipy.org/scipy2013/about.php> held
during the conference (Meteorology, climatology, and atmospheric and
oceanic science, Astronomy and astrophysics, Medical imaging,
Bio-informatics).

Please submit your abstract at the SciPy 2013 website abstract submission
form <http://conference.scipy.org/scipy2013/speaking_submission.php>.
Abstracts will be accepted for posters or presentations. Optional papers to
be published in the conference proceedings will be requested following
abstract submission. This year the proceedings will be made available prior
to the conference to help attendees navigate the conference.

We look forward to an exciting and interesting set of talks, posters, and
discussions and hope to see you at the conference.

The SciPy 2013 Program Committee Chairs

Matt McCormick, Kitware, Inc.
Katy Huff, University of Wisconsin-Madison and Argonne National Laboratory


From redmine at redmine.open-bio.org  Thu Feb 28 01:53:22 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 28 Feb 2013 01:53:22 +0000
Subject: [Biopython-dev] [Biopython - Bug #3419] (New) Bio.SearchIO.FastaIO
Message-ID: <redmine.issue-3419.20130228015322@redmine.open-bio.org>


Issue #3419 has been reported by Jason Stajich.

----------------------------------------
Bug #3419: Bio.SearchIO.FastaIO
https://redmine.open-bio.org/issues/3419

Author: Jason Stajich
Status: New
Priority: Low
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


The strand of the translated sequence (query or subject depending on the analysis) is lost for tfastx/y and fastx/y reports.

from Bio import SearchIO
qresults = SearchIO.parse('test.FASTY.out', 'fasta-m10')
for qresult in qresults:
    for hit in qresult:
        for hsp in hit.hsps:
            # Python 2 print statement, as in the original report
            print qresult.id, " ", hit.id, " ", \
                hsp.query_start, "..", hsp.query_end, " ", hsp.query_strand, " ", \
                hsp.hit_start, "..", hsp.hit_end, " ", hsp.hit_strand


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Thu Feb 28 07:08:50 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 28 Feb 2013 07:08:50 +0000
Subject: [Biopython-dev] [Biopython - Bug #3419] Bio.SearchIO.FastaIO
References: <redmine.issue-3419.20130228015322@redmine.open-bio.org>
Message-ID: <redmine.journal-15112.20130228070850@redmine.open-bio.org>


Issue #3419 has been updated by Wibowo Arindrarto.


Hi Jason,

Thanks for the report :). Do you have an example file handy which I can try to include in our test suite? The FASTA parser was not tested using [t]fast[y|x], so there may be some lines / cases which the parser couldn't handle.


From redmine at redmine.open-bio.org  Thu Feb 28 07:38:20 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 28 Feb 2013 07:38:20 +0000
Subject: [Biopython-dev] [Biopython - Bug #3419] Bio.SearchIO.FastaIO
References: <redmine.issue-3419.20130228015322@redmine.open-bio.org>
Message-ID: <redmine.journal-15113.20130228073820@redmine.open-bio.org>


Issue #3419 has been updated by Jason Stajich.

File bll0026-vs-94.tfasty added

Here is a -m 10 report.

I made this local patch to get it to report the strands, but this is not quite right, because the query here is the protein and so it does not actually have a strand.

diff --git a/Bio/SearchIO/FastaIO.py b/Bio/SearchIO/FastaIO.py
index ca08797..794efb8 100644
--- a/Bio/SearchIO/FastaIO.py
+++ b/Bio/SearchIO/FastaIO.py
@@ -197,7 +197,7 @@ def _set_hsp_seqs(hsp, parsed, program):
         # set seq and alphabet
         setattr(hsp.fragment, seq_type, parsed[seq_type]['seq'])
 
-        if alphabet is not generic_protein:
+        if alphabet is not generic_protein or 'tfast' in program:
             # get strand from coordinate; start <= end is plus
             # start > end is minus
             if start <= end:

In BioPerl I solved this by writing explicit code for the TBLASTN/TFAST[XY] and BLASTX/FAST[XY] situations, which then knew whether the query or subject was translated DNA with a strand or input DNA.
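
A minimal sketch of that explicit approach, assuming only the translated-search
programs named above matter: the strand is taken from the coordinates (start
<= end is plus, start > end is minus), and only the translated side gets one.
The function names, the program-name prefixes, and the use of None for the
protein side are assumptions of this sketch, not the BioPerl or
Bio.SearchIO.FastaIO code.

```python
def strand_from_coords(start, end):
    """Return 1 (plus) when start <= end, otherwise -1 (minus)."""
    return 1 if start <= end else -1


def translated_side(program):
    """Return 'hit', 'query', or None for the side that is translated DNA.

    The prefixes below are an assumption of this sketch.
    """
    name = program.lower()
    if name.startswith(("tfast", "tblastn")):
        return "hit"  # protein query against a translated DNA database
    if name.startswith(("fastx", "fasty", "blastx")):
        return "query"  # translated DNA query against a protein database
    return None


def infer_strands(program, query_coords, hit_coords):
    """Return (query_strand, hit_strand); the protein side gets None."""
    side = translated_side(program)
    query_strand = strand_from_coords(*query_coords) if side == "query" else None
    hit_strand = strand_from_coords(*hit_coords) if side == "hit" else None
    return query_strand, hit_strand
```

For a tfasty report, for example, infer_strands("tfasty", (1, 100), (900, 601))
gives no strand for the protein query and a minus strand for the hit.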