From w.arindrarto at gmail.com  Thu Nov  1 04:19:58 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Thu, 1 Nov 2012 09:19:58 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <1351645938.62302.BPMail_high_noncarrier@web164001.mail.gq1.yahoo.com>
References: <1351645938.62302.BPMail_high_noncarrier@web164001.mail.gq1.yahoo.com>
Message-ID: <CADEGkF4xUKRGWO4e7jHKu9u+itVarvXm7NkotkpnG=wWqe54OQ@mail.gmail.com>

Hi Kai, Michiel,

(I hope this gets through to the mailing list. I'm CC-ing several
people in the discussion as well, just in case).

I've made a new branch based on Kai's SearchIO rebase here:
https://github.com/bow/biopython/tree/searchio-rebase, with the
following important changes:

>>Does anyone have preference between '.acc' or '.accession'? If not, I
>>can change the current '.acc' into '.accession'.
>
> I would prefer .accession for clarity.

1. All accession attributes now use the 'accession' name
(https://github.com/bow/biopython/commit/002b08df91040e6bcf3f0dd3d087b3d378005632).
There's a similar attribute from blast-tab, which is the accession
number and its version. This has also been renamed from 'acc_ver' to
'accession_version'. The docs have been updated accordingly.

> See the attached hmmpfam output. You'll notice that the domain table
> is not in the order of the hit table. As I'd like to preserve the
> order of the hit table, the current setup of the API forces me to
> either repeatedly parse the domain annotations until I find the
> correct domain annotations for my hit, or to create the hits in the
> order of the domain annotation table and then reshuffle them to make
> sure they're in the order of the hit table.
>
> If I could just create "empty" hit objects when parsing the hit table,
> I could easily preserve the order of the hits but still add the hsps
> as I parse them.

2. Regarding the Hit object API change, I've changed it so that Hit
objects can now be created without any HSPs
(https://github.com/bow/biopython/commit/e9137c9ed88c09f6e488f50184292cac474327c4).
However, per my explanation about keeping as few places as possible to
store the same value (in this case the hit and query ID and
description), the empty Hit object will raise errors if any of these
attributes are accessed. Setting and getting these attributes will
only work if there is at least one HSP in the Hit. Other Hit
functions, like append, should work ok as long as it doesn't involve
accessing these attributes. I think this will allow parsing of file
formats like HMMER2 plain text while maintaining the attribute storage
constraint.
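
A minimal sketch of the behaviour described in point 2, assuming the hit ID is stored only on the contained HSPs. The class names, the `hit_id` attribute and the choice of `AttributeError` are illustrative assumptions, not the actual SearchIO code:

```python
class HSP:
    """Stand-in for a high-scoring pair; it owns the hit and query IDs."""

    def __init__(self, hit_id, query_id):
        self.hit_id = hit_id
        self.query_id = query_id


class Hit:
    """Container of HSPs that keeps no copy of the ID itself."""

    def __init__(self, hsps=()):
        self._hsps = []
        for hsp in hsps:
            self.append(hsp)

    def __len__(self):
        return len(self._hsps)

    def append(self, hsp):
        # IDs are only compared when an HSP is already present, so
        # appending to an empty Hit never touches the ID attributes.
        if self._hsps and hsp.hit_id != self._hsps[0].hit_id:
            raise ValueError("HSP hit ID does not match this Hit")
        self._hsps.append(hsp)

    @property
    def id(self):
        if not self._hsps:
            raise AttributeError("an empty Hit has no ID yet")
        return self._hsps[0].hit_id

    @id.setter
    def id(self, value):
        if not self._hsps:
            raise AttributeError("cannot set the ID of an empty Hit")
        for hsp in self._hsps:
            hsp.hit_id = value


# A parser can create the Hit while reading the hit table ...
hit = Hit()
# ... and fill it in later, when the domain table is parsed.
hit.append(HSP("domain_A", "query_1"))
```

With this layout a parser can keep the hit table order by creating empty `Hit` objects first and appending HSPs as they are read.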


Hope these help :).

regards,
Bow

From kai.blin at biotech.uni-tuebingen.de  Thu Nov  1 05:10:11 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 01 Nov 2012 10:10:11 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <CADEGkF4xUKRGWO4e7jHKu9u+itVarvXm7NkotkpnG=wWqe54OQ@mail.gmail.com>
References: <1351645938.62302.BPMail_high_noncarrier@web164001.mail.gq1.yahoo.com>
	<CADEGkF4xUKRGWO4e7jHKu9u+itVarvXm7NkotkpnG=wWqe54OQ@mail.gmail.com>
Message-ID: <50923C73.8060609@biotech.uni-tuebingen.de>

On 2012-11-01 09:19, Wibowo Arindrarto wrote:

Hi Bow,

> 2. Regarding the Hit object API change, I've changed it so that Hit
> objects can now be created without any HSPs
> (https://github.com/bow/biopython/commit/e9137c9ed88c09f6e488f50184292cac474327c4).
> However, per my explanation about keeping as few places possible to
> store the same value (in this case the hit and query ID and
> description), the empty Hit object will raise errors if any of these
> attributes are accessed. Setting and getting these attributes will
> only work if there is at least one HSP in the Hit. Other Hit
> functions, like append, should work ok as long as it doesn't involve
> accessing these attributes. I think this will allow parsing of file
> formats like HMMER2 plain text while maintaining the attribute storage
> constraint.

I totally agree the Hit object isn't valid until it has at least one
HSP. Thanks for that change.

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From redmine at redmine.open-bio.org  Thu Nov  1 06:48:11 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 1 Nov 2012 10:48:11 +0000
Subject: [Biopython-dev] [Biopython - Bug #3297] (Rejected) newline added in
	quated features
References: <redmine.issue-3297.20110926204742@redmine.open-bio.org>
Message-ID: <redmine.journal-14993.20121101104811@redmine.open-bio.org>


Issue #3297 has been updated by Peter Cock.

Status changed from New to Rejected

Was this really filed a year ago or is that an oddity in RedMine? All the discussion is in the last day...

This to me is a bug in the GenBank data, rather than this:

<pre>
                     /product="Glutamate synthase [NADPH] small chain (EC 1.4.1
                     .13)"
</pre> 

the data should have been line-split in a more sensible place, e.g.

<pre>
                     /product="Glutamate synthase [NADPH] small chain (EC
                     1.4.1.13)"
</pre>

In any case, the suggested fix is inappropriate for two reasons. First, as noted by Paul, it would remove the white space between words (the typical case). Second, the GenBank parser uses a scanner/consumer, with the GenBank specific consumer attempting to closely model the underlying data (and in this case keep the new lines as given) while the SeqRecord consumer (used by SeqIO) would convert the newlines into spaces. As noted by Paul, the translation value is a special case.
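
A small sketch of why joining continuation lines with nothing is unsafe. The helper `join_qualifier_lines` is hypothetical and is not part of the Biopython scanner; the two inputs are the wrapped forms shown above:

```python
def join_qualifier_lines(lines, separator):
    """Join the physical lines of one multi-line qualifier value."""
    return separator.join(line.strip() for line in lines)


wrapped_mid_number = [
    '"Glutamate synthase [NADPH] small chain (EC 1.4.1',
    '.13)"',
]
wrapped_at_space = [
    '"Glutamate synthase [NADPH] small chain (EC',
    '1.4.1.13)"',
]

# Joining with nothing repairs the badly wrapped record ...
print(join_qualifier_lines(wrapped_mid_number, ""))
# ... but glues two words together in the typical case.
print(join_qualifier_lines(wrapped_at_space, ""))
# Joining with a space handles the typical case correctly.
print(join_qualifier_lines(wrapped_at_space, " "))
```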

Closing issue.
----------------------------------------
Bug #3297: newline added in quated features
https://redmine.open-bio.org/issues/3297

Author: Jesse van Dam
Status: Rejected
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: 
Target version: 
URL: 


Note: sorry for the duplicate reporting, did not notice the makeup of the bug reporting system

When I have a feature qualifier like this one (which spans multiple lines) in a GenBank file

<pre>
                     /product="Glutamate synthase [NADPH] small chain (EC 1.4.1
                     .13)"

</pre>

Then a space/newline will be added between 1.4.1 and .13 in the result so when printing the feature with the following code
<pre>
  print(source[0].qualifiers["product"])
</pre>

It will print (with an unwanted space)
<pre>
Glutamate synthase [NADPH] small chain (EC 1.4.1 .13)
</pre>

Changed the following thing in scanner.py to fix this problem
<pre>
                    elif value[0]=='"':
                        #Quoted...
                        if value[-1]!='"' or value!='"':
                            #No closing quote on the first line...
                            while value[-1] != '"':
-                               value += "\n" + iterator.next() 
+                               value += iterator.next() 
                        else:
                            #One single line (quoted)
                            assert value == '"'
                            if self.debug : print "Quoted line %s:%s" % (key, value)
                        #DO NOT remove the quotes...
                        qualifiers.append((key,value))

</pre>




From w.arindrarto at gmail.com  Thu Nov  1 10:36:36 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Thu, 1 Nov 2012 15:36:36 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <50923C73.8060609@biotech.uni-tuebingen.de>
References: <1351645938.62302.BPMail_high_noncarrier@web164001.mail.gq1.yahoo.com>
	<CADEGkF4xUKRGWO4e7jHKu9u+itVarvXm7NkotkpnG=wWqe54OQ@mail.gmail.com>
	<50923C73.8060609@biotech.uni-tuebingen.de>
Message-ID: <CADEGkF4=x2Bt0k6gAg=tRwP7Po9wu-sLncPcjT=gyRJ8cjsGaw@mail.gmail.com>

Hi Kai,

You're welcome :). I was thinking of changing Hit to work like QueryResult,
which you can create without it containing any items. The trade-off is that
there are more attributes to keep track of (4 instead of 2), because they
would be stored apart from the contained objects, so I chose not to do it
for now.

Anyway, let me know if there are still parsing difficulties because of the
object model.

cheers,
Bow




From eric.talevich at gmail.com  Thu Nov  1 14:10:17 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Thu, 1 Nov 2012 14:10:17 -0400
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAKVJ-_6uVs=VE6boPAHgPTHgrBS-Q9UrGL+_63V6Go=ch-8oEw@mail.gmail.com>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
	<508A8041.2020203@biotech.uni-tuebingen.de>
	<CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
	<87pq42s9lt.fsf@fastmail.fm>
	<CAKVJ-_4GzU+5vMXd1XLvycV=tK6xcgMoSA53cjNmYC4fGoPM6w@mail.gmail.com>
	<874nldqi3t.fsf@fastmail.fm>
	<CAKVJ-_6uVs=VE6boPAHgPTHgrBS-Q9UrGL+_63V6Go=ch-8oEw@mail.gmail.com>
Message-ID: <CAMC681=_Bjms0jbb7+7TKRWtaeRVNbT-Jtx6wucv398KH0xO4A@mail.gmail.com>

On Tue, Oct 30, 2012 at 7:03 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Mon, Oct 29, 2012 at 5:54 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> >
> > Peter;
> >
> >> In the case of Bow's SearchIO code, what would you prefer?
> >> e.g. Bio.SearchIO as it is now on his branch?
> >
> > I like plain ol' Search the best but don't have a strong preference. I'm
> > terrible at naming things so trust everyone's judgment on this.
> >
> > Brad
>
> Since we have no clear consensus, I propose we add Bow's code
> as Bio.SearchIO (which is how it is written right now), with the new
> BiopythonExperimentalWarning in place (to alert people that it may
> change in the next release). We can then rename or move it at a
> later date. This will make it easier for people to test the code, and
> also suggest further changes or additions (e.g. Kai's HMMER work).
>
> If and when we agree on a consolidation of the Bio.SeqXXX
> modules, then Bio.SearchIO could move too. If this happens
> before any public release as Bio.SearchIO so much the better.
>
> Adopting lower case module names under Python 3 is also a
> separate issue.
>
> Peter
>
>
+1
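
For readers unfamiliar with the warning mechanism proposed above, here is a minimal sketch using the standard `warnings` module. The class `ExperimentalWarning` and the helper `warn_experimental` are hypothetical stand-ins, not the actual Biopython definitions:

```python
import warnings


class ExperimentalWarning(Warning):
    """Hypothetical stand-in for BiopythonExperimentalWarning."""


def warn_experimental(module_name):
    """Emit the kind of warning an experimental module might issue on import."""
    warnings.warn(
        "%s is experimental and may change in the next release." % module_name,
        ExperimentalWarning,
        stacklevel=2,
    )


# Users who accept the risk can silence just this category.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ExperimentalWarning)
    warn_experimental("Bio.SearchIO")
```

Giving the warning its own category means it can be filtered without hiding unrelated warnings.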

Regarding the "great upheaval" of module renaming and reorganization:

0. If the only change is to combine the SeqIO, Seq, SeqRecord and
SeqFeature classes under a single module, we probably can do that in a
backwards-compatible way. But that means keeping our StudlyCaps module
names for the most part.

1. If we're going to change the API substantially, we might as well "do it
right". Besides our PEP8 non-compliance, there are some dark, dusty corners
of Biopython that we ought to clean up while we're at it -- reorganize the
little historical fiefdoms into a coherent structure. We'd call it
Biopython 2.

2. Observing BioPerl and BioRuby, it could make sense to split the
distribution into multiple, with a sequence- and data-oriented
"biopython-core" package and separate packages for, say, 3D structures
("biopython-struct") and perhaps other existing components that have ready
maintainers and which the "core" of Biopython doesn't rely on. I don't
think we need to fragment the code base much, primarily just extract PDB,
SCOP and the other parts that depend on NumPy. On GitHub, these
repositories would still be under the biopython organization name.

3. If we've decided to focus on Python 3 for the reorganization, we can
take advantage of new features in that lineage for packaging, organization
and distribution. These features could make it easier to have side-by-side
Biopython 1 and 2 installations (maybe), and also plugging additional
modules into the main "bio" package (namespace packages, new in Py3.3).

4. Naming: "bio" is clean but might cause problems on Windows? (I wouldn't
know, nyah); "bio2" is nearly as clean; "biopy" follows the numpy/scipy
convention.

5. Porting: I, personally, would keep using the old Biopython for
everything that's meant to run on Python 2, which is, currently,
everything. Biopython2 running on Python 3 would give me an excuse to start
using Python 3 for new code. Keeping these separate would be more difficult
if the lowercasing were done under the same "Bio" namespace.

Thoughts?

-Eric

From p.j.a.cock at googlemail.com  Thu Nov  1 14:46:36 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 1 Nov 2012 18:46:36 +0000
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAMC681=_Bjms0jbb7+7TKRWtaeRVNbT-Jtx6wucv398KH0xO4A@mail.gmail.com>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
	<508A8041.2020203@biotech.uni-tuebingen.de>
	<CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
	<87pq42s9lt.fsf@fastmail.fm>
	<CAKVJ-_4GzU+5vMXd1XLvycV=tK6xcgMoSA53cjNmYC4fGoPM6w@mail.gmail.com>
	<874nldqi3t.fsf@fastmail.fm>
	<CAKVJ-_6uVs=VE6boPAHgPTHgrBS-Q9UrGL+_63V6Go=ch-8oEw@mail.gmail.com>
	<CAMC681=_Bjms0jbb7+7TKRWtaeRVNbT-Jtx6wucv398KH0xO4A@mail.gmail.com>
Message-ID: <CAKVJ-_5XPbUaNy=OWq7prO+Q+evmr+jtrgtW1xyM82_O+PeYfA@mail.gmail.com>

On Thu, Nov 1, 2012 at 6:10 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> On Tue, Oct 30, 2012 at 7:03 AM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>>
>> Since we have no clear consensus, I propose we add Bow's code
>> as Bio.SearchIO (which is how it is written right now), with the new
>> BiopythonExperimentalWarning in place (to alert people that it may
>> change in the next release). We can then rename or move it at a
>> later date. This will make it easier for people to test the code, and
>> also suggest further changes or additions (e.g. Kai's HMMER work).
>>
>> If and when we agree on a consolidation of the Bio.SeqXXX
>> modules, then Bio.SearchIO could move too. If this happens
>> before any public release as Bio.SearchIO so much the better.
>>
>> Adopting lower case module names under Python 3 is also a
>> separate issue.
>>
>> Peter
>>
>
> +1
>
> Regarding the "great upheaval" of module renaming and reorganization:
>
> 0. If the only change is to combine the SeqIO, Seq, SeqRecord and
> SeqFeature classes under a single module, we probably can do that
> in a backwards-compatible way. But that means keeping our
> StudlyCaps module names for the most part.

Yes, that is something we could do in a backwards compatible way,
with the old "StudlyCaps" Bio.SeqXXX modules persisting as legacy
imports for at least a year (say). But is it worth it? See below.
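
One way such a legacy import could look in present-day Python 3, as a rough sketch only. The module names `seq_demo` and `SeqDemo` are invented for the demonstration and the forwarding class is an assumption about how an alias might be built, not an agreed design:

```python
import sys
import types
import warnings

# Hypothetical consolidated module holding the real implementation.
new_module = types.ModuleType("seq_demo")
new_module.reverse = lambda text: text[::-1]
sys.modules["seq_demo"] = new_module


class _LegacyModule(types.ModuleType):
    """Module object that forwards attribute lookups to the new location."""

    def __init__(self, name, target):
        super().__init__(name)
        self.__dict__["_target"] = target

    def __getattr__(self, attr):
        # Only reached when normal lookup fails, i.e. for forwarded names.
        if attr.startswith("__"):
            raise AttributeError(attr)
        warnings.warn(
            "%s is a legacy import; use %s instead"
            % (self.__name__, self._target.__name__),
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(self._target, attr)


sys.modules["SeqDemo"] = _LegacyModule("SeqDemo", new_module)

import SeqDemo  # the old StudlyCaps name still resolves

with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    print(SeqDemo.reverse("ACGT"))
```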

> 1. If we're going to change the API substantially, we might as well "do it
> right". Besides our PEP8 non-compliance, there are some dark, dusty corners
> of Biopython that we ought to clean up while we're at it -- reorganize the
> little historical fiefdoms into a coherent structure. We'd call it Biopython
> 2.

Absolutely there are things we've lived with out of backwards
compatibility - the Alphabet objects are one example (foremost
the way gaps and stop codons were done with wrapper objects).
I'd also like us to switch the restriction digest module to using zero
based counting as Guido intended, and simplify some of the
more 'magical' code which has caused trouble porting to the
other Python implementations.

> 2. Observing BioPerl and BioRuby, it could make sense to split the
> distribution into multiple, with a sequence- and data-oriented
> "biopython-core" package and separate packages for, say, 3D structures
> ("biopython-struct") and perhaps other existing components that have ready
> maintainers and which the "core" of Biopython doesn't rely on. I don't think
> we need to fragment the code base much, primarily just extract PDB, SCOP and
> the other parts that depend on NumPy. On GitHub, these repositories would
> still be under the biopython organization name.

A clearer divide would be good - something we have at some level
already along the lines with and without numpy. However, given
the still unclear future for python packaging I'm not quite so sure
if we can/should go all the way to separate packages. Perhaps I
am being unduly worried by the concerns in the numpy/scipy
community? After all, we have no fortran code!

> 3. If we've decided to focus on Python 3 for the reorganization, we can take
> advantage of new features in that lineage for packaging, organization and
> distribution. These features could make it easier to have side-by-side
> Biopython 1 and 2 installations (maybe), and also plugging additional
> modules into the main "bio" package (namespace packages, new in Py3.3).

We can and should port the current namespace to Python 3, but
writing "Biopython 2" for Python 3 only (not Python 2) sounds wise.
More on this below.
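
As a reminder of what the namespace packages mentioned in point 3 allow (PEP 420, Python 3.3 and later), here is a self-contained sketch. The package name `bio_demo` and the two "distributions" are made up for the demonstration:

```python
import importlib
import os
import sys
import tempfile

# Two separate "distributions", each contributing one module to the
# hypothetical namespace package "bio_demo". Neither has an __init__.py.
root = tempfile.mkdtemp()
for dist, module, body in [
    ("core", "seq", "NAME = 'core sequence tools'\n"),
    ("struct", "pdb", "NAME = '3D structure tools'\n"),
]:
    package_dir = os.path.join(root, dist, "bio_demo")
    os.makedirs(package_dir)
    with open(os.path.join(package_dir, module + ".py"), "w") as handle:
        handle.write(body)
    sys.path.append(os.path.join(root, dist))

importlib.invalidate_caches()
seq = importlib.import_module("bio_demo.seq")
pdb = importlib.import_module("bio_demo.pdb")
print(seq.NAME, "|", pdb.NAME)
```

Both modules import under the same top-level name even though they live in different directories, which is what would let add-on packages plug into a shared namespace.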

> 4. Naming: "bio" is clean but might cause problems on Windows? (I wouldn't
> know, nyah); "bio2" is nearly as clean; "biopy" follows the numpy/scipy
> convention.

As noted before, we couldn't use "bio" on the average Mac either - the
default file system is like Windows, case insensitive.
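
A quick way to check whether a given directory sits on such a file system, as a sketch. The helper name `is_case_insensitive` is hypothetical:

```python
import os
import tempfile


def is_case_insensitive(directory):
    """Return True if the file system holding `directory` ignores case."""
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        open(os.path.join(scratch, "Bio"), "w").close()
        # On a case-insensitive file system "bio" resolves to the same file.
        return os.path.exists(os.path.join(scratch, "bio"))


print(is_case_insensitive(tempfile.gettempdir()))
```

Where this returns True, a "bio" directory and a "Bio" directory cannot coexist in the same site-packages folder.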

The name biopy is in-line with numpy/scipy, which is a plus. I know
not everyone liked this name, but personally it seems fine. Better
than bio2 in my view.

> 5. Porting: I, personally, would keep using the old Biopython for everything
> that's meant to run on Python 2, which is, currently, everything. Biopython2
> running on Python 3 would give me an excuse to start using Python 3 for new
> code. Keeping these separate would be more difficult if the lowercasing were
> done under the same "Bio" namespace.
>
> Thoughts?

As noted above, I'm on board with planning a Biopython 2 requiring Python 3
or later. I would regard this as effectively forking from the current code
base, porting individual modules on a case by case basis (doing a final 2to3
conversion manually as part of this). The code could be shared as a series
of 'alpha' level releases for early testing - I assume we would want to make
some releases, particularly for Windows, where fewer potential testers would
have all the compilers set up to follow the repository.

However, if we do that, we would still support Biopython 1.xx under
Python 3 as well (via 2to3 as we are now, currently 'beta' level support)
for some time in parallel (although likely not getting major new features -
just bug fixes and if required updates for format changes).

Is there enough enthusiasm now to start planning what we'd change for
a (potentially Python 3 only) Biopython 2 yet?

Peter

From p.j.a.cock at googlemail.com  Thu Nov  1 15:40:32 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 1 Nov 2012 19:40:32 +0000
Subject: [Biopython-dev] Fwd: OBF server outage announcement / call for
	SysAdmin volunteers
In-Reply-To: <CAKVJ-_56TAQR4ULW=tviSrzYvjRaJBmoLdWFDT9UG3LqeM2EJA@mail.gmail.com>
References: <CAKVJ-_56TAQR4ULW=tviSrzYvjRaJBmoLdWFDT9UG3LqeM2EJA@mail.gmail.com>
Message-ID: <CAKVJ-_4xgdurC8y54R7LFPPSvEqdTrY9gNnv7kmNGnaFpDmPCA@mail.gmail.com>

FYI regarding the Biopython website and recent mailing list outage.

Peter

PS you can also keep an eye on @Biopython and @OBF_news on Twitter,
which are a useful alternative when the mailing lists are down.

---------- Forwarded message ----------
From: *Peter Cock*
Date: Thursday, November 1, 2012
Subject: OBF server outage announcement / call for SysAdmin volunteers
To: open-bio-l at lists.open-bio.org, OBF Members <members at lists.open-bio.org>
Cc: Chris Dagdigian <chris at bioteam.net>, OBF Board <board at open-bio.org>


Dear all,

As many of you may have noticed, yesterday the Open Bioinformatics
Foundation (OBF) server hosting the mailing lists and most of the
Bio* websites went down.

The mailing lists and simple static webpages (e.g. download pages
for Bio* releases) seem to be back online, as is the OBF news blog:
http://news.open-bio.org/news/ - but the wiki pages are down
(which unfortunately means the Bio* homepages are unavailable).

Services on the failing server are being moved to virtual machines
on the Amazon Cloud, so it may take a few days until everything
has been set up properly and the wiki will be back.

If there is anybody from the Bio* projects who wants to join the OBF's
SysAdmin team and help out with projects like this one, this would be
a good moment to volunteer - please email me or Chris Dagdigian
(the OBF Treasurer and our head Systems Administrator).

Thank you, and please bear with us,

Peter
On behalf of the OBF Board of Directors.

From p.j.a.cock at googlemail.com  Thu Nov  1 15:50:50 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 1 Nov 2012 19:50:50 +0000
Subject: [Biopython-dev] OBF server outage announcement / call for
	SysAdmin volunteers
In-Reply-To: <CAKVJ-_4xgdurC8y54R7LFPPSvEqdTrY9gNnv7kmNGnaFpDmPCA@mail.gmail.com>
References: <CAKVJ-_56TAQR4ULW=tviSrzYvjRaJBmoLdWFDT9UG3LqeM2EJA@mail.gmail.com>
	<CAKVJ-_4xgdurC8y54R7LFPPSvEqdTrY9gNnv7kmNGnaFpDmPCA@mail.gmail.com>
Message-ID: <CAKVJ-_7_bnatVHfsKrNOoqjU+D8qbf05En9aFbi-4xXEpFFpSA@mail.gmail.com>

On Thu, Nov 1, 2012 at 7:40 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> FYI regarding the Biopython website and recent mailing list outage.
>
> Peter
>
> PS you can also keep an eye on @Biopython and @OBF_news on Twitter,
> which are a useful alternative when the mailing lists are down.
>
> <snip>

I should have added that while the wiki is down (which does
unfortunately include the Biopython home page), the Biopython
downloads remain available via http://biopython.org/DIST/ and
other 'static' content like the Tutorial and API pages are up:

http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
http://biopython.org/DIST/docs/api/

Our source code repository is on GitHub, also fine:
https://github.com/biopython/biopython

Issue tracking is on our RedMine server, also fine:
https://redmine.open-bio.org/projects/biopython

Nightly unit tests are on our Buildbot server, also fine:
http://testing.open-bio.org/biopython/tgrid

Continuous integration testing is on TravisCI, also fine:
http://travis-ci.org/biopython/biopython

Regards,

Peter

From andrewscz at gmail.com  Thu Nov  1 16:32:10 2012
From: andrewscz at gmail.com (Andrew Sczesnak)
Date: Thu, 1 Nov 2012 13:32:10 -0700
Subject: [Biopython-dev] Pull Request: MafIO.py
In-Reply-To: <CAFMxBqGxbTSvPkeE2MeKdM4owLCjpzSE2B3-uezem1mA7=gAPw@mail.gmail.com>
References: <mailman.1.1351699203.6679.biopython-dev@lists.open-bio.org>
	<620A45B10433AE4C81D3F931A02812F93BE3FB5721@LESMBX1.adf.bham.ac.uk>
	<CAFMxBqGxbTSvPkeE2MeKdM4owLCjpzSE2B3-uezem1mA7=gAPw@mail.gmail.com>
Message-ID: <CAMNDT_jyUR4tHOhOHSLqUCUvxnd=Wz3Le3wu26bPgp4h9cz9wg@mail.gmail.com>

Thanks Nick! I updated the MafIO branch to allow reading of other key
names not specified in the MAF spec. However, writing is still
restricted to "score" and "pass" keys.
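
The read/write asymmetry described above could be sketched roughly as follows. The function names are hypothetical, the sample line is illustrative, and silently dropping non-spec keys on output is an assumption; this is not the code on the MafIO branch:

```python
WRITABLE_KEYS = ("score", "pass")


def parse_a_line(line):
    """Parse a MAF 'a' line into a dict, accepting any key=value pair."""
    fields = line.strip().split()
    if not fields or fields[0] != "a":
        raise ValueError("not an 'a' line: %r" % line)
    # Split on the first "=" only, so values may themselves contain "=".
    return dict(field.split("=", 1) for field in fields[1:])


def format_a_line(annotations):
    """Write an 'a' line, keeping only the keys named in the MAF spec."""
    parts = ["a"]
    for key in WRITABLE_KEYS:
        if key in annotations:
            parts.append("%s=%s" % (key, annotations[key]))
    return " ".join(parts)


annotations = parse_a_line("a score=100 label=1 mult=3\n")
print(format_a_line(annotations))
```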

On Thu, Nov 1, 2012 at 4:51 AM, Nick Loman <n.j.loman at bham.ac.uk> wrote:
> Hi Andrew
>
> Here you go:
>
> https://gist.github.com/58bc53d492ecc112d926
>
> Thanks for your help
>
> Regards
>
> Nick
>
>
>
> On Wed, Oct 31, 2012 at 6:10 PM, Andrew Sczesnak <andrewscz at gmail.com>
> wrote:
>>
>> Nick,
>>
>> Can you provide a snippet of a file from mugsy for the unit tests?
>>
>> Thanks,
>> Andrew
>>
>> On Oct 31, 2012, at 9:00 AM, biopython-dev-request at lists.open-bio.org
>> wrote:
>>
>> > From: Nick Loman <n.j.loman at bham.ac.uk>
>> > Date: Tue, Oct 30, 2012 at 6:34 AM
>> > Subject: Pull Request: MafIO.py
>> >
>> >
>> > Hi there
>> >
>> > Thanks for the MafIO branch. In order to get it to read MAF files
>> > produced
>> > by Mugsy (mugsy.sourceforge.net) I had to make the following change:
>> >
>> > diff --git a/Bio/AlignIO/MafIO.py b/Bio/AlignIO/MafIO.py
>> > index 6eda0ca..4bb1407 100644
>> > --- a/Bio/AlignIO/MafIO.py
>> > +++ b/Bio/AlignIO/MafIO.py
>> > @@ -178,7 +178,7 @@ def MafIterator(handle, seq_count = None, alphabet =
>> > single_letter_alphabet):
>> >
>> >              annotations = dict([x.split("=") for x in
>> > line.strip().split()[1:]])
>> >
>> > -            if len([x for x in annotations.keys() if x not in ("score",
>> > "pass")]) > 0:
>> > +            if len([x for x in annotations.keys() if x not in ("score",
>> > "pass", "label", "mult")]) > 0:
>> >                 raise ValueError("Error parsing alignment - invalid key
>> > in
>> > 'a' line")
>> >         elif line.startswith("#"):
>> >             # ignore comments
>> >
>> >
>> > My Python fork is a bit confusing right now so hope you don't mind me
>> > sending this pull request via email!
>> >
>> > Cheers
>> >
>> > Nick
>
>

From eric.talevich at gmail.com  Thu Nov  1 22:47:56 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Thu, 1 Nov 2012 22:47:56 -0400
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAKVJ-_5XPbUaNy=OWq7prO+Q+evmr+jtrgtW1xyM82_O+PeYfA@mail.gmail.com>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
	<508A8041.2020203@biotech.uni-tuebingen.de>
	<CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
	<87pq42s9lt.fsf@fastmail.fm>
	<CAKVJ-_4GzU+5vMXd1XLvycV=tK6xcgMoSA53cjNmYC4fGoPM6w@mail.gmail.com>
	<874nldqi3t.fsf@fastmail.fm>
	<CAKVJ-_6uVs=VE6boPAHgPTHgrBS-Q9UrGL+_63V6Go=ch-8oEw@mail.gmail.com>
	<CAMC681=_Bjms0jbb7+7TKRWtaeRVNbT-Jtx6wucv398KH0xO4A@mail.gmail.com>
	<CAKVJ-_5XPbUaNy=OWq7prO+Q+evmr+jtrgtW1xyM82_O+PeYfA@mail.gmail.com>
Message-ID: <CAMC681kXJedKQKkHp82ar6ndwRwe7ymMsfD6sm6j5Ok2RunjCg@mail.gmail.com>

On Thu, Nov 1, 2012 at 2:46 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Thu, Nov 1, 2012 at 6:10 PM, Eric Talevich <eric.talevich at gmail.com>
> wrote:
>
> > 2. Observing BioPerl and BioRuby, it could make sense to split the
> > distribution into multiple, with a sequence- and data-oriented
> > "biopython-core" package and separate packages for, say, 3D structures
> > ("biopython-struct") and perhaps other existing components that have
> > ready maintainers and which the "core" of Biopython doesn't rely on. I
> > don't think we need to fragment the code base much, primarily just
> > extract PDB, SCOP and the other parts that depend on NumPy. On GitHub,
> > these repositories would still be under the biopython organization name.
>
> A clearer divide would be good - something we have at some level
> already along the lines with and without numpy. However, given
> the still unclear future for python packaging I'm not quite so sure
> if we can/should go all the way to separate packages. Perhaps I
> am being unduly worried by the concerns in the numpy/scipy
> community? After all, we have no fortran code!
>

My own use of packaging features and setuptools in particular is pretty
primitive, so I'm not sure what the risks are.

Having a separate repository for structure-related code would make it much
easier for me and João to hack on a Bio.PDB successor, I think. It would
also be nice to have a dependency-free "core" and then a bit more
flexibility in using dependencies for add-on packages -- there are a lot of
good existing libraries for structural biology, for instance, and since
performance is so important there we even might want to start using Cython
for some of that code. Then there's Lenna's pure-Python mmCIF parser which
depends on PLY.


> > 5. Porting: I, personally, would keep using the old Biopython for
> > everything that's meant to run on Python 2, which is, currently,
> > everything. Biopython2 running on Python 3 would give me an excuse to
> > start using Python 3 for new code. Keeping these separate would be more
> > difficult if the lowercasing were done under the same "Bio" namespace.
> >
> > Thoughts?
>
>
> As noted above, I'm on board with planning a Biopython 2 requiring Python 3
> or later. I would regard this as effectively forking from the current code
> base, porting individual modules on a case by case basis (doing a final
> 2to3 conversion manually as part of this). The code could be shared as a
> series of 'alpha' level releases for early testing - I assume we would want
> to make some releases, particularly for Windows, where fewer potential
> testers would have all the compilers set up to follow the repository.
>
>
Sounds good to me.


> However, if we do that, we would still support Biopython 1.xx under
> Python 3 as well (via 2to3 as we are now, currently 'beta' level support)
> for some time in parallel (although likely not getting major new features -
> just bug fixes and if required updates for format changes).
>
>
Sure. I'm assuming it will be some time before we have a Biopython2 we're
happy with, sorting out the module organization, dusting off old code,
dealing with module-specific dependencies and so on, and I'm OK with that.


> Is there enough enthusiasm now to start planning what we'd change for
> a (potentially Python 3 only) Biopython 2 yet?
>
> Peter
>

Maybe a good time to create the initial fork would be after we've merged
the latest GSoC work and any feasible long-running branches. The
Bio.PDB-related GSoC work, on the other hand, seems to be held up
specifically because we're afraid to muck with the existing sub-package too
much with unstable new code, and I can imagine it would be easier to land
it in a new namespace.

-Eric


From mjldehoon at yahoo.com  Fri Nov  2 12:01:35 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 2 Nov 2012 09:01:35 -0700 (PDT)
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAMC681=_Bjms0jbb7+7TKRWtaeRVNbT-Jtx6wucv398KH0xO4A@mail.gmail.com>
Message-ID: <1351872095.63086.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Hi everybody,

--- On Thu, 11/1/12, Eric Talevich <eric.talevich at gmail.com> wrote:
> 1. If we're going to change the API substantially, we might
> as well "do it right". Besides our PEP8 non-compliance, there
> are some dark, dusty corners of Biopython that we ought to clean
> up while we're at it -- reorganize the little historical fiefdoms
> into a coherent structure. We'd call it Biopython 2.

+1.

> 2. Observing BioPerl and BioRuby, it could make sense to
> split the distribution into multiple, with a sequence- and
> data-oriented "biopython-core" package and separate packages
> for, say, 3D structures ("biopython-struct") and perhaps other 
> existing components that have ready
> maintainers and which the "core" of Biopython doesn't rely
> on. I don't think we need to fragment the code base much,
> primarily just extract PDB, SCOP and the other parts that
> depend on NumPy.

This goes against the "coherent structure" in point 1. What is the advantage of splitting the distribution according to whether a module needs NumPy or not? I don't see an advantage to the user, and I don't see an advantage to the developers either. Already I feel that we need to install too many packages to get going with Python in bioinformatics (Python itself, NumPy, Matplotlib and its dependencies, Pysam, Cython (needed to compile Pysam), ezsetup, perhaps SciPy, Biopython). I find this hard to explain to people new to bioinformatics or new to Python. So I would prefer to keep one distribution.

We can be more lenient in terms of dependencies, especially those that don't occur at compile time.

> 4. Naming: "bio" is clean but might cause problems on
> Windows? (I wouldn't know, nyah); "bio2" is nearly as clean;
> "biopy" follows the numpy/scipy convention.

Any problems on Windows will only occur during a transition period, so I wouldn't worry about that too much. Perhaps we should check if there would be any problems; if they are severe, we could check for an existing Biopython installation in setup.py.
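
A check of that kind might look roughly like the sketch below. It is only an illustration and not code from our setup.py: the helper name is_installed is made up, and it relies on importlib.util.find_spec, so it assumes a recent Python 3.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Hypothetical use inside setup.py: warn before installing a lowercase
    # 'bio' package next to an existing 'Bio' one.
    if is_installed("Bio"):
        print("An existing 'Bio' package was found; on a case-insensitive "
              "file system it could clash with a new 'bio' package.")
```

Whether to warn or abort in that situation would be a separate decision.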

bio2 would stay with us forever (well at least until bio3) and is just plain ugly, especially to new users who are not aware of the transition. Then there is the issue that "bio2" would not be for Python 2 but for Python 3.

The "py" is needed in numpy and scipy because otherwise it would be "num" and "sci", which is too short. On the other hand, "bio" is used as a prefix in lots of words, and can stand on its own. Therefore, hurray for "bio".

> 5. Porting: I, personally, would keep using the old Biopython for
> everything that's meant to run on Python 2, which is, currently,
> everything. Biopython2 running on Python 3 would give me an
> excuse to start using Python 3 for new code. Keeping these 
> separate would be more difficult if the lowercasing were done
> under the same "Bio" namespace.

Yes that makes sense.

Best,
-Michiel.

From anaryin at gmail.com  Sat Nov  3 07:12:37 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Sat, 3 Nov 2012 12:12:37 +0100
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <1351872095.63086.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <CAMC681=_Bjms0jbb7+7TKRWtaeRVNbT-Jtx6wucv398KH0xO4A@mail.gmail.com>
	<1351872095.63086.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <CAJ9sUYOwa1CF4-WTNJ36=yK2yHh0ijrMtwZtxLMKXvNKwRb3yw@mail.gmail.com>

Hi everyone,

A bit late for the party but my two cents.

I agree with Eric in that we should take the opportunity to review some
"dark corners" of the code. Regarding what I can contribute to, there are a
lot of changes planned for Bio.PDB that could benefit from a "cleaner
start".

However, in line with Michiel, I think splitting the distribution into
core/extras would be more cumbersome for new users. What about having a
part of the setup file where the user can turn installation of particular
parts of the package on or off? This way you can control whether you need
the dependencies or not. By default you would install everything as it is
now, but it would give you a larger degree of control.
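
To make the idea a bit more concrete, here is a rough sketch of how such a switch could be wired up. Everything in it is an assumption made for illustration: the group names, the grouping of sub-packages, and the select_packages helper are invented, and nothing like this exists in the current setup.py.

```python
# Hypothetical grouping of sub-packages by the optional dependency they need.
CORE_PACKAGES = ["Bio", "Bio.SeqIO", "Bio.AlignIO"]
OPTIONAL_GROUPS = {
    "structure": ["Bio.PDB"],      # assumed to need NumPy
    "graphics": ["Bio.Graphics"],  # assumed to need ReportLab
}


def select_packages(core, optional_groups, disabled=()):
    """Return the packages to install, leaving out any disabled groups."""
    unknown = set(disabled) - set(optional_groups)
    if unknown:
        raise ValueError("unknown optional group(s): %s"
                         % ", ".join(sorted(unknown)))
    selected = list(core)
    for name, packages in optional_groups.items():
        if name not in disabled:
            selected.extend(packages)
    return selected


# Default: everything is installed, as it is now.
everything = select_packages(CORE_PACKAGES, OPTIONAL_GROUPS)
# The user switches one part off.
no_structure = select_packages(CORE_PACKAGES, OPTIONAL_GROUPS,
                               disabled=("structure",))
```

The resulting list would then be handed to setup() as its packages argument.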

As for the namespace and lowercase, I don't really have strong arguments,
but I like 'bio'.

Cheers,

João

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao




From tiagoantao at gmail.com  Sun Nov  4 08:09:35 2012
From: tiagoantao at gmail.com (Tiago Antão)
Date: Sun, 4 Nov 2012 13:09:35 +0000
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <1351872095.63086.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <CAMC681=_Bjms0jbb7+7TKRWtaeRVNbT-Jtx6wucv398KH0xO4A@mail.gmail.com>
	<1351872095.63086.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <CAA9RGENhu2QLKYcxdf4VRPr+1oy6dHT-LjhRC9_bQY7m-KP5gg@mail.gmail.com>

Hi,


On Fri, Nov 2, 2012 at 4:01 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> Already I feel that we need to install too many packages to get going with
> Python in bioinformatics (Python itself, NumPy, Matplotlib and its
> dependencies, Pysam, Cython (needed to compile Pysam), ezsetup, perhaps
> SciPy, Biopython). I find this hard to explain to people new to
> bioinformatics or new to Python. So I would prefer to keep one distribution.
>
> We can be more lenient in terms of dependencies, especially those that
> don't occur at compile time.
>
>
One of the things that I have always found lacking in Biopython is a clear,
consistent policy on dependencies: depending on the mood of the day, adding
a library dependency could be seen as either good or bad. As an example, we
ended up with a dependency on reportlab, but not on scipy.

Whatever the policy, I think that it should be consistent across the board,
and preferably simple for both users and developers.

A few ideas on policy:

1. I totally agree with the idea of being as lenient as possible with
dependencies (as you say, especially with those that do not occur at
compile time).
2. Biopython belongs to a certain software ecology. I think it would make
sense to treat adding dependencies on well-established Python libraries
as natural.
3. (1+2) If a developer wants to add a dependency on a package, that should
not be a major problem (as long as the package is well known, stable and
has been maintained for a long time). Users should only have to deal with
the dependency if they need the functionality that depends on that package.

Python being a dynamic language, there does not have to be a burden on
users/developers if a remote part of Biopython depends on something more
exotic (which most users/developers will never see/install in any case).
Again by "exotic" I mean well known libraries with a track record of years
of stability.
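
As a rough illustration of point 3, a module could defer the import until the feature is actually used, and tell the user what to install only then. This is a sketch under assumptions: the require helper and the MissingDependencyError class are names invented here, not existing Biopython API.

```python
import importlib


class MissingDependencyError(ImportError):
    """Raised when an optional run time dependency is not installed."""


def require(module_name, feature):
    """Import an optional dependency, or explain which feature needs it."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        raise MissingDependencyError(
            "Install %s if you want to use %s." % (module_name, feature))


def draw_tree(tree):
    # The dependency is only touched when this function is called, so users
    # who never draw trees never need matplotlib.
    pyplot = require("matplotlib.pyplot", "tree drawing")
    figure = pyplot.figure()
    return figure
```

Users who never call draw_tree would not notice that matplotlib is missing.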

Tiago
PS - Another issue that it would be interesting to see cleared up is the
policy on compile time (linkage) dependencies. Are new ones encouraged?
What about Java/Jython based ones?

From p.j.a.cock at googlemail.com  Sun Nov  4 09:01:16 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 4 Nov 2012 14:01:16 +0000
Subject: [Biopython-dev] Dependency policy; was PEP8 lower case module names?
Message-ID: <CAKVJ-_5BSJvdD=oADYMZBzHAr3i6PK9u+dXuk3TLMdJVKHkEMw@mail.gmail.com>

Retitling thread

On Sun, Nov 4, 2012 at 1:09 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
> Hi,
>
>
> On Fri, Nov 2, 2012 at 4:01 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>>
>> Already I feel that we need to install too many packages to get going with
>> Python in bioinformatics (Python itself, NumPy, Matplotlib and its
>> dependencies, Pysam, Cython (needed to compile Pysam), ezsetup, perhaps
>> SciPy, Biopython). I find this hard to explain to people new to
>> bioinformatics or new to Python. So I would prefer to keep one distribution.
>>
>> We can be more lenient in terms of dependencies, especially those that
>> don't occur at compile time.
>>
>
> One of the things that I always found lacking with biopython is a clear,
> consistent policy on dependencies:

It would be good to have something written down, just as we
did with the deprecation policy.

> Depending on the mood of the day it could be either good/bad
> to add a library dependency. As an example, this ended up
> with there being a dependency on reportlab, but not on scipy.

The ReportLab dependency is a 'run time only' dependency and
has been in Biopython for a very long time. You'd have to remind
me if there was any compile time issue with scipy, but my
recollection was we were loath to add a dependency on scipy
(which is quite a complex library to install if not using a package)
for just one or two functions - however you were planning something
more substantial in the PopGen code which would justify it (using
lots of statistics).

> Whatever the policy, I think that it should be consistent all across.
> Preferably simple to both users and developers.
>
> A few ideas on policy:
>
> 1. I totally agree with the idea of being as lenient as possible with
> dependencies (as you say, especially with those that do not occur at
> compile time).
> 2. Biopython belongs to a certain software ecology. I think it would make
> sense to see as natural adding dependencies on well established python
> libraries.
> 3. (1+2) If a developer wants to add a dependency on a package, that should
> not be a major problem (as long as the package is maintained for long/well
> known/stable). Users should only have to deal with the dependency if they
> need the functionality that depends on that package.
>
> Python being a dynamic language, there does not have to be a burden on
> users/developers if a remote part of Biopython depends on something more
> exotic (which most users/developers will never see/install in any case).
> Again by "exotic" I mean well known libraries with a track record of years
> of stability.

That all sounds reasonable. It is compile time dependencies that I am
most wary of.

However, from an end user perspective having installed Biopython and
then trying a script from a colleague and only then finding 101 optional
run time dependencies are also needed would be annoying.

For Linux packages like Debian there is a 'recommends' field for this kind
of soft dependency. Where do we stand with declaring dependencies in
setup.py so that if using a package manager like pip this is less painful?

In fact, how many 'soft' dependencies like this do we already have?
Just from a quick look at the README file many are not mentioned
under the current 'System Requirements' text (e.g. NetworkX).
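
One possible way to declare soft dependencies is the extras_require argument that setuptools accepts in setup(). The sketch below only builds the mapping; the extra names ("graphics", "phylo", "all") and the grouping are assumptions made up for this example, not anything Biopython currently declares.

```python
# Hypothetical extras; each key names an optional feature set and lists the
# run time packages that feature needs.
EXTRAS_REQUIRE = {
    "graphics": ["reportlab"],
    "phylo": ["networkx", "matplotlib"],
}

# Convenience extra that pulls in every optional package once.
EXTRAS_REQUIRE["all"] = sorted(
    {package for packages in EXTRAS_REQUIRE.values() for package in packages}
)

# In setup.py this mapping would be passed to setuptools, for example:
#     setup(..., extras_require=EXTRAS_REQUIRE)
```

With such a declaration a user could ask pip for an extra by name, e.g. pip install "biopython[phylo]", while a plain install would stay free of the optional packages.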

> Tiago
> PS - Another issue that it would be interesting to see cleared up would be
> the policy on compile time (linkage) dependencies. Are new ones encouraged?

Currently discouraged. They make installation much more painful, and
have tended to be left untested, e.g. mmCIF was for many years disabled
by default because no one could work out how to detect its requirements
at compile time.

> What about Java/Jython based?

I'm not so keen on something providing Java/Jython only functionality.
However, something where we could require library X under Jython
while using library Y under CPython makes sense. Database access
would be a perfect example - things like Python's sqlite3 don't yet exist
under Jython.
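
A minimal sketch of that kind of switch, assuming we only need to tell the interpreters apart. The function name and the backend labels "jdbc" and "sqlite3" are placeholders invented for this example; platform.python_implementation() is the standard library call that reports 'CPython', 'Jython', 'PyPy' and so on.

```python
import platform


def pick_index_backend(implementation=None):
    """Return the name of the database backend to use on this interpreter."""
    if implementation is None:
        implementation = platform.python_implementation()
    if implementation == "Jython":
        # Placeholder: some Java database reached through JDBC.
        return "jdbc"
    # CPython (and anything else that ships the sqlite3 module).
    return "sqlite3"
```

The calling code would then import the matching module lazily, so the other backend is never required.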

Peter


From sbassi at clubdelarazon.org  Sun Nov  4 12:34:55 2012
From: sbassi at clubdelarazon.org (Sebastian Bassi)
Date: Sun, 4 Nov 2012 14:34:55 -0300
Subject: [Biopython-dev] 403 link
Message-ID: <CAHpha49Bvusw=aYT3K22WHYGrfXPO_v62p++ObYOFuvZXMrPxA@mail.gmail.com>

On page http://biopython.org/wiki/Documentation there are 2 links to a
403 error:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
I can't correct this doc since I don't know where they are.

From p.j.a.cock at googlemail.com  Sun Nov  4 13:08:40 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 4 Nov 2012 18:08:40 +0000
Subject: [Biopython-dev] 403 link
In-Reply-To: <CAHpha49Bvusw=aYT3K22WHYGrfXPO_v62p++ObYOFuvZXMrPxA@mail.gmail.com>
References: <CAHpha49Bvusw=aYT3K22WHYGrfXPO_v62p++ObYOFuvZXMrPxA@mail.gmail.com>
Message-ID: <CAKVJ-_7PXgqT2PDd5-21pQe=nV_4UTMTcYX=uYDqGuK=t=iU=w@mail.gmail.com>

On Sun, Nov 4, 2012 at 5:34 PM, Sebastian Bassi
<sbassi at clubdelarazon.org> wrote:
> On page http://biopython.org/wiki/Documentation there are 2 links to a
> 403 error:
> http://biopython.org/DIST/docs/tutorial/Tutorial.html
> http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
> I can't correct this doc since I don't know where they are.

The links are correct - this is a side effect of the
current migration from the (dying) OBF server to
an Amazon hosted virtual machine. As of yesterday
the static pages were up and the wiki down; for
now it is the other way round... it's being worked on.

Regards,

Peter

From eric.talevich at gmail.com  Sun Nov  4 14:47:53 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sun, 4 Nov 2012 14:47:53 -0500
Subject: [Biopython-dev] Dependency policy;
	was PEP8 lower case module names?
In-Reply-To: <CAKVJ-_5BSJvdD=oADYMZBzHAr3i6PK9u+dXuk3TLMdJVKHkEMw@mail.gmail.com>
References: <CAKVJ-_5BSJvdD=oADYMZBzHAr3i6PK9u+dXuk3TLMdJVKHkEMw@mail.gmail.com>
Message-ID: <CAMC681k3Bweg6_KcCJLtLHn16ZO7Y-cGzPbtaKJtez3EN1qh8Q@mail.gmail.com>

On Sun, Nov 4, 2012 at 9:01 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Retitling thread
>
> On Sun, Nov 4, 2012 at 1:09 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
> > Hi,
> >
> >
> > On Fri, Nov 2, 2012 at 4:01 PM, Michiel de Hoon <mjldehoon at yahoo.com>
> > wrote:
> >>
> >> Already I feel that we need to install too many packages to get going
> >> with Python in bioinformatics (Python itself, NumPy, Matplotlib and its
> >> dependencies, Pysam, Cython (needed to compile Pysam), ezsetup, perhaps
> >> SciPy, Biopython). I find this hard to explain to people new to
> >> bioinformatics or new to Python. So I would prefer to keep one
> >> distribution.
> >>
> >> We can be more lenient in terms of dependencies, especially those that
> >> don't occur at compile time.
> >>
> >
> > One of the things that I always found lacking with biopython is a clear,
> > consistent policy on dependencies:
>
> It would be good to have something written down, just as we
> did with the deprecation policy.
>

Should we start a page for this on the wiki?


> > Depending on the mood of the day it could be either good/bad
> > to add a library dependency. As an example, this ended up
> > with there being a dependency on reportlab, but not on scipy.
>
> The ReportLab dependency is a 'run time only' dependency and
> has been in Biopython for a very long time. You'd have to remind
> me if there was any compile time issue with scipy, but my
> recollection was we were loath to add a dependency on scipy
> (which is quite a complex library to install if not using a package)
> for just one or two functions - however you were planning something
> more substantial in the PopGen code which would justify it (using
> lots of statistics).
>
> > Whatever the policy, I think that it should be consistent all across.
> > Preferably simple to both users and developers.
> >
> > A few ideas on policy:
> >
> > 1. I totally agree with the idea of being as lenient as possible with
> > dependencies (as you say, especially with those that do not occur at
> > compile time).
> > 2. Biopython belongs to a certain software ecology. I think it would make
> > sense to see as natural adding dependencies on well established python
> > libraries.
> > 3. (1+2) If a developer wants to add a dependency on a package, that
> > should not be a major problem (as long as the package is maintained for
> > long/well known/stable). Users should only have to deal with the
> > dependency if they need the functionality that depends on that package.
> >
> > Python being a dynamic language, there does not have to be a burden on
> > users/developers if a remote part of Biopython depends on something more
> > exotic (which most users/developers will never see/install in any case).
> > Again by "exotic" I mean well known libraries with a track record of
> > years of stability.
>
> That all sounds reasonable. It is compile time dependencies that I am
> most wary of.
>

Pure-Python dependencies seem less scary -- a package like PLY should work
on any Python, PyPy, Jython, and Google App Engine. Unfortunately, the
dependencies that are most tempting are the ones with essential C
extensions (numpy, scipy, matplotlib).


> However, from an end user perspective having installed Biopython and
> then trying a script from a colleague and only then finding 101 optional
> run time dependencies are also needed would be annoying.
>
> For Linux packages like Debian there is a 'recommends' field for this kind
> of soft dependency. Where do we stand with declaring dependencies in
> setup.py so that if using a package manager like pip this is less painful?
>
> In fact, how many 'soft' dependencies like this do we already have?
> Just from a quick look at the README file many are not mentioned
> under the current 'System Requirements' text (e.g. NetworkX).
>

I just used "git grep import Bio/" to find out. The only egregious
undocumented dependencies are the ones I added in Phylo for graphics:
networkx and matplotlib/pylab.
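
For a more systematic pass than grep, something along these lines could list the top-level modules a source file imports. It is a sketch only; the function name is made up, and it is shown on an inline sample string rather than on the real Bio/ tree.

```python
import ast


def imported_top_level_modules(source):
    """Return the set of top-level module names imported by some source code."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 means a relative import inside our own package.
            if node.level == 0 and node.module:
                found.add(node.module.split(".")[0])
    return found


SAMPLE = """
import os
from Bio import SeqIO
from . import _utils
try:
    import matplotlib.pyplot as plt
except ImportError:
    plt = None

def to_networkx(tree):
    import networkx
    return networkx.Graph()
"""

modules = imported_top_level_modules(SAMPLE)
```

Unlike a plain grep, this also picks up imports hidden inside functions or try/except blocks, and ignores the word "import" in comments and strings. Separating third-party names from the standard library would still need a list of standard library modules.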

Other *possible* dependencies are sqlite3 in the case of Jython
(Bio.SeqIO._index) and ordereddict for Pythons earlier than 2.7 (Bio._py3k).

Should we add these to the "install_recommends" list in setup.py?


> > Tiago
> > PS - Another issue that it would be interesting to see cleared up would
> > be the policy on compile time (linkage) dependencies. Are new ones
> > encouraged?
>
> Currently discouraged. They make installation much more painful, and
> have tended to be left untested, e.g. mmCIF was for many years disabled
> by default because no one could work out how to detect its requirements
> at compile time.
>
> > What about Java/Jython based?
>
> I'm not so keen on something providing Java/Jython only functionality.
> However, something where we could require library X under Jython
> while using library Y under C Python makes sense. Database access
> would be a perfect example - things like Python's sqlite3 don't yet exist
> under Jython.
>
> Peter
>


From tiagoantao at gmail.com  Sun Nov  4 15:49:33 2012
From: tiagoantao at gmail.com (Tiago Antão)
Date: Sun, 4 Nov 2012 20:49:33 +0000
Subject: [Biopython-dev] Jython DB
Message-ID: <CAA9RGENcDf3zTtWW2NfWPC7FK9PAiAsh_LW4PxkLkCdmTXrrWg@mail.gmail.com>

Howdy,


On Sun, Nov 4, 2012 at 2:01 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Retitling thread
>

Again ;)


> while using library Y under C Python makes sense. Database access
> would be a perfect example - things like Python's sqlite3 don't yet exist
> under Jython.
>
>
I noticed that there is 1 reference to sqlite3:
Bio.SeqIO._index

Other stuff on BioSQL is really just related to database configuration and
does not impair functionality (except for a test case that really depends
on sqlite3).

I suppose that a "default" DB with Jython would probably be JavaDB (aka
Apache Derby)? It is available as a default on the Sun/Oracle JDK (though
not the JRE).

I could go ahead and have a try at evaluating the portability costs for
sqlite3->javadb. In theory it should be easy
(http://www.jython.org/jythonbook/en/1.0/DatabasesAndJython.html).

-- 
"Liberty for wolves is death to the lambs" - Isaiah Berlin

From p.j.a.cock at googlemail.com  Sun Nov  4 15:49:58 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 4 Nov 2012 20:49:58 +0000
Subject: [Biopython-dev] Dependency policy;
	was PEP8 lower case module names?
In-Reply-To: <CAMC681k3Bweg6_KcCJLtLHn16ZO7Y-cGzPbtaKJtez3EN1qh8Q@mail.gmail.com>
References: <CAKVJ-_5BSJvdD=oADYMZBzHAr3i6PK9u+dXuk3TLMdJVKHkEMw@mail.gmail.com>
	<CAMC681k3Bweg6_KcCJLtLHn16ZO7Y-cGzPbtaKJtez3EN1qh8Q@mail.gmail.com>
Message-ID: <CAKVJ-_7o1cnM2US5ZRB4C=bJ_TXoONutCEs1d7ehdpm_W0aX6w@mail.gmail.com>

On Sunday, November 4, 2012, Eric Talevich wrote:

> On Sun, Nov 4, 2012 at 9:01 AM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>
>> Retitling thread
>>
>> On Sun, Nov 4, 2012 at 1:09 PM, Tiago Antão <tiagoantao at gmail.com>
>> wrote:
>> > Hi,
>> >
>> >
>> > On Fri, Nov 2, 2012 at 4:01 PM, Michiel de Hoon <mjldehoon at yahoo.com>
>> > wrote:
>> >>
>> >> Already I feel that we need to install too many packages to get going
>> >> with Python in bioinformatics (Python itself, NumPy, Matplotlib and its
>> >> dependencies, Pysam, Cython (needed to compile Pysam), ezsetup, perhaps
>> >> SciPy, Biopython). I find this hard to explain to people new to
>> >> bioinformatics or new to Python. So I would prefer to keep one
>> >> distribution.
>> >>
>> >> We can be more lenient in terms of dependencies, especially those that
>> >> don't occur at compile time.
>> >>
>> >
>> > One of the things that I always found lacking with biopython is a clear,
>> > consistent policy on dependencies:
>>
>> It would be good to have something written down, just as we
>> did with the deprecation policy.
>>
>
> Should we start a page for this on the wiki?
>
>
The wiki is online again now :)

Maybe agree a draft by email first?


>> > Depending on the mood of the day it could be either good/bad
>> > to add a library dependency. As an example, this ended up
>> > with there being a dependency on reportlab, but not on scipy.
>>
>> The ReportLab dependency is a 'run time only' dependency and
>> has been in Biopython for a very long time. You'd have to remind
>> me if there was any compile time issue with scipy, but my
>> recollection was we were loath to add a dependency on scipy
>> (which is quite a complex library to install if not using a package)
>> for just one or two functions - however you were planning something
>> more substantial in the PopGen code which would justify it (using
>> lots of statistics).
>>
>> > Whatever the policy, I think that it should be consistent all across.
>> > Preferably simple to both users and developers.
>> >
>> > A few ideas on policy:
>> >
>> > 1. I totally agree with the idea of being as lenient as possible
>> > with dependencies (as you say, especially with those that do not occur
>> > at compile time).
>> > 2. Biopython belongs to a certain software ecology. I think it would
>> > make sense to see as natural adding dependencies on well established
>> > python libraries.
>> > 3. (1+2) If a developer wants to add a dependency on a package, that
>> > should not be a major problem (as long as the package is maintained for
>> > long/well known/stable). Users should only have to deal with the
>> > dependency if they need the functionality that depends on that package.
>> >
>> > Python being a dynamic language, there does not have to be a burden on
>> > users/developers if a remote part of Biopython depends on something more
>> > exotic (which most users/developers will never see/install in any case).
>> > Again by "exotic" I mean well known libraries with a track record of
>> > years of stability.
>>
>> That all sounds reasonable. It is compile time dependencies that I am
>> most wary of.
>>
>
> Pure-Python dependencies seem less scary -- a package like PLY should work
> on any Python, PyPy, Jython, and Google App Engine. Unfortunately, the
> dependencies that are most tempting are the ones with essential C
> extensions (numpy, scipy, matplotlib).
>

But (for example) matplotlib wouldn't be a build time dependency
for us.


>> However, from an end user perspective having installed Biopython and
>> then trying a script from a colleague and only then finding 101 optional
>> run time dependencies are also needed would be annoying.
>>
>> For Linux packages like Debian there is a 'recommends' field for this kind
>> of soft dependency. Where do we stand with declaring dependencies in
>> setup.py so that if using a package manager like pip this is less painful?
>>
>> In fact, how many 'soft' dependencies like this do we already have?
>> Just from a quick look at the README file many are not mentioned
>> under the current 'System Requirements' text (e.g. NetworkX).
>>
>
> I just used "git grep import Bio/" to find out. The only egregious
> undocumented dependencies are the ones I added in Phylo for graphics:
> networkx and matplotlib/pylab.
>

Could you add those to the README file then?


> Other *possible* dependencies are sqlite3 in the case of Jython
> (Bio.SeqIO._index) and ordereddict for Pythons earlier than 2.7
> (Bio._py3k).
>
> Should we add these to the "install_recommends" list in setup.py?
>

No, they are in the standard lib on CPython, except in the case
of OrderedDict on older Pythons where we bundle a backport
anyway.
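
For reference, the general fallback pattern looks something like the sketch below. This is the generic idiom, not a copy of what Bio._py3k does; the second import assumes the separately installable "ordereddict" backport package rather than our bundled copy.

```python
try:
    # Standard library location on Python 2.7 and later.
    from collections import OrderedDict
except ImportError:
    # Assumption for this sketch: the third-party 'ordereddict' backport.
    from ordereddict import OrderedDict

# Either class remembers the order in which keys were inserted.
records = OrderedDict()
records["gamma"] = 3
records["alpha"] = 1
records["beta"] = 2
keys_in_order = list(records.keys())
```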

Jython has an open bug on including the sqlite3 module, which
might be worth mentioning under a new Jython specific section
of the README.

Peter


From tiagoantao at gmail.com  Sun Nov  4 16:00:10 2012
From: tiagoantao at gmail.com (Tiago Antão)
Date: Sun, 4 Nov 2012 21:00:10 +0000
Subject: [Biopython-dev] Dependency policy;
	was PEP8 lower case module names?
In-Reply-To: <CAKVJ-_7o1cnM2US5ZRB4C=bJ_TXoONutCEs1d7ehdpm_W0aX6w@mail.gmail.com>
References: <CAKVJ-_5BSJvdD=oADYMZBzHAr3i6PK9u+dXuk3TLMdJVKHkEMw@mail.gmail.com>
	<CAMC681k3Bweg6_KcCJLtLHn16ZO7Y-cGzPbtaKJtez3EN1qh8Q@mail.gmail.com>
	<CAKVJ-_7o1cnM2US5ZRB4C=bJ_TXoONutCEs1d7ehdpm_W0aX6w@mail.gmail.com>
Message-ID: <CAA9RGEPwo2-az4qoL-XjAbh=cj42YABAN6QyibcWZoQWbpMk5w@mail.gmail.com>

On Sun, Nov 4, 2012 at 8:49 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Jython has an open bug on including the sqlite3 module,
>
>
This will go nowhere fast as it will be dependent on a JNI library (i.e.
linkage of C code).
The only durable option in the Java space would be a native implementation
of sqlite3.
All other options are not of the "embeddable" type (e.g. JDBC driver to
something running outside), defeating the main purpose of sqlite3.

To sum it up: I doubt that sqlite3 will be a realistic solution in the
Jython space. As per previous email, I suspect that a Python DBI to JDBC
bridge (bundled with Jython by default) + a default database (javadb/derby
or H2 or HSQLDB) is probably more realistic in the Java space.

On the OracleJDK javadb will require 0 dependencies. On other JDK or a JRE,
Apache derby.

-- 
"Liberty for wolves is death to the lambs" - Isaiah Berlin

From p.j.a.cock at googlemail.com  Sun Nov  4 16:47:20 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 4 Nov 2012 21:47:20 +0000
Subject: [Biopython-dev] Jython DB
In-Reply-To: <CAA9RGENcDf3zTtWW2NfWPC7FK9PAiAsh_LW4PxkLkCdmTXrrWg@mail.gmail.com>
References: <CAA9RGENcDf3zTtWW2NfWPC7FK9PAiAsh_LW4PxkLkCdmTXrrWg@mail.gmail.com>
Message-ID: <CAKVJ-_7tQ=4YgUHosXte0nED8pg_QzSey4pOJOeB+Dw6bBW65Q@mail.gmail.com>

Hi Tiago,

On Sun, Nov 4, 2012 at 8:49 PM, Tiago Antão wrote:
> Howdy,
>
> On Sun, Nov 4, 2012 at 2:01 PM, Peter Cock wrote:
>>
>> Retitling thread
>
>
> Again ;)
>
>
>>
>> while using library Y under C Python makes sense. Database access
>> would be a perfect example - things like Python's sqlite3 don't yet exist
>> under Jython.
>>
>
> I noticed that there is 1 reference to sqlite3:
> Bio.SeqIO._index
>
> Other stuff on BioSQL is just really related to database configuration and
> does not impair functionality (exception to a test case that really depends
> on sqlite3).
>
> I suppose that a "default" DB with Jython would probably be JavaDB (aka
> Apache Derby)? It is available as a default on the Sun/Oracle JDK (though
> not the JRE).
>
> I could go ahead and have a try at evaluating the portability costs for
> sqlite3->javadb. In theory it should be easy
> (http://www.jython.org/jythonbook/en/1.0/DatabasesAndJython.html)

The database stuff in Biopython currently is BioSQL (which under
C Python supports a MySQL, PostgreSQL or SQLite back end)
and things like SeqIO.index which use SQLite3 directly. None of
this currently works under Jython :(

I was hoping Jython would implement an sqlite3 module which we
(and any other Python library) could just use - there seems to be
no progress on that: http://bugs.jython.org/issue1682864

Likewise the MySQLdb and PostgreSQL modules. Failing a port
allowing our current code to "just work", someone could write
alternative code for Biopython to all an appropriate Java DB
interface directly. For our BioSQL we already have a structure
to cope with a range of backends, so this should be quite clean.

In the case of Bio.SeqIO.index_db, we probably only use a fraction
of the full sqlite3 module's capabilities, so special casing this
under Jython to call JavaDB might not be too complicated...
(for anyone who knows their way round Jython and JavaDB)?
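
To give a feel for how small that fraction could be, here is a minimal
sketch of an offset index using only a cursor, parameterised SQL and a
single table. The table name, columns and the helper names build_index
and lookup are made up for illustration; this is not the actual schema
or code used by Bio.SeqIO.index_db. It sticks to plain DB-API calls on
the assumption that a Jython port would get its connection from some
JDBC-backed DB-API layer instead of sqlite3.

```python
import sqlite3


def build_index(conn, records):
    """Store (key, offset, length) rows; the schema is illustrative only."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS offset_data "
        "(key TEXT PRIMARY KEY, offset INTEGER, length INTEGER)"
    )
    cur.executemany(
        "INSERT INTO offset_data (key, offset, length) VALUES (?, ?, ?)",
        records,
    )
    conn.commit()


def lookup(conn, key):
    """Return (offset, length) for a key, or raise KeyError if absent."""
    cur = conn.cursor()
    cur.execute("SELECT offset, length FROM offset_data WHERE key = ?", (key,))
    row = cur.fetchone()
    if row is None:
        raise KeyError(key)
    return row


# Under C Python the connection comes from sqlite3; the two helpers above
# only rely on cursor(), execute(), executemany(), fetchone() and commit().
conn = sqlite3.connect(":memory:")
build_index(conn, [("seq1", 0, 120), ("seq2", 120, 98)])
print(lookup(conn, "seq2"))
```

Keeping the SQL this plain is the design choice that matters here: the
fewer sqlite3-specific features the index relies on, the less there is
to special-case for another backend.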

If you fancy exploring SQLite3 under Jython, go for it :)

Peter


From p.j.a.cock at googlemail.com  Sun Nov  4 16:48:56 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 4 Nov 2012 21:48:56 +0000
Subject: [Biopython-dev] Dependency policy;
	was PEP8 lower case module names?
In-Reply-To: <CAA9RGEPwo2-az4qoL-XjAbh=cj42YABAN6QyibcWZoQWbpMk5w@mail.gmail.com>
References: <CAKVJ-_5BSJvdD=oADYMZBzHAr3i6PK9u+dXuk3TLMdJVKHkEMw@mail.gmail.com>
	<CAMC681k3Bweg6_KcCJLtLHn16ZO7Y-cGzPbtaKJtez3EN1qh8Q@mail.gmail.com>
	<CAKVJ-_7o1cnM2US5ZRB4C=bJ_TXoONutCEs1d7ehdpm_W0aX6w@mail.gmail.com>
	<CAA9RGEPwo2-az4qoL-XjAbh=cj42YABAN6QyibcWZoQWbpMk5w@mail.gmail.com>
Message-ID: <CAKVJ-_6kFvc7BOHJztBkAoQp3AMTkisLsG3OHpob3k8EmqGA=g@mail.gmail.com>

On Sun, Nov 4, 2012 at 9:00 PM, Tiago Antão wrote:
> On Sun, Nov 4, 2012 at 8:49 PM, Peter Cock wrote:
>>
>> Jython has an open bug on including the sqlite3 module,
>>
>
> This will go nowhere fast as it will be dependent on a JNI library (i.e.
> linkage of C code).
> The only durable option in the Java space would be a native implementation
> of sqlite3.
> All other options are not of the "embeddable" type (e.g. JDBC driver to
> something running outside), defeating the main purpose of sqlite3.

Let's continue this on the new thread:
http://lists.open-bio.org/pipermail/biopython-dev/2012-November/010072.html

Peter


From redmine at redmine.open-bio.org  Sun Nov  4 17:47:21 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Sun, 4 Nov 2012 22:47:21 +0000
Subject: [Biopython-dev] [Biopython - Bug #3392] (New) unable to download
	almost any documentation - the download links are invalid
Message-ID: <redmine.issue-3392.20121104224721@redmine.open-bio.org>


Issue #3392 has been reported by Brad Zoltick.

----------------------------------------
Bug #3392: unable to download almost any documentation - the download links are invalid
https://redmine.open-bio.org/issues/3392

Author: Brad Zoltick
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Documentation
Target version: Not Applicable
URL: 


People probably are not aware of this problem. When you try to download the biopython documentation, you get the following response:

Forbidden

You don't have permission to access /DIST/docs/tutorial/Tutorial.pdf on this server.

Apache/2.2.23 (Amazon) Server at biopython.org Port 80


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org



From redmine at redmine.open-bio.org  Sun Nov  4 17:47:23 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Sun, 4 Nov 2012 22:47:23 +0000
Subject: [Biopython-dev] [Biopython - Bug #3393] (New) unable to download
	almost any documentation - the download links are invalid
Message-ID: <redmine.issue-3393.20121104224722@redmine.open-bio.org>


Issue #3393 has been reported by Brad Zoltick.

----------------------------------------
Bug #3393: unable to download almost any documentation - the download links are invalid
https://redmine.open-bio.org/issues/3393

Author: Brad Zoltick
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Documentation
Target version: Not Applicable
URL: 


People probably are not aware of this problem. When you try to download the biopython documentation, you get the following response:

Forbidden

You don't have permission to access /DIST/docs/tutorial/Tutorial.pdf on this server.

Apache/2.2.23 (Amazon) Server at biopython.org Port 80


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Sun Nov  4 19:06:10 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 5 Nov 2012 00:06:10 +0000
Subject: [Biopython-dev] [Biopython - Bug #3392] unable to download almost
	any documentation - the download links are invalid
References: <redmine.issue-3392.20121104224721@redmine.open-bio.org>
Message-ID: <redmine.journal-14994.20121105000610@redmine.open-bio.org>


Issue #3392 has been updated by Peter Cock.

Category changed from Documentation to Website
Priority changed from Normal to Urgent

Yep, we know about it - but thanks for letting us know just in case:
http://lists.open-bio.org/pipermail/biopython-dev/2012-November/010069.html

The same issue affects our release downloads too, which is more annoying. It's a side effect of the server migration from a dying machine to a virtual machine on the Amazon Cloud,
http://lists.open-bio.org/pipermail/biopython/2012-November/008248.html

Leaving this bug open until the new server is fixed...
----------------------------------------
Bug #3392: unable to download almost any documentation - the download links are invalid
https://redmine.open-bio.org/issues/3392


From p.j.a.cock at googlemail.com  Mon Nov  5 18:07:09 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 5 Nov 2012 23:07:09 +0000
Subject: [Biopython-dev] OBF server outage announcement / call for
	SysAdmin volunteers
In-Reply-To: <CAKVJ-_7_bnatVHfsKrNOoqjU+D8qbf05En9aFbi-4xXEpFFpSA@mail.gmail.com>
References: <CAKVJ-_56TAQR4ULW=tviSrzYvjRaJBmoLdWFDT9UG3LqeM2EJA@mail.gmail.com>
	<CAKVJ-_4xgdurC8y54R7LFPPSvEqdTrY9gNnv7kmNGnaFpDmPCA@mail.gmail.com>
	<CAKVJ-_7_bnatVHfsKrNOoqjU+D8qbf05En9aFbi-4xXEpFFpSA@mail.gmail.com>
Message-ID: <CAKVJ-_6_6DYmm350QvanWNdA7hZyVqiJW0p6w8J1eww8D1xumQ@mail.gmail.com>

On Thu, Nov 1, 2012 at 7:50 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Nov 1, 2012 at 7:40 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> FYI regarding the Biopython website and recent mailing list outage.
>>
>> Peter
>>
>> PS you can also keep an eye on @Biopython and @OBF_news on Twitter,
>> which are a useful alternative when the mailing lists are down.
>>
>> <snip>
>
> I should have added that while the wiki is down (which does
> unfortunately include the Biopython home page), the Biopython
> downloads remain available via http://biopython.org/DIST/ and
> other 'static' content like the Tutorial and API pages are up:
>
> http://biopython.org/DIST/docs/tutorial/Tutorial.html
> http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
> http://biopython.org/DIST/docs/api/

Hosting of biopython.org (and the bioperl.org and open-bio.org
websites) was transferred to an Amazon cloud machine over
the weekend, which fixed the wiki but temporarily disabled the
static pages (like the Tutorial and downloads). Those should
all be working again now.

At some later date (to be announced) the server running the
OBF mailing lists will be transferred, which would make the
mailing lists unavailable for a short period.

Regards,

Peter

From redmine at redmine.open-bio.org  Mon Nov  5 18:13:43 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 5 Nov 2012 23:13:43 +0000
Subject: [Biopython-dev] [Biopython - Bug #3392] (Resolved) unable to
	download almost any documentation - the download links are invalid
References: <redmine.issue-3392.20121104224721@redmine.open-bio.org>
Message-ID: <redmine.journal-14995.20121105231343@redmine.open-bio.org>


Issue #3392 has been updated by Peter Cock.

Status changed from New to Resolved
% Done changed from 0 to 100

This should be working again now :)
----------------------------------------
Bug #3392: unable to download almost any documentation - the download links are invalid
https://redmine.open-bio.org/issues/3392


From kai.blin at biotech.uni-tuebingen.de  Mon Nov 19 09:11:42 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Mon, 19 Nov 2012 15:11:42 +0100
Subject: [Biopython-dev] SeqFeature.FeatureLocation.extract() silently fails
 when coordinates are outside of the parent_sequence.
Message-ID: <50AA3E1E.70407@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi folks,

I'm currently investigating an error caused by an invalid GenBank file
input that annotates CDS features with invalid coordinates. The
GenBank parser accepts these features, but later my program crashes.

It turns out the crash is because I'm calling the extract() method for
my seq features, which then returns an empty Seq object when the
coordinates lie outside the parent_sequence.

I have the feeling that raising an exception would be the best way of
dealing with this, but of course I can also check the result of
extract() to be different from an empty Seq object.

The line I'd like to throw a ValueError on out-of-bounds coordinates is
https://github.com/biopython/biopython/blob/master/Bio/SeqFeature.py#L811

What are your thoughts on this?

Cheers,
Kai

- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQqj4eAAoJEKM5lwBiwTTP7rsIANURFpsEtHOIgJ1z3r6nV3mQ
rI0Vo0fBh59beZA0NYi2rMez+TUFXf87Ih3b9LGIH4xaFsAwpXJrUjvbqC1tuqBv
KFg65psNCnDlp9Pc4DZQnaAS7ycoDrDiJStV387XWE6CA7dTiCkBUfKwuaf7S/om
m1je0XMJ6j6J5+Jn2qW/QMpf2G9e8lAkZyeNIQyYtGF+RbPkBPSxpZFTEn6KsymT
dOLoCQVhlf1R9X0S+nLBAh9Q29akf6/tkUcqdUg5ROoNqvqjudDWbz0JgoTgsf7n
j24rlTIpxktl3KKna6DtoX5ig4EKF5IOnQmo00JrWWL8Liy0oKTY/LRkF5CB85k=
=djFF
-----END PGP SIGNATURE-----

From p.j.a.cock at googlemail.com  Mon Nov 19 11:10:15 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 19 Nov 2012 16:10:15 +0000
Subject: [Biopython-dev] SeqFeature.FeatureLocation.extract() silently
 fails when coordinates are outside of the parent_sequence.
In-Reply-To: <50AA3E1E.70407@biotech.uni-tuebingen.de>
References: <50AA3E1E.70407@biotech.uni-tuebingen.de>
Message-ID: <CAKVJ-_5PcJ_GC=YbyG70+HSXrMoeqs8ZxUn3-wKU=uKqXKxm6w@mail.gmail.com>

On Mon, Nov 19, 2012 at 2:11 PM, Kai Blin
<kai.blin at biotech.uni-tuebingen.de> wrote:
> Hi folks,
>
> I'm currently investigating an error caused by an invalid GenBank file
> input that annotates CDS features with invalid coordinates. The
> GenBank parser accepts these features, but later my program crashes.

Perhaps we should have a parser error/warning at that point?
(as well as any fix to the extract method)

> It turns out the crash is because I'm calling the extract() method for
> my seq features, which then returns an empty Seq object when the
> coordinates lie outside the parent_sequence.
>
> I have the feeling that raising an exception would be the best way of
> dealing with this, but of course I can also check the result of
> extract() to be different from an empty Seq object.
>
> The line I'd like to throw a ValueError on out-of-bounds coordinates is
> https://github.com/biopython/biopython/blob/master/Bio/SeqFeature.py#L811
>
> What are your thoughts on this?

Some might find this surprising given the (initially rather odd)
Python slicing behaviour with out-of-range coordinates (which
indirectly causes the behaviour observed here):

>>> "hello"[100:200]
''

i.e. Slicing a string outside its bounds gives an empty string.

On balance you're probably right that an error in this situation
makes more sense (a discrepancy between feature location
and the given parent sequence not being long enough).
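
As a rough illustration of the difference, the sketch below contrasts
the silent slice with an explicit bounds check. The function name
extract_checked and its arguments are hypothetical and work on plain
strings; this is not the actual SeqFeature code or the eventual patch.

```python
def extract_checked(parent_sequence, start, end):
    """Return parent_sequence[start:end], or raise ValueError if out of range."""
    if start < 0 or start > end or end > len(parent_sequence):
        raise ValueError(
            "location [%i:%i] is outside parent sequence of length %i"
            % (start, end, len(parent_sequence))
        )
    return parent_sequence[start:end]


# Plain slicing silently returns an empty result for out-of-range bounds.
print(repr("hello"[100:200]))

# The checked version behaves the same for valid coordinates...
print(extract_checked("hello", 1, 4))

# ...but complains loudly when the location does not fit the parent.
try:
    extract_checked("hello", 100, 200)
except ValueError as err:
    print("ValueError: %s" % err)
```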

Peter

From p.j.a.cock at googlemail.com  Mon Nov 19 11:32:11 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 19 Nov 2012 16:32:11 +0000
Subject: [Biopython-dev] SeqFeature.FeatureLocation.extract() silently
 fails when coordinates are outside of the parent_sequence.
In-Reply-To: <8045681f-e3ca-470c-973d-89b5fcc6d259@email.android.com>
References: <50AA3E1E.70407@biotech.uni-tuebingen.de>
	<CAKVJ-_5PcJ_GC=YbyG70+HSXrMoeqs8ZxUn3-wKU=uKqXKxm6w@mail.gmail.com>
	<8045681f-e3ca-470c-973d-89b5fcc6d259@email.android.com>
Message-ID: <CAKVJ-_56RjCYF=bq3Jq_xCnWuEaD-_kEAC66CQV8Fy-9Lai2xw@mail.gmail.com>

On Mon, Nov 19, 2012 at 4:25 PM, Kai Blin
<kai.blin at biotech.uni-tuebingen.de> wrote:
> Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
>>> GenBank parser accepts these features, but later my program crashes.
>>
>>Perhaps we should have a parser error/warning at that point?
>>(as well as any fix to the extract method)
>
> Probably a bit tricky because the GenBank file might not contain a
> sequence at all, and we can't tell until we either see the sequence or
> an end of record marker.

The first line should tell you the length, and we already have
a warning in place for naughty GenBank files where the actual
sequence has a different length. Those could be a problem for
this new warning, as you'd only know the expected sequence
length from the header while parsing the features.
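
A minimal sketch of that idea, assuming the length can be read from a
LOCUS line in the usual layout: take the declared length from the header
and warn when a feature ends beyond it. The regular expression and the
helper names declared_length and check_feature are assumptions made for
this example, not Biopython's GenBank scanner.

```python
import re
import warnings

# Assumed layout: "LOCUS", a name, then the length followed by "bp" or "aa".
LOCUS_LENGTH = re.compile(r"^LOCUS\s+\S+\s+(\d+)\s+(?:bp|aa)\b")


def declared_length(locus_line):
    """Return the sequence length given on a LOCUS line, or None if absent."""
    match = LOCUS_LENGTH.match(locus_line)
    if match is None:
        return None
    return int(match.group(1))


def check_feature(end, expected_length):
    """Warn and return False if a feature ends beyond the declared length."""
    if expected_length is not None and end > expected_length:
        warnings.warn(
            "feature end %i exceeds declared length %i" % (end, expected_length)
        )
        return False
    return True


header = "LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999"
length = declared_length(header)
print(length)
print(check_feature(687, length))
print(check_feature(6000, length))
```

Since the declared length can itself disagree with the actual sequence,
a warning rather than an exception seems the safer choice at this stage.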

>>> I have the feeling that raising an exception would be the best way
>>> of dealing with this, but of course I can also check the result
>>> of extract() to be different from an empty Seq object.
>>>
>>> The line I'd like to throw a ValueError on out-of-bounds coordinates
>>> is
>>>
>>> https://github.com/biopython/biopython/blob/master/Bio/SeqFeature.py#L811
>>>
>>> What are your thoughts on this?
>>
>>Some might find this surprising given the (initially rather odd)
>>Python slicing behaviour with out-of-range coordinates (which
>>indirectly causes the behaviour observed here):
>>
>>>>> "hello"[100:200]
>>''
>>
>>i.e. Slicing a string outside its bounds gives an empty string.
>
> Yes, that is why we end up with an empty Seq object.
>
>>On balance you're probably right that an error in this situation
>>makes more sense (a discrepancy between feature location
>>and the given parent sequence not being long enough).
>
> Yes. The way I understand the intention of the parent sequence,
> the whole point is that the feature should be located on it.
>
> I'll gladly prepare a patch (and some tests).
> Cheers,
>  Kai

OK.

Peter

From redmine at redmine.open-bio.org  Tue Nov 20 08:41:47 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 20 Nov 2012 13:41:47 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] (New) Biopython trie
	implementation can't load large data sets
Message-ID: <redmine.issue-3395.20121120134147@redmine.open-bio.org>


Issue #3395 has been reported by Michał Nowotka.

----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395

Author: Michał Nowotka
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


Imagine I have a Biopython trie:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'w')
tr = trie.trie()
# fill in the trie
trie.save(f, tr)  # pass the trie object, not the trie module
f.close()

Now /tmp/trie.dat.gz is about 50MB. Let's try to read it:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'r')
tr = trie.load(f)

Unfortunately I'm getting an uninformative error saying:
"loading failed for some reason"

Any hints?


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Tue Nov 20 09:02:01 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 20 Nov 2012 14:02:01 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15009.20121120140201@redmine.open-bio.org>


Issue #3395 has been updated by Peter Cock.


Can you try the same test case without gzip? i.e. Can you load /tmp/trie.dat rather than /tmp/trie.dat.gz?

Also I would try explicitly opening the files in binary mode.

P.S. Which OS, which version of Python, which version of Biopython?
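
For reference, a binary-mode round trip through gzip might look like the
sketch below. It uses a stand-in byte string and a temporary file rather
than a real trie, so it only demonstrates the file handling, not the
Bio.trie save/load calls themselves.

```python
import gzip
import os
import tempfile

# Stand-in payload containing bytes that text-mode handling could mangle.
payload = b"\x00\x01binary\r\npayload\xff"

path = os.path.join(tempfile.mkdtemp(), "example.dat.gz")

# Write with an explicit binary mode...
with gzip.open(path, "wb") as handle:
    handle.write(payload)

# ...and read back with an explicit binary mode as well.
with gzip.open(path, "rb") as handle:
    restored = handle.read()

print(restored == payload)
```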
----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395


From redmine at redmine.open-bio.org  Tue Nov 20 09:18:46 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 20 Nov 2012 14:18:46 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15010.20121120141846@redmine.open-bio.org>


Issue #3395 has been updated by Michał Nowotka.


Sure, I'll update this issue as soon as I check that.
----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395


From redmine at redmine.open-bio.org  Tue Nov 20 11:31:13 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 20 Nov 2012 16:31:13 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15011.20121120163113@redmine.open-bio.org>


Issue #3395 has been updated by Michał Nowotka.


OK, I tried using a standard Python file handle with explicit binary mode and it also failed. The file is now 165.5MB.
I also tried bz2 and zip compression, without any luck...
----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395


From redmine at redmine.open-bio.org  Tue Nov 20 12:02:48 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 20 Nov 2012 17:02:48 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15012.20121120170248@redmine.open-bio.org>


Issue #3395 has been updated by Peter Cock.


Well that is progress - it means this isn't a problem coming from reading a compressed file on disk - you've made the test case simpler. Can you actually share a self contained example script? If not, I suggest you try halving the dataset (only record the first half of the trie entries), and retest. Then repeat - this should tell you if the problem is, as you suspect, the size of the dataset, or something specific about a special value.
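
The halving can be automated along these lines. This is only a sketch:
the fails callback is a hypothetical stand-in for "build a trie from
these entries, save it, and see whether loading it raises", and it
assumes that once a prefix of the data fails, every longer prefix fails
too.

```python
def smallest_failing_prefix(items, fails):
    """Return the length of the shortest prefix of items for which fails() is true.

    Returns None if even the full list does not fail. Assumes the empty
    prefix passes and that failures are monotonic in the prefix length.
    """
    if not fails(items):
        return None
    low, high = 0, len(items)  # items[:low] passes, items[:high] fails
    while high - low > 1:
        mid = (low + high) // 2
        if fails(items[:mid]):
            high = mid
        else:
            low = mid
    return high


# Toy example: the failure is triggered by one special entry.
entries = ["alpha", "beta", "bad", "gamma", "delta"]
n = smallest_failing_prefix(entries, lambda subset: "bad" in subset)
print(n, entries[n - 1])
```

If the entry at the boundary looks unremarkable and the boundary moves
when the entries are reordered, that points at the overall size rather
than at a special value.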

Alternatively can you share the (compressed) file? I could at least check if it fails the same way here, and perhaps add some debugging code to get more information.

The error message itself is coming from some C code, which hasn't changed for some time:
https://github.com/biopython/biopython/blob/master/Bio/triemodule.c

The error itself is likely triggered in function _deserialize_transition in trie.c:
https://github.com/biopython/biopython/blob/master/Bio/trie.c

You still haven't told us the important information of which OS, which version of Python, which version of Biopython. Given it is C code, I'd also like to know how Biopython was installed (e.g. did you compile it from source yourself).
----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395


From redmine at redmine.open-bio.org  Tue Nov 20 12:14:21 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 20 Nov 2012 17:14:21 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15013.20121120171421@redmine.open-bio.org>


Issue #3395 has been updated by Michał Nowotka.


I'm using Ubuntu 12.04 LTS, Biopython 1.6 and Python 2.7.3.
Can you tell me where I should put the compressed file?

----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395


From redmine at redmine.open-bio.org  Tue Nov 20 12:21:58 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 20 Nov 2012 17:21:58 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15014.20121120172158@redmine.open-bio.org>


Issue #3395 has been updated by Peter Cock.


Sadly RedMine is limited to 5MB attachments. You could use DropBox or something similar, or if you have your own server put the file online temporarily for me to download it?

You probably have Biopython 1.60 (one dot sixty), there was no Biopython 1.6, one dot six. Did you install Biopython using the Ubuntu package manager? i.e. the GUI tool, or at the command line with something like 'apt-get install biopython'?
----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395


From redmine at redmine.open-bio.org  Tue Nov 20 12:43:21 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 20 Nov 2012 17:43:21 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15015.20121120174321@redmine.open-bio.org>


Issue #3395 has been updated by Michał Nowotka.


I put the file here: http://mnowotka.kei.pl/trie.4.dat.gz


From redmine at redmine.open-bio.org  Tue Nov 20 12:56:47 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 20 Nov 2012 17:56:47 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15016.20121120175647@redmine.open-bio.org>


Issue #3395 has been updated by Michał Nowotka.


I confirm, it's version 1.60 that I'm using. I installed it with either apt-get install or pip.


From p.j.a.cock at googlemail.com  Mon Nov 26 08:29:58 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 26 Nov 2012 13:29:58 +0000
Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names?
Message-ID: <CAKVJ-_7NpV0aveBJfut1Mn5xh=gxzJ2qPFr3UnCBRt589SwQVA@mail.gmail.com>

On Thu, Nov 1, 2012 at 6:10 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> On Tue, Oct 30, 2012 at 7:03 AM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>>
>> On Mon, Oct 29, 2012 at 5:54 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>> >
>> > Peter;
>> >
>> >> In the case of Bow's SearchIO code, what would you prefer?
>> >> e.g. Bio.SearchIO as it is now on his branch?
>> >
>> > I like plain ol' Search the best but don't have a strong preference. I'm
>> > terrible at naming things so trust everyone's judgment on this.
>> >
>> > Brad
>>
>> Since we have no clear consensus, I propose we add Bow's code
>> as Bio.SearchIO (which is how it is written right now), with the new
>> BiopythonExperimentalWarning in place (to alert people that it may
>> change in the next release). We can then rename or move it at a
>> later date. This will make it easier for people to test the code, and
>> also suggest further changes or additions (e.g. Kai's HMMER work).
>>
>> If and when we agree on a consolidation of the Bio.SeqXXX
>> modules, then Bio.SearchIO could move too. If this happens
>> before any public release as Bio.SearchIO so much the better.
>>
>> Adopting lower case module names under Python 3 is also a
>> separate issue.
>>
>> Peter
>>
>
> +1
>
> Regarding ...

I plan to do the commit today, barring any last minute objections.

I am leaning towards a merge from Bow's original (un-rebased) branch,
which had only three trivial conflicts to handle.

Peter

From w.arindrarto at gmail.com  Mon Nov 26 08:38:23 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Mon, 26 Nov 2012 14:38:23 +0100
Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names?
In-Reply-To: <CAKVJ-_7NpV0aveBJfut1Mn5xh=gxzJ2qPFr3UnCBRt589SwQVA@mail.gmail.com>
References: <CAKVJ-_7NpV0aveBJfut1Mn5xh=gxzJ2qPFr3UnCBRt589SwQVA@mail.gmail.com>
Message-ID: <CADEGkF6LkJT2WrDrKSevozSi=FYL_iZcDMMx3QdnBE5FLM=szw@mail.gmail.com>

Hi Peter and everyone,

If it helps, I've done the rebase (also resolving the three conflicts)
with the latest master branch. On top of it, I've also added the new
BiopythonExperimentalWarning in Bio/SearchIO/__init__.py. It's
available here: https://github.com/bow/biopython/tree/searchio.
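
For readers unfamiliar with the pattern, here is a minimal sketch of how an
experimental-warning category can be raised by a package and silenced by a
user who accepts the risk. It only uses the standard warnings module;
ExperimentalWarning and import_experimental_module are hypothetical stand-ins
for illustration, not the actual Biopython code.

```python
import warnings


class ExperimentalWarning(Warning):
    """Hypothetical stand-in for a project-specific experimental warning."""


def import_experimental_module():
    """Mimic what an experimental package's __init__.py might do on import."""
    warnings.warn("This module is experimental and its API may change.",
                  ExperimentalWarning)


# By default the warning is reported; record it here so we can inspect it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    import_experimental_module()

# A user who accepts the risk can ignore just this one category.
with warnings.catch_warnings(record=True) as silenced:
    warnings.simplefilter("ignore", ExperimentalWarning)
    import_experimental_module()
```

Using a dedicated category means users can filter the experimental notice
without hiding other warnings.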

However if you're interested in inspecting the non-rebased branch,
I've also kept it here:
https://github.com/bow/biopython/tree/searchio-nonrebased. Note that
this one doesn't have the new experimental warning since it's a
feature added more recently.

Also, in both branches, the tutorial has been changed with the
addition of the (draft) Bio.SearchIO tutorial.

Let me know which one you prefer and I'll submit a pull request :).

cheers,
Bow

On Mon, Nov 26, 2012 at 2:29 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Nov 1, 2012 at 6:10 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
>> On Tue, Oct 30, 2012 at 7:03 AM, Peter Cock <p.j.a.cock at googlemail.com>
>> wrote:
>>>
>>> On Mon, Oct 29, 2012 at 5:54 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>>> >
>>> > Peter;
>>> >
>>> >> In the case of Bow's SearchIO code, what would you prefer?
>>> >> e.g. Bio.SearchIO as it is now on his branch?
>>> >
>>> > I like plain ol' Search the best but don't have a strong preference. I'm
>>> > terrible at naming things so trust everyone's judgment on this.
>>> >
>>> > Brad
>>>
>>> Since we have no clear consensus, I propose we add Bow's code
>>> as Bio.SearchIO (which is how it is written right now), with the new
>>> BiopythonExperimentalWarning in place (to alert people that it may
>>> change in the next release). We can then rename or move it at a
>>> later date. This will make it easier for people to test the code, and
>>> also suggest further changes or additions (e.g. Kai's HMMER work).
>>>
>>> If and when we agree on a consolidation of the Bio.SeqXXX
>>> modules, then Bio.SearchIO could move too. If this happens
>>> before any public release as Bio.SearchIO so much the better.
>>>
>>> Adopting lower case module names under Python 3 is also a
>>> separate issue.
>>>
>>> Peter
>>>
>>
>> +1
>>
>> Regarding ...
>
> I plan to do the commit today, barring any last minute objections.
>
> I am leaning towards a merge from Bow's original (un-rebased) branch,
> which had only three trivial conflicts to handle.
>
> Peter

From p.j.a.cock at googlemail.com  Mon Nov 26 08:49:44 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 26 Nov 2012 13:49:44 +0000
Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names?
In-Reply-To: <CADEGkF6LkJT2WrDrKSevozSi=FYL_iZcDMMx3QdnBE5FLM=szw@mail.gmail.com>
References: <CAKVJ-_7NpV0aveBJfut1Mn5xh=gxzJ2qPFr3UnCBRt589SwQVA@mail.gmail.com>
	<CADEGkF6LkJT2WrDrKSevozSi=FYL_iZcDMMx3QdnBE5FLM=szw@mail.gmail.com>
Message-ID: <CAKVJ-_6zdSOq9JFtvKNx7NynBjX-m1p6iw4Pfd02LcWsdp+tig@mail.gmail.com>

On Mon, Nov 26, 2012 at 1:38 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Peter and everyone,
>
> If it helps, I've done the rebase (also resolving the three conflicts)
> with the latest master branch. On top of it, I've also added the new
> BiopythonExperimentalWarning in Bio/SearchIO/__init__.py. It's
> available here: https://github.com/bow/biopython/tree/searchio.
>
> However if you're interested in inspecting the non-rebased branch,
> I've also kept it here:
> https://github.com/bow/biopython/tree/searchio-nonrebased. Note that
> this one doesn't have the new experimental warning since it's a
> feature added more recently.
>
> Also, in both branches, the tutorial has been changed with the
> addition of the (draft) Bio.SearchIO tutorial.
>
> Let me know which one you prefer and I'll submit a pull request :).
>
> cheers,
> Bow

That's fine - I found both branches :)

I've actually done a trial merge on the non-rebased one and
then cherry-picked the experimental warning - looks good.

Once that's done there is some housekeeping to do, like
the indexing code duplication with Bio.SeqIO, and tackling
indexing BGZF compressed files with Bio.SearchIO which
I will have a go at.

Peter

P.S. I had intended to do this earlier this month, but we
had the OBF server issues to deal with.

From w.arindrarto at gmail.com  Mon Nov 26 09:06:03 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Mon, 26 Nov 2012 15:06:03 +0100
Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names?
In-Reply-To: <CAKVJ-_6zdSOq9JFtvKNx7NynBjX-m1p6iw4Pfd02LcWsdp+tig@mail.gmail.com>
References: <CAKVJ-_7NpV0aveBJfut1Mn5xh=gxzJ2qPFr3UnCBRt589SwQVA@mail.gmail.com>
	<CADEGkF6LkJT2WrDrKSevozSi=FYL_iZcDMMx3QdnBE5FLM=szw@mail.gmail.com>
	<CAKVJ-_6zdSOq9JFtvKNx7NynBjX-m1p6iw4Pfd02LcWsdp+tig@mail.gmail.com>
Message-ID: <CADEGkF7GerjkkuN1FamM8MNVp+h=-10dDeY9d4UkZ6ise4t9+Q@mail.gmail.com>

> That's fine - I found both branches :)
>
> I've actually done a trial merge on the non-rebased one and
> then cherry-picked the experimental warning - looks good.

Ah, good then :).

> Once that's done there is some housekeeping to do, like
> the indexing code duplication with Bio.SeqIO, and tackling
> indexing BGZF compressed files with Bio.SearchIO which
> I will have a go at.

Yes. I'm pretty sure there will also be changes we need to implement
after more feedback from users.

> P.S. I had intended to do this earlier this month, but we
> had the OBF server issues to deal with.

That's ok, I also noticed that it's only quite recently that the
commits have become frequent again.

From mauriceling at gmail.com  Mon Nov 26 09:48:24 2012
From: mauriceling at gmail.com (Maurice Ling)
Date: Mon, 26 Nov 2012 08:48:24 -0600
Subject: [Biopython-dev] Error in Bio.Entrez.__init__
Message-ID: <CAFO915G3+T-H9aGJr=XPg-2DZTN-nbaPKJipr6Gg_ev5Usft5g@mail.gmail.com>

Hi

I am getting an error running this:

from Bio import Entrez
from Bio import Medline
handle = Entrez.efetch(db="pubmed", id=[19300000], rettype="medline",
retmode="text")

The traceback is

Traceback (most recent call last):
  File "C:\Users\Maurice.Ling\Desktop\muscorian\archive\pubmed_dump.py",
line 16, in <module>
    retmode="text")
  File "C:\Python27\lib\site-packages\Bio\Entrez\__init__.py", line 133, in
efetch
    keywords["id"] = ",".join(keywds["id"])
TypeError: sequence item 0: expected string, int found

When I changed line 133 of Bio.Entrez.__init__ from

keywords["id"] = ",".join(keywds["id"])

to

keywords["id"] = ",".join(str(keywds["id"]))

The error disappeared.
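
A caution on that change, as a small sketch: calling str() on the whole list
and then joining it splits the list's text form into single characters, so
the TypeError goes away but the resulting ID string is not what was intended.
Converting each item separately avoids this. The helper name join_ids is
hypothetical (it is not part of Bio.Entrez), and the code assumes Python 3.

```python
def join_ids(ids):
    """Return a comma-separated ID string from one ID or a list of IDs."""
    if isinstance(ids, (str, int)):
        # A single identifier: just make sure it is a string.
        return str(ids)
    # A list or tuple: convert every item, then join.
    return ",".join(str(item) for item in ids)


# What the one-line change above actually produces for a one-item list.
naive = ",".join(str([19300000]))

# What converting each item produces.
fixed = join_ids([19300000])
```

Here naive ends up as the characters of "[19300000]" separated by commas,
while fixed is the plain identifier string.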

Maurice LING
mobile: +1(605)5920300, +6596669233
www: http://maurice.vodien.com
CV: http://maurice.vodien.com/maurice_resume.pdf
Linkedin: http://www.linkedin.com/in/mauriceling
ResearchGate: https://www.researchgate.net/profile/Maurice_HT_Ling

From p.j.a.cock at googlemail.com  Mon Nov 26 09:57:28 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 26 Nov 2012 14:57:28 +0000
Subject: [Biopython-dev] Error in Bio.Entrez.__init__
In-Reply-To: <CAFO915G3+T-H9aGJr=XPg-2DZTN-nbaPKJipr6Gg_ev5Usft5g@mail.gmail.com>
References: <CAFO915G3+T-H9aGJr=XPg-2DZTN-nbaPKJipr6Gg_ev5Usft5g@mail.gmail.com>
Message-ID: <CAKVJ-_4Y3rvgQP8C9tgykekhc4Jsi=cVMZwxbm1YS4YOn7754g@mail.gmail.com>

On Mon, Nov 26, 2012 at 2:48 PM, Maurice Ling <mauriceling at gmail.com> wrote:
> Hi
>
> I am getting an error running this:
>
> from Bio import Entrez
> from Bio import Medline
> handle = Entrez.efetch(db="pubmed", id=[19300000], rettype="medline",
> retmode="text")
>

I would have used this:

Entrez.efetch(db="pubmed", id=["19300000"], rettype="medline", retmode="text")

In general the NCBI identifiers are arbitrary strings, although
perhaps the pubmed identifiers could be treated as integers.
This is perhaps worth changing in the Bio.Entrez code...

What do you think, Michiel?

Peter

From mauriceling at gmail.com  Mon Nov 26 10:23:31 2012
From: mauriceling at gmail.com (Maurice Ling)
Date: Mon, 26 Nov 2012 09:23:31 -0600
Subject: [Biopython-dev] Strange behaviour in efetching Pubmed citations
Message-ID: <CAFO915HPaokqncstUcVn6WDARBMx-M1m1ni19wEM=W6mB3DCPQ@mail.gmail.com>

Hi

I found something strange in my download script to pull a list of pubmed
citations. This was working in the past (back in 2008 period)...

The script is

ID_start = 19000000
ID_stop = 19000010
downtime = 1.2

from Bio import Entrez
from Bio import Medline
import string
import time
import cPickle

Entrez.email = 'maurice.ling at sdstate.edu'

while (ID_start < ID_stop):
    try:
        handle = Entrez.efetch(db="pubmed", id=[str(ID_start)],
                               rettype="medline", retmode="text")
        records = list(Medline.parse(handle))[0]
        print records
        cPickle.dump(records, open(str(ID_start) + '.txt', 'w'), -1)
        ID_start = ID_start + 1
        time.sleep(downtime)
        print 'ID count: ', str(ID_start)
    except:
        print 'ID count: error ', str(ID_start)
        ID_start = ID_start + 1

But the results from print records kept showing the same thing:

{'STAT': 'MEDLINE', 'IP': '2', 'JT': 'Biochemical medicine', 'DA':
'19760116', 'FAU': ['Makar, A B', 'McMartin, K E', 'Palese, M', 'Tephly, T
R'], 'DP': '1975 Jun', 'OWN': 'NLM', 'PT': ['Journal Article', "Research
Support, U.S. Gov't, P.H.S."], 'LA': ['eng'], 'CRDT': ['1975/06/01 00:00'],
'DCOM': '19760116', 'LR': '20091111', 'PG': '117-26', 'TI': 'Formate assay
in body fluids: application in methanol poisoning.', 'RN': ['0 (Formates)',
'124-38-9 (Carbon Dioxide)', '67-56-1 (Methanol)', 'EC 1.2.- (Aldehyde
Oxidoreductases)'], 'PL': 'UNITED STATES', 'TA': 'Biochem Med', 'JID':
'0151424', 'VI': '13', 'IS': '0006-2944 (Print) 0006-2944 (Linking)', 'AU':
['Makar AB', 'McMartin KE', 'Palese M', 'Tephly TR'], 'MHDA': '1975/06/01
00:01', 'MH': ['Aldehyde Oxidoreductases/metabolism', 'Animals', 'Body
Fluids/*analysis', 'Carbon Dioxide/blood', 'Formates/blood/*poisoning',
'Haplorhini', 'Humans', 'Hydrogen-Ion Concentration', 'Kinetics',
'Methanol/blood', 'Methods', 'Pseudomonas/enzymology'], 'EDAT':
'1975/06/01', 'SO': 'Biochem Med. 1975 Jun;13(2):117-26.', 'SB': 'IM',
'PMID': '1', 'PST': 'ppublish'}

It seems to keep efetching PMID 1 (http://www.ncbi.nlm.nih.gov/pubmed/1)

Any idea?

Thanks in advance.

Maurice LING
mobile: +1(605)5920300, +6596669233
www: http://maurice.vodien.com
CV: http://maurice.vodien.com/maurice_resume.pdf
Linkedin: http://www.linkedin.com/in/mauriceling
ResearchGate: https://www.researchgate.net/profile/Maurice_HT_Ling

From p.j.a.cock at googlemail.com  Mon Nov 26 10:36:13 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 26 Nov 2012 15:36:13 +0000
Subject: [Biopython-dev] Strange behaviour in efetching Pubmed citations
In-Reply-To: <CAFO915HPaokqncstUcVn6WDARBMx-M1m1ni19wEM=W6mB3DCPQ@mail.gmail.com>
References: <CAFO915HPaokqncstUcVn6WDARBMx-M1m1ni19wEM=W6mB3DCPQ@mail.gmail.com>
Message-ID: <CAKVJ-_7SFdCN+hsEGEQ6b3tXGx8bDomEiRVvQDevspMVB-xmOw@mail.gmail.com>

On Mon, Nov 26, 2012 at 3:23 PM, Maurice Ling <mauriceling at gmail.com> wrote:
> Hi
>
> I found something strange in my download script to pull a list of pubmed
> citations. This was working in the past (back in 2008 period)...
>
> The script is
>
> ID_start = 19000000
> ID_stop = 19000010
> downtime = 1.2
>
> from Bio import Entrez
> from Bio import Medline
> import string
> import time
> import cPickle
>
> Entrez.email = 'maurice.ling at sdstate.edu'
>
> while (ID_start < ID_stop):
>     try:
>         handle = Entrez.efetch(db="pubmed", id=[str(ID_start)],
> rettype="medline",
>                            retmode="text")
>         records = list(Medline.parse(handle))[0]
>         print records
>         cPickle.dump(records, open(str(ID_start) + '.txt', 'w'), -1)
>         ID_start = ID_start + 1
>         time.sleep(downtime)
>         print 'ID count: ', str(ID_start)
>     except:
>         print 'ID count: error ', str(ID_start)
>         ID_start = ID_start + 1

Are you sure you didn't run something slightly different? The
simplest possibility would be a line accidentally setting
ID_start to equal 1, rather than increasing it.

Also, using a for loop would be much cleaner (with the identifiers
as either integers or as strings). For instance,

for identifier in range(19000000, 19000010):
   #Do stuff

Note you have a discrepancy with ID_stop vs ID_end

This seems to work for me:

ID_start = 19000000
ID_stop = 19000010
downtime = 1.2
from Bio import Entrez
from Bio import Medline
import string
import time
import cPickle
Entrez.email = 'maurice.ling at sdstate.edu'
for identifier in range(ID_start, ID_stop):
    identifier = str(identifier)
    try:
        handle = Entrez.efetch(db="pubmed", id=identifier,
                               rettype="medline", retmode="text")
        records = list(Medline.parse(handle))[0]
        print records
        cPickle.dump(records, open('%s.txt' % identifier, 'w'), -1)
    except Exception, error:
        print "Error for %s - %s" % (identifier, error)

However, rather than parsing the Medline records and saving
the pickled object, I would save the plain text Medline data itself.
That way you can use the files outside of Python (e.g. working at
the Unix command line with grep).
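
A minimal sketch of that suggestion, assuming Python 3: write whatever text
the handle returns straight to disk instead of pickling a parsed record. The
helper name save_raw_record is hypothetical, and the Entrez call is shown
only as a comment because it needs network access.

```python
import io
import os


def save_raw_record(handle, identifier, out_dir="."):
    """Write the text from an open handle to <out_dir>/<identifier>.txt."""
    path = os.path.join(out_dir, "%s.txt" % identifier)
    with io.open(path, "w", encoding="utf-8") as out:
        out.write(handle.read())
    return path


# Intended use with Biopython (left as a comment, needs network access):
#   handle = Entrez.efetch(db="pubmed", id="19000000",
#                          rettype="medline", retmode="text")
#   save_raw_record(handle, "19000000")
#   handle.close()
```

The saved files can still be parsed later with Medline.parse if needed.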

Peter

From p.j.a.cock at googlemail.com  Mon Nov 26 11:08:28 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 26 Nov 2012 16:08:28 +0000
Subject: [Biopython-dev] Strange behaviour in efetching Pubmed citations
In-Reply-To: <CAFO915GmHKCRcupbxQAJK23TdbQsKJwwgp=LAcDntV3Ti2ummw@mail.gmail.com>
References: <CAFO915HPaokqncstUcVn6WDARBMx-M1m1ni19wEM=W6mB3DCPQ@mail.gmail.com>
	<CAKVJ-_7SFdCN+hsEGEQ6b3tXGx8bDomEiRVvQDevspMVB-xmOw@mail.gmail.com>
	<CAFO915GmHKCRcupbxQAJK23TdbQsKJwwgp=LAcDntV3Ti2ummw@mail.gmail.com>
Message-ID: <CAKVJ-_7WWtEAfmCGejhzg7Xxg99_8jY6G-erjc+8gEoU0_RSXQ@mail.gmail.com>

On Mon, Nov 26, 2012 at 3:42 PM, Maurice Ling <mauriceling at gmail.com> wrote:
> Thanks Peter
>
> Now, that seems to work... still scratching my uncaffeinated head though....
>

Great. I'm sure a coffee will help :)

Peter

P.S. Next time could you use the main list for usage queries, rather
than the development list, biopython-dev - thanks!

From p.j.a.cock at googlemail.com  Mon Nov 26 11:46:44 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 26 Nov 2012 16:46:44 +0000
Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names?
In-Reply-To: <CADEGkF7GerjkkuN1FamM8MNVp+h=-10dDeY9d4UkZ6ise4t9+Q@mail.gmail.com>
References: <CAKVJ-_7NpV0aveBJfut1Mn5xh=gxzJ2qPFr3UnCBRt589SwQVA@mail.gmail.com>
	<CADEGkF6LkJT2WrDrKSevozSi=FYL_iZcDMMx3QdnBE5FLM=szw@mail.gmail.com>
	<CAKVJ-_6zdSOq9JFtvKNx7NynBjX-m1p6iw4Pfd02LcWsdp+tig@mail.gmail.com>
	<CADEGkF7GerjkkuN1FamM8MNVp+h=-10dDeY9d4UkZ6ise4t9+Q@mail.gmail.com>
Message-ID: <CAKVJ-_7v1XaqfbwUN-Juu6-26AwfHO1697haZV5Pw-hdK7wTrA@mail.gmail.com>

On Mon, Nov 26, 2012 at 2:06 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
>> That's fine - I found both branches :)
>>
>> I've actually done a trial merge on the non-rebased one and
>> then cherry-picked the experimental warning - looks good.
>
> Ah, good then :).

Done,
https://github.com/biopython/biopython/commit/9f6e810cc68dd1e353d899772fda3053d9f49513

>> Once that's done there is some housekeeping to do, like
>> the indexing code duplication with Bio.SeqIO, and tackling
>> indexing BGZF compressed files with Bio.SearchIO which
>> I will have a go at.
>
> Yes.

Started, it seems the two _index.py files have diverged a
little more than I'd expected:
https://github.com/biopython/biopython/commit/ad1786b99afd2a50248246d877ff00a53949546b

>> P.S. I had intended to do this earlier this month, but we
>> had the OBF server issues to deal with.
>
> That's ok, I also noticed that it's only quite recently that the
> commits have become frequent again.

Christian Brueffer deserves some of the credit for the recent
burst of commits - he's been very busy sending pull requests!

Peter

From p.j.a.cock at googlemail.com  Mon Nov 26 11:55:32 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 26 Nov 2012 16:55:32 +0000
Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names?
In-Reply-To: <CAKVJ-_7v1XaqfbwUN-Juu6-26AwfHO1697haZV5Pw-hdK7wTrA@mail.gmail.com>
References: <CAKVJ-_7NpV0aveBJfut1Mn5xh=gxzJ2qPFr3UnCBRt589SwQVA@mail.gmail.com>
	<CADEGkF6LkJT2WrDrKSevozSi=FYL_iZcDMMx3QdnBE5FLM=szw@mail.gmail.com>
	<CAKVJ-_6zdSOq9JFtvKNx7NynBjX-m1p6iw4Pfd02LcWsdp+tig@mail.gmail.com>
	<CADEGkF7GerjkkuN1FamM8MNVp+h=-10dDeY9d4UkZ6ise4t9+Q@mail.gmail.com>
	<CAKVJ-_7v1XaqfbwUN-Juu6-26AwfHO1697haZV5Pw-hdK7wTrA@mail.gmail.com>
Message-ID: <CAKVJ-_7Ca8fQkQ_EdgCM7A+bo4EwC0D3eHy74DDK46yGjfaLDg@mail.gmail.com>

On Mon, Nov 26, 2012 at 4:46 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Mon, Nov 26, 2012 at 2:06 PM, Wibowo Arindrarto
> <w.arindrarto at gmail.com> wrote:
>>> That's fine - I found both branches :)
>>>
>>> I've actually done a trial merge on the non-rebased one and
>>> then cherry-picked the experimental warning - looks good.
>>
>> Ah, good then :).
>
> Done,
> https://github.com/biopython/biopython/commit/9f6e810cc68dd1e353d899772fda3053d9f49513

I've put a short note in the NEWS file,
https://github.com/biopython/biopython/commit/43f7d4467dd56e67a7ad475e5ff3bf3d4f31d1d7

Congratulations Bow :)

I guess this would be a good excuse for you to write another blog post ;)

Speaking of which, unless we expect to release Biopython 1.61
soon, we should probably have something on the news blog too
(which reminds me I was supposed to co-ordinate a general
OBF GSoC 2012 post). Maybe I will manage that while on leave
in December?

Regards,

Peter

From w.arindrarto at gmail.com  Mon Nov 26 12:05:43 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Mon, 26 Nov 2012 18:05:43 +0100
Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names?
In-Reply-To: <CAKVJ-_7Ca8fQkQ_EdgCM7A+bo4EwC0D3eHy74DDK46yGjfaLDg@mail.gmail.com>
References: <CAKVJ-_7NpV0aveBJfut1Mn5xh=gxzJ2qPFr3UnCBRt589SwQVA@mail.gmail.com>
	<CADEGkF6LkJT2WrDrKSevozSi=FYL_iZcDMMx3QdnBE5FLM=szw@mail.gmail.com>
	<CAKVJ-_6zdSOq9JFtvKNx7NynBjX-m1p6iw4Pfd02LcWsdp+tig@mail.gmail.com>
	<CADEGkF7GerjkkuN1FamM8MNVp+h=-10dDeY9d4UkZ6ise4t9+Q@mail.gmail.com>
	<CAKVJ-_7v1XaqfbwUN-Juu6-26AwfHO1697haZV5Pw-hdK7wTrA@mail.gmail.com>
	<CAKVJ-_7Ca8fQkQ_EdgCM7A+bo4EwC0D3eHy74DDK46yGjfaLDg@mail.gmail.com>
Message-ID: <CADEGkF6oL-_Zi_yVxPWb6EYHRGnbk31EHmfu3=kS+1PqsOn6RA@mail.gmail.com>

>>>> That's fine - I found both branches :)
>>>>
>>>> I've actually done a trial merge on the non-rebased one and
>>>> then cherry-picked the experimental warning - looks good.
>>>
>>> Ah, good then :).
>>
>> Done,
>> https://github.com/biopython/biopython/commit/9f6e810cc68dd1e353d899772fda3053d9f49513
>
> I've put a short note in the NEWS file,
> https://github.com/biopython/biopython/commit/43f7d4467dd56e67a7ad475e5ff3bf3d4f31d1d7
>
> Congratulations Bow :)

Thank you :D! It feels great to see the code in master.

> I guess this would be a good excuse for you to write another blog post ;)

It is, and one should come up in the next couple of days :).

Now I'm anxiously waiting for the next Biopython release ~ and the
submodule's 'final' form after more feedback ;).

cheers,
Bow

From p.j.a.cock at googlemail.com  Mon Nov 26 12:22:00 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 26 Nov 2012 17:22:00 +0000
Subject: [Biopython-dev] [GSoC] GSoC python variant final update
In-Reply-To: <CALfq9t+quDjby4Dvg1iscs-oSAutAR9yzmT3G_b4-6oFVQxFVw@mail.gmail.com>
References: <CALfq9t+quDjby4Dvg1iscs-oSAutAR9yzmT3G_b4-6oFVQxFVw@mail.gmail.com>
Message-ID: <CAKVJ-_6ZqitFQh-FEr6gqA0bp5VNO9MEUHjDy9NWEERREMWq6g@mail.gmail.com>

On Mon, Aug 20, 2012 at 5:22 AM, Lenna Peterson <arklenna at gmail.com> wrote:
> Post: http://arklenna.tumblr.com/post/29808300789/
>
> The coordinate mapper, with updated documentation, is now located on
> this branch: https://github.com/lennax/biopython/tree/f_loc4
> It awaits the merging of Peter's f_loc4 branch.
>
> I've written an entry on coordinate mapping for the Cookbook:
> http://biopython.org/wiki/Coordinate_mapping

Hi Lenna,

Do you need my f_loc4 branch for the main GSoC variants work,
or just the coordinate mapper?

Thanks,

Peter

From chapmanb at 50mail.com  Mon Nov 26 15:18:09 2012
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 26 Nov 2012 15:18:09 -0500
Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names?
In-Reply-To: <CADEGkF6oL-_Zi_yVxPWb6EYHRGnbk31EHmfu3=kS+1PqsOn6RA@mail.gmail.com>
References: <CAKVJ-_7NpV0aveBJfut1Mn5xh=gxzJ2qPFr3UnCBRt589SwQVA@mail.gmail.com>
	<CADEGkF6LkJT2WrDrKSevozSi=FYL_iZcDMMx3QdnBE5FLM=szw@mail.gmail.com>
	<CAKVJ-_6zdSOq9JFtvKNx7NynBjX-m1p6iw4Pfd02LcWsdp+tig@mail.gmail.com>
	<CADEGkF7GerjkkuN1FamM8MNVp+h=-10dDeY9d4UkZ6ise4t9+Q@mail.gmail.com>
	<CAKVJ-_7v1XaqfbwUN-Juu6-26AwfHO1697haZV5Pw-hdK7wTrA@mail.gmail.com>
	<CAKVJ-_7Ca8fQkQ_EdgCM7A+bo4EwC0D3eHy74DDK46yGjfaLDg@mail.gmail.com>
	<CADEGkF6oL-_Zi_yVxPWb6EYHRGnbk31EHmfu3=kS+1PqsOn6RA@mail.gmail.com>
Message-ID: <87vccs15ku.fsf@fastmail.fm>


Bow and Peter;

>> Congratulations Bow :)
>
> Thank you :D! It feels great to see the code in master.

Awesome, nice work on this project and congratulations on getting it
integrated. It's great to see this go in,
Brad


From p.j.a.cock at googlemail.com  Tue Nov 27 04:35:46 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 27 Nov 2012 09:35:46 +0000
Subject: [Biopython-dev] Minor buildbot issues from SearchIO
Message-ID: <CAKVJ-_6N_Wy9QVKp=niHSexB0_yEL5svh4oDzbxEYuSHv3KfWA@mail.gmail.com>

Hi all,

The BuildBot flagged two new issues overnight,
http://testing.open-bio.org/biopython/tgrid

Python 2.5 on Windows - doctests are failing due to floating point decimal place
differences in the exponent (down to C library differences, something fixed in
later Python releases). Perhaps a Python 2.5 hack is the way to go here?
http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.5/builds/664/steps/shell/logs/stdio
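
One possible shape for such a hack, as a sketch only: normalise the exponent
width in the printed text before comparing, so that a three-digit exponent
such as 1e-005 and a two-digit one such as 1e-05 compare equal. The helper
name normalize_exponent is hypothetical and this is not what the test suite
currently does.

```python
import re

# Sign, then any padding zeros, then at least two exponent digits.
_EXPONENT = re.compile(r"e([+-])0*(\d{2,})")


def normalize_exponent(text):
    """Rewrite float exponents in text to the shortest two-or-more digit form."""
    return _EXPONENT.sub(r"e\1\2", text)
```

A doctest runner could apply this to both the expected and the actual output
before comparing them.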

Python 3.2 and 3.3 on Windows are showing some XML character encoding oddity.
Perhaps there is some encoding setting needed under Python 3 for the BLAST
XML files?
http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.2/builds/512/steps/shell/logs/stdio
http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.3/builds/29/steps/shell/logs/stdio
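
Whether this is the actual cause is not established, but one thing worth
ruling out is the default text encoding: under Python 3, open() in text mode
uses a platform-dependent encoding unless one is given, and on Windows that
is often not UTF-8. A minimal sketch with a made-up XML snippet:

```python
import io
import os
import tempfile

# Hypothetical XML snippet containing a non-ASCII character.
xml_text = (u'<?xml version="1.0" encoding="UTF-8"?>\n'
            u'<Hit_def>caf\u00e9</Hit_def>\n')

path = os.path.join(tempfile.mkdtemp(), "example.xml")
with io.open(path, "w", encoding="utf-8") as handle:
    handle.write(xml_text)

# Passing the encoding explicitly avoids depending on the platform default.
with io.open(path, "r", encoding="utf-8") as handle:
    round_tripped = handle.read()
```

Opening the file in binary mode and letting the XML parser read the declared
encoding would be another option.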

There is a separate cross-platform issue on Python 3.1, "TypeError:
invalid event tuple"
again with XML parsing. Curiously this had started a few days back in
the UniprotIO
tests on one machine, pre-dating the SearchIO merge. I'm not sure what
triggered it.
http://testing.open-bio.org/biopython/builders/Linux%20-%20Python%203.1/builds/767
http://testing.open-bio.org/biopython/builders/Linux%2064%20-%20Python%203.1/builds/766/steps/shell/logs/stdio
http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.1/builds/648/steps/shell/logs/stdio

(Note TravisCI doesn't officially support Python 3.1, although until recently
they did offer it unofficially - Python 3.3 support is happening soon though).

Peter

From diego_zea at yahoo.com.ar  Tue Nov 27 09:25:48 2012
From: diego_zea at yahoo.com.ar (Diego Zea)
Date: Tue, 27 Nov 2012 06:25:48 -0800 (PST)
Subject: [Biopython-dev] Numpy/Scipy and Biopython
Message-ID: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com>

Hi!!!
This is my first mail to the list.
I am relatively new to Biopython (I used to code more in Perl) but I want to collaborate with the project.
I posted this on Stack Overflow, and I want to share my question with all of you ;)

http://stackoverflow.com/questions/13552916/numpy-and-biopython-must-be-integrated
Best wishes,

if ((dx*dp)>=(h/(2*pi)))
{
printf("Diego Javier Zea\n");
}

From anaryin at gmail.com  Tue Nov 27 10:40:58 2012
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Tue, 27 Nov 2012 16:40:58 +0100
Subject: [Biopython-dev] Numpy/Scipy and Biopython
In-Reply-To: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com>
References: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com>
Message-ID: <CAJ9sUYN517w_XB394rL4pY397+qt3Bx4_QcUgdRrZT_-7=HUng@mail.gmail.com>

Hi Diego,

Nice post and nice ideas. As for Bio.PDB, indeed representing the entire
structure as a Nx3 matrix of coordinates is super attractive, but would
require a deep change in the current framework. Also, manipulation of the
structure (removing atoms, adding atoms, etc) would become a bit more
complicated. If you have good ideas on how to do this, please do share them. I
know for example ProDy and csb use a similar approach.
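
To make the trade-off concrete, here is a small sketch with made-up data
(three atoms, not a real structure, and not how Bio.PDB works today).
Whole-structure operations become one-liners, but removing an atom means
masking and copying every parallel array.

```python
import numpy as np

# Hypothetical toy data: three atoms with x, y, z coordinates.
coords = np.array([[0.0, 0.0, 0.0],
                   [1.5, 0.0, 0.0],
                   [1.5, 1.5, 0.0]])
names = np.array(["N", "CA", "C"])

# Whole-structure operations become single vectorised calls.
centroid = coords.mean(axis=0)
centered = coords - centroid

# Removing an atom means building a mask and copying both arrays;
# every parallel array must be filtered with the same mask.
keep = names != "CA"
coords_without_ca = coords[keep]
names_without_ca = names[keep]
```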

Cheers,

João

2012/11/27 Diego Zea <diego_zea at yahoo.com.ar>

>
> http://stackoverflow.com/questions/13552916/numpy-and-biopython-must-be-integrated


From redmine at redmine.open-bio.org  Tue Nov 27 19:46:22 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 28 Nov 2012 00:46:22 +0000
Subject: [Biopython-dev] [Biopython - Feature #3396] (New) Add alignment
	score, % identity, % similarity, % gaps, etc to EmbossIO
Message-ID: <redmine.issue-3396.20121128004622@redmine.open-bio.org>


Issue #3396 has been reported by Olga Botvinnik.

----------------------------------------
Feature #3396: Add alignment score, % identity, % similarity, % gaps, etc to EmbossIO
https://redmine.open-bio.org/issues/3396

Author: Olga Botvinnik
Status: New
Priority: Normal
Assignee: Olga Botvinnik
Category: 
Target version: 
URL: 


As of Biopython 1.59, if an alignment is read in with Bio.AlignIO.read(handle, 'emboss'), the metadata in the header, such as the substitution matrix used, gap_penalty, extend_penalty, identity, similarity, gaps, and score, is ignored:

<pre>
#=======================================
#
# Aligned_sequences: 4
# 1: IXI_234
# 2: IXI_235
# 3: IXI_236
# 4: IXI_237
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 131
# Identity:      95/131 (72.5%)
# Similarity:   127/131 (96.9%)
# Gaps:          25/131 (19.1%)
# Score: 100.0
#
#
#=======================================
</pre>

I edited the EmbossIO.py file to read this metadata and add it as annotations to each SeqRecord in the MultipleSequenceAlignment object, since the MultipleSequenceAlignment object does not have an option for annotations. I also added the appropriate unit tests. Please let me know if there is a bug in the code that I missed.

For example, for the above alignment, the SeqRecord objects would have the following annotations:

<pre>
{'identity_denominator': 131, 'matrix': 'EBLOSUM62', 'similarity': 0.8549618320610687, 'similarity_numerator': 112, 'similarity_denominator': 131, 'gaps': 0.1450381679389313, 'identity_numerator': 112, 'gap_penalty': 10.0, 'extend_penalty': 0.5, 'gaps_denominator': 131, 'score': 591.5, 'identity': 0.8549618320610687, 'gaps_numerator': 19}
</pre>

I decided to keep the numerators and denominators separately from the identity, similarity, and gap percentages just in case a user wanted to do something else with them.
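
As an illustration of the kind of parsing involved, here is a minimal sketch
that pulls these fields out of header lines like the ones shown above. It is
not the submitted patch; the function name parse_emboss_header is
hypothetical and the key names simply follow the annotation keys listed
above.

```python
import re

# Matches e.g. "# Identity:      95/131 (72.5%)"
_RATIO = re.compile(r"^#\s*(Identity|Similarity|Gaps):\s*(\d+)/(\d+)")
# Matches e.g. "# Matrix: EBLOSUM62" or "# Score: 100.0"
_FIELD = re.compile(
    r"^#\s*(Matrix|Gap_penalty|Extend_penalty|Score):\s*(\S+)")


def parse_emboss_header(lines):
    """Collect alignment metadata from EMBOSS alignment header lines."""
    annotations = {}
    for line in lines:
        ratio = _RATIO.match(line)
        if ratio:
            key = ratio.group(1).lower()
            numerator = int(ratio.group(2))
            denominator = int(ratio.group(3))
            # Keep the raw counts as well as the fraction.
            annotations[key + "_numerator"] = numerator
            annotations[key + "_denominator"] = denominator
            annotations[key] = float(numerator) / denominator
            continue
        field = _FIELD.match(line)
        if field:
            key, value = field.group(1).lower(), field.group(2)
            # The matrix name stays a string; the rest are numbers.
            annotations[key] = value if key == "matrix" else float(value)
    return annotations
```

Lines that match neither pattern, such as the sequence list, are skipped.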


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From diego_zea at yahoo.com.ar  Tue Nov 27 22:09:58 2012
From: diego_zea at yahoo.com.ar (Diego Zea)
Date: Tue, 27 Nov 2012 19:09:58 -0800 (PST)
Subject: [Biopython-dev] Numpy/Scipy and Biopython
In-Reply-To: <CAJ9sUYN517w_XB394rL4pY397+qt3Bx4_QcUgdRrZT_-7=HUng@mail.gmail.com>
References: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com>
	<CAJ9sUYN517w_XB394rL4pY397+qt3Bx4_QcUgdRrZT_-7=HUng@mail.gmail.com>
Message-ID: <1354072198.13226.YahooMailNeo@web140606.mail.bf1.yahoo.com>

""""
Hi João (and others)!!! Thanks :)

I think someone with more NumPy knowledge can do this better, but this is my idea:

1- Load the PDB file directly into NumPy (I did this quickly and badly, don't trust this parser)
2- Use an n x 3 matrix for xyz and one matrix with named columns for the other information. (I don't know how)
[ The index is the same, so you can use one to slice the other with boolean arrays ;) ]
3- Define methods for the most common operations

This is an example of my idea (it works on 1AB0 from PDB)...

""""

import numpy

names=[]
descript=[]
xyz = []

# The example structure is 
# http://www.rcsb.org/pdb/explore.do?structureId=1ab0 
with open("/home/dzea/databases/PDB/1ab0.pdb","r") as fh:
    """ Very naive parser. I wrote this in a couple of minutes.
    It's bad, but it's only to show the idea """
    for line in fh:
        if line[0:4]=='ATOM':
            temp =[]
            temp2 =[]
            temp.append(line[4:11].replace(" ",""))
            temp2.append(line[11:16].replace(" ",""))
            temp2.append(line[17:21].replace(" ",""))
            temp.append(line[22:27].replace(" ",""))
            xyz.append(line[31:56].split())
            temp.append(line[55:60].replace(" ",""))
            temp.append(line[60:67].replace(" ",""))
            temp2.append(line[-5:].replace(" ","").replace("\n",""))
            descript.append(temp)
            names.append(temp2)

# I'm not good at using different dtypes
# in different columns,
# but named columns would be better than this:
names_array = numpy.array(names,numpy.character)
descript_array = numpy.array(descript,numpy.float16)
xyz_array = numpy.array(xyz,numpy.float16)

def select_atom(names,xyz,descript,atom='CA'):
    xyz_s = xyz[names[:,0]==atom,:]
    names_s = names[names[:,0]==atom,:]
    descript_s = descript[names[:,0]==atom,:]
    return names_s,xyz_s,descript_s

def delete_res_num(names,xyz,descript,num=20):
    xyz_s = xyz[descript[:,1]!=num,:]
    names_s = names[descript[:,1]!=num,:]
    descript_s = descript[descript[:,1]!=num,:]
    return names_s,xyz_s,descript_s

def delete_atom_num(names,xyz,descript,num=20):
    xyz_s = xyz[descript[:,0]!=num,:]
    names_s = names[descript[:,0]!=num,:]
    descript_s = descript[descript[:,0]!=num,:]
    return names_s,xyz_s,descript_s

def add_atom(new_name,new_xyz,new_descript,names,xyz,descript):
    # Using vstack ;)
    new_name = numpy.array(new_name,numpy.character)
    new_descript = numpy.array(new_descript,numpy.float16)
    new_xyz = numpy.array(new_xyz,numpy.float16)
    xyz_s = numpy.vstack((xyz,new_xyz))
    names_s = numpy.vstack((names,new_name))
    descript_s = numpy.vstack((descript,new_descript))
    return names_s,xyz_s,descript_s

## Example (works!!!)
xyz_array.shape
delete_atom_num(names_array,xyz_array,descript_array)[1].shape
add_atom(['H','H','H'],[0,0,0],[0,0,0,0],names_array,xyz_array,descript_array)[1].shape
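
A possible answer to the "named columns" question in step 2, as a minimal sketch: a NumPy structured array holds one record per atom with differently typed, named fields, while the coordinates stay in a separate n x 3 float array that shares the same row index. The field names and the four toy atoms are made up for illustration and are not taken from 1AB0.

```python
import numpy as np

# One record per atom; each named field has its own dtype.
atom_dtype = np.dtype([
    ("serial", np.int32),
    ("name", "U4"),
    ("resname", "U3"),
    ("chain", "U1"),
    ("resseq", np.int32),
    ("occupancy", np.float32),
    ("bfactor", np.float32),
])

# Toy data (hypothetical atoms, not a real structure).
atoms = np.array(
    [
        (1, "N", "MET", "A", 1, 1.0, 20.5),
        (2, "CA", "MET", "A", 1, 1.0, 19.8),
        (3, "C", "MET", "A", 1, 1.0, 21.1),
        (4, "CA", "GLY", "A", 2, 1.0, 18.3),
    ],
    dtype=atom_dtype,
)
xyz = np.array(
    [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.4, 0.0], [3.8, 1.4, 0.0]],
    dtype=np.float64,
)

# A boolean mask built from one array slices the other, as in select_atom.
ca_mask = atoms["name"] == "CA"
ca_atoms = atoms[ca_mask]
ca_xyz = xyz[ca_mask]

# Dropping a residue is the same idea with a different field.
keep = atoms["resseq"] != 2
xyz_without_res2 = xyz[keep]
```

With named fields, `atoms["resseq"]` replaces positional lookups such as `descript[:,1]`, and integer fields no longer have to be stored as floats.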


if ((dx*dp)>=(h/(2*pi)))
{
printf("Diego Javier Zea\n");
}


>________________________________
> From: João Rodrigues <anaryin at gmail.com>
>To: Diego Zea <diego_zea at yahoo.com.ar> 
>CC: "biopython-dev at lists.open-bio.org" <biopython-dev at lists.open-bio.org> 
>Sent: Tuesday, 27 November 2012 12:40
>Subject: Re: [Biopython-dev] Numpy/Scipy and Biopython
> 
>
>Hi Diego,
>
>
>Nice post and nice ideas. As for Bio.PDB, indeed representing the entire structure as an Nx3 matrix of coordinates is super attractive, but would require a deep change in the current framework. Also, manipulation of the structure (removing atoms, adding atoms, etc) would become a bit more complicated. If you have good ideas to do this, please do share them. I know for example ProDy and csb use a similar approach.
>
>
>Cheers,
>
>
>João
>
>
>2012/11/27 Diego Zea <diego_zea at yahoo.com.ar>
>
>http://stackoverflow.com/questions/13552916/numpy-and-biopython-must-be-integrated
>
>
>

From redmine at redmine.open-bio.org  Thu Nov 29 04:09:49 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 29 Nov 2012 09:09:49 +0000
Subject: [Biopython-dev] [Biopython - Feature #3398] (New) Oracle BioSQL
Message-ID: <redmine.issue-3398.20121129090949@redmine.open-bio.org>


Issue #3398 has been reported by Hyungyong Kim.

----------------------------------------
Feature #3398: Oracle BioSQL
https://redmine.open-bio.org/issues/3398

Author: Hyungyong Kim
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


I just tested Oracle BioSQL for Biopython using cx_Oracle. It includes some modifications to Biopython prompted by my GenBank file test. I have attached the patch and describe below how it was generated.

<pre>
[yong27 at dev biopython]$ git ls-remote --heads origin
902947a7df49d8529faeb7e1bfb55b2d06252272        refs/heads/master
[yong27 at dev biopython]$ git diff origin/master master > oracle_biosql.diff
[yong27 at dev biopython]$
</pre>

This is an example of how to use Oracle BioSQL. Oracle, the Oracle BioSQL schema, and cx_Oracle have to be installed.

<pre>
from contextlib import contextmanager
from BioSQL import BioSeqDatabase

@contextmanager
def biosqlconn(dbname):
    server = BioSeqDatabase.open_database(driver='cx_Oracle', user='USER', passwd='PASS')
    conn = server[dbname]
    try:
        yield conn
    except:
        conn.adaptor.rollback()
        raise
    else:
        conn.adaptor.commit()
    finally:
        conn.adaptor.close()

with biosqlconn('mydb') as biosqldb:
    record = biosqldb.lookup(accession='1234')

</pre>




From p.j.a.cock at googlemail.com  Thu Nov 29 05:56:04 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 29 Nov 2012 10:56:04 +0000
Subject: [Biopython-dev] [Biopython] AlignACE Application Wrapper
In-Reply-To: <50B6F8FF.2090206@brueffer.de>
References: <50B6CBB1.9040706@brueffer.de>
	<50B6F8FF.2090206@brueffer.de>
Message-ID: <CAKVJ-_6x2jVbmwO_LEh2jOOnOonkSb0iFQctwX5kAYnc_-22Bg@mail.gmail.com>

Can we continue this on the biopython-dev mailing list (CC'd)?

On Thu, Nov 29, 2012 at 5:56 AM, Christian Brueffer
<christian at brueffer.de> wrote:
> On 11/29/2012 10:42 AM, Christian Brueffer wrote:
>>
>> Hi,
>>
>> in preparation of cleaning up the AlignACE wrapper, I wanted to test
>> the current wrapper.   However, it doesn't seem to work at all ...
>>
>> For the record, I'm testing with the Linux version of the binary
>> (AlignACE version 2.3  October 27, 1998).
>>
>
> Some of the test files in the Tests directory mention the following AlignACE
> version: "AlignACE 4.0 05/13/04"
>
> This may be the answer to my problems.  Does anyone know where to get hold
> of this version?
>
> The website (http://atlas.med.harvard.edu/) is down and the only
> other one I found (http://arep.med.harvard.edu/mrnadata/mrnasoft.html)
> only distributes the old 2.3 version that I have.

Hmm, I don't see any existing unit tests dedicated to this wrapper.
There should really be a file named test_AlignACE_tool.py or similar.

I would also like some doctests in Bio/Motif/Applications/_AlignAce.py
which must be non-executing so they can be run without dependencies,
which of course isn't actually a functional test but it does still catch some
issues - but primarily would be as documentation to demonstrate typical
usage.
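
A minimal sketch of what such a non-executing doctest looks like: the example only builds and prints the command line, so it runs without the external binary installed. `ToyCommandline`, the program name `mytool` and its options are hypothetical stand-ins, not the real AlignACE wrapper or its flags.

```python
import doctest


class ToyCommandline(object):
    """Hypothetical stand-in for an application wrapper.

    The doctest below never launches the program; it only checks the
    command string that would be run.

    >>> cmd = ToyCommandline("mytool", infile="test.fa", numcols=10)
    >>> print(cmd)
    mytool -infile test.fa -numcols 10
    """

    def __init__(self, program, **options):
        self.program = program
        self.options = options

    def __str__(self):
        # Sort the option names so the printed command is deterministic.
        parts = [self.program]
        for name in sorted(self.options):
            parts.extend(["-" + name, str(self.options[name])])
        return " ".join(parts)


if __name__ == "__main__":
    doctest.testmod()
```

Such a test catches typos in option names and quoting, but says nothing about whether the tool itself runs, which is why a separate functional test is still needed.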

I don't appear to have AlignAce installed on my own machines - in
particular, the nightly buildslaves don't have it. I don't think there is
a Debian/Ubuntu package for AlignAce, so testing this under
TravisCI is non-trivial - it looks like their licence agreement could
block packaging it.

Thanks,

Peter

From p.j.a.cock at googlemail.com  Thu Nov 29 06:22:51 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 29 Nov 2012 11:22:51 +0000
Subject: [Biopython-dev] [Biopython] AlignACE Application Wrapper
In-Reply-To: <50B74199.6020904@brueffer.de>
References: <50B6CBB1.9040706@brueffer.de> <50B6F8FF.2090206@brueffer.de>
	<CAKVJ-_6x2jVbmwO_LEh2jOOnOonkSb0iFQctwX5kAYnc_-22Bg@mail.gmail.com>
	<50B74199.6020904@brueffer.de>
Message-ID: <CAKVJ-_7CMuK7uwATmgY4fya9QO+hHaSmxdHwVsVKOZv4UbS9tQ@mail.gmail.com>

On Thu, Nov 29, 2012 at 11:06 AM, Christian Brueffer
<christian at brueffer.de> wrote:
> On 11/29/2012 06:56 PM, Peter Cock wrote:
>>
>> Can we continue this on the biopython-dev mailing list (CC'd)?
>>
>
> (moved to biopython-dev)
>

Thanks.

> Indeed.  I already have a cleaned up wrapper and unit tests in my local
> tree, but I don't want to submit them without actually testing them with an
> up to date binary ;-)

Excellent - I suspected you'd been doing something like this ;)

> archive.org has a version of http://atlas.med.harvard.edu/ from 2011,
> I have contacted the responsible person mentioned on the page.

It was Bartek who wrote the original wrapper (I only made re-factoring
changes since then), hopefully he still has a working AlignACE
installation and can tell us the version numbers etc that he was using.

Regards,

Peter

From christian at brueffer.de  Thu Nov 29 06:06:01 2012
From: christian at brueffer.de (Christian Brueffer)
Date: Thu, 29 Nov 2012 19:06:01 +0800
Subject: [Biopython-dev] [Biopython] AlignACE Application Wrapper
In-Reply-To: <CAKVJ-_6x2jVbmwO_LEh2jOOnOonkSb0iFQctwX5kAYnc_-22Bg@mail.gmail.com>
References: <50B6CBB1.9040706@brueffer.de> <50B6F8FF.2090206@brueffer.de>
	<CAKVJ-_6x2jVbmwO_LEh2jOOnOonkSb0iFQctwX5kAYnc_-22Bg@mail.gmail.com>
Message-ID: <50B74199.6020904@brueffer.de>

On 11/29/2012 06:56 PM, Peter Cock wrote:
> Can we continue this on the biopython-dev mailing list (CC'd)?
>
> On Thu, Nov 29, 2012 at 5:56 AM, Christian Brueffer
> <christian at brueffer.de> wrote:
>> On 11/29/2012 10:42 AM, Christian Brueffer wrote:
>>>
>>> Hi,
>>>
>>> in preparation of cleaning up the AlignACE wrapper, I wanted to test
>>> the current wrapper.   However, it doesn't seem to work at all ...
>>>
>>> For the record, I'm testing with the Linux version of the binary
>>> (AlignACE version 2.3  October 27, 1998).
>>>
>>
>> Some of the test files in the Tests directory mention the following AlignACE
>> version: "AlignACE 4.0 05/13/04"
>>
>> This may be the answer to my problems.  Does anyone know where to get hold
>> of this version?
>>
>> The website (http://atlas.med.harvard.edu/) is down and the only
>> other one I found (http://arep.med.harvard.edu/mrnadata/mrnasoft.html)
>> only distributes the old 2.3 version that I have.
>
> Hmm, I don't see any existing unit tests dedicated to this wrapper.
> There should really be a file named test_AlignACE_tool.py or similar.
>
> I would also like some doctests in Bio/Motif/Applications/_AlignAce.py
> which must be non-executing so they can be run without dependencies,
> which of course isn't actually a functional test but it does still catch some
> issues - but primarily would be as documentation to demonstrate typical
> usage.
>
> I don't appear to have AlignAce installed on my own machines - in
> particular, the nightly buildslaves don't have it. I don't think there is
> a Debian/Ubuntu package for AlignAce, so testing this under
> TravisCI is non-trivial - it looks like their licence agreement could
> block packaging it.
>

(moved to biopython-dev)

Indeed.  I already have a cleaned up wrapper and unit tests in my local 
tree, but I don't want to submit them without actually testing them with 
an up to date binary ;-)

archive.org has a version of http://atlas.med.harvard.edu/ from 2011,
I have contacted the responsible person mentioned on the page.

Cheers,

Chris


From mjldehoon at yahoo.com  Thu Nov 29 09:33:12 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 29 Nov 2012 06:33:12 -0800 (PST)
Subject: [Biopython-dev] Error in Bio.Entrez.__init__
In-Reply-To: <CAKVJ-_4Y3rvgQP8C9tgykekhc4Jsi=cVMZwxbm1YS4YOn7754g@mail.gmail.com>
Message-ID: <1354199592.66390.YahooMailClassic@web164006.mail.gq1.yahoo.com>

--- On Mon, 11/26/12, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> In general the NCBI identifiers are arbitrary strings,
> although perhaps the pubmed identifiers could be treated as
> integers.
> This is perhaps worth changing in the Bio.Entrez code...
> 
> What do you think Michael?

If we change this in the Bio.Entrez code, we should put str(..) around all NCBI identifiers, not just the pubmed ones. Otherwise we'd have special treatment for one of the Entrez databases, which may cause problems in the future.
I'm OK if somebody else adds the calls to str(..), but I wouldn't champion it myself.
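
A minimal sketch of the idea, assuming the identifiers end up in a comma-separated `id` parameter. The helper name `format_ids` is hypothetical and this is not the actual Bio.Entrez code; it only shows str(..) being applied to every identifier, whatever database it comes from.

```python
def format_ids(ids):
    """Return identifiers as one comma-separated string (hypothetical helper)."""
    # A single identifier may be given as a bare string or integer.
    if isinstance(ids, (str, int)):
        ids = [ids]
    # Apply str() uniformly, with no special case for any one database.
    return ",".join(str(identifier) for identifier in ids)


single = format_ids(19304878)
mixed = format_ids([19304878, "NM_000546"])
```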

Best,
-Michiel.


From p.j.a.cock at googlemail.com  Thu Nov 29 09:49:42 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 29 Nov 2012 14:49:42 +0000
Subject: [Biopython-dev] Error in Bio.Entrez.__init__
In-Reply-To: <1354199592.66390.YahooMailClassic@web164006.mail.gq1.yahoo.com>
References: <CAKVJ-_4Y3rvgQP8C9tgykekhc4Jsi=cVMZwxbm1YS4YOn7754g@mail.gmail.com>
	<1354199592.66390.YahooMailClassic@web164006.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_50-b6LMu8rN2MRFbU2ssYMraNgqdN=vJSn9YNUJos85w@mail.gmail.com>

On Thu, Nov 29, 2012 at 2:33 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> --- On Mon, 11/26/12, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> In general the NCBI identifiers are arbitrary strings,
>> although perhaps the pubmed identifiers could be treated as
>> integers.
>> This is perhaps worth changing in the Bio.Entrez code...
>>
>> What do you think Michael?
>
> If we change this in the Bio.Entrez code, we should put str(..) around
> all NCBI identifiers, not just the pubmed ones. Otherwise we'd have
> special treatment for one of the Entrez databases, which may cause
> problems in the future.

Yes, after all there are other Entrez databases with 'numerical' identifiers.

> I'm OK if somebody else adds the calls to str(..), but I wouldn't champion
> it myself.

I don't mind doing the commit (and a unit test), but do you have any
specific concern in mind?

Peter

From redmine at redmine.open-bio.org  Thu Nov 29 12:12:31 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 29 Nov 2012 17:12:31 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15021.20121129171231@redmine.open-bio.org>


Issue #3395 has been updated by Peter Cock.

File trie_debug.patch added

I can reproduce the problem with your saved file under Mac OS X, using the latest Biopython from github, e.g.

$ python
Python 2.7.2 (default, Jun 20 2012, 16:23:33) 
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import trie
>>> import gzip
>>> with gzip.open("trie.4.dat.gz") as handle:
...     t = trie.load(handle)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
RuntimeError: loading failed for some reason

Adding a little debugging to the C code tells us where this fails (see attachment), line 669:

668    if(has_value) {
669        if(!(trie->value = (*read_value)(data)))
670            goto _deserialize_trie_error;
671    }

What kind of CPU does your machine have? i.e. is it a normal Intel or AMD CPU, or something unusual like a PowerPC where we have to worry about the bit order interpretation?

We may need a complete example creating the trie as well - the problem could be in the trie itself, the serialisation (writing to disk), or de-serialisation (loading from disk).
----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395

Author: Michał Nowotka
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


Imagine I have Biopython trie:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'w')
tr = trie.trie()
#fill in the trie
trie.save(f, tr)

Now /tmp/trie.dat.gz is about 50MB. Let's try to read it:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'r')
tr = trie.load(f)

Unfortunately I'm getting meaningless error saying:
"loading failed for some reason"

Any hints?




From redmine at redmine.open-bio.org  Thu Nov 29 12:21:30 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 29 Nov 2012 17:21:30 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15022.20121129172130@redmine.open-bio.org>


Issue #3395 has been updated by Michał Nowotka.


I'm using an Ubuntu virtual machine running on a MacBook Pro with a single Intel® Core™ i7-2720QM CPU @ 2.20GHz processor. I will try to prepare code and data for which it fails.


From w.arindrarto at gmail.com  Thu Nov 29 21:35:25 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 30 Nov 2012 03:35:25 +0100
Subject: [Biopython-dev] Minor buildbot issues from SearchIO
In-Reply-To: <CAKVJ-_6N_Wy9QVKp=niHSexB0_yEL5svh4oDzbxEYuSHv3KfWA@mail.gmail.com>
References: <CAKVJ-_6N_Wy9QVKp=niHSexB0_yEL5svh4oDzbxEYuSHv3KfWA@mail.gmail.com>
Message-ID: <CADEGkF4RLmQDMS2sBNTs=Rwag_CypmU6WX-Q71R=Xsbuc4_GQg@mail.gmail.com>

Hi everyone,

I've done some digging around to see how to deal with these issues.
Here's what I found:

> The BuildBot flagged two new issues overnight,
> http://testing.open-bio.org/biopython/tgrid
>
> Python 2.5 on Windows - doctests are failing due to floating point decimal place
> differences in the exponent (down to C library differences, something fixed in
> later Python releases). Perhaps a Python 2.5 hack is the way to go here?
> http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.5/builds/664/steps/shell/logs/stdio

I've submitted a pull request to fix this here:
https://github.com/biopython/biopython/pull/98
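
For illustration only, and not necessarily what the pull request does: assuming the difference is a three-digit exponent (such as 1e-005 where other platforms print 1e-05), one way to hide it is to normalise the exponent in the formatted string. The function name `normalize_exponent` is hypothetical.

```python
import re

# A digit, "e", a sign, optional zero padding, then at least two digits.
_EXPONENT = re.compile(r"(\d)e([+-])0*(\d{2,})")


def normalize_exponent(text):
    """Strip extra zero padding from float exponents, keeping two digits."""
    return _EXPONENT.sub(r"\1e\2\3", text)


padded = normalize_exponent("1e-005")
plain = normalize_exponent("1e-05")
```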

> Python 3.2 and 3.3 on Windows are showing some XML character encoding oddity.
> Perhaps there is some encoding setting needed under Python 3 for the BLAST
> XML files?
> http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.2/builds/512/steps/shell/logs/stdio
> http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.3/builds/29/steps/shell/logs/stdio

I've also addressed these failures here:
https://github.com/biopython/biopython/pull/99

> There is a separate cross-platform issue on Python 3.1, "TypeError:
> invalid event tuple"
> again with XML parsing. Curiously this had started a few days back in
> the UniprotIO
> tests on one machine, pre-dating the SearchIO merge. I'm not sure what
> triggered it.
> http://testing.open-bio.org/biopython/builders/Linux%20-%20Python%203.1/builds/767
> http://testing.open-bio.org/biopython/builders/Linux%2064%20-%20Python%203.1/builds/766/steps/shell/logs/stdio
> http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.1/builds/648/steps/shell/logs/stdio

As for this one, it seems to be caused by a bug in Python 3.1
(http://bugs.python.org/issue9257) in the way
`xml.etree.cElementTree.iterparse` accepts the `events` argument. I
haven't submitted a pull request for this bug, since the fix looks
quite messy. Should we try to address this, or simply make a note that
XML parsing in Python 3.1 will not work? As Peter noted, this bug
currently affects Bio.SearchIO BLAST XML parsing, SeqIO.UniprotIO, and
Phylo.PhyloXMLIO.
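
For reference, a minimal sketch of the call pattern involved, using the pure-Python `xml.etree.ElementTree` module and string event names. As I understand the linked bug report, the C implementation in Python 3.1 did not accept the event names in the same way; the tiny XML document here is made up for illustration.

```python
import io
import xml.etree.ElementTree as ElementTree

# A tiny made-up document standing in for a BLAST/UniProt/PhyloXML file.
xml_data = io.BytesIO(b"<root><hit id='a'/><hit id='b'/></root>")

seen = []
for event, elem in ElementTree.iterparse(xml_data, events=("start", "end")):
    # Only act once an element has been read completely.
    if event == "end" and elem.tag == "hit":
        seen.append(elem.get("id"))
        elem.clear()  # free memory for elements we are done with
```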

regards,
Bow

From diego_zea at yahoo.com.ar  Fri Nov 30 08:00:20 2012
From: diego_zea at yahoo.com.ar (Diego Zea)
Date: Fri, 30 Nov 2012 05:00:20 -0800 (PST)
Subject: [Biopython-dev]  Numpy/Scipy and Biopython
In-Reply-To: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com>
References: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com>
Message-ID: <1354280420.4305.YahooMailNeo@web140605.mail.bf1.yahoo.com>

Hi! I was looking at Seq/AlignIO, and I think it may be possible to avoid the overhead of creating Bio objects first and NumPy objects afterwards, by adding an optional argument to __init__ that defaults to False. When this argument is set to True, NumPy-based objects are generated too. That might also make it easier to switch between plain Python objects and NumPy-based objects, and to use all the functionality of Bio together with the fast numerical operations of NumPy arrays... It's only an idea, what do you think? Thanks!!! :)
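
A minimal sketch of the kind of NumPy-based view this refers to, assuming an alignment is available as equal-length strings. The function name `alignment_to_array` is hypothetical and nothing here is part of Bio.AlignIO.

```python
import numpy as np


def alignment_to_array(sequences):
    """Turn equal-length sequence strings into a 2D array of single characters."""
    lengths = {len(seq) for seq in sequences}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    return np.array([list(seq) for seq in sequences], dtype="U1")


aln = alignment_to_array(["ACGT-A", "ACGTTA", "AC-TTA"])

# Column-wise operations become one vectorised expression.
gap_fraction = (aln == "-").mean(axis=0)
```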

if ((dx*dp)>=(h/(2*pi)))
{
printf("Diego Javier Zea\n");
}


>________________________________
> From: Diego Zea <diego_zea at yahoo.com.ar>
>To: "biopython-dev at lists.open-bio.org" <biopython-dev at lists.open-bio.org> 
>Sent: Tuesday, 27 November 2012 11:25
>Subject: [Biopython-dev] Numpy/Scipy and Biopython
> 
>Hi!!!
>This is my first mail to the list.
>I'm relatively new to Biopython (I used to code more in Perl) but I want to collaborate with the project.
>I wrote this post on Stack Overflow, and I want to share my question with all of you ;)
>
>http://stackoverflow.com/questions/13552916/numpy-and-biopython-must-be-integrated
>Best wishes,
>
>
>if ((dx*dp)>=(h/(2*pi)))
>{
>printf("Diego Javier Zea\n");
>}
>_______________________________________________
>Biopython-dev mailing list
>Biopython-dev at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/biopython-dev
>
>
>

From w.arindrarto at gmail.com  Thu Nov  1 08:19:58 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Thu, 1 Nov 2012 09:19:58 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <1351645938.62302.BPMail_high_noncarrier@web164001.mail.gq1.yahoo.com>
References: <1351645938.62302.BPMail_high_noncarrier@web164001.mail.gq1.yahoo.com>
Message-ID: <CADEGkF4xUKRGWO4e7jHKu9u+itVarvXm7NkotkpnG=wWqe54OQ@mail.gmail.com>

Hi Kai, Michiel,

(I hope this gets through to the mailing list. I'm CC-ing several
people in the discussion as well, just in case).

I've made a new branch based on Kai's SearchIO rebase here:
https://github.com/bow/biopython/tree/searchio-rebase, with the
following important changes:

>>Does anyone have preference between '.acc' or '.accession'? If not, I
>>can change the current '.acc' into '.accession'.
>
> I would prefer .accession for clarity.

1. All accession attributes now use the 'accession' name
(https://github.com/bow/biopython/commit/002b08df91040e6bcf3f0dd3d087b3d378005632).
There's a similar attribute from blast-tab, which is the accession
number and its version. This has also been renamed from 'acc_ver' to
'accession_version'. The docs have been updated accordingly.

> See the attached hmmpfam output. You'll notice that the domain table
> is not in the order of the hit table. As I'd like to preserve the
> order of the hit table, the current setup of the API forces me to
> either repeatedly parse the domain annotations until I find the
> correct domain annotations for my hit, or to create the hits in the
> order of the domain annotation table and then reshuffle them to make
> sure they're in the order of the hit table.
>
> If I could just create "empty" hit objects when parsing the hit table,
> I could easily preserve the order of the hits but still add the hsps
> as I parse them.

2. Regarding the Hit object API change, I've changed it so that Hit
objects can now be created without any HSPs
(https://github.com/bow/biopython/commit/e9137c9ed88c09f6e488f50184292cac474327c4).
However, per my explanation about keeping as few places as possible to
store the same value (in this case the hit and query ID and
description), the empty Hit object will raise errors if any of these
attributes are accessed. Setting and getting these attributes will
only work if there is at least one HSP in the Hit. Other Hit
functions, like append, should work OK as long as they don't involve
accessing these attributes. I think this will allow parsing of file
formats like HMMER2 plain text while maintaining the attribute storage
constraint.
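
A toy model of the behaviour described in point 2, not the Bio.SearchIO code itself: `ToyHit` and `ToyHSP` are hypothetical classes, and raising AttributeError is an arbitrary choice for the sketch. The ID lives only in the HSPs, so an empty hit can be created and appended to, but has no ID to report yet.

```python
class ToyHSP(object):
    """Hypothetical HSP that stores the hit and query IDs."""

    def __init__(self, hit_id, query_id):
        self.hit_id = hit_id
        self.query_id = query_id


class ToyHit(object):
    """Hypothetical Hit that keeps no copy of the IDs itself."""

    def __init__(self, hsps=()):
        self._hsps = list(hsps)

    def append(self, hsp):
        # Works on an empty hit because it does not touch the ID.
        self._hsps.append(hsp)

    def __len__(self):
        return len(self._hsps)

    @property
    def id(self):
        # The single stored copy of the ID is in the contained HSPs.
        if not self._hsps:
            raise AttributeError("hit ID is unavailable until the Hit has an HSP")
        return self._hsps[0].hit_id


# Create hits in hit-table order first, fill them in as HSPs are parsed.
hit = ToyHit()
try:
    hit.id
    empty_access_failed = False
except AttributeError:
    empty_access_failed = True

hit.append(ToyHSP("PF00001", "query1"))
hit_id_after_append = hit.id
```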


Hope these help :).

regards,
Bow


From kai.blin at biotech.uni-tuebingen.de  Thu Nov  1 09:10:11 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 01 Nov 2012 10:10:11 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <CADEGkF4xUKRGWO4e7jHKu9u+itVarvXm7NkotkpnG=wWqe54OQ@mail.gmail.com>
References: <1351645938.62302.BPMail_high_noncarrier@web164001.mail.gq1.yahoo.com>
	<CADEGkF4xUKRGWO4e7jHKu9u+itVarvXm7NkotkpnG=wWqe54OQ@mail.gmail.com>
Message-ID: <50923C73.8060609@biotech.uni-tuebingen.de>

On 2012-11-01 09:19, Wibowo Arindrarto wrote:

Hi Bow,

> 2. Regarding the Hit object API change, I've changed it so that Hit
> objects can now be created without any HSPs
> (https://github.com/bow/biopython/commit/e9137c9ed88c09f6e488f50184292cac474327c4).
> However, per my explanation about keeping as few places possible to
> store the same value (in this case the hit and query ID and
> description), the empty Hit object will raise errors if any of these
> attributes are accessed. Setting and getting these attributes will
> only work if there is at least one HSP in the Hit. Other Hit
> functions, like append, should work ok as long as it doesn't involve
> accessing these attributes. I think this will allow parsing of file
> formats like HMMER2 plain text while maintaining the attribute storage
> constraint.

I totally agree the Hit object isn't valid until it has at least one
HSP. Thanks for that change.

Cheers,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From redmine at redmine.open-bio.org  Thu Nov  1 10:48:11 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 1 Nov 2012 10:48:11 +0000
Subject: [Biopython-dev] [Biopython - Bug #3297] (Rejected) newline added in
	quated features
References: <redmine.issue-3297.20110926204742@redmine.open-bio.org>
Message-ID: <redmine.journal-14993.20121101104811@redmine.open-bio.org>


Issue #3297 has been updated by Peter Cock.

Status changed from New to Rejected

Was this really filed a year ago or is that an oddity in RedMine? All the discussion is in the last day...

This to me is a bug in the GenBank data, rather than this:

<pre>
                     /product="Glutamate synthase [NADPH] small chain (EC 1.4.1
                     .13)"
</pre> 

the data should have been line-split in a more sensible place, e.g.

<pre>
                     /product="Glutamate synthase [NADPH] small chain (EC
                     1.4.1.13)"
</pre>

In any case, the suggested fix is inappropriate for two reasons. First, as noted by Paul, it would remove the white space between words (the typical case). Second, the GenBank parser uses a scanner/consumer, with the GenBank specific consumer attempting to closely model the underlying data (and in this case keep the new lines as given) while the SeqRecord consumer (used by SeqIO) would convert the newlines into spaces. As noted by Paul, the translation value is a special case.
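
A toy model of the two stages described above, not the actual Scanner code: the function names are hypothetical. The first function keeps the line breaks as given, the second converts them to spaces for most qualifiers and removes them for the translation special case.

```python
def join_qualifier_lines(lines):
    """Keep the data as given: continuation lines joined with newlines."""
    return "\n".join(line.strip() for line in lines)


def to_seqrecord_value(key, raw_value):
    """Convert a raw quoted qualifier value for use in a SeqRecord."""
    value = raw_value.strip('"')
    if key == "translation":
        # Protein sequences are wrapped mid-word, so drop all whitespace.
        return value.replace("\n", "").replace(" ", "")
    # Ordinary text is wrapped between words, so a newline becomes a space.
    return value.replace("\n", " ")


raw = join_qualifier_lines([
    '"Glutamate synthase [NADPH] small chain (EC 1.4.1',
    '.13)"',
])
product = to_seqrecord_value("product", raw)
protein = to_seqrecord_value("translation", '"MKV\nLAA"')
```

Dropping the newline for every qualifier would fix this one record, but would glue together the words of every product name that was wrapped at a space.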

Closing issue.
----------------------------------------
Bug #3297: newline added in quated features
https://redmine.open-bio.org/issues/3297

Author: Jesse van Dam
Status: Rejected
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: 
Target version: 
URL: 


Note: sorry for the duplicate reporting, did not notice the makeup of the bug reporting system

When I have a feature line like this (spanning multiple lines) in a GenBank file

<pre>
                     /product="Glutamate synthase [NADPH] small chain (EC 1.4.1
                     .13)"

</pre>

Then a space/newline will be added between 1.4.1 and .13 in the result so when printing the feature with the following code
<pre>
  print(source[0].qualifiers["product"])
</pre>

It will print (with an unwanted space) 
<pre>
Glutamate synthase [NADPH] small chain (EC 1.4.1 .13)
</pre>

I changed the following in scanner.py to fix this problem:
<pre>
                    elif value[0]=='"':
                        #Quoted...
                        if value[-1]!='"' or value!='"':
                            #No closing quote on the first line...
                            while value[-1] != '"':
-                               value += "\n" + iterator.next() 
+                               value += iterator.next() 
                        else:
                            #One single line (quoted)
                            assert value == '"'
                            if self.debug : print "Quoted line %s:%s" % (key, value)
                        #DO NOT remove the quotes...
                        qualifiers.append((key,value))

</pre>




From w.arindrarto at gmail.com  Thu Nov  1 14:36:36 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Thu, 1 Nov 2012 15:36:36 +0100
Subject: [Biopython-dev] Working with the new SearchIO API
In-Reply-To: <50923C73.8060609@biotech.uni-tuebingen.de>
References: <1351645938.62302.BPMail_high_noncarrier@web164001.mail.gq1.yahoo.com>
	<CADEGkF4xUKRGWO4e7jHKu9u+itVarvXm7NkotkpnG=wWqe54OQ@mail.gmail.com>
	<50923C73.8060609@biotech.uni-tuebingen.de>
Message-ID: <CADEGkF4=x2Bt0k6gAg=tRwP7Po9wu-sLncPcjT=gyRJ8cjsGaw@mail.gmail.com>

Hi Kai,

You're welcome :). I was thinking of changing Hit to be similar to
QueryResult, which you can create without it containing any items. The
trade-off is that there are more attributes to keep track of (4 instead of 2)
because they would be stored apart from the contained objects, so I chose
not to do it for now.

Anyway, let me know if there are still parsing difficulties because of the
object model.
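
To make the constraint concrete, here is a minimal sketch of the idea. It is not the actual SearchIO code: ToyHSP and ToyHit are hypothetical stand-ins, and the hit names are invented. The hit keeps no ID of its own and reads it from its first HSP, so an empty hit can be created and appended to, but raises if the ID is accessed.

```python
class ToyHSP(object):
    """Stand-in for an HSP; it is the only place the hit ID is stored."""

    def __init__(self, hit_id, evalue):
        self.hit_id = hit_id
        self.evalue = evalue


class ToyHit(object):
    """Stand-in for a Hit that may start out without any HSPs."""

    def __init__(self, hsps=()):
        self._hsps = list(hsps)

    def append(self, hsp):
        # Appending never touches the ID, so it works on an empty hit.
        self._hsps.append(hsp)

    def __len__(self):
        return len(self._hsps)

    @property
    def id(self):
        if not self._hsps:
            raise AttributeError("empty hit: the ID lives in its HSPs")
        return self._hsps[0].hit_id


# Create empty hits in hit-table order, then fill them in whatever
# order the domain table happens to list them.
hit_order = ["SH3_1", "PF00018"]
hits = dict((name, ToyHit()) for name in hit_order)
for name, evalue in [("PF00018", 1e-5), ("SH3_1", 2e-9)]:
    hits[name].append(ToyHSP(name, evalue))

ordered_ids = [hits[name].id for name in hit_order]
```

With this layout a parser can keep the order of the hit table while attaching HSPs as they are read, which is the use case from the hmmpfam output.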

cheers,
Bow


On Thu, Nov 1, 2012 at 10:10 AM, Kai Blin <kai.blin at biotech.uni-tuebingen.de
> wrote:

> On 2012-11-01 09:19, Wibowo Arindrarto wrote:
>
> Hi Bow,
>
> > 2. Regarding the Hit object API change, I've changed it so that Hit
> > objects can now be created without any HSPs
> > (
> https://github.com/bow/biopython/commit/e9137c9ed88c09f6e488f50184292cac474327c4
> ).
> > However, per my explanation about keeping as few places possible to
> > store the same value (in this case the hit and query ID and
> > description), the empty Hit object will raise errors if any of these
> > attributes are accessed. Setting and getting these attributes will
> > only work if there is at least one HSP in the Hit. Other Hit
> > functions, like append, should work ok as long as it doesn't involve
> > accessing these attributes. I think this will allow parsing of file
> > formats like HMMER2 plain text while maintaining the attribute storage
> > constraint.
>
> I totally agree the Hit object isn't valid until it has at least one
> HSP. Thanks for that change.
>
> Cheers,
> Kai
>
> --
> Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-University of Tübingen
> Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
> D-72076 Tübingen                        Fax :   ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
>


From eric.talevich at gmail.com  Thu Nov  1 18:10:17 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Thu, 1 Nov 2012 14:10:17 -0400
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAKVJ-_6uVs=VE6boPAHgPTHgrBS-Q9UrGL+_63V6Go=ch-8oEw@mail.gmail.com>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
	<508A8041.2020203@biotech.uni-tuebingen.de>
	<CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
	<87pq42s9lt.fsf@fastmail.fm>
	<CAKVJ-_4GzU+5vMXd1XLvycV=tK6xcgMoSA53cjNmYC4fGoPM6w@mail.gmail.com>
	<874nldqi3t.fsf@fastmail.fm>
	<CAKVJ-_6uVs=VE6boPAHgPTHgrBS-Q9UrGL+_63V6Go=ch-8oEw@mail.gmail.com>
Message-ID: <CAMC681=_Bjms0jbb7+7TKRWtaeRVNbT-Jtx6wucv398KH0xO4A@mail.gmail.com>

On Tue, Oct 30, 2012 at 7:03 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Mon, Oct 29, 2012 at 5:54 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> >
> > Peter;
> >
> >> In the case of Bow's SearchIO code, what would you prefer?
> >> e.g. Bio.SearchIO as it is now on his branch?
> >
> > I like plain ol' Search the best but don't have a strong preference. I'm
> > terrible at naming things so trust everyone's judgment on this.
> >
> > Brad
>
> Since we have no clear consensus, I propose we add Bow's code
> as Bio.SearchIO (which is how it is written right now), with the new
> BiopythonExperimentalWarning in place (to alert people that it may
> change in the next release). We can then rename or move it at a
> later date. This will make it easier for people to test the code, and
> also suggest further changes or additions (e.g. Kai's HMMER work).
>
> If and when we agree on a consolidation of the Bio.SeqXXX
> modules, then Bio.SearchIO could move too. If this happens
> before any public release as Bio.SearchIO so much the better.
>
> Adopting lower case module names under Python 3 is also a
> separate issue.
>
> Peter
>
>
+1
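
For what it's worth, flagging a module as experimental could look roughly like the sketch below. ExperimentalWarning and load_experimental_module are stand-in names defined here for illustration; they are not the actual Biopython code.

```python
import warnings


class ExperimentalWarning(Warning):
    """Stand-in for the proposed BiopythonExperimentalWarning."""


def load_experimental_module(name):
    """Pretend to import ``name`` and warn that its API may change."""
    warnings.warn(
        "%s is experimental and may change in the next release." % name,
        ExperimentalWarning,
        stacklevel=2,
    )
    return name


# Users who accept the risk can silence just this category.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    load_experimental_module("Bio.SearchIO")
    warnings.simplefilter("ignore", ExperimentalWarning)
    load_experimental_module("Bio.SearchIO")

print(len(caught))  # only the first call is recorded
```

Using a dedicated warning category means testers can filter it out without hiding other warnings.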

Regarding the "great upheaval" of module renaming and reorganization:

0. If the only change is to combine the SeqIO, Seq, SeqRecord and
SeqFeature classes under a single module, we probably can do that in a
backwards-compatible way. But that means keeping our StudlyCaps module
names for the most part.

1. If we're going to change the API substantially, we might as well "do it
right". Besides our PEP8 non-compliance, there are some dark, dusty corners
of Biopython that we ought to clean up while we're at it -- reorganize the
little historical fiefdoms into a coherent structure. We'd call it
Biopython 2.

2. Observing BioPerl and BioRuby, it could make sense to split the
distribution into multiple, with a sequence- and data-oriented
"biopython-core" package and separate packages for, say, 3D structures
("biopython-struct") and perhaps other existing components that have ready
maintainers and which the "core" of Biopython doesn't rely on. I don't
think we need to fragment the code base much, primarily just extract PDB,
SCOP and the other parts that depend on NumPy. On GitHub, these
repositories would still be under the biopython organization name.

3. If we've decided to focus on Python 3 for the reorganization, we can
take advantage of new features in that lineage for packaging, organization
and distribution. These features could make it easier to have side-by-side
Biopython 1 and 2 installations (maybe), and also plugging additional
modules into the main "bio" package (namespace packages, new in Py3.3).

4. Naming: "bio" is clean but might cause problems on Windows? (I wouldn't
know, nyah); "bio2" is nearly as clean; "biopy" follows the numpy/scipy
convention.

5. Porting: I, personally, would keep using the old Biopython for
everything that's meant to run on Python 2, which is, currently,
everything. Biopython2 running on Python 3 would give me an excuse to start
using Python 3 for new code. Keeping these separate would be more difficult
if the lowercasing were done under the same "Bio" namespace.

Thoughts?

-Eric


From p.j.a.cock at googlemail.com  Thu Nov  1 18:46:36 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 1 Nov 2012 18:46:36 +0000
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAMC681=_Bjms0jbb7+7TKRWtaeRVNbT-Jtx6wucv398KH0xO4A@mail.gmail.com>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
	<508A8041.2020203@biotech.uni-tuebingen.de>
	<CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
	<87pq42s9lt.fsf@fastmail.fm>
	<CAKVJ-_4GzU+5vMXd1XLvycV=tK6xcgMoSA53cjNmYC4fGoPM6w@mail.gmail.com>
	<874nldqi3t.fsf@fastmail.fm>
	<CAKVJ-_6uVs=VE6boPAHgPTHgrBS-Q9UrGL+_63V6Go=ch-8oEw@mail.gmail.com>
	<CAMC681=_Bjms0jbb7+7TKRWtaeRVNbT-Jtx6wucv398KH0xO4A@mail.gmail.com>
Message-ID: <CAKVJ-_5XPbUaNy=OWq7prO+Q+evmr+jtrgtW1xyM82_O+PeYfA@mail.gmail.com>

On Thu, Nov 1, 2012 at 6:10 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> On Tue, Oct 30, 2012 at 7:03 AM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>>
>> Since we have no clear consensus, I propose we add Bow's code
>> as Bio.SearchIO (which is how it is written right now), with the new
>> BiopythonExperimentalWarning in place (to alert people that it may
>> change in the next release). We can then rename or move it at a
>> later date. This will make it easier for people to test the code, and
>> also suggest further changes or additions (e.g. Kai's HMMER work).
>>
>> If and when we agree on a consolidation of the Bio.SeqXXX
>> modules, then Bio.SearchIO could move too. If this happens
>> before any public release as Bio.SearchIO so much the better.
>>
>> Adopting lower case module names under Python 3 is also a
>> separate issue.
>>
>> Peter
>>
>
> +1
>
> Regarding the "great upheaval" of module renaming and reorganization:
>
> 0. If the only change is to combine the SeqIO, Seq, SeqRecord and
> SeqFeature classes under a single module, we probably can do that
> in a backwards-compatible way. But that means keeping our
> StudlyCaps module names for the most part.

Yes, that is something we could do in a backwards compatible way,
with the old "StudlyCaps" Bio.SeqXXX modules persisting as legacy
imports for at least a year (say). But is it worth it? See below.
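
A legacy import of this kind could be sketched as follows. All module names here (seq_combined_demo, SeqLegacyDemo) and the LegacyAlias class are hypothetical, chosen so the example runs without Biopython; it only shows the forwarding idea, under the assumption that a deprecation warning on first use is what we would want.

```python
import importlib
import sys
import types
import warnings

# Hypothetical combined module standing in for a merged Bio.SeqXXX.
combined = types.ModuleType("seq_combined_demo")
combined.Seq = type("Seq", (str,), {})
sys.modules["seq_combined_demo"] = combined


class LegacyAlias(types.ModuleType):
    """Old module name that forwards attribute access to the new one."""

    def __init__(self, old_name, new_name):
        super().__init__(old_name)
        self._new_name = new_name

    def __getattr__(self, attr):
        # Only called for names not found on the alias itself.
        if attr.startswith("__"):
            raise AttributeError(attr)
        warnings.warn(
            "%s is a legacy import; use %s instead"
            % (self.__name__, self._new_name),
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(sys.modules[self._new_name], attr)


sys.modules["SeqLegacyDemo"] = LegacyAlias("SeqLegacyDemo", "seq_combined_demo")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = importlib.import_module("SeqLegacyDemo")
    same_class = legacy.Seq is combined.Seq
```

Old code keeps working and gets the same class object, while the warning tells people where the name has moved.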

> 1. If we're going to change the API substantially, we might as well "do it
> right". Besides our PEP8 non-compliance, there are some dark, dusty corners
> of Biopython that we ought to clean up while we're at it -- reorganize the
> little historical fiefdoms into a coherent structure. We'd call it Biopython
> 2.

Absolutely there are things we've lived with out of backwards
compatibility - the Alphabet objects are one example (foremost
the way gaps and stop codons were done with wrapper objects).
I'd also like us to switch the restriction digest module to using zero
based counting as Guido intended, and simplify some of the
more 'magical' code which has caused trouble porting to the
other Python implementations.

> 2. Observing BioPerl and BioRuby, it could make sense to split the
> distribution into multiple, with a sequence- and data-oriented
> "biopython-core" package and separate packages for, say, 3D structures
> ("biopython-struct") and perhaps other existing components that have ready
> maintainers and which the "core" of Biopython doesn't rely on. I don't think
> we need to fragment the code base much, primarily just extract PDB, SCOP and
> the other parts that depend on NumPy. On GitHub, these repositories would
> still be under the biopython organization name.

A clearer divide would be good - something we have at some level
already, along the lines of with and without numpy. However, given
the still unclear future for python packaging I'm not quite so sure
if we can/should go all the way to separate packages. Perhaps I
am being unduly worried by the concerns in the numpy/scipy
community? After all, we have no fortran code!

> 3. If we've decided to focus on Python 3 for the reorganization, we can take
> advantage of new features in that lineage for packaging, organization and
> distribution. These features could make it easier to have side-by-side
> Biopython 1 and 2 installations (maybe), and also plugging additional
> modules into the main "bio" package (namespace packages, new in Py3.3).

We can and should port the current namespace to Python 3, but
writing "Biopython 2" for Python 3 only (not Python 2) sounds wise.
More on this below.
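
The namespace package mechanism mentioned in point 3 (PEP 420, new in Python 3.3) can be sketched as below. The package name bio_demo and the two directory names are invented for the example; the point is only that two separately installed directories can contribute subpackages to one top-level name when that name has no __init__.py.

```python
import importlib
import os
import sys
import tempfile


def import_from_split_distributions():
    """Import two subpackages of one namespace from separate directories."""
    with tempfile.TemporaryDirectory() as root:
        roots = []
        for dist, sub in (("core_dist", "seq"), ("struct_dist", "struct")):
            pkg_dir = os.path.join(root, dist, "bio_demo", sub)
            os.makedirs(pkg_dir)
            # Only the subpackage gets an __init__.py; the shared
            # "bio_demo" directory has none, making it a namespace package.
            with open(os.path.join(pkg_dir, "__init__.py"), "w") as handle:
                handle.write("ORIGIN = %r\n" % dist)
            roots.append(os.path.join(root, dist))

        sys.path[:0] = roots
        importlib.invalidate_caches()
        try:
            seq = importlib.import_module("bio_demo.seq")
            struct = importlib.import_module("bio_demo.struct")
            return seq.ORIGIN, struct.ORIGIN
        finally:
            # Undo the changes so the demo leaves no trace behind.
            for path in roots:
                sys.path.remove(path)
            for name in [m for m in sys.modules if m.split(".")[0] == "bio_demo"]:
                del sys.modules[name]


origins = import_from_split_distributions()
print(origins)
```
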

> 4. Naming: "bio" is clean but might cause problems on Windows? (I wouldn't
> know, nyah); "bio2" is nearly as clean; "biopy" follows the numpy/scipy
> convention.

As noted before, we couldn't use "bio" on the average Mac either - the
default file system is like Windows, case insensitive.
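
A quick way to see whether a given machine is affected is sketched below. The helper name is_case_insensitive is made up; it simply creates a "Bio" directory in a scratch location and checks whether "bio" resolves to it.

```python
import os
import tempfile


def is_case_insensitive(directory):
    """Return True if ``directory`` treats 'Bio' and 'bio' as the same name."""
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        os.mkdir(os.path.join(scratch, "Bio"))
        return os.path.isdir(os.path.join(scratch, "bio"))


clash = is_case_insensitive(tempfile.gettempdir())
print("'Bio' and 'bio' would collide here:", clash)
```
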

The name biopy is in line with numpy/scipy, which is a plus. I know
not everyone liked this name, but personally it seems fine. Better
than bio2 in my view.

> 5. Porting: I, personally, would keep using the old Biopython for everything
> that's meant to run on Python 2, which is, currently, everything. Biopython2
> running on Python 3 would give me an excuse to start using Python 3 for new
> code. Keeping these separate would be more difficult if the lowercasing were
> done under the same "Bio" namespace.
>
> Thoughts?

As noted above, I'm on board with planning a Biopython 2 requiring Python 3
or later. I would regard this as effectively forking from the current code
base, porting individual modules on a case by case basis (doing a final 2to3
conversion manually as part of this). The code could be shared as a series
of 'alpha' level releases for early testing - I assume we would want to make
some releases, particularly for Windows where fewer potential testers would
have all the compilers set up to follow the repository.

However, if we do that, we would still support Biopython 1.xx under
Python 3 as well (via 2to3 as we are now, currently 'beta' level support)
for some time in parallel (although likely not getting major new features -
just bug fixes and if required updates for format changes).

Is there enough enthusiasm now to start planning what we'd change for
a (potentially Python 3 only) Biopython 2 yet?

Peter


From p.j.a.cock at googlemail.com  Thu Nov  1 19:40:32 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 1 Nov 2012 19:40:32 +0000
Subject: [Biopython-dev] Fwd: OBF server outage announcement / call for
	SysAdmin volunteers
In-Reply-To: <CAKVJ-_56TAQR4ULW=tviSrzYvjRaJBmoLdWFDT9UG3LqeM2EJA@mail.gmail.com>
References: <CAKVJ-_56TAQR4ULW=tviSrzYvjRaJBmoLdWFDT9UG3LqeM2EJA@mail.gmail.com>
Message-ID: <CAKVJ-_4xgdurC8y54R7LFPPSvEqdTrY9gNnv7kmNGnaFpDmPCA@mail.gmail.com>

FYI regarding the Biopython website and recent mailing list outage.

Peter

PS you can also keep an eye on @Biopython and @OBF_news on Twitter,
which are a useful alternative when the mailing lists are down.

---------- Forwarded message ----------
From: *Peter Cock*
Date: Thursday, November 1, 2012
Subject: OBF server outage announcement / call for SysAdmin volunteers
To: open-bio-l at lists.open-bio.org, OBF Members <members at lists.open-bio.org>
Cc: Chris Dagdigian <chris at bioteam.net>, OBF Board <board at open-bio.org>


Dear all,

As many of you may have noticed, yesterday the Open Bioinformatics
Foundation (OBF) server hosting the mailing lists and most of the
Bio* websites went down.

The mailing lists and simple static webpages (e.g. download pages
for Bio* releases) seem to be back online, as is the OBF news blog:
http://news.open-bio.org/news/ - but the wiki pages are down
(which unfortunately means the Bio* homepages are unavailable).

Services on the failing server are being moved to virtual machines
on the Amazon Cloud, so it may take a few days until everything
has been set up properly and the wiki will be back.

If there is anybody from the Bio* projects who wants to join the OBF's
SysAdmin team and help out with projects like this one, this would be
a good moment to volunteer - please email me or Chris Dagdigian
(the OBF Treasurer and our head Systems Administrator).

Thank you, and please bear with us,

Peter
On behalf of the OBF Board of Directors.


From p.j.a.cock at googlemail.com  Thu Nov  1 19:50:50 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 1 Nov 2012 19:50:50 +0000
Subject: [Biopython-dev] OBF server outage announcement / call for
	SysAdmin volunteers
In-Reply-To: <CAKVJ-_4xgdurC8y54R7LFPPSvEqdTrY9gNnv7kmNGnaFpDmPCA@mail.gmail.com>
References: <CAKVJ-_56TAQR4ULW=tviSrzYvjRaJBmoLdWFDT9UG3LqeM2EJA@mail.gmail.com>
	<CAKVJ-_4xgdurC8y54R7LFPPSvEqdTrY9gNnv7kmNGnaFpDmPCA@mail.gmail.com>
Message-ID: <CAKVJ-_7_bnatVHfsKrNOoqjU+D8qbf05En9aFbi-4xXEpFFpSA@mail.gmail.com>

On Thu, Nov 1, 2012 at 7:40 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> FYI regarding the Biopython website and recent mailing list outage.
>
> Peter
>
> PS you can also keep an eye on @Biopython and @OBF_news on Twitter,
> which are a useful alternative when the mailing lists are down.
>
> <snip>

I should have added that while the wiki is down (which does
unfortunately include the Biopython home page), the Biopython
downloads remain available via http://biopython.org/DIST/ and
other 'static' content like the Tutorial and API pages are up:

http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
http://biopython.org/DIST/docs/api/

Our source code repository is on GitHub, also fine:
https://github.com/biopython/biopython

Issue tracking is on our RedMine server, also fine:
https://redmine.open-bio.org/projects/biopython

Nightly unit tests are on our Buildbot server, also fine:
http://testing.open-bio.org/biopython/tgrid

Continuous integration testing is on TravisCI, also fine:
http://travis-ci.org/biopython/biopython

Regards,

Peter


From andrewscz at gmail.com  Thu Nov  1 20:32:10 2012
From: andrewscz at gmail.com (Andrew Sczesnak)
Date: Thu, 1 Nov 2012 13:32:10 -0700
Subject: [Biopython-dev] Pull Request: MafIO.py
In-Reply-To: <CAFMxBqGxbTSvPkeE2MeKdM4owLCjpzSE2B3-uezem1mA7=gAPw@mail.gmail.com>
References: <mailman.1.1351699203.6679.biopython-dev@lists.open-bio.org>
	<620A45B10433AE4C81D3F931A02812F93BE3FB5721@LESMBX1.adf.bham.ac.uk>
	<CAFMxBqGxbTSvPkeE2MeKdM4owLCjpzSE2B3-uezem1mA7=gAPw@mail.gmail.com>
Message-ID: <CAMNDT_jyUR4tHOhOHSLqUCUvxnd=Wz3Le3wu26bPgp4h9cz9wg@mail.gmail.com>

Thanks Nick! I updated the MafIO branch to allow reading of other key
names not specified in the MAF spec. However, writing is still
restricted to "score" and "pass" keys.
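
Roughly, the behaviour is as in the sketch below. This is a simplified illustration, not the code on the branch: parse_a_line, format_a_line and WRITABLE_KEYS are names invented here, and the sample line is made up.

```python
WRITABLE_KEYS = ("score", "pass")


def parse_a_line(line):
    """Return the key=value pairs of a MAF 'a' line as a dict of strings."""
    fields = line.strip().split()
    if not fields or fields[0] != "a":
        raise ValueError("not an 'a' line: %r" % line)
    # Accept any key on reading, including Mugsy's "label" and "mult".
    return dict(field.split("=", 1) for field in fields[1:])


def format_a_line(annotations):
    """Write an 'a' line, keeping only the keys allowed on output."""
    parts = ["a"]
    for key in WRITABLE_KEYS:
        if key in annotations:
            parts.append("%s=%s" % (key, annotations[key]))
    return " ".join(parts)


parsed = parse_a_line("a score=1000 label=1 mult=2\n")
written = format_a_line(parsed)
print(parsed)
print(written)
```
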

On Thu, Nov 1, 2012 at 4:51 AM, Nick Loman <n.j.loman at bham.ac.uk> wrote:
> Hi Andrew
>
> Here you go:
>
> https://gist.github.com/58bc53d492ecc112d926
>
> Thanks for your help
>
> Regards
>
> Nick
>
>
>
> On Wed, Oct 31, 2012 at 6:10 PM, Andrew Sczesnak <andrewscz at gmail.com>
> wrote:
>>
>> Nick,
>>
>> Can you provide a snippet of a file from mugsy for the unit tests?
>>
>> Thanks,
>> Andrew
>>
>> On Oct 31, 2012, at 9:00 AM, biopython-dev-request at lists.open-bio.org
>> wrote:
>>
>> > From: Nick Loman <n.j.loman at bham.ac.uk>
>> > Date: Tue, Oct 30, 2012 at 6:34 AM
>> > Subject: Pull Request: MafIO.py
>> >
>> >
>> > Hi there
>> >
>> > Thanks for the MafIO branch. In order to get it to read MAF files
>> > produced
>> > by Mugsy (mugsy.sourceforge.net) I had to make the following change:
>> >
>> > diff --git a/Bio/AlignIO/MafIO.py b/Bio/AlignIO/MafIO.py
>> > index 6eda0ca..4bb1407 100644
>> > --- a/Bio/AlignIO/MafIO.py
>> > +++ b/Bio/AlignIO/MafIO.py
>> > @@ -178,7 +178,7 @@ def MafIterator(handle, seq_count = None, alphabet =
>> > single_letter_alphabet):
>> >
>> >              annotations = dict([x.split("=") for x in
>> > line.strip().split()[1:]])
>> >
>> > -            if len([x for x in annotations.keys() if x not in ("score",
>> > "pass")]) > 0:
>> > +            if len([x for x in annotations.keys() if x not in ("score",
>> > "pass", "label", "mult")]) > 0:
>> >                 raise ValueError("Error parsing alignment - invalid key
>> > in
>> > 'a' line")
>> >         elif line.startswith("#"):
>> >             # ignore comments
>> >
>> >
>> > My Python fork is a bit confusing right now so hope you don't mind me
>> > sending this pull request via email!
>> >
>> > Cheers
>> >
>> > Nick
>
>


From eric.talevich at gmail.com  Fri Nov  2 02:47:56 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Thu, 1 Nov 2012 22:47:56 -0400
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAKVJ-_5XPbUaNy=OWq7prO+Q+evmr+jtrgtW1xyM82_O+PeYfA@mail.gmail.com>
References: <CAKVJ-_7=qK=_XjV4DYBgY8g1E5K=9dRVoe590HU_cwLfTdvCjQ@mail.gmail.com>
	<1346913117.35905.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_4M1q9fw4N9XZ+hQ4BzeWsg4vX5NBwjSbB0J3Yss-pAPw@mail.gmail.com>
	<508A694B.7030800@biotech.uni-tuebingen.de>
	<CAKVJ-_5WWiDQOH8QJRvsa92SO4iQnu-zn9U1v4ow=vT7TTtk4Q@mail.gmail.com>
	<508A8041.2020203@biotech.uni-tuebingen.de>
	<CAKVJ-_7zMFxHOcmawg9FMsApWQ_J5NqOyRofdi0pe3DgMG2NLQ@mail.gmail.com>
	<87pq42s9lt.fsf@fastmail.fm>
	<CAKVJ-_4GzU+5vMXd1XLvycV=tK6xcgMoSA53cjNmYC4fGoPM6w@mail.gmail.com>
	<874nldqi3t.fsf@fastmail.fm>
	<CAKVJ-_6uVs=VE6boPAHgPTHgrBS-Q9UrGL+_63V6Go=ch-8oEw@mail.gmail.com>
	<CAMC681=_Bjms0jbb7+7TKRWtaeRVNbT-Jtx6wucv398KH0xO4A@mail.gmail.com>
	<CAKVJ-_5XPbUaNy=OWq7prO+Q+evmr+jtrgtW1xyM82_O+PeYfA@mail.gmail.com>
Message-ID: <CAMC681kXJedKQKkHp82ar6ndwRwe7ymMsfD6sm6j5Ok2RunjCg@mail.gmail.com>

On Thu, Nov 1, 2012 at 2:46 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Thu, Nov 1, 2012 at 6:10 PM, Eric Talevich <eric.talevich at gmail.com>
> wrote:
>
> > 2. Observing BioPerl and BioRuby, it could make sense to split the
> > distribution into multiple, with a sequence- and data-oriented
> > "biopython-core" package and separate packages for, say, 3D structures
> > ("biopython-struct") and perhaps other existing components that have
> ready
> > maintainers and which the "core" of Biopython doesn't rely on. I don't
> think
> > we need to fragment the code base much, primarily just extract PDB, SCOP
> and
> > the other parts that depend on NumPy. On GitHub, these repositories would
> > still be under the biopython organization name.
>
> A clearer divide would be good - something we have at some level
> already along the lines with and without numpy. However, given
> the still unclear future for python packaging I'm not quite so sure
> if we can/should go all the way to separate packages. Perhaps I
> am being unduly worried by the concerns in the numpy/scipy
> community? After all, we have no fortran code!
>

My own use of packaging features and setuptools in particular is pretty
primitive, so I'm not sure what the risks are.

Having a separate repository for structure-related code would make it much
easier for me and João to hack on a Bio.PDB successor, I think. It would
also be nice to have a dependency-free "core" and then a bit more
flexibility in using dependencies for add-on packages -- there are a lot of
good existing libraries for structural biology, for instance, and since
performance is so important there we even might want to start using Cython
for some of that code. Then there's Lenna's pure-Python mmCIF parser which
depends on PLY.


> > 5. Porting: I, personally, would keep using the old Biopython for
> everything
> > that's meant to run on Python 2, which is, currently, everything.
> Biopython2
> > running on Python 3 would give me an excuse to start using Python 3 for
> new
> > code. Keeping these separate would be more difficult if the lowercasing
> were
> > done under the same "Bio" namespace.
> >
> > Thoughts?
>
>
> As noted above, I'm on board with planning a Biopython 2 requiring Python 3
> or later. I would regard this as effectively be forking from the current
> code
> base, porting individual modules on a case by case basis (doing a final
> 2to3
> conversion manually as part of this). The code could be shared as a series
> of 'alpha' level releases for early testing - assume we want to make some
> releases, particularly for Windows where fewer potential testers would
> have all the compilers setup to follow the repository.
>
>
Sounds good to me.


> However, if we do that, we would still support Biopython 1.xx under
> Python 3 as well (via 2to3 as we are now, currently 'beta' level support)
> for some time in parallel (although likely not getting major new features -
> just bug fixes and if required updates for format changes).
>
>
Sure. I'm assuming it will be some time before we have a Biopython2 we're
happy with, sorting out the module organization, dusting off old code,
dealing with module-specific dependencies and so on, and I'm OK with that.


> Is there enough enthusiasm now to start planning what we'd change for
> a (potentially Python 3 only) Biopython 2 yet?
>
> Peter
>

Maybe a good time to create the initial fork would be after we've merged
the latest GSoC work and any feasible long-running branches. The
Bio.PDB-related GSoC work, on the other hand, seems to be held up
specifically because we're afraid to muck with the existing sub-package too
much with unstable new code, and I can imagine it would be easier to land
it in a new namespace.

-Eric


From mjldehoon at yahoo.com  Fri Nov  2 16:01:35 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 2 Nov 2012 09:01:35 -0700 (PDT)
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <CAMC681=_Bjms0jbb7+7TKRWtaeRVNbT-Jtx6wucv398KH0xO4A@mail.gmail.com>
Message-ID: <1351872095.63086.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Hi everybody,

--- On Thu, 11/1/12, Eric Talevich <eric.talevich at gmail.com> wrote:
> 1. If we're going to change the API substantially, we might
> as well "do it right". Besides our PEP8 non-compliance, there
> are some dark, dusty corners of Biopython that we ought to clean
> up while we're at it -- reorganize the little historical fiefdoms
> into a coherent structure. We'd call it Biopython 2.

+1.

> 2. Observing BioPerl and BioRuby, it could make sense to
> split the distribution into multiple, with a sequence- and
> data-oriented "biopython-core" package and separate packages
> for, say, 3D structures ("biopython-struct") and perhaps other 
> existing components that have ready
> maintainers and which the "core" of Biopython doesn't rely
> on. I don't think we need to fragment the code base much,
> primarily just extract PDB, SCOP and the other parts that
> depend on NumPy.

This goes against the "coherent structure" in point 1. What is the advantage of splitting the distribution according to whether a module needs NumPy or not? I don't see an advantage to the user, and I don't see an advantage to the developers either. Already I feel that we need to install too many packages to get going with Python in bioinformatics (Python itself, NumPy, Matplotlib and its dependencies, Pysam, Cython (needed to compile Pysam), ezsetup, perhaps SciPy, Biopython). I find this hard to explain to people new to bioinformatics or new to Python. So I would prefer to keep one distribution.

We can be more lenient in terms of dependencies, especially those that don't occur at compile time.

> 4. Naming: "bio" is clean but might cause problems on
> Windows? (I wouldn't know, nyah); "bio2" is nearly as clean;
> "biopy" follows the numpy/scipy convention.

Any problems on Windows will only occur during a transition period, so I wouldn't worry about that too much. Perhaps we should check if there would be any problems; if they are severe, we could check for an existing Biopython installation in setup.py.
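
Such a check could be as simple as the sketch below. The function name find_existing_install is invented, and what setup.py should do with the answer (warn, abort, or carry on) is left open.

```python
import importlib.util


def find_existing_install(package="Bio"):
    """Return the location of an importable ``package``, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return None
    # Namespace packages have no single origin file.
    return spec.origin or "<namespace package>"


location = find_existing_install()
if location is None:
    print("No existing Biopython found; safe to install 'bio'.")
else:
    print("Existing Biopython found at %s" % location)
```
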

bio2 would stay with us forever (well at least until bio3) and is just plain ugly, especially to new users who are not aware of the transition. Then there is the issue that "bio2" would not be for Python 2 but for Python 3.

The "py" is needed in numpy and scipy because otherwise it would be "num" and "sci", which is too short. On the other hand, "bio" is used as a prefix in lots of words, and can stand on its own. Therefore, hurray for "bio".

> 5. Porting: I, personally, would keep using the old Biopython for
> everything that's meant to run on Python 2, which is, currently,
> everything. Biopython2 running on Python 3 would give me an
> excuse to start using Python 3 for new code. Keeping these 
> separate would be more difficult if the lowercasing were done
> under the same "Bio" namespace.

Yes that makes sense.

Best,
-Michiel.


From anaryin at gmail.com  Sat Nov  3 11:12:37 2012
From: anaryin at gmail.com (João Rodrigues)
Date: Sat, 3 Nov 2012 12:12:37 +0100
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <1351872095.63086.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <CAMC681=_Bjms0jbb7+7TKRWtaeRVNbT-Jtx6wucv398KH0xO4A@mail.gmail.com>
	<1351872095.63086.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <CAJ9sUYOwa1CF4-WTNJ36=yK2yHh0ijrMtwZtxLMKXvNKwRb3yw@mail.gmail.com>

Hi everyone,

A bit late for the party but my two cents.

I agree with Eric in that we should take the opportunity to review some
"dark corners" of the code. Regarding what I can contribute to, there are a
lot of changes planned for Bio.PDB that could benefit from a "cleaner
start".

However, in line with Michiel, I think splitting the distribution into
core/extras would be more cumbersome for new users. What about having a part
of the setup file where the user can turn installation of particular parts
of the package on or off? This way you can control whether you need the
dependencies or not. By default you would install everything as it is now,
but it would give you a larger degree of control.
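
As a rough sketch of what I mean, the setup script could filter its package list based on command line switches. The --without-<part> flags, the grouping in OPTIONAL_PARTS and the function select_packages are all hypothetical; nothing like this exists in the current setup.py as far as I know.

```python
import argparse

# Hypothetical grouping of packages by the dependency they need.
OPTIONAL_PARTS = {
    "structure": ["Bio.PDB", "Bio.SCOP"],
    "graphics": ["Bio.Graphics"],
}
CORE_PACKAGES = ["Bio", "Bio.SeqIO", "Bio.AlignIO"]


def select_packages(argv):
    """Return the packages to install given ``--without-<part>`` flags."""
    parser = argparse.ArgumentParser(add_help=False)
    for part in OPTIONAL_PARTS:
        parser.add_argument("--without-" + part, action="store_true")
    options, _unused = parser.parse_known_args(argv)

    packages = list(CORE_PACKAGES)
    for part in sorted(OPTIONAL_PARTS):
        # argparse turns "--without-structure" into "without_structure".
        if not getattr(options, "without_" + part):
            packages.extend(OPTIONAL_PARTS[part])
    return packages


print(select_packages([]))
print(select_packages(["--without-structure"]))
```

With no flags everything is installed, so the default stays as it is now.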

As for the namespace and lowercase, I don't really have strong arguments,
but I like 'bio'.

Cheers,

João

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao


2012/11/2 Michiel de Hoon <mjldehoon at yahoo.com>

> Hi everybody,
>
> --- On Thu, 11/1/12, Eric Talevich <eric.talevich at gmail.com> wrote:
> > 1. If we're going to change the API substantially, we might
> > as well "do it right". Besides our PEP8 non-compliance, there
> > are some dark, dusty corners of Biopython that we ought to clean
> > up while we're at it -- reorganize the little historical fiefdoms
> > into a coherent structure. We'd call it Biopython 2.
>
> +1.
>
> > 2. Observing BioPerl and BioRuby, it could make sense to
> > split the distribution into multiple, with a sequence- and
> > data-oriented "biopython-core" package and separate packages
> > for, say, 3D structures ("biopython-struct") and perhaps other
> > existing components that have ready
> > maintainers and which the "core" of Biopython doesn't rely
> > on. I don't think we need to fragment the code base much,
> > primarily just extract PDB, SCOP and the other parts that
> > depend on NumPy.
>
> This goes against the "coherent structure" in point 1. What is the
> advantage of splitting the distribution according to whether a module needs
> NumPy or not? I don't see an advantage to the user, and I don't see an
> advantage to the developers either. Already I feel that we need to install
> too many packages to get going with Python in bioinformatics (Python
> itself, NumPy, Matplotlib and its dependencies, Pysam, Cython (needed to
> compile Pysam), ezsetup, perhaps SciPy, Biopython). I find this hard to
> explain to people new to bioinformatics or new to Python. So I would prefer
> to keep one distribution.
>
> We can be more lenient in terms of dependencies, especially those that
> don't occur at compile time.
>
> > 4. Naming: "bio" is clean but might cause problems on
> > Windows? (I wouldn't know, nyah); "bio2" is nearly as clean;
> > "biopy" follows the numpy/scipy convention.
>
> Any problems on Windows will only occur during a transition period, so I
> wouldn't worry about that too much. Perhaps we should check if there would
> be any problems; if they are severe, we could check for an existing
> Biopython installation in setup.py.
>
> bio2 would stay with us forever (well at least until bio3) and is just
> plain ugly, especially to new users who are not aware of the transition.
> Then there is the issue that "bio2" would not be for Python 2 but for
> Python 3.
>
> The "py" is needed in numpy and scipy because otherwise it would be "num"
> and "sci", which is too short. On the other hand, "bio" is used as a prefix
> in lots of words, and can stand on its own. Therefore, hurray for "bio".
>
> > 5. Porting: I, personally, would keep using the old Biopython for
> > everything that's meant to run on Python 2, which is, currently,
> > everything. Biopython2 running on Python 3 would give me an
> > excuse to start using Python 3 for new code. Keeping these
> > separate would be more difficult if the lowercasing were done
> > under the same "Bio" namespace.
>
> Yes that makes sense.
>
> Best,
> -Michiel.
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


From tiagoantao at gmail.com  Sun Nov  4 13:09:35 2012
From: tiagoantao at gmail.com (Tiago Antão)
Date: Sun, 4 Nov 2012 13:09:35 +0000
Subject: [Biopython-dev] PEP8 lower case module names?
In-Reply-To: <1351872095.63086.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <CAMC681=_Bjms0jbb7+7TKRWtaeRVNbT-Jtx6wucv398KH0xO4A@mail.gmail.com>
	<1351872095.63086.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <CAA9RGENhu2QLKYcxdf4VRPr+1oy6dHT-LjhRC9_bQY7m-KP5gg@mail.gmail.com>

Hi,


On Fri, Nov 2, 2012 at 4:01 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> Already I feel that we need to install too many packages to get going with
> Python in bioinformatics (Python itself, NumPy, Matplotlib and its
> dependencies, Pysam, Cython (needed to compile Pysam), ezsetup, perhaps
> SciPy, Biopython). I find this hard to explain to people new to
> bioinformatics or new to Python. So I would prefer to keep one distribution.
>
> We can be more lenient in terms of dependencies, especially those that
> don't occur at compile time.
>
>
One of the things that I always found lacking with biopython is a clear,
consistent policy on dependencies: Depending on the mood of the day it
could be either good/bad to add a library dependency. As an example, this
ended up with there being a dependency on reportlab, but not on scipy.

Whatever the policy, I think that it should be consistent across the board.
Preferably simple for both users and developers.

A few ideas on policy:

1. I totally agree with the idea of being as lenient as possible with
dependencies (as you say, especially with those that do not occur at
compile time).
2. Biopython belongs to a certain software ecology. I think it would be
natural to add dependencies on well-established Python libraries.
3. (1+2) If a developer wants to add a dependency on a package, that should
not be a major problem (as long as the package is maintained for long/well
known/stable). Users should only have to deal with the dependency if they
need the functionality that depends on that package.

Python being a dynamic language, there does not have to be a burden on
users/developers if a remote part of Biopython depends on something more
exotic (which most users/developers will never see/install in any case).
Again by "exotic" I mean well known libraries with a track record of years
of stability.
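
A minimal sketch of the "only pay for what you use" idea above, assuming a
hypothetical helper called require_optional (not an existing Biopython
function): the optional package is imported inside the feature that needs
it, so users who never call that feature never need the package.

```python
import importlib


def require_optional(module_name, feature):
    """Import an optional dependency only when a feature actually needs it."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Replace the bare ImportError with a message naming the feature.
        raise ImportError(
            "%s requires the optional package %r; please install it first."
            % (feature, module_name)
        )


def draw_tree_stub():
    # Hypothetical feature: the heavy import happens at call time,
    # not when the enclosing package is imported.
    pyplot = require_optional("matplotlib.pyplot", "Tree drawing")
    return pyplot
```

Importing the module that defines draw_tree_stub stays cheap; the
ImportError only appears if someone calls it without matplotlib installed.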

Tiago
PS - Another issue that it would be interesting to see cleared up would be the
policy on compile time (linkage) dependencies. Are new ones encouraged?
What about Java/Jython based?


From p.j.a.cock at googlemail.com  Sun Nov  4 14:01:16 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 4 Nov 2012 14:01:16 +0000
Subject: [Biopython-dev] Dependency policy; was PEP8 lower case module names?
Message-ID: <CAKVJ-_5BSJvdD=oADYMZBzHAr3i6PK9u+dXuk3TLMdJVKHkEMw@mail.gmail.com>

Retitling thread

On Sun, Nov 4, 2012 at 1:09 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
> Hi,
>
>
> On Fri, Nov 2, 2012 at 4:01 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>>
>> Already I feel that we need to install too many packages to get going with
>> Python in bioinformatics (Python itself, NumPy, Matplotlib and its
>> dependencies, Pysam, Cython (needed to compile Pysam), ezsetup, perhaps
>> SciPy, Biopython). I find this hard to explain to people new to
>> bioinformatics or new to Python. So I would prefer to keep one distribution.
>>
>> We can be more lenient in terms of dependencies, especially those that
>> don't occur at compile time.
>>
>
> One of the things that I always found lacking with biopython is a clear,
> consistent policy on dependencies:

It would be good to have something written down, just as we
did with the deprecation policy.

> Depending on the mood of the day it could be either good/bad
> to add a library dependency. As an example, this ended up
> with there being a dependency on reportlab, but not on scipy.

The ReportLab dependency is a 'run time only' dependency and
has been in Biopython for a very long time. You'd have to remind
me if there was any compile time issue with scipy, but my
recollection was we were loath to add a dependency on scipy
(which is quite a complex library to install if not using a package)
for just one or two functions - however you were planning something
more substantial in the PopGen code which would justify it (using
lots of statistics).

> Whatever the policy, I think that it should be consistent across the board.
> Preferably simple for both users and developers.
>
> A few ideas on policy:
>
> 1. I totally agree with the idea of being as lenient as possible with
> dependencies (as you say, especially with those that do not occur at
> compile time).
> 2. Biopython belongs to a certain software ecology. I think it would make
> sense to see as natural adding dependencies on well established python
> libraries.
> 3. (1+2) If a developer wants to add a dependency on a package, that should
> not be a major problem (as long as the package is maintained for long/well
> known/stable). Users should only have to deal with the dependency if they
> need the functionality that depends on that package.
>
> Python being a dynamic language, there does not have to be a burden on
> users/developers if a remote part of Biopython depends on something more
> exotic (which most users/developers will never see/install in any case).
> Again by "exotic" I mean well known libraries with a track record of years
> of stability.

That all sounds reasonable. It is compile time dependencies that I am
most wary of.

However, from an end user perspective having installed Biopython and
then trying a script from a colleague and only then finding 101 optional
run time dependencies are also needed would be annoying.

For Linux packages like Debian there is a 'recommends' field for this kind
of soft dependency. Where do we stand with declaring dependencies in
setup.py so that if using a package manager like pip this is less painful?
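
One possible answer, sketched under assumptions: setuptools (not plain
distutils) accepts an extras_require keyword that plays a role similar to
Debian's 'recommends'. The group names and the package name below are
hypothetical, and whether Biopython's setup.py can rely on setuptools is
not settled in this thread.

```python
# Hypothetical grouping of optional, run time only dependencies.
EXTRAS = {
    "graphics": ["reportlab"],
    "phylo-drawing": ["networkx", "matplotlib"],
}
# Convenience group that pulls in every optional package.
EXTRAS["all"] = sorted({pkg for group in EXTRAS.values() for pkg in group})

SETUP_KWARGS = {
    "name": "example-package",  # placeholder name for illustration
    "install_requires": [],     # hard dependencies would go here
    "extras_require": EXTRAS,   # soft dependencies, installed on request
}

# A real setup.py would finish with:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```

With something like this in place, pip install "example-package[phylo-drawing]"
would also fetch networkx and matplotlib, while a plain install would not.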

In fact, how many 'soft' dependencies like this do we already have?
Just from a quick look at the README file many are not mentioned
under the current 'System Requirements' text (e.g. Network X).

> Tiago
> PS - Another issue that it would be interesting to see cleared up would be the
> policy on compile time (linkage) dependencies. Are new ones encouraged?

Currently discouraged. They make installation much more painful, and
have tended to be left untested, e.g. mmCIF was for many years disabled
by default because no one could work out how to detect its requirements
at compile time.

> What about Java/Jython based?

I'm not so keen on something providing Java/Jython only functionality.
However, something where we could require library X under Jython
while using library Y under C Python makes sense. Database access
would be a perfect example - things like Python's sqlite3 don't yet exist
under Jython.
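
A rough sketch of what "library X under Jython, library Y under C Python"
could look like. The function name pick_db_backend is made up for this
example, and zxJDBC (the DB-API to JDBC bridge that ships with Jython) is
my suggestion for "library X", not something decided here.

```python
import platform


def pick_db_backend(implementation=None):
    """Return the name of a database module suited to the interpreter."""
    if implementation is None:
        # Reports 'CPython', 'Jython', 'PyPy' or 'IronPython'.
        implementation = platform.python_implementation()
    if implementation == "Jython":
        # Under Jython this would be imported with:
        #     from com.ziclix.python.sql import zxJDBC
        return "zxJDBC"
    return "sqlite3"
```

The calling code would then import the chosen module lazily, so neither
interpreter needs the other's library installed.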

Peter


From sbassi at clubdelarazon.org  Sun Nov  4 17:34:55 2012
From: sbassi at clubdelarazon.org (Sebastian Bassi)
Date: Sun, 4 Nov 2012 14:34:55 -0300
Subject: [Biopython-dev] 403 link
Message-ID: <CAHpha49Bvusw=aYT3K22WHYGrfXPO_v62p++ObYOFuvZXMrPxA@mail.gmail.com>

On page http://biopython.org/wiki/Documentation there are 2 links to a
403 error:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
I can't correct this doc since I don't know where they are.


From p.j.a.cock at googlemail.com  Sun Nov  4 18:08:40 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 4 Nov 2012 18:08:40 +0000
Subject: [Biopython-dev] 403 link
In-Reply-To: <CAHpha49Bvusw=aYT3K22WHYGrfXPO_v62p++ObYOFuvZXMrPxA@mail.gmail.com>
References: <CAHpha49Bvusw=aYT3K22WHYGrfXPO_v62p++ObYOFuvZXMrPxA@mail.gmail.com>
Message-ID: <CAKVJ-_7PXgqT2PDd5-21pQe=nV_4UTMTcYX=uYDqGuK=t=iU=w@mail.gmail.com>

On Sun, Nov 4, 2012 at 5:34 PM, Sebastian Bassi
<sbassi at clubdelarazon.org> wrote:
> On page http://biopython.org/wiki/Documentation there are 2 links to a
> 403 error:
> http://biopython.org/DIST/docs/tutorial/Tutorial.html
> http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
> I can't correct this doc since I don't know where they are.

The links are correct - this is a side effect of the
current migration from the (dying) OBF server to
an Amazon hosted virtual machine. As of yesterday
the static pages were up and the wiki down, for
now it is the other way round... it's being worked on.

Regards,

Peter


From eric.talevich at gmail.com  Sun Nov  4 19:47:53 2012
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sun, 4 Nov 2012 14:47:53 -0500
Subject: [Biopython-dev] Dependency policy;
	was PEP8 lower case module names?
In-Reply-To: <CAKVJ-_5BSJvdD=oADYMZBzHAr3i6PK9u+dXuk3TLMdJVKHkEMw@mail.gmail.com>
References: <CAKVJ-_5BSJvdD=oADYMZBzHAr3i6PK9u+dXuk3TLMdJVKHkEMw@mail.gmail.com>
Message-ID: <CAMC681k3Bweg6_KcCJLtLHn16ZO7Y-cGzPbtaKJtez3EN1qh8Q@mail.gmail.com>

On Sun, Nov 4, 2012 at 9:01 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Retitling thread
>
> On Sun, Nov 4, 2012 at 1:09 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
> > Hi,
> >
> >
> > On Fri, Nov 2, 2012 at 4:01 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> >>
> >> Already I feel that we need to install too many packages to get going
> >> with Python in bioinformatics (Python itself, NumPy, Matplotlib and its
> >> dependencies, Pysam, Cython (needed to compile Pysam), ezsetup, perhaps
> >> SciPy, Biopython). I find this hard to explain to people new to
> >> bioinformatics or new to Python. So I would prefer to keep one
> >> distribution.
> >>
> >> We can be more lenient in terms of dependencies, especially those that
> >> don't occur at compile time.
> >>
> >
> > One of the things that I always found lacking with biopython is a clear,
> > consistent policy on dependencies:
>
> It would be good to have something written down, just as we
> did with the deprecation policy.
>

Should we start a page for this on the wiki?


> > Depending on the mood of the day it could be either good/bad
> > to add a library dependency. As an example, this ended up
> > with there being a dependency on reportlab, but not on scipy.
>
> The ReportLab dependency is a 'run time only' dependency and
> has been in Biopython for a very long time. You'd have to remind
> me if there was any compile time issue with scipy, but my
> recollection was we were loath to add a dependency on scipy
> (which is quite a complex library to install if not using a package)
> for just one or two functions - however you were planning something
> more substantial in the PopGen code which would justify it (using
> lots of statistics).
>
> > Whatever the policy, I think that it should be consistent across the board.
> > Preferably simple for both users and developers.
> >
> > A few ideas on policy:
> >
> > 1. I totally agree with the idea of being as lenient as possible with
> > dependencies (as you say, especially with those that do not occur at
> > compile time).
> > 2. Biopython belongs to a certain software ecology. I think it would make
> > sense to see as natural adding dependencies on well established python
> > libraries.
> > 3. (1+2) If a developer wants to add a dependency on a package, that
> > should not be a major problem (as long as the package is maintained for
> > long/well known/stable). Users should only have to deal with the
> > dependency if they need the functionality that depends on that package.
> >
> > Python being a dynamic language, there does not have to be a burden on
> > users/developers if a remote part of Biopython depends on something more
> > exotic (which most users/developers will never see/install in any case).
> > Again by "exotic" I mean well known libraries with a track record of
> > years of stability.
>
> That all sounds reasonable. It is compile time dependencies that I am
> most wary of.
>

Pure-Python dependencies seem less scary -- a package like PLY should work
on any Python, PyPy, Jython, and Google App Engine. Unfortunately, the
dependencies that are most tempting are the ones with essential C
extensions (numpy, scipy, matplotlib).


> However, from an end user perspective having installed Biopython and
> then trying a script from a colleague and only then finding 101 optional
> run time dependencies are also needed would be annoying.
>
> For Linux packages like Debian there is a 'recommends' field for this kind
> of soft dependency. Where do we stand with declaring dependencies in
> setup.py so that if using a package manager like pip this is less painful?
>
> In fact, how many 'soft' dependencies like this do we already have?
> Just from a quick look at the README file many are not mentioned
> under the current 'System Requirements' text (e.g. Network X).
>

I just used "git grep import Bio/" to find out. The only egregious
undocumented dependencies are the ones I added in Phylo for graphics:
networkx and matplotlib/pylab.
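
For anyone who wants something more structured than grep, here is a small
sketch that walks a source tree and collects the top-level module names
each file imports. The function name top_level_imports is just for this
example; filtering out the standard library and the package's own modules
is left to the reader.

```python
import ast
import os


def top_level_imports(root):
    """Collect top-level module names imported by .py files under root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".py"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                try:
                    tree = ast.parse(handle.read(), filename=path)
                except SyntaxError:
                    continue  # skip files this interpreter cannot parse
            for node in ast.walk(tree):
                if isinstance(node, ast.Import):
                    # "import a.b" depends on the top-level package "a".
                    found.update(a.name.split(".")[0] for a in node.names)
                elif (isinstance(node, ast.ImportFrom)
                      and node.level == 0 and node.module):
                    # Relative imports (level > 0) stay inside the package.
                    found.add(node.module.split(".")[0])
    return found
```

Unlike a plain grep, this also sees imports nested inside functions or
try/except blocks, and it ignores the word "import" in comments and
docstrings.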

Other *possible* dependencies are sqlite3 in the case of Jython
(Bio.SeqIO._index) and ordereddict for Pythons earlier than 2.7 (Bio._py3k).

Should we add these to the "install_recommends" list in setup.py?


> > Tiago
> > PS - Another issue that it would be interesting to see cleared up would be
> > the policy on compile time (linkage) dependencies. Are new ones encouraged?
>
> Currently discouraged. They make installation much more painful, and
> have tended to be left untested, e.g. mmCIF was for many years disabled
> by default because no one could work out how to detect its requirements
> at compile time.
>
> > What about Java/Jython based?
>
> I'm not so keen on something providing Java/Jython only functionality.
> However, something where we could require library X under Jython
> while using library Y under C Python makes sense. Database access
> would be a perfect example - things like Python's sqlite3 don't yet exist
> under Jython.
>
> Peter
>


From tiagoantao at gmail.com  Sun Nov  4 20:49:33 2012
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Sun, 4 Nov 2012 20:49:33 +0000
Subject: [Biopython-dev] Jython DB
Message-ID: <CAA9RGENcDf3zTtWW2NfWPC7FK9PAiAsh_LW4PxkLkCdmTXrrWg@mail.gmail.com>

Howdy,


On Sun, Nov 4, 2012 at 2:01 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Retitling thread
>

Again ;)


> while using library Y under C Python makes sense. Database access
> would be a perfect example - things like Python's sqlite3 don't yet exist
> under Jython.
>
>
I noticed that there is 1 reference to sqlite3:
Bio.SeqIO._index

Other stuff on BioSQL is really just related to database configuration and
does not impair functionality (except for a test case that really depends
on sqlite3).

I suppose that a "default" DB with Jython would probably be JavaDB (aka
Apache Derby)? It is available as a default on the Sun/Oracle JDK (though
not the JRE).

I could go ahead and have a try at evaluating the portability costs for
sqlite3->javadb. In theory it should be easy (
http://www.jython.org/jythonbook/en/1.0/DatabasesAndJython.html)

-- 
"Liberty for wolves is death to the lambs" - Isaiah Berlin


From p.j.a.cock at googlemail.com  Sun Nov  4 20:49:58 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 4 Nov 2012 20:49:58 +0000
Subject: [Biopython-dev] Dependency policy;
	was PEP8 lower case module names?
In-Reply-To: <CAMC681k3Bweg6_KcCJLtLHn16ZO7Y-cGzPbtaKJtez3EN1qh8Q@mail.gmail.com>
References: <CAKVJ-_5BSJvdD=oADYMZBzHAr3i6PK9u+dXuk3TLMdJVKHkEMw@mail.gmail.com>
	<CAMC681k3Bweg6_KcCJLtLHn16ZO7Y-cGzPbtaKJtez3EN1qh8Q@mail.gmail.com>
Message-ID: <CAKVJ-_7o1cnM2US5ZRB4C=bJ_TXoONutCEs1d7ehdpm_W0aX6w@mail.gmail.com>

On Sunday, November 4, 2012, Eric Talevich wrote:

> On Sun, Nov 4, 2012 at 9:01 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
>> Retitling thread
>>
>> On Sun, Nov 4, 2012 at 1:09 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
>> > Hi,
>> >
>> >
>> > On Fri, Nov 2, 2012 at 4:01 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>> >>
>> >> Already I feel that we need to install too many packages to get going
>> >> with Python in bioinformatics (Python itself, NumPy, Matplotlib and its
>> >> dependencies, Pysam, Cython (needed to compile Pysam), ezsetup, perhaps
>> >> SciPy, Biopython). I find this hard to explain to people new to
>> >> bioinformatics or new to Python. So I would prefer to keep one
>> >> distribution.
>> >>
>> >> We can be more lenient in terms of dependencies, especially those that
>> >> don't occur at compile time.
>> >>
>> >
>> > One of the things that I always found lacking with biopython is a clear,
>> > consistent policy on dependencies:
>>
>> It would be good to have something written down, just as we
>> did with the deprecation policy.
>>
>
> Should we start a page for this on the wiki?
>
>
The wiki is online again now :)

Maybe agree a draft by email first?


>> > Depending on the mood of the day it could be either good/bad
>> > to add a library dependency. As an example, this ended up
>> > with there being a dependency on reportlab, but not on scipy.
>>
>> The ReportLab dependency is a 'run time only' dependency and
>> has been in Biopython for a very long time. You'd have to remind
>> me if there was any compile time issue with scipy, but my
>> recollection was we were loath to add a dependency on scipy
>> (which is quite a complex library to install if not using a package)
>> for just one or two functions - however you were planning something
>> more substantial in the PopGen code which would justify it (using
>> lots of statistics).
>>
>> > Whatever the policy, I think that it should be consistent across the board.
>> > Preferably simple for both users and developers.
>> >
>> > A few ideas on policy:
>> >
>> > 1. I totally agree with the idea of being as lenient as possible
>> > with dependencies (as you say, especially with those that do not occur
>> > at compile time).
>> > 2. Biopython belongs to a certain software ecology. I think it would
>> > make sense to see as natural adding dependencies on well established
>> > python libraries.
>> > 3. (1+2) If a developer wants to add a dependency on a package, that
>> > should not be a major problem (as long as the package is maintained for
>> > long/well known/stable). Users should only have to deal with the
>> > dependency if they need the functionality that depends on that package.
>> >
>> > Python being a dynamic language, there does not have to be a burden on
>> > users/developers if a remote part of Biopython depends on something more
>> > exotic (which most users/developers will never see/install in any case).
>> > Again by "exotic" I mean well known libraries with a track record of
>> > years of stability.
>>
>> That all sounds reasonable. It is compile time dependencies that I am
>> most wary of.
>>
>
> Pure-Python dependencies seem less scary -- a package like PLY should work
> on any Python, PyPy, Jython, and Google App Engine. Unfortunately, the
> dependencies that are most tempting are the ones with essential C
> extensions (numpy, scipy, matplotlib).
>

But (for example) matplotlib wouldn't be a build time dependency
for us.


>> However, from an end user perspective having installed Biopython and
>> then trying a script from a colleague and only then finding 101 optional
>> run time dependencies are also needed would be annoying.
>>
>> For Linux packages like Debian there is a 'recommends' field for this kind
>> of soft dependency. Where do we stand with declaring dependencies in
>> setup.py so that if using a package manager like pip this is less painful?
>>
>> In fact, how many 'soft' dependencies like this do we already have?
>> Just from a quick look at the README file many are not mentioned
>> under the current 'System Requirements' text (e.g. Network X).
>>
>
> I just used "git grep import Bio/" to find out. The only egregious
> undocumented dependencies are the ones I added in Phylo for graphics:
> networkx and matplotlib/pylab.
>

Could you add those to the README file then?


> Other *possible* dependencies are sqlite3 in the case of Jython
> (Bio.SeqIO._index) and ordereddict for Pythons earlier than 2.7 (Bio._py3k).
>
> Should we add these to the "install_recommends" list in setup.py?
>

No, they are in the standard lib on C Python, except in the case
of OrderedDict on older Pythons where we bundle a backport
anyway.
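
For reference, the usual shape of such a fallback is sketched below. This
is a generic illustration, not a copy of what Bio._py3k does; the
third-party "ordereddict" package stands in for a bundled backport.

```python
try:
    # Standard library on Python 2.7 and later, including all of 3.x.
    from collections import OrderedDict
except ImportError:
    # Older interpreters: fall back to a backport. A bundled copy would
    # be imported from inside the package instead of from site-packages.
    from ordereddict import OrderedDict

counts = OrderedDict()
counts["first"] = 1
counts["second"] = 2
order = list(counts)  # keys come back in insertion order
```

Because the fallback lives in one place, the rest of the code can import
OrderedDict from that module without caring which branch was taken.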

Jython has an open bug on including the sqlite3 module,
which might be worth mentioning under a new Jython
specific section of the README.

Peter


From tiagoantao at gmail.com  Sun Nov  4 21:00:10 2012
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Sun, 4 Nov 2012 21:00:10 +0000
Subject: [Biopython-dev] Dependency policy;
	was PEP8 lower case module names?
In-Reply-To: <CAKVJ-_7o1cnM2US5ZRB4C=bJ_TXoONutCEs1d7ehdpm_W0aX6w@mail.gmail.com>
References: <CAKVJ-_5BSJvdD=oADYMZBzHAr3i6PK9u+dXuk3TLMdJVKHkEMw@mail.gmail.com>
	<CAMC681k3Bweg6_KcCJLtLHn16ZO7Y-cGzPbtaKJtez3EN1qh8Q@mail.gmail.com>
	<CAKVJ-_7o1cnM2US5ZRB4C=bJ_TXoONutCEs1d7ehdpm_W0aX6w@mail.gmail.com>
Message-ID: <CAA9RGEPwo2-az4qoL-XjAbh=cj42YABAN6QyibcWZoQWbpMk5w@mail.gmail.com>

On Sun, Nov 4, 2012 at 8:49 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Jython has an open bug on including the sqlite3 module,
>
>
This will go nowhere fast as it will be dependent on a JNI library (i.e.
linkage of C code).
The only durable option in the Java space would be a native implementation
of sqlite3.
All other options are not of the "embeddable" type (e.g. JDBC driver to
something running outside), defeating the main purpose of sqlite3.

To sum it up: I doubt that sqlite3 will be a realistic solution in the
Jython space. As per previous email, I suspect that a Python DBI to JDBC
bridge (bundled with Jython by default) + a default database (javadb/derby
or H2 or HSQLDB) is probably more realistic in the Java space.

On the Oracle JDK, JavaDB will require no extra dependencies. On other JDKs
or a JRE, Apache Derby would be needed.

-- 
"Liberty for wolves is death to the lambs" - Isaiah Berlin


From p.j.a.cock at googlemail.com  Sun Nov  4 21:47:20 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 4 Nov 2012 21:47:20 +0000
Subject: [Biopython-dev] Jython DB
In-Reply-To: <CAA9RGENcDf3zTtWW2NfWPC7FK9PAiAsh_LW4PxkLkCdmTXrrWg@mail.gmail.com>
References: <CAA9RGENcDf3zTtWW2NfWPC7FK9PAiAsh_LW4PxkLkCdmTXrrWg@mail.gmail.com>
Message-ID: <CAKVJ-_7tQ=4YgUHosXte0nED8pg_QzSey4pOJOeB+Dw6bBW65Q@mail.gmail.com>

Hi Tiago,

On Sun, Nov 4, 2012 at 8:49 PM, Tiago Antão wrote:
> Howdy,
>
> On Sun, Nov 4, 2012 at 2:01 PM, Peter Cock wrote:
>>
>> Retitling thread
>
>
> Again ;)
>
>
>>
>> while using library Y under C Python makes sense. Database access
>> would be a perfect example - things like Python's sqlite3 don't yet exist
>> under Jython.
>>
>
> I noticed that there is 1 reference to sqlite3:
> Bio.SeqIO._index
>
> Other stuff on BioSQL is really just related to database configuration and
> does not impair functionality (except for a test case that really depends
> on sqlite3).
>
> I suppose that a "default" DB with Jython would probably be JavaDB (aka
> Apache Derby)? It is available as a default on the Sun/Oracle JDK (though
> not the JRE).
>
> I could go ahead and have a try at evaluating the portability costs for
> sqlite3->javadb. In theory it should be easy
> (http://www.jython.org/jythonbook/en/1.0/DatabasesAndJython.html)

The database stuff in Biopython currently is BioSQL (which under
C Python supports a MySQL, PostgreSQL or SQLite back end)
and things like SeqIO.index which use SQLite3 directly. None of
this currently works under Jython :(

I was hoping Jython would implement an sqlite3 module which we
(and any other Python library) could just use - there seems to be
no progress on that: http://bugs.jython.org/issue1682864

Likewise the MySQLdb and PostgreSQL modules. Failing a port
allowing our current code to "just work", someone could write
alternative code for Biopython to call an appropriate Java DB
interface directly. For our BioSQL we already have a structure
to cope with a range of backends, so this should be quite clean.

In the case of Bio.SeqIO.index_db, we probably only use a fraction
of the full sqlite3 module's capabilities, so special casing this
under Jython to call JavaDB might not be too complicated...
(for anyone who knows their way around Jython and JavaDB)?
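
To give a feel for how small the needed subset might be, here is a toy
key-to-offset index on top of sqlite3. The table layout and the function
names are invented for this sketch and are not the actual schema used by
Bio.SeqIO.index_db.

```python
import sqlite3


def build_offset_index(db_path, records):
    """Store (key, filename, offset) rows so records can be found later."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS offsets "
        "(key TEXT PRIMARY KEY, filename TEXT, offset INTEGER)"
    )
    con.executemany("INSERT OR REPLACE INTO offsets VALUES (?, ?, ?)", records)
    con.commit()
    return con


def lookup(con, key):
    """Return (filename, offset) for key, or None if it is not indexed."""
    return con.execute(
        "SELECT filename, offset FROM offsets WHERE key = ?", (key,)
    ).fetchone()
```

Only connect, execute, executemany, commit and fetchone are used, which
suggests the surface a Jython replacement would have to cover is small.
Note that calling execute directly on the connection is an sqlite3
shortcut; a port to another DB-API driver would go through a cursor.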

If you fancy exploring SQLite3 under Jython, go for it :)

Peter


From p.j.a.cock at googlemail.com  Sun Nov  4 21:48:56 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 4 Nov 2012 21:48:56 +0000
Subject: [Biopython-dev] Dependency policy;
	was PEP8 lower case module names?
In-Reply-To: <CAA9RGEPwo2-az4qoL-XjAbh=cj42YABAN6QyibcWZoQWbpMk5w@mail.gmail.com>
References: <CAKVJ-_5BSJvdD=oADYMZBzHAr3i6PK9u+dXuk3TLMdJVKHkEMw@mail.gmail.com>
	<CAMC681k3Bweg6_KcCJLtLHn16ZO7Y-cGzPbtaKJtez3EN1qh8Q@mail.gmail.com>
	<CAKVJ-_7o1cnM2US5ZRB4C=bJ_TXoONutCEs1d7ehdpm_W0aX6w@mail.gmail.com>
	<CAA9RGEPwo2-az4qoL-XjAbh=cj42YABAN6QyibcWZoQWbpMk5w@mail.gmail.com>
Message-ID: <CAKVJ-_6kFvc7BOHJztBkAoQp3AMTkisLsG3OHpob3k8EmqGA=g@mail.gmail.com>

On Sun, Nov 4, 2012 at 9:00 PM, Tiago Antão wrote:
> On Sun, Nov 4, 2012 at 8:49 PM, Peter Cock wrote:
>>
>> Jython has an open bug on including the sqlite3 module,
>>
>
> This will go nowhere fast as it will be dependent on a JNI library (i.e.
> linkage of C code).
> The only durable option in the Java space would be a native implementation
> of sqlite3.
> All other options are not of the "embeddable" type (e.g. JDBC driver to
> something running outside), defeating the main purpose of sqlite3.

Let's continue this on the new thread:
http://lists.open-bio.org/pipermail/biopython-dev/2012-November/010072.html

Peter


From redmine at redmine.open-bio.org  Sun Nov  4 22:47:21 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Sun, 4 Nov 2012 22:47:21 +0000
Subject: [Biopython-dev] [Biopython - Bug #3392] (New) unable to download
	almost any documentation - the download links are invalid
Message-ID: <redmine.issue-3392.20121104224721@redmine.open-bio.org>


Issue #3392 has been reported by Brad Zoltick.

----------------------------------------
Bug #3392: unable to download almost any documentation - the download links are invalid
https://redmine.open-bio.org/issues/3392

Author: Brad Zoltick
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Documentation
Target version: Not Applicable
URL: 


People probably are not aware of this problem. When you try to download the biopython documentation, you get the following response:

Forbidden

You don't have permission to access /DIST/docs/tutorial/Tutorial.pdf on this server.

Apache/2.2.23 (Amazon) Server at biopython.org Port 80


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Sun Nov  4 22:47:23 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Sun, 4 Nov 2012 22:47:23 +0000
Subject: [Biopython-dev] [Biopython - Bug #3393] (New) unable to download
	almost any documentation - the download links are invalid
Message-ID: <redmine.issue-3393.20121104224722@redmine.open-bio.org>


Issue #3393 has been reported by Brad Zoltick.

----------------------------------------
Bug #3393: unable to download almost any documentation - the download links are invalid
https://redmine.open-bio.org/issues/3393

Author: Brad Zoltick
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Documentation
Target version: Not Applicable
URL: 


People probably are not aware of this problem. When you try to download the biopython documentation, you get the following response:

Forbidden

You don't have permission to access /DIST/docs/tutorial/Tutorial.pdf on this server.

Apache/2.2.23 (Amazon) Server at biopython.org Port 80




From redmine at redmine.open-bio.org  Mon Nov  5 00:06:10 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 5 Nov 2012 00:06:10 +0000
Subject: [Biopython-dev] [Biopython - Bug #3392] unable to download almost
	any documentation - the download links are invalid
References: <redmine.issue-3392.20121104224721@redmine.open-bio.org>
Message-ID: <redmine.journal-14994.20121105000610@redmine.open-bio.org>


Issue #3392 has been updated by Peter Cock.

Category changed from Documentation to Website
Priority changed from Normal to Urgent

Yep, we know about it - but thanks for letting us know just in case:
http://lists.open-bio.org/pipermail/biopython-dev/2012-November/010069.html

The same issue affects our release downloads too, which is more annoying. It's a side effect of the server migration from a dying machine to a virtual machine on the Amazon Cloud,
http://lists.open-bio.org/pipermail/biopython/2012-November/008248.html

Leaving this bug open until the new server is fixed...
----------------------------------------
Bug #3392: unable to download almost any documentation - the download links are invalid
https://redmine.open-bio.org/issues/3392

Author: Brad Zoltick
Status: New
Priority: Urgent
Assignee: Biopython Dev Mailing List
Category: Website
Target version: Not Applicable
URL: 



From p.j.a.cock at googlemail.com  Mon Nov  5 23:07:09 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 5 Nov 2012 23:07:09 +0000
Subject: [Biopython-dev] OBF server outage announcement / call for
	SysAdmin volunteers
In-Reply-To: <CAKVJ-_7_bnatVHfsKrNOoqjU+D8qbf05En9aFbi-4xXEpFFpSA@mail.gmail.com>
References: <CAKVJ-_56TAQR4ULW=tviSrzYvjRaJBmoLdWFDT9UG3LqeM2EJA@mail.gmail.com>
	<CAKVJ-_4xgdurC8y54R7LFPPSvEqdTrY9gNnv7kmNGnaFpDmPCA@mail.gmail.com>
	<CAKVJ-_7_bnatVHfsKrNOoqjU+D8qbf05En9aFbi-4xXEpFFpSA@mail.gmail.com>
Message-ID: <CAKVJ-_6_6DYmm350QvanWNdA7hZyVqiJW0p6w8J1eww8D1xumQ@mail.gmail.com>

On Thu, Nov 1, 2012 at 7:50 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Nov 1, 2012 at 7:40 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> FYI regarding the Biopython website and recent mailing list outage.
>>
>> Peter
>>
>> PS you can also keep an eye on @Biopython and @OBF_news on Twitter,
>> which are a useful alternative when the mailing lists are down.
>>
>> <snip>
>
> I should have added that while the wiki is down (which does
> unfortunately include the Biopython home page), the Biopython
> downloads remain available via http://biopython.org/DIST/ and
> other 'static' content like the Tutorial and API pages are up:
>
> http://biopython.org/DIST/docs/tutorial/Tutorial.html
> http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
> http://biopython.org/DIST/docs/api/

Hosting of biopython.org (and the bioperl.org and open-bio.org
websites) was transferred to an Amazon cloud machine over
the weekend, which fixed the wiki but temporarily disabled the
static pages (like the Tutorial and downloads). Those should
all be working again now.

At some later date (to be announced) the server running the
OBF mailing lists will be transferred, which would make the
mailing lists unavailable for a short period.

Regards,

Peter


From redmine at redmine.open-bio.org  Mon Nov  5 23:13:43 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 5 Nov 2012 23:13:43 +0000
Subject: [Biopython-dev] [Biopython - Bug #3392] (Resolved) unable to
	download almost any documentation - the download links are invalid
References: <redmine.issue-3392.20121104224721@redmine.open-bio.org>
Message-ID: <redmine.journal-14995.20121105231343@redmine.open-bio.org>


Issue #3392 has been updated by Peter Cock.

Status changed from New to Resolved
% Done changed from 0 to 100

This should be working again now :)
----------------------------------------
Bug #3392: unable to download almost any documentation - the download links are invalid
https://redmine.open-bio.org/issues/3392

Author: Brad Zoltick
Status: Resolved
Priority: Urgent
Assignee: Biopython Dev Mailing List
Category: Website
Target version: Not Applicable
URL: 



From kai.blin at biotech.uni-tuebingen.de  Mon Nov 19 14:11:42 2012
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Mon, 19 Nov 2012 15:11:42 +0100
Subject: [Biopython-dev] SeqFeature.FeatureLocation.extract() silently fails
 when coordinates are outside of the parent_sequence.
Message-ID: <50AA3E1E.70407@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi folks,

I'm currently investigating an error caused by an invalid GenBank file
input that annotates CDS features with invalid coordinates. The
GenBank parser accepts these features, but later my program crashes.

It turns out the crash is because I'm calling the extract() method for
my seq features, which then returns an empty Seq object when the
coordinates are out of range for the parent_sequence.

I have the feeling that raising an exception would be the best way of
dealing with this, but of course I can also check the result of
extract() to be different from an empty Seq object.

The line I'd like to throw a ValueError on out-of-bounds coordinates is
https://github.com/biopython/biopython/blob/master/Bio/SeqFeature.py#L811

What are your thoughts on this?

Cheers,
Kai

- -- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iQEcBAEBAgAGBQJQqj4eAAoJEKM5lwBiwTTP7rsIANURFpsEtHOIgJ1z3r6nV3mQ
rI0Vo0fBh59beZA0NYi2rMez+TUFXf87Ih3b9LGIH4xaFsAwpXJrUjvbqC1tuqBv
KFg65psNCnDlp9Pc4DZQnaAS7ycoDrDiJStV387XWE6CA7dTiCkBUfKwuaf7S/om
m1je0XMJ6j6J5+Jn2qW/QMpf2G9e8lAkZyeNIQyYtGF+RbPkBPSxpZFTEn6KsymT
dOLoCQVhlf1R9X0S+nLBAh9Q29akf6/tkUcqdUg5ROoNqvqjudDWbz0JgoTgsf7n
j24rlTIpxktl3KKna6DtoX5ig4EKF5IOnQmo00JrWWL8Liy0oKTY/LRkF5CB85k=
=djFF
-----END PGP SIGNATURE-----


From p.j.a.cock at googlemail.com  Mon Nov 19 16:10:15 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 19 Nov 2012 16:10:15 +0000
Subject: [Biopython-dev] SeqFeature.FeatureLocation.extract() silently
 fails when coordinates are outside of the parent_sequence.
In-Reply-To: <50AA3E1E.70407@biotech.uni-tuebingen.de>
References: <50AA3E1E.70407@biotech.uni-tuebingen.de>
Message-ID: <CAKVJ-_5PcJ_GC=YbyG70+HSXrMoeqs8ZxUn3-wKU=uKqXKxm6w@mail.gmail.com>

On Mon, Nov 19, 2012 at 2:11 PM, Kai Blin
<kai.blin at biotech.uni-tuebingen.de> wrote:
> Hi folks,
>
> I'm currently investigating an error caused by an invalid GenBank file
> input that annotates CDS features with invalid coordinates. The
> GenBank parser accepts these features, but later my program crashes.

Perhaps we should have a parser error/warning at that point?
(as well as any fix to the extract method)

> It turns out the crash is because I'm calling the extract() method for
> my seq features, which then returns an empty Seq object when the
> coordinates are out of range for the parent_sequence.
>
> I have the feeling that raising an exception would be the best way of
> dealing with this, but of course I can also check the result of
> extract() to be different from an empty Seq object.
>
> The line I'd like to throw a ValueError on out-of-bounds coordinates is
> https://github.com/biopython/biopython/blob/master/Bio/SeqFeature.py#L811
>
> What are your thoughts on this?

Some might find this surprising given the (initially rather odd)
Python slicing behaviour with out-of-range coordinates (which
indirectly causes the behaviour observed here):

>>> "hello"[100:200]
''

i.e. Slicing a string outside its bounds gives an empty string.

On balance you're probably right that an error in this situation
makes more sense (a discrepancy between feature location
and the given parent sequence not being long enough).
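
As a minimal sketch of the stricter behaviour discussed here (this is
not the actual Bio.SeqFeature code; the helper name extract_strict is
hypothetical, and it works on any sliceable sequence), a bounds check
before slicing could look like this:

```python
def extract_strict(parent_sequence, start, end):
    """Return parent_sequence[start:end], refusing out-of-range coordinates.

    Hypothetical helper for illustration only. Plain slicing would
    silently return an empty or truncated result instead of failing.
    """
    if not 0 <= start <= end <= len(parent_sequence):
        raise ValueError(
            "Location [%i:%i] lies outside a parent sequence of length %i"
            % (start, end, len(parent_sequence))
        )
    return parent_sequence[start:end]
```

With this check, extract_strict("hello", 100, 200) raises ValueError
rather than returning an empty string, while in-range coordinates
behave like an ordinary slice.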

Peter


From p.j.a.cock at googlemail.com  Mon Nov 19 16:32:11 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 19 Nov 2012 16:32:11 +0000
Subject: [Biopython-dev] SeqFeature.FeatureLocation.extract() silently
 fails when coordinates are outside of the parent_sequence.
In-Reply-To: <8045681f-e3ca-470c-973d-89b5fcc6d259@email.android.com>
References: <50AA3E1E.70407@biotech.uni-tuebingen.de>
	<CAKVJ-_5PcJ_GC=YbyG70+HSXrMoeqs8ZxUn3-wKU=uKqXKxm6w@mail.gmail.com>
	<8045681f-e3ca-470c-973d-89b5fcc6d259@email.android.com>
Message-ID: <CAKVJ-_56RjCYF=bq3Jq_xCnWuEaD-_kEAC66CQV8Fy-9Lai2xw@mail.gmail.com>

On Mon, Nov 19, 2012 at 4:25 PM, Kai Blin
<kai.blin at biotech.uni-tuebingen.de> wrote:
> Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
>>> GenBank parser accepts these features, but later my program crashes.
>>
>>Perhaps we should have a parser error/warning at that point?
>>(as well as any fix to the extract method)
>
> Probably a bit tricky because the GenBank file might not contain a
> sequence at all, and we can't tell until we either see the sequence or
> an end of record marker.

The first line should tell you the length, and we already have
a warning in place for naughty GenBank files where the actual
sequence has a different length. Those could be a problem for
this new warning, as you'd only know the expected sequence
length from the header while parsing the features.
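
A rough sketch of what such a parser-side check might look like,
assuming the length is read from the LOCUS line with a simple regular
expression. Both function names are hypothetical and this is not how
the Bio.GenBank parser is actually structured:

```python
import re
import warnings


def declared_length(locus_line):
    """Return the sequence length stated on a LOCUS line, or None if absent."""
    match = re.search(r"(\d+)\s+(?:bp|aa)\b", locus_line)
    return int(match.group(1)) if match else None


def feature_in_range(feature_end, expected_length):
    """Warn and return False if a feature ends past the declared length.

    Only the header length is known while the features are parsed, so
    the check cannot use the actual sequence.
    """
    if expected_length is not None and feature_end > expected_length:
        warnings.warn(
            "Feature end %i exceeds declared sequence length %i"
            % (feature_end, expected_length)
        )
        return False
    return True
```

Because the check trusts the header, a file whose LOCUS line disagrees
with its actual sequence could trigger or hide this warning, which is
the complication mentioned above.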

>>> I have the feeling that raising an exception would be the best way
>>> of dealing with this, but of course I can also check the result
>>> of extract() to be different from an empty Seq object.
>>>
>>> The line I'd like to throw a ValueError on out-of-bounds coordinates
>>> is
>>>
>>> https://github.com/biopython/biopython/blob/master/Bio/SeqFeature.py#L811
>>>
>>> What are your thoughts on this?
>>
>>Some might find this surprising given the (initially rather odd)
>>Python slicing behaviour with out-of-range coordinates (which
>>indirectly causes the behaviour observed here):
>>
>>>>> "hello"[100:200]
>>''
>>
>>i.e. Slicing a string outside its bounds gives an empty string.
>
> Yes, that is why we end up with an empty Seq object.
>
>>On balance you're probably right that an error in this situation
>>makes more sense (a discrepancy between feature location
>>and the given parent sequence not being long enough).
>
> Yes. The way I understand the intention of the parent sequence,
> the whole point is that the feature should be located on it.
>
> I'll gladly prepare a patch (and some test).
> Cheers,
>  Kai

OK.

Peter


From redmine at redmine.open-bio.org  Tue Nov 20 13:41:47 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 20 Nov 2012 13:41:47 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] (New) Biopython trie
	implementation can't load large data sets
Message-ID: <redmine.issue-3395.20121120134147@redmine.open-bio.org>


Issue #3395 has been reported by Michał Nowotka.

----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395

Author: Michał Nowotka
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


Imagine I have a Biopython trie:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'w')
tr = trie.trie()
#fill in the trie
trie.save(f, tr)

Now /tmp/trie.dat.gz is about 50MB. Let's try to read it:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'r')
tr = trie.load(f)

Unfortunately I'm getting an unhelpful error saying:
"loading failed for some reason"

Any hints?


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Tue Nov 20 14:02:01 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 20 Nov 2012 14:02:01 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15009.20121120140201@redmine.open-bio.org>


Issue #3395 has been updated by Peter Cock.


Can you try the same test case without gzip? i.e. Can you load /tmp/trie.dat rather than /tmp/trie.dat.gz?

Also I would try explicitly opening the files in binary mode.

P.S. Which OS, which version of Python, which version of Biopython?
----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395

Author: Michał Nowotka
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 



From redmine at redmine.open-bio.org  Tue Nov 20 14:18:46 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 20 Nov 2012 14:18:46 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15010.20121120141846@redmine.open-bio.org>


Issue #3395 has been updated by Michał Nowotka.


Sure, I'll update this issue as soon as I check that.
----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395

Author: Michał Nowotka
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 



From redmine at redmine.open-bio.org  Tue Nov 20 16:31:13 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 20 Nov 2012 16:31:13 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15011.20121120163113@redmine.open-bio.org>


Issue #3395 has been updated by Michał Nowotka.


OK, I tried using a standard Python file handle opened explicitly in binary mode, and it also failed. The file is now 165.5MB.
I also tried bz2 and zip compression, without any luck...
----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395

Author: Michał Nowotka
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 



From redmine at redmine.open-bio.org  Tue Nov 20 17:02:48 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 20 Nov 2012 17:02:48 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15012.20121120170248@redmine.open-bio.org>


Issue #3395 has been updated by Peter Cock.


Well, that is progress - it means this isn't a problem coming from reading a compressed file on disk - you've made the test case simpler. Can you actually share a self-contained example script? If not, I suggest you try halving the dataset (only record the first half of the entries), and retest. Then repeat - this should tell you whether the problem is, as you suspect, the size of the dataset, or something specific about a special value.
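
The halving strategy can be sketched as a bisection over the number of records. This is only an illustration under stated assumptions: round_trips is a hypothetical callable that builds, saves and reloads a trie from the given records and returns True on success, an empty dataset is assumed to load fine, and the failure is assumed to persist once enough records are included.

```python
def smallest_failing_prefix(records, round_trips):
    """Return the smallest n where round_trips(records[:n]) fails, or None.

    round_trips is a hypothetical callable: it should save and reload
    the given records and return True if that worked.
    """
    if round_trips(records):
        return None  # the full dataset loads fine, nothing to bisect
    lo, hi = 0, len(records)  # records[:lo] works, records[:hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if round_trips(records[:mid]):
            lo = mid
        else:
            hi = mid
    return hi
```

If the returned count stays the same no matter how the records are ordered, size is the likely cause. If it moves with one particular record, that record's key or value is the more likely suspect.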

Alternatively can you share the (compressed) file? I could at least check if it fails the same way here, and perhaps add some debugging code to get more information.

The error message itself is coming from some C code, which hasn't changed for some time:
https://github.com/biopython/biopython/blob/master/Bio/triemodule.c

The error itself is likely triggered in function _deserialize_transition in trie.c:
https://github.com/biopython/biopython/blob/master/Bio/trie.c

You still haven't told us the important information of which OS, which version of Python, which version of Biopython. Given it is C code, I'd also like to know how Biopython was installed (e.g. did you compile it from source yourself).
----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395

Author: Michał Nowotka
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 



From redmine at redmine.open-bio.org  Tue Nov 20 17:14:21 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 20 Nov 2012 17:14:21 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15013.20121120171421@redmine.open-bio.org>


Issue #3395 has been updated by Michał Nowotka.


I'm using Ubuntu 12.04 LTS, Biopython 1.6 and Python 2.7.3.
Can you tell me where I should put the compressed file?

----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395

Author: Michał Nowotka
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 



From redmine at redmine.open-bio.org  Tue Nov 20 17:21:58 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 20 Nov 2012 17:21:58 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15014.20121120172158@redmine.open-bio.org>


Issue #3395 has been updated by Peter Cock.


Sadly RedMine is limited to 5MB attachments. You could use DropBox or something similar, or if you have your own server put the file online temporarily for me to download it?

You probably have Biopython 1.60 (one dot sixty), there was no Biopython 1.6, one dot six. Did you install Biopython using the Ubuntu package manager? i.e. the GUI tool, or at the command line with something like 'apt-get install biopython'?
----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395

Author: Michał Nowotka
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 



From redmine at redmine.open-bio.org  Tue Nov 20 17:43:21 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 20 Nov 2012 17:43:21 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15015.20121120174321@redmine.open-bio.org>


Issue #3395 has been updated by Michał Nowotka.


I put the file here: http://mnowotka.kei.pl/trie.4.dat.gz
----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395

Author: Michał Nowotka
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 



From redmine at redmine.open-bio.org  Tue Nov 20 17:56:47 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 20 Nov 2012 17:56:47 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15016.20121120175647@redmine.open-bio.org>


Issue #3395 has been updated by Michał Nowotka.


I confirm, it's version 1.60 that I'm using. I installed it with either apt-get install or pip.
----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395

Author: Michał Nowotka
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 



From p.j.a.cock at googlemail.com  Mon Nov 26 13:29:58 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 26 Nov 2012 13:29:58 +0000
Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names?
Message-ID: <CAKVJ-_7NpV0aveBJfut1Mn5xh=gxzJ2qPFr3UnCBRt589SwQVA@mail.gmail.com>

On Thu, Nov 1, 2012 at 6:10 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> On Tue, Oct 30, 2012 at 7:03 AM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>>
>> On Mon, Oct 29, 2012 at 5:54 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>> >
>> > Peter;
>> >
>> >> In the case of Bow's SearchIO code, what would you prefer?
>> >> e.g. Bio.SearchIO as it is now on his branch?
>> >
>> > I like plain ol' Search the best but don't have a strong preference. I'm
>> > terrible at naming things so trust everyone's judgment on this.
>> >
>> > Brad
>>
>> Since we have no clear consensus, I propose we add Bow's code
>> as Bio.SearchIO (which is how it is written right now), with the new
>> BiopythonExperimentalWarning in place (to alert people that it may
>> change in the next release). We can then rename or move it at a
>> later date. This will make it easier for people to test the code, and
>> also suggest further changes or additions (e.g. Kai's HMMER work).
>>
>> If and when we agree on a consolidation of the Bio.SeqXXX
>> modules, then Bio.SearchIO could move too. If this happens
>> before any public release as Bio.SearchIO so much the better.
>>
>> Adopting lower case module names under Python 3 is also a
>> separate issue.
>>
>> Peter
>>
>
> +1
>
> Regarding ...

I plan to do the commit today, barring any last minute objections.

I am leaning towards a merge from Bow's original (un-rebased) branch,
which had only three trivial conflicts to handle.

Peter


From w.arindrarto at gmail.com  Mon Nov 26 13:38:23 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Mon, 26 Nov 2012 14:38:23 +0100
Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names?
In-Reply-To: <CAKVJ-_7NpV0aveBJfut1Mn5xh=gxzJ2qPFr3UnCBRt589SwQVA@mail.gmail.com>
References: <CAKVJ-_7NpV0aveBJfut1Mn5xh=gxzJ2qPFr3UnCBRt589SwQVA@mail.gmail.com>
Message-ID: <CADEGkF6LkJT2WrDrKSevozSi=FYL_iZcDMMx3QdnBE5FLM=szw@mail.gmail.com>

Hi Peter and everyone,

If it helps, I've done the rebase (also resolving the three conflicts)
with the latest master branch. On top of it, I've also added the new
BiopythonExperimentalWarning in Bio/SearchIO/__init__.py. It's
available here: https://github.com/bow/biopython/tree/searchio.
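
For readers unfamiliar with the pattern, here is a minimal sketch of
how a module can announce itself as experimental at import time. The
ExperimentalWarning class and announce_experimental function are
hypothetical stand-ins so the example is self-contained; the real
branch uses Biopython's own BiopythonExperimentalWarning and its exact
wording may differ.

```python
import warnings


class ExperimentalWarning(Warning):
    """Hypothetical stand-in for Biopython's BiopythonExperimentalWarning."""


def announce_experimental(module_name):
    """Warn that module_name is experimental and its API may change."""
    warnings.warn(
        "%s is an experimental module and may change in a future release."
        % module_name,
        ExperimentalWarning,
        stacklevel=2,  # attribute the warning to the caller, not this helper
    )
```

A package's __init__.py would call announce_experimental once at import
time. Users who accept the risk can silence only this category with
warnings.simplefilter("ignore", ExperimentalWarning).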

However if you're interested in inspecting the non-rebased branch,
I've also kept it here:
https://github.com/bow/biopython/tree/searchio-nonrebased. Note that
this one doesn't have the new experimental warning since it's a
feature added more recently.

Also, in both branches, the tutorial has been changed with the
addition of the (draft) Bio.SearchIO tutorial.

Let me know which one you prefer and I'll submit a pull request :).

cheers,
Bow

On Mon, Nov 26, 2012 at 2:29 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Nov 1, 2012 at 6:10 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
>> On Tue, Oct 30, 2012 at 7:03 AM, Peter Cock <p.j.a.cock at googlemail.com>
>> wrote:
>>>
>>> On Mon, Oct 29, 2012 at 5:54 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>>> >
>>> > Peter;
>>> >
>>> >> In the case of Bow's SearchIO code, what would you prefer?
>>> >> e.g. Bio.SearchIO as it is now on his branch?
>>> >
>>> > I like plain ol' Search the best but don't have a strong preference. I'm
>>> > terrible at naming things so trust everyone's judgment on this.
>>> >
>>> > Brad
>>>
>>> Since we have no clear consensus, I propose we add Bow's code
>>> as Bio.SearchIO (which is how it is written right now), with the new
>>> BiopythonExperimentalWarning in place (to alert people that it may
>>> change in the next release). We can then rename or move it at a
>>> later date. This will make it easier for people to test the code, and
>>> also suggest further changes or additions (e.g. Kai's HMMER work).
>>>
>>> If and when we agree on a consolidation of the Bio.SeqXXX
>>> modules, then Bio.SearchIO could move too. If this happens
>>> before any public release as Bio.SearchIO so much the better.
>>>
>>> Adopting lower case module names under Python 3 is also a
>>> separate issue.
>>>
>>> Peter
>>>
>>
>> +1
>>
>> Regarding ...
>
> I plan to do the commit today, barring any last minute objections.
>
> I am leaning towards a merge from Bow's original (un-rebased) branch,
> which had only three trivial conflicts to handle.
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev


From p.j.a.cock at googlemail.com  Mon Nov 26 13:49:44 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 26 Nov 2012 13:49:44 +0000
Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names?
In-Reply-To: <CADEGkF6LkJT2WrDrKSevozSi=FYL_iZcDMMx3QdnBE5FLM=szw@mail.gmail.com>
References: <CAKVJ-_7NpV0aveBJfut1Mn5xh=gxzJ2qPFr3UnCBRt589SwQVA@mail.gmail.com>
	<CADEGkF6LkJT2WrDrKSevozSi=FYL_iZcDMMx3QdnBE5FLM=szw@mail.gmail.com>
Message-ID: <CAKVJ-_6zdSOq9JFtvKNx7NynBjX-m1p6iw4Pfd02LcWsdp+tig@mail.gmail.com>

On Mon, Nov 26, 2012 at 1:38 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Peter and everyone,
>
> If it helps, I've done the rebase (also resolving the three conflicts)
> with the latest master branch. On top of it, I've also added the new
> BiopythonExperimentalWarning in Bio.SearchIO.__init__.py. It's
> available here: https://github.com/bow/biopython/tree/searchio.
>
> However if you're interested in inspecting the non-rebased branch,
> I've also kept it here:
> https://github.com/bow/biopython/tree/searchio-nonrebased. Note that
> this one doesn't have the new experimental warning since it's a
> feature added more recently.
>
> Also, in both branches, the tutorial has been changed with the
> addition of the (draft) Bio.SearchIO tutorial.
>
> Let me know which one you prefer and I'll submit a pull request :).
>
> cheers,
> Bow

That's fine - I found both branches :)

I've actually done a trial merge on the non-rebased one and
then cherry-picked the experimental warning - looks good.

Once that's done there is some housekeeping to do, like
the indexing code duplication with Bio.SeqIO, and tackling
indexing BGZF compressed files with Bio.SearchIO which
I will have a go at.

Peter

P.S. I had intended to do this earlier this month, but we
had the OBF server issues to deal with.


From w.arindrarto at gmail.com  Mon Nov 26 14:06:03 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Mon, 26 Nov 2012 15:06:03 +0100
Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names?
In-Reply-To: <CAKVJ-_6zdSOq9JFtvKNx7NynBjX-m1p6iw4Pfd02LcWsdp+tig@mail.gmail.com>
References: <CAKVJ-_7NpV0aveBJfut1Mn5xh=gxzJ2qPFr3UnCBRt589SwQVA@mail.gmail.com>
	<CADEGkF6LkJT2WrDrKSevozSi=FYL_iZcDMMx3QdnBE5FLM=szw@mail.gmail.com>
	<CAKVJ-_6zdSOq9JFtvKNx7NynBjX-m1p6iw4Pfd02LcWsdp+tig@mail.gmail.com>
Message-ID: <CADEGkF7GerjkkuN1FamM8MNVp+h=-10dDeY9d4UkZ6ise4t9+Q@mail.gmail.com>

> That's fine - I found both branches :)
>
> I've actually done a trial merge on the non-rebased one and
> then cherry-picked the experimental warning - looks good.

Ah, good then :).

> Once that's done there is some housekeeping to do, like
> the indexing code duplication with Bio.SeqIO, and tackling
> indexing BGZF compressed files with Bio.SearchIO which
> I will have a go at.

Yes. I'm pretty sure there will also be changes we need to implement
after more feedback from users.

> P.S. I had intended to do this earlier this month, but we
> had the OBF server issues to deal with.

That's ok, I also noticed that it wasn't until quite recently that the
commits became frequent again.


From mauriceling at gmail.com  Mon Nov 26 14:48:24 2012
From: mauriceling at gmail.com (Maurice Ling)
Date: Mon, 26 Nov 2012 08:48:24 -0600
Subject: [Biopython-dev] Error in Bio.Entrez.__init__
Message-ID: <CAFO915G3+T-H9aGJr=XPg-2DZTN-nbaPKJipr6Gg_ev5Usft5g@mail.gmail.com>

Hi

I am getting an error running this:

from Bio import Entrez
from Bio import Medline
handle = Entrez.efetch(db="pubmed", id=[19300000], rettype="medline",
retmode="text")

The traceback is

Traceback (most recent call last):
  File "C:\Users\Maurice.Ling\Desktop\muscorian\archive\pubmed_dump.py",
line 16, in <module>
    retmode="text")
  File "C:\Python27\lib\site-packages\Bio\Entrez\__init__.py", line 133, in
efetch
    keywords["id"] = ",".join(keywds["id"])
TypeError: sequence item 0: expected string, int found

When I changed line 133 of Bio.Entrez.__init__ from

keywords["id"] = ",".join(keywds["id"])

to

keywords["id"] = ",".join(str(keywds["id"]))

The error disappeared.
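
One caveat about the change above: calling str() on the whole list returns the text "[19300000]", and join() then puts a comma between each character, so the TypeError goes away but the resulting string is not a valid ID list. A minimal sketch of converting each identifier separately is below. The helper name join_ids is hypothetical and is not part of Bio.Entrez.

```python
def join_ids(ids):
    """Return a comma-separated ID string (hypothetical helper, not Bio.Entrez API).

    Accepts a single ID (str or int) or an iterable of IDs.
    """
    if isinstance(ids, (str, int)):
        return str(ids)
    # Convert each identifier on its own, then join them with commas.
    return ",".join(str(identifier) for identifier in ids)


# What the one-line change in the report actually produces:
print(",".join(str([19300000])))          # [,1,9,3,0,0,0,0,0,]

# Converting element by element instead:
print(join_ids([19300000, "19300001"]))   # 19300000,19300001
```

Passing the identifiers as strings in the first place, as in id=["19300000"], avoids the problem without touching the library.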

Maurice LING
mobile: +1(605)5920300, +6596669233
www: http://maurice.vodien.com
CV: http://maurice.vodien.com/maurice_resume.pdf
Linkedin: http://www.linkedin.com/in/mauriceling
ResearchGate: https://www.researchgate.net/profile/Maurice_HT_Ling


From p.j.a.cock at googlemail.com  Mon Nov 26 14:57:28 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 26 Nov 2012 14:57:28 +0000
Subject: [Biopython-dev] Error in Bio.Entrez.__init__
In-Reply-To: <CAFO915G3+T-H9aGJr=XPg-2DZTN-nbaPKJipr6Gg_ev5Usft5g@mail.gmail.com>
References: <CAFO915G3+T-H9aGJr=XPg-2DZTN-nbaPKJipr6Gg_ev5Usft5g@mail.gmail.com>
Message-ID: <CAKVJ-_4Y3rvgQP8C9tgykekhc4Jsi=cVMZwxbm1YS4YOn7754g@mail.gmail.com>

On Mon, Nov 26, 2012 at 2:48 PM, Maurice Ling <mauriceling at gmail.com> wrote:
> Hi
>
> I am getting an error running this:
>
> from Bio import Entrez
> from Bio import Medline
> handle = Entrez.efetch(db="pubmed", id=[19300000], rettype="medline",
> retmode="text")
>

I would have used this:

Entrez.efetch(db="pubmed", id=["19300000"], rettype="medline", retmode="text")

In general the NCBI identifiers are arbitrary strings, although
perhaps the pubmed identifiers could be treated as integers.
This is perhaps worth changing in the Bio.Entrez code...

What do you think, Michiel?

Peter


From mauriceling at gmail.com  Mon Nov 26 15:23:31 2012
From: mauriceling at gmail.com (Maurice Ling)
Date: Mon, 26 Nov 2012 09:23:31 -0600
Subject: [Biopython-dev] Strange behaviour in efetching Pubmed citations
Message-ID: <CAFO915HPaokqncstUcVn6WDARBMx-M1m1ni19wEM=W6mB3DCPQ@mail.gmail.com>

Hi

I found something strange in my download script to pull a list of pubmed
citations. This was working in the past (back in 2008 period)...

The script is

ID_start = 19000000
ID_stop = 19000010
downtime = 1.2

from Bio import Entrez
from Bio import Medline
import string
import time
import cPickle

Entrez.email = 'maurice.ling at sdstate.edu'

while (ID_start < ID_stop):
    try:
        handle = Entrez.efetch(db="pubmed", id=[str(ID_start)],
rettype="medline",
                           retmode="text")
        records = list(Medline.parse(handle))[0]
        print records
        cPickle.dump(records, open(str(ID_start) + '.txt', 'w'), -1)
        ID_start = ID_start + 1
        time.sleep(downtime)
        print 'ID count: ', str(ID_start)
    except:
        print 'ID count: error ', str(ID_start)
        ID_start = ID_start + 1

But the results from print records kept showing the same thing:

{'STAT': 'MEDLINE', 'IP': '2', 'JT': 'Biochemical medicine', 'DA':
'19760116', 'FAU': ['Makar, A B', 'McMartin, K E', 'Palese, M', 'Tephly, T
R'], 'DP': '1975 Jun', 'OWN': 'NLM', 'PT': ['Journal Article', "Research
Support, U.S. Gov't, P.H.S."], 'LA': ['eng'], 'CRDT': ['1975/06/01 00:00'],
'DCOM': '19760116', 'LR': '20091111', 'PG': '117-26', 'TI': 'Formate assay
in body fluids: application in methanol poisoning.', 'RN': ['0 (Formates)',
'124-38-9 (Carbon Dioxide)', '67-56-1 (Methanol)', 'EC 1.2.- (Aldehyde
Oxidoreductases)'], 'PL': 'UNITED STATES', 'TA': 'Biochem Med', 'JID':
'0151424', 'VI': '13', 'IS': '0006-2944 (Print) 0006-2944 (Linking)', 'AU':
['Makar AB', 'McMartin KE', 'Palese M', 'Tephly TR'], 'MHDA': '1975/06/01
00:01', 'MH': ['Aldehyde Oxidoreductases/metabolism', 'Animals', 'Body
Fluids/*analysis', 'Carbon Dioxide/blood', 'Formates/blood/*poisoning',
'Haplorhini', 'Humans', 'Hydrogen-Ion Concentration', 'Kinetics',
'Methanol/blood', 'Methods', 'Pseudomonas/enzymology'], 'EDAT':
'1975/06/01', 'SO': 'Biochem Med. 1975 Jun;13(2):117-26.', 'SB': 'IM',
'PMID': '1', 'PST': 'ppublish'}

It seems to keep efetching PMID 1 (http://www.ncbi.nlm.nih.gov/pubmed/1)

Any idea?

Thanks in advance.

Maurice LING
mobile: +1(605)5920300, +6596669233
www: http://maurice.vodien.com
CV: http://maurice.vodien.com/maurice_resume.pdf
Linkedin: http://www.linkedin.com/in/mauriceling
ResearchGate: https://www.researchgate.net/profile/Maurice_HT_Ling


From p.j.a.cock at googlemail.com  Mon Nov 26 15:36:13 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 26 Nov 2012 15:36:13 +0000
Subject: [Biopython-dev] Strange behaviour in efetching Pubmed citations
In-Reply-To: <CAFO915HPaokqncstUcVn6WDARBMx-M1m1ni19wEM=W6mB3DCPQ@mail.gmail.com>
References: <CAFO915HPaokqncstUcVn6WDARBMx-M1m1ni19wEM=W6mB3DCPQ@mail.gmail.com>
Message-ID: <CAKVJ-_7SFdCN+hsEGEQ6b3tXGx8bDomEiRVvQDevspMVB-xmOw@mail.gmail.com>

On Mon, Nov 26, 2012 at 3:23 PM, Maurice Ling <mauriceling at gmail.com> wrote:
> Hi
>
> I found something strange in my download script to pull a list of pubmed
> citations. This was working in the past (back in 2008 period)...
>
> The script is
>
> ID_start = 19000000
> ID_stop = 19000010
> downtime = 1.2
>
> from Bio import Entrez
> from Bio import Medline
> import string
> import time
> import cPickle
>
> Entrez.email = 'maurice.ling at sdstate.edu'
>
> while (ID_start < ID_stop):
>     try:
>         handle = Entrez.efetch(db="pubmed", id=[str(ID_start)],
> rettype="medline",
>                            retmode="text")
>         records = list(Medline.parse(handle))[0]
>         print records
>         cPickle.dump(records, open(str(ID_start) + '.txt', 'w'), -1)
>         ID_start = ID_start + 1
>         time.sleep(downtime)
>         print 'ID count: ', str(ID_start)
>     except:
>         print 'ID count: error ', str(ID_start)
>         ID_start = ID_start + 1

Are you sure you didn't run something slightly different? The
simplest possibility would be a line accidentally setting
ID_start to equal 1, rather than increasing it.

Also, using a for loop would be much cleaner (with the identifiers
as either integers or as strings). For instance,

for identifier in range(19000000, 19000010):
   #Do stuff

Note you have a discrepancy with ID_stop vs ID_end

This seems to work for me:

ID_start = 19000000
ID_stop = 19000010
downtime = 1.2
from Bio import Entrez
from Bio import Medline
import string
import time
import cPickle
Entrez.email = 'maurice.ling at sdstate.edu'
for identifier in range(ID_start, ID_stop):
    identifier = str(identifier)
    try:
        handle = Entrez.efetch(db="pubmed", id=identifier,
                               rettype="medline", retmode="text")
        records = list(Medline.parse(handle))[0]
        print records
        cPickle.dump(records, open('%s.txt' % identifier, 'wb'), -1)
    except Exception, error:
        print "Error for %s - %s" % (identifier, error)

However, rather than parsing the Medline records and saving
the pickled object, I would save the plain text Medline data itself.
That way you can use the files outside of Python (e.g. working at
the Unix command line with grep).
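
A rough sketch of that plain-text approach follows. The function name save_medline_text and its fetch argument are assumptions made for this example: fetch is any callable that returns a file-like handle for one identifier, so the loop can be tried without network access.

```python
import os
import time


def save_medline_text(identifiers, fetch, out_dir=".", delay=0.0):
    """Save the raw text returned by fetch(identifier) as '<identifier>.txt'.

    `fetch` is a caller-supplied callable (an assumption of this sketch).
    Returns a list of (identifier, error) pairs for the downloads that failed.
    """
    failures = []
    for identifier in identifiers:
        identifier = str(identifier)
        try:
            handle = fetch(identifier)
            try:
                text = handle.read()
            finally:
                handle.close()
            path = os.path.join(out_dir, "%s.txt" % identifier)
            with open(path, "w") as out:
                out.write(text)
        except Exception as error:
            # Record the failure and carry on with the next identifier.
            failures.append((identifier, error))
        time.sleep(delay)  # pause between requests
    return failures


# With Biopython, `fetch` could wrap the efetch call used in this thread:
#   fetch = lambda pmid: Entrez.efetch(db="pubmed", id=pmid,
#                                      rettype="medline", retmode="text")
#   save_medline_text(range(19000000, 19000010), fetch, delay=1.2)
```

The saved files can still be read back later with Medline.parse if the parsed records are needed.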

Peter


From p.j.a.cock at googlemail.com  Mon Nov 26 16:08:28 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 26 Nov 2012 16:08:28 +0000
Subject: [Biopython-dev] Strange behaviour in efetching Pubmed citations
In-Reply-To: <CAFO915GmHKCRcupbxQAJK23TdbQsKJwwgp=LAcDntV3Ti2ummw@mail.gmail.com>
References: <CAFO915HPaokqncstUcVn6WDARBMx-M1m1ni19wEM=W6mB3DCPQ@mail.gmail.com>
	<CAKVJ-_7SFdCN+hsEGEQ6b3tXGx8bDomEiRVvQDevspMVB-xmOw@mail.gmail.com>
	<CAFO915GmHKCRcupbxQAJK23TdbQsKJwwgp=LAcDntV3Ti2ummw@mail.gmail.com>
Message-ID: <CAKVJ-_7WWtEAfmCGejhzg7Xxg99_8jY6G-erjc+8gEoU0_RSXQ@mail.gmail.com>

On Mon, Nov 26, 2012 at 3:42 PM, Maurice Ling <mauriceling at gmail.com> wrote:
> Thanks Peter
>
> Now, that seems to work... still scratching my uncaffeinated head though....
>

Great. I'm sure a coffee will help :)

Peter

P.S. Next time could you use the main list for usage queries, rather
than the development list, biopython-dev - thanks!


From p.j.a.cock at googlemail.com  Mon Nov 26 16:46:44 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 26 Nov 2012 16:46:44 +0000
Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names?
In-Reply-To: <CADEGkF7GerjkkuN1FamM8MNVp+h=-10dDeY9d4UkZ6ise4t9+Q@mail.gmail.com>
References: <CAKVJ-_7NpV0aveBJfut1Mn5xh=gxzJ2qPFr3UnCBRt589SwQVA@mail.gmail.com>
	<CADEGkF6LkJT2WrDrKSevozSi=FYL_iZcDMMx3QdnBE5FLM=szw@mail.gmail.com>
	<CAKVJ-_6zdSOq9JFtvKNx7NynBjX-m1p6iw4Pfd02LcWsdp+tig@mail.gmail.com>
	<CADEGkF7GerjkkuN1FamM8MNVp+h=-10dDeY9d4UkZ6ise4t9+Q@mail.gmail.com>
Message-ID: <CAKVJ-_7v1XaqfbwUN-Juu6-26AwfHO1697haZV5Pw-hdK7wTrA@mail.gmail.com>

On Mon, Nov 26, 2012 at 2:06 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
>> That's fine - I found both branches :)
>>
>> I've actually done a trial merge on the non-rebased one and
>> then cherry-picked the experimental warning - looks good.
>
> Ah, good then :).

Done,
https://github.com/biopython/biopython/commit/9f6e810cc68dd1e353d899772fda3053d9f49513

>> Once that's done there is some housekeeping to do, like
>> the indexing code duplication with Bio.SeqIO, and tackling
>> indexing BGZF compressed files with Bio.SearchIO which
>> I will have a go at.
>
> Yes.

Started, it seems the two _index.py files have diverged a
little more than I'd expected:
https://github.com/biopython/biopython/commit/ad1786b99afd2a50248246d877ff00a53949546b

>> P.S. I had intended to do this earlier this month, but we
>> had the OBF server issues to deal with.
>
> That's ok, I also noticed that it wasn't until quite recently that the
> commits became frequent again.

Christian Brueffer deserves some of the credit for the recent
burst of commits - he's been very busy sending pull requests!

Peter


From p.j.a.cock at googlemail.com  Mon Nov 26 16:55:32 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 26 Nov 2012 16:55:32 +0000
Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names?
In-Reply-To: <CAKVJ-_7v1XaqfbwUN-Juu6-26AwfHO1697haZV5Pw-hdK7wTrA@mail.gmail.com>
References: <CAKVJ-_7NpV0aveBJfut1Mn5xh=gxzJ2qPFr3UnCBRt589SwQVA@mail.gmail.com>
	<CADEGkF6LkJT2WrDrKSevozSi=FYL_iZcDMMx3QdnBE5FLM=szw@mail.gmail.com>
	<CAKVJ-_6zdSOq9JFtvKNx7NynBjX-m1p6iw4Pfd02LcWsdp+tig@mail.gmail.com>
	<CADEGkF7GerjkkuN1FamM8MNVp+h=-10dDeY9d4UkZ6ise4t9+Q@mail.gmail.com>
	<CAKVJ-_7v1XaqfbwUN-Juu6-26AwfHO1697haZV5Pw-hdK7wTrA@mail.gmail.com>
Message-ID: <CAKVJ-_7Ca8fQkQ_EdgCM7A+bo4EwC0D3eHy74DDK46yGjfaLDg@mail.gmail.com>

On Mon, Nov 26, 2012 at 4:46 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Mon, Nov 26, 2012 at 2:06 PM, Wibowo Arindrarto
> <w.arindrarto at gmail.com> wrote:
>>> That's fine - I found both branches :)
>>>
>>> I've actually done a trial merge on the non-rebased one and
>>> then cherry-picked the experimental warning - looks good.
>>
>> Ah, good then :).
>
> Done,
> https://github.com/biopython/biopython/commit/9f6e810cc68dd1e353d899772fda3053d9f49513

I've put a short note in the NEWS file,
https://github.com/biopython/biopython/commit/43f7d4467dd56e67a7ad475e5ff3bf3d4f31d1d7

Congratulations Bow :)

I guess this would be a good excuse for you to write another blog post ;)

Speaking of which, unless we expect to release Biopython 1.61
soon, we should probably have something on the news blog too
(which reminds me I was supposed to co-ordinate a general
OBF GSoC 2012 post). Maybe I will manage that while on leave
in December?

Regards,

Peter


From w.arindrarto at gmail.com  Mon Nov 26 17:05:43 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Mon, 26 Nov 2012 18:05:43 +0100
Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names?
In-Reply-To: <CAKVJ-_7Ca8fQkQ_EdgCM7A+bo4EwC0D3eHy74DDK46yGjfaLDg@mail.gmail.com>
References: <CAKVJ-_7NpV0aveBJfut1Mn5xh=gxzJ2qPFr3UnCBRt589SwQVA@mail.gmail.com>
	<CADEGkF6LkJT2WrDrKSevozSi=FYL_iZcDMMx3QdnBE5FLM=szw@mail.gmail.com>
	<CAKVJ-_6zdSOq9JFtvKNx7NynBjX-m1p6iw4Pfd02LcWsdp+tig@mail.gmail.com>
	<CADEGkF7GerjkkuN1FamM8MNVp+h=-10dDeY9d4UkZ6ise4t9+Q@mail.gmail.com>
	<CAKVJ-_7v1XaqfbwUN-Juu6-26AwfHO1697haZV5Pw-hdK7wTrA@mail.gmail.com>
	<CAKVJ-_7Ca8fQkQ_EdgCM7A+bo4EwC0D3eHy74DDK46yGjfaLDg@mail.gmail.com>
Message-ID: <CADEGkF6oL-_Zi_yVxPWb6EYHRGnbk31EHmfu3=kS+1PqsOn6RA@mail.gmail.com>

>>>> That's fine - I found both branches :)
>>>>
>>>> I've actually done a trial merge on the non-rebased one and
>>>> then cherry-picked the experimental warning - looks good.
>>>
>>> Ah, good then :).
>>
>> Done,
>> https://github.com/biopython/biopython/commit/9f6e810cc68dd1e353d899772fda3053d9f49513
>
> I've put a short note in the NEWS file,
> https://github.com/biopython/biopython/commit/43f7d4467dd56e67a7ad475e5ff3bf3d4f31d1d7
>
> Congratulations Bow :)

Thank you :D! It feels great to see the code in master.

> I guess this would be a good excuse for you to write another blog post ;)

It is, and one should come up in the next couple of days :).

Now I'm anxiously waiting for the next Biopython release ~ and the
submodule's 'final' form after more feedback ;).

cheers,
Bow


From p.j.a.cock at googlemail.com  Mon Nov 26 17:22:00 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 26 Nov 2012 17:22:00 +0000
Subject: [Biopython-dev] [GSoC] GSoC python variant final update
In-Reply-To: <CALfq9t+quDjby4Dvg1iscs-oSAutAR9yzmT3G_b4-6oFVQxFVw@mail.gmail.com>
References: <CALfq9t+quDjby4Dvg1iscs-oSAutAR9yzmT3G_b4-6oFVQxFVw@mail.gmail.com>
Message-ID: <CAKVJ-_6ZqitFQh-FEr6gqA0bp5VNO9MEUHjDy9NWEERREMWq6g@mail.gmail.com>

On Mon, Aug 20, 2012 at 5:22 AM, Lenna Peterson <arklenna at gmail.com> wrote:
> Post: http://arklenna.tumblr.com/post/29808300789/
>
> The coordinate mapper, with updated documentation, is now located on
> this branch: https://github.com/lennax/biopython/tree/f_loc4
> It awaits the merging of Peter's f_loc4 branch.
>
> I've written an entry on coordinate mapping for the Cookbook:
> http://biopython.org/wiki/Coordinate_mapping

Hi Lenna,

Do you need my f_loc4 branch for the main GSoC variants work,
or just the coordinate mapper?

Thanks,

Peter


From chapmanb at 50mail.com  Mon Nov 26 20:18:09 2012
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 26 Nov 2012 15:18:09 -0500
Subject: [Biopython-dev] SearchIO, was: PEP8 lower case module names?
In-Reply-To: <CADEGkF6oL-_Zi_yVxPWb6EYHRGnbk31EHmfu3=kS+1PqsOn6RA@mail.gmail.com>
References: <CAKVJ-_7NpV0aveBJfut1Mn5xh=gxzJ2qPFr3UnCBRt589SwQVA@mail.gmail.com>
	<CADEGkF6LkJT2WrDrKSevozSi=FYL_iZcDMMx3QdnBE5FLM=szw@mail.gmail.com>
	<CAKVJ-_6zdSOq9JFtvKNx7NynBjX-m1p6iw4Pfd02LcWsdp+tig@mail.gmail.com>
	<CADEGkF7GerjkkuN1FamM8MNVp+h=-10dDeY9d4UkZ6ise4t9+Q@mail.gmail.com>
	<CAKVJ-_7v1XaqfbwUN-Juu6-26AwfHO1697haZV5Pw-hdK7wTrA@mail.gmail.com>
	<CAKVJ-_7Ca8fQkQ_EdgCM7A+bo4EwC0D3eHy74DDK46yGjfaLDg@mail.gmail.com>
	<CADEGkF6oL-_Zi_yVxPWb6EYHRGnbk31EHmfu3=kS+1PqsOn6RA@mail.gmail.com>
Message-ID: <87vccs15ku.fsf@fastmail.fm>


Bow and Peter;

>> Congratulations Bow :)
>
> Thank you :D! It feels great to see the code in master.

Awesome, nice work on this project and congratulations on getting it
integrated. It's great to see this go in,
Brad


From p.j.a.cock at googlemail.com  Tue Nov 27 09:35:46 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 27 Nov 2012 09:35:46 +0000
Subject: [Biopython-dev] Minor buildbot issues from SearchIO
Message-ID: <CAKVJ-_6N_Wy9QVKp=niHSexB0_yEL5svh4oDzbxEYuSHv3KfWA@mail.gmail.com>

Hi all,

The BuildBot flagged two new issues overnight,
http://testing.open-bio.org/biopython/tgrid

Python 2.5 on Windows - doctests are failing due to floating point decimal place
differences in the exponent (down to C library differences, something fixed in
later Python releases). Perhaps a Python 2.5 hack is the way to go here?
http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.5/builds/664/steps/shell/logs/stdio
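
One possible shape for such a hack is sketched below. It assumes the mismatch is the three-digit exponent (for example 1e-005) printed by some older Windows builds where other platforms print 1e-05; the log should be checked before relying on that. The function name normalize_exponent is made up for this example.

```python
import re


def normalize_exponent(text):
    """Rewrite three-digit exponents with a leading zero (1e-005) as 1e-05.

    Hypothetical helper: exponents that really need three digits, such as
    1e-100, are left alone.
    """
    return re.sub(r"([0-9.][eE][+-])0(\d\d)\b", r"\1\2", text)


print(normalize_exponent("1.000000e-005"))  # 1.000000e-05
print(normalize_exponent("3e+010"))         # 3e+10
print(normalize_exponent("1e-100"))         # 1e-100
```

A test runner could apply this to both the expected and the actual doctest output before comparing them.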

Python 3.2 and 3.3 on Windows are showing some XML character encoding oddity.
Perhaps there is some encoding setting needed under Python 3 for the BLAST
XML files?
http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.2/builds/512/steps/shell/logs/stdio
http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.3/builds/29/steps/shell/logs/stdio

There is a separate cross-platform issue on Python 3.1, "TypeError:
invalid event tuple", again with XML parsing. Curiously this had started
a few days back in the UniprotIO tests on one machine, pre-dating the
SearchIO merge. I'm not sure what triggered it.
http://testing.open-bio.org/biopython/builders/Linux%20-%20Python%203.1/builds/767
http://testing.open-bio.org/biopython/builders/Linux%2064%20-%20Python%203.1/builds/766/steps/shell/logs/stdio
http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.1/builds/648/steps/shell/logs/stdio

(Note TravisCI doesn't officially support Python 3.1, although until recently
they did offer it unofficially - Python 3.3 support is happening soon though).

Peter


From diego_zea at yahoo.com.ar  Tue Nov 27 14:25:48 2012
From: diego_zea at yahoo.com.ar (Diego Zea)
Date: Tue, 27 Nov 2012 06:25:48 -0800 (PST)
Subject: [Biopython-dev] Numpy/Scipy and Biopython
Message-ID: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com>

Hi!!!
This is my first mail to the list.
I am relatively new to Biopython (I used to code more in Perl), but I want to collaborate with the project.
I posted this question on Stack Overflow, and I want to share it with all of you ;)

http://stackoverflow.com/questions/13552916/numpy-and-biopython-must-be-integrated
Best wishes,

if ((dx*dp)>=(h/(2*pi)))
{
printf("Diego Javier Zea\n");
}


From anaryin at gmail.com  Tue Nov 27 15:40:58 2012
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Tue, 27 Nov 2012 16:40:58 +0100
Subject: [Biopython-dev] Numpy/Scipy and Biopython
In-Reply-To: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com>
References: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com>
Message-ID: <CAJ9sUYN517w_XB394rL4pY397+qt3Bx4_QcUgdRrZT_-7=HUng@mail.gmail.com>

Hi Diego,

Nice post and nice ideas. As for Bio.PDB, indeed representing the entire
structure as a Nx3 matrix of coordinates is super attractive, but would
require a deep change in the current framework. Also, manipulation of the
structure (removing atoms, adding atoms, etc) would become a bit more
complicated.. If you have good ideas to do this, please do share them. I
know for example ProDy and csb use a similar approach.

Cheers,

João

2012/11/27 Diego Zea <diego_zea at yahoo.com.ar>

>
> http://stackoverflow.com/questions/13552916/numpy-and-biopython-must-be-integrated


From redmine at redmine.open-bio.org  Wed Nov 28 00:46:22 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 28 Nov 2012 00:46:22 +0000
Subject: [Biopython-dev] [Biopython - Feature #3396] (New) Add alignment
	score, % identity, % similarity, % gaps, etc to EmbossIO
Message-ID: <redmine.issue-3396.20121128004622@redmine.open-bio.org>


Issue #3396 has been reported by Olga Botvinnik.

----------------------------------------
Feature #3396: Add alignment score, % identity, % similarity, % gaps, etc to EmbossIO
https://redmine.open-bio.org/issues/3396

Author: Olga Botvinnik
Status: New
Priority: Normal
Assignee: Olga Botvinnik
Category: 
Target version: 
URL: 


As of BioPython 1.59, if an alignment is read in with Bio.AlignIO.read(handle, 'emboss'), the metadata in the header, such as the substitution matrix used, gap_penalty, extend_penalty, identity, similarity, gaps, and score, is ignored:

<pre>
#=======================================
#
# Aligned_sequences: 4
# 1: IXI_234
# 2: IXI_235
# 3: IXI_236
# 4: IXI_237
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 131
# Identity:      95/131 (72.5%)
# Similarity:   127/131 (96.9%)
# Gaps:          25/131 (19.1%)
# Score: 100.0
#
#
#=======================================
</pre>

I edited the EmbossIO.py file to read this metadata and add it as annotations to each SeqRecord in the MultipleSequenceAlignment object, since the MultipleSequenceAlignment object does not have the option for annotations. I also added the appropriate unit tests. Please let me know if there is a bug in the code that I missed.

For example, for the above alignment, the SeqRecord objects would have the following annotations:

<pre>
{'identity_denominator': 131, 'matrix': 'EBLOSUM62', 'similarity': 0.8549618320610687, 'similarity_numerator': 112, 'similarity_denominator': 131, 'gaps': 0.1450381679389313, 'identity_numerator': 112, 'gap_penalty': 10.0, 'extend_penalty': 0.5, 'gaps_denominator': 131, 'score': 591.5, 'identity': 0.8549618320610687, 'gaps_numerator': 19}
</pre>

I decided to keep the numerators and denominators separately from the identity, similarity, and gap percentages just in case a user wanted to do something else with them.
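
As an illustration only (this is not the attached patch), header lines like the ones above could be turned into such a dictionary roughly as follows. The function name parse_emboss_header is hypothetical; the key names follow the ones listed in this issue.

```python
import re

HEADER = """\
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 131
# Identity:      95/131 (72.5%)
# Similarity:   127/131 (96.9%)
# Gaps:          25/131 (19.1%)
# Score: 100.0
"""


def parse_emboss_header(text):
    """Collect selected '# Key: value' header lines into a dictionary."""
    annotations = {}
    for line in text.splitlines():
        match = re.match(r"#\s*(\w+):\s*(.+)", line)
        if not match:
            continue
        key = match.group(1).lower()
        value = match.group(2).strip()
        if key in ("identity", "similarity", "gaps"):
            # Values look like "95/131 (72.5%)": keep both counts and the ratio.
            fraction = re.match(r"(\d+)\s*/\s*(\d+)", value)
            if fraction is None:
                continue
            numerator = int(fraction.group(1))
            denominator = int(fraction.group(2))
            annotations[key + "_numerator"] = numerator
            annotations[key + "_denominator"] = denominator
            annotations[key] = float(numerator) / denominator
        elif key in ("gap_penalty", "extend_penalty", "score"):
            annotations[key] = float(value)
        elif key == "matrix":
            annotations[key] = value
    return annotations


annotations = parse_emboss_header(HEADER)
print(annotations["identity_numerator"], annotations["identity_denominator"])
```

Header keys the sketch does not recognise (for example Length) are skipped.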




From diego_zea at yahoo.com.ar  Wed Nov 28 03:09:58 2012
From: diego_zea at yahoo.com.ar (Diego Zea)
Date: Tue, 27 Nov 2012 19:09:58 -0800 (PST)
Subject: [Biopython-dev] Numpy/Scipy and Biopython
In-Reply-To: <CAJ9sUYN517w_XB394rL4pY397+qt3Bx4_QcUgdRrZT_-7=HUng@mail.gmail.com>
References: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com>
	<CAJ9sUYN517w_XB394rL4pY397+qt3Bx4_QcUgdRrZT_-7=HUng@mail.gmail.com>
Message-ID: <1354072198.13226.YahooMailNeo@web140606.mail.bf1.yahoo.com>

"""
Hi João (and others)!!! Thanks :)

I think someone with more NumPy knowledge could do this better, but this is my idea:

1- Load the PDB file directly into numpy (I did this quickly and badly, don't trust this parser)
2- Use an n x 3 matrix for xyz and one matrix with named columns for the other information (I don't know how)
[ The index is the same, so you can use one to slice the other with boolean arrays ;) ]
3- Define methods for the most common operations

This is an example of my idea (it works on 1AB0 from the PDB)...

"""

import numpy

names=[]
descript=[]
xyz = []

# The example structure is 
# http://www.rcsb.org/pdb/explore.do?structureId=1ab0 
with open("/home/dzea/databases/PDB/1ab0.pdb","r") as fh:
    """ Very naive parser. I wrote this in a couple of minutes.
    It's bad, but it's only to show the idea """
    for line in fh:
        if line[0:4]=='ATOM':
            temp =[]
            temp2 =[]
            temp.append(line[4:11].replace(" ",""))
            temp2.append(line[11:16].replace(" ",""))
            temp2.append(line[17:21].replace(" ",""))
            temp.append(line[22:27].replace(" ",""))
            xyz.append(line[31:56].split())
            temp.append(line[55:60].replace(" ",""))
            temp.append(line[60:67].replace(" ",""))
            temp2.append(line[-5:].replace(" ","").replace("\n",""))
            descript.append(temp)
            names.append(temp2)

# I am not good at using different dtypes in different columns,
# but columns with names would be better than this:
names_array = numpy.array(names,numpy.character)
descript_array = numpy.array(descript,numpy.float16)
xyz_array = numpy.array(xyz,numpy.float16)

def select_atom(names,xyz,descript,atom='CA'):
    xyz_s = xyz[names[:,0]==atom,:]
    names_s = names[names[:,0]==atom,:]
    descript_s = descript[names[:,0]==atom,:]
    return names_s,xyz_s,descript_s

def delete_res_num(names,xyz,descript,num=20):
    xyz_s = xyz[descript[:,1]!=num,:]
    names_s = names[descript[:,1]!=num,:]
    descript_s = descript[descript[:,1]!=num,:]
    return names_s,xyz_s,descript_s

def delete_atom_num(names,xyz,descript,num=20):
    xyz_s = xyz[descript[:,0]!=num,:]
    names_s = names[descript[:,0]!=num,:]
    descript_s = descript[descript[:,0]!=num,:]
    return names_s,xyz_s,descript_s

def add_atom(new_name,new_xyz,new_descript,names,xyz,descript):
    # Using vstack ;)
    new_name = numpy.array(new_name,numpy.character)
    new_descript = numpy.array(new_descript,numpy.float16)
    new_xyz = numpy.array(new_xyz,numpy.float16)
    xyz_s = numpy.vstack((xyz,new_xyz))
    names_s = numpy.vstack((names,new_name))
    descript_s = numpy.vstack((descript,new_descript))
    return names_s,xyz_s,descript_s

## Example (works!!!)
xyz_array.shape
delete_atom_num(names_array,xyz_array,descript_array)[1].shape
add_atom(['H','H','H'],[0,0,0],[0,0,0,0],names_array,xyz_array,descript_array)[1].shape
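
On the "named columns" question in point 2, one possibility is a NumPy structured array, which keeps a separate dtype per named field in a single array. The sketch below is only an illustration: the field names and layout are chosen for this example, and the coordinates are made-up values, not taken from 1AB0.

```python
import numpy as np

# Hypothetical field layout, chosen for this example only.
atom_dtype = np.dtype([
    ("serial", np.int32),
    ("name", "U4"),
    ("resname", "U3"),
    ("resseq", np.int32),
    ("xyz", np.float64, (3,)),   # one 3-vector per atom
    ("occupancy", np.float64),
    ("bfactor", np.float64),
])

# Made-up records, only to exercise the operations.
atoms = np.array([
    (1, "N", "MET", 1, (27.34, 24.43, 2.61), 1.0, 9.67),
    (2, "CA", "MET", 1, (26.27, 25.41, 2.84), 1.0, 10.38),
    (3, "C", "MET", 1, (26.91, 26.64, 3.53), 1.0, 9.62),
    (4, "CA", "GLY", 2, (27.00, 28.90, 4.50), 1.0, 11.00),
], dtype=atom_dtype)

# Boolean masks on one field select whole records, coordinates included.
ca_atoms = atoms[atoms["name"] == "CA"]
without_res1 = atoms[atoms["resseq"] != 1]

# Adding an atom: build a one-record array with the same dtype and concatenate.
new_atom = np.array([(5, "H", "GLY", 2, (0.0, 0.0, 0.0), 1.0, 0.0)],
                    dtype=atom_dtype)
extended = np.concatenate([atoms, new_atom])

print(atoms["xyz"].shape)   # the coordinates are still an N x 3 float array
print(ca_atoms["serial"], len(without_res1), len(extended))
```

Because every field lives in the same array, the three parallel arrays above no longer need to be sliced with the same mask by hand.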


if ((dx*dp)>=(h/(2*pi)))
{
printf("Diego Javier Zea\n");
}


>________________________________
> From: João Rodrigues <anaryin at gmail.com>
>To: Diego Zea <diego_zea at yahoo.com.ar> 
>CC: "biopython-dev at lists.open-bio.org" <biopython-dev at lists.open-bio.org> 
>Sent: Tuesday, 27 November 2012 12:40
>Subject: Re: [Biopython-dev] Numpy/Scipy and Biopython
> 
>
>Hi Diego,
>
>
>Nice post and nice ideas. As for Bio.PDB, indeed representing the entire structure as a Nx3 matrix of coordinates is super attractive, but would require a deep change in the current framework. Also, manipulation of the structure (removing atoms, adding atoms, etc) would become a bit more complicated.. If you have good ideas to do this, please do share them. I know for example ProDy and csb use a similar approach.
>
>
>Cheers,
>
>
>João
>
>
>2012/11/27 Diego Zea <diego_zea at yahoo.com.ar>
>
>http://stackoverflow.com/questions/13552916/numpy-and-biopython-must-be-integrated
>
>
>


From redmine at redmine.open-bio.org  Thu Nov 29 09:09:49 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 29 Nov 2012 09:09:49 +0000
Subject: [Biopython-dev] [Biopython - Feature #3398] (New) Oracle BioSQL
Message-ID: <redmine.issue-3398.20121129090949@redmine.open-bio.org>


Issue #3398 has been reported by Hyungyong Kim.

----------------------------------------
Feature #3398: Oracle BioSQL
https://redmine.open-bio.org/issues/3398

Author: Hyungyong Kim
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


I just tested Oracle BioSQL for Biopython using cx_Oracle. It includes some modifications to Biopython that were needed for my GenBank file test. I have attached the patch and describe below how it was generated.

<pre>
[yong27 at dev biopython]$ git ls-remote --heads origin
902947a7df49d8529faeb7e1bfb55b2d06252272        refs/heads/master
[yong27 at dev biopython]$ git diff origin/master master > oracle_biosql.diff
[yong27 at dev biopython]$
</pre>

This is an example of how to use Oracle BioSQL. Oracle, the Oracle BioSQL schema, and cx_Oracle have to be installed.

<pre>
from contextlib import contextmanager
from BioSQL import BioSeqDatabase

@contextmanager
def biosqlconn(dbname):
    server = BioSeqDatabase.open_database(driver='cx_Oracle', user='USER', passwd='PASS')
    conn = server[dbname]
    try:
        yield conn
    except:
        conn.adaptor.rollback()
        raise
    else:
        conn.adaptor.commit()
    finally:
        conn.adaptor.close()

with biosqlconn('mydb') as biosqldb:
    record = biosqldb.lookup(accession='1234')

</pre>




From p.j.a.cock at googlemail.com  Thu Nov 29 10:56:04 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 29 Nov 2012 10:56:04 +0000
Subject: [Biopython-dev] [Biopython] AlignACE Application Wrapper
In-Reply-To: <50B6F8FF.2090206@brueffer.de>
References: <50B6CBB1.9040706@brueffer.de>
	<50B6F8FF.2090206@brueffer.de>
Message-ID: <CAKVJ-_6x2jVbmwO_LEh2jOOnOonkSb0iFQctwX5kAYnc_-22Bg@mail.gmail.com>

Can we continue this on the biopython-dev mailing list (CC'd)?

On Thu, Nov 29, 2012 at 5:56 AM, Christian Brueffer
<christian at brueffer.de> wrote:
> On 11/29/2012 10:42 AM, Christian Brueffer wrote:
>>
>> Hi,
>>
>> in preparation of cleaning up the AlignACE wrapper, I wanted to test
>> the current wrapper.   However, it doesn't seem to work at all ...
>>
>> For the record, I'm testing with the Linux version of the binary
>> (AlignACE version 2.3  October 27, 1998).
>>
>
> Some of the test files in the Tests directory mention the following AlignACE
> version: "AlignACE 4.0 05/13/04"
>
> This may be the answer to my problems.  Does anyone know where to get hold
> of this version?
>
> The website (http://atlas.med.harvard.edu/) is down and the only
> other one I found (http://arep.med.harvard.edu/mrnadata/mrnasoft.html)
> only distributes the old 2.3 version that I have.

Hmm, I don't see any existing unit tests dedicated to this wrapper.
There should really be a file named test_AlignACE_tool.py or similar.

I would also like some doctests in Bio/Motif/Applications/_AlignAce.py
which must be non-executing so they can be run without dependencies,
which of course isn't actually a functional test but it does still catch some
issues - but primarily would be as documentation to demonstrate typical
usage.
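
As a rough illustration of what such a non-executing doctest could look like, here is a minimal sketch. It does not use the real Biopython wrapper: the class ToolCommandline and the option names are hypothetical stand-ins. The doctest only builds and prints the command line, so it runs without the external binary being installed.

```python
class ToolCommandline(object):
    """Build (but never run) a command line for an external tool.

    Hypothetical stand-in for an application wrapper; the class name and
    the option names used below are illustrative assumptions only.

    >>> cmd = ToolCommandline("AlignACE", i="test.fa", numcols=10)
    >>> print(cmd)
    AlignACE -i test.fa -numcols 10
    """

    def __init__(self, executable, **options):
        self.executable = executable
        self.options = options

    def __str__(self):
        # Emit options in sorted order so the doctest output is stable.
        parts = [self.executable]
        for name in sorted(self.options):
            parts.extend(["-" + name, str(self.options[name])])
        return " ".join(parts)


if __name__ == "__main__":
    import doctest
    doctest.testmod()
```

A doctest like this catches typos in option names and formatting, but says nothing about whether the tool itself accepts the resulting command.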

I don't appear to have AlignAce installed on my own machines - in
particular, the nightly buildslaves don't have it. I don't think there is
a Debian/Ubuntu package for AlignAce, so testing this under
TravisCI is non-trivial - it looks like their licence agreement could
block packaging it.

Thanks,

Peter


From p.j.a.cock at googlemail.com  Thu Nov 29 11:22:51 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 29 Nov 2012 11:22:51 +0000
Subject: [Biopython-dev] [Biopython] AlignACE Application Wrapper
In-Reply-To: <50B74199.6020904@brueffer.de>
References: <50B6CBB1.9040706@brueffer.de> <50B6F8FF.2090206@brueffer.de>
	<CAKVJ-_6x2jVbmwO_LEh2jOOnOonkSb0iFQctwX5kAYnc_-22Bg@mail.gmail.com>
	<50B74199.6020904@brueffer.de>
Message-ID: <CAKVJ-_7CMuK7uwATmgY4fya9QO+hHaSmxdHwVsVKOZv4UbS9tQ@mail.gmail.com>

On Thu, Nov 29, 2012 at 11:06 AM, Christian Brueffer
<christian at brueffer.de> wrote:
> On 11/29/2012 06:56 PM, Peter Cock wrote:
>>
>> Can we continue this on the biopython-dev mailing list (CC'd)?
>>
>
> (moved to biopython-dev)
>

Thanks.

> Indeed.  I already have a cleaned up wrapper and unit tests in my local
> tree, but I don't want to submit them without actually testing them with an
> up to date binary ;-)

Excellent - I suspected you'd been doing something like this ;)

> archive.org has a version of http://atlas.med.harvard.edu/ from 2011,
> I have contacted the responsible person mentioned on the page.

It was Bartek who wrote the original wrapper (I only made re-factoring
changes since then), hopefully he still has a working AlignACE
installation and can tell us the version numbers etc that he was using.

Regards,

Peter


From christian at brueffer.de  Thu Nov 29 11:06:01 2012
From: christian at brueffer.de (Christian Brueffer)
Date: Thu, 29 Nov 2012 19:06:01 +0800
Subject: [Biopython-dev] [Biopython] AlignACE Application Wrapper
In-Reply-To: <CAKVJ-_6x2jVbmwO_LEh2jOOnOonkSb0iFQctwX5kAYnc_-22Bg@mail.gmail.com>
References: <50B6CBB1.9040706@brueffer.de> <50B6F8FF.2090206@brueffer.de>
	<CAKVJ-_6x2jVbmwO_LEh2jOOnOonkSb0iFQctwX5kAYnc_-22Bg@mail.gmail.com>
Message-ID: <50B74199.6020904@brueffer.de>

On 11/29/2012 06:56 PM, Peter Cock wrote:
> Can we continue this on the biopython-dev mailing list (CC'd)?
>
> On Thu, Nov 29, 2012 at 5:56 AM, Christian Brueffer
> <christian at brueffer.de> wrote:
>> On 11/29/2012 10:42 AM, Christian Brueffer wrote:
>>>
>>> Hi,
>>>
>>> in preparation of cleaning up the AlignACE wrapper, I wanted to test
>>> the current wrapper.   However, it doesn't seem to work at all ...
>>>
>>> For the record, I'm testing with the Linux version of the binary
>>> (AlignACE version 2.3  October 27, 1998).
>>>
>>
>> Some of the test files in the Tests directory mention the following AlignACE
>> version: "AlignACE 4.0 05/13/04"
>>
>> This may be the answer to my problems.  Does anyone know where to get hold
>> of this version?
>>
>> The website (http://atlas.med.harvard.edu/) is down and the only
>> other one I found (http://arep.med.harvard.edu/mrnadata/mrnasoft.html)
>> only distributes the old 2.3 version that I have.
>
> Hmm, I don't see any existing unit tests dedicated to this wrapper.
> There should really be a file named test_AlignACE_tool.py or similar.
>
> I would also like some doctests in Bio/Motif/Applications/_AlignAce.py
> which must be non-executing so they can be run without dependencies,
> which of course isn't actually a functional test but it does still catch some
> issues - but primarily would be as documentation to demonstrate typical
> usage.
>
> I don't appear to have AlignAce installed on my own machines - in
> particular, the nightly buildslaves don't have it. I don't think there is
> a Debian/Ubuntu package for AlignAce, so testing this under
> TravisCI is non-trivial - it looks like their licence agreement could
> block packaging it.
>

(moved to biopython-dev)

Indeed.  I already have a cleaned up wrapper and unit tests in my local 
tree, but I don't want to submit them without actually testing them with 
an up to date binary ;-)

archive.org has a version of http://atlas.med.harvard.edu/ from 2011,
I have contacted the responsible person mentioned on the page.

Cheers,

Chris


From mjldehoon at yahoo.com  Thu Nov 29 14:33:12 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 29 Nov 2012 06:33:12 -0800 (PST)
Subject: [Biopython-dev] Error in Bio.Entrez.__init__
In-Reply-To: <CAKVJ-_4Y3rvgQP8C9tgykekhc4Jsi=cVMZwxbm1YS4YOn7754g@mail.gmail.com>
Message-ID: <1354199592.66390.YahooMailClassic@web164006.mail.gq1.yahoo.com>

--- On Mon, 11/26/12, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> In general the NCBI identifiers are arbitrary strings,
> although perhaps the pubmed identifiers could be treated as
> integers.
> This is perhaps worth changing in the Bio.Entrez code...
> 
> What do you think Michiel?

If we change this in the Bio.Entrez code, we should put str(..) around all NCBI identifiers, not just the pubmed ones. Otherwise we'd have special treatment for one of the Entrez databases, which may cause problems in the future.
I'm OK if somebody else adds the calls to str(..), but I wouldn't champion it myself.

Best,
-Michiel.


From p.j.a.cock at googlemail.com  Thu Nov 29 14:49:42 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 29 Nov 2012 14:49:42 +0000
Subject: [Biopython-dev] Error in Bio.Entrez.__init__
In-Reply-To: <1354199592.66390.YahooMailClassic@web164006.mail.gq1.yahoo.com>
References: <CAKVJ-_4Y3rvgQP8C9tgykekhc4Jsi=cVMZwxbm1YS4YOn7754g@mail.gmail.com>
	<1354199592.66390.YahooMailClassic@web164006.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_50-b6LMu8rN2MRFbU2ssYMraNgqdN=vJSn9YNUJos85w@mail.gmail.com>

On Thu, Nov 29, 2012 at 2:33 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> --- On Mon, 11/26/12, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> In general the NCBI identifiers are arbitrary strings,
>> although perhaps the pubmed identifiers could be treated as
>> integers.
>> This is perhaps worth changing in the Bio.Entrez code...
>>
>> What do you think Michiel?
>
> If we change this in the Bio.Entrez code, we should put str(..) around
> all NCBI identifiers, not just the pubmed ones. Otherwise we'd have
> special treatment for one of the Entrez databases, which may cause
> problems in the future.

Yes, after all there are other Entrez databases with 'numerical' identifiers.

> I'm OK if somebody else adds the calls to str(..), but I wouldn't champion
> it myself.

I don't mind doing the commit (and a unit test), but do you have any
specific concern in mind?
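
For reference, the kind of change being discussed might look like the sketch below. The helper name format_ids is hypothetical and this is not the actual Bio.Entrez code; it only shows str(..) being applied uniformly to every identifier, whatever database it comes from.

```python
def format_ids(ids):
    """Return identifiers as a comma-separated string.

    Accepts a single identifier or an iterable of them; each one may be
    a string or an integer. Every identifier goes through str(), so no
    database gets special treatment.
    """
    if isinstance(ids, (str, int)):
        ids = [ids]
    return ",".join(str(identifier) for identifier in ids)


print(format_ids([19304878, "NM_000546"]))
print(format_ids(19304878))
```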

Peter


From redmine at redmine.open-bio.org  Thu Nov 29 17:12:31 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 29 Nov 2012 17:12:31 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15021.20121129171231@redmine.open-bio.org>


Issue #3395 has been updated by Peter Cock.

File trie_debug.patch added

I can reproduce the problem with your saved file under Mac OS X, using the latest Biopython from github, e.g.

$ python
Python 2.7.2 (default, Jun 20 2012, 16:23:33) 
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import trie
>>> import gzip
>>> with gzip.open("trie.4.dat.gz") as handle:
...     t = trie.load(handle)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
RuntimeError: loading failed for some reason

Adding a little debugging to the C code tells us where this fails (see attachment), line 669:

668    if(has_value) {
669        if(!(trie->value = (*read_value)(data)))
670            goto _deserialize_trie_error;
671    }

What kind of CPU does your machine have? i.e. is it a normal Intel or AMD CPU, or something unusual like a PowerPC where we have to worry about the bit order interpretation?

We may need a complete example creating the trie as well - the problem could be in the trie itself, the serialisation (writing to disk), or de-serialisation (loading from disk).
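
One way to separate those three stages is a small round-trip harness such as the sketch below. It is only an outline under stated assumptions: pickle and a plain dict are stand-ins for the real trie and its save/load functions, and the function name roundtrip is hypothetical. Note that it calls save(obj, handle), so a serialiser with the opposite argument order would need a small adapter.

```python
import gzip
import io
import pickle


def roundtrip(obj, save, load):
    """Save obj to an in-memory gzip stream, then load it back.

    Returns the raw compressed bytes and the reloaded object, so the
    written data and the result of loading can be inspected separately.
    """
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb") as handle:
        save(obj, handle)
    raw = buffer.getvalue()
    with gzip.GzipFile(fileobj=io.BytesIO(raw), mode="rb") as handle:
        restored = load(handle)
    return raw, restored


# Stand-in data and serialiser (assumptions); swap in the real trie here.
data = {"key%06d" % i: i for i in range(100000)}
raw, restored = roundtrip(data, pickle.dump, pickle.load)
print(len(raw), restored == data)
```

Growing the data set until the round trip fails would also give a rough idea of the size at which the problem starts.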
----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395

Author: Michał Nowotka
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


Imagine I have Biopython trie:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'w')
tr = trie.trie()
#fill in the trie
trie.save(f, tr)

Now /tmp/trie.dat.gz is about 50MB. Let's try to read it:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'r')
tr = trie.load(f)

Unfortunately I'm getting meaningless error saying:
"loading failed for some reason"

Any hints?




From redmine at redmine.open-bio.org  Thu Nov 29 17:21:30 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 29 Nov 2012 17:21:30 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie
	implementation can't load large data sets
References: <redmine.issue-3395.20121120134147@redmine.open-bio.org>
Message-ID: <redmine.journal-15022.20121129172130@redmine.open-bio.org>


Issue #3395 has been updated by Michał Nowotka.


I'm using an Ubuntu virtual machine running on a MacBook Pro, with a single Intel® Core™ i7-2720QM CPU @ 2.20GHz processor. I will try to prepare code and data for which it fails.
----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395

Author: Michał Nowotka
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 
URL: 


Imagine I have Biopython trie:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'w')
tr = trie.trie()
#fill in the trie
trie.save(f, tr)

Now /tmp/trie.dat.gz is about 50MB. Let's try to read it:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'r')
tr = trie.load(f)

Unfortunately I'm getting meaningless error saying:
"loading failed for some reason"

Any hints?




From w.arindrarto at gmail.com  Fri Nov 30 02:35:25 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 30 Nov 2012 03:35:25 +0100
Subject: [Biopython-dev] Minor buildbot issues from SearchIO
In-Reply-To: <CAKVJ-_6N_Wy9QVKp=niHSexB0_yEL5svh4oDzbxEYuSHv3KfWA@mail.gmail.com>
References: <CAKVJ-_6N_Wy9QVKp=niHSexB0_yEL5svh4oDzbxEYuSHv3KfWA@mail.gmail.com>
Message-ID: <CADEGkF4RLmQDMS2sBNTs=Rwag_CypmU6WX-Q71R=Xsbuc4_GQg@mail.gmail.com>

Hi everyone,

I've done some digging around to see how to deal with these issues.
Here's what I found:

> The BuildBot flagged two new issues overnight,
> http://testing.open-bio.org/biopython/tgrid
>
> Python 2.5 on Windows - doctests are failing due to floating point decimal place
> differences in the exponent (down to C library differences, something fixed in
> later Python releases). Perhaps a Python 2.5 hack is the way to go here?
> http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.5/builds/664/steps/shell/logs/stdio

I've submitted a pull request to fix this here:
https://github.com/biopython/biopython/pull/98
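
To illustrate the general idea of such a hack (this is not necessarily what the pull request does), the sketch below assumes the difference is the three-digit exponent that older Windows C libraries print, e.g. 1e-005 instead of 1e-05. The helper name normalize_exponent is hypothetical.

```python
import re

# Matches an exponent marker, a sign, one padding zero, then two digits.
_EXPONENT = re.compile(r"([eE][+-])0(\d{2})\b")


def normalize_exponent(text):
    """Rewrite three-digit exponents such as 1e-005 to the 1e-05 form."""
    return _EXPONENT.sub(r"\1\2", text)


print(normalize_exponent("evalue: 1e-005, bitscore: 2.5e+100"))
```

Genuine three-digit exponents such as e+100 are left alone, since only a leading padding zero is stripped.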

> Python 3.2 and 3.3 on Windows are showing some XML character encoding oddity.
> Perhaps there is some encoding setting needed under Python 3 for the BLAST
> XML files?
> http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.2/builds/512/steps/shell/logs/stdio
> http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.3/builds/29/steps/shell/logs/stdio

I've also addressed these failures here:
https://github.com/biopython/biopython/pull/99

> There is a separate cross-platform issue on Python 3.1, "TypeError:
> invalid event tuple"
> again with XML parsing. Curiously this had started a few days back in
> the UniprotIO
> tests on one machine, pre-dating the SearchIO merge. I'm not sure what
> triggered it.
> http://testing.open-bio.org/biopython/builders/Linux%20-%20Python%203.1/builds/767
> http://testing.open-bio.org/biopython/builders/Linux%2064%20-%20Python%203.1/builds/766/steps/shell/logs/stdio
> http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%203.1/builds/648/steps/shell/logs/stdio

As for this one, it seems that it's caused by a bug in Python3.1
(http://bugs.python.org/issue9257) due to the way
`xml.etree.cElementTree.iterparse` accepts the `events` argument. I
haven't submitted any pull request for this bug, since the fix looks
quite messy. Should we try to address this or simply make note that
XML parsing in Python3.1 will not work? Like Peter noted, currently
this bug involves Bio.SearchIO blast xml parsing, SeqIO.UniprotIO, and
Phylo.PhyloXMLIO.
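
For anyone unfamiliar with the call in question, this is roughly the pattern those parsers rely on. It is a minimal sketch: the XML snippet and tag names are made up for illustration, and it imports the pure-Python module name rather than the C accelerator.

```python
import io
from xml.etree import ElementTree

# Made-up XML for illustration only.
xml_data = b"<root><hit id='a'/><hit id='b'/></root>"

hit_ids = []
# The events tuple of strings is the argument the bug report is about.
for event, elem in ElementTree.iterparse(io.BytesIO(xml_data),
                                         events=("start", "end")):
    if event == "end" and elem.tag == "hit":
        hit_ids.append(elem.get("id"))
        elem.clear()  # free memory for elements already processed

print(hit_ids)
```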

regards,
Bow


From diego_zea at yahoo.com.ar  Fri Nov 30 13:00:20 2012
From: diego_zea at yahoo.com.ar (Diego Zea)
Date: Fri, 30 Nov 2012 05:00:20 -0800 (PST)
Subject: [Biopython-dev]  Numpy/Scipy and Biopython
In-Reply-To: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com>
References: <1354026348.44288.YahooMailNeo@web140601.mail.bf1.yahoo.com>
Message-ID: <1354280420.4305.YahooMailNeo@web140605.mail.bf1.yahoo.com>

Hi! I was looking at Seq/AlignIO, and I think it may be possible to avoid the overhead of creating Bio objects and then NumPy objects from them, by adding an optional argument to __init__ that defaults to False. When this argument is set to True, NumPy-based objects would be generated too. That might also make it easier to convert between simple Python objects and NumPy-based objects, and to use all the functionality of Bio together with the fast numerical operations of NumPy arrays... It's only an idea, what do you think? Thanks!!! :)

if ((dx*dp)>=(h/(2*pi)))
{
printf("Diego Javier Zea\n");
}


>________________________________
> From: Diego Zea <diego_zea at yahoo.com.ar>
>To: "biopython-dev at lists.open-bio.org" <biopython-dev at lists.open-bio.org> 
>Sent: Tuesday, 27 November 2012 11:25
>Subject: [Biopython-dev] Numpy/Scipy and Biopython
> 
>Hi!!!
>This is my first mail to the list.
>I am relatively new to Biopython (I used to code more in Perl) but I want to collaborate with the project.
>I posted this question on Stack Overflow, and I want to share it with all of you ;)
>
>http://stackoverflow.com/questions/13552916/numpy-and-biopython-must-be-integrated
>Best wishes,
>
>if ((dx*dp)>=(h/(2*pi)))
>{
>printf("Diego Javier Zea\n");
>}