From bugzilla-daemon at portal.open-bio.org  Tue Dec  1 07:28:33 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 Dec 2009 07:28:33 -0500
Subject: [Biopython-dev] [Bug 2957] GenBank Writer Should Write Out Date
In-Reply-To: <bug-2957-42@http.bugzilla.open-bio.org/>
Message-ID: <200912011228.nB1CSXec001831@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2957


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-01 07:28 EST -------
A slightly more robust version of this has been checked in. Future work could
handle date/time objects. Please reopen this bug if there are any problems.
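
As a rough illustration of the date/time handling mentioned above (a sketch
only, not the checked-in code): the hypothetical helper genbank_date below
accepts either a ready-made string or a datetime.date and returns the
DD-MON-YYYY form used in GenBank files. The helper name and the fallback
value are assumptions made for this example.

```python
import datetime

# Month abbreviations are spelled out so the result does not depend on locale.
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def genbank_date(value, default="01-JAN-1980"):
    """Return a DD-MON-YYYY string (hypothetical helper, not Biopython API)."""
    if isinstance(value, datetime.date):  # also true for datetime.datetime
        return "%02i-%s-%04i" % (value.day, MONTHS[value.month - 1], value.year)
    if (isinstance(value, str) and len(value) == 11
            and value[2] == "-" and value[6] == "-" and value[3:6] in MONTHS):
        return value  # already looks like a GenBank date
    return default  # assumed placeholder, used rather than writing bad data


print(genbank_date(datetime.date(2009, 12, 1)))
```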

Thanks,

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk  Tue Dec  1 14:34:19 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 1 Dec 2009 19:34:19 +0000
Subject: [Biopython-dev] Fwd: [Utilities-announce] NCBI E-Utility Policy
	Change
In-Reply-To: <320fb6e00912011129j68dda3b2p6df9a232f0462458@mail.gmail.com>
References: <7B6F170840CA6C4DA63EE0C8A7BB43EC09CA7387@NIHCESMLBX15.nih.gov>
	<320fb6e00912011129j68dda3b2p6df9a232f0462458@mail.gmail.com>
Message-ID: <320fb6e00912011134u2481644aw5dfdfe9f9a3049f0@mail.gmail.com>

Hi all,

Attention NCBI Entrez users - the NCBI really do want you to include
your email address, and it will be mandatory in future! See below...

If using Bio.Entrez, the tool parameter will by default be set to
Biopython, but the email is omitted. We already encourage the email
to be included in our documentation but given the new NCBI guidance
I'd suggest we make omitting the email issue a warning in the next
release (and an error in the subsequent release of Biopython?).
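
A minimal sketch of the "warn now, error later" idea, assuming a
hypothetical helper called entrez_params; this is not the Bio.Entrez
implementation, only an illustration of where such a warning could sit.

```python
import warnings


def entrez_params(email=None, tool="Biopython", **params):
    """Return request parameters with tool/email merged in.

    Hypothetical helper for illustration, not part of Bio.Entrez.
    """
    merged = dict(params)
    merged["tool"] = tool
    if email:
        merged["email"] = email
    else:
        # First step of the proposal: warn. A later release could raise.
        warnings.warn(
            "No email address given; NCBI will require one from June 1, 2010.",
            UserWarning,
        )
    return merged


print(entrez_params(email="someone@example.org", db="pubmed"))
```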

Peter


---------- Forwarded message ----------
From: <utilities-announce at ncbi.nlm.nih.gov>
Date: Tue, Dec 1, 2009 at 6:59 PM
Subject: [Utilities-announce] NCBI E-Utility Policy Change
To: utilities-announce at ncbi.nlm.nih.gov


As part of an ongoing effort to ensure efficient access to the Entrez
Utilities (E-utilities) by all users, NCBI has decided to change the
usage policy for the E-utilities effective June 1, 2010. Effective on
June 1, 2010, all E-utility requests, either using standard URLs or
SOAP, must contain non-null values for both the &tool and &email
parameters. Any E-utility request made after June 1, 2010 that does
not contain values for both parameters will return an error explaining
that these parameters must be included in E-utility requests.


The value of the &tool parameter should be a URI-safe string that is
the name of the software package, script or web page producing the
E-utility request.


The value of the &email parameter should be a valid e-mail address for
the appropriate contact person or group responsible for maintaining
the tool producing the E-utility request.
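
For readers updating their own scripts, a request URL carrying both
parameters could be built along these lines. This is a sketch under
assumptions: the ESearch endpoint shown is not named in the announcement,
and eutils_url is a made-up helper name.

```python
from urllib.parse import urlencode

# Assumption: the usual ESearch endpoint; the announcement names no URL.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def eutils_url(base, tool, email, **params):
    """Build a request URL, refusing empty tool or email values."""
    if not tool or not email:
        raise ValueError("both tool and email must be non-empty")
    query = dict(params, tool=tool, email=email)
    # urlencode escapes characters such as "@" so the values are URI-safe.
    return base + "?" + urlencode(sorted(query.items()))


print(eutils_url(ESEARCH_URL, "my_script", "someone@example.org",
                 db="pubmed", term="biopython"))
```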


NCBI uses these parameters to contact users whose use of the
E-utilities violates the standard usage policies described at
http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html#UserSystemRequirements.
These usage policies are designed to prevent excessive requests from a
small group of users from reducing or eliminating the wider
community's access to the E-utilities. NCBI will attempt to contact a
user at the e-mail address provided in the &email parameter prior to
blocking access to the E-utilities.


NCBI realizes that this policy change will require many of our users
to change their code. Based on past experience, we anticipate that
most of our users should be able to make the necessary changes before
the June 1, 2010 deadline. If you have any concerns about making these
changes by that date, or if you have any questions about these
policies, please contact eutilities at ncbi.nlm.nih.gov.


Thank you for your understanding and cooperation in helping us
continue to deliver a reliable and efficient web service.


_______________________________________________
Utilities-announce mailing list
http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce

From chapmanb at 50mail.com  Wed Dec  2 07:57:44 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 2 Dec 2009 07:57:44 -0500
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <320fb6e00911270823g320c7c24pd0773ae8b72902ee@mail.gmail.com>
References: <320fb6e00904060625v4a49da2au76159eae18f707eb@mail.gmail.com>
	<20090406220826.GH43636@sobchak.mgh.harvard.edu>
	<320fb6e00911270823g320c7c24pd0773ae8b72902ee@mail.gmail.com>
Message-ID: <20091202125744.GA46415@sobchak.mgh.harvard.edu>

Hi Peter;

> Brad has some GFF parsing code he has been working on, which
> would be nice to merge into Biopython at some point. See:
> 
> http://lists.open-bio.org/pipermail/biopython-dev/2009-April/005700.html
> 
> As we started to discuss earlier this year, we need to think about
> what to do with the existing (old) Bio.GFF module. This was written
> by Michael Hoffman back in 2002 which accesses MySQL General
> Feature Format (GFF) databases created with BioPerl.
> 
> I've been looking at the old Bio.GFF code, and there are a lot of
> redundant things like its own GenBank/EMBL location parsing,
> plus its own location objects and its own Feature objects (rather
> than reusing Bio.SeqFeature which should have sufficed).

I'm ambivalent on deprecating GFF. Agreed that some of it is not
well integrated with the rest of Biopython, with the
Location/LocationFromString code being the most duplicated. It's too
bad features were reimplemented as well. Is Michael around at all?

> I want to suggest we deprecate Michael Hoffman's Bio.GFF module
> in Biopython 1.53 (I'm hoping we can do this next month, Dec 2009).
> Depending on how soon Brad's code is ready to be merged (which I
> am assuming could be Biopython 1.54, spring 2010), we can perhaps
> accelerate removal of the old module.

The current structure of the GFF code does not require removing what
is currently there. It needs a couple of lines in __init__.py to
expose the useful classes at the top level:

from GFFParser import GFFParser, DiscoGFFParser, GFFExaminer
from GFFOutput import GFF3Writer

and we'd need to move the MySQLdb check to the Connection class so
it's only needed if you are actually using the database code.
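
Moving the check could look roughly like this. It is a sketch with a
stand-in Connection class, not the existing Bio.GFF code: the driver is
imported inside the constructor, so importing the package itself no longer
needs MySQLdb.

```python
import importlib


class Connection(object):
    """Hypothetical stand-in for the database connection class."""

    driver_name = "MySQLdb"  # the optional dependency named above

    def __init__(self, **connect_args):
        # Import here, not at module level, so the parser classes can be
        # used on machines without the database driver installed.
        try:
            driver = importlib.import_module(self.driver_name)
        except ImportError:
            raise ImportError("%s is required for the GFF database code"
                              % self.driver_name)
        self.handle = driver.connect(**connect_args)
```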

So these can happen in parallel. Ideally, I'd like to get the GFF
stuff in sooner rather than later. The main item on my todo list is
finishing the documentation, with the stubs here:

http://biopython.org/wiki/GFF_Parsing

If I crank that out what do we think about putting it in with the
__init__.py modifications I suggested?

Brad

From mjldehoon at yahoo.com  Wed Dec  2 09:29:27 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 2 Dec 2009 06:29:27 -0800 (PST)
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
Message-ID: <317375.58712.qm@web62401.mail.re1.yahoo.com>

--- On Wed, 12/2/09, Brad Chapman <chapmanb at 50mail.com> wrote:
> If I crank that out what do we think about putting it in
> with the __init__.py modifications I suggested?

I'd definitely welcome a GFF parser in Biopython, but I think the current code needs to be simplified and its usage more consistent with other Biopython modules. It's great that the documentation is available. It's a big help in designing the module, in particular what its usage looks like to the user.

Let's start from basic GFF parsing. This is the example in the documentation:

>>> from BCBio.GFF import GFFParser
>>> in_file = "your_file.gff"
>>> parser = GFFParser()
>>> in_handle = open(in_file)
>>> for rec in parser.parse(in_handle):
...    print rec
>>> in_handle.close()

What is the purpose of creating the parser first, and then calling parser.parse on the in_handle? I'd much rather have

>>> from BCBio import GFF
>>> in_file = "your_file.gff"
>>> in_handle = open(in_file)
>>> for rec in GFF.parse(in_handle):
...    print rec
>>> in_handle.close()

which is how most other Biopython parsers work.
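
The change amounts to a thin module-level wrapper around the parser class.
A sketch, using a toy GFFParser that only splits columns (the real class
does far more):

```python
import io


class GFFParser(object):
    """Toy stand-in for the real parser class; it only splits columns."""

    def parse(self, handle):
        for line in handle:
            line = line.rstrip("\n")
            if line and not line.startswith("#"):
                yield line.split("\t")


def parse(handle):
    """Module-level convenience wrapper around GFFParser().parse()."""
    return GFFParser().parse(handle)


example = io.StringIO("##gff-version 3\nchr1\t.\tgene\t1\t9\t.\t+\t.\tID=g1\n")
for rec in parse(example):
    print(rec)
```

The parser class stays available for anyone who needs to configure it
before parsing.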

--Michiel.


From chapmanb at 50mail.com  Thu Dec  3 09:25:34 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 3 Dec 2009 09:25:34 -0500
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <317375.58712.qm@web62401.mail.re1.yahoo.com>
References: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
	<317375.58712.qm@web62401.mail.re1.yahoo.com>
Message-ID: <20091203142534.GF51407@sobchak.mgh.harvard.edu>

Hi Michiel;

> > If I crank that out what do we think about putting it in
> > with the __init__.py modifications I suggested?
> 
> I'd definitely welcome a GFF parser in Biopython, but I think the
> current code needs to be simplified and its usage more consistent
> with other Biopython modules. It's great that the documentation is
> available. It's a big help in designing the module, in particular what
> its usage looks like to the user.

Awesome. I welcome these suggestions; it's really helpful to have
fresh eyes looking at it. Hopefully moving it into Biopython will
stimulate that.

> Let's start from basic GFF parsing. This is the example in the documentation:
[...]
> What is the purpose of creating the parser first, and then calling
> parser.parse on the in_handle? I'd much rather have
>
> >>> from BCBio import GFF
> >>> in_file = "your_file.gff"
> >>> in_handle = open(in_file)
> >>> for rec in GFF.parse(in_handle):
> ...    print rec
> >>> in_handle.close()

Great -- done for parsing and writing and committed to GitHub. The
documentation is updated as well.

Happy to get other comments and thoughts. Thanks again,
Brad

From biopython at maubp.freeserve.co.uk  Thu Dec  3 09:53:44 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 3 Dec 2009 14:53:44 +0000
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <20091203142534.GF51407@sobchak.mgh.harvard.edu>
References: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
	<317375.58712.qm@web62401.mail.re1.yahoo.com>
	<20091203142534.GF51407@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com>

On Thu, Dec 3, 2009 at 2:25 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Great -- done for parsing and writing and committed to GitHub. The
> documentation is updated as well.
>
> Happy to get other comments and thoughts. Thanks again,
>

I understand that GFF files are complex, and a simple "record
iterator" isn't flexible enough to cover all use cases - hence the
need for a complex parser class. That said, Michiel is right that
GFF.parse() or GFF.read() functions would be consistent with
other bits of Biopython, and would provide for the simple use
cases.

Looking at your code, BCBio.GFF.parse(...) would return
SeqRecord objects (with SeqFeatures). That seems
redundant to me as one would expect people to just use
Bio.SeqIO.parse(handle, "gff3") instead. I would instead
have expected BCBio.GFF.parse(...) to iterate over the
features in a GFF file.

Also, and we'd touched on this before - I'd much prefer to
have the GFF module quite "low level" using either new
GFF-specific classes or simple Python objects (e.g. for
each feature use a tuple of ints and strings for the first
feature columns plus a dict for the final extendible
column of annotation).
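
To make the "tuple plus dict" idea concrete, here is a sketch for a single
GFF3 line. The function name parse_gff3_line is made up for this example,
and it assumes the nine-column GFF3 layout with key=value attributes
separated by semicolons.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Split one GFF3 feature line into (fixed_columns, attributes)."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %i" % len(cols))
    seqid, source, ftype, start, end, score, strand, phase, attr_text = cols
    # Score and phase stay as strings because "." (no value) is allowed.
    fixed = (seqid, source, ftype, int(start), int(end), score, strand, phase)
    attributes = {}
    for pair in attr_text.split(";"):
        if "=" not in pair:
            continue  # skip empty pieces and the "." placeholder
        key, _, value = pair.partition("=")
        # A key may carry several comma-separated values, so keep a list.
        attributes[unquote(key.strip())] = [unquote(v) for v in value.split(",")]
    return fixed, attributes


print(parse_gff3_line("chr1\tEnsembl\texon\t100\t250\t.\t+\t.\tID=exon1;Parent=mRNA1"))
```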

From a technical point of view, a justification for this
separation is the GFF details are not a perfect fit to the
SeqRecord and SeqFeature objects and forcing their
use adds unnecessary overheads for people wanting
to work directly with the features themselves.

Also, by splitting the code into basic parsing and a
SeqRecord/SeqFeature conversion layer (which I
would put in Bio/SeqIO/GffIO.py) we can add the
code in two steps (first GFF parsing, then SeqIO
support).

I think this split is useful as this is a very big job to do
properly: Once we have GFF to SeqRecord parsing,
we need to try and ensure that it is compatible with the
GenBank to SeqRecord parsing. This is important as
we would in effect be extending Biopython to allow
GFF3 to GenBank conversions. For testing all this,
we can grab the same data in the two file formats
(e.g. from the NCBI) and perhaps also use EMBOSS.

You may recall we talked to Peter Rice (from EMBOSS)
about this - there are some important issues here like
ontology mapping where we should be able to reuse a
lot of the work EMBOSS has already done (and use the
EMBOSS tools to help validate our mapping).

i.e. While I may be being overly cautious, I think that
while adding GFF parsing and GFF to SeqRecord
mapping is very important, it is also very complex.
Therefore breaking this into a two stage task makes
managing and testing it easier - as well as seeming
a good idea for the code itself.

Peter

From bugzilla-daemon at portal.open-bio.org  Thu Dec  3 10:03:29 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Dec 2009 10:03:29 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912031503.nB3F3Tu8013049@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-03 10:03 EST -------
Brad,

Now that Chris at BioPerl is interested, I am confident
we can get the SQLite schema into BioSQL in the near future:
http://lists.open-bio.org/pipermail/biosql-l/2009-November/001655.html

Do you want to update your patch (if needed) and put this
up on a Biopython branch in github? How soon do you think
it could be ready to merge? It would be nice to have this
in the next release (even if we put a big "beta" warning in)?

Peter



From biopython at maubp.freeserve.co.uk  Thu Dec  3 10:30:54 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 3 Dec 2009 15:30:54 +0000
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
References: <320fb6e00904060625v4a49da2au76159eae18f707eb@mail.gmail.com>
	<20090406220826.GH43636@sobchak.mgh.harvard.edu>
	<320fb6e00911270823g320c7c24pd0773ae8b72902ee@mail.gmail.com>
	<20091202125744.GA46415@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00912030730rb66c2dav1993465ba25f9f5f@mail.gmail.com>

On Wed, Dec 2, 2009 at 12:57 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi Peter;
>
>> Brad has some GFF parsing code he has been working on, which
>> would be nice to merge into Biopython at some point. See:
>>
>> http://lists.open-bio.org/pipermail/biopython-dev/2009-April/005700.html
>>
>> As we started to discuss earlier this year, we need to think about
>> what to do with the existing (old) Bio.GFF module. This was written
>> by Michael Hoffman back in 2002 which accesses MySQL General
>> Feature Format (GFF) databases created with BioPerl.
>>
>> I've been looking at the old Bio.GFF code, and there are a lot of
>> redundant things like its own GenBank/EMBL location parsing,
>> plus its own location objects and its own Feature objects (rather
>> than reusing Bio.SeqFeature which should have sufficed).
>
> I'm ambivalent on deprecating GFF. Agreed that some of it is not
> well integrated with the rest of Biopython, with the
> Location/LocationFromString code being the most duplicated. It's too
> bad features were reimplemented as well. Is Michael around at all?

I got in touch with Michael Hoffman - he has moved from the EBI to
the University of Washington but his EBI email address still worked.
He said:

"Please feel free to deprecate the module or make any
necessary changes for the project."

Even if you (Brad) didn't have a new GFF parser waiting to be
added to Biopython, I would still want to do something with
Bio.GFF to reduce the redundancy of location and feature code.
Deprecation is the simplest solution (but I may be able to
reuse some of his location string parsing code on Bug 2738).

Peter

From bugzilla-daemon at portal.open-bio.org  Thu Dec  3 10:32:31 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Dec 2009 10:32:31 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing,
	in particular location parsing
In-Reply-To: <bug-2738-42@http.bugzilla.open-bio.org/>
Message-ID: <200912031532.nB3FWV7G013739@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738


------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-03 10:32 EST -------
Note - we may be able to reuse some of the location string parsing ideas in
Bio/GFF/easy.py here too...

Peter



From bugzilla-daemon at portal.open-bio.org  Fri Dec  4 07:31:51 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Dec 2009 07:31:51 -0500
Subject: [Biopython-dev] [Bug 2961] New: Adding undocumented file format
	switches to MUSCLE wrapper
Message-ID: <bug-2961-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961

           Summary: Adding undocumented file format switches to MUSCLE
                    wrapper
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


As discussed on the mailing list, and confirmed with MUSCLE author Robert Edgar,
there are a number of useful command line arguments for things like PHYLIP output
(both interleaved and sequential) which the Bio.Align.Applications wrapper does
not support. See:
http://lists.open-bio.org/pipermail/biopython/2009-December/005881.html

We should add these.



From bugzilla-daemon at portal.open-bio.org  Fri Dec  4 07:50:25 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Dec 2009 07:50:25 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches
	to MUSCLE wrapper
In-Reply-To: <bug-2961-42@http.bugzilla.open-bio.org/>
Message-ID: <200912041250.nB4CoP72007627@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


------- Comment #1 from cymon.cox at gmail.com  2009-12-04 07:50 EST -------
Created an attachment (id=1408)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1408&action=view)
Add PHYLIP output to Muscle command line interface



From bugzilla-daemon at portal.open-bio.org  Fri Dec  4 08:14:08 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Dec 2009 08:14:08 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches
	to MUSCLE wrapper
In-Reply-To: <bug-2961-42@http.bugzilla.open-bio.org/>
Message-ID: <200912041314.nB4DE8aA008792@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1408 is|0                           |1
           obsolete|                            |


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-04 08:14 EST -------
(From update of attachment 1408)
Patch applied.

Should we also add -phyiout, -physout, -htmlout, -msfout, -clwout etc (which
all take a filename)?



From bugzilla-daemon at portal.open-bio.org  Fri Dec  4 08:21:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Dec 2009 08:21:52 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches
	to MUSCLE wrapper
In-Reply-To: <bug-2961-42@http.bugzilla.open-bio.org/>
Message-ID: <200912041321.nB4DLqsd009037@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


------- Comment #3 from cymon.cox at gmail.com  2009-12-04 08:21 EST -------
(In reply to comment #2)
> (From update of attachment 1408)
> Patch applied.
> 
> Should we also add -phyiout, -physout, -htmlout, -msfout, -clwout etc (which
> all take a filename)?

! Is there anything else undocumented?

OK, I'll do that asap. I'll also add tests - change test suite to use
subprocess module etc.



From bugzilla-daemon at portal.open-bio.org  Fri Dec  4 08:36:11 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Dec 2009 08:36:11 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches
	to MUSCLE wrapper
In-Reply-To: <bug-2961-42@http.bugzilla.open-bio.org/>
Message-ID: <200912041336.nB4DaBvS009365@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-04 08:36 EST -------
(In reply to comment #3)
> (In reply to comment #2)
> > (From update of attachment 1408)
> > Patch applied.
> > 
> > Should we also add -phyiout, -physout, -htmlout, -msfout, -clwout etc (which
> > all take a filename)?
> 
> ! Is there anything else undocumented?

Robert did imply there could be other things as his documentation
was out of sync with the code :(

These are of limited value given you can use "-phyi -out filename.phy"
as an alternative to "-phyiout filename.phy" however one bonus feature
is these options allow you to get multiple output files in one run (e.g.
both HTML output to inspect visually and ClustalW output to parse).
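
As an illustration of the multiple-outputs point, a command line could be
assembled like this. It is a sketch only: muscle_command is a made-up
helper rather than the Bio.Align.Applications wrapper, and the "-in" switch
for the input file is an assumption not taken from this bug report.

```python
def muscle_command(infile, **outputs):
    """Build an argument list; keys are output switches without the dash."""
    cmd = ["muscle", "-in", infile]  # "-in" is assumed, see note above
    for switch in sorted(outputs):  # sorted for a reproducible order
        cmd.extend(["-" + switch, outputs[switch]])
    return cmd


# One run, two output files: HTML to look at, ClustalW to parse.
print(" ".join(muscle_command("seqs.fasta",
                              htmlout="aln.html", clwout="aln.aln")))
```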

> OK, I'll do that asap. I'll also add tests - change test suite to use
> subprocess module etc.

I'd forgotten about that (using subprocess rather than generic_run
in our unit tests). Could you do that as a separate patch please?

Thanks,

Peter



From chapmanb at 50mail.com  Fri Dec  4 08:40:10 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Fri, 4 Dec 2009 08:40:10 -0500
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com>
References: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
	<317375.58712.qm@web62401.mail.re1.yahoo.com>
	<20091203142534.GF51407@sobchak.mgh.harvard.edu>
	<320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com>
Message-ID: <20091204134010.GK51407@sobchak.mgh.harvard.edu>

Hi all;
Peter, thanks for the feedback. Thoughts below.

> Looking at your code, BCBio.GFF.parse(...) would return
> SeqRecord objects (with SeqFeatures). That seems
> redundant to me as one would expect people to just use
> Bio.SeqIO.parse(handle, "gff3") instead. I would instead
> have expected BCBio.GFF.parse(...) to iterate over the
> features in a GFF file.

This would work for simple cases, but for most real life cases you
will likely want to limit the file to a subset of things you are
interested in. It helps reduce memory problems, and is equivalent to
a track system view in UCSC or Ensembl. I find it very useful for
all of the work I've done with it.

We could use SeqIO here, but then there is the issue of passing
along the additional arguments. The simplicity of SeqIO is really
nice, so not sure if we'd want to clutter SeqIO with it.

So we could support basic parsing in SeqIO, but it would be useful to
have this GFF specific parsing as the additional arguments will be a
regular use case.

> Also, and we'd touched on this before - I'd much prefer to
> have the GFF module quite "low level" using either new
> GFF-specific classes or simple Python objects (e.g. for
> each feature use a tuple of ints and strings for the first
> feature columns plus a dict for the final extendible
> column of annotation).

Yes, it is implemented this way. The parse_simple function returns
a line by line parse of the file as a dictionary, which is then used
to build up the SeqFeature objects:

http://github.com/chapmanb/bcbb/blob/master/gff/BCBio/GFF/GFFParser.py

We can document and flesh that out, although I'm not really sure how
useful it will be. It's pretty easy to build your own simple
line-by-line GFF parser; the only advantage of this code over a
home-brew is that it handles tricky annotation cases.

For all of my uses, the real win was being able to build up the
multiple transcript exon/intron structures from the file. This is
not trivial to do on your own, and the real win of the code is in
handling this, especially for older GFF2 and GTF formatted files.
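
For the GFF3 case, where children name their parents explicitly, the
nesting step can be sketched as below. This is a simplified illustration
with a made-up nest_features function working on plain dicts, not the
BCBio.GFF code, and it does not cover the GFF2/GTF grouping rules that make
the real problem hard.

```python
def nest_features(features):
    """Attach each feature to its parent(s) and return the top-level ones.

    `features` is an iterable of dicts with an "id" key and an optional
    "parents" list (hypothetical structure chosen for this sketch).
    """
    by_id = {}
    nodes = []
    # First pass: copy every feature and index it, since a child line
    # may appear in the file before its parent.
    for feat in features:
        node = dict(feat, children=[])
        nodes.append(node)
        if node.get("id"):
            by_id[node["id"]] = node
    top_level = []
    # Second pass: hook children onto parents that were actually seen.
    for node in nodes:
        parents = [p for p in node.get("parents", []) if p in by_id]
        if not parents:
            top_level.append(node)
        for parent_id in parents:
            by_id[parent_id]["children"].append(node)
    return top_level


genes = nest_features([
    {"id": "exon1", "type": "exon", "parents": ["mRNA1"]},
    {"id": "gene1", "type": "gene"},
    {"id": "mRNA1", "type": "mRNA", "parents": ["gene1"]},
])
print(genes[0]["id"], [child["id"] for child in genes[0]["children"]])
```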

> From a technical point of view, a justification for this
> separation is the GFF details are not a perfect fit to the
> SeqRecord and SeqFeature objects and forcing their
> use adds unnecessary overheads for people wanting
> to work directly with the features themselves.

Why are SeqRecord and SeqFeature not appropriate for GFF? We could 
improve them to make things more lightweight, as we discussed
previously, but conceptually the values fit into the framework fine.

> Also, by splitting the code into basic parsing and a
> SeqRecord/SeqFeature conversion layer (which I
> would put in Bio/SeqIO/GffIO.py) we can add the
> code in two steps (first GFF parsing, then SeqIO
> support).

We can do this as is. I'm not suggesting SeqIO support right now,
and want to target getting the GFF parser as is into Biopython.

> I think this split is useful as this is a very big job to do
> properly: Once we have GFF to SeqRecord parsing,
> we need to try and ensure that it is compatible with the
> GenBank to SeqRecord parsing. This is important as
> we would in effect be extending Biopython to allow
> GFF3 to GenBank conversions. For testing all this,
> we can grab the same data in the two file formats
> (e.g. from the NCBI) and perhaps also use EMBOSS.

Do you think GFF to GenBank is a common use case? Agreed that it is
very hard, but this really has less to do with the object
structure in Biopython and more to do with how things 
are represented and named in the original source files. GenBank has
some "consistency" since it is produced mostly by NCBI, but GFF
files are all over the place.

This can be tackled later if someone wants, but right now my goals
are simply:

- Produce Biopython objects from GFF3/GTF/GFF2 files
- Represent nested features
- Allow GFF2/GTF to GFF3 conversion

This should be done with the current code. We can formalize the raw
parse_simple output for the line-by-line if people find it useful,
but otherwise we should leave these bigger projects for down the
line.

Brad

From biopython at maubp.freeserve.co.uk  Fri Dec  4 09:25:40 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 4 Dec 2009 14:25:40 +0000
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <20091204134010.GK51407@sobchak.mgh.harvard.edu>
References: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
	<317375.58712.qm@web62401.mail.re1.yahoo.com>
	<20091203142534.GF51407@sobchak.mgh.harvard.edu>
	<320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com>
	<20091204134010.GK51407@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com>

On Fri, Dec 4, 2009 at 1:40 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi all;
> Peter, thanks for the feedback. Thoughts below.
>
>> Looking at your code, BCBio.GFF.parse(...) would return
>> SeqRecord objects (with SeqFeatures). That seems
>> redundant to me as one would expect people to just use
>> Bio.SeqIO.parse(handle, "gff3") instead. I would instead
>> have expected BCBio.GFF.parse(...) to iterate over the
>> features in a GFF file.
>
> This would work for simple cases, but for most real life cases you
> will likely want to limit the file to a subset of things you are
> interested in. It helps reduce memory problems, and is equivalent to
> a track system view in UCSC or Ensembl. I find it very useful for
> all of the work I've done with it.

Understood - a feature-returning Bio.GFF.parse() function could
take various arguments, or for full flexibility, the user can use the
parser object directly.

> We could use SeqIO here, but then there is the issue of passing
> along the additional arguments. The simplicity of SeqIO is really
> nice, so not sure if we'd want to clutter SeqIO with it.
>
> So we could support basic parsing in SeqIO, but it would be useful to
> have this GFF specific parsing as the additional arguments will be a
> regular use case.

This is already catered for in that Bio.SeqIO.parse() and read()
don't take arbitrary arguments (currently), but the underlying
Bio.SeqIO.XxxxIO.XxxIterator() they invoke may do so. i.e. You
could have Bio.SeqIO.GffIO.GffIterator() and perhaps variants
(e.g. GFF2 vs GFF3) which take filtering arguments.
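
A sketch of that two-level arrangement, with made-up names (gff_iterator,
the format table and this parse function are stand-ins, not the Bio.SeqIO
source): the front end takes only a format name, while the format-specific
iterator underneath accepts extra filtering arguments.

```python
import io


def gff_iterator(handle, types=None):
    """Hypothetical format-specific iterator that accepts a filter."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if types is None or cols[2] in types:
            yield cols


_FORMAT_TO_ITERATOR = {"gff3": gff_iterator}


def parse(handle, fmt):
    """Simple front end: a format name and nothing else."""
    try:
        iterator = _FORMAT_TO_ITERATOR[fmt]
    except KeyError:
        raise ValueError("Unknown format %r" % fmt)
    return iterator(handle)


data = ("chr1\t.\tgene\t1\t9\t.\t+\t.\tID=g1\n"
        "chr1\t.\texon\t1\t4\t.\t+\t.\tParent=g1\n")
print(len(list(parse(io.StringIO(data), "gff3"))))
print([cols[2] for cols in gff_iterator(io.StringIO(data), types={"exon"})])
```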

>> Also, and we'd touched on this before - I'd much prefer to
>> have the GFF module quite "low level" using either new
>> GFF-specific classes or simple Python objects (e.g. for
>> each feature use a tuple of ints and strings for the first
>> feature columns plus a dict for the final extendible
>> column of annotation).
>
> Yes, it is implemented this way. The parse_simple function returns
> a line by line parse of the file as a dictionary, which is then used
> to build up the SeqFeature objects:
>
> http://github.com/chapmanb/bcbb/blob/master/gff/BCBio/GFF/GFFParser.py
>
> We can document and flesh that out, although I'm not really sure how
> useful it will be. It's pretty easy to build your own simple
> line-by-line GFF parser; the only advantage of this code over a
> home-brew is that it handles tricky annotation cases.

I still think it would be useful to have Bio/GFF/Parser.py (or
similar) as the low level parser, and Bio/SeqIO/GffIO.py (or
similar) to turn this into SeqRecord and SeqFeature objects.

> For all of my uses, the real win was being able to build up the
> multiple transcript exon/intron structures from the file. This is
> not trivial to do on your own, and the real win of the code is in
> handling this, especially for older GFF2 and GTF formatted files.
>
>> From a technical point of view, a justification for this
>> separation is the GFF details are not a perfect fit to the
>> SeqRecord and SeqFeature objects and forcing their
>> use adds unnecessary overheads for people wanting
>> to work directly with the features themselves.
>
> Why are SeqRecord and SeqFeature not appropriate for GFF? We could
> improve them to make things more lightweight, as we discussed
> previously, but conceptually the values fit into the framework fine.

It is the nested features that worry me. Perhaps the existing
location operator (e.g. "join") could be set to something
like "parent/child" if the subfeatures is used to hold child
features rather than the elements of a join? We need
the GenBank output code etc to be able to tell these
apart reliably.
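
For reference, the nesting in question comes from the ID and Parent
attributes in GFF3. A minimal sketch of grouping features into parent/child
relationships, using plain dicts rather than SeqFeature objects (the dict
layout and the function name are assumptions made for illustration only):

```python
from collections import defaultdict


def build_feature_tree(features):
    """Group features under their parents via GFF3-style ID/Parent values.

    Each feature is a dict with a "type", an optional "id", and an optional
    "parents" list of parent IDs (a hypothetical layout for this sketch).
    Returns (top_level, children), where children maps a parent ID to the
    list of features naming it as a parent.
    """
    children = defaultdict(list)
    top_level = []
    for feature in features:
        parents = feature.get("parents", [])
        if parents:
            # A feature may list several parents, e.g. an exon shared
            # between transcripts.
            for parent_id in parents:
                children[parent_id].append(feature)
        else:
            top_level.append(feature)
    return top_level, dict(children)


features = [
    {"type": "gene", "id": "g1"},
    {"type": "mRNA", "id": "t1", "parents": ["g1"]},
    {"type": "exon", "id": "e1", "parents": ["t1"]},
    {"type": "exon", "id": "e2", "parents": ["t1"]},
]
top_level, children = build_feature_tree(features)
print([f["id"] for f in top_level])
print([f["id"] for f in children["t1"]])
```

Keeping the parent/child links in a separate mapping like this is one way to
avoid confusing them with the parts of a join location.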

>> Also, by splitting the code into basic parsing and a
>> SeqRecord/SeqFeature conversion layer (which I
>> would put in Bio/SeqIO/GffIO.py) we can add the
>> code in two steps (first GFF parsing, then SeqIO
>> support).
>
> We can do this as is. I'm not suggesting SeqIO support right now,
> and want to target getting the GFF parser as is into Biopython.

My point is the moment you include GFF -> SeqRecord
code (even if not explicitly via the Bio.SeqIO namespace)
it opens us up to people giving these SeqRecord objects
to SeqIO for output (e.g. as GenBank).

>> I think this split is useful as this is a very big job to do
>> properly: Once we have GFF to SeqRecord parsing,
>> we need to try and ensure that it is compatible with the
>> GenBank to SeqRecord parsing. This is important as
>> we would in effect be extending Biopython to allow
>> GFF3 to GenBank conversions. For testing all this,
>> we can grab the same data in the two file formats
>> (e.g. from the NCBI) and perhaps also use EMBOSS.
>
> Do you think GFF to GenBank is a common use case?

I suspect it's something I'd want to do when working with
new genome annotations. GeneMark produces GFF, while
Prodigal produces (simple) GenBank. The SOLiD pipeline
corona produces GFF. Sometimes you can get both, the
tool RAST outputs GenBank, GFF, GTF and EMBL files.

> Agreed that it is very hard, but this really had less to do
> with the object structure in Biopython and more to do
> with how things are represented and named in the
> original source files. GenBank has some "consistency"
> since it is produced mostly by NCBI, but GFF files are
> all over the place.
>
> This can be tackled later if someone wants, but right
> now my goals are simply:
>
> - Produce Biopython objects from GFF3/GTF/GFF2 files
> - Represent nested features
> - Allow GFF2/GTF to GFF3 conversion
>
> This should be done with the current code. We can
> formalize the raw parse_simple output for the line-by-line
> if people find it useful, but otherwise we should leave
> these bigger projects for down the line.

Worthy goals, but if by "Produce Biopython objects from
GFF3/GTF/GFF2 files" you mean SeqRecords with
SeqFeatures, (as I said above) we are opening up the
GFF to GenBank can of worms. There is no "later" :(

Peter

From mjldehoon at yahoo.com  Sat Dec  5 10:54:19 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 5 Dec 2009 07:54:19 -0800 (PST)
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com>
Message-ID: <983129.2133.qm@web62408.mail.re1.yahoo.com>

I didn't realize that the GFF parser returns SeqRecords. I agree with Peter that a parser returning SeqRecords should be accessed through Bio.SeqIO, while a lower-level parser can live in Bio.GFF.

--Michiel

--- On Thu, 12/3/09, Peter <biopython at maubp.freeserve.co.uk> wrote:

> From: Peter <biopython at maubp.freeserve.co.uk>
> Subject: Re: [Biopython-dev] Bio.GFF and Brad's code
> To: "Brad Chapman" <chapmanb at 50mail.com>, biopython-dev at lists.open-bio.org
> Date: Thursday, December 3, 2009, 9:53 AM
> On Thu, Dec 3, 2009 at 2:25 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> >
> > Great -- done for parsing and writing and committed to GitHub. The
> > documentation is updated as well.
> >
> > Happy to get other comments and thoughts. Thanks again,
> >
> 
> I understand that GFF files are complex, and a simple "record
> iterator" isn't flexible enough to cover all use cases - hence the
> need for a complex parser class. That said, Michiel is right that
> GFF.parse() or GFF.read() functions would be consistent with
> other bits of Biopython, and would provide for the simple use
> cases.
> 
> Looking at your code, BCBio.GFF.parse(...) would return
> SeqRecord objects (with SeqFeatures). That seems
> redundant to me as one would expect people to just use
> Bio.SeqIO.parse(handle, "gff3") instead. I would instead
> have expected BCBio.GFF.parse(...) to iterate over the
> features in a GFF file.
> 
> Also, and we'd touched on this before - I'd much prefer to
> have the GFF module quite "low level" using either new
> GFF-specific classes or simple Python objects (e.g. for
> each feature use a tuple of ints and strings for the first
> feature columns plus a dict for the final extendible
> column of annotation).
> 
> From a technical point of view, a justification for this
> separation is the GFF details are not a perfect fit to the
> SeqRecord and SeqFeature objects and forcing their
> use adds unnecessary overheads for people wanting
> to work directly with the features themselves.
> 
> Also, by splitting the code into basic parsing and a
> SeqRecord/SeqFeature conversion layer (which I
> would put in Bio/SeqIO/GffIO.py) we can add the
> code in two steps (first GFF parsing, then SeqIO
> support).
> 
> I think this split is useful as this is a very big job to do
> properly: Once we have GFF to SeqRecord parsing,
> we need to try and ensure that it is compatible with the
> GenBank to SeqRecord parsing. This is important as
> we would in effect be extending Biopython to allow
> GFF3 to GenBank conversions. For testing all this,
> we can grab the same data in the two file formats
> (e.g. from the NCBI) and perhaps also use EMBOSS.
> 
> You may recall we talked to Peter Rice (from EMBOSS)
> about this - there are some important issues here like
> ontology mapping where we should be able to reuse a
> lot of the work EMBOSS has already done (and use the
> EMBOSS tools to help validate our mapping).
> 
> i.e. While I may be being overly cautious, I think that
> while adding GFF parsing and GFF to SeqRecord
> mapping is very important, it is also very complex.
> Therefore breaking this into a two stage task makes
> managing and testing it easier - as well as seeming
> a good idea for the code itself.
> 
> Peter


From MatatTHC at gmx.de  Sun Dec  6 09:18:40 2009
From: MatatTHC at gmx.de (Matthias Bernt)
Date: Sun, 06 Dec 2009 15:18:40 +0100
Subject: [Biopython-dev] Genetic Code
Message-ID: <20091206141840.67400@gmx.net>

Hi, 

The genetic codes you provide in Bio.Data.CodonTable are somewhat out of date. E.g. in the mitochondrial echinoderm (id 9) genetic code one start codon is missing. 

Regards, 
Matthias

From biopython at maubp.freeserve.co.uk  Sun Dec  6 09:55:24 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 6 Dec 2009 14:55:24 +0000
Subject: [Biopython-dev] Genetic Code
In-Reply-To: <20091206141840.67400@gmx.net>
References: <20091206141840.67400@gmx.net>
Message-ID: <320fb6e00912060655r75103918w3122f46b3ccb538f@mail.gmail.com>

On Sun, Dec 6, 2009 at 2:18 PM, Matthias Bernt <MatatTHC at gmx.de> wrote:
> Hi,
>
> The genetic codes you provide in Bio.Data.CodonTable are somewhat
> out of date. E.g. in the mitochondrial echinoderm (id 9) genetic code
> one start codon is missing.

Confirmed - could you file a bug please?
http://bugzilla.open-bio.org/enter_bug.cgi?product=Biopython

It looks like we have only got Version 3.4 (based on a visual
inspection), but the latest version is Version 3.9. We should
just need to re-run the script to generate these. Also the
original URL noted in the Biopython source code of
ftp://ncbi.nlm.nih.gov/entrez/misc/data/gc.prt is now
ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt

Peter

From bugzilla-daemon at portal.open-bio.org  Sun Dec  6 10:07:23 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 6 Dec 2009 10:07:23 -0500
Subject: [Biopython-dev] [Bug 2962] New: deprecated generic code
Message-ID: <bug-2962-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2962

           Summary: deprecated generic code
           Product: Biopython
           Version: 1.52
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: MatatTHC at gmx.de


The genetic codes provided in Bio.Data.CodonTable are 
out of date. E.g. in the mitochondrial echinoderm (id 9) 
genetic code one start codon is missing.

It looks like we have only got Version 3.4 (based on a visual
inspection), but the latest version is Version 3.9. We should
just need to re-run the script to generate these. Also the
original URL noted in the Biopython source code of
ftp://ncbi.nlm.nih.gov/entrez/misc/data/gc.prt is now
ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Sun Dec  6 10:07:43 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 6 Dec 2009 10:07:43 -0500
Subject: [Biopython-dev] [Bug 2963] New: deprecated genetic code
Message-ID: <bug-2963-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2963

           Summary: deprecated genetic code
           Product: Biopython
           Version: 1.52
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: MatatTHC at gmx.de


The genetic codes provided in Bio.Data.CodonTable are 
out of date. E.g. in the mitochondrial echinoderm (id 9) 
genetic code one start codon is missing.

It looks like we have only got Version 3.4 (based on a visual
inspection), but the latest version is Version 3.9. We should
just need to re-run the script to generate these. Also the
original URL noted in the Biopython source code of
ftp://ncbi.nlm.nih.gov/entrez/misc/data/gc.prt is now
ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt



From bugzilla-daemon at portal.open-bio.org  Sun Dec  6 10:35:09 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 6 Dec 2009 10:35:09 -0500
Subject: [Biopython-dev] [Bug 2963] deprecated genetic code
In-Reply-To: <bug-2963-42@http.bugzilla.open-bio.org/>
Message-ID: <200912061535.nB6FZ9qY029156@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2963


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-06 10:35 EST -------


*** This bug has been marked as a duplicate of bug 2962 ***



From bugzilla-daemon at portal.open-bio.org  Sun Dec  6 10:35:21 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 6 Dec 2009 10:35:21 -0500
Subject: [Biopython-dev] [Bug 2962] deprecated generic code
In-Reply-To: <bug-2962-42@http.bugzilla.open-bio.org/>
Message-ID: <200912061535.nB6FZL0I029172@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2962


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-06 10:35 EST -------
*** Bug 2963 has been marked as a duplicate of this bug. ***



From bugzilla-daemon at portal.open-bio.org  Sun Dec  6 11:09:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 6 Dec 2009 11:09:28 -0500
Subject: [Biopython-dev] [Bug 2962] deprecated generic code
In-Reply-To: <bug-2962-42@http.bugzilla.open-bio.org/>
Message-ID: <200912061609.nB6G9Sk9030056@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2962


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-06 11:09 EST -------
The NCBI codon tables have been updated from version 3.4 to 3.9, which adds
a few extra start codons, and a few new tables (Tables 16, 21, 22 and 23).
Note that Table 14 which used to be called "Flatworm Mitochondrial" is now
called "Alternative Flatworm Mitochondrial", and "Flatworm Mitochondrial" is
now an alias for Table 9 ("Echinoderm Mitochondrial").

See:
http://github.com/biopython/biopython/commit/74ba9d295b2cd6c6fa6862e91f1e1e59300deeb6

Marking as fixed - but feel free to reopen this if I missed anything.

Thanks!

Peter



From biopython at maubp.freeserve.co.uk  Sun Dec  6 11:11:08 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 6 Dec 2009 16:11:08 +0000
Subject: [Biopython-dev] Genetic Code
In-Reply-To: <320fb6e00912060655r75103918w3122f46b3ccb538f@mail.gmail.com>
References: <20091206141840.67400@gmx.net>
	<320fb6e00912060655r75103918w3122f46b3ccb538f@mail.gmail.com>
Message-ID: <320fb6e00912060811x1fc336ech6245221741372c62@mail.gmail.com>

On Sun, Dec 6, 2009 at 2:55 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Confirmed - could you file a bug please?
> http://bugzilla.open-bio.org/enter_bug.cgi?product=Biopython

Thanks - I was expecting to look at this next week, but had
some spare time this afternoon after all. It should be fixed,
you can grab the latest code and reinstall to test:
http://www.biopython.org/wiki/SourceCode

Peter

From bugzilla-daemon at portal.open-bio.org  Sun Dec  6 12:46:55 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 6 Dec 2009 12:46:55 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches
	to MUSCLE wrapper
In-Reply-To: <bug-2961-42@http.bugzilla.open-bio.org/>
Message-ID: <200912061746.nB6Hkt7x032479@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


------- Comment #5 from cymon.cox at gmail.com  2009-12-06 12:46 EST -------
Created an attachment (id=1409)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1409&action=view)
Patch for output file format options



From bugzilla-daemon at portal.open-bio.org  Sun Dec  6 13:50:08 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 6 Dec 2009 13:50:08 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches
	to MUSCLE wrapper
In-Reply-To: <bug-2961-42@http.bugzilla.open-bio.org/>
Message-ID: <200912061850.nB6Io80P001234@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-06 13:50 EST -------
(In reply to comment #5)
> Created an attachment (id=1409)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1409&action=view) [details]
> Patch for output file format options
> 

Applied with minor changes to the docstrings - Bio.AlignIO will now
accept the default CLUSTALW output from MUSCLE as is. Thanks!

Peter



From bugzilla-daemon at portal.open-bio.org  Sun Dec  6 14:10:01 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 6 Dec 2009 14:10:01 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches
	to MUSCLE wrapper
In-Reply-To: <bug-2961-42@http.bugzilla.open-bio.org/>
Message-ID: <200912061910.nB6JA1p3001668@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


------- Comment #7 from cymon.cox at gmail.com  2009-12-06 14:10 EST -------
Created an attachment (id=1410)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1410&action=view)
Change Application cmdline tests to use subprocess module



From bugzilla-daemon at portal.open-bio.org  Sun Dec  6 14:36:27 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 6 Dec 2009 14:36:27 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches
	to MUSCLE wrapper
In-Reply-To: <bug-2961-42@http.bugzilla.open-bio.org/>
Message-ID: <200912061936.nB6JaRo0002258@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-06 14:36 EST -------
(In reply to comment #7)
> Created an attachment (id=1410)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1410&action=view) [details]
> Change Application cmdline tests to use subprocess module
> 

Lovely - applied as is - thanks again :)

Did you want to add tests for the new MUSCLE output options, or can we close
this bug now Cymon?

Peter



From bugzilla-daemon at portal.open-bio.org  Sun Dec  6 14:43:12 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 6 Dec 2009 14:43:12 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches
	to MUSCLE wrapper
In-Reply-To: <bug-2961-42@http.bugzilla.open-bio.org/>
Message-ID: <200912061943.nB6JhCOd002510@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


------- Comment #9 from cymon.cox at gmail.com  2009-12-06 14:43 EST -------
(In reply to comment #8)
> (In reply to comment #7)
> > Created an attachment (id=1410)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1410&action=view) [details] [details]
> > Change Application cmdline tests to use subprocess module
> > 
> 
> Lovely - applied as is - thanks again :)
> 
> Did you want to add tests for the new MUSCLE output options, or can we close
> this bug now Cymon?

There is one in the patch called: test_with_multiple_output_formats that
writes to stdout, phylip interleaved, and clustalw strict, using the -phyi and
-clwstrict options.

I think it can be closed.
Cheers, C.



From bugzilla-daemon at portal.open-bio.org  Sun Dec  6 14:47:11 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 6 Dec 2009 14:47:11 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches
	to MUSCLE wrapper
In-Reply-To: <bug-2961-42@http.bugzilla.open-bio.org/>
Message-ID: <200912061947.nB6JlBHi002609@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-06 14:47 EST -------
(In reply to comment #9)
> > Did you want to add tests for the new MUSCLE output options, or can we
> > close this bug now Cymon?
> 
> There is one in the patch called: test_with_multiple_output_formats that
> writes to stdout, phylip interleaved, and clustalw strict, using the -phyi and
> -clwstrict options.

So there is - I missed that. Lovely :)

Marking bug as fixed.
Peter



From bugzilla-daemon at portal.open-bio.org  Mon Dec  7 04:16:42 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 04:16:42 -0500
Subject: [Biopython-dev] [Bug 2964] New: placing x-axis of graph track at
	the bottom or top of the track
Message-ID: <bug-2964-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964

           Summary: placing x-axis of graph track at the bottom or top of
                    the track
           Product: Biopython
           Version: 1.52
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Other
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: Daniel.Nicorici at gmail.com


By default when one uses the graph track the axis is placed automatically in
the middle of the track (which is given by the mean of all the values which are
plotted).

It would be great if the x-axis of the graph track could be placed at the
bottom of the track also and the plotting of the values could be done
accordingly. This would allow one to plot for example the short-read coverage
in next-gen sequencing data.



From bugzilla-daemon at portal.open-bio.org  Mon Dec  7 04:48:11 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 04:48:11 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912070948.nB79mBTh022941@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


------- Comment #1 from Daniel.Nicorici at gmail.com  2009-12-07 04:48 EST -------
This feature has been added in:

http://github.com/ndaniel/biopython/tree/x-axis_GenomeDiagram/Bio/Graphics/GenomeDiagram/

Also, a small additional bug has been fixed here, i.e. the line/bar graphs are
now drawn from the first element to the last element of the graph and not from
the origin to the end of the x-axis as they were originally.

One can specify that the x-axis should be drawn at bottom of the track by
specifying the argument x_axis='bottom' for new_track, e.g.
gdt_features=gdd.new_track(2,x_axis='bottom').

Below one may find two examples where the x-axis is drawn in the middle (as it
is originally done by the GenomeDiagram) and bottom of the track (the new
feature added to GenomeDiagram).

====Example_1:_Using_Graph_from_GenomeDiagram_where_the_x-axis_is_at_the_middle_of_track(as_it_is_originally)=============================
import Bio.SeqFeature
import Bio.Graphics.GenomeDiagram
import random

gdd=Bio.Graphics.GenomeDiagram.Diagram('Test diagram')


gdt_features=gdd.new_track(1)
gds_features=gdt_features.new_set()

# Add three features
feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(25,125),strand=+1)
gds_features.add_feature(feature,name="Forward",label=True)

feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(150,250),strand=None)
gds_features.add_feature(feature,name="Forward",label=True)

feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(275,375),strand=-1)
gds_features.add_feature(feature,name="Forward",label=True)

# Add graph
gdt_features=gdd.new_track(2)
gds_features=gdt_features.new_set('graph')
# generate some random values for plotting
coverage=[]
# The explicit zero-valued points below are needed in order to skip the
# interpolation done by GenomeDiagram for missing values.
coverage.append((50,float(0)))
coverage.extend( [ (i, random.uniform(0,100)) for i in xrange(51,100)])
coverage.append((100,float(0)))
coverage.append((250,float(0)))
coverage.extend( [ (i, random.uniform(50,400)) for i in xrange(251,400)])
coverage.append((400,float(0)))
gds_features.new_graph(coverage, 'coverage', style='bar')

gdd.draw(format='linear',orientation='landscape',pagesize='A4',fragments=1,start=1,end=500)
gdd.write("Test_gaph.pdf","pdf")
============================================


====Example_2:_Using_Graph_from_GenomeDiagram_where_x-axis_is_at_the_bottom_of_track=============================
import Bio.SeqFeature
import Bio.Graphics.GenomeDiagram
import random

gdd=Bio.Graphics.GenomeDiagram.Diagram('Test diagram')


gdt_features=gdd.new_track(1)
gds_features=gdt_features.new_set()

# Add three features
feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(25,125),strand=+1)
gds_features.add_feature(feature,name="Forward",label=True)

feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(150,250),strand=None)
gds_features.add_feature(feature,name="Forward",label=True)

feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(275,375),strand=-1)
gds_features.add_feature(feature,name="Forward",label=True)

# Add graph
gdt_features=gdd.new_track(2,x_axis='bottom')
gds_features=gdt_features.new_set('graph')
# generate some random values for plotting
coverage=[]
# The explicit zero-valued points below are needed in order to skip the
# interpolation done by GenomeDiagram for missing values.
coverage.append((50,float(0)))
coverage.extend( [ (i, random.uniform(0,100)) for i in xrange(51,100)])
coverage.append((100,float(0)))
coverage.append((250,float(0)))
coverage.extend( [ (i, random.uniform(50,400)) for i in xrange(251,400)])
coverage.append((400,float(0)))
gds_features.new_graph(coverage, 'coverage', style='bar')

gdd.draw(format='linear',orientation='landscape',pagesize='A4',fragments=1,start=1,end=500)
gdd.write("Test_gaph.pdf","pdf")
============================================

Best Regards,
Daniel



From bugzilla-daemon at portal.open-bio.org  Mon Dec  7 05:55:12 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 05:55:12 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912071055.nB7AtCol024504@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-07 05:55 EST -------
I'm guessing you are talking about GenomeDiagram? If so, yes, tracks default to
having the x-axis line at the middle y-value (center or centre=None). Try
setting center to zero when you create the Graph object. If you could give a
cut down example it would be easier to help.

Peter



From biopython at maubp.freeserve.co.uk  Mon Dec  7 06:34:11 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 7 Dec 2009 11:34:11 +0000
Subject: [Biopython-dev] Biopython git access for Cymon
Message-ID: <320fb6e00912070334m311dd287r4a20f1e399413adc@mail.gmail.com>

Dear all,

It is a little overdue, but I'm pleased to announce Cymon Cox
now has write access to the Biopython repository.

Cymon has made contributions to Biopython over many years,
initially with the modules Bio.Nexus and Bio.Sequencing
(together with Frank Kauff), and more recently with
improvements to our BioSQL wrappers (especially on
PostgreSQL) and his recent work on alignment wrappers.

I'd previously talked to Cymon about giving him CVS access,
and he said we might as well wait until after the git transition.
I've just checked in a few patches on his behalf (alignment tool
wrappers), which served to remind me of this - it would have
saved me some work to just say "Yes, please check that in" ;)

On behalf of the Biopython project, welcome (fully) to the
development team Cymon, and thanks again for all your
work to date.

Regards,

Peter

From bugzilla-daemon at portal.open-bio.org  Mon Dec  7 06:38:27 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 06:38:27 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912071138.nB7BcROx026201@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


------- Comment #3 from Daniel.Nicorici at gmail.com  2009-12-07 06:38 EST -------
(In reply to comment #2)
> I'm guessing you are talking about GenomeDiagram? If so, yes, tracks default to
> having the x-axis line at the middle y-value (center or centre=None). Try
> setting center to zero when you create the Graph object. If you could give a
> cut down example it would be easier to help.

Yes, I am referring to GenomeDiagram.

If one sets the center to zero then the lower half of the track (below the
x-axis) is always empty and unused when all values are positive, e.g. CG
content and short-read coverage only have positive values.

This feature allows one to use the entire track for plotting, rather than only
the half of it that is used when center is set to zero.
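
The difference can be shown with a little arithmetic. The sketch below is a
simplified model of how a value might be mapped to a height within a track;
the function names and the linear scaling are assumptions made for
illustration, not GenomeDiagram's actual drawing code.

```python
def y_centered(value, center, max_deviation, track_height):
    """Axis drawn in the middle of the track (simplified model).

    Values above center are drawn in the upper half of the track, values
    below center in the lower half.
    """
    half = track_height / 2.0
    return half + (value - center) / float(max_deviation) * half


def y_bottom(value, max_value, track_height):
    """Axis drawn at the bottom of the track (simplified model)."""
    return value / float(max_value) * track_height


track_height = 100
values = [0, 100, 400]  # all positive, like short-read coverage

# With center=0 only the upper half of the track (50 to 100) is used.
print([y_centered(v, 0, 400, track_height) for v in values])

# With the axis at the bottom the whole track (0 to 100) is used.
print([y_bottom(v, 400, track_height) for v in values])
```

In this model the same data gets twice the vertical resolution when the axis
sits at the bottom of the track.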

Best Regards,
Daniel

> 
> Peter
> 



From bugzilla-daemon at portal.open-bio.org  Mon Dec  7 06:48:32 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 06:48:32 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912071148.nB7BmW8A026423@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


------- Comment #4 from Daniel.Nicorici at gmail.com  2009-12-07 06:48 EST -------
Here is the cut down example of what I mean:

=====================================================
import Bio.SeqFeature
import Bio.Graphics.GenomeDiagram
import random

gdd=Bio.Graphics.GenomeDiagram.Diagram('Test diagram')


gdt_features=gdd.new_track(1)
gds_features=gdt_features.new_set()


feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(25,125),strand=+1)
gds_features.add_feature(feature,name="Forward",label=True)

feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(150,250),strand=None)
gds_features.add_feature(feature,name="Forward",label=True)

feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(275,375),strand=-1)
gds_features.add_feature(feature,name="Forward",label=True)

# Add graph
gdt_features=gdd.new_track(2)
gds_features=gdt_features.new_set('graph')
# generate some random values for plotting
coverage=[]
coverage.append((50,float(0)))
coverage.extend( [ (i, random.uniform(0,100)) for i in xrange(51,100)])
coverage.append((100,float(0)))
coverage.append((250,float(0)))
coverage.extend( [ (i, random.uniform(50,400)) for i in xrange(251,400)])
coverage.append((400,float(0)))
gds_features.new_graph(coverage, 'coverage', style='bar',center=0)

gdd.draw(format='linear',orientation='landscape',pagesize='A4',fragments=1,start=1,end=500)
gdd.write("Test_gaph.pdf","pdf")
===========================================

The values plotted here are in the range 0 to 400, yet GenomeDiagram's y-axis
runs from -400 to 400 when center is set to 0. A y-axis range of -n to n is a
really odd choice when all the values to be plotted are in the range 0 to n.

The feature proposed here allows the entire track to be used instead of using
half of the track and also having a better range for y-axis.
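
The range described above can be reproduced in a few lines of plain Python.
This is only a sketch, assuming the drawer takes a half-height equal to the
larger distance from the centre to either data extreme; symmetric_y_range is
a hypothetical helper written for illustration, not a GenomeDiagram function.

```python
def symmetric_y_range(values, center=None):
    """Return (low, high) for a y-axis drawn symmetrically about `center`.

    Assumption: with center=None the midpoint of the data is used.
    """
    lo, hi = min(values), max(values)
    mid = (lo + hi) / 2.0 if center is None else center
    half = max(mid - lo, hi - mid)
    return mid - half, mid + half


coverage = [0.0, 35.5, 100.0, 250.0, 400.0]
print(symmetric_y_range(coverage, center=0))  # (-400.0, 400.0): lower half unused
print(symmetric_y_range(coverage))            # (0.0, 400.0), but the axis sits at y=200
```

Under this assumption neither setting gives bars that start at zero and fill
the whole track, which is the gap the proposed feature is meant to close.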


(In reply to comment #2)
> I'm guessing you are talking about GenomeDiagram? If so, yes, tracks default to
> having the x-axis line at the middle y-value (center or centre=None). Try
> setting 
> center to zero when you create the Graph object. If you could give a cut down
> example it would be easier to help.
> 
> Peter
> 



From bugzilla-daemon at portal.open-bio.org  Mon Dec  7 06:59:33 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 06:59:33 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track in GenomeDiagram
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912071159.nB7BxXs5026717@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
            Summary|placing x-axis of graph     |placing x-axis of graph
                   |track at the bottom or top  |track at the bottom or top
                   |of the track                |of the track in
                   |                            |GenomeDiagram


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-07 06:59 EST -------
When I wrote comment 2, I hadn't seen comment 1 with the github link and
examples.

Leighton and I had (some time ago now) chatted about a related enhancement
allowing the user to give the y-limits. With that in mind, it makes sense to
give the x-axis vertical position in terms of a y-coordinate (rather than a few
limited options like top, middle and bottom). This would be more flexible.

Marking this as an enhancement.



From chapmanb at 50mail.com  Mon Dec  7 07:12:45 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 7 Dec 2009 07:12:45 -0500
Subject: [Biopython-dev] Biopython git access for Cymon
In-Reply-To: <320fb6e00912070334m311dd287r4a20f1e399413adc@mail.gmail.com>
References: <320fb6e00912070334m311dd287r4a20f1e399413adc@mail.gmail.com>
Message-ID: <20091207121245.GM51407@sobchak.mgh.harvard.edu>

Hi all;

> It is a little overdue, but I'm pleased to announce Cymon Cox
> now has write access to the Biopython repository.
> 
> Cymon has made contributions to Biopython over many years,
> initially with the modules Bio.Nexus and Bio.Sequencing
> (together with Frank Kauff), and more recently with
> improvements to our BioSQL wrappers (especially on
> PostgreSQL) and his recent work on alignment wrappers.

Awesome. Congrats Cymon and thanks for all your excellent work. Well
deserved.

Brad

From bugzilla-daemon at portal.open-bio.org  Mon Dec  7 07:15:03 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 07:15:03 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track in GenomeDiagram
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912071215.nB7CF3pE027513@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


------- Comment #6 from Daniel.Nicorici at gmail.com  2009-12-07 07:15 EST -------


(In reply to comment #5)
> When I wrote comment 2, I hadn't seen comment 1 with the github link and
> examples.
> 

;-)

> Leighton and I had (some time ago now) chatted about a related enhancement
> allowing the user to give the y-limits.

I think that it does need enhancement. Let's see if others think the same! ;-)

> With that in mind, it makes sense to
> give the x-axis vertical position in terms of a y-coordinate (rather than a few
> limited options like top, middle and bottom). This would be more flexible.

This sounds good and I agree that it is more flexible.

It is true that options like "top, middle, bottom" are limited, but with them
the scaling is still done automatically: the user does not have to know the
range of his/her values, their minimum and maximum, or which axis position
suits all the graphs he/she wants to generate.

I am sure that this can be done better than I did it.

> 
> Marking this as an enhancement.

Ok.

Daniel



From bugzilla-daemon at portal.open-bio.org  Mon Dec  7 08:03:14 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 08:03:14 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track in GenomeDiagram
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912071303.nB7D3Esa029362@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


------- Comment #7 from lpritc at scri.sari.ac.uk  2009-12-07 08:03 EST -------
(In reply to comment #6)
> 
> (In reply to comment #5)
> > Leighton and I had (some time ago now) chatted about a related enhancement
> > allowing the user to give the y-limits.
> 
> I think that it is need enhancement. Let's see if others think that same! ;-)

Oh, it definitely does! ;)  Thank you for taking the time to improve it.

> > With that in mind, it makes sense to
> > give the x-axis vertical position in terms of a y-coordinate (rather than a few
> > limited options like top, middle and bottom). This would be more flexible.
> 
> This sounds good and I agree that it is more flexible.

This is my preferred option.

> Indeed that options like "top, middle, bottom" are limited but still the
> scaling is done automatically and the user does not have to know in what range
> are his/her values are and what are the minimum and maximum and what axis
> position matches all the graphs which he/she wants to generate.
> 
> I am sure that this can be done better than I did it.

By allowing the position of the axis to take any value within the data range,
this still allows 'top', 'middle' and 'bottom' to be defined as functions of
the data with, e.g.

x_axis_pos = min(data)          # bottom
x_axis_pos = max(data)         # middle
x_axis_pos = median(data)   # top

and also allows for explicit placing of the axis at specified points on the
y-axis, or as other points that depend on the data (e.g. mean, quartiles, etc.)

Cheers,

L.



From bugzilla-daemon at portal.open-bio.org  Mon Dec  7 08:05:11 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 08:05:11 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track in GenomeDiagram
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912071305.nB7D5B22029508@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


------- Comment #8 from lpritc at scri.sari.ac.uk  2009-12-07 08:05 EST -------
(In reply to comment #7)

> x_axis_pos = min(data)          # bottom
> x_axis_pos = max(data)         # middle
> x_axis_pos = median(data)   # top

x_axis_pos = min(data)          # bottom
x_axis_pos = max(data)         # top
x_axis_pos = median(data)   # middle

D'oh!
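
For the record, the corrected mapping can be sketched in runnable form as
below. The function name x_axis_position and the "where" argument are
hypothetical, chosen only for illustration; nothing like this exists in
GenomeDiagram yet.

```python
from statistics import median


def x_axis_position(data, where):
    """Map a named position to a y-coordinate derived from the data."""
    choices = {"bottom": min, "top": max, "middle": median}
    return choices[where](data)


data = [3, 8, 1, 9, 4]
for name in ("bottom", "middle", "top"):
    print(name, x_axis_position(data, name))
# bottom 1
# middle 4
# top 9
```

Any other function of the data (mean, a quartile) could be added to the
dictionary in the same way.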

L.



From bugzilla-daemon at portal.open-bio.org  Mon Dec  7 08:25:29 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 08:25:29 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track in GenomeDiagram
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912071325.nB7DPTSH030274@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


------- Comment #9 from Daniel.Nicorici at gmail.com  2009-12-07 08:25 EST -------

(In reply to comment #8)

Ok.

> (In reply to comment #7)
> 
> > x_axis_pos = min(data)          # bottom
> > x_axis_pos = max(data)         # middle
> > x_axis_pos = median(data)   # top
> 
> x_axis_pos = min(data)          # bottom
> x_axis_pos = max(data)         # top
> x_axis_pos = median(data)   # middle
> 
> D'oh!
> 
> L.
> 



From biopython at maubp.freeserve.co.uk  Mon Dec  7 08:28:10 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 7 Dec 2009 13:28:10 +0000
Subject: [Biopython-dev] Plans for Biopython 1.53
Message-ID: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>

Hi all,

I would like us to do the Biopython 1.53 release this month.

We still have lots of new stuff that hasn't yet landed on the
trunk, but despite that, looking at the NEWS file we have
had plenty of improvements in the two months and a bit
since Biopython 1.52 was released:

http://biopython.open-bio.org/SRC/biopython/NEWS
http://github.com/biopython/biopython/blob/master/NEWS

One good reason for doing Biopython 1.53 soon is the
NCBI said they plan to start using the new Jan 2010 DTD
files for MedLine/PubMed as early as mid December:
http://lists.open-bio.org/pipermail/biopython-dev/2009-November/007020.html

Any comments on how things stand on the trunk - is there
anything people think needs to be fixed before the release?

Thanks,

Peter

From eric.talevich at gmail.com  Mon Dec  7 11:33:30 2009
From: eric.talevich at gmail.com (Eric Talevich)
Date: Mon, 7 Dec 2009 11:33:30 -0500
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
Message-ID: <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>

On Mon, Dec 7, 2009 at 8:28 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> Hi all,
>
> I would like us to do the Biopython 1.53 release this month.
>
> We still have lots of new stuff that hasn't yet landed on the
> trunk, but despite that, looking at the NEWS file we have
> had plenty of improvements in the two months and a bit
> since Biopython 1.52 was released:
>
> http://biopython.open-bio.org/SRC/biopython/NEWS
> http://github.com/biopython/biopython/blob/master/NEWS
>
> One good reason for doing Biopython 1.53 soon is the
> NCBI said they plan to start using the new Jan 2010 DTD
> files for MedLine/PubMed as early as mid December:
> http://lists.open-bio.org/pipermail/biopython-dev/2009-November/007020.html
>
> Any comments on how things stand on the trunk - is there
> anything people think needs to be fixed before the release?
>
>
I'll chime in about the status of the Summer of Code stuff.

For Bio.TreeIO, I've borrowed the Newick tree parsing code from Nexus.Trees
and changed it to construct Bio.Tree objects via Bio.TreeIO.NewickIO -- so
the TreeIO API will work independently of file formats. For Bio.Tree, I'm
about halfway done porting the Nexus tree methods, though it'll go faster
now that the semester's over. (I'll post the details and ask for a code
review soon.)

My phyloxml branch won't be ready to land in time for a December release,
but merging it into the trunk right after that is feasible. That would
give everyone time to try it out and suggest changes before Biopython 1.54
cements the API.

Separately: GitHub says Nick Matzke's BioGeography branch hasn't been
touched since Aug. 19. It will need some love before it can be merged into
the trunk. Is there a plan for this, Peter or Brad? If not, should I try to
rescue it after TreeIO lands?

Cheers,
Eric

From biopython at maubp.freeserve.co.uk  Mon Dec  7 11:48:34 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 7 Dec 2009 16:48:34 +0000
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
	<3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
Message-ID: <320fb6e00912070848i4153ee33w9df5c7df65a4c225@mail.gmail.com>

On Mon, Dec 7, 2009 at 4:33 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
>
> I'll chime in about the status of the Summer of Code stuff.

Thanks

> For Bio.TreeIO, I've borrowed the Newick tree parsing code from Nexus.Trees
> and changed it to construct Bio.Tree objects via Bio.TreeIO.NewickIO -- so
> the TreeIO API will work independently of file formats. For Bio.Tree, I'm
> about halfway done porting the Nexus tree methods, though it'll go faster
> now that the semester's over. (I'll post the details and ask for a code
> review soon.)
>
> My phyloxml branch won't be ready to land in time for a December release,
> but merging it into the trunk right after that is feasible. That would
> give everyone time to try it out and suggest changes before Biopython 1.54
> cements the API.

That is what I was hoping for. Fingers crossed Tiago will be able to
spare some time to go over the basics of the phyloxml and TreeIO
work - more eyes on the code would be great.

> Separately: GitHub says Nick Matzke's BioGeography branch hasn't been
> touched since Aug. 19. It will need some love before it can be merged into
> the trunk. Is there a plan for this, Peter or Brad? If not, should I try to
> rescue it after TreeIO lands?

That sounds good as a tentative plan - Nick may want to be more
involved, but you would be the next logical choice to handle this.

Cheers,

Peter

From bugzilla-daemon at portal.open-bio.org  Mon Dec  7 13:56:20 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 13:56:20 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track in GenomeDiagram
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912071856.nB7IuKI7007552@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


------- Comment #10 from Daniel.Nicorici at gmail.com  2009-12-07 13:56 EST -------


(In reply to comment #7)
> (In reply to comment #6)
> > 
> > (In reply to comment #5)
> > > Leighton and I had (some time ago now) chatted about a related enhancement
> > > allowing the user to give the y-limits.
> > 
> > I think that it is need enhancement. Let's see if others think that same! ;-)
> 
> Oh, it definitely does! ;)  Thank you for taking the time to improve it.
> 
> > > With that in mind, it makes sense to
> > > give the x-axis vertical position in terms of a y-coordinate (rather than a few
> > > limited options like top, middle and bottom). This would be more flexible.
> > 
> > This sounds good and I agree that it is more flexible.
> 
> This is my preferred option.
> 
> > Indeed that options like "top, middle, bottom" are limited but still the
> > scaling is done automatically and the user does not have to know in what range
> > are his/her values are and what are the minimum and maximum and what axis
> > position matches all the graphs which he/she wants to generate.
> > 
> > I am sure that this can be done better than I did it.
> 
> By allowing the position of the axis to take any value within the data range,
> this still allows 'top', 'middle' and 'bottom' to be defined as functions of
> the data with, e.g.
> 
> x_axis_pos = min(data)          # bottom
> x_axis_pos = max(data)         # middle
> x_axis_pos = median(data)   # top
> 
> and also allows for explicit placing of the axis at specified points on the
> y-axis, or as other points that depend on the data (e.g. mean, quartiles, etc.)
>

It looks a little bit confusing to me now because I see that there are two
sides to the problem (or two bugs?), as follows:
1) drawing a line orthogonal to the y-axis, at any position, to represent the
x-axis (this does not affect how the values are plotted or in what interval)
2) in the case of bar plotting (this partially affects linear plotting too), the
values should be drawn automatically from zero (zero on y-axis, i.e. x=0 and
y=-inf...+inf) unless the user specifies something else, and not drawn by
default from some arbitrary point, e.g. median, mean, etc., as is done now.

I have the feeling that the solution presented here affects only point 1)
and not 2).

Could you please elaborate, so that I could perhaps implement your
suggestion?

BR,
Daniel

> Cheers,
> 
> L.
> 



From bugzilla-daemon at portal.open-bio.org  Tue Dec  8 03:49:59 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 8 Dec 2009 03:49:59 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track in GenomeDiagram
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912080849.nB88nx00030750@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


------- Comment #11 from lpritc at scri.sari.ac.uk  2009-12-08 03:49 EST -------
(In reply to comment #10)

> It looks a little bit confusing too me now because I see that there are two
> sides of the problem (or two bugs?), as following:
> 1) drawing a line orthogonal on y-axis at any position which represents the
> x-axis (this does not affect how the values are plotted and in what interval)
> 2) in the case of bar plotting (partially affects also linear plotting), the
> values should be drawn automatically from zero (zero on y-axis, i.e. x=0 and
> y=-inf...+inf) unless the user specify something else and not to be drawn by
> default from some arbitrary point, e.g. median, mean, etc., as it is done now. 
> 
> I have the feeling that the solution presented here affects only the point 1)
> and not 2).
> 
> Please, could you elaborate more such that maybe I could implement your
> suggestion?

I see why you've distinguished between the two cases, but I think they can be
handled by the earlier suggestion to implement the location of the x-axis in
the context of also allowing the user to set y-axis limits (see comment #5). 
It's the combination of allowing y-axis limits and the location of x-axis
crossing that gives the greatest flexibility.  For example, if y-limit
selection and x-axis crossing point were under user control...

...if you wanted to continue with the current behaviour, you'd not set any
y-limits, and not specify the location of the x-axis.

...if you wanted to draw short read coverage, you'd set the lower y-limit to 0,
and set the location of the x-axis to zero (if that was not the default).  This
should draw bars with their bases on the bottom/inner of the track, and the
scale running along the bottom/inner of the track. 

...if you wanted to represent some data as a bar graph, with a special meaning
for the mean (or median) value, you could optionally set y-limits, but have the
x-axis cross at mean(data) or median(data).  This should draw bars with their
bases on the x-axis, and the axis located at the mean/median value for the
data.
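
The three cases above can be sketched as one small function. This is not
existing GenomeDiagram code: resolve_axes, y_limits and x_axis_at are
hypothetical names, and placing the axis at the middle of the range when
nothing is specified is an assumption about what the default would be.

```python
from statistics import mean


def resolve_axes(data, y_limits=(None, None), x_axis_at=None):
    """Return (y_low, y_high, x_axis_y) for one graph.

    y_limits  -- (low, high); either may be None to use the data extreme
    x_axis_at -- y-value where the x-axis crosses, or None for the default
    """
    y_low = min(data) if y_limits[0] is None else y_limits[0]
    y_high = max(data) if y_limits[1] is None else y_limits[1]
    if x_axis_at is None:
        axis = (y_low + y_high) / 2.0  # assumed default: axis in the middle
    else:
        axis = x_axis_at
        # widen the range if needed so the axis is always visible
        y_low, y_high = min(y_low, axis), max(y_high, axis)
    return y_low, y_high, axis


coverage = [0, 12, 40, 7]
print(resolve_axes(coverage))                                  # (0, 40, 20.0)
print(resolve_axes(coverage, y_limits=(0, None), x_axis_at=0)) # (0, 40, 0)
print(resolve_axes(coverage, x_axis_at=mean(coverage)))        # (0, 40, 14.75)
```

The second call is the short read coverage case: bars start at the bottom of
the track and the full height is used.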

Does this help clarify what I meant, above?

L.



From chapmanb at 50mail.com  Tue Dec  8 08:33:12 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 8 Dec 2009 08:33:12 -0500
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com>
References: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
	<317375.58712.qm@web62401.mail.re1.yahoo.com>
	<20091203142534.GF51407@sobchak.mgh.harvard.edu>
	<320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com>
	<20091204134010.GK51407@sobchak.mgh.harvard.edu>
	<320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com>
Message-ID: <20091208133312.GE74538@sobchak.mgh.harvard.edu>

Peter and Michiel;
Thanks for the thoughts. Tried to combine these below:

Michiel:
> I didn't realize that the GFF parser returns SeqRecords. I agree with
> Peter that a parser returning SeqRecords should be accessed through
> Bio.SeqIO, while a lower-level parser can live in Bio.GFF.

Peter:
> My point is the moment you include GFF -> SeqRecord
> code (even if not explicitly via the Bio.SeqIO namespace)
> it opens us up to people giving these SeqRecord objects
> to SeqIO for output (e.g. as GenBank).
[...]
> Worthy goals, but if by "Produce Biopython objects from
> GFF3/GTF/GFF2 files" you mean SeqRecords with
> SeqFeatures, (as I said above) we are opening up the
> GFF to GenBank can of worms. There is no "later" :(

We seem to have a very different view of SeqRecords/SeqFeatures. To
me, they are a convenient well thought out object model to capture
annotations and features associated with a sequence. They have the
advantage that people who have used Biopython will be familiar with
the object model. That's why I chose to use them for representing GFF,
as opposed to a GFF specific class.

You are adding on two extra conditions:

- If something produces SeqRecords, it needs to come from SeqIO.
- If you have a SeqRecord, it has to be compatible with GenBank
  output.

This quickly ties us up to the not-that-great GenBank way of
representing features and locations, and makes it hard to add on more
flexible formats like GFF. Converting between very different feature
representations is going to be complex and a whole new problem; 
why do you have to support that to use a SeqRecord in your code?

Overall, I'd like to see it be simpler for people to contribute and
add parsers to Biopython.

> I still think it would be useful to have Bio/GFF/Parser.py (or
> similar) as the low level parser, and Bio/SeqIO/GffIO.py (or
> similar) to turn this into SeqRecord and SeqFeature objects.

This appears to be about where the code lives. Personally, I prefer
having things under the GFF namespace and then building thin
wrappers around it in SeqIO if desired. Practically, I want to leave
SeqIO inclusion out right now and try to argue only for getting the
GFF specific parser in.

> The nested features that worry me. Perhaps the existing
> location operator (e.g. "join") could be set to something
> like "parent/child" if the subfeatures is used to hold child
> features rather than the elements of a join? We need
> the GenBank output code etc to be able to tell these
> apart reliably.

Right now I don't set the location operator at all. The parent/child
model is much more flexible than the GenBank operator stuff, so
maybe the right way to go is to phase out using the operator at all.
If it is set to nothing then parent/child is assumed, and GenBank
output can add in all of the operators at output time.
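
To make the ambiguity concrete, here is a minimal sketch of the rule
described above. Feature is a hypothetical stand-in class, not the Biopython
SeqFeature, and describe is an illustrative helper: an unset operator is
read as parent/child nesting, a set one as a GenBank-style compound location.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Feature:
    """Hypothetical stand-in for a SeqFeature; not the Biopython class."""
    type: str
    start: int
    end: int
    location_operator: Optional[str] = None  # e.g. "join", or None
    sub_features: List["Feature"] = field(default_factory=list)


def describe(feature):
    """Say how a writer might interpret the sub-features."""
    if not feature.sub_features:
        return "simple"
    if feature.location_operator:  # GenBank-style compound location
        return "%s of %d parts" % (
            feature.location_operator, len(feature.sub_features))
    return "parent with %d children" % len(feature.sub_features)  # GFF-style


cds = Feature("CDS", 10, 90, "join",
              [Feature("CDS", 10, 40), Feature("CDS", 60, 90)])
gene = Feature("gene", 0, 100, None, [Feature("mRNA", 0, 100)])
print(describe(cds))   # join of 2 parts
print(describe(gene))  # parent with 1 children
```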

Brad

From chapmanb at 50mail.com  Tue Dec  8 09:03:54 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 8 Dec 2009 09:03:54 -0500
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
	<3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
Message-ID: <20091208140354.GG74538@sobchak.mgh.harvard.edu>

Hi Eric;

> I'll chime in about the status of the Summer of Code stuff.
> 
> For Bio.TreeIO, I've borrowed the Newick tree parsing code from Nexus.Trees
> and changed it to construct Bio.Tree objects via Bio.TreeIO.NewickIO -- so
> the TreeIO API will work independently of file formats. For Bio.Tree, I'm
> about halfway done porting the Nexus tree methods, though it'll go faster
> now that the semester's over. (I'll post the details and ask for a code
> review soon.)
> 
> My phyloxml branch won't be ready to land in time for a December release,
> but merging it into the trunk right after that is feasible. That would
> everyone time to try it out and suggest changes before Biopython 1.54
> cements the API.

This sounds awesome. Thanks for keeping up with the code; looking
forward to seeing it get in to the main branch.

> Separately: GitHub says Nick Matzke's BioGeography branch hasn't been
> touched since Aug. 19. It will need some love before it can be merged into
> the trunk. Is there a plan for this, Peter or Brad? If not, should I try to
> rescue it after TreeIO lands?

No plan from my end; hopefully Nick will chime in. If Nick doesn't
have time, it would be beyond great if you could finalize and merge the
most useful parts. Thanks for volunteering on this.

Brad

From biopython at maubp.freeserve.co.uk  Tue Dec  8 09:15:30 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Dec 2009 14:15:30 +0000
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <20091208133312.GE74538@sobchak.mgh.harvard.edu>
References: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
	<317375.58712.qm@web62401.mail.re1.yahoo.com>
	<20091203142534.GF51407@sobchak.mgh.harvard.edu>
	<320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com>
	<20091204134010.GK51407@sobchak.mgh.harvard.edu>
	<320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com>
	<20091208133312.GE74538@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00912080615k641cfc15v1c80b26132de83eb@mail.gmail.com>

On Tue, Dec 8, 2009 at 1:33 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> We seem to have a very different view of SeqRecords/SeqFeatures. To
> me, they are a convenient well thought out object model to capture
> annotations and features associated with a sequence. They have the
> advantage that people who have used Biopython will be familiar with
> the object model. That's why I chose to use them for representing GFF,
> as opposed to a GFF specific class.

OK, but (as I expand on below), your planned use of the SeqFeature
(while legitimate) appears to risk being inconsistent with existing parts
of the Biopython code base (in particular, GenBank output, and maybe
GenomeDiagram).

> You are adding on two extra conditions:
>
> - If something produces SeqRecords, it needs to come from SeqIO.

It was more of an aim than a rule. It isn't true of all the existing code for
historical reasons, e.g. Bio.SeqIO "genbank" support acts as a thin
wrapper to Bio.GenBank which does offer SeqRecord objects. From
a user perspective, if you want a SeqRecord from a sequence file,
the first port of call should be Bio.SeqIO.

> - If you have a SeqRecord, it has to be compatible with GenBank
>   output.
>
> This quickly ties us up to the not-that-great GenBank way of
> representing features and locations, and makes it hard to add on more
> flexible formats like GFF. Converting between very different feature
> representations is going to be complex and a whole new problem;
> why do you have to support that to use a SeqRecord in your code?

The big aim of Bio.SeqIO was to allow using many different file
formats with the same object representation. Implicitly (assuming
the required data is present), input from one file format could be
output in another format. The problem is that lots of current code in
Biopython uses SeqRecord/SeqFeatures in a particular way
(GenBank/EMBL parsers, GenomeDiagram, GenBank output).
Unfortunately, for GFF files it seems this isn't the most natural
way to use SeqFeature objects (where you need real nesting).

> Overall, I'd like to see it be simpler for people to contribute and
> add parsers to Biopython.

I hope that for simple file formats this is already the case. But for
annotation rich file formats, if we want SeqIO to continue to be
useful for conversion, this by necessity requires some
awareness of how the other parsers/writers will represent
the same data.

One option for contributions is to offer a "low level" parser
using basic Python datatypes or simple file-type specific
records. Then someone more familiar with SeqIO and the
other file formats can write a SeqRecord converter in order
to integrate it into Bio.SeqIO.  This is basically how Ace,
Phred, SwissProt (and probably others) were done.
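
As an illustration of what such a "low level" parser could look like for
GFF3, here is a minimal sketch using only basic Python datatypes. It is not
Brad's parser; GffLine and parse_gff3_line are hypothetical names, and the
only thing taken from the format itself is the nine tab-separated columns
with key=value attributes.

```python
from collections import namedtuple
from urllib.parse import unquote

GffLine = namedtuple(
    "GffLine", "seqid source type start end score strand phase attributes"
)


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a GffLine of basic Python types."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(parts))
    seqid, source, ftype, start, end, score, strand, phase, attr_text = parts
    attributes = {}
    if attr_text != ".":
        for pair in attr_text.split(";"):
            if pair:
                key, _, value = pair.partition("=")
                # values may be comma-separated lists, so always keep a list
                attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return GffLine(
        seqid, source, ftype, int(start), int(end),
        None if score == "." else float(score),
        None if strand == "." else strand,
        None if phase == "." else int(phase),
        attributes,
    )


rec = parse_gff3_line("chr1\tdemo\tgene\t100\t500\t.\t+\t.\tID=gene1;Name=abc")
print(rec.type, rec.start, rec.end, rec.attributes["ID"])  # gene 100 500 ['gene1']
```

A separate converter could then turn these records into SeqRecord and
SeqFeature objects without the parser itself knowing anything about SeqIO.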

>> I still think it would be useful to have Bio/GFF/Parser.py (or
>> similar) as the low level parser, and Bio/SeqIO/GffIO.py (or
>> similar) to turn this into SeqRecord and SeqFeature objects.
>
> This appears to be about where the code lives. Personally, I prefer
> having things under the GFF namespace and then building thin
> wrappers around it in SeqIO if desired. Practically, I want to leave
> SeqIO inclusion out right now and try to argue only for getting the
> GFF specific parser in.

Where the code lives isn't a big issue. You can do a thin
wrapper in Bio.SeqIO calling Bio.GFF (where Bio.GFF makes
SeqRecords), or a fat wrapper (where Bio.GFF does not make
SeqRecords).

The problem (as I see it) is SeqIO integration and how your
desired use of SeqFeatures will impact this.

>> It is the nested features that worry me. Perhaps the existing
>> location operator (e.g. "join") could be set to something
>> like "parent/child" if the subfeatures is used to hold child
>> features rather than the elements of a join? We need
>> the GenBank output code etc to be able to tell these
>> apart reliably.
>
> Right now I don't set the location operator at all. The parent/child
> model is much more flexible than the GenBank operator stuff, so
> maybe the right way to go is to phase out using the operator at all.
> If it is set to nothing then parent/child is assumed, and GenBank
> output can add in all of the operators at output time.

I agree that using SeqFeature sub-features for parent/child
relationships makes a lot of sense. BUT, we have a lot of
existing code which follows the GenBank/EMBL parser
route of using this for joins (and a few other corner cases).

There are other annoyances with the current SeqFeature
and FeatureLocation model - the strand and location operator
are part of the SeqFeature not the FeatureLocation. It would
make more sense to me to move them to the FeatureLocation
(and have that handle joins itself). Or, move everything to
the SeqFeature (and get rid of the FeatureLocation object).

I think the best route forward is to plan a transition of the
SeqFeature object to allow nice handling of real nested
relationships, and a reworking of complex location handling.
Then (hopefully) we can have the GenBank/EMBL/GFF3
parsers all using the SeqFeature in a consistent way.

Peter


From bugzilla-daemon at portal.open-bio.org  Tue Dec  8 11:56:17 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 8 Dec 2009 11:56:17 -0500
Subject: [Biopython-dev] [Bug 2965] New: Updating Bio.Restriction with
	latest REBASE data
Message-ID: <bug-2965-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2965

           Summary: Updating Bio.Restriction with latest REBASE data
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


The Bio/Restriction/Restriction_Dictionary.py file hasn't been updated since
2004.

The latest REBASE restriction digest files seem to be from Nov 29 2009,
ftp://ftp.neb.com/pub/rebase/

This bug is to update Restriction_Dictionary.py to use the Nov 2009 data. I
have tried and failed as described below:

----------------------------------------------------------------------------

I manually downloaded these files to the Scripts/Restriction directory:
ftp://ftp.neb.com/pub/rebase/emboss_e.912
ftp://ftp.neb.com/pub/rebase/emboss_r.912
ftp://ftp.neb.com/pub/rebase/emboss_s.912

And then ran ranacompiler.py which generated a new Restriction_Dictionary.py

As an aside, module sre is deprecated, re is suggested instead. Other
interesting output:

WARNING : HaeIV cut twice with different overhang length each time.            
        Unable to deal with this behaviour.             
        This enzyme will not be included in the database. Sorry.
        Checking : Anyway, HaeIV is not commercially available.


WARNING : TaqII has two different sites.


The new database contains 753 enzymes.

So far so good, but using the new Restriction_Dictionary.py the unit tests
fail:

$ python test_Restriction.py
Traceback (most recent call last):
  File "test_Restriction.py", line 6, in <module>
    from Bio.Restriction import *
  File "/Users/myusername/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Restriction/__init__.py", line 61, in <module>
    from Bio.Restriction.Restriction import *
  File "/Users/myusername/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Restriction/Restriction.py", line 2358, in <module>
    newenz = T(k, bases, enzymedict[k])
  File "/Users/myusername/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Restriction/Restriction.py", line 216, in __init__
    cls.compsite = re.compile(cls.compsite)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py", line 188, in compile
    return _compile(pattern, flags)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py", line 241, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character in group name



From bugzilla-daemon at portal.open-bio.org  Tue Dec  8 12:02:42 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 8 Dec 2009 12:02:42 -0500
Subject: [Biopython-dev] [Bug 2965] Updating Bio.Restriction with latest
	REBASE data
In-Reply-To: <bug-2965-42@http.bugzilla.open-bio.org/>
Message-ID: <200912081702.nB8H2g4b014553@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2965


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-08 12:02 EST -------
To be more precise, running Bio/Restriction/Restriction.py in IDLE and looking
at the stack trace, the regular expression failing is for enzyme CviKI-1,

(?P<CviKI-1>[AG]GC[CT])|(?P<CviKI-1_as>[AG]GC[CT])

The problem seems to be the hyphen/minus sign in the enzyme name which is
being used as a group name in the regular expression. I think this is the
only enzyme with a hyphen in its name. Since it can't be used as a Python name
either, we should probably map it to an underscore:

>>> import re
>>> re.compile('(?P<CviKI\-1>[AG]GC[CT])|(?P<CviKI\-1_as>[AG]GC[CT])')
...
error: bad character in group name
>>> re.compile('(?P<CviKI_1>[AG]GC[CT])|(?P<CviKI_1_as>[AG]GC[CT])')
<_sre.SRE_Pattern object at 0xe8d700>



From bugzilla-daemon at portal.open-bio.org  Tue Dec  8 12:50:29 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 8 Dec 2009 12:50:29 -0500
Subject: [Biopython-dev] [Bug 2965] Updating Bio.Restriction with latest
	REBASE data
In-Reply-To: <bug-2965-42@http.bugzilla.open-bio.org/>
Message-ID: <200912081750.nB8HoTDW016476@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2965


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-08 12:50 EST -------
Fixed by mapping hyphen to an underscore.
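
A minimal sketch of what such a mapping could look like. The helper names
group_name and build_site_pattern are made up for illustration and are not
the code that was checked in; the pattern text follows the CviKI-1 example
from comment #1:

```python
import re


def group_name(enzyme_name):
    """Map an enzyme name to a valid regex group name (hypothetical helper)."""
    # Named groups must be valid identifiers, so a hyphen is not allowed.
    return enzyme_name.replace("-", "_")


def build_site_pattern(enzyme_name, site, antisense_site):
    """Compile a two-group pattern in the style shown in comment #1."""
    name = group_name(enzyme_name)
    return re.compile(
        "(?P<%s>%s)|(?P<%s_as>%s)" % (name, site, name, antisense_site)
    )


pattern = build_site_pattern("CviKI-1", "[AG]GC[CT]", "[AG]GC[CT]")
match = pattern.search("TTAGCCTT")
```

Here match.lastgroup gives the sanitised name (CviKI_1), so any code that
looks enzymes up by group name would need to apply the same mapping.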



From kellrott at gmail.com  Tue Dec  8 17:00:11 2009
From: kellrott at gmail.com (Kyle Ellrott)
Date: Tue, 8 Dec 2009 14:00:11 -0800
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <20091208140354.GG74538@sobchak.mgh.harvard.edu>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
	<3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
	<20091208140354.GG74538@sobchak.mgh.harvard.edu>
Message-ID: <bb02be080912081400x61d565c8wc2848606fc3496dd@mail.gmail.com>

Speaking of stuff that may not be ready for 1.53, but should start speeding
up for 1.54, I've translated a bunch of HMMER3 / PfamScan code in the
Bio.HMMER and Bio.Pfam modules in my github branch (right now it's sitting
in the jython branch, but I can spin it into a separate branch).
Right now it's missing the code to parse HMMER2, there needs to be more
extensive unit testing, and the API needs to be nailed down with some
documentation.
Is there anybody else that needs HMMER and Pfam support?

Kyle

From biopython at maubp.freeserve.co.uk  Tue Dec  8 17:18:03 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Dec 2009 22:18:03 +0000
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <bb02be080912081400x61d565c8wc2848606fc3496dd@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
	<3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
	<20091208140354.GG74538@sobchak.mgh.harvard.edu>
	<bb02be080912081400x61d565c8wc2848606fc3496dd@mail.gmail.com>
Message-ID: <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com>

On Tue, Dec 8, 2009 at 10:00 PM, Kyle Ellrott <kellrott at gmail.com> wrote:
>
> Speaking of stuff that may not be ready for 1.53, but should start speeding
> up for 1.54, I've translated a bunch of HMMER3 / PfamScan code in the
> Bio.HMMER and Bio.Pfam modules in my github branch (right now it's sitting
> in the jython branch, but I can spin it into a separate branch).
> Right now it's missing the code to parse HMMER2, there needs to be more
> extensive unit testing, and the API needs to be nailed down with some
> documentation.
> Is there anybody else that needs HMMER and Pfam support?
>
> Kyle

That had caught my eye, and it is potentially of direct interest to
me personally. I will probably skip HMMER2 and go straight to
HMMER3 though ;)

On a related point, I am reasonably confident we can get most
of Biopython running on Jython 2.5.1 in time for the release.
Other than things that Jython doesn't support at all, i.e. the C
code, DTD parsing (needed for Bio.Entrez), and the lack of a
buffer function (not important, only used in deprecated code
now), the only remaining hurdle is Bio.Restriction, and I think
I have solved that. I will be testing this tomorrow (time
permitting). Your groundwork has been very useful here Kyle.

Thanks,

Peter

From biopython at maubp.freeserve.co.uk  Tue Dec  8 17:30:20 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Dec 2009 22:30:20 +0000
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <320fb6e00912080615k641cfc15v1c80b26132de83eb@mail.gmail.com>
References: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
	<317375.58712.qm@web62401.mail.re1.yahoo.com>
	<20091203142534.GF51407@sobchak.mgh.harvard.edu>
	<320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com>
	<20091204134010.GK51407@sobchak.mgh.harvard.edu>
	<320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com>
	<20091208133312.GE74538@sobchak.mgh.harvard.edu>
	<320fb6e00912080615k641cfc15v1c80b26132de83eb@mail.gmail.com>
Message-ID: <320fb6e00912081430q6db93d55l6de4a02baefd6c12@mail.gmail.com>

On Tue, Dec 8, 2009 at 2:15 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> I agree that using SeqFeature sub-features for parent/child
> relationships makes a lot of sense. BUT, we have a lot of
> existing code which follows the GenBank/EMBL parser
> route of using this for joins (and a few other corner cases).
>
> There are other annoyances with the current SeqFeature
> and FeatureLocation model - the strand and location operator
> are part of the SeqFeature not the FeatureLocation. It would
> make more sense to me to move them to the FeatureLocation
> (and have that handle joins itself). Or, move everything to
> the SeqFeature (and get rid of the FeatureLocation object).
>
> I think the best route forward is to plan a transition of the
> SeqFeature object to allow nice handling of real nested
> relationships, and a reworking of complex location handling.
> Then (hopefully) we can have the GenBank/EMBL/GFF3
> parsers all using the SeqFeature in a consistent way.
>

Just to add some ideas to this thread for discussion,
on possible ways forward without breaking backwards
compatibility... hopefully this is clear, I did have a glass
of wine with dinner ;)

Given the way the existing SeqFeature list property
subfeatures is used (by the GenBank/EMBL parser
etc), would it make sense for the GFF needs to add
a new list for child features (say property "children"),
and perhaps another property (maybe "parent") which
can point back at the parent SeqFeature. i.e. A sort
of tree, allowing us to represent genes, exons, etc.

Note we may want to use weak references in the above
(children/parent references) to assist the python GC.
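
A minimal sketch of that tree idea, using a stand-in class rather than the
real SeqFeature. The names FeatureNode, add_child and the parent property
are assumptions for illustration only:

```python
import weakref


class FeatureNode(object):
    """Stand-in for a feature with parent/child links (not SeqFeature)."""

    def __init__(self, type, start, end):
        self.type = type
        self.start = start
        self.end = end
        self.children = []    # strong references down the tree
        self._parent = None   # weak reference back up the tree

    @property
    def parent(self):
        # Returns None if there is no parent, or if the parent object
        # has already been garbage collected.
        if self._parent is None:
            return None
        return self._parent()

    def add_child(self, child):
        child._parent = weakref.ref(self)
        self.children.append(child)


gene = FeatureNode("gene", 0, 100)
exon = FeatureNode("exon", 10, 50)
gene.add_child(exon)
```

Only the child-to-parent link is weak here, so holding the top level
feature keeps the whole tree alive without creating reference cycles.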

Given the above, potentially the GenBank/EMBL
parser could be enhanced to use these new properties
(e.g. for linking gene and CDS features in bacteria,
or CDS and mat_peptide features in viruses etc).

[This still leaves the ontology issues - which might
be best dealt with by the GenBank output code]

Peter

From kellrott at gmail.com  Tue Dec  8 17:42:54 2009
From: kellrott at gmail.com (Kyle Ellrott)
Date: Tue, 8 Dec 2009 14:42:54 -0800
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
	<3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
	<20091208140354.GG74538@sobchak.mgh.harvard.edu>
	<bb02be080912081400x61d565c8wc2848606fc3496dd@mail.gmail.com>
	<320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com>
Message-ID: <bb02be080912081442j7dfb1c8hf909527883eeb4f4@mail.gmail.com>

>
> On a related point, I am reasonably confident we can get most
> of Biopython running on Jython 2.5.1 in time for the release.
> Other than things that Jython doesn't support at all, i.e. the C
> code, DTD parsing (needed for Bio.Entrez), and the lack of a
> buffer function (not important, only used in deprecated code
> now), the only remaining hurdle is Bio.Restriction, and I think
> I have solved that. I will be testing this tomorrow (time
> permitting).


The last bit for 'full' jython support is getting BioSQL working.
Unfortunately MySQLdb links directly to the C mysql API, and doesn't work in
Jython.  My jython port also has work that moves the BioSQL interface from
the internal ORM to a SqlAlchemy interface.  Of course that is a little
controversial because it introduces a dependency on another python package.
Of course it takes care of sqlite and Java MySql connector support at the
same time, so it does have some pluses.

Kyle

From biopython at maubp.freeserve.co.uk  Tue Dec  8 17:46:19 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Dec 2009 22:46:19 +0000
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <bb02be080912081442j7dfb1c8hf909527883eeb4f4@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
	<3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
	<20091208140354.GG74538@sobchak.mgh.harvard.edu>
	<bb02be080912081400x61d565c8wc2848606fc3496dd@mail.gmail.com>
	<320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com>
	<bb02be080912081442j7dfb1c8hf909527883eeb4f4@mail.gmail.com>
Message-ID: <320fb6e00912081446w303edd73qe3a5dad964314487@mail.gmail.com>

On Tue, Dec 8, 2009 at 10:42 PM, Kyle Ellrott <kellrott at gmail.com> wrote:
>
> The last bit for 'full' jython support is getting BioSQL working.
> Unfortunately MySQLdb links directly to the C mysql API, and doesn't work in
> Jython.  My jython port also has work that moves the BioSQL interface from
> the internal ORM to a SqlAlchemy interface.  Of course that is a little
> controversial because it introduces a dependency on another python package.
> Of course it takes care of sqlite and Java MySql connector support at the
> same time, so it does have some pluses.

Fair point w.r.t. "full" jython support ;)

I would be more comfortable with BioSQL on Jython working
directly with sqlite (once we add that to BioSQL) and the Java
MySql connector directly (without the extra dependency on
SQLAlchemy).

Peter


From biopython at maubp.freeserve.co.uk  Tue Dec  8 18:38:04 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Dec 2009 23:38:04 +0000
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <320fb6e00912081430q6db93d55l6de4a02baefd6c12@mail.gmail.com>
References: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
	<317375.58712.qm@web62401.mail.re1.yahoo.com>
	<20091203142534.GF51407@sobchak.mgh.harvard.edu>
	<320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com>
	<20091204134010.GK51407@sobchak.mgh.harvard.edu>
	<320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com>
	<20091208133312.GE74538@sobchak.mgh.harvard.edu>
	<320fb6e00912080615k641cfc15v1c80b26132de83eb@mail.gmail.com>
	<320fb6e00912081430q6db93d55l6de4a02baefd6c12@mail.gmail.com>
Message-ID: <320fb6e00912081538o635347ceh8e10aa4863e538e9@mail.gmail.com>

On Tue, Dec 8, 2009 at 2:15 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>>
>> There are other annoyances with the current SeqFeature
>> and FeatureLocation model - the strand and location operator
>> are part of the SeqFeature not the FeatureLocation. It would
>> make more sense to me to move them to the FeatureLocation
>> (and have that handle joins itself). Or, move everything to
>> the SeqFeature (and get rid of the FeatureLocation object).
>>

In addition to the strand and location operator, there is also
(sometimes) a database cross reference (properties ref and
db_ref, e.g. in contig files). Again, this is conceptually part
of the feature location (and stored that way in BioSQL if I
recall correctly).

One example of where it would make sense to move things
like the database, operator and strand to the FeatureLocation
is the coded_by information in some GenPept file annotation,
a use case very recently raised on the main mailing list:
http://lists.open-bio.org/pipermail/biopython/2009-December/005910.html
The current FeatureLocation simply can't be used here -
although a full SeqFeature could be.

Peter

From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 04:56:34 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 04:56:34 -0500
Subject: [Biopython-dev] [Bug 2966] New: Primer3Commandline does not use
	EMBOSS 6.1.0 arguments
Message-ID: <bug-2966-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2966

           Summary: Primer3Commandline does not use EMBOSS 6.1.0 arguments
           Product: Biopython
           Version: 1.52
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: lpritc at scri.sari.ac.uk


Several arguments for EMBOSS eprimer3 are different in version 6.1.0 from those
used in Primer3Commandline.  I have updated Primer3Commandline locally (and
added documentation strings), and will make this available via github with some
other proposed changes shortly, after talking to Peter.

This revealed another bug, which I will submit separately.



From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 05:07:14 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 05:07:14 -0500
Subject: [Biopython-dev] [Bug 2967] New: AbstractCommandline silently
	accepts invalid parameter options
Message-ID: <bug-2967-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2967

           Summary: AbstractCommandline silently accepts invalid parameter
                    options
           Product: Biopython
           Version: 1.52
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: lpritc at scri.sari.ac.uk


While investigating Bug 2996 I noticed that AbstractCommandline was silently
accepting invalid parameter options when passed by setting attributes.  For
example:

    cline = Primer3Commandline(bogus=True)
    cline.sequence = filename

raises the appropriate ValueError, as the parameter name 'bogus' is being
compared to the self.parameters list when setting, and is found not to be
valid.  However, the following code:

    cline = Primer3Commandline()
    cline.sequence = filename
    cline.bogus = True    # Invalid argument not flagged up
    cline.sequnce = True  # Mistyped argument not flagged up


silently sets the invalid cline.bogus and cline.sequnce attributes without
warning.  Parameters set via attribute are not validated with the
setter/getters defined for the properties in AbstractCommandline.__init__  This
could (did!) lead the user to think that parameters are set when they are not,
under at least two circumstances:

1) Typos in the parameter name
2) Using a parameter unsupported by the interface (see Bug 2996).

L.



From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 05:08:12 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 05:08:12 -0500
Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts
	invalid parameter options
In-Reply-To: <bug-2967-42@http.bugzilla.open-bio.org/>
Message-ID: <200912091008.nB9A8Cc5008147@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2967


------- Comment #1 from lpritc at scri.sari.ac.uk  2009-12-09 05:08 EST -------
Sorry, I'm referring to bug 2966 in the post above.  My bad.

L.



From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 05:46:11 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 05:46:11 -0500
Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts
	invalid parameter options
In-Reply-To: <bug-2967-42@http.bugzilla.open-bio.org/>
Message-ID: <200912091046.nB9AkBXi009268@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2967


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-09 05:46 EST -------
(In reply to comment #0)
> While investigating Bug 2996 I noticed that AbstractCommandline was silently
> accepting invalid parameter options when passed by setting attributes.  For
> example:
> 
>     cline = Primer3Commandline(bogus=True)
>     cline.sequence = filename
> 
> raises the appropriate ValueError, as the parameter name 'bogus' is being
> compared to the self.parameters list when setting, and is found not to be
> valid.  However, the following code:
> 
>     cline = Primer3Commandline()
>     cline.sequence = filename
>     cline.bogus = True    # Invalid argument not flagged up
>     cline.sequnce = True  # Mistyped argument not flagged up
> 
> 
> silently sets the invalid cline.bogus and cline.sequnce attributes without
> warning.  Parameters set via attribute are not validated with the
> setter/getters defined for the properties in AbstractCommandline.__init__
> This could (did!) lead the user to think that parameters are set when they
> are not, under at least two circumstances:
> 
> 1) Typos in the parameter name
> 2) Using a parameter unsupported by the interface

This is normal Python object behaviour - you can add any "property" like this
at run time,

>>> class Dummy(object) :
...     pass
... 
>>> d = Dummy()
>>> d.name = "Fred"
>>> dir(d)
['__class__', '__delattr__', '__dict__', '__doc__', '__getattribute__',
'__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__str__', '__weakref__', 'name']
>>> d.name
'Fred'

We might still be able to block this via __setattr__, this needs some
experimentation.



From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 07:23:34 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 07:23:34 -0500
Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts
	invalid parameter options
In-Reply-To: <bug-2967-42@http.bugzilla.open-bio.org/>
Message-ID: <200912091223.nB9CNYtT012354@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2967


------- Comment #3 from lpritc at scri.sari.ac.uk  2009-12-09 07:23 EST -------
(In reply to comment #2)

> This is normal Python object behaviour - you can add any "property" like this
> at run time,

[...]

Oddly enough, I was already aware of that... ;)

The issue is that the setting of parameters via attributes fails silently, but
is demonstrated in the tutorial and is in any case often rather more convenient
than declaring the parameters on instantiation, so is very likely to be used in
anger.  This potentially (and *actually* in my case, when attempting to use
EMBOSS 6.1.0 parameter names with eprimer3) leads to cases where the user might
expect that command-line options have been set, when they in fact haven't.  

> We might still be able to block this via __setattr__, this needs some
> experimentation.

That seems to be the best route to me, initially.  It might be worth removing
the property magic in the AbstractCommandline.__init__(), and instead use
__setattr__, __getattr__, and __delattr__, having them behave appropriately for
known parameter names.

I'll have a go at doing that and put it in with the EMBOSS stuff I'm working
on, just now.

L.



From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 07:28:07 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 07:28:07 -0500
Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts
	invalid parameter options
In-Reply-To: <bug-2967-42@http.bugzilla.open-bio.org/>
Message-ID: <200912091228.nB9CS7vS012457@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2967


------- Comment #4 from lpritc at scri.sari.ac.uk  2009-12-09 07:28 EST -------
(In reply to comment #3)
> (In reply to comment #2)
>
> > We might still be able to block this via __setattr__, this needs some
> > experimentation.
> 
> That seems to be the best route to me, initially.  It might be worth removing
> the property magic in the AbstractCommandline.__init__(), and instead use
> __setattr__, __getattr__, and __delattr__, having them behave appropriately for
> known parameter names.
> 
> I'll have a go at doing that and put it in with the EMBOSS stuff I'm working
> on, just now.

Peter has pointed out that he'd like to retain discoverability, and so restrict
the change to a validating __setattr__ - which seems reasonable.

L.



From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 07:53:00 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 07:53:00 -0500
Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts
	invalid parameter options
In-Reply-To: <bug-2967-42@http.bugzilla.open-bio.org/>
Message-ID: <200912091253.nB9Cr0cP013048@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2967


------- Comment #5 from lpritc at scri.sari.ac.uk  2009-12-09 07:53 EST -------
This works for me, at the moment:

    def __setattr__(self, name, value):
        """ Workaround for a user interface issue.  Without this __setattr__
            attribute-based assignment of parameters will silently accept
            invalid parameters, leading to known instances of the user
            assuming that parameters for the application are set, when they
            are not.
            This workaround uses a whitelist of object attributes, and
            sets the object attribute list as normal, for these.  Other
            attributes are assumed to be parameters, and passed to the
            self.set_parameter method for validation and assignment.
        """
        attr_whitelist = ['parameters', 'program_name']  # Allowed attributes
        if name not in attr_whitelist:       # If not in whitelist, treat
            self.set_parameter(name, value)  # as parameter
        else:
            self.__dict__[name] = value
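
To see the effect in isolation, here is a self-contained toy version.
ToyCommandline and its set_parameter method are stand-ins assumed for
illustration, not the real AbstractCommandline:

```python
class ToyCommandline(object):
    """Toy wrapper showing a validating __setattr__ (hypothetical class)."""

    def __init__(self, program_name, valid_names):
        # Both of these names are whitelisted in __setattr__ below,
        # so they are stored as ordinary instance attributes.
        self.program_name = program_name
        self.parameters = dict((name, None) for name in valid_names)

    def set_parameter(self, name, value):
        if name not in self.parameters:
            raise ValueError("Option name %s was not found." % name)
        self.parameters[name] = value

    def __setattr__(self, name, value):
        if name in ("parameters", "program_name"):
            self.__dict__[name] = value
        else:
            # Anything else is treated as a parameter and validated.
            self.set_parameter(name, value)


cline = ToyCommandline("eprimer3", ["sequence"])
cline.sequence = "seq.fasta"      # accepted, stored in cline.parameters
try:
    cline.sequnce = True          # typo, now rejected
except ValueError as err:
    message = str(err)
```

The toy class only covers setting values; reading them back as attributes
would still need the existing property definitions.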



From biopython at maubp.freeserve.co.uk  Wed Dec  9 08:21:50 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Dec 2009 13:21:50 +0000
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
	<3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
	<20091208140354.GG74538@sobchak.mgh.harvard.edu>
	<bb02be080912081400x61d565c8wc2848606fc3496dd@mail.gmail.com>
	<320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com>
Message-ID: <320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com>

On Tue, Dec 8, 2009 at 10:18 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> On a related point, I am reasonably confident we can get most
> of Biopython running on Jython 2.5.1 in time for the release.
> Other than things that Jython doesn't support at all, i.e. the C
> code, DTD parsing (needed for Bio.Entrez), and the lack of a
> buffer function (not important, only used in deprecated code
> now), the only remaining hurdle is Bio.Restriction, and I think
> I have solved that. I will be testing this tomorrow (time
> permitting). Your groundwork has been very useful here Kyle.

I'm stuck again with Bio.Restriction under Jython. I've got the
Bio/Restriction/Restriction_Dictionary.py to load under Jython
(just - the Nov 2009 update isn't helping to keep the code
size down), but doing test_Restriction.py hits the JVM limit.

Furthermore, there is a little bit of C code in Bio.Restriction
(which I think we can replace with plain python).

Peter

From biopython at maubp.freeserve.co.uk  Wed Dec  9 09:18:19 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Dec 2009 14:18:19 +0000
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
	<3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
	<20091208140354.GG74538@sobchak.mgh.harvard.edu>
	<bb02be080912081400x61d565c8wc2848606fc3496dd@mail.gmail.com>
	<320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com>
	<320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com>
Message-ID: <320fb6e00912090618y43add6f9v5cee8fb044b27eab@mail.gmail.com>

On Wed, Dec 9, 2009 at 1:21 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> Furthermore, there is a little bit of C code in Bio.Restriction
> (which I think we can replace with plain python).
>

I've replaced the C module Bio.Restriction.DNAUtils with
Python code, and deprecated it. I am surprised it was
written in C in the first place!

Peter

From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 10:04:10 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 10:04:10 -0500
Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts
	invalid parameter options
In-Reply-To: <bug-2967-42@http.bugzilla.open-bio.org/>
Message-ID: <200912091504.nB9F4AUM017626@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2967


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-09 10:04 EST -------
Fix committed - almost as is, I also added a doctest.

Thanks!



From biopython at maubp.freeserve.co.uk  Wed Dec  9 10:57:20 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Dec 2009 15:57:20 +0000
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <320fb6e00912090618y43add6f9v5cee8fb044b27eab@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
	<3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
	<20091208140354.GG74538@sobchak.mgh.harvard.edu>
	<bb02be080912081400x61d565c8wc2848606fc3496dd@mail.gmail.com>
	<320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com>
	<320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com>
	<320fb6e00912090618y43add6f9v5cee8fb044b27eab@mail.gmail.com>
Message-ID: <320fb6e00912090757s6efbd2acpcb197e8e77cd298f@mail.gmail.com>

Good news:

I've tweaked the RestrictionCompiler.py code to modify how it generates
Bio/Restriction/Restriction_Dictionary.py in order to build the dictionaries
incrementally. Together with the removal of the C code DNAUtils, this
means (after a clean install) that Jython likes Bio.Restriction and that
test_Restriction.py passes on Jython 2.5.1 (and C Python too).
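
As a rough illustration of what "building the dictionaries incrementally"
can mean in generated code (a sketch only, not the actual
RestrictionCompiler.py change; the function name and the sample enzyme
data are made up), the generator can emit one small assignment per entry
instead of a single huge dictionary literal:

```python
def generate_incremental_dict(name, data):
    """Return Python source that builds ``data`` one entry at a time."""
    # Start from an empty dictionary, then add one assignment per key.
    lines = ["%s = {}" % name]
    for key in sorted(data):
        lines.append("%s[%r] = %r" % (name, key, data[key]))
    return "\n".join(lines) + "\n"


# Made-up sample data standing in for the real enzyme records.
sample = {
    "EcoRI": {"site": "GAATTC", "size": 6},
    "BamHI": {"site": "GGATCC", "size": 6},
}

source = generate_incremental_dict("rest_dict", sample)

# Executing the generated source rebuilds an equal dictionary.
namespace = {}
exec(source, namespace)
rebuilt = namespace["rest_dict"]
```

Whether this alone is enough to stay under a given JVM limit depends on how
the statements are grouped, so treat it as the general shape only.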

Bad news:

I think I have broken test_CAPS.py (under both Jython and Python).
It looks like it hits some bits of Bio.Restriction that are not covered
by test_Restriction.py

I'm working on it still ...

Peter

From biopython at maubp.freeserve.co.uk  Wed Dec  9 11:25:28 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Dec 2009 16:25:28 +0000
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <320fb6e00912090757s6efbd2acpcb197e8e77cd298f@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
	<3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
	<20091208140354.GG74538@sobchak.mgh.harvard.edu>
	<bb02be080912081400x61d565c8wc2848606fc3496dd@mail.gmail.com>
	<320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com>
	<320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com>
	<320fb6e00912090618y43add6f9v5cee8fb044b27eab@mail.gmail.com>
	<320fb6e00912090757s6efbd2acpcb197e8e77cd298f@mail.gmail.com>
Message-ID: <320fb6e00912090825t45d2ac1atfaba7159d75aa6fc@mail.gmail.com>

On Wed, Dec 9, 2009 at 3:57 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Good news:
>
> I've tweaked the RestrictionCompiler.py code to modify how it generates
> Bio/Restriction/Restriction_Dictionary.py in order to build the dictionaries
> incrementally. Together with the removal of the C code DNAUtils, this
> means (after a clean install) that Jython likes Bio.Restriction and that
> test_Restriction.py passes on Jython 2.5.1 (and C Python too).
>
> Bad news:
>
> I think I have broken test_CAPS.py (under both Jython and Python).
> It looks like it hits some bits of Bio.Restriction that are not covered
> by test_Restriction.py
>
> I'm working on it still ...

Solved: the check_bases function in Bio.Restriction also used to
make things uppercase (but the docstring didn't make this clear
and the C code was non-obvious).
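
For illustration, a pure Python function with that behaviour could look
something like the sketch below. This is an assumption-laden example, not
the code that was checked in: the name check_bases_sketch, the accepted
alphabet and the choice of TypeError are all hypothetical, and the real
function may differ (for example in how it treats digits or in what it
returns).

```python
# Hypothetical alphabet: the IUPAC ambiguous DNA letters.
IUPAC_DNA = set("ABCDGHKMNRSTVWY")


def check_bases_sketch(seq_string):
    """Strip whitespace, upper-case, and validate a DNA string."""
    # Remove all whitespace and make the sequence upper case; the
    # upper-casing is the easily overlooked side effect.
    cleaned = "".join(seq_string.split()).upper()
    bad = set(cleaned) - IUPAC_DNA
    if bad:
        raise TypeError("Invalid character(s) found: %s" % "".join(sorted(bad)))
    return cleaned
```

With this sketch, check_bases_sketch("gaa ttc") gives "GAATTC", so callers
that relied on the implicit upper-casing keep working.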

I think this means the whole test suite passes on Jython 2.5.1
(barring those bits with C code dependencies, BioSQL, or the
known Jython issues with DTD parsing or the missing buffer
function).

Kyle - could you confirm this on your machine please?

Thanks,

Peter

From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 12:57:37 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 12:57:37 -0500
Subject: [Biopython-dev] [Bug 2968] New: Modifications to Emboss eprimer3
	parser and associated files
Message-ID: <bug-2968-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2968

           Summary: Modifications to Emboss eprimer3 parser and associated
                    files
           Product: Biopython
           Version: 1.52
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: lpritc at scri.sari.ac.uk


The existing Emboss primer3/eprimer3 code has a couple of issues, and some
scope for improvement:

- The existing Primer3.py parser code can only parse output when eprimer3 is
applied to a single sequence.  When eprimer3 is applied to multiple sequence
input, it groups all primers for all sequences into a single record, which may
incorrectly associate primers with the wrong sequences in downstream analysis.
- The current parser lacks an iterator for iterating over multiple sequence
output
- The current parser creates 'ghost' primers for all primer pairs, with length
zero and sequence as an empty string; it does not do this for internal oligos. 
A more intuitive solution might be to return None for absent primers/oligos
- The current data model stores all primer data as individual attributes.  It
might be more useful to group the attributes of individual primers into their
natural associations

I have written new code for Emboss/Primer3.py that adds iterator/multiple
sequence parsing functionality to the parser, and extensively revises the
object model for the data.  The Record and Primers objects are retained, but
each primer/oligo is now represented by a Primer object that collects the
relevant data together.  The Record object has a new attribute that allows the
sequence to be recorded directly, rather than having to be parsed from the
comments attribute.  The new data model retains the old attribute-based access
for compatibility, but adds direct access to the Primer objects (where present)
by .forward, .reverse and .oligo attributes, and by keywords.
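
A minimal sketch of the kind of data model described above follows. It is
not the code in the linked commit: the field names, the forward_seq
compatibility property and the use of dataclasses are assumptions made for
illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Primer:
    """One primer or internal oligo (hypothetical fields)."""
    start: int
    length: int
    tm: float
    gc: float
    seq: str


@dataclass
class Primers:
    """One primer set; absent primers/oligos are None, not 'ghosts'."""
    forward: Optional[Primer] = None
    reverse: Optional[Primer] = None
    oligo: Optional[Primer] = None

    def __getitem__(self, key):
        """Keyword-style access, e.g. primers["forward"]."""
        if key not in ("forward", "reverse", "oligo"):
            raise KeyError(key)
        return getattr(self, key)

    @property
    def forward_seq(self):
        """Old-style flat attribute, kept for compatibility."""
        return self.forward.seq if self.forward else ""


# Made-up example: a forward primer only, no reverse primer or oligo.
pair = Primers(forward=Primer(start=10, length=6, tm=50.0, gc=50.0, seq="GAATTC"))
```

Here pair.reverse is None, while pair.forward, pair["forward"] and the
flat pair.forward_seq all reach the same forward primer data.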

One change was required to the unit test, to account for the reporting of
absent primers as None, rather than having 'null' attributes.  I've added two
further test output files, which may be rather large for the distribution (60kb
total), and doctests that use these.

The code can be inspected at my GitHub repository:

http://github.com/widdowquinn/biopython/commit/b4701079ced297d7af5aa75b46738280c8783fe0

This enhancement request also relates to bug 2966.



From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 12:59:14 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 12:59:14 -0500
Subject: [Biopython-dev] [Bug 2968] Modifications to Emboss eprimer3 parser
	and associated files
In-Reply-To: <bug-2968-42@http.bugzilla.open-bio.org/>
Message-ID: <200912091759.nB9HxErQ022462@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2968


------- Comment #1 from lpritc at scri.sari.ac.uk  2009-12-09 12:59 EST -------
I forgot to mention - the new code still passes the test_EmbossPrimer.py unit
test.



From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 13:01:13 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 13:01:13 -0500
Subject: [Biopython-dev] [Bug 2966] Primer3Commandline does not use EMBOSS
	6.1.0 arguments
In-Reply-To: <bug-2966-42@http.bugzilla.open-bio.org/>
Message-ID: <200912091801.nB9I1DMe022568@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2966


------- Comment #1 from lpritc at scri.sari.ac.uk  2009-12-09 13:01 EST -------
I have made changes to Primer3Commandline that involve adding the EMBOSS 6.1.0
arguments, and docstrings to each argument.  I have also added doctests.

The proposed code can be inspected at my GitHub repository:

http://github.com/widdowquinn/biopython/commit/9c0643e333b0cafb4e356426fb4902e0e9d2385c



From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 13:03:30 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 13:03:30 -0500
Subject: [Biopython-dev] [Bug 2969] New: Addition of SeqmatchallCommandline
	to Emboss/Applications.py
Message-ID: <bug-2969-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2969

           Summary: Addition of SeqmatchallCommandline to
                    Emboss/Applications.py
           Product: Biopython
           Version: 1.52
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: lpritc at scri.sari.ac.uk


I thought it would be useful to have a command line wrapper to the EMBOSS
seqmatchall application, and have added this to Emboss/Applications.py, with
doctests.

The proposed code can be inspected at my GitHub repository:

http://github.com/widdowquinn/biopython/commit/ced72a34b2565b97f3ad2c77a66e1083375cff02



From kellrott at gmail.com  Wed Dec  9 14:22:01 2009
From: kellrott at gmail.com (Kyle Ellrott)
Date: Wed, 9 Dec 2009 11:22:01 -0800
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <320fb6e00912090825t45d2ac1atfaba7159d75aa6fc@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
	<3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
	<20091208140354.GG74538@sobchak.mgh.harvard.edu>
	<bb02be080912081400x61d565c8wc2848606fc3496dd@mail.gmail.com>
	<320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com>
	<320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com>
	<320fb6e00912090618y43add6f9v5cee8fb044b27eab@mail.gmail.com>
	<320fb6e00912090757s6efbd2acpcb197e8e77cd298f@mail.gmail.com>
	<320fb6e00912090825t45d2ac1atfaba7159d75aa6fc@mail.gmail.com>
Message-ID: <bb02be080912091122j36c53354k32144b75ae0bb28e@mail.gmail.com>

> Kyle - could you confirm this on your machine please?
>

It looks like the master branch is working well.
I guess the next step will be looking into the zxJDBC to expand the BioSQL
ORM.
Intro can be found at: http://www.informit.com/articles/article.aspx?p=26143

Kyle

From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 16:53:42 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 16:53:42 -0500
Subject: [Biopython-dev] [Bug 2969] Addition of SeqmatchallCommandline to
	Emboss/Applications.py
In-Reply-To: <bug-2969-42@http.bugzilla.open-bio.org/>
Message-ID: <200912092153.nB9LrgYN027652@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2969


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-09 16:53 EST -------
A nice easy one to wrap at first glance. I would like to also include the
"aformat" output to set the output alignment format (useful to set to pair or
simple for AlignIO to parse it as the "emboss" alignment format - see the
needle and water wrappers). You could then also add a run time test to
test_Emboss.py piping this to AlignIO... ;)
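
To make the idea concrete, here is a small sketch that only builds the
command line as an argument list. It does not use the proposed wrapper
class; the helper name is hypothetical, and it assumes the EMBOSS
qualifiers are spelled -sequence, -outfile and -aformat, plus the general
-auto qualifier to suppress prompting.

```python
def seqmatchall_args(sequence, outfile, aformat="pair"):
    """Build an argument list for seqmatchall (hypothetical helper)."""
    return [
        "seqmatchall",
        "-sequence", sequence,
        "-outfile", outfile,
        # "pair" or "simple" output is what the AlignIO "emboss"
        # parser is meant to read.
        "-aformat", aformat,
        "-auto",
    ]


args = seqmatchall_args("input.fasta", "matches.txt")
```

The list could then be handed to subprocess, and the resulting file read
with AlignIO using the "emboss" format name.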



From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 17:42:26 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 17:42:26 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912092242.nB9MgQS9028588@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #8 from chapmanb at 50mail.com  2009-12-09 17:42 EST -------
Great idea Peter -- happy to get this in. It's now on a branch here:

http://github.com/chapmanb/biopython/tree/biosql-sqlite

It would be excellent if you, Cymon or anyone else interested could review and
merge it in.

This also includes a small typo fix on Bio/SeqIO/InsdcIO.py which isn't really
related but came up when I was running the BioSQL tests.



From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 18:51:14 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 18:51:14 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912092351.nB9NpESn030303@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-09 18:51 EST -------
Hi Brad,

My only immediate comment is it might make sense to split the BioSQL tests in
two, one for SQLite which we can try and make 100% automatic (at least on
Python 2.5+), and one for a user specified back end (MySQL, PostgreSQL etc)
which requires a username and password.

It's midnight here in the UK, so feel free to tweak things this evening your
time and I'll take a full look tomorrow.

Thanks,

Peter



From bugzilla-daemon at portal.open-bio.org  Thu Dec 10 06:12:36 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Dec 2009 06:12:36 -0500
Subject: [Biopython-dev] [Bug 2969] Addition of SeqmatchallCommandline to
	Emboss/Applications.py
In-Reply-To: <bug-2969-42@http.bugzilla.open-bio.org/>
Message-ID: <200912101112.nBABCaRr015734@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2969


------- Comment #2 from lpritc at scri.sari.ac.uk  2009-12-10 06:12 EST -------
(In reply to comment #1)
> A nice easy one to wrap at first glance. I would like to also include the
> "aformat" output to set the output alignment format (useful to set to pair or
> simple for AlignIO to parse it as the "emboss" alignment format - see the
> needle and water wrappers). You could then also add a run time test to
> test_Emboss.py piping this to AlignIO... ;)

That shouldn't take too long to do (though probably won't get done by me this
week).  Do we want to set any particular policy for the sequence-associated and
outfile-associated arguments?  Their inclusion in the command-line wrappers is
pretty inconsistent, which is why I left them out in the first place.



From bugzilla-daemon at portal.open-bio.org  Thu Dec 10 06:15:09 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Dec 2009 06:15:09 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track in GenomeDiagram
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912101115.nBABF90t015907@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


------- Comment #12 from Daniel.Nicorici at gmail.com  2009-12-10 06:15 EST -------
(In reply to comment #11)
> (In reply to comment #10)
> 
> > It looks a little bit confusing to me now because I see that there are two
> > sides of the problem (or two bugs?), as follows:
> > 1) drawing a line orthogonal on y-axis at any position which represents the
> > x-axis (this does not affect how the values are plotted and in what interval)
> > 2) in the case of bar plotting (partially affects also linear plotting), the
> > values should be drawn automatically from zero (zero on y-axis, i.e. x=0 and
> > y=-inf...+inf) unless the user specifies something else and not to be drawn by
> > default from some arbitrary point, e.g. median, mean, etc., as it is done now. 
> > 
> > I have the feeling that the solution presented here affects only the point 1)
> > and not 2).
> > 
> > Please, could you elaborate more such that maybe I could implement your
> > suggestion?
> 
> I see why you've distinguished between the two cases, but I think they can be
> handled by the earlier suggestion to implement the location of the x-axis in
> the context of also allowing the user to set y-axis limits (see comment #5). 
> It's the combination of allowing y-axis limits and the location of x-axis
> crossing that gives the greatest flexibility.  For example, if y-limit
> selection and x-axis crossing point were under user control...
> 
> ...if you wanted to continue with the current behaviour, you'd not set any
> y-limits, and not specify the location of the x-axis.
> 
> ...if you wanted to draw short read coverage, you'd set the lower y-limit to 0,
> and set the location of the x-axis to zero (if that was not the default).  This
> should draw bars with their bases on the bottom/inner of the track, and the
> scale running along the bottom/inner of the track. 
> 
> ...if you wanted to represent some data as a bar graph, with a special meaning
> for the mean (or median) value, you could optionally set y-limits, but have the
> x-axis cross at mean(data) or median(data).  This should draw bars with their
> bases on the x-axis, and the axis located at the mean/median value for the
> data.


I submitted changes which do roughly what is described above. By default the
x-axis is still drawn in the middle of the track (left as is for now so as not
to change the default behavior of GenomeDiagram). If the x-axis is specified
to be drawn at the bottom or top of the track, then it is drawn there and the
bars/lines in the graph are drawn from zero (if some values are positive and
others are negative), from the minimum (if all values are positive) or from
the maximum (if all values are negative). Hence the behavior of the graph and
plotting is affected only when the x-axis is specified to be drawn at the
bottom or top of the track. The limits are computed automatically.
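
The baseline rule can be sketched as below. This is only an illustration
of the rule as stated in this comment, not the submitted GenomeDiagram
code; the function name is made up, and treating zero as neither positive
nor negative is an assumption.

```python
def graph_baseline(values):
    """Return the value from which bars/lines are drawn (sketch)."""
    if not values:
        raise ValueError("no data values given")
    lowest, highest = min(values), max(values)
    if lowest >= 0:
        # No negative values: draw from the minimum.
        return lowest
    if highest <= 0:
        # No positive values: draw from the maximum.
        return highest
    # Mixed signs: draw from zero.
    return 0
```

For example, read coverage such as [1, 5, 3] would be drawn from 1, while
data such as [-1, 2] would be drawn from zero.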

> 
> Does this help clarify what I meant, above?

It helped. Thanks!

BR,
Daniel



From biopython at maubp.freeserve.co.uk  Thu Dec 10 07:20:55 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 10 Dec 2009 12:20:55 +0000
Subject: [Biopython-dev] Removing C implementation of deprecated listfns,
	mathfns, stringfns
Message-ID: <320fb6e00912100420o74dc84efhe3af0aa278386ec8@mail.gmail.com>

Hi all,

The modules listfns, mathfns, stringfns are now all deprecated. They
all have both a C implementation and a pure Python implementation.

We could wait for the complete deprecation process, and remove
the C code when the Python code gets removed. However, I would
like to remove their C implementations for the next release, as this will
simplify our code base.

The only downside is anyone still using these modules will get
a deprecation warning and a possible slow down (as the C code
wouldn't exist any more). Also anyone using the C code directly
will be in trouble (but no-one should be doing that...).
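
For anyone unfamiliar with the pattern, a module with an optional C
implementation typically looks something like the sketch below. The names
count_items and _hypothetical_cfns are invented for illustration and are
not the contents of listfns, mathfns or stringfns.

```python
import warnings


def count_items(items):
    """Pure Python implementation (hypothetical example function)."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts


try:
    # Hypothetical C accelerator. If it is not installed, the pure
    # Python function defined above simply stays in place.
    from _hypothetical_cfns import count_items
    USING_C = True
except ImportError:
    USING_C = False


def warn_deprecated():
    """Issue the deprecation warning (a real module would do this on import)."""
    warnings.warn("This module is deprecated.", DeprecationWarning, stacklevel=2)
```

Removing the C code means the import always fails, so callers get the same
results from the Python function, possibly more slowly.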

Any comments? Objections?

Thanks,

Peter

From bugzilla-daemon at portal.open-bio.org  Thu Dec 10 07:39:15 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Dec 2009 07:39:15 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912101239.nBACdFtu018207@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #10 from chapmanb at 50mail.com  2009-12-10 07:39 EST -------
Thanks Peter. All of the tests will run on SQLite provided sqlite3 is
installed, so there is no need to split them. I enabled SQLite by default, so
they will run automatically if a user has sqlite3 and fail gracefully with a
dependency error if not.



From bugzilla-daemon at portal.open-bio.org  Thu Dec 10 07:43:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Dec 2009 07:43:28 -0500
Subject: [Biopython-dev] [Bug 2969] Addition of SeqmatchallCommandline to
	Emboss/Applications.py
In-Reply-To: <bug-2969-42@http.bugzilla.open-bio.org/>
Message-ID: <200912101243.nBAChSHg018300@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2969


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-10 07:43 EST -------
(In reply to comment #2)
> Do we want to set any particular policy for the sequence-associated and
> outfile-associated arguments?  Their inclusion in the command-line wrappers
> is pretty inconsistent, which is why I left them out in the first place.

In the long term, I'd like us to look at generating the wrappers automatically
from the EMBOSS ACD files which define their tool options. For now, since some
EMBOSS tools have so many options, they have been added on a somewhat ad-hoc
basis based on what the coder thought most important, or user feedback.

Fix checked in with addition of aformat option.

Thanks! Peter



From bugzilla-daemon at portal.open-bio.org  Thu Dec 10 07:52:16 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Dec 2009 07:52:16 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912101252.nBACqGp6018512@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-10 07:52 EST -------
(In reply to comment #10)
> Thanks Peter. All of the tests will run on SQLite provided sqlite3 is
> installed, so there is no need to split them. I enabled SQLite by default, so
> they will run automatically if a user has sqlite3 and fail gracefully with a
> dependency error if not.

That's great as is. I was thinking about something more: What I meant was, I
want to be able to run all the tests on SQLite (by default) AND on another back
end (e.g. MySQL) if the user has configured it. Otherwise we (as developers)
have to manually switch the BioSQL settings and rerun the BioSQL unit tests.

I will be able to test the effect of your changes on MySQL, hopefully Cymon
can do this on PostgreSQL - not that I anticipate any regressions, but best
to be sure ;)

Peter



From bugzilla-daemon at portal.open-bio.org  Thu Dec 10 07:56:44 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Dec 2009 07:56:44 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912101256.nBACuheQ018635@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-10 07:56 EST -------
(In reply to comment #11)
> 
> That's great as is. I was thinking about something more: What I meant was, I
> want to be able to run all the tests on SQLite (by default) AND on another
> back end (e.g. MySQL) if the user has configured it. Otherwise we (as
> developers) have to manually switch the BioSQL settings and rerun the BioSQL
> unit tests.
> 

On reflection, that kind of improvement can wait until after Biopython 1.53 is
out. It would be great to make it completely general so that if you have all
the backends installed the test suite could check on SQLite, MySQL, PostgreSQL
etc.
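
As a first step in that direction, the test suite could work out which
database drivers are importable. The sketch below is only an assumption
about how that might be done: the function name and the backend labels are
made up, and an importable driver does not mean a server is running or
configured.

```python
import importlib

# Backend label -> Python driver module (labels are hypothetical).
BACKEND_DRIVERS = {
    "sqlite": "sqlite3",
    "mysql": "MySQLdb",
    "postgresql": "psycopg2",
}


def available_backends(drivers=BACKEND_DRIVERS):
    """Return the backend labels whose driver module can be imported."""
    found = []
    for backend, module_name in sorted(drivers.items()):
        try:
            importlib.import_module(module_name)
        except ImportError:
            # Driver not installed: skip this backend.
            continue
        found.append(backend)
    return found
```

The BioSQL tests could then loop over available_backends(), still needing
a username and password for the server-based ones.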

Peter



From bugzilla-daemon at portal.open-bio.org  Thu Dec 10 08:15:45 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Dec 2009 08:15:45 -0500
Subject: [Biopython-dev] [Bug 2495] parse element symbols for ATOM/HETATM
	records (Bio.PDB.PDBParser)
In-Reply-To: <bug-2495-42@http.bugzilla.open-bio.org/>
Message-ID: <200912101315.nBADFj7O019533@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2495


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-10 08:15 EST -------
(In reply to comment #2)
> 
> Leaving bug open to deal with the output as well.
> 

Marking bug as fixed. I've just committed a change based on a patch from
Frederik Gwinner via GitHub - Bio.PDB.PDBIO should now save the element
on output.
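
For reference, the PDB format puts the element symbol right-justified in
columns 77-78 of ATOM/HETATM records. The sketch below shows that layout
only; it is not the committed PDBIO code, and the helper name is made up.

```python
def with_element(atom_line, element):
    """Return atom_line with element in columns 77-78 (sketch)."""
    if len(element) > 2:
        raise ValueError("element symbol too long: %r" % element)
    # Pad to the full 80 columns so the slices below are always valid.
    padded = atom_line.rstrip("\n").ljust(80)
    # Columns 77-78 are string indices 76 and 77; columns 79-80
    # (the charge field) are kept as they were.
    return padded[:76] + element.upper().rjust(2) + padded[78:80]


line = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67"
out = with_element(line, "N")
```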

Please reopen this bug if there is any problem.

Thanks,

Peter



From biopython at maubp.freeserve.co.uk  Thu Dec 10 09:25:53 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 10 Dec 2009 14:25:53 +0000
Subject: [Biopython-dev] Removing C implementation of deprecated listfns,
	mathfns, stringfns
In-Reply-To: <320fb6e00912100420o74dc84efhe3af0aa278386ec8@mail.gmail.com>
References: <320fb6e00912100420o74dc84efhe3af0aa278386ec8@mail.gmail.com>
Message-ID: <320fb6e00912100625s48ba290cj1234d757da0b94f@mail.gmail.com>

On Thu, Dec 10, 2009 at 12:20 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Hi all,
>
> The modules listfns, mathfns, stringfns are now all deprecated. They
> all have both a C implementation and a pure Python implementation.
>
> We could wait for the complete deprecation process, and remove
> the C code when the Python code gets removed. However, I would
> like remove their C implementations for the next release, as this will
> simplify our code base.
>
> The only downside is anyone still using these modules will get
> a deprecation warning and a possible slow down (as the C code
> wouldn't exist any more). Also anyone using the C code directly
> will be in trouble (but no-one should be doing that...).
>
> Any comments? Objections?

I hope there are no objections as I've just done this on the trunk ;)

Peter

From bugzilla-daemon at portal.open-bio.org  Thu Dec 10 09:54:17 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Dec 2009 09:54:17 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912101454.nBAEsHdi023376@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-10 09:54 EST -------
(In reply to comment #11)
> 
> I will be able to test the effect of your changes on MySQL, hopefully Cymon
> can do this on PostgreSQL - not that I anticipate any regressions, but best
> to be sure ;)
> 

The branch still merges cleanly onto the trunk (I had already manually applied
the Bio/SeqIO/InsdcIO.py date fix to the trunk). Testing "as is" on Mac OS X
10.5 with Apple's Python 2.5.2 uses SQLite, and works. Changing setup_BioSQL.py
to use MySQL also works fine :)

I have not yet tried this on Windows.



From bugzilla-daemon at portal.open-bio.org  Sat Dec 12 13:12:23 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 12 Dec 2009 13:12:23 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912121812.nBCICNWt003206@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #14 from cymon.cox at gmail.com  2009-12-12 13:12 EST -------
(In reply to comment #11)
> I will be able to test the effect of your changes on MySQL, hopefully Cymon
> can do this on PostgreSQL - not that I anticipate any regressions, but best
> to be sure ;)

Is the SQLite ":memory:" TESTDB working for you on Brad's branch?

It fails for me; all else is fine (incl. the SQLite file db).

[cymon at spiro Tests]$ python test_BioSQL_SeqIO.py
Connecting to database
Removing existing sub-database 'biosql-seqio-test' (if exists)
Traceback (most recent call last):
  File "test_BioSQL_SeqIO.py", line 134, in <module>
    if db_name in server.keys():
  File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 123, in keys
    return self.adaptor.list_biodatabase_names()
  File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 306, in list_biodatabase_names
    "SELECT name FROM biodatabase")
  File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 355, in execute_and_fetch_col0
    self.execute(sql, args or ())
  File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 336, in execute
    self.dbutils.execute(self.cursor, sql, args)
  File "/home/cymon/git/biopython-github-master/BioSQL/DBUtils.py", line 53, in execute
    cursor.execute(sql, args or ())
sqlite3.OperationalError: no such table: biodatabase


Perhaps it's my SQLite installation - I'm not familiar with it:

[cymon at spiro BioSQL]$ dpkg -l|egrep sqlite
ii  libmono-sqlite2.0-cil   2.4.2.3+dfsg-2    Mono Sqlite library (for CLI 2.0)
ii  libsqlite0              2.8.17-6build1    SQLite shared library
ii  libsqlite3-0            3.6.16-1ubuntu1   SQLite 3 shared library
ii  sqlite3                 3.6.16-1ubuntu1   A command line interface for SQLite 3

C.



From bugzilla-daemon at portal.open-bio.org  Sat Dec 12 13:33:15 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 12 Dec 2009 13:33:15 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912121833.nBCIXFCH003747@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-12 13:33 EST -------
(In reply to comment #14)
> (In reply to comment #11)
> > I will be able to test the effect of your changes on MySQL, hopefully Cymon
> > can do this on PostgreSQL - not that I anticipate any regressions, but best
> > to be sure ;)
> 
> Is SQLite ":memory:" TESTDB working for you on Brad's branch?

I didn't try that specifically - just SQLite on disk. Brad?

>
> It fails for me; all else is fine (including the SQLite file db)
>

But the good news is Brad's changes to BioSQL/*.py haven't caused any
regressions on PostgreSQL :)

Thanks Cymon,

Peter



From bugzilla-daemon at portal.open-bio.org  Sat Dec 12 13:39:07 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 12 Dec 2009 13:39:07 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912121839.nBCId7U6003831@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #16 from cymon.cox at gmail.com  2009-12-12 13:39 EST -------
(In reply to comment #15)
> (In reply to comment #14)
> > (In reply to comment #11)
> > > I will be able to test the effect of your changes on MySQL, hopefully Cymon
> > > can do this on PostgreSQL - not that I anticipate any regressions, but best
> > > to be sure ;)
> > 
> > Is SQLite ":memory:" TESTDB working for you on Brad's branch?
> 
> I didn't try that specifically - just SQLite on disk. Brad?
> 
> >
> > It fails for me; all else is fine (including the SQLite file db)
> >
> 
> But the good news is Brad's changes to BioSQL/*.py haven't caused any
> regressions on PostgreSQL :)

Yep, no problems, although I only tried the psycopg2 driver (with and without
rules deletion).

Psycopg version 1 support has had a deprecation warning since version 1.53
http://bugzilla.open-bio.org/show_bug.cgi?id=2851#c4 - when can we drop it?

C.



From bugzilla-daemon at portal.open-bio.org  Sat Dec 12 14:05:02 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 12 Dec 2009 14:05:02 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912121905.nBCJ52Nn004276@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #17 from chapmanb at 50mail.com  2009-12-12 14:05 EST -------
Thanks Cymon -- glad nothing is broken on Postgres. 

The in-memory database (:memory:) doesn't work for the tests, because they
assume a database created by previous test cases. Since the in-memory one goes
away each time its connection is closed, they get plenty of errors about
non-existent tables. It would work in theory with some test rewriting, but
it's not really necessary.
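
As an illustration only, here is a minimal sketch of that behaviour using just
Python's standard sqlite3 module. The helper name is made up for this example
and is not part of the Biopython test suite; the table name is borrowed from
the traceback reported earlier in this bug. It assumes the tests reconnect
between steps, which is why the on-disk database keeps its tables and the
":memory:" one does not.

```python
import os
import sqlite3
import tempfile


def table_visible_in_new_connection(target):
    """Create a table via one connection, then look for it via a second one.

    `target` is whatever gets passed to sqlite3.connect(). Hypothetical
    helper that only mimics tests which reconnect between steps.
    """
    first = sqlite3.connect(target)
    first.execute("CREATE TABLE biodatabase (name TEXT)")
    first.commit()
    first.close()

    second = sqlite3.connect(target)
    try:
        second.execute("SELECT name FROM biodatabase")
        return True
    except sqlite3.OperationalError:
        # Raised with the message "no such table: biodatabase"
        return False
    finally:
        second.close()


# Every connection to ":memory:" starts with its own empty database.
in_memory = table_visible_in_new_connection(":memory:")

# A database file on disk persists between connections.
with tempfile.TemporaryDirectory() as tmp:
    on_disk = table_visible_in_new_connection(os.path.join(tmp, "test.db"))

print("in memory:", in_memory, "on disk:", on_disk)
```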

Sorry, should have added a note about this. Thanks again for double checking
that everything works.



From bugzilla-daemon at portal.open-bio.org  Sat Dec 12 14:41:12 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 12 Dec 2009 14:41:12 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912121941.nBCJfCXr004756@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-12 14:41 EST -------
(In reply to comment #16)
> 
> Yep, no problems, although I only tried the psycopg2 driver (with and
> without rules deletion).
> 
> Psycopg version 1 support has had a deprecation warning since version 1.53
> http://bugzilla.open-bio.org/show_bug.cgi?id=2851#c4 - when can we drop it?
> 
> C.
> 

Minor typo - Psycopg v1 support was deprecated in Biopython 1.51 (August 2009).
In line with the current deprecation policy, we aim for two releases with the
warning (which has happened already, 1.51 and 1.52) plus at least one year -
which means we can drop Psycopg v1 in summer 2010. Given that in this case it's
a fairly simple task for someone to just install Psycopg v2, we might look at
dropping the Psycopg v1 support a little quicker (say Biopython 1.54?).

See: http://www.biopython.org/wiki/Deprecation_policy
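
For readers unfamiliar with how such a warning is raised, a rough sketch using
Python's standard warnings module follows. The function name and message text
are invented for illustration; this is not the actual BioSQL code, just the
general pattern of warning on a legacy code path while it still works.

```python
import warnings


def connect_with_legacy_driver():
    """Hypothetical stand-in for a code path kept only for an old driver."""
    warnings.warn(
        "Support for the legacy driver is deprecated and will be removed "
        "in a future release; please install the newer driver instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return "connected via legacy driver"


# Record warnings so they can be inspected instead of printed to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = connect_with_legacy_driver()

categories = [w.category for w in caught]
print(result, categories)
```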

(In reply to comment #17)
> Thanks Cymon -- glad nothing is broken on Postgres. 
> 
> The in memory database (:memory:) doesn't work for the tests, because they
> assume a database created by previous test cases. Since the memory one keeps
> going away, they will get plenty of errors about non-existing tables. It would
> work in theory with some test re-writing, but it's not too necessary.
> 
> Sorry, should have added a note about this. Thanks again for double checking
> that everything works.

OK then - Brad, would you like to merge this to the trunk now (or in the next
few days), add a note about not using :memory: in Tests/setup_BioSQL.py, and
something to the NEWS file (with a proviso about the SQLite schema not yet
being official)?

Thanks,

Peter



From bugzilla-daemon at portal.open-bio.org  Mon Dec 14 07:48:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 14 Dec 2009 07:48:28 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912141248.nBECmS6b007714@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


chapmanb at 50mail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


------- Comment #19 from chapmanb at 50mail.com  2009-12-14 07:48 EST -------
Peter and Cymon -- thanks again for the help. Merged into the main trunk and
marking this as resolved.



From biopython at maubp.freeserve.co.uk  Mon Dec 14 11:24:44 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 14 Dec 2009 16:24:44 +0000
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
Message-ID: <320fb6e00912140824x3bfa58cfy8520142c0fea3a45@mail.gmail.com>

On Mon, Dec 7, 2009 at 1:28 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> One good reason for doing Biopython 1.53 soon is the
> NCBI said they plan to start using the new Jan 2010 DTD
> files for MedLine/PubMed as early as mid December:
> http://lists.open-bio.org/pipermail/biopython-dev/2009-November/007020.html

I've just checked the PubMed XML from efetch, and the
NCBI are still using the old 2009 DTD file. I guess it is
only midday in the USA, so plenty of time for them to
make the switch on 14 Dec as announced...

Once that happens (hopefully within hours), and I've
checked the Entrez parser is still happy, we can do
the Biopython release.

Until then, only documentation and unit tests fixes
on the trunk please.

Thanks,

Peter

From biopython at maubp.freeserve.co.uk  Tue Dec 15 05:45:31 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 15 Dec 2009 10:45:31 +0000
Subject: [Biopython-dev] Code freeze for Biopython 1.53
Message-ID: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com>

Hello all,

I plan to do the Biopython 1.53 release this afternoon (in a few hours time).

If there are any last minute changes anyone wants to make on the trunk,
please email first. Ideally just documentation or additional unit tests at this
point ;)

Thanks

Peter

From biopython at maubp.freeserve.co.uk  Tue Dec 15 10:29:48 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 15 Dec 2009 15:29:48 +0000
Subject: [Biopython-dev] Code freeze for Biopython 1.53
In-Reply-To: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com>
References: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com>
Message-ID: <320fb6e00912150729g36fd5e8dp924f07c1eec0d1cb@mail.gmail.com>

On Tue, Dec 15, 2009 at 10:45 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Hello all,
>
> I plan to do the Biopython 1.53 release this afternoon (in a few hours time).
>

OK - Everything looks good on the code side, git has been tagged, source
archives and windows installers uploaded. If anyone could double check
the installers work on your machines that would be great.

Brad - could you run a sanity test before uploading to pypi?

David - did you manage to draft a release announcement? If not, don't
worry, I'll make one up ;)

Peter

From biopython at maubp.freeserve.co.uk  Tue Dec 15 11:28:13 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 15 Dec 2009 16:28:13 +0000
Subject: [Biopython-dev] Code freeze for Biopython 1.53
In-Reply-To: <320fb6e00912150729g36fd5e8dp924f07c1eec0d1cb@mail.gmail.com>
References: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com>
	<320fb6e00912150729g36fd5e8dp924f07c1eec0d1cb@mail.gmail.com>
Message-ID: <320fb6e00912150828q5d3901deq162f14db458f980d@mail.gmail.com>

On Tue, Dec 15, 2009 at 3:29 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Tue, Dec 15, 2009 at 10:45 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> Hello all,
>>
>> I plan to do the Biopython 1.53 release this afternoon (in a few hours time).
>>
>
> OK - Everything looks good on the code side, git has been tagged, source
> archives and windows installers uploaded. If anyone could double check
> the installers work on your machines that would be great.
>
> Brad - could you run a sanity test before uploading to pypi?
>
> David - did you manage to draft a release announcement? If not, don't
> worry, I'll make one up ;)

Draft text below - any comments?

Thanks,

Peter

----

We are pleased to announce the availability of Biopython 1.53, a new
stable release of the Biopython library, three months after the
release of Biopython 1.52. This is our first release since migrating
from CVS to git for source code control.

There have been some additions to our core objects - the Seq (and
related UnknownSeq) objects gained upper and lower methods (like the
string methods of the same name but alphabet aware) plus a new ungap
method. The SeqFeature object now has an extract method to get the
region of sequence it describes (useful for getting CDS nucleotide
sequences from GenBank files). Also SeqRecord objects now support
addition, giving a new SeqRecord with the combined sequence, all the
SeqFeatures, and any common annotation.

SQLite support (built into Python 2.5+) was added to our BioSQL
interface. This is still a little experimental as we are using a draft
BioSQL SQLite schema, but this should be merged into the next BioSQL
release.

Biopython now includes wrappers for the new NCBI BLAST C++ tools,
which will be replacing the old NCBI "legacy" BLAST tools written in
C. The plain text BLAST parser has been updated to cope as well.
Nevertheless, we (and the NCBI) still recommend using the XML output
for parsing.

Bio.Entrez includes the new (Jan 2010) DTD files from the NCBI for
parsing MedLine/PubMed data.

The NCBI codon tables have been updated from version 3.4 to 3.9, which
adds a few extra start codons, and a few new tables (Tables 16, 21, 22
and 23).

The restriction enzyme list in Bio.Restriction has been updated to the
Nov 2009 release of REBASE.

The Bio.PDB parser and output code has been updated to understand the
element column in ATOM and HETATM lines, and Bio.PDB.PDBList has been
updated for recent changes to the PDB FTP site.

Finally, support for running Biopython under Jython (using the Java
Virtual Machine) has been much improved. Note that Jython does not
support C code, and currently Jython does not parse DTD files (needed
for the Bio.Entrez XML parser). However, most of the Biopython modules
seem fine from testing Jython 2.5.0 and 2.5.1.

Sources and Windows Installers are available from our downloads page.

Thanks to the Biopython development team and to everyone who has
reported bugs or contributed patches since our last release.


From bugzilla-daemon at portal.open-bio.org  Tue Dec 15 11:32:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 Dec 2009 11:32:28 -0500
Subject: [Biopython-dev] [Bug 2895] Bio.Restriction.Restriction_Dictionary
	Jython Error Fix+Patch
In-Reply-To: <bug-2895-42@http.bugzilla.open-bio.org/>
Message-ID: <200912151632.nBFGWS6a022173@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2895


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-15 11:32 EST -------
Fixed in Biopython 1.53, using a similar technique but complicated because
this file is generated by a separate script.



From bugzilla-daemon at portal.open-bio.org  Tue Dec 15 11:32:46 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 Dec 2009 11:32:46 -0500
Subject: [Biopython-dev] [Bug 2892] Jython MatrixInfo.py fix+patch
In-Reply-To: <bug-2892-42@http.bugzilla.open-bio.org/>
Message-ID: <200912151632.nBFGWkSA022203@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2892


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-15 11:32 EST -------
Fixed in Biopython 1.53 using a similar technique.



From bugzilla-daemon at portal.open-bio.org  Tue Dec 15 11:32:48 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 Dec 2009 11:32:48 -0500
Subject: [Biopython-dev] [Bug 2895] Bio.Restriction.Restriction_Dictionary
	Jython Error Fix+Patch
In-Reply-To: <bug-2895-42@http.bugzilla.open-bio.org/>
Message-ID: <200912151632.nBFGWm0Q022215@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2895


Bug 2895 depends on bug 2892, which changed state.

Bug 2892 Summary: Jython MatrixInfo.py fix+patch
http://bugzilla.open-bio.org/show_bug.cgi?id=2892

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED



From bugzilla-daemon at portal.open-bio.org  Tue Dec 15 11:32:51 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 Dec 2009 11:32:51 -0500
Subject: [Biopython-dev] [Bug 2893] Jython test_prosite fix+patch
In-Reply-To: <bug-2893-42@http.bugzilla.open-bio.org/>
Message-ID: <200912151632.nBFGWpCp022227@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2893


Bug 2893 depends on bug 2892, which changed state.

Bug 2892 Summary: Jython MatrixInfo.py fix+patch
http://bugzilla.open-bio.org/show_bug.cgi?id=2892

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED



From bugzilla-daemon at portal.open-bio.org  Tue Dec 15 11:33:13 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 Dec 2009 11:33:13 -0500
Subject: [Biopython-dev] [Bug 2893] Jython test_prosite fix+patch
In-Reply-To: <bug-2893-42@http.bugzilla.open-bio.org/>
Message-ID: <200912151633.nBFGXD3Y022254@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2893


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-15 11:33 EST -------
Fixed in Biopython 1.53 using a similar technique



From bugzilla-daemon at portal.open-bio.org  Tue Dec 15 11:33:15 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 Dec 2009 11:33:15 -0500
Subject: [Biopython-dev] [Bug 2895] Bio.Restriction.Restriction_Dictionary
	Jython Error Fix+Patch
In-Reply-To: <bug-2895-42@http.bugzilla.open-bio.org/>
Message-ID: <200912151633.nBFGXFa7022266@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2895


Bug 2895 depends on bug 2893, which changed state.

Bug 2893 Summary: Jython test_prosite fix+patch
http://bugzilla.open-bio.org/show_bug.cgi?id=2893

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED



From bugzilla-daemon at portal.open-bio.org  Tue Dec 15 11:41:30 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 Dec 2009 11:41:30 -0500
Subject: [Biopython-dev] [Bug 2807] Clustalw return codes
In-Reply-To: <bug-2807-42@http.bugzilla.open-bio.org/>
Message-ID: <200912151641.nBFGfUpS022532@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2807


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-15 11:41 EST -------
Bio.Clustalw was declared obsolete in Release 1.52, so there is no reason to
add better support for return codes. With the new alignment wrappers and
subprocess this is a non-issue.

Marking as "won't fix".
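
As a rough sketch of why the subprocess module makes this straightforward, the
example below runs a child process and reads its return code. The child here
is just the Python interpreter exiting with status 3, standing in for a real
alignment tool; with a command line wrapper one would pass the tool's own
command instead.

```python
import subprocess
import sys

# Stand-in child process that exits with status 3; a real alignment tool
# would be launched the same way using its own command line.
cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]

child = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = child.communicate()  # wait for the child to finish
return_code = child.returncode

if return_code != 0:
    print("Tool failed with return code %i" % return_code)
```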



From bugzilla-daemon at portal.open-bio.org  Tue Dec 15 11:46:17 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 Dec 2009 11:46:17 -0500
Subject: [Biopython-dev] [Bug 2820] Convert test_PDB.py to unittest
In-Reply-To: <bug-2820-42@http.bugzilla.open-bio.org/>
Message-ID: <200912151646.nBFGkHAG022705@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2820


------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-15 11:46 EST -------
(In reply to comment #1)
> 
> I've checked in a slightly modified version as test_PDB_unit.py - I think
> having both this and the original test_PDB.py is sensible in the short term.
> 

I've just removed the old print-and-compare test_PDB.py, then renamed
test_PDB_unit.py to test_PDB.py



From biopython at maubp.freeserve.co.uk  Tue Dec 15 12:01:38 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 15 Dec 2009 17:01:38 +0000
Subject: [Biopython-dev] Biopython 1.53 released
Message-ID: <320fb6e00912150901k138ae04bmc5d5af9c867340ec@mail.gmail.com>

Dear Biopythoneers,

We are pleased to announce the availability of Biopython 1.53, a new
stable release of the Biopython library, three months after the
release of Biopython 1.52. This is our first release since migrating
from CVS to git for source code control.

There have been some additions to our core objects - the Seq (and
related UnknownSeq) objects gained upper and lower methods (like the
string methods of the same name but alphabet aware) plus a new ungap
method. The SeqFeature object now has an extract method to get the
region of sequence it describes (useful for getting CDS nucleotide
sequences from GenBank files). Also SeqRecord objects now support
addition, giving a new SeqRecord with the combined sequence, all the
SeqFeatures, and any common annotation.

SQLite support (built into Python 2.5+) was added to our BioSQL
interface. This is still a little experimental as we are using a draft
BioSQL SQLite schema, but this should be merged into the next
BioSQL release.

Biopython now includes wrappers for the new NCBI BLAST C++ tools,
which will be replacing the old NCBI "legacy" BLAST tools written in
C. The plain text BLAST parser has been updated to cope as well.
Nevertheless, we (and the NCBI) still recommend using the XML output
for parsing.

Bio.Entrez includes the new (Jan 2010) DTD files from the NCBI for
parsing MedLine/PubMed data.

The NCBI codon tables have been updated from version 3.4 to 3.9, which
adds a few extra start codons, and a few new tables (Tables 16, 21, 22
and 23).

The restriction enzyme list in Bio.Restriction has been updated to the
Nov 2009 release of REBASE.

The Bio.PDB parser and output code has been updated to understand the
element column in ATOM and HETATM lines, and Bio.PDB.PDBList has been
updated for recent changes to the PDB FTP site.

Finally, support for running Biopython under Jython (using the Java
Virtual Machine) has been much improved. Note that Jython does not
support C code, and currently Jython does not parse DTD files (needed
for the Bio.Entrez XML parser). However, most of the Biopython modules
seem fine from testing Jython 2.5.0 and 2.5.1.

Sources and Windows Installers are available from our downloads page.

Thanks to the Biopython development team and to everyone who has
reported bugs or contributed patches since our last release.

--Peter, on behalf of the Biopython developers

P.S. This news post is online at
http://news.open-bio.org/news/2009/12/biopython-release-153/

You may wish to subscribe to our news feed.  For RSS links etc, see:
http://biopython.org/wiki/News

Biopython news is also on twitter:
http://twitter.com/biopython


From chapmanb at 50mail.com  Wed Dec 16 07:42:35 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 16 Dec 2009 07:42:35 -0500
Subject: [Biopython-dev] Code freeze for Biopython 1.53
In-Reply-To: <320fb6e00912150828q5d3901deq162f14db458f980d@mail.gmail.com>
References: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com>
	<320fb6e00912150729g36fd5e8dp924f07c1eec0d1cb@mail.gmail.com>
	<320fb6e00912150828q5d3901deq162f14db458f980d@mail.gmail.com>
Message-ID: <20091216124235.GK78379@sobchak.mgh.harvard.edu>

Hi Peter;

> >> I plan to do the Biopython 1.53 release this afternoon (in a few hours time).

Sorry I am too slow with your mails. Thanks for the hard work
getting this together. Great stuff.

> > Brad - could you run a sanity test before uploading to pypi?

Looks good to me, and uploaded to pypi.

> Draft text below - any comments?

As a thought for next time, what do you think about adding the
names of people who have worked on the items mentioned in the
release? This would give a bit more public recognition for the
contributions, especially to people who only look at the release
notes and not mailing list traffic.

Thanks again,
Brad

From biopython at maubp.freeserve.co.uk  Wed Dec 16 17:43:16 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 16 Dec 2009 22:43:16 +0000
Subject: [Biopython-dev] Code freeze for Biopython 1.53
In-Reply-To: <20091216124235.GK78379@sobchak.mgh.harvard.edu>
References: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com>
	<320fb6e00912150729g36fd5e8dp924f07c1eec0d1cb@mail.gmail.com>
	<320fb6e00912150828q5d3901deq162f14db458f980d@mail.gmail.com>
	<20091216124235.GK78379@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00912161443q30f82120of1c98b073136c3f6@mail.gmail.com>

Brad wrote:
>> Brad - could you run a sanity test before uploading to pypi?
>
> Looks good to me, and uploaded to pypi.

Great, thank you.

>> Draft text below - any comments?
>
> As a thought for next time, what do you think about adding the
> names of people who have worked on the items mentioned in the
> release? This would give a bit more public recognition for the
> contributions, especially to people who only look at the release
> notes and not mailing list traffic.

It's too late for the emails and the source code bundles, but
the nice thing about the NEWS file (in the repository) and
the OBF news server is we can update them even now.

Of course, quite where to draw the line is debatable - a simple
patch probably doesn't warrant it (or does it?), but solving a
more complex bug or adding some new functionality does.
If any existing core developers want more "recognition" we
can do that too.

For example, Kyle, would you have liked to be named with
regard to the Jython work? I almost put you in anyway,
but in the end just mentioned it on twitter:
http://twitter.com/Biopython/statuses/6502469425

Another idea to showcase new features would be for the
author(s) to prepare a (credited) blog post with some
examples (to put on our news server). I have already done
a few like this, and think it would also be a good thing in
general.

Peter

From kellrott at gmail.com  Wed Dec 16 20:39:49 2009
From: kellrott at gmail.com (Kyle Ellrott)
Date: Wed, 16 Dec 2009 17:39:49 -0800
Subject: [Biopython-dev] zxJDBC support for BioSQL
Message-ID: <bb02be080912161739k69e63916rbb488a6d6f35948d@mail.gmail.com>

I've pushed a patch to the BioSQL code that enables zxJDBC support.
This means that Jython can now run BioSQL through MySQL.  (SQLite hasn't
been ported to Java yet)
zxJDBC is a Jython module included in the standard distribution that
provides a Python DB-API interface through the Java SQL interfaces.  I've only
run the unit tests using the mysql-connector, but it should theoretically
work with Oracle as well.
The biggest issues for changing code:
 - Java expects ? instead of %s, so SQL strings have to be altered (I
override the execute method in the DBUtils to run a regular expression before
execution)
 - A SQL string with a=? works, one with a='?' does not (Loader.py had some
examples of this)
 - Java returns unicode, not strings (recent patch to the mainline fixes
this)
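
The first two points can be sketched in plain Python. This is only an
illustration under assumptions, not the actual patch: to_qmark is a made-up
helper, the regular expression is deliberately naive (it assumes %s only ever
appears as a placeholder), and the standard sqlite3 module is used as a
stand-in because it also expects ? placeholders.

```python
import re
import sqlite3


def to_qmark(sql):
    """Rewrite %s placeholders as ? (hypothetical, naive helper)."""
    return re.sub(r"%s", "?", sql)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE biodatabase (name TEXT)")
conn.execute(to_qmark("INSERT INTO biodatabase (name) VALUES (%s)"), ("test",))

# Unquoted placeholder: the value is bound as a parameter.
rows = conn.execute(
    to_qmark("SELECT name FROM biodatabase WHERE name=%s"), ("test",)
).fetchall()

# Quoted placeholder: '?' is now an ordinary one-character string literal,
# so there is nothing left to bind the supplied value to.
try:
    conn.execute(
        to_qmark("SELECT name FROM biodatabase WHERE name='%s'"), ("test",)
    )
    quoted_ok = True
except sqlite3.ProgrammingError:
    quoted_ok = False

conn.close()
print(rows, quoted_ok)
```

Removing the quotes from the SQL strings themselves, rather than trying to
handle them in the regular expression, keeps the same statement usable across
drivers.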

Code can be found at http://github.com/kellrott/biopython

Kyle

From biopython at maubp.freeserve.co.uk  Thu Dec 17 05:46:37 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 17 Dec 2009 10:46:37 +0000
Subject: [Biopython-dev] zxJDBC support for BioSQL
In-Reply-To: <bb02be080912161739k69e63916rbb488a6d6f35948d@mail.gmail.com>
References: <bb02be080912161739k69e63916rbb488a6d6f35948d@mail.gmail.com>
Message-ID: <320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com>

On Thu, Dec 17, 2009 at 1:39 AM, Kyle Ellrott <kellrott at gmail.com> wrote:
>
> I've pushed a patch to the BioSQL code that enables zxJDBC support.
> This means that Jython can now run BioSQL through MySQL.  (SQLite hasn't
> been ported to Java yet)
> zxJDBC is a Jython module included in the standard distribution that
> provides a Python DB-API interface through the Java SQL interfaces.  I've only
> run the unit tests using the mysql-connector, but it should theoretically
> work with Oracle as well.

Sounds good, and ought to work on PostgreSQL too in theory.

I should be able to test it on MySQL.

> The biggest issues for changing code:
>  - Java expects ? instead of %s, so sql strings have to be altered (I
> override the execute method in the DBUtils to run a regular expression
> before execution)
>  - A Sql string with a=? works, one with a='?' does not (Loader.py had some
> examples of this)
>  - Java returns unicode, not strings (recent patch to the mainline fixes
> this)

Some of those issues applied to SQLite (hence the changes on the
trunk from Brad).

> Code can be found at http://github.com/kellrott/biopython

Lovely. That's on your jython branch (along with lots of your other work)?

Peter


From biopython at maubp.freeserve.co.uk  Thu Dec 17 08:31:30 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 17 Dec 2009 13:31:30 +0000
Subject: [Biopython-dev] zxJDBC support for BioSQL
In-Reply-To: <320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com>
References: <bb02be080912161739k69e63916rbb488a6d6f35948d@mail.gmail.com>
	<320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com>
Message-ID: <320fb6e00912170531j3f9c9b38n123e0464fa536e45@mail.gmail.com>

On Thu, Dec 17, 2009 at 10:46 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Thu, Dec 17, 2009 at 1:39 AM, Kyle Ellrott <kellrott at gmail.com> wrote:
>>
>> I've pushed a patch to the BioSQL code that enables zxJDBC support.
>> This means that Jython can now run BioSQL through mysql.  (SQLite hasn't
>> been ported to Java yet)
>> zxJDBC is a Jython module included in the standard distribution that
>> provides a PythonDB interface through the java sql interfaces.  I've only
>> run the unit tests using the mysql-connector, but it should theoretically
>> work with Oracle as well.
>
> Sounds good, and ought to work on PostgreSQL too in theory.
>
> I should be able to test it on MySQL.

I worked out I needed to install MySQL Connector/J  so that
org.gjt.mm.mysql.Driver works in Jython, get it from here:
http://dev.mysql.com/downloads/connector/j/

Installation seems to be just unzipping this and updating your
CLASSPATH environment variable to point at the jar file.

Peter


From biopython at maubp.freeserve.co.uk  Thu Dec 17 09:54:13 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 17 Dec 2009 14:54:13 +0000
Subject: [Biopython-dev] zxJDBC support for BioSQL
In-Reply-To: <320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com>
References: <bb02be080912161739k69e63916rbb488a6d6f35948d@mail.gmail.com>
	<320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com>
Message-ID: <320fb6e00912170654g41bc8c4eyce0f56b4472076f9@mail.gmail.com>

On Thu, Dec 17, 2009 at 10:46 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Thu, Dec 17, 2009 at 1:39 AM, Kyle Ellrott <kellrott at gmail.com> wrote:
>>
>> I've pushed a patch to the BioSQL code that enables zxJDBC support.
>> This means that Jython can now run BioSQL through mysql.  (SQLite hasn't
>> been ported to Java yet)

Maybe one day Jython will have a Python sqlite3 like library built in:
http://bugs.jython.org/issue1682864

For now it looks like we could probably use SQLite via zxJDBC instead
(see links on that Jython issue).

Peter


From kellrott at gmail.com  Thu Dec 17 13:03:38 2009
From: kellrott at gmail.com (Kyle Ellrott)
Date: Thu, 17 Dec 2009 10:03:38 -0800
Subject: [Biopython-dev] zxJDBC support for BioSQL
In-Reply-To: <320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com>
References: <bb02be080912161739k69e63916rbb488a6d6f35948d@mail.gmail.com>
	<320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com>
Message-ID: <bb02be080912171003n58ba38dej8a9aeed15a289223@mail.gmail.com>

> > Code can be found at http://github.com/kellrott/biopython
>
> Lovely. That's on your jython branch (along with lots of your other work)?
>

Yes, but all of the zxJDBC work has been done in the past 2 weeks (just the
last three commits), so it should be easy to cherry-pick out the relevant
patches.

Kyle

From mhampton at d.umn.edu  Thu Dec 17 13:42:33 2009
From: mhampton at d.umn.edu (Marshall Hampton)
Date: Thu, 17 Dec 2009 12:42:33 -0600 (CST)
Subject: [Biopython-dev] code credits
In-Reply-To: <mailman.9.1261069205.19597.biopython-dev@lists.open-bio.org>
References: <mailman.9.1261069205.19597.biopython-dev@lists.open-bio.org>
Message-ID: <Pine.SOC.4.64.0912171237220.16381@ub.d.umn.edu>


I strongly encourage you to list anyone who has contributed a patch, no 
matter how small.  This has worked very well for the Sage project 
(www.sagemath.org) where credit is given to all contributors and reviewers 
(every patch must be reviewed by at least one other person).  For example 
see:

http://groups.google.com/group/sage-announce/msg/bcf5591837068b5f

Marshall Hampton
Department of Mathematics and Statistics
University of Minnesota, Duluth

> Message: 1
> Date: Wed, 16 Dec 2009 22:43:16 +0000
> From: Peter <biopython at maubp.freeserve.co.uk>
> Subject: Re: [Biopython-dev] Code freeze for Biopython 1.53
> To: Brad Chapman <chapmanb at 50mail.com>, biopython-dev at biopython.org
> Message-ID:
> 	<320fb6e00912161443q30f82120of1c98b073136c3f6 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Brad wrote:
>>> Brad - could you run a sanity test before uploading to pypi?
>>
>> Looks good to me, and uploaded to pypi.
>
> Great, thank you.
>
>>> Draft text below - any comments?
>>
>> As a thought for next time, what do you think about adding the
>> names of people who have worked on the items mentioned in the
>> release? This would give a bit more public recognition for the
>> contributions, especially to people who only look at the release
>> notes and not mailing list traffic.
>
> Its too late for the emails and the source code bundles, but
> the nice thing about the NEWS file (in the repository) and
> the OBF news server is we can update them even now.
>
> Of course, quite where to draw the line is debatable - a simple
> patch probably doesn't warrant it (or does it?), but solving a
> more complex bug or adding some new functionality does.
> If any existing core developers want more "recognition" we
> can do that too.
>
> For example, Kyle, would you have like to be named with
> regards to the Jython work? I almost put you in anyway,
> but in the end just mentioned it on twitter:
> http://twitter.com/Biopython/statuses/6502469425
>
> Another idea to showcase new features would be for the
> author(s) to prepare a (credited) blog post with some
> examples (to put on our news server). I have already done
> a few like this, and think it would also be a good thing in
> general.
>
> Peter

From kellrott at gmail.com  Thu Dec 17 16:20:10 2009
From: kellrott at gmail.com (Kyle Ellrott)
Date: Thu, 17 Dec 2009 13:20:10 -0800
Subject: [Biopython-dev] code credits
In-Reply-To: <Pine.SOC.4.64.0912171237220.16381@ub.d.umn.edu>
References: <mailman.9.1261069205.19597.biopython-dev@lists.open-bio.org>
	<Pine.SOC.4.64.0912171237220.16381@ub.d.umn.edu>
Message-ID: <bb02be080912171320u480fe461r1f517970f08e091b@mail.gmail.com>

I would agree with that.  Drawing from broad stereotypes, I would think that
a majority of contributors are academic and would be most interested in
adding things to their CV.  So acknowledgment would be of great value to
them at no real cost to the Biopython project.  Plus there's the old idea
that the more authors a paper has the more important it must be.

Kyle

> I strongly encourage you to list anyone who has contributed a patch, no
> matter how small.  This has worked very well for the Sage project (
> www.sagemath.org) where credit is given to all contributors and reviewers
> (every patch must be reviewed by at least one other person).  For example
> see:
>
> http://groups.google.com/group/sage-announce/msg/bcf5591837068b5f
>
> Marshall Hampton
> Department of Mathematics and Statistics
> University of Minnesota, Duluth
>
>

From tallpaulinjax at yahoo.com  Thu Dec 17 16:48:25 2009
From: tallpaulinjax at yahoo.com (Paul B)
Date: Thu, 17 Dec 2009 13:48:25 -0800 (PST)
Subject: [Biopython-dev] code credits
In-Reply-To: <bb02be080912171320u480fe461r1f517970f08e091b@mail.gmail.com>
Message-ID: <928490.72367.qm@web30708.mail.mud.yahoo.com>

I also agree completely. Adding value to the code deserves some form of credit, if desired by the contributor. I fixed a bit of code in a couple of the modules and received no credit... that made me a good bit less gung ho about contributing more.

--- On Thu, 12/17/09, Kyle Ellrott <kellrott at gmail.com> wrote:

From: Kyle Ellrott <kellrott at gmail.com>
Subject: Re: [Biopython-dev] code credits
To: "Marshall Hampton" <mhampton at d.umn.edu>
Cc: biopython-dev at lists.open-bio.org
Date: Thursday, December 17, 2009, 4:20 PM


I would agree with that.  Drawing from broad stereotypes, I would think that
a majority of contributors are academic and would be most interested in
adding things to their CV.  So acknowledgment would be of great value to
them at no real cost to the Biopython project.  Plus there's the old idea
that the more authors a paper has the more important it must be.

Kyle

> I strongly encourage you to list anyone who has contributed a patch, no
> matter how small.  This has worked very well for the Sage project (
> www.sagemath.org) where credit is given to all contributors and reviewers
> (every patch must be reviewed by at least one other person).  For example
> see:
>
> http://groups.google.com/group/sage-announce/msg/bcf5591837068b5f
>
> Marshall Hampton
> Department of Mathematics and Statistics
> University of Minnesota, Duluth
>
>


From biopython at maubp.freeserve.co.uk  Thu Dec 17 17:54:40 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 17 Dec 2009 22:54:40 +0000
Subject: [Biopython-dev] code credits
In-Reply-To: <928490.72367.qm@web30708.mail.mud.yahoo.com>
References: <bb02be080912171320u480fe461r1f517970f08e091b@mail.gmail.com>
	<928490.72367.qm@web30708.mail.mud.yahoo.com>
Message-ID: <320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com>

Hi all,

Marshall Hampton's description of how they do it on Sage
sounds worth trying - if we keep track as things are checked
in, it won't be too much work either. Do you (sage) have a
list of guidelines for what qualifies for a credit?

On Thu, Dec 17, 2009 at 9:48 PM, Paul B <tallpaulinjax at yahoo.com> wrote:
>
> I also agree completely. Adding value to the code deserves
> some form of credit, if desired by the contributor. I fixed a bit
> of code in a couple of the modules and received no credit...
> that made me a good bit less gung ho about contributing more.
>

Sorry :(  You didn't get no credit at all though, you were
named in the commit:
http://github.com/biopython/biopython/commit/225fb0eb92c99018c3710c3ec5ac0b22e9706208

Also people who offer changes via github that can be
merged cleanly onto the trunk, or cherry-picked would
also automatically get a credit in the repository history.

Would someone like to go through the git log for Biopython
1.53 for a full list? e.g. Hongbo Zhu and Frederik Gwinner
contributed to a PDB enhancement (Bug 2495), and as he
pointed out, so did Paul B (again, PDB stuff). These were
the "border line" cases I had in mind here:
http://lists.open-bio.org/pipermail/biopython-dev/2009-December/007161.html

In my personal experience contributing to other open
source projects, getting a credit in release notes even for
a small bug fix/enhancement as in sage is rare. So while
I thought I was following OS norms in writing the last
release notes, we can certainly do this differently in
future.

Regards,

Peter

From mhampton at d.umn.edu  Thu Dec 17 20:54:00 2009
From: mhampton at d.umn.edu (Marshall Hampton)
Date: Thu, 17 Dec 2009 19:54:00 -0600 (CST)
Subject: [Biopython-dev] code credits
In-Reply-To: <320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com>
References: <bb02be080912171320u480fe461r1f517970f08e091b@mail.gmail.com> 
	<928490.72367.qm@web30708.mail.mud.yahoo.com>
	<320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com>
Message-ID: <Pine.SOC.4.64.0912171946120.13591@ub.d.umn.edu>


On Thu, 17 Dec 2009, Peter wrote:
> Marshall Hampton's description of how they do it on Sage
> sounds worth trying - if we keep track as things are checked
> in, it won't be too much work either. Do you (sage) have a
> list of guidelines for what qualifies for a credit?

I don't think we have formal guidelines, but the process is pretty simple. 
Whoever works on a patch in our bug/feature tracker has to flag it for 
review.  Both the person who implements the patch and the reviewer get 
credit.  It doesn't matter if it's a 1-character change to the 
documentation, they're listed in the release notes.  Basically, the idea 
is to err (if that's the right word) on the side of acknowledging any 
contribution.  I think that Sage (really William Stein initially) adopting 
that philosophy is one of the reasons it's gone from 1 to 150 or so 
developers.

I'm cc'ing sage-devel in case anyone there wants to comment on this.

Cheers,

Marshall Hampton
Department of Mathematics and Statistics
University of Minnesota, Duluth

From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 04:44:02 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 04:44:02 -0500
Subject: [Biopython-dev] [Bug 2938] Bio.Entrez.read() returns empty string
	for HTML (not an error)
In-Reply-To: <bug-2938-42@http.bugzilla.open-bio.org/>
Message-ID: <200912180944.nBI9i22n007947@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2938


------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp  2009-12-18 04:44 EST -------
The offending XML file (the one that does not start with <?xml) is created by
efetch from the journals database. Upon reading the EUtils documentation more
carefully, it seems that XML output from the journals database is not
officially supported; only text and html output are supported. One option is to
simply remove the offending XML file from the tests, and raise an error
whenever Entrez.read is presented with data that do not start with <?xml.
Additionally, we could add a parser for the text output generated by efetch
from the journals database.
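
A minimal sketch of such a check, assuming the data has already been read into
a string; the function name require_xml_prolog is made up for illustration,
tolerating leading whitespace is an extra assumption, and this is not the
current Bio.Entrez code:

```python
def require_xml_prolog(data):
    """Return data unchanged, or raise ValueError if it is not XML.

    "Not XML" here simply means the text does not begin with an XML
    declaration (<?xml ...), ignoring any leading whitespace.
    """
    if not data.lstrip().startswith("<?xml"):
        raise ValueError(
            "Expected XML starting with '<?xml', got %r..." % data[:40]
        )
    return data
```

An HTML error page or the plain-text journals output would then fail loudly
instead of being silently parsed into an empty result.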



From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 04:46:45 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 04:46:45 -0500
Subject: [Biopython-dev] [Bug 2943] MMCIFParser only handling a single model.
In-Reply-To: <bug-2943-42@http.bugzilla.open-bio.org/>
Message-ID: <200912180946.nBI9kjFA008009@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2943


------- Comment #11 from mdehoon at ims.u-tokyo.ac.jp  2009-12-18 04:46 EST -------
Peter, are you still looking at this bug report?
Otherwise I could have a look at it.



From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 05:00:50 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 05:00:50 -0500
Subject: [Biopython-dev] [Bug 2698] Attempt at a unit test for MaxEntrophy
In-Reply-To: <bug-2698-42@http.bugzilla.open-bio.org/>
Message-ID: <200912181000.nBIA0opL008316@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2698


------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp  2009-12-18 05:00 EST -------
Thanks for your test!

I would like to simplify the code a bit though.
How about replacing

ix, iy= expand_count([0, 0, 1],'C', 40)
xm.extend(ix)
ym.extend(iy)

by

xm.extend([0,0,1] * 40)
ym.extend(['C'] * 40)

Or, you could replace this whole section by
xm = [0,0,1]*40 + [0,0,1]*60 + [0,1,0]*75 + [0,1,0]*25 + [1,0,0]*90 +
[1,0,0]*10

and similarly for ym.
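
One thing to watch with list repetition, shown below as a small sketch: since
expand_count is not shown in this thread, which of these shapes the test
actually needs is an assumption to check against the original helper.

```python
# Repeating a flat list concatenates it: 3 * 40 = 120 integers.
flat = [0, 0, 1] * 40

# Repeating a list that contains one vector gives 40 observations,
# but all 40 entries are references to the same inner list.
shared = [[0, 0, 1]] * 40

# A list comprehension gives 40 independent copies of the vector.
independent = [[0, 0, 1] for _ in range(40)]

# Strings are immutable, so simple repetition is fine for the labels.
labels = ["C"] * 40
```

If each observation is meant to be a three-element vector paired with one
label, the second or third form keeps len(xm) equal to len(ym).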



From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 05:08:24 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 05:08:24 -0500
Subject: [Biopython-dev] [Bug 2693] LogisticRegression convergence criterion
	is too lenient
In-Reply-To: <bug-2693-42@http.bugzilla.open-bio.org/>
Message-ID: <200912181008.nBIA8Ogf008537@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2693


------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp  2009-12-18 05:08 EST -------
Sorry for not getting back to this bug report earlier.

(In reply to comment #3)
> > Also, it is not necessary to pass old_llik to update_fn; if needed, update_fn
> > can store the value of llik on each call.
> 
> I guess this is all how you define the purpose of the update_fn function.
> 
Do you have an example of the update_fn function where old_llik is needed?
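
For reference, a callback that keeps the previous value itself could look like
the sketch below. It assumes update_fn is called as update_fn(iteration, llik);
the class name LlikTracker is made up for illustration.

```python
class LlikTracker(object):
    """Callable that remembers the log-likelihood from the previous call."""

    def __init__(self):
        self.old_llik = None  # nothing seen before the first call
        self.deltas = []      # change in llik between consecutive calls

    def __call__(self, iteration, llik):
        if self.old_llik is not None:
            self.deltas.append(llik - self.old_llik)
        self.old_llik = llik


tracker = LlikTracker()
# Simulate three iterations of a training loop calling the callback.
for iteration, llik in enumerate([-10.0, -4.0, -3.5]):
    tracker(iteration, llik)
```

An instance would then be passed wherever a plain function is accepted as
update_fn, with no need for the training loop to supply old_llik.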



From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 05:17:12 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 05:17:12 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200912181017.nBIAHCJN008837@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


------- Comment #12 from mdehoon at ims.u-tokyo.ac.jp  2009-12-18 05:17 EST -------
One option is to store these variables inside the function. As an example, if
this is a module mymodule.py:

def f(x=None):
    if x is None:
        # fall back to the default stored on the function object
        x = f.x
    print(x)

f.x = 3

then we can do the following:

>>> import mymodule
>>> mymodule.f()
3
>>> mymodule.f(5)
5
>>> mymodule.f.x = 9
>>> mymodule.f(5)
5
>>> mymodule.f()
9
>>> 

But personally, I think that having module-level defaults is not really
necessary. We typically don't have that for other functions, and the only
reason for having them here is that once upon a time this module had such
module-level defaults.



From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 05:24:35 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 05:24:35 -0500
Subject: [Biopython-dev] [Bug 2943] MMCIFParser only handling a single model.
In-Reply-To: <bug-2943-42@http.bugzilla.open-bio.org/>
Message-ID: <200912181024.nBIAOZj6009054@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2943


------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-18 05:24 EST -------
(In reply to comment #11)
> Peter, are you still looking at this bug report?
> Otherwise I could have a look at it.

Thanks Michiel - Please feel free. I didn't feel we had time to get this into
Biopython 1.53, as I think it is going to be a lot of work to assess, but needs
to be done.

I think there are two issues here, poor support for multiple models, and
re-writing the flex parser in pure python. Given time (!) I would want to take
Paul's python parser and use it to replace the flex code (which is currently
not compiled or installed by default, Bug 2619) and verify it is backwards
compatible, and then add in the model support. If we have enough test coverage
already, then doing it in one go might be OK. Up to you.

Other relevant issues include Bug 2626 (files the current parser can't read -
it may turn out that these are also multi-model CIF files).

Also regarding the model support, for PDB files we currently index them
0,1,2,... as found in the file. There are also names given in the PDB file
itself, which need not be consecutive etc. See Bug 2950 and Bug 2951 for this.

Thanks,

Peter



From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 05:44:13 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 05:44:13 -0500
Subject: [Biopython-dev] [Bug 2938] Bio.Entrez.read() returns empty string
	for HTML (not an error)
In-Reply-To: <bug-2938-42@http.bugzilla.open-bio.org/>
Message-ID: <200912181044.nBIAiDD6009554@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2938


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-18 05:44 EST -------
(In reply to comment #6)
> The offending XML file (the one that does not start with <?xml) is created by
> efetch from the journals database. Upon reading the EUtils documentation more
> carefully, it seems that XML output from the journals database is not
> officially supported; only text and html output are supported. One option is
> to simply remove the offending XML file from the tests, and raise an error
> whenever Entrez.read is presented with data that do not start with <?xml.
> Additionally, we could add a parser for the text output generated by efetch
> from the journals database.

Hmm - sounds like a plan, but maybe drop the Entrez team a query about this.
Does the current funny XML file have anything useful in it?

Peter



From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 05:50:03 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 05:50:03 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200912181050.nBIAo39q009740@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-18 05:50 EST -------
(In reply to comment #12)
> 
> But personally, I think that having module-level defaults is not really
> necessary. We typically don't have that for other functions, and the only
> reason for having them here is that once upon a time this module had such
> module-level defaults.

I agree the module-level defaults are not necessary - but it would be "nice"
to have a transition where both can be used. In reality, I may be being overly
cautious - I doubt it would affect many (any?) users to just make a clean switch
(which would keep the code simple). I'm happy to leave this to your judgement
Michiel.

Peter



From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 05:54:24 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 05:54:24 -0500
Subject: [Biopython-dev] [Bug 2947] Bio.HMM calculates wrong viterbi path
In-Reply-To: <bug-2947-42@http.bugzilla.open-bio.org/>
Message-ID: <200912181054.nBIAsOIw009914@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2947


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-18 05:54 EST -------
(In reply to comment #0)
> 
> Thus it appears to me that the viterbi algorithm is not robust enough
> and biased towards the last letter of the state alphabet.

Quite possibly. Might there be a bug in our code, or do you think this
is just an inherent algorithm limitation?
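
One way to tell the two apart would be to compare against an independent
implementation. Below is a textbook Viterbi sketch in log space; it is not the
Bio.HMM code, and the dictionary-based arguments are just a convenient
assumption for the example.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log probability, most likely state path)."""

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[s]: log probability of the best path so far that ends in state s
    best = dict(
        (s, log(start_p[s]) + log(emit_p[s][observations[0]])) for s in states
    )
    back = []  # one dict per later step: state -> best predecessor
    for obs in observations[1:]:
        new_best = {}
        pointers = {}
        for s in states:
            prev = max(states, key=lambda r: best[r] + log(trans_p[r][s]))
            new_best[s] = best[prev] + log(trans_p[prev][s]) + log(emit_p[s][obs])
            pointers[s] = prev
        best = new_best
        back.append(pointers)

    # Trace back from the best final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path


states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
log_prob, path = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
```

When two states tie exactly, max() picks whichever comes first in the states
list, so the order of the state alphabet should only matter in that case.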

Peter



From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 06:53:14 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 06:53:14 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200912181153.nBIBrEQi011286@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #14 from mdehoon at ims.u-tokyo.ac.jp  2009-12-18 06:53 EST -------
Fixed in github.



From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 09:12:26 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 09:12:26 -0500
Subject: [Biopython-dev] [Bug 2947] Bio.HMM calculates wrong viterbi path
In-Reply-To: <bug-2947-42@http.bugzilla.open-bio.org/>
Message-ID: <200912181412.nBIECQ59014801@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2947


------- Comment #2 from georg.lipps at fhnw.ch  2009-12-18 09:12 EST -------
Hi Peter,

I am not an expert on the Viterbi algorithm. But as such the algorithm does not
do what it is expected to do. So I guess it is indeed an error in the
implementation.

I would be very happy if it can be fixed.

Greetings,

Georg



From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 11:15:24 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 11:15:24 -0500
Subject: [Biopython-dev] [Bug 2943] MMCIFParser only handling a single model.
In-Reply-To: <bug-2943-42@http.bugzilla.open-bio.org/>
Message-ID: <200912181615.nBIGFOD2017597@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2943


------- Comment #13 from TallPaulInJax at yahoo.com  2009-12-18 11:15 EST -------
Michiel, if you have any questions please feel free to contact me!



From rjalves at igc.gulbenkian.pt  Fri Dec 18 18:39:28 2009
From: rjalves at igc.gulbenkian.pt (Renato Alves)
Date: Fri, 18 Dec 2009 23:39:28 +0000
Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions
In-Reply-To: <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com>
References: <4B2BB938.5030709@igc.gulbenkian.pt>
	<320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com>
Message-ID: <4B2C12B0.9060806@igc.gulbenkian.pt>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sorry to take this to the discussion list, took a bit longer than I
expected to get the approval.

Bringing now the subject to the right place. Leaving full quote history
to help the reading.

Quoting Peter on 12/18/2009 09:39 PM:
> Hi Renato,
> 
> I'm cooking dinner while writing this, so it won't be as in depth as
> usual...
> 
> On Fri, Dec 18, 2009 at 5:17 PM, Renato Alves <rjalves at igc.gulbenkian.pt> wrote:
>> [I tried submitting this message to the dev mailing list, but got
>> rejected since I'm not yet authorized to post there, so here it goes]
> 
> Have you definitely subscribed to the dev list? That should be all that
> is required to post there, and this discussion would be better suited
> there.
> 
>> Hi everyone,
>>
>> I'm working on changes to the Bio.SeqIO.index() function to make it more
>> consistent with the .read and .parse i.e. accept a filehandle instead of
>> a filename and also to include a way to cache the index into a file to
>> speed up the process.
>>
>> The reason why we are implementing these two is because we were going to
>> implement our own index solution until we realized this was added to 1.52.
>>
>> However the implementation in 1.52 has a few limitations.
> 
> Yes, this was designed to cover basic use cases in a general way,
> but with the option in future to do other things - and in particular
> saving the index to disk was kept in mind.
> 
>> One limitation is that we are using a gzipped database for the sake of
>> space and using gzip.open() to create the file-handle that would then be
>> passed to .parse(). The same was not doable with .index().
>> This is already implemented in
>> http://github.com/Unode/biopython/commit/6fc390151452e3ddf26a117269132125a3ffb3fe
> 
> That was a deliberate choice in that the index code wants to "own"
> the handle. If other code has access to the handle, there is a risk
> of different bits of code moving the handle pointer etc. But, if you
> are careful it could be done.
The way I approached it was to reset the handle pointer to the first
position, since we would like to index the full file. But I understand
that if the user uses the same handle on different files weird results
may happen.
Something that could be a simple workaround would be to copy the
filehandle object in such a way that its properties are maintained
(like being a gzip.open() filehandle) but its use doesn't affect the
use of the original handle. However I don't know if this is possible.
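
A minimal sketch of the "reset the pointer" idea, which also puts the handle
back where the caller left it. It assumes a seekable handle opened in binary
mode and FASTA-style records; fasta_offsets is a made-up name, not part of
Bio.SeqIO.

```python
import io


def fasta_offsets(handle):
    """Map record id -> offset of its '>' line, for a seekable binary handle."""
    original = handle.tell()  # remember where the caller left the handle
    handle.seek(0)            # index the whole file from the start
    offsets = {}
    try:
        while True:
            position = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                fields = line[1:].split(None, 1)
                if fields:  # skip a bare '>' line with no identifier
                    offsets[fields[0].decode("ascii")] = position
    finally:
        handle.seek(original)  # restore the caller's position
    return offsets


handle = io.BytesIO(b">a desc\nACGT\n>b\nGG\n")
handle.read(3)  # pretend other code has already moved the pointer
offsets = fasta_offsets(handle)
```

This does not protect against other code moving the handle later, which is the
risk raised above; it only avoids surprising the caller during indexing.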

> 
> There are also issues here in combination with saving the index.
> With a filename, the code can easily reopen the file in the same
> mode. With a handle, things are more tricky. You have non-file
> handles to consider - such as the gzip example. There is also the
> problem of recording the file mode (normal text, universal text,
> or binary - which we will need for SFF files - code already written).
> 
I see. Only after your comment did I realize that handle.name and handle.mode
are only available on normal filehandles. The gzip.open() example stores
the filename in .filename while the .mode seems to have a different meaning.
> If we do change the code to allow handles, it would have to be
> to allow handles OR filenames to be compatible with Biopython
> 1.52 and 1.53 (which take just filenames). This could be handled
> as in Bio.SeqIO.convert(), which also allows both (which was the
> subject of some discussion!).
> 
I'll have to look more closely at the example and consider the fact that my
current implementation breaks compatibility with previous code and that
not everything needed (filename, mode,...) is accessible in filehandles.
>> The second is that we are going to use this feature to quick search the
>> database in a web application. Here we have the limitation that we don't
>> have persistence across web requests, which means that we would need to
>> recalculate the index on every web request.
>>
>> The details of how we plan to implement this are the following:
>>
>> cPickle the internal dictionary of offsets and save it on the database
>> folder with the same name as the database + .index. The consistency
>> check on whether the file has changed will be performed based on name
>> and timestamp. By default .index() will search for this file, check the
>> timestamp and use the cache if they match, otherwise they will be
>> recalculated. The save function will be available like:
>>
>>>>> d = SeqIO.index(...)
>>>>> d.save(filename)
>> where filename is optional and defaults to "%s.index" % _handle.name
>>
>> We already have a solution like this implemented with subclasses of
>> SeqIO._index, it's just a matter of reworking that and merge it into
>> BioPython if you consider a good addition to the code.
>>
>> I would like to hear your comments and suggestions on this.
> 
> Yes, saving indexes is an obvious addition. I have explored
> using pickle via shelve, and also SQLite - there are
> implementations of this on my github repository, plus I have
> begun to look into the existing OBF Open Biological
> Database Access (OBDA) specification for cross project
> compatibility. Other potential benefits here are reduced
> memory usage if we don't keep the dictionary
> of offsets in RAM.
I did try to use pickle directly on the dict like object that is
returned from SeqIO.index() but pickle was not happy with it. The SQLite
approach also crossed my mind and also BioSQL or just some custom SQL
database, but the RAM approach seemed good enough, at least for our
current uses. I can see though that some file formats will require a lot
more RAM depending on what is indexed and their size. In the end it came
out as cPickled dictionaries for faster access.
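
A rough sketch of the cache described above: pickle the dictionary of
offsets next to the data file as "<name>.index" and only reuse it while the
data file's timestamp is unchanged. save_offsets() and load_offsets() are
hypothetical names, not an existing Biopython API. On Python 3 the plain
pickle module already uses the C implementation that cPickle provided on
Python 2.

```python
import os
import pickle
import tempfile


def save_offsets(offsets, data_path, index_path=None):
    """Pickle the offsets plus the data file's mtime for a later check."""
    index_path = index_path or "%s.index" % data_path
    payload = {"mtime": os.path.getmtime(data_path), "offsets": offsets}
    with open(index_path, "wb") as out:
        pickle.dump(payload, out, protocol=pickle.HIGHEST_PROTOCOL)
    return index_path


def load_offsets(data_path, index_path=None):
    """Return the cached offsets, or None if the cache is missing or stale."""
    index_path = index_path or "%s.index" % data_path
    if not os.path.exists(index_path):
        return None
    with open(index_path, "rb") as handle:
        payload = pickle.load(handle)
    if payload["mtime"] != os.path.getmtime(data_path):
        return None            # data file changed since the index was saved
    return payload["offsets"]


workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "db.fasta")
with open(data_path, "w") as out:
    out.write(">alpha\nACGT\n")

save_offsets({"alpha": 0}, data_path)
cached = load_offsets(data_path)

# Simulate the database being modified after the index was written.
later = os.path.getmtime(data_path) + 10
os.utime(data_path, (later, later))
stale = load_offsets(data_path)
```

A None result would be the signal to recalculate the index. Pickle files
should only be loaded from locations you trust, since unpickling can run
arbitrary code.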
> 
> http://github.com/peterjc/biopython/tree/index-shelve
> http://github.com/peterjc/biopython/tree/index-sqlite
> 
> There is a potential complication with index sub-classes
> which do more specialised indexing (e.g. GenBank files,
> and for a more extreme case, SFF files). See:
> http://github.com/peterjc/biopython/tree/sff-seqio
For these I would have to do it on a unittest base, I'm not familiar
with the formats. Also the implementation I did was based on the current
master branch of biopython. I now realize a lot more has been done
outside of it that I should look into.
> 
> Anyway - great to see you are finding the code useful,
> and have some quite similar ideas for how to extend
> it further.
> 
> Peter
Thanks for all that info, I have a lot to dig into and see if I can
actually contribute with something. You seem to have pretty much
everything sorted ;)

Renato
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkssEqkACgkQYh11EUYTX9QWHwCeOIuuaEGA3qLvB1EHamDohpZ3
bj0AnRAkP9jOGpvTnSc0W7YgFyX/Ard/
=S45W
-----END PGP SIGNATURE-----

From biopython at maubp.freeserve.co.uk  Sat Dec 19 04:57:25 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 19 Dec 2009 09:57:25 +0000
Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions
In-Reply-To: <4B2C12B0.9060806@igc.gulbenkian.pt>
References: <4B2BB938.5030709@igc.gulbenkian.pt>
	<320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com>
	<4B2C12B0.9060806@igc.gulbenkian.pt>
Message-ID: <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com>

On Fri, Dec 18, 2009 at 11:39 PM, Renato Alves wrote:
> Sorry to take this to the discussion list, took a bit longer than I
> expected to get the approval.
>
> Bringing now the subject to the right place. Leaving full quote history
> to help the reading.

Thanks.

>> That was a deliberate choice in that the index code wants to "own"
>> the handle. If other code has access to the handle, there is a risk
>> of different bits of code moving the handle pointer etc. But, if you
>> are careful it could be done.
>
> The way I approached it was to reset the handle pointer to the first
> position, since we would like to index the full file. But I understand
> that if the user uses the same handle on different files weird results
> may happen.

OK

> Something that could be a simple workaround would be to copy the
> filehandle object in such a way that its properties are maintained
> (like being a gzip.open() filehandle) but its use doesn't affect the
> use of the original handle. However I don't know if this is possible.

That may work for some handles but not others. Worth trying.

>> There are also issues here in combination with saving the index.
>> With a filename, the code can easily reopen the file in the same
>> mode. With a handle, things are more tricky. You have non-file
>> handles to consider - such as the gzip example. There is also the
>> problem of recording the file mode (normal text, universal text,
>> or binary - which we will need for SFF files - code already written).
>
> I see, only after your comment I realized handle.name and handle.mode
> are only available in normal filehandles. The gzip.open() example stores
> the filename in .filename while the .mode seems to have a different
> meaning.

That would make finding out the filename from a handle tricky.
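
One defensive way to look for a filename, sketched under the assumption
that a handle exposes it as either .name or .filename (the two attributes
mentioned above). handle_filename() is a hypothetical helper. Which
attributes exist depends on the handle type and the Python version, so the
function gives up with None rather than guessing, and it deliberately
ignores .mode because its meaning differs between handle types.

```python
import gzip
import io
import os
import tempfile


def handle_filename(handle):
    """Return the handle's filename if it exposes one as a string, else None."""
    for attribute in ("name", "filename"):
        value = getattr(handle, attribute, None)
        if isinstance(value, str) and value:
            return value
    return None                # e.g. in-memory handles have no filename


workdir = tempfile.mkdtemp()
plain_path = os.path.join(workdir, "example.fasta")
gz_path = os.path.join(workdir, "example.fasta.gz")
with open(plain_path, "wb") as out:
    out.write(b">alpha\nACGT\n")
with gzip.open(gz_path, "wb") as out:
    out.write(b">alpha\nACGT\n")

with open(plain_path, "rb") as plain, gzip.open(gz_path, "rb") as packed:
    plain_name = handle_filename(plain)
    packed_name = handle_filename(packed)
memory_name = handle_filename(io.BytesIO(b">alpha\nACGT\n"))
```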

>> If we do change the code to allow handles, it would have to be
>> to allow handles OR filenames to be compatible with Biopython
>> 1.52 and 1.53 (which take just filenames). This could be handled
>> as in Bio.SeqIO.convert(), which also allows both (which was the
>> subject of some discussion!).
>
> I'll have to look more on the example and consider the fact that my
> current implementation breaks compatibility with previous code and that
> not everything needed (filename, mode,...) is accessible in filehandles.

OK.

>> Yes, saving indexes is an obvious addition. I have explored
>> using pickle via shelve, and also SQLite - there are
>> implementations of this on my github repository, plus I have
>> begun to look into the existing OBF Open Biological
>> Database Access (OBDA) specification for cross project
>> compatibility. Other potential benefits here are reduced
>> memory usage if we don't keep the dictionary
>> of offsets in RAM.
>
> I did try to use pickle directly on the dict like object that is
> returned from SeqIO.index() but pickle was not happy with it. The SQLite
> approach also crossed my mind and also BioSQL or just some custom SQL
> database, but the RAM approach seemed good enough, at least for our
> current uses. I can see though that some file formats will require a lot
> more RAM depending on what is indexed and their size. In the end it came
> out as cPickled dictionaries for faster access.

I agree that an in RAM dictionary works pretty well, even for
very large sequence files. In terms of speed, I would expect
a two step build index in memory, then save to disk, to be
faster than building the index database on disk (which was
a bit slow).
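
A sketch of that two step approach using the standard library sqlite3
module: the offsets are built as an ordinary dictionary first and then
written to disk in a single transaction. The table layout and the function
names are assumptions made for illustration; they are not taken from the
index-sqlite branch.

```python
import os
import sqlite3
import tempfile


def save_offsets_sqlite(offsets, db_path):
    """Write an in-memory {record_id: offset} dict to disk in one transaction."""
    con = sqlite3.connect(db_path)
    try:
        with con:              # commits once at the end, or rolls back on error
            con.execute(
                "CREATE TABLE IF NOT EXISTS record_offsets "
                "(record_id TEXT PRIMARY KEY, file_offset INTEGER NOT NULL)"
            )
            con.executemany(
                "INSERT OR REPLACE INTO record_offsets VALUES (?, ?)",
                offsets.items(),
            )
    finally:
        con.close()


def lookup_offset(db_path, record_id):
    """Fetch one offset without loading the whole index into RAM."""
    con = sqlite3.connect(db_path)
    try:
        row = con.execute(
            "SELECT file_offset FROM record_offsets WHERE record_id = ?",
            (record_id,),
        ).fetchone()
    finally:
        con.close()
    return None if row is None else row[0]


db_path = os.path.join(tempfile.mkdtemp(), "example.index.sqlite")
save_offsets_sqlite({"alpha": 0, "beta": 12}, db_path)
found = lookup_offset(db_path, "beta")
missing = lookup_offset(db_path, "gamma")
```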

>> There is a potential complication with index sub-classes
>> which do more specialised indexing (e.g. GenBank files,
>> and for a more extreme case, SFF files). See:
>> http://github.com/peterjc/biopython/tree/sff-seqio
>
> For these I would have to do it on a unittest base, I'm not familiar
> with the formats. Also the implementation I did was based on
> the current master branch of biopython. I now realize a lot more
> has been done outside of it that I should look into.

I'm sorry if the discussion on the (dev) mailing list wasn't
clearer - but having a fresh set of eyes looking at the topic
is very useful.

>> Anyway - great to see you are finding the code useful,
>> and have some quite similar ideas for how to extend
>> it further.
>
> Thanks for all that info, I have a lot to dig into and see if I can
> actually contribute with something. You seem to have pretty much
> everything sorted ;)

Well, I hadn't been thinking about gzipped files (or any archives).
How does gzip behave with memory use? I assume it doesn't
load everything into RAM, but does allow you random access
(seek and tell).

This is a vague idea (which I haven't tried yet), but maybe the
Bio.SeqIO.index() function could take an optional argument
(gzip=True, or something more general like archive=...) which
would cause the file to be opened via the gzip module instead?
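
For reference, a small check of random access on a gzip handle using only
the standard library. Offsets reported by tell() refer to positions in the
decompressed data, and seeking backwards makes the gzip module rewind and
decompress again from the start, so it can be slow on large files.

```python
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.fasta.gz")
with gzip.open(path, "wb") as out:
    out.write(b">alpha\nACGT\n>beta\nGGCC\n")

with gzip.open(path, "rb") as handle:
    handle.readline()          # ">alpha"
    handle.readline()          # "ACGT"
    offset = handle.tell()     # position of the second record (decompressed)
    header = handle.readline()
    handle.seek(offset)        # jump back to the recorded offset
    again = handle.readline()
```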

Regards,

Peter

From bugzilla-daemon at portal.open-bio.org  Sat Dec 19 06:02:44 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 19 Dec 2009 06:02:44 -0500
Subject: [Biopython-dev] [Bug 2927] Problem parsing PSI-BLAST plain text
	output with NCBStandalone.PSIBlastParser
In-Reply-To: <bug-2927-42@http.bugzilla.open-bio.org/>
Message-ID: <200912191102.nBJB2iOb014900@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2927


------- Comment #6 from robfsouza at gmail.com  2009-12-19 06:02 EST -------
Created an attachment (id=1412)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1412&action=view)
Testcase for NCBI's BLAST alignment with errors

This is a sequence from Naegleria gruberi and blastpgp output which reproduces
a reported bug in NCBI's blastpgp output at the first iteration (see hit
against sequence gi|156552846|ref|XP_001600053.1). Search parameters were

blastpgp -d nr -i Ngru1000013938.fa -o Ngru1000013938.fa.br -a 8 -j 1 -b 10000
-v 10000 -h 0.01 -I T -m 0 -M BLOSUM62 -F F


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Sat Dec 19 06:21:13 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 19 Dec 2009 06:21:13 -0500
Subject: [Biopython-dev] [Bug 2927] Problem parsing PSI-BLAST plain text
	output with NCBStandalone.PSIBlastParser
In-Reply-To: <bug-2927-42@http.bugzilla.open-bio.org/>
Message-ID: <200912191121.nBJBLDax015457@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2927


robfsouza at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1412 is|0                           |1
           obsolete|                            |


------- Comment #7 from robfsouza at gmail.com  2009-12-19 06:21 EST -------
Created an attachment (id=1413)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1413&action=view)
Testcase for NCBI's BLAST alignment with errors

Sending the right query sequence now (my mistake! :))


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Sat Dec 19 07:09:57 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 19 Dec 2009 07:09:57 -0500
Subject: [Biopython-dev] [Bug 2927] Problem parsing PSI-BLAST plain text
	output with NCBStandalone.PSIBlastParser
In-Reply-To: <bug-2927-42@http.bugzilla.open-bio.org/>
Message-ID: <200912191209.nBJC9vxr016459@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2927


------- Comment #8 from ibdeno at gmail.com  2009-12-19 07:09 EST -------
(In reply to comment #7)
Just to confirm that I can reproduce the 'Query: 0' with blastpgp 2.2.22 using
Robson's test case.
Thanks to Robson for this and apologies for not having been able to send a test
case.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From rjalves at igc.gulbenkian.pt  Sat Dec 19 16:48:10 2009
From: rjalves at igc.gulbenkian.pt (Renato Alves)
Date: Sat, 19 Dec 2009 21:48:10 +0000
Subject: [Biopython-dev] SeqIO.index improvement suggestions
In-Reply-To: <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com>
References: <4B2BB938.5030709@igc.gulbenkian.pt>	
	<320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com>	
	<4B2C12B0.9060806@igc.gulbenkian.pt>
	<320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com>
Message-ID: <4B2D4A1A.6@igc.gulbenkian.pt>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> Well, I hadn't been thinking about gzipped files (or any archives).
> How does gzip behave with memory use? I assume it doesn't
> load everything into RAM, but does allow you random access
> (seek and tell).

- From what I can tell, in terms of RAM it behaves the same way as a
normal open(): it only decompresses the segments as they are accessed but
doesn't cache them. A reasonable trade-off between space and access time.

> This is a vague idea (which I haven't tried yet), but maybe the
> Bio.SeqIO.index() function could take an optional argument
> (gzip=True, or something more general like archive=...) which
> would cause the file to be opened via the gzip module instead?

I thought about something similar but using a combination of the file
extension and magic (or actually python-magic[1]). The first one is
potentially messy, although it's how things are mostly done in Windows.
I couldn't confirm whether the second one is available for Windows, but it
is widely present in Linux (and I suppose MacOS too).
In the end I dislike the idea of 'having' to use one approach or the
other depending on the OS the code is running on, however this would fit
in without breaking any compatibility with current code.
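
A sketch of content-based detection that avoids both the file extension and
an external library. This is a swapped-in technique, not python-magic: it
only checks the two-byte gzip signature (0x1f 0x8b) at the start of the
file, and open_maybe_gzip() is a hypothetical helper name.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"       # first two bytes of every gzip stream


def open_maybe_gzip(path):
    """Open path for binary reading, via gzip if it starts with the gzip magic."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rb")
    return open(path, "rb")


DATA = b">alpha\nACGT\n"
workdir = tempfile.mkdtemp()
plain_path = os.path.join(workdir, "plain.fasta")
packed_path = os.path.join(workdir, "packed.fasta")  # deliberately no .gz suffix
with open(plain_path, "wb") as out:
    out.write(DATA)
with gzip.open(packed_path, "wb") as out:
    out.write(DATA)

with open_maybe_gzip(plain_path) as handle:
    plain_text = handle.read()
with open_maybe_gzip(packed_path) as handle:
    packed_text = handle.read()
```

Because it uses only the standard library, this behaves the same on every
OS, but it recognises gzip alone; other archive formats would each need
their own signature check.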

1 - http://pypi.python.org/pypi/python-magic/0.1

Renato
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.11 (GNU/Linux)

iEYEARECAAYFAkstShgACgkQYh11EUYTX9Tu3wCglh6d3rt/ANU5J45bsceqcQ78
TQ0AnjgIlNhYRMqdzl4jBGYOPdMKOY7D
=rqsi
-----END PGP SIGNATURE-----

From eric.talevich at gmail.com  Sat Dec 19 17:42:23 2009
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sat, 19 Dec 2009 14:42:23 -0800
Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions
In-Reply-To: <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com>
References: <4B2BB938.5030709@igc.gulbenkian.pt>
	<320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> 
	<4B2C12B0.9060806@igc.gulbenkian.pt>
	<320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com>
Message-ID: <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com>

On Sat, Dec 19, 2009 at 1:57 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> This is a vague idea (which I haven't tried yet), but maybe the
> Bio.SeqIO.index() function could take an optional argument
> (gzip=True, or something more general like archive=...) which
> would cause the file to be opened via the gzip module instead?
>

Or: open=open -- accept a function that opens the file; by default, the
built-in open function, but easily replaced by gzip.open, bz2.BZ2File, or a
user-defined function to open zip files (since that's less straightforward).

Otherwise, since the variety of archive formats supported by the Python
standard library is limited, archive='gzip'|'bz2'|'zip' sounds good.
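
To illustrate the open=open idea, here is a sketch with a hypothetical
list_record_ids() function standing in for Bio.SeqIO.index(), which does
not take such an argument in the releases discussed here. Any callable that
accepts (filename, mode) and returns a file-like object can be passed in.

```python
import bz2
import gzip
import os
import tempfile


def list_record_ids(filename, open=open):
    """Return FASTA record ids, opening the file with the supplied callable."""
    ids = []
    with open(filename, "rb") as handle:
        for line in handle:
            if line.startswith(b">"):
                ids.append(line[1:].split()[0].decode())
    return ids


DATA = b">alpha\nACGT\n>beta\nGGCC\n"
workdir = tempfile.mkdtemp()
plain_path = os.path.join(workdir, "example.fasta")
gz_path = plain_path + ".gz"
bz2_path = plain_path + ".bz2"
with open(plain_path, "wb") as out:
    out.write(DATA)
with gzip.open(gz_path, "wb") as out:
    out.write(DATA)
with bz2.BZ2File(bz2_path, "wb") as out:
    out.write(DATA)

plain_ids = list_record_ids(plain_path)                   # built-in open
gz_ids = list_record_ids(gz_path, open=gzip.open)
bz2_ids = list_record_ids(bz2_path, open=bz2.BZ2File)
```

Existing calls that pass only a filename keep working unchanged, which is
the backwards compatibility argument for this design.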

-Eric

From rjalves at igc.gulbenkian.pt  Sat Dec 19 19:08:42 2009
From: rjalves at igc.gulbenkian.pt (Renato Alves)
Date: Sun, 20 Dec 2009 00:08:42 +0000
Subject: [Biopython-dev] SeqIO.index improvement suggestions
In-Reply-To: <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com>
References: <4B2BB938.5030709@igc.gulbenkian.pt>
	<320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com>
	<4B2C12B0.9060806@igc.gulbenkian.pt>
	<320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com>
	<3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com>
Message-ID: <4B2D6B0A.4040008@igc.gulbenkian.pt>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

- - From Eric Talevich on 12/19/2009 10:42 PM:
> Or: open=open -- accept a function that opens the file; by default, the
> built-in open function, but easily replaced by gzip.open, bz2.BZ2File,
> or a user-defined function to open zip files (since that's less
> straightforward).
>
> Otherwise, since the variety of archive formats supported by the Python
> standard library is limited, archive='gzip'|'bz2'|'zip' sounds good.

I prefer the first option. Flexible, backwards compatible, fits all
mentioned cases so far and allows inclusion of other formats. Got my
vote on that one.

Renato

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.11 (GNU/Linux)

iEYEARECAAYFAkstawUACgkQYh11EUYTX9TJbwCgi4fQGQcfaBdJNLbMRsubjz82
4LQAnRgY0IKjwznjtiQzRNd0k8SH4oMN
=YNHc
-----END PGP SIGNATURE-----

From biopython at maubp.freeserve.co.uk  Sun Dec 20 13:06:33 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 20 Dec 2009 18:06:33 +0000
Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions
In-Reply-To: <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com>
References: <4B2BB938.5030709@igc.gulbenkian.pt>
	<320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com>
	<4B2C12B0.9060806@igc.gulbenkian.pt>
	<320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com>
	<3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com>
Message-ID: <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com>

On Sat, Dec 19, 2009 at 10:42 PM, Eric Talevich wrote:
> On Sat, Dec 19, 2009 at 1:57 AM, Peter wrote:
>>
>> This is a vague idea (which I haven't tried yet), but maybe the
>> Bio.SeqIO.index() function could take an optional argument
>> (gzip=True, or something more general like archive=...) which
>> would cause the file to be opened via the gzip module instead?
>
> Or: open=open -- accept a function that opens the file; by default, the
> built-in open function, but easily replaced by gzip.open, bz2.BZ2File, or a
> user-defined function to open zip files (since that's less straightforward).

That's what I had in mind with the "archive=..." bit (I should have
been clearer), but "open" is probably a better name for it (assuming
it isn't going to become a reserved word in future versions of Python).

> Otherwise, since the variety of archive formats supported by the Python
> standard library is limited, archive='gzip'|'bz2'|'zip' sounds good.

That would work, but as you say, it is rather limited.

Peter

From biopython at maubp.freeserve.co.uk  Mon Dec 21 06:57:51 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 21 Dec 2009 11:57:51 +0000
Subject: [Biopython-dev] code credits
In-Reply-To: <Pine.SOC.4.64.0912171946120.13591@ub.d.umn.edu>
References: <bb02be080912171320u480fe461r1f517970f08e091b@mail.gmail.com>
	<928490.72367.qm@web30708.mail.mud.yahoo.com>
	<320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com>
	<Pine.SOC.4.64.0912171946120.13591@ub.d.umn.edu>
Message-ID: <320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com>

Hello all,

This email has been sent to the Biopython developers list, where
we are proposing to include a list of contributors in the Biopython
1.53 and future release notes.

I have specifically CC'd Chris Lasher, Hongbo Zhu and Paul B as
"new contributors". I don't have an email address for Frederik
Gwinner but will send him this message via github instead.

On Fri, Dec 18, 2009 at 1:54 AM, Marshall Hampton wrote:
>
> On Thu, 17 Dec 2009, Peter wrote:
>>
>> Marshall Hampton's description of how they do it on Sage
>> sounds worth trying - if we keep track as things are checked
>> in, it won't be too much work either. Do you (sage) have a
>> list of guidelines for what qualifies for a credit?
>
> I don't think we have formal guidelines, but the process is pretty simple.
> Whoever works on a patch in our bug/feature tracker has to flag it for
> review. Both the person who implements the patch and the reviewer get
> credit. It doesn't matter if it's a 1-character change to the documentation,
> they're listed in the release notes. Basically, the idea is to err (if
> that's the right word) on the side of acknowledging any contribution. ...

On that basis, this is a (partial?) list for Biopython 1.53, given
alphabetically as done by Sage:

Bartek Wilczyns
Brad Chapman
Chris Lasher (first contribution?)
Cymon Cox
Frank Kauff
Frederik Gwinner (first contribution?)
Hongbo Zhu (first contribution?)
Kyle Ellrott
Leighton Pritchard
Michiel de Hoon
Paul B (first contribution?)
Peter Cock

Am I missing anyone? Have I spelt all the names right? (Actually a
serious question - I recently made a typo on a git commit comment
and mistyped Leighton's surname).

We can update the release note on the news server/blog to include this,
and send round another announcement email describing this plan. For
the source code, I have two suggestions:

(1) Include this in the NEWS file for each release, and continue adding
names to the single alphabetical list in the CONTRIBUTORS file.

(2) Don't include the list of names in the NEWS file, but instead put
them in the CONTRIBUTORS file. This can have a section for each
future release, with all the existing entries listed as contributors up to
and including Biopython 1.52.

I prefer the second option - the NEWS file is already quite long, and can
refer to the CONTRIBUTORS file (e.g. for Biopython 1.53 we can have a
line "(At least) 12 people contributed to this release, including 4 first time
contributors").

Peter


From chapmanb at 50mail.com  Mon Dec 21 08:23:39 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 21 Dec 2009 08:23:39 -0500
Subject: [Biopython-dev] code credits
In-Reply-To: <320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com>
References: <bb02be080912171320u480fe461r1f517970f08e091b@mail.gmail.com>
	<928490.72367.qm@web30708.mail.mud.yahoo.com>
	<320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com>
	<Pine.SOC.4.64.0912171946120.13591@ub.d.umn.edu>
	<320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com>
Message-ID: <20091221132339.GC21580@sobchak.mgh.harvard.edu>

Hi Peter;
Awesome. Nice to see all the new and familiar names from this latest
release.

> (1) Include this in the NEWS file for each release, and continue adding
> names to the single alphabetical list in the CONTRIBUTORS file.

I'd rather see it this way, which is a bit more informal and in
context. Something along the lines of:

Bob Jones added the FooBar module for parsing the latest NCBI
file format.

or:

Several bug fixes were committed to the PDB module. Thanks to Joe
Smith, Steve P and Jorge Garcia for their patches.

If people contributed to something that didn't make the new cut, then we
could just list additional contributors near the end. The goal should
be to recognize people if they contributed to a release by having
their name somewhere in the release notes. For core contributors like
yourself, you probably don't want your name next to everything so pick a
couple of your favorites for attribution.

Brad

From biopython at maubp.freeserve.co.uk  Mon Dec 21 09:34:38 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 21 Dec 2009 14:34:38 +0000
Subject: [Biopython-dev] code credits
In-Reply-To: <20091221132339.GC21580@sobchak.mgh.harvard.edu>
References: <bb02be080912171320u480fe461r1f517970f08e091b@mail.gmail.com>
	<928490.72367.qm@web30708.mail.mud.yahoo.com>
	<320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com>
	<Pine.SOC.4.64.0912171946120.13591@ub.d.umn.edu>
	<320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com>
	<20091221132339.GC21580@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00912210634o77d9eb9ex21e4ec3630dd1ed6@mail.gmail.com>

On Mon, Dec 21, 2009 at 1:23 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Hi Peter;
> Awesome. Nice to see all the new and familiar names from this latest
> release.
>
>> (1) Include this in the NEWS file for each release, and continue adding
>> names to the single alphabetical list in the CONTRIBUTORS file.
>
> I'd rather see it this way, which is a bit more informal and in
> context. Something along the lines of:
>
> Bob Jones added the FooBar module for parsing the latest NCBI
> file format.
>
> or:
>
> Several bug fixes were committed to the PDB module. Thanks to Joe
> Smith, Steve P and Jorge Garcia for their patches.
>
> If people contributed to something that didn't make the new cut, then we
> could just list additional contributors near the end. The goal should
> be to recognize people if they contributed to a release by having
> their name somewhere in the release notes. For core contributors like
> yourself, you probably don't want your name next to everything so pick a
> couple of your favorites for attribution.

OK - some under your option (3?), the CONTRIBOTORS file is kept
in the existing style, and the NEWS file also continues in a similar
*style* to before, but making a more concious effort to include names
next to noteworthy features, and ensure any other contributors get
included at the end (e.g. "Plus miscelaneous bug fixes from X, Y
and Z").

That seems fine.

Peter

From bugzilla-daemon at portal.open-bio.org  Mon Dec 21 10:34:22 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Dec 2009 10:34:22 -0500
Subject: [Biopython-dev] [Bug 2927] Problem parsing PSI-BLAST plain text
	output with NCBStandalone.PSIBlastParser
In-Reply-To: <bug-2927-42@http.bugzilla.open-bio.org/>
Message-ID: <200912211534.nBLFYMKt002285@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2927


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-21 10:34 EST -------
(In reply to comment #8)
> (In reply to comment #7)
> Just to confirm that I can reproduce the 'Query: 0' with blastpgp 2.2.22
> using Robson's test case. Thanks to Robson for this and apologies for not
> having been able to send a test case.

I was also able to confirm the problem is present in blastpgp 2.2.22,
however it seems to have been fixed in the "new" BLAST+ suite, psiblast
2.2.22+ as described here:
http://lists.open-bio.org/pipermail/bioperl-l/2009-December/031811.html

Given this new information, this does look like an NCBI BLAST bug, and not
a problem in Biopython itself. We *might* be able to get our parser to cope
with the funny BLAST output, but it does look difficult and risky to me.

Miguel - Is it possible the BLAST bug is relatively recent and first showed
up when you updated blastpgp to 2.2.18?

Regards,

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk  Mon Dec 21 11:48:50 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 21 Dec 2009 16:48:50 +0000
Subject: [Biopython-dev] code credits
In-Reply-To: <320fb6e00912210634o77d9eb9ex21e4ec3630dd1ed6@mail.gmail.com>
References: <bb02be080912171320u480fe461r1f517970f08e091b@mail.gmail.com>
	<928490.72367.qm@web30708.mail.mud.yahoo.com>
	<320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com>
	<Pine.SOC.4.64.0912171946120.13591@ub.d.umn.edu>
	<320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com>
	<20091221132339.GC21580@sobchak.mgh.harvard.edu>
	<320fb6e00912210634o77d9eb9ex21e4ec3630dd1ed6@mail.gmail.com>
Message-ID: <320fb6e00912210848x449fd73al4e97d3c9e21cf4@mail.gmail.com>

Peter wrote this (with spelling fixed):
>
> OK - so under your option (3?), the CONTRIBUTORS file is kept
> in the existing style, and the NEWS file also continues in a similar
> *style* to before, but making a more conscious effort to include names
> next to noteworthy features, and ensure any other contributors get
> included at the end (e.g. "Plus miscellaneous bug fixes from X, Y
> and Z").
>

Actually, looking over this again, if we want to include a "Sage style"
list of names in the release notes (which looks good), it really would
be easier if we kept this list of names in that format within the
repository (updating it as needed when new code is checked in).
The NEWS and CONTRIBUTORS files are the obvious places to
do this.

With Brad's outline (3), or at least how I understood it (and maybe
I misunderstood you Brad), the NEWS file would have the contributor
names for each release, but not in a format where they can be
copy and pasted to put together a release notice. Meanwhile the
CONTRIBUTORS file would continue as a single list of all
contributions to date. This means whoever writes the release
notice has to synthesise the contributor list by hand, which is
tedious and risks omitting people.

My earlier suggestions had the list of names in the NEWS file for
each release (1), or in the CONTRIBUTORS file broken down by
release (2). These options seem better to me just from a practical
point of view - and we can still also credit people in the main text
of the NEWS file as we do now if appropriate.

So, how about a merger of (1) and (3)? i.e.

* The CONTRIBUTORS file remains a single alphabetical list
of all contributors to date (no change).
* Entries in the NEWS file for new features etc may continue
to credit authors as appropriate.
* The NEWS file will include at the end of each release section
an alphabetical list of contributors for that release (with new
contributors flagged). This will be re-used in the release notice.

Peter

From bugzilla-daemon at portal.open-bio.org  Mon Dec 21 11:49:29 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Dec 2009 11:49:29 -0500
Subject: [Biopython-dev] [Bug 2966] Primer3Commandline does not use EMBOSS
	6.1.0 arguments
In-Reply-To: <bug-2966-42@http.bugzilla.open-bio.org/>
Message-ID: <200912211649.nBLGnTed003915@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2966


------- Comment #2 from lpritc at scri.sari.ac.uk  2009-12-21 11:49 EST -------
I also found an issue with the PrimerSearchCommandline.  The command line
options -sequences and -primers do not appear to be used in EMBOSS6.1.0, having
been replaced by -seqall and -infile, respectively.  I changed the options
accordingly, and the modified files are available at
http://github.com/widdowquinn/biopython/tree/emboss-branch.
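
As a rough illustration of the renaming, here is a sketch that builds the
argument list by hand. It is not the Bio.Emboss wrapper code, and
primersearch_args() is a hypothetical helper. Only the option names given
in this comment are used; a real primersearch run needs further arguments
that are omitted here.

```python
def primersearch_args(sequences, primers, emboss_6_1=True):
    """Build a partial primersearch argument list for old or new option names."""
    if emboss_6_1:
        # Names reported above for EMBOSS 6.1.0.
        return ["primersearch", "-seqall", sequences, "-infile", primers]
    # Older names, which reportedly are no longer used.
    return ["primersearch", "-sequences", sequences, "-primers", primers]


new_style = primersearch_args("targets.fasta", "primers.txt")
old_style = primersearch_args("targets.fasta", "primers.txt", emboss_6_1=False)
```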


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From p.j.a.cock at googlemail.com  Tue Dec 22 04:25:27 2009
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 22 Dec 2009 09:25:27 +0000
Subject: [Biopython-dev] Fwd: Debian python-biopython packaging for
	Biopython 1.53
In-Reply-To: <E3E486F3-8609-4291-BE56-D9D760FB8C87@ini.phys.ethz.ch>
References: <320fb6e00908110407x2c4132f8va17e19aaf2b224d0@mail.gmail.com>
	<48B3E023-F75A-4F50-90CE-6FDA7DDA9E4C@ini.phys.ethz.ch>
	<320fb6e00908120308w5077f598i428b6011912c6f37@mail.gmail.com>
	<783F8F61-58D6-4638-B2C7-5C206C321C13@ini.phys.ethz.ch>
	<320fb6e00908190305o3cb4523ct1645b98f4b284d43@mail.gmail.com>
	<4151f0acb1da52f12d3f08419d3171e9@ini.phys.ethz.ch>
	<320fb6e00908200748g78485c64kc19cea88c7c4cee@mail.gmail.com>
	<E3E486F3-8609-4291-BE56-D9D760FB8C87@ini.phys.ethz.ch>
Message-ID: <320fb6e00912220125w50a600c1xcf5e4750d70b39ca@mail.gmail.com>

Hi all,

Do any of our C experts know if this compilation warning is
important? (Under Debian Linux; the query was raised by Philipp Benner,
who kindly packages Biopython for Debian, and the package also gets
used on Ubuntu.)

Thanks,

Peter

---------- Forwarded message ----------
From: Philipp Benner <philipp.benner at ini.phys.ethz.ch>
Date: Mon, Dec 21, 2009 at 6:34 PM
Subject: Debian python-biopython packaging for Biopython 1.53
To: Peter Cock <p.j.a.cock at googlemail.com>


Hey Peter,

I just uploaded the new release. Just a minor question:

dpkg-shlibdeps: warning: dependency on libpthread.so.0 could be
avoided if "debian/python-biopython/usr/lib/pyshared/python2.5/Bio/Cluster/cluster.so
debian/python-biopython/usr/lib/pyshared/python2.5/Bio/Motif/_pwm.so
debian/python-biopython/usr/lib/pyshared/python2.5/Bio/KDTree/_CKDTree.so
debian/python-biopython/usr/lib/pyshared/python2.4/Bio/trie.so
debian/python-biopython/usr/lib/pyshared/python2.5/Bio/PDB/mmCIF/MMCIFlex.so
debian/python-biopython/usr/lib/pyshared/python2.4/Bio/Cluster/cluster.so
debian/python-biopython/usr/lib/pyshared/python2.5/Bio/trie.so
debian/python-biopython/usr/lib/pyshared/python2.4/Bio/Motif/_pwm.so
debian/python-biopython/usr/lib/pyshared/python2.5/Bio/cMarkovModel.so
debian/python-biopython/usr/lib/pyshared/python2.4/Bio/PDB/mmCIF/MMCIFlex.so
debian/python-biopython/usr/lib/pyshared/python2.4/Bio/Nexus/cnexus.so
debian/python-biopython/usr/lib/pyshared/python2.4/Bio/cpairwise2.so
debian/python-biopython/usr/lib/pyshared/python2.4/Bio/KDTree/_CKDTree.so
debian/python-biopython/usr/lib/pyshared/python2.5/Bio/Nexus/cnexus.so
debian/python-biopython/usr/lib/pyshared/python2.5/Bio/cpairwise2.so
debian/python-biopython/usr/lib/pyshared/python2.4/Bio/cMarkovModel.so"
were not uselessly linked against it (they use none of its symbols).

Is this true? It might also be an error of dpkg-shlibdeps.

Regards,
Philipp

From biopython at maubp.freeserve.co.uk  Tue Dec 22 07:14:32 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 22 Dec 2009 12:14:32 +0000
Subject: [Biopython-dev] code credits
In-Reply-To: <320fb6e00912210848x449fd73al4e97d3c9e21cf4@mail.gmail.com>
References: <bb02be080912171320u480fe461r1f517970f08e091b@mail.gmail.com>
	<928490.72367.qm@web30708.mail.mud.yahoo.com>
	<320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com>
	<Pine.SOC.4.64.0912171946120.13591@ub.d.umn.edu>
	<320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com>
	<20091221132339.GC21580@sobchak.mgh.harvard.edu>
	<320fb6e00912210634o77d9eb9ex21e4ec3630dd1ed6@mail.gmail.com>
	<320fb6e00912210848x449fd73al4e97d3c9e21cf4@mail.gmail.com>
Message-ID: <320fb6e00912220414t6429f1e5n792e5feeecbe633f@mail.gmail.com>

On Mon, Dec 21, 2009 at 4:48 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> So, how about a merger of (1) and (3)? i.e.
>
> * The CONTRIBUTORS file remains a single alphabetical list
> of all contributors to date (no change).
> * Entries in the NEWS file for new features etc may continue
> to credit authors as appropriate.
> * The NEWS file will include at the end of each release section
> an alphabetical list of contributors for that release (with new
> contributors flagged). This will be re-used in the release notice.

I've done that in github - how do the NEWS and CONTRIB file look?

http://github.com/biopython/biopython/commit/86d8d99aab894ab5f32a0e7a0c45d63a441da645

I haven't automatically included email addresses for the new contributors
since there is a risk of them being harvested for spam, so I figure that
should be "opt in".

Peter

From biopython at maubp.freeserve.co.uk  Tue Dec 22 10:34:37 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 22 Dec 2009 15:34:37 +0000
Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions
In-Reply-To: <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com>
References: <4B2BB938.5030709@igc.gulbenkian.pt>
	<320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com>
	<4B2C12B0.9060806@igc.gulbenkian.pt>
	<320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com>
	<3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com>
	<320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com>
Message-ID: <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com>

On Sun, Dec 20, 2009 at 6:06 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Sat, Dec 19, 2009 at 10:42 PM, Eric Talevich wrote:
>> On Sat, Dec 19, 2009 at 1:57 AM, Peter wrote:
>>>
>>> This is a vague idea (which I haven't tried yet), but maybe the
>>> Bio.SeqIO.index() function could take an optional argument
>>> (gzip=True, or something more general like archive=...) which
>>> would cause the file to be opened via the gzip module instead?
>>
>> Or: open=open -- accept a function that opens the file; by default, the
>> built-in open function, but easily replaced by gzip.open, bz2.BZ2File, or a
>> user-defined function to open zip files (since that's less straightforward).
>
> That's what I had in mind with the "archive=..." bit (I should have
> been clearer), but "open" is probably a better name for it (assuming
> it isn't going to become a reserved word in future versions of Python).

Proof of concept on github:
http://github.com/peterjc/biopython/tree/index-zip

This is using open_function as the new argument name (to match
the existing key_function and avoid any confusion with the built in
name open). I'm open to debate on this.

Points to note, this is untested on Windows. In particular we need
to look at gzipped plain text files using DOS/Windows new lines
(rare case?) plus gzipped plain text files using Unix new lines
(likely to be the more common of the two I'd expect). From my
initial checks, while gzip.open() does take a mode argument it
doesn't seem to support the "rU" value for universal new line
read mode. This spoils my plan to give the open_function both
the filename and the desired mode (generally "rU", but for SFF
files etc we will want to use "rb").
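
A minimal sketch of the open_function idea, written for modern Python 3
rather than the Python 2.5 used in this thread. The helper name
index_offsets and its marker argument are hypothetical and are not the
code on the index-zip branch. It assumes the opener is always called in
binary mode, which sidesteps the "rU" question because line endings are
then handled by the caller:

```python
import gzip


def index_offsets(filename, marker=b">", open_function=open):
    """Map each record name to the offset of its header line.

    Hypothetical helper, not Biopython code. ``open_function`` is called
    as ``open_function(filename, "rb")`` and must return a binary handle.
    For compressed files the offsets refer to the decompressed stream.
    """
    offsets = {}
    handle = open_function(filename, "rb")
    try:
        position = 0
        for line in handle:
            if line.startswith(marker):
                fields = line[len(marker):].split(None, 1)
                if fields:
                    # First word after the marker is taken as the record name.
                    offsets[fields[0].decode("ascii")] = position
            position += len(line)
    finally:
        handle.close()
    return offsets
```

With this shape, index_offsets("example.fasta") uses the built-in open,
while index_offsets("example.fasta.gz", open_function=gzip.open) reads
the gzipped file through the same code path.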

Peter

From biopython at maubp.freeserve.co.uk  Tue Dec 22 11:08:50 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 22 Dec 2009 16:08:50 +0000
Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions
In-Reply-To: <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com>
References: <4B2BB938.5030709@igc.gulbenkian.pt>
	<320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com>
	<4B2C12B0.9060806@igc.gulbenkian.pt>
	<320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com>
	<3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com>
	<320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com>
	<320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com>
Message-ID: <320fb6e00912220808w53485af8s801e5a24666d9627@mail.gmail.com>

On Tue, Dec 22, 2009 at 3:34 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> Points to note, this is untested on Windows. In particular we need
> to look at gzipped plain text files using DOS/Windows new lines
> (rare case?) plus gzipped plain text files using Unix new lines
> (likely to be the more common of the two I'd expect). From my
> initial checks, while gzip.open() does take a mode argument it
> doesn't seem to support the "rU" value for universal new line
> read mode. This spoils my plan to give the open_function both
> the filename and the desired mode (generally "rU", but for SFF
> files etc we will want to use "rb").

The gzip mode issue is interesting... running on the Mac,
Leopard 10.5, using the Apple provided Python 2.5.2,
looking at a gzipped QUAL file everything is fine:

Python 2.5.2 (r252:60911, Feb 22 2008, 07:57:53)
[GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> gzip.open("Quality/example.qual.gz", "r").read()
'>EAS54_6_R1_2_1_413_324\n26 26 18 26 26 26 26 26 26 26 26 26 26 26 26
22 26 26 26 26\n26 26 26 23 23\n>EAS54_6_R1_2_1_540_792\n26 26 26 26
26 26 26 26 26 26 26 22 26 26 26 26 26 12 26 26\n26 18 26 23
18\n>EAS54_6_R1_2_1_443_348\n26 26 26 26 26 26 26 26 26 26 26 24 26 22
26 26 13 22 26 18\n24 18 18 18 18\n'
>>> gzip.open("Quality/example.qual.gz", "rb").read()
'>EAS54_6_R1_2_1_413_324\n26 26 18 26 26 26 26 26 26 26 26 26 26 26 26
22 26 26 26 26\n26 26 26 23 23\n>EAS54_6_R1_2_1_540_792\n26 26 26 26
26 26 26 26 26 26 26 22 26 26 26 26 26 12 26 26\n26 18 26 23
18\n>EAS54_6_R1_2_1_443_348\n26 26 26 26 26 26 26 26 26 26 26 24 26 22
26 26 13 22 26 18\n24 18 18 18 18\n'
>>> gzip.open("Quality/example.qual.gz", "rU").read()
'>EAS54_6_R1_2_1_413_324\n26 26 18 26 26 26 26 26 26 26 26 26 26 26 26
22 26 26 26 26\n26 26 26 23 23\n>EAS54_6_R1_2_1_540_792\n26 26 26 26
26 26 26 26 26 26 26 22 26 26 26 26 26 12 26 26\n26 18 26 23
18\n>EAS54_6_R1_2_1_443_348\n26 26 26 26 26 26 26 26 26 26 26 24 26 22
26 26 13 22 26 18\n24 18 18 18 18\n'

Looking at a gzipped FASTA file everything is fine:

>>> gzip.open("Quality/example.fasta.gz", "r").read()
'>EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n>EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n>EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n'
>>> gzip.open("Quality/example.fasta.gz", "rb").read()
'>EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n>EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n>EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n'
>>> gzip.open("Quality/example.fasta.gz", "rU").read()
'>EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n>EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n>EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n'

But, there is a problem with my gzipped FASTQ file:

>>> gzip.open("Quality/example.fastq.gz", "r").read()
'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n@EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n@EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n'
>>> gzip.open("Quality/example.fastq.gz", "rb").read()
'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n@EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n@EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n'
>>> gzip.open("Quality/example.fastq.gz", "rU").read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/gzip.py",
line 220, in read
    self._read(readsize)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/gzip.py",
line 292, in _read
    self._read_eof()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/gzip.py",
line 311, in _read_eof
    raise IOError, "CRC check failed"
IOError: CRC check failed

I may have stumbled on a bug in the Python gzip library :(
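
For comparison, a sketch of how the same read can be done in Python 3.3
or later, where gzip.open() accepts a text mode. This is not available
in the Python 2.5 session shown here, and the helper name
read_gzipped_text is only illustrative. The newline translation happens
after decompression, so the compressed bytes are never touched:

```python
import gzip


def read_gzipped_text(filename, encoding="ascii"):
    """Return the decompressed text with CRLF and CR endings turned into LF."""
    # Mode "rt" wraps the binary gzip stream in io.TextIOWrapper. With the
    # default newline=None, universal newline translation is applied to the
    # decompressed text, not to the raw compressed data.
    with gzip.open(filename, "rt", encoding=encoding) as handle:
        return handle.read()
```

Binary formats such as SFF would still be opened with mode "rb".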

Peter

From bugzilla-daemon at portal.open-bio.org  Thu Dec 24 07:00:56 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 24 Dec 2009 07:00:56 -0500
Subject: [Biopython-dev] [Bug 2927] Problem parsing PSI-BLAST plain text
	output with NCBStandalone.PSIBlastParser
In-Reply-To: <bug-2927-42@http.bugzilla.open-bio.org/>
Message-ID: <200912241200.nBOC0ukq031745@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2927


------- Comment #10 from ibdeno at gmail.com  2009-12-24 07:00 EST -------
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #7)
> I was also able to confirm the problem is present in blastpgp 2.2.22,
> however it seems to have been fixed in the "new" BLAST+ suite, psiblast
> 2.2.22+ as described here:
> http://lists.open-bio.org/pipermail/bioperl-l/2009-December/031811.html
> 
> Given this new information, this does look like an NCBI BLAST bug, and not
> a problem in Biopython itself. We *might* be able to get our parser to cope
> with the funny BLAST output, but it does look difficult and risky to me.
> 

I think the best strategy will be to use the BLAST+ suite, since the "old"
programs will be abandoned, as I learnt from NCBI. Also, I think we should use
XML output.  I know I promised to work on testing that, but I don't think I
will be able to do the test before February...

> Miguel - Is it possible the BLAST bug is relatively recent and first showed
> up when you updated blastpgp to 2.2.18?
> 

I had been using 2.2.18 for quite a while (months) and never had a problem. I
think I initially thought it might be a problem with the actual databases, more
than with the program...

Best regards,


Miguel


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Dec 24 10:25:15 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 24 Dec 2009 10:25:15 -0500
Subject: [Biopython-dev] [Bug 2943] MMCIFParser only handling a single model.
In-Reply-To: <bug-2943-42@http.bugzilla.open-bio.org/>
Message-ID: <200912241525.nBOFPFxH003980@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2943


------- Comment #14 from mdehoon at ims.u-tokyo.ac.jp  2009-12-24 10:25 EST -------
From testing the current flex-based MMCIF parser, it seems that it is not quite
complete. I don't think it is necessary to be backwards compatible with it. I
rather have a well-designed MMCIF parser written independently, like the one by
Paul, and have it replace the current MMCIF parser over time. This also allows
us to have the design of the new parser more consistent with other Biopython
modules.

To do so, I suggest to have the new MMCIF parser in a new module MMCIF.py under
Bio.PDB, and let it coexist with the current MMCIF parser for the time being.

Since the new MMCIF parser does not use flex, I would think that the previous
division into MMCIF2Dict and MMCIFParser may not be needed for the new parser.
Paul, do you agree? Can the new parser live in a single MMCIF.py module?


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Dec 24 10:37:08 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 24 Dec 2009 10:37:08 -0500
Subject: [Biopython-dev] [Bug 2943] MMCIFParser only handling a single model.
In-Reply-To: <bug-2943-42@http.bugzilla.open-bio.org/>
Message-ID: <200912241537.nBOFb83e004255@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2943


------- Comment #15 from TallPaulInJax at yahoo.com  2009-12-24 10:37 EST -------
Hi Michiel,

"I have a well-designed MMCIF parser written independently": Very interesting!
Is it written solely in Python as well? I will say the parser I wrote is slower
than I would like, so if you have an alternative?

"Since the new MMCIF parser does not use flex, I would think that the previous
division into MMCIF2Dict and MMCIFParser may not be needed for the new parser."
I'm not expert enough in Python and in BioPython to know the correct call here.
Perhaps Peter could answer this? I personally like the separation of concerns
so that if someone else wanted to write their own parser, the code is modular
in nature and supports doing that.

Thanks for your help, Michiel!

Paul


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Sat Dec 26 05:08:05 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 26 Dec 2009 05:08:05 -0500
Subject: [Biopython-dev] [Bug 2943] MMCIFParser only handling a single model.
In-Reply-To: <bug-2943-42@http.bugzilla.open-bio.org/>
Message-ID: <200912261008.nBQA85So025649@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2943


------- Comment #16 from mdehoon at ims.u-tokyo.ac.jp  2009-12-26 05:08 EST -------
(In reply to comment #15)
> "I have a well-designed MMCIF parser written independently": Very interesting!
Actually I wrote "I *rather* have....". I don't have an MMCIF parser myself; I
was referring to your parser.

Btw, could you add a test case for the MMCIF parser (using some small data file
that we can include with the Biopython distribution)? Tests are not just
important to make sure everything works; often they are a very good example of
how the code works.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From eric.talevich at gmail.com  Mon Dec 28 20:51:40 2009
From: eric.talevich at gmail.com (Eric Talevich)
Date: Mon, 28 Dec 2009 17:51:40 -0800
Subject: [Biopython-dev] Code review request for phyloxml branch
In-Reply-To: <3f6baf360909232048u54a63ce5q2adbd0e18ebd7036@mail.gmail.com>
References: <3f6baf360909232048u54a63ce5q2adbd0e18ebd7036@mail.gmail.com>
Message-ID: <3f6baf360912281751g5152a945p951dbbbcbffbddb1@mail.gmail.com>

Hi folks,

Here's an update on the status of Bio.Tree and TreeIO. I think I've taken
care of most of the blockers since the last review in September.

First, some links:
http://github.com/etal/biopython/tree/phyloxml/Bio/Tree/
http://github.com/etal/biopython/tree/phyloxml/Bio/TreeIO/
http://github.com/etal/biopython/tree/phyloxml/Tests/test_PhyloXML.py
http://github.com/etal/biopython/tree/phyloxml/Tests/test_Tree.py
http://biopython.org/wiki/PhyloXML

Discussion:

*TreeIO*
Conversion between Nexus, Newick and phyloXML tree file formats works; the
read/parse/write functions for each IO format use the same object types.
Neat!

The tree annotations (e.g. id) aren't preserved perfectly during conversions
-- I'll keep working on this, but I don't think it's a blocker. The taxon
names of terminal nodes are kept as "clade" names in phyloXML for
round-tripping. Tree topology and branch lengths seem OK.

Under the hood:
-- PhyloXMLIO is from GSoC
-- NewickIO is ported from the Bio.Nexus.Trees parser. I think it works the
same way.
-- NexusIO relies on Bio.Nexus.Nexus for parsing, then converts the
resulting Nexus.Trees.Tree objects to Bio.Tree.Newick objects. One day, when
Nexus.Trees is replaced by NewickIO in the main Nexus parser, then this
conversion can be dropped and NexusIO will be very simple.

*Tree*
The BaseTree object structure looks like this:

-- BaseTree.Tree contains global tree information, like whether the tree
is rooted, and a reference to the root clade. The phyloXML Phylogeny object
inherits from this.

-- BaseTree.Subtree contains local (clade- or node-specific) information,
and references to each of its direct descendents, recursively. The phyloXML
Clade object inherits from this. Nodes are implicit. I could add references
to the ancestor of each sub-tree without too much difficulty, but I haven't
needed them yet.

The same methods (get_terminals et al.) generally apply to both classes, so
I created a separate TreeMixin class from which both BaseTree.Tree and
BaseTree.Subtree inherit.
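
A minimal sketch of that mixin arrangement, for illustration only. The
class names follow the description above, but the attribute names
(root, rooted, clades) and the _start hook are assumptions, not the
code on the phyloxml branch:

```python
class TreeMixin(object):
    """Traversal methods shared by whole trees and by subtrees."""

    def _start(self):
        # Hypothetical hook: the clade at which traversal begins.
        raise NotImplementedError

    def get_terminals(self):
        """Return the leaf clades in left-to-right order."""
        terminals = []
        stack = [self._start()]
        while stack:
            clade = stack.pop()
            if clade.clades:
                # Push children reversed so the leftmost is visited first.
                stack.extend(reversed(clade.clades))
            else:
                terminals.append(clade)
        return terminals


class Subtree(TreeMixin):
    """Local (clade-specific) data plus references to direct descendants."""

    def __init__(self, name=None, branch_length=None, clades=None):
        self.name = name
        self.branch_length = branch_length
        self.clades = clades or []

    def _start(self):
        return self


class Tree(TreeMixin):
    """Global tree data plus a reference to the root clade."""

    def __init__(self, root, rooted=False):
        self.root = root
        self.rooted = rooted

    def _start(self):
        return self.root
```

Because each class only has to say where traversal starts, a method
written once on the mixin works on a whole tree and on any clade.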

Bio.Tree.Newick contains simple subclasses of Tree and Subtree, and an
incomplete set of shims that track Bio.Nexus.Trees.Tree (minus the I/O).
This is to ease the deprecation and eventual replacement of Bio.Nexus.Trees,
as I imagine it:
(1) Port methods from Nexus.Trees to Bio.Tree, simplifying arguments where
reasonable (since the node IDs and adjacency list lookup are no longer
needed)
(2) Implement methods in Bio.Tree.Newick with the original argument lists,
but triggering a deprecation warning indicating the newer replacement method
(3) Replace Nexus.Trees with an import of Bio.Tree.Newick(IO) and a few more
shims to duplicate the original API -- so test_Nexus.py should still pass,
ideally (with deprecation warnings)
(4) In Nexus.Nexus, replace all usage of Nexus.Trees with proper usage of
NexusIO and Bio.Tree methods.
(5) Eventually delete Nexus.Trees and the shims in Bio.Tree.Newick.
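
Step (2) could look roughly like the sketch below. Both classes are
stand-ins written for this example, and the method names are only
illustrative of an old-style call being forwarded to its replacement:

```python
import warnings


class Tree(object):
    """Stand-in for the new-style tree class."""

    def __init__(self, terminal_names):
        self._terminal_names = list(terminal_names)

    def get_terminals(self):
        return list(self._terminal_names)


class NewickTree(Tree):
    """Stand-in for the compatibility subclass carrying the shims."""

    def get_taxa(self, node_id=None):
        """Old-style method kept with its original argument list."""
        # node_id is accepted only for signature compatibility; this
        # sketch ignores it and always reports the whole tree.
        warnings.warn(
            "get_taxa is deprecated; use get_terminals instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.get_terminals()
```

Passing stacklevel=2 makes the warning point at the caller's line
rather than at the shim itself.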

I'm currently doing (1) and (2), with more emphasis on getting (1) right.
Not all of the important methods have been ported, but I'm happy with the
tree traversal methods.

*Tests*
I created test_Tree.py to test the methods in Bio.Tree.BaseTree;
test_PhyloXML.py tests Bio.Tree.PhyloXML objects and Bio.TreeIO.PhyloXMLIO
parsing/writing.

I noticed that in Tests/Nexus/, the example file for internal node labels is
actually in Newick/NH format, not Nexus. That was briefly confusing, so
maybe that file should be renamed.

What do you think?

All the best,
Eric

From biopython at maubp.freeserve.co.uk  Tue Dec  1 19:34:19 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 1 Dec 2009 19:34:19 +0000
Subject: [Biopython-dev] Fwd: [Utilities-announce] NCBI E-Utility Policy
	Change
In-Reply-To: <320fb6e00912011129j68dda3b2p6df9a232f0462458@mail.gmail.com>
References: <7B6F170840CA6C4DA63EE0C8A7BB43EC09CA7387@NIHCESMLBX15.nih.gov>
	<320fb6e00912011129j68dda3b2p6df9a232f0462458@mail.gmail.com>
Message-ID: <320fb6e00912011134u2481644aw5dfdfe9f9a3049f0@mail.gmail.com>

Hi all,

Attention NCBI Entrez users - the NCBI really do want you to include
your email address, and it will be mandatory in future! See below...

If using Bio.Entrez, the tool parameter will by default be set to
Biopython, but the email is omitted. We already encourage the email
to be included in our documentation but given the new NCBI guidance
I'd suggest we make omitting the email issue a warning in the next
release (and an error in the subsequent release of Biopython?).
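
A rough sketch of the suggested behaviour, assuming a hypothetical
helper build_eutils_url that is not part of Bio.Entrez. It always sends
the tool parameter and warns when no email is given:

```python
import warnings
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_eutils_url(cgi, params, email=None, tool="Biopython"):
    """Build an E-utility URL, warning if the email parameter is missing.

    Hypothetical helper for illustration; no request is sent.
    """
    query = dict(params)
    query["tool"] = tool
    if email:
        query["email"] = email
    else:
        warnings.warn(
            "No email address given; NCBI will require one from June 1, 2010",
            UserWarning,
            stacklevel=2,
        )
    # Sorting keeps the parameter order stable between runs.
    return EUTILS_BASE + cgi + "?" + urlencode(sorted(query.items()))
```

Turning the warning into an error in a later release would only mean
replacing the warnings.warn call with a raise.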

Peter


---------- Forwarded message ----------
From: <utilities-announce at ncbi.nlm.nih.gov>
Date: Tue, Dec 1, 2009 at 6:59 PM
Subject: [Utilities-announce] NCBI E-Utility Policy Change
To: utilities-announce at ncbi.nlm.nih.gov


As part of an ongoing effort to ensure efficient access to the Entrez
Utilities (E-utilities) by all users, NCBI has decided to change the
usage policy for the E-utilities effective June 1, 2010. Effective on
June 1, 2010, all E-utility requests, either using standard URLs or
SOAP, must contain non-null values for both the &tool and &email
parameters. Any E-utility request made after June 1, 2010 that does
not contain values for both parameters will return an error explaining
that these parameters must be included in E-utility requests.


The value of the &tool parameter should be a URI-safe string that is
the name of the software package, script or web page producing the
E-utility request.


The value of the &email parameter should be a valid e-mail address for
the appropriate contact person or group responsible for maintaining
the tool producing the E-utility request.


NCBI uses these parameters to contact users whose use of the
E-utilities violates the standard usage policies described at
http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html#UserSystemRequirements.
These usage policies are designed to prevent excessive requests from a
small group of users from reducing or eliminating the wider
community's access to the E-utilities. NCBI will attempt to contact a
user at the e-mail address provided in the &email parameter prior to
blocking access to the E-utilities.


NCBI realizes that this policy change will require many of our users
to change their code. Based on past experience, we anticipate that
most of our users should be able to make the necessary changes before
the June 1, 2010 deadline. If you have any concerns about making these
changes by that date, or if you have any questions about these
policies, please contact eutilities at ncbi.nlm.nih.gov.


Thank you for your understanding and cooperation in helping us
continue to deliver a reliable and efficient web service.


_______________________________________________
Utilities-announce mailing list
http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce

From chapmanb at 50mail.com  Wed Dec  2 12:57:44 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 2 Dec 2009 07:57:44 -0500
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <320fb6e00911270823g320c7c24pd0773ae8b72902ee@mail.gmail.com>
References: <320fb6e00904060625v4a49da2au76159eae18f707eb@mail.gmail.com>
	<20090406220826.GH43636@sobchak.mgh.harvard.edu>
	<320fb6e00911270823g320c7c24pd0773ae8b72902ee@mail.gmail.com>
Message-ID: <20091202125744.GA46415@sobchak.mgh.harvard.edu>

Hi Peter;

> Brad has some GFF parsing code he has been working on, which
> would be nice to merge into Biopython at some point. See:
> 
> http://lists.open-bio.org/pipermail/biopython-dev/2009-April/005700.html
> 
> As we started to discuss earlier this year, we need to think about
> what to do with the existing (old) Bio.GFF module. This was written
> by Michael Hoffman back in 2002 which accesses MySQL General
> Feature Format (GFF) databases created with BioPerl.
> 
> I've been looking at the old Bio.GFF code, and there are a lot of
> redundant things like its own GenBank/EMBL location parsing,
> plus its own location objects and its own Feature objects (rather
> than reusing Bio.SeqFeature which should have sufficed).

I'm ambivalent on deprecating GFF. Agreed that some of it is not
well integrated with the rest of Biopython, with the
Location/LocationFromString code being the most duplicated. It's too
bad features were reimplemented as well. Is Michael around at all?

> I want to suggest we deprecate Michael Hoffman's Bio.GFF module
> in Biopython 1.53 (I'm hoping we can do this next month, Dec 2009).
> Depending on how soon Brad's code is ready to be merged (which I
> am assuming could be Biopython 1.54, spring 2010), we can perhaps
> accelerate removal of the old module.

The current structure of the GFF code does not require removing what
is currently there. It needs a couple of lines in __init__.py to
expose the useful classes at the top level:

from GFFParser import GFFParser, DiscoGFFParser, GFFExaminer
from GFFOutput import GFF3Writer

and we'd need to move the MySQLdb check to the Connection class so
it's only needed if you are actually using the database code.
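
Moving the check could be sketched as below, written in Python 3 syntax.
The helper require_module and the driver_name attribute are assumptions
made for this example, not the existing Bio.GFF code. The import is
attempted only when a Connection is actually created:

```python
import importlib


def require_module(name, purpose):
    """Import ``name`` on demand, with a clearer error if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError("%s is required for %s" % (name, purpose)) from exc


class Connection(object):
    """Database connection that needs its driver only when instantiated."""

    driver_name = "MySQLdb"  # looked up lazily, not at module import time

    def __init__(self, **connect_kwargs):
        driver = require_module(self.driver_name, "GFF database access")
        self._db = driver.connect(**connect_kwargs)
```

Importing the package then works without MySQLdb installed, and users
of the parser and writer never hit the database dependency.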

So these can happen in parallel. Ideally, I'd like to get the GFF
stuff in sooner rather than later. The main item on my todo list is
finishing the documentation, with the stubs here:

http://biopython.org/wiki/GFF_Parsing

If I crank that out what do we think about putting it in with the
__init__.py modifications I suggested?

Brad


From mjldehoon at yahoo.com  Wed Dec  2 14:29:27 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 2 Dec 2009 06:29:27 -0800 (PST)
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
Message-ID: <317375.58712.qm@web62401.mail.re1.yahoo.com>

--- On Wed, 12/2/09, Brad Chapman <chapmanb at 50mail.com> wrote:
> If I crank that out what do we think about putting it in
> with the __init__.py modifications I suggested?

I'd definitely welcome a GFF parser in Biopython, but I think the current code needs to be simplified and its usage more consistent with other Biopython modules. It's great that the documentation is available. It's a big help in designing the module, in particular what its usage looks like to the user.

Let's start from basic GFF parsing. This is the example in the documentation:

>>> from BCBio.GFF import GFFParser
>>> in_file = "your_file.gff"
>>> parser = GFFParser()
>>> in_handle = open(in_file)
>>> for rec in parser.parse(in_handle):
...    print rec
>>> in_handle.close()

What is the purpose of creating the parser first, and then calling parser.parse on the in_handle? I'd much rather have

>>> from BCBio import GFF
>>> in_file = "your_file.gff"
>>> in_handle = open(in_file)
>>> for rec in GFF.parse(in_handle):
...    print rec
>>> in_handle.close()

which is how most other Biopython parsers work.
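
One way to offer both styles is a thin module-level function that
delegates to the parser class. The sketch below uses Python 3 and a
deliberately tiny stand-in for GFFParser, not the BCBio.GFF
implementation:

```python
class GFFParser(object):
    """Stand-in parser class; the real one does far more than this."""

    def parse(self, handle):
        """Yield the tab-separated fields of each feature line."""
        for line in handle:
            line = line.rstrip("\n")
            if line and not line.startswith("#"):
                yield line.split("\t")


def parse(handle):
    """Module-level convenience function for the simple use case."""
    return GFFParser().parse(handle)
```

Simple scripts call parse(handle) directly, while code that needs extra
options can still create and configure a GFFParser instance.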

--Michiel.


From chapmanb at 50mail.com  Thu Dec  3 14:25:34 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 3 Dec 2009 09:25:34 -0500
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <317375.58712.qm@web62401.mail.re1.yahoo.com>
References: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
	<317375.58712.qm@web62401.mail.re1.yahoo.com>
Message-ID: <20091203142534.GF51407@sobchak.mgh.harvard.edu>

Hi Michiel;

> > If I crank that out what do we think about putting it in
> > with the __init__.py modifications I suggested?
> 
> I'd definitely welcome a GFF parser in Biopython, but I think the
> current code needs to be simplified and its usage more consistent
> with other Biopython modules. It's great that the documentation is
> available. It's a big help in designing the module, in particular what
> its usage looks like to the user.

Awesome. I welcome these suggestions; it's really helpful to have
fresh eyes looking at it. Hopefully moving it into Biopython will
stimulate that.

> Let's start from basic GFF parsing. This is the example in the documentation:
[...]
> What is the purpose of creating the parser first, and then calling
> parser.parse on the in_handle? I'd much rather have
>
> >>> from BCBio import GFF
> >>> in_file = "your_file.gff"
> >>> in_handle = open(in_file)
> >>> for rec in GFF.parse(in_handle):
> ...    print rec
> >>> in_handle.close()

Great -- done for parsing and writing and committed to GitHub. The
documentation is updated as well.

Happy to get other comments and thoughts. Thanks again,
Brad


From biopython at maubp.freeserve.co.uk  Thu Dec  3 14:53:44 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 3 Dec 2009 14:53:44 +0000
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <20091203142534.GF51407@sobchak.mgh.harvard.edu>
References: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
	<317375.58712.qm@web62401.mail.re1.yahoo.com>
	<20091203142534.GF51407@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com>

On Thu, Dec 3, 2009 at 2:25 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Great -- done for parsing and writing and committed to GitHub. The
> documentation is updated as well.
>
> Happy to get other comments and thoughts. Thanks again,
>

I understand that GFF files are complex, and a simple "record
iterator" isn't flexible enough to cover all use cases - hence the
need for a complex parser class. That said, Michiel is right that
GFF.parse() or GFF.read() functions would be consistent with
other bits of Biopython, and would provide for the simple use
cases.

Looking at your code, BCBio.GFF.parse(...) would return
SeqRecord objects (with SeqFeatures). That seems
redundant to me as one expect people to just use
Bio.SeqIO.parse(handle, "gff3") instead. I would instead
have expected BCBio.GFF.parse(...) to iterate over the
features in a GFF file.

Also, and we'd touched on this before - I'd much prefer to
have the GFF module quite "low level" using either new
GFF-specific classes or simple Python objects (e.g. for
each feature use a tuple of ints and strings for the first
feature columns plus a dict for the final extendible
column of annotation).
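
As an illustration of that low-level representation, here is a sketch
in Python 3 of turning one GFF3 line into a tuple plus a dict. The
function name parse_gff3_line is hypothetical, and the sketch assumes
the nine tab-separated GFF3 columns with "." for missing values:

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Split one GFF3 feature line into (columns, attributes)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    seqid, source, ftype, start, end, score, strand, phase, attr_text = fields
    # First eight columns as a plain tuple of strings, ints and floats.
    columns = (
        seqid,
        source,
        ftype,
        int(start),
        int(end),
        None if score == "." else float(score),
        strand,
        None if phase == "." else int(phase),
    )
    # Final column: key=value pairs separated by ";", values by ",".
    attributes = {}
    if attr_text != ".":
        for pair in attr_text.split(";"):
            if not pair:
                continue
            key, _, value = pair.partition("=")
            attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return columns, attributes
```

A SeqRecord/SeqFeature conversion layer could then be built on top of
these plain objects without the parser needing to know about it.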

From a technical point of view, a justification for this
separation is the GFF details are not a perfect fit to the
SeqRecord and SeqFeature objects and forcing their
use adds unnecessary overheads for people wanting
to work directly with the features themselves.

Also, by splitting the code into basic parsing and a
SeqRecord/SeqFeature conversion layer (which I
would put in Bio/SeqIO/GffIO.py) we can add the
code in two steps (first GFF parsing, then SeqIO
support).

I think this split is useful as this is a very big job to do
properly: Once we have GFF to SeqRecord parsing,
we need to try and ensure that it is compatible with the
GenBank to SeqRecord parsing. This is important as
we would in effect be extending Biopython to allow
GFF3 to GenBank conversions. For testing all this,
we can grab the same data in the two file formats
(e.g. from the NCBI) and perhaps also use EMBOSS.

You may recall we talked to Peter Rice (from EMBOSS)
about this - there are some important issues here like
ontology mapping where we should be able to reuse a
lot of the work EMBOSS has already done (and use the
EMBOSS tools to help validate our mapping).

In short: I may be being overly cautious, but although
adding GFF parsing and GFF to SeqRecord mapping
is very important, it is also very complex.
Breaking this into a two-stage task makes
managing and testing it easier, and also seems
a good idea for the code itself.

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Dec  3 15:03:29 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Dec 2009 10:03:29 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912031503.nB3F3Tu8013049@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-03 10:03 EST -------
Brad,

Now that Chris at BioPerl is interested, I am confident
we can get the SQLite schema into BioSQL in the near future:
http://lists.open-bio.org/pipermail/biosql-l/2009-November/001655.html

Do you want to update your patch (if needed) and put this
up on a Biopython branch in github? How soon do you think
it could be ready to merge? It would be nice to have this
in the next release (even if we put a big "beta" warning in).

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From biopython at maubp.freeserve.co.uk  Thu Dec  3 15:30:54 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 3 Dec 2009 15:30:54 +0000
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
References: <320fb6e00904060625v4a49da2au76159eae18f707eb@mail.gmail.com>
	<20090406220826.GH43636@sobchak.mgh.harvard.edu>
	<320fb6e00911270823g320c7c24pd0773ae8b72902ee@mail.gmail.com>
	<20091202125744.GA46415@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00912030730rb66c2dav1993465ba25f9f5f@mail.gmail.com>

On Wed, Dec 2, 2009 at 12:57 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi Peter;
>
>> Brad has some GFF parsing code he has been working on, which
>> would be nice to merge into Biopython at some point. See:
>>
>> http://lists.open-bio.org/pipermail/biopython-dev/2009-April/005700.html
>>
>> As we started to discuss earlier this year, we need to think about
>> what to do with the existing (old) Bio.GFF module. This was written
>> by Michael Hoffman back in 2002, and accesses MySQL General
>> Feature Format (GFF) databases created with BioPerl.
>>
>> I've been looking at the old Bio.GFF code, and there are a lot of
>> redundant things like its own GenBank/EMBL location parsing,
>> plus its own location objects and its own Feature objects (rather
>> than reusing Bio.SeqFeature which should have sufficed).
>
> I'm ambivalent on deprecating GFF. Agreed that some of it is not
> well integrated with the rest of Biopython, with the
> Location/LocationFromString code being the most duplicated. It's too
> bad features were reimplemented as well. Is Michael around at all?

I got in touch with Michael Hoffman - he has moved from the EBI to
the University of Washington but his EBI email address still worked.
He said:

"Please feel free to deprecate the module or make any
necessary changes for the project."

Even if you (Brad) didn't have a new GFF parser waiting to be
added to Biopython, I would still want to do something with
Bio.GFF to reduce the redundancy of location and feature code.
Deprecation is the simplest solution (but I may be able to
reuse some of his location string parsing code on Bug 2738).

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Dec  3 15:32:31 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Dec 2009 10:32:31 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing,
	in particular location parsing
In-Reply-To: <bug-2738-42@http.bugzilla.open-bio.org/>
Message-ID: <200912031532.nB3FWV7G013739@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738


------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-03 10:32 EST -------
Note - we may be able to reuse some of the location string parsing ideas in
Bio/GFF/easy.py here too...

Peter




From bugzilla-daemon at portal.open-bio.org  Fri Dec  4 12:31:51 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Dec 2009 07:31:51 -0500
Subject: [Biopython-dev] [Bug 2961] New: Adding undocumented file format
	switches to MUSCLE wrapper
Message-ID: <bug-2961-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961

           Summary: Adding undocumented file format switches to MUSCLE
                    wrapper
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


As discussed on the mailing list, and confirmed with MUSCLE author Robert
Edgar, there are a number of useful command line arguments for things like
PHYLIP output (both interleaved and sequential) which the
Bio.Align.Applications wrapper does not support. See:
http://lists.open-bio.org/pipermail/biopython/2009-December/005881.html

We should add these.




From bugzilla-daemon at portal.open-bio.org  Fri Dec  4 12:50:25 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Dec 2009 07:50:25 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches
	to MUSCLE wrapper
In-Reply-To: <bug-2961-42@http.bugzilla.open-bio.org/>
Message-ID: <200912041250.nB4CoP72007627@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


------- Comment #1 from cymon.cox at gmail.com  2009-12-04 07:50 EST -------
Created an attachment (id=1408)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1408&action=view)
Add PHYLIP output to Muscle command line interface




From bugzilla-daemon at portal.open-bio.org  Fri Dec  4 13:14:08 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Dec 2009 08:14:08 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches
	to MUSCLE wrapper
In-Reply-To: <bug-2961-42@http.bugzilla.open-bio.org/>
Message-ID: <200912041314.nB4DE8aA008792@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1408 is|0                           |1
           obsolete|                            |


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-04 08:14 EST -------
(From update of attachment 1408)
Patch applied.

Should we also add -phyiout, -physout, -htmlout, -msfout, -clwout etc (which
all take a filename)?




From bugzilla-daemon at portal.open-bio.org  Fri Dec  4 13:21:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Dec 2009 08:21:52 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches
	to MUSCLE wrapper
In-Reply-To: <bug-2961-42@http.bugzilla.open-bio.org/>
Message-ID: <200912041321.nB4DLqsd009037@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


------- Comment #3 from cymon.cox at gmail.com  2009-12-04 08:21 EST -------
(In reply to comment #2)
> (From update of attachment 1408 [details])
> Patch applied.
> 
> Should we also add -phyiout, -physout, -htmlout, -msfout, -clwout etc (which
> all take a filename)?

! Is there anything else undocumented?

OK, I'll do that asap. I'll also add tests - change test suite to use
subprocess module etc.




From bugzilla-daemon at portal.open-bio.org  Fri Dec  4 13:36:11 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Dec 2009 08:36:11 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches
	to MUSCLE wrapper
In-Reply-To: <bug-2961-42@http.bugzilla.open-bio.org/>
Message-ID: <200912041336.nB4DaBvS009365@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-04 08:36 EST -------
(In reply to comment #3)
> (In reply to comment #2)
> > (From update of attachment 1408 [details] [details])
> > Patch applied.
> > 
> > Should we also add -phyiout, -physout, -htmlout, -msfout, -clwout etc (which
> > all take a filename)?
> 
> ! Is there anything else undocumented?

Robert did imply there could be other things as his documentation
was out of sync with the code :(

These are of limited value given you can use "-phyi -out filename.phy"
as an alternative to "-phyiout filename.phy"; however, one bonus feature
is that these options allow you to get multiple output files in one run (e.g.
both HTML output to inspect visually and ClustalW output to parse).
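
A rough sketch of how several of these output switches could be combined
into one MUSCLE invocation. The helper name muscle_command is hypothetical
and is not part of Bio.Align.Applications; the switch names are the ones
listed in comment #2, and "-in" is assumed to be the input switch. The code
only builds the argument list and does not run MUSCLE.

```python
def muscle_command(infile, outputs):
    """Build a MUSCLE argument list.

    outputs maps a switch name (without the leading dash) to a filename,
    e.g. {"htmlout": "aln.html"}. Unknown switch names are rejected.
    """
    allowed = ("phyiout", "physout", "htmlout", "msfout", "clwout")
    unknown = sorted(set(outputs) - set(allowed))
    if unknown:
        raise ValueError("unsupported output switches: %s" % ", ".join(unknown))
    cmd = ["muscle", "-in", infile]
    for flag in allowed:  # fixed order keeps the command line reproducible
        if flag in outputs:
            cmd.extend(["-" + flag, outputs[flag]])
    return cmd


cmd = muscle_command("seqs.fasta", {"htmlout": "aln.html", "clwout": "aln.aln"})
print(" ".join(cmd))
```

The resulting list could then be handed to subprocess.call(cmd) if MUSCLE is
installed and on the PATH.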

> OK, I'll do that asap. I'll also add tests - change test suite to use
> subprocess module etc.

I'd forgotten about that (using subprocess rather than generic_run
in our unit tests). Could you do that as a separate patch please?

Thanks,

Peter




From chapmanb at 50mail.com  Fri Dec  4 13:40:10 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Fri, 4 Dec 2009 08:40:10 -0500
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com>
References: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
	<317375.58712.qm@web62401.mail.re1.yahoo.com>
	<20091203142534.GF51407@sobchak.mgh.harvard.edu>
	<320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com>
Message-ID: <20091204134010.GK51407@sobchak.mgh.harvard.edu>

Hi all;
Peter, thanks for the feedback. Thoughts below.

> Looking at your code, BCBio.GFF.parse(...) would return
> SeqRecord objects (with SeqFeatures). That seems
> redundant to me, as one would expect people to just use
> Bio.SeqIO.parse(handle, "gff3") instead. I would instead
> have expected BCBio.GFF.parse(...) to iterate over the
> features in a GFF file.

This would work for simple cases, but for most real life cases you
will likely want to limit the file to a subset of things you are
interested in. It helps reduce memory problems, and is equivalent to
a track system view in UCSC or Ensembl. I find it very useful for
all of the work I've done with it.

We could use SeqIO here, but then there is the issue of passing
along the additional arguments. The simplicity of SeqIO is really
nice, so not sure if we'd want to clutter SeqIO with it.

So we could support basic parsing in SeqIO, but it would be useful to
have this GFF specific parsing as the additional arguments will be a
regular use case.

> Also, and we'd touched on this before - I'd much prefer to
> have the GFF module quite "low level" using either new
> GFF-specific classes or simple Python objects (e.g. for
> each feature use a tuple of ints and strings for the first
> feature columns plus a dict for the final extendible
> column of annotation).

Yes, it is implemented this way. The parse_simple function returns
a line by line parse of the file as a dictionary, which is then used
to build up the SeqFeature objects:

http://github.com/chapmanb/bcbb/blob/master/gff/BCBio/GFF/GFFParser.py

We can document and flesh that out, although I'm not really sure how
useful it will be. It's pretty easy to build your own simple
line-by-line GFF parser; the only advantage of this code over a
home-brew is that it handles tricky annotation cases.

For all of my uses, the real win was being able to build up the
multiple transcript exon/intron structures from the file. This is
not trivial to do on your own, and the real win of the code is in
handling this, especially for older GFF2 and GTF formatted files.

> From a technical point of view, a justification for this
> separation is the GFF details are not a perfect fit to the
> SeqRecord and SeqFeature objects and forcing their
> use adds unnecessary overheads for people wanting
> to work directly with the features themselves.

Why are SeqRecord and SeqFeature not appropriate for GFF? We could 
improve them to make things more lightweight, as we discussed
previously, but conceptually the values fit into the framework fine.

> Also, by splitting the code into basic parsing and a
> SeqRecord/SeqFeature conversion layer (which I
> would put in Bio/SeqIO/GffIO.py) we can add the
> code in two steps (first GFF parsing, then SeqIO
> support).

We can do this as is. I'm not suggesting SeqIO support right now,
and want to target getting the GFF parser as is into Biopython.

> I think this split is useful as this is a very big job to do
> properly: Once we have GFF to SeqRecord parsing,
> we need to try and ensure that it is compatible with the
> GenBank to SeqRecord parsing. This is important as
> we would in effect be extending Biopython to allow
> GFF3 to GenBank conversions. For testing all this,
> we can grab the same data in the two file formats
> (e.g. from the NCBI) and perhaps also use EMBOSS.

Do you think GFF to GenBank is a common use case? Agreed that it is
very hard, but this really has less to do with the object
structure in Biopython and more to do with how things 
are represented and named in the original source files. GenBank has
some "consistency" since it is produced mostly by NCBI, but GFF
files are all over the place.

This can be tackled later if someone wants, but right now my goals
are simply:

- Produce Biopython objects from GFF3/GTF/GFF2 files
- Represent nested features
- Allow GFF2/GTF to GFF3 conversion

This should be done with the current code. We can formalize the raw
parse_simple output for the line-by-line if people find it useful,
but otherwise we should leave these bigger projects for down the
line.

Brad


From biopython at maubp.freeserve.co.uk  Fri Dec  4 14:25:40 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 4 Dec 2009 14:25:40 +0000
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <20091204134010.GK51407@sobchak.mgh.harvard.edu>
References: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
	<317375.58712.qm@web62401.mail.re1.yahoo.com>
	<20091203142534.GF51407@sobchak.mgh.harvard.edu>
	<320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com>
	<20091204134010.GK51407@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com>

On Fri, Dec 4, 2009 at 1:40 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi all;
> Peter, thanks for the feedback. Thoughts below.
>
>> Looking at your code, BCBio.GFF.parse(...) would return
>> SeqRecord objects (with SeqFeatures). That seems
>> redundant to me, as one would expect people to just use
>> Bio.SeqIO.parse(handle, "gff3") instead. I would instead
>> have expected BCBio.GFF.parse(...) to iterate over the
>> features in a GFF file.
>
> This would work for simple cases, but for most real life cases you
> will likely want to limit the file to a subset of things you are
> interested in. It helps reduce memory problems, and is equivalent to
> a track system view in UCSC or Ensembl. I find it very useful for
> all of the work I've done with it.

Understood - a feature-returning Bio.GFF.parse() function could
take various arguments, or for full flexibility, the user can use the
parser object directly.

> We could use SeqIO here, but then there is the issue of passing
> along the additional arguments. The simplicity of SeqIO is really
> nice, so not sure if we'd want to clutter SeqIO with it.
>
> So we could support basic parsing in SeqIO, but it would be useful to
> have this GFF specific parsing as the additional arguments will be a
> regular use case.

This is already catered for in that Bio.SeqIO.parse() and read()
don't take arbitrary arguments (currently), but the underlying
Bio.SeqIO.XxxIO.XxxIterator() they invoke may do so. i.e. You
could have Bio.SeqIO.GffIO.GffIterator() and perhaps variants
(e.g. GFF2 vs GFF3) which take filtering arguments.
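
To illustrate what an iterator with filtering arguments might look like,
here is a minimal sketch. Bio.SeqIO.GffIO.GffIterator() is only a proposed
name in this thread; the function gff_features below and its feature_types
and seqids arguments are hypothetical stand-ins, and it yields raw column
lists rather than SeqRecord objects.

```python
def gff_features(lines, feature_types=None, seqids=None):
    """Yield the nine columns of each feature line that passes the filters.

    feature_types and seqids are optional collections; None means no
    filtering on that column. Comment, directive and blank lines are skipped.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # a real parser would probably report malformed lines
        if feature_types is not None and columns[2] not in feature_types:
            continue
        if seqids is not None and columns[0] not in seqids:
            continue
        yield columns


demo = [
    "##gff-version 3\n",
    "chr1\tdemo\tgene\t1\t500\t.\t+\t.\tID=g1\n",
    "chr1\tdemo\texon\t1\t200\t.\t+\t.\tParent=g1\n",
    "chr2\tdemo\tgene\t10\t90\t.\t-\t.\tID=g2\n",
]
genes_on_chr1 = list(gff_features(demo, feature_types={"gene"}, seqids={"chr1"}))
print(genes_on_chr1)
```

Because it is a generator, only the matching lines are held in memory, which
is the point of limiting the parse to a subset of the file.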

>> Also, and we'd touched on this before - I'd much prefer to
>> have the GFF module quite "low level" using either new
>> GFF-specific classes or simple Python objects (e.g. for
>> each feature use a tuple of ints and strings for the first
>> feature columns plus a dict for the final extendible
>> column of annotation).
>
> Yes, it is implemented this way. The parse_simple function returns
> a line by line parse of the file as a dictionary, which is then used
> to build up the SeqFeature objects:
>
> http://github.com/chapmanb/bcbb/blob/master/gff/BCBio/GFF/GFFParser.py
>
> We can document and flesh that out, although I'm not really sure how
> useful it will be. It's pretty easy to build your own simple
> line-by-line GFF parser; the only advantage of this code over a
> home-brew is that it handles tricky annotation cases.

I still think it would be useful to have Bio/GFF/Parser.py (or
similar) as the low level parser, and Bio/SeqIO/GffIO.py (or
similar) to turn this into SeqRecord and SeqFeature objects.

> For all of my uses, the real win was being able to build up the
> multiple transcript exon/intron structures from the file. This is
> not trivial to do on your own, and the real win of the code is in
> handling this, especially for older GFF2 and GTF formatted files.
>
>> From a technical point of view, a justification for this
>> separation is the GFF details are not a perfect fit to the
>> SeqRecord and SeqFeature objects and forcing their
>> use adds unnecessary overheads for people wanting
>> to work directly with the features themselves.
>
> Why are SeqRecord and SeqFeature not appropriate for GFF? We could
> improve them to make things more lightweight, as we discussed
> previously, but conceptually the values fit into the framework fine.

It is the nested features that worry me. Perhaps the existing
location operator (e.g. "join") could be set to something
like "parent/child" if the sub-features list is used to hold child
features rather than the elements of a join? We need
the GenBank output code etc to be able to tell these
apart reliably.
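
One way to keep the two cases apart, sketched here with plain dicts rather
than SeqFeature objects, is to record the parent/child relationship in its
own mapping instead of reusing the structure that holds join elements. The
function name build_feature_tree and the "id"/"parent" keys are assumptions
for illustration only, loosely following the GFF3 ID and Parent attributes.

```python
def build_feature_tree(features):
    """Group features by parent.

    features is an iterable of dicts with "id" and "parent" keys (either may
    be None). Returns (top_level, children), where top_level is the list of
    features without a parent and children maps a parent id to its child
    features in input order.
    """
    top_level = []
    children = {}
    for feature in features:
        parent = feature.get("parent")
        if parent is None:
            top_level.append(feature)
        else:
            children.setdefault(parent, []).append(feature)
    return top_level, children


features = [
    {"id": "gene1", "parent": None, "type": "gene"},
    {"id": "mRNA1", "parent": "gene1", "type": "mRNA"},
    {"id": None, "parent": "mRNA1", "type": "exon"},
    {"id": None, "parent": "mRNA1", "type": "exon"},
]
top_level, children = build_feature_tree(features)
print([f["id"] for f in top_level])
print(sorted(children))
```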

>> Also, by splitting the code into basic parsing and a
>> SeqRecord/SeqFeature conversion layer (which I
>> would put in Bio/SeqIO/GffIO.py) we can add the
>> code in two steps (first GFF parsing, then SeqIO
>> support).
>
> We can do this as is. I'm not suggesting SeqIO support right now,
> and want to target getting the GFF parser as is into Biopython.

My point is that the moment you include GFF -> SeqRecord
code (even if not explicitly via the Bio.SeqIO namespace)
it opens us up to people giving these SeqRecord objects
to SeqIO for output (e.g. as GenBank).

>> I think this split is useful as this is a very big job to do
>> properly: Once we have GFF to SeqRecord parsing,
>> we need to try and ensure that it is compatible with the
>> GenBank to SeqRecord parsing. This is important as
>> we would in effect be extending Biopython to allow
>> GFF3 to GenBank conversions. For testing all this,
>> we can grab the same data in the two file formats
>> (e.g. from the NCBI) and perhaps also use EMBOSS.
>
> Do you think GFF to GenBank is a common use case?

I suspect it's something I'd want to do when working with
new genome annotations. GeneMark produces GFF, while
Prodigal produces (simple) GenBank. The SOLiD pipeline
corona produces GFF. Sometimes you can get both: the
tool RAST outputs GenBank, GFF, GTF and EMBL files.

> Agreed that it is very hard, but this really has less to do
> with the object structure in Biopython and more to do
> with how things are represented and named in the
> original source files. GenBank has some "consistency"
> since it is produced mostly by NCBI, but GFF files are
> all over the place.
>
> This can be tackled later if someone wants, but right
> now my goals are simply:
>
> - Produce Biopython objects from GFF3/GTF/GFF2 files
> - Represent nested features
> - Allow GFF2/GTF to GFF3 conversion
>
> This should be done with the current code. We can
> formalize the raw parse_simple output for the line-by-line
> if people find it useful, but otherwise we should leave
> these bigger projects for down the line.

Worthy goals, but if by "Produce Biopython objects from
GFF3/GTF/GFF2 files" you mean SeqRecords with
SeqFeatures, (as I said above) we are opening up the
GFF to GenBank can of worms. There is no "later" :(

Peter


From mjldehoon at yahoo.com  Sat Dec  5 15:54:19 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 5 Dec 2009 07:54:19 -0800 (PST)
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com>
Message-ID: <983129.2133.qm@web62408.mail.re1.yahoo.com>

I didn't realize that the GFF parser returns SeqRecords. I agree with Peter that a parser returning SeqRecords should be accessed through Bio.SeqIO, while a lower-level parser can live in Bio.GFF.

--Michiel



From MatatTHC at gmx.de  Sun Dec  6 14:18:40 2009
From: MatatTHC at gmx.de (Matthias Bernt)
Date: Sun, 06 Dec 2009 15:18:40 +0100
Subject: [Biopython-dev] Genetic Code
Message-ID: <20091206141840.67400@gmx.net>

Hi, 

The genetic codes you provide in Bio.Data.CodonTable are somewhat out of date. E.g. in the mitochondrial echinoderm (id 9) genetic code one start codon is missing. 

Regards, 
Matthias


From biopython at maubp.freeserve.co.uk  Sun Dec  6 14:55:24 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 6 Dec 2009 14:55:24 +0000
Subject: [Biopython-dev] Genetic Code
In-Reply-To: <20091206141840.67400@gmx.net>
References: <20091206141840.67400@gmx.net>
Message-ID: <320fb6e00912060655r75103918w3122f46b3ccb538f@mail.gmail.com>

On Sun, Dec 6, 2009 at 2:18 PM, Matthias Bernt <MatatTHC at gmx.de> wrote:
> Hi,
>
> The genetic codes you provide in Bio.Data.CodonTable are somewhat
> out of date. E.g. in the mitochondrial echinoderm (id 9) genetic code
> one start codon is missing.

Confirmed - could you file a bug please?
http://bugzilla.open-bio.org/enter_bug.cgi?product=Biopython

It looks like we have only got Version 3.4 (based on a visual
inspection), but the latest version is Version 3.9. We should
just need to re-run the script to generate these. Also the
original URL noted in the Biopython source code of
ftp://ncbi.nlm.nih.gov/entrez/misc/data/gc.prt is now
ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt
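
For anyone checking a table by hand, here is a small sketch of how start
codons can be read off a gc.prt entry. It assumes the usual layout of that
file, where each table has a 64-character "sncbieaa" string aligned to
codons in TCAG order, with "M" marking a start codon. The function name
start_codons is hypothetical and this is not the Biopython script referred
to above; the example string is built to match the standard table's TTG,
CTG and ATG starts.

```python
BASES = "TCAG"


def start_codons(sncbieaa):
    """Return the codons flagged with "M" in a 64-character start string.

    Codons are enumerated with the first base varying slowest and the third
    base fastest, each in T, C, A, G order.
    """
    if len(sncbieaa) != 64:
        raise ValueError("expected 64 characters, got %d" % len(sncbieaa))
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    return [codon for codon, flag in zip(codons, sncbieaa) if flag == "M"]


# Start flags at positions 3, 19 and 35 (assumed to mirror table 1).
standard_starts = "---M" + "-" * 15 + "M" + "-" * 15 + "M" + "-" * 28
print(start_codons(standard_starts))
```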

Peter


From bugzilla-daemon at portal.open-bio.org  Sun Dec  6 15:07:23 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 6 Dec 2009 10:07:23 -0500
Subject: [Biopython-dev] [Bug 2962] New: deprecated generic code
Message-ID: <bug-2962-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2962

           Summary: deprecated generic code
           Product: Biopython
           Version: 1.52
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: MatatTHC at gmx.de


The genetic codes provided in Bio.Data.CodonTable are 
out of date. E.g. in the mitochondrial echinoderm (id 9) 
genetic code one start codon is missing.

It looks like we have only got Version 3.4 (based on a visual
inspection), but the latest version is Version 3.9. We should
just need to re-run the script to generate these. Also the
original URL noted in the Biopython source code of
ftp://ncbi.nlm.nih.gov/entrez/misc/data/gc.prt is now
ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt




From bugzilla-daemon at portal.open-bio.org  Sun Dec  6 15:07:43 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 6 Dec 2009 10:07:43 -0500
Subject: [Biopython-dev] [Bug 2963] New: deprecated genetic code
Message-ID: <bug-2963-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2963

           Summary: deprecated genetic code
           Product: Biopython
           Version: 1.52
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: MatatTHC at gmx.de


The genetic codes provided in Bio.Data.CodonTable are 
out of date. E.g. in the mitochondrial echinoderm (id 9) 
genetic code one start codon is missing.

It looks like we have only got Version 3.4 (based on a visual
inspection), but the latest version is Version 3.9. We should
just need to re-run the script to generate these. Also the
original URL noted in the Biopython source code of
ftp://ncbi.nlm.nih.gov/entrez/misc/data/gc.prt is now
ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt




From bugzilla-daemon at portal.open-bio.org  Sun Dec  6 15:35:09 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 6 Dec 2009 10:35:09 -0500
Subject: [Biopython-dev] [Bug 2963] deprecated genetic code
In-Reply-To: <bug-2963-42@http.bugzilla.open-bio.org/>
Message-ID: <200912061535.nB6FZ9qY029156@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2963


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-06 10:35 EST -------


*** This bug has been marked as a duplicate of bug 2962 ***




From bugzilla-daemon at portal.open-bio.org  Sun Dec  6 15:35:21 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 6 Dec 2009 10:35:21 -0500
Subject: [Biopython-dev] [Bug 2962] deprecated generic code
In-Reply-To: <bug-2962-42@http.bugzilla.open-bio.org/>
Message-ID: <200912061535.nB6FZL0I029172@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2962


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-06 10:35 EST -------
*** Bug 2963 has been marked as a duplicate of this bug. ***


From bugzilla-daemon at portal.open-bio.org  Sun Dec  6 16:09:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 6 Dec 2009 11:09:28 -0500
Subject: [Biopython-dev] [Bug 2962] deprecated generic code
In-Reply-To: <bug-2962-42@http.bugzilla.open-bio.org/>
Message-ID: <200912061609.nB6G9Sk9030056@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2962


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-06 11:09 EST -------
The NCBI codon tables have been updated from version 3.4 to 3.9, which adds
a few extra start codons, and a few new tables (Tables 16, 21, 22 and 23).
Note that Table 14 which used to be called "Flatworm Mitochondrial" is now
called "Alternative Flatworm Mitochondrial", and "Flatworm Mitochondrial" is
now an alias for Table 9 ("Echinoderm Mitochondrial").
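
To check which start codons differ between two versions of a table, the start
strings from gc.prt can be decoded and compared. This is only a rough sketch
and not the script Biopython uses to generate Bio.Data.CodonTable: the helper
names are made up for illustration, and the two start strings below are built
programmatically as stand-ins rather than copied from gc.prt version 3.4 or 3.9.
It assumes the usual gc.prt layout of 64 characters per table, in T, C, A, G
order at each codon position.

```python
from itertools import product

BASES = "TCAG"  # assumed gc.prt codon order: T, C, A, G at each position
CODONS = ["".join(c) for c in product(BASES, repeat=3)]  # TTT, TTC, ... GGG


def start_codons(starts):
    """Return the codons flagged 'M' in a 64-character start string."""
    if len(starts) != 64:
        raise ValueError("expected 64 characters, got %d" % len(starts))
    return set(codon for codon, flag in zip(CODONS, starts) if flag == "M")


def make_starts(codons):
    """Build a 64-character start string marking the given codons (demo only)."""
    return "".join("M" if c in codons else "-" for c in CODONS)


# Illustrative inputs only; paste the real start strings from the two
# gc.prt versions here instead.
old_starts = make_starts(["ATG"])
new_starts = make_starts(["ATG", "GTG"])

added = start_codons(new_starts) - start_codons(old_starts)
print(sorted(added))  # prints ['GTG'] for the illustrative inputs above
```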

See:
http://github.com/biopython/biopython/commit/74ba9d295b2cd6c6fa6862e91f1e1e59300deeb6

Marking as fixed - but feel free to reopen this if I missed anything.

Thanks!

Peter


From biopython at maubp.freeserve.co.uk  Sun Dec  6 16:11:08 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 6 Dec 2009 16:11:08 +0000
Subject: [Biopython-dev] Genetic Code
In-Reply-To: <320fb6e00912060655r75103918w3122f46b3ccb538f@mail.gmail.com>
References: <20091206141840.67400@gmx.net>
	<320fb6e00912060655r75103918w3122f46b3ccb538f@mail.gmail.com>
Message-ID: <320fb6e00912060811x1fc336ech6245221741372c62@mail.gmail.com>

On Sun, Dec 6, 2009 at 2:55 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Confirmed - could you file a bug please?
> http://bugzilla.open-bio.org/enter_bug.cgi?product=Biopython

Thanks - I was expecting to look at this next week, but had
some spare time this afternoon after all. It should be fixed now;
you can grab the latest code and reinstall to test:
http://www.biopython.org/wiki/SourceCode

Peter


From bugzilla-daemon at portal.open-bio.org  Sun Dec  6 17:46:55 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 6 Dec 2009 12:46:55 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches
	to MUSCLE wrapper
In-Reply-To: <bug-2961-42@http.bugzilla.open-bio.org/>
Message-ID: <200912061746.nB6Hkt7x032479@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


------- Comment #5 from cymon.cox at gmail.com  2009-12-06 12:46 EST -------
Created an attachment (id=1409)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1409&action=view)
Patch for output file format options


From bugzilla-daemon at portal.open-bio.org  Sun Dec  6 18:50:08 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 6 Dec 2009 13:50:08 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches
	to MUSCLE wrapper
In-Reply-To: <bug-2961-42@http.bugzilla.open-bio.org/>
Message-ID: <200912061850.nB6Io80P001234@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-06 13:50 EST -------
(In reply to comment #5)
> Created an attachment (id=1409)
>  --> (http://bugzilla.open-bio.org/attachment.cgi?id=1409&action=view)
> Patch for output file format options
> 

Applied with minor changes to the docstrings - Bio.AlignIO will now
accept the default CLUSTALW output from MUSCLE as is. Thanks!

Peter


From bugzilla-daemon at portal.open-bio.org  Sun Dec  6 19:10:01 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 6 Dec 2009 14:10:01 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches
	to MUSCLE wrapper
In-Reply-To: <bug-2961-42@http.bugzilla.open-bio.org/>
Message-ID: <200912061910.nB6JA1p3001668@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


------- Comment #7 from cymon.cox at gmail.com  2009-12-06 14:10 EST -------
Created an attachment (id=1410)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1410&action=view)
Change Application cmdline tests to use subprocess module


From bugzilla-daemon at portal.open-bio.org  Sun Dec  6 19:36:27 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 6 Dec 2009 14:36:27 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches
	to MUSCLE wrapper
In-Reply-To: <bug-2961-42@http.bugzilla.open-bio.org/>
Message-ID: <200912061936.nB6JaRo0002258@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-06 14:36 EST -------
(In reply to comment #7)
> Created an attachment (id=1410)
>  --> (http://bugzilla.open-bio.org/attachment.cgi?id=1410&action=view)
> Change Application cmdline tests to use subprocess module
> 

Lovely - applied as is - thanks again :)

Did you want to add tests for the new MUSCLE output options, or can we close
this bug now Cymon?

Peter


From bugzilla-daemon at portal.open-bio.org  Sun Dec  6 19:43:12 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 6 Dec 2009 14:43:12 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches
	to MUSCLE wrapper
In-Reply-To: <bug-2961-42@http.bugzilla.open-bio.org/>
Message-ID: <200912061943.nB6JhCOd002510@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


------- Comment #9 from cymon.cox at gmail.com  2009-12-06 14:43 EST -------
(In reply to comment #8)
> (In reply to comment #7)
> > Created an attachment (id=1410)
> >  --> (http://bugzilla.open-bio.org/attachment.cgi?id=1410&action=view)
> > Change Application cmdline tests to use subprocess module
> > 
> 
> Lovely - applied as is - thanks again :)
> 
> Did you want to add tests for the new MUSCLE output options, or can we close
> this bug now Cymon?

There is one in the patch called test_with_multiple_output_formats that
writes to stdout, phylip interleaved, and clustalw strict, using the -phyi and
-clwstrict options.
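
For anyone trying the switches by hand, a command line along these lines could
be built and run with the subprocess module. This is a sketch only, not the
test from the patch and not the Biopython wrapper: muscle_command is a
hypothetical helper, "example.fasta" is a placeholder file name, and the "-in"
switch is assumed from the MUSCLE 3.x command line.

```python
import shutil
import subprocess


def muscle_command(infile, fmt=None, exe="muscle"):
    """Build a MUSCLE argument list; fmt is an output-format switch or None."""
    cmd = [exe, "-in", infile]  # "-in" is assumed from MUSCLE 3.x usage
    if fmt is not None:
        cmd.append("-" + fmt)  # e.g. -clwstrict or -phyi
    return cmd


cmd = muscle_command("example.fasta", fmt="clwstrict")
print(" ".join(cmd))

# Only attempt to run if a MUSCLE binary is actually on the PATH.
if shutil.which(cmd[0]):
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    alignment_text = result.stdout.decode()  # alignment written to stdout
```

Passing the arguments as a list avoids any shell quoting problems with file
names that contain spaces.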

I think it can be closed.
Cheers, C.


From bugzilla-daemon at portal.open-bio.org  Sun Dec  6 19:47:11 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 6 Dec 2009 14:47:11 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches
	to MUSCLE wrapper
In-Reply-To: <bug-2961-42@http.bugzilla.open-bio.org/>
Message-ID: <200912061947.nB6JlBHi002609@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-06 14:47 EST -------
(In reply to comment #9)
> > Did you want to add tests for the new MUSCLE output options, or can we
> > close this bug now Cymon?
> 
> There is one in the patch called test_with_multiple_output_formats that
> writes to stdout, phylip interleaved, and clustalw strict, using the -phyi and
> -clwstrict options.

So there is - I missed that. Lovely :)

Marking bug as fixed.
Peter


From bugzilla-daemon at portal.open-bio.org  Mon Dec  7 09:16:42 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 04:16:42 -0500
Subject: [Biopython-dev] [Bug 2964] New: placing x-axis of graph track at
	the bottom or top of the track
Message-ID: <bug-2964-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964

           Summary: placing x-axis of graph track at the bottom or top of
                    the track
           Product: Biopython
           Version: 1.52
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Other
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: Daniel.Nicorici at gmail.com


By default, when one uses the graph track, the axis is placed automatically in
the middle of the track (which is given by the mean of all the values which are
plotted).

It would be great if the x-axis of the graph track could also be placed at the
bottom of the track, with the values plotted accordingly. This would allow one
to plot, for example, the short-read coverage in next-gen sequencing data.


From bugzilla-daemon at portal.open-bio.org  Mon Dec  7 09:48:11 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 04:48:11 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912070948.nB79mBTh022941@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


------- Comment #1 from Daniel.Nicorici at gmail.com  2009-12-07 04:48 EST -------
This feature has been added in:

http://github.com/ndaniel/biopython/tree/x-axis_GenomeDiagram/Bio/Graphics/GenomeDiagram/

Also, a small additional bug has been fixed here: the line/bar graphs are now
drawn from the first element to the last element of the graph, and not from the
origin to the end of the x-axis as they were originally.

One can specify that the x-axis should be drawn at the bottom of the track by
passing the argument x_axis='bottom' to new_track, e.g.
gdt_features=gdd.new_track(2,x_axis='bottom').

Below are two examples: one where the x-axis is drawn in the middle of the
track (as originally done by GenomeDiagram) and one where it is drawn at the
bottom of the track (the new feature added to GenomeDiagram).

====Example_1:_Using_Graph_from_GenomeDiagram_where_the_x-axis_is_at_the_middle_of_track(as_it_is_originally)=============================
import Bio.SeqFeature
import Bio.Graphics.GenomeDiagram
import random

gdd=Bio.Graphics.GenomeDiagram.Diagram('Test diagram')


gdt_features=gdd.new_track(1)
gds_features=gdt_features.new_set()

# Add three features
feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(25,125),strand=+1)
gds_features.add_feature(feature,name="Forward",label=True)

feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(150,250),strand=None)
gds_features.add_feature(feature,name="Forward",label=True)

feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(275,375),strand=-1)
gds_features.add_feature(feature,name="Forward",label=True)

# Add graph
gdt_features=gdd.new_track(2)
gds_features=gdt_features.new_set('graph')
# generate some random values for plotting
coverage=[]
# The zero-valued points at 50, 100, 250 and 400 are needed in order to skip
# the interpolation done by GenomeDiagram for missing values
coverage.append((50,float(0)))
coverage.extend( [ (i, random.uniform(0,100)) for i in xrange(51,100)])
coverage.append((100,float(0)))
coverage.append((250,float(0)))
coverage.extend( [ (i, random.uniform(50,400)) for i in xrange(251,400)])
coverage.append((400,float(0)))
gds_features.new_graph(coverage, 'coverage', style='bar')

gdd.draw(format='linear',orientation='landscape',pagesize='A4',fragments=1,start=1,end=500)
gdd.write("Test_gaph.pdf","pdf")
============================================


====Example_2:_Using_Graph_from_GenomeDiagram_where_x-axis_is_at_the_bottom_of_track=============================
import Bio.SeqFeature
import Bio.Graphics.GenomeDiagram
import random

gdd=Bio.Graphics.GenomeDiagram.Diagram('Test diagram')


gdt_features=gdd.new_track(1)
gds_features=gdt_features.new_set()

# Add three features
feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(25,125),strand=+1)
gds_features.add_feature(feature,name="Forward",label=True)

feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(150,250),strand=None)
gds_features.add_feature(feature,name="Forward",label=True)

feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(275,375),strand=-1)
gds_features.add_feature(feature,name="Forward",label=True)

# Add graph
gdt_features=gdd.new_track(2,x_axis='bottom')
gds_features=gdt_features.new_set('graph')
# generate some random values for plotting
coverage=[]
# The zero-valued points at 50, 100, 250 and 400 are needed in order to skip
# the interpolation done by GenomeDiagram for missing values
coverage.append((50,float(0)))
coverage.extend( [ (i, random.uniform(0,100)) for i in xrange(51,100)])
coverage.append((100,float(0)))
coverage.append((250,float(0)))
coverage.extend( [ (i, random.uniform(50,400)) for i in xrange(251,400)])
coverage.append((400,float(0)))
gds_features.new_graph(coverage, 'coverage', style='bar')

gdd.draw(format='linear',orientation='landscape',pagesize='A4',fragments=1,start=1,end=500)
gdd.write("Test_gaph.pdf","pdf")
============================================
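
The zero-valued points that bracket each run of values in the two examples
could also be generated rather than typed in by hand. The helper below is a
hypothetical sketch (pad_runs is not part of GenomeDiagram), and it assumes
each run is a list of (position, value) tuples at consecutive positions.

```python
def pad_runs(runs):
    """Flatten runs of (position, value) points, bracketing each with zeros.

    A zero point is added one position before and one position after each
    run, so that a plotter which joins neighbouring points does not draw
    across the gap between runs.
    """
    padded = []
    for run in runs:
        if not run:
            continue  # nothing to bracket
        first_pos = run[0][0]
        last_pos = run[-1][0]
        padded.append((first_pos - 1, 0.0))
        padded.extend(run)
        padded.append((last_pos + 1, 0.0))
    return padded


# Two short illustrative runs; real coverage data would go here.
runs = [[(51, 10.0), (52, 12.5)], [(251, 80.0), (252, 75.0)]]
coverage = pad_runs(runs)
```

The resulting list has the same shape as the coverage list passed to
new_graph in the examples.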

Best Regards,
Daniel


From bugzilla-daemon at portal.open-bio.org  Mon Dec  7 10:55:12 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 05:55:12 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912071055.nB7AtCol024504@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-07 05:55 EST -------
I'm guessing you are talking about GenomeDiagram? If so, yes, tracks default to
having the x-axis line at the middle y-value (center or centre=None). Try
setting center to zero when you create the Graph object. If you could give a
cut down example it would be easier to help.

Peter


From biopython at maubp.freeserve.co.uk  Mon Dec  7 11:34:11 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 7 Dec 2009 11:34:11 +0000
Subject: [Biopython-dev] Biopython git access for Cymon
Message-ID: <320fb6e00912070334m311dd287r4a20f1e399413adc@mail.gmail.com>

Dear all,

It is a little overdue, but I'm pleased to announce Cymon Cox
now has write access to the Biopython repository.

Cymon has made contributions to Biopython over many years,
initially with the modules Bio.Nexus and Bio.Sequencing
(together with Frank Kauff), and more recently with
improvements to our BioSQL wrappers (especially on
PostgreSQL) and his recent work on alignment wrappers.

I'd previously talked to Cymon about giving him CVS access,
and he said we might as well wait until after the git transition.
I've just checked in a few patches on his behalf (alignment tool
wrappers), which served to remind me of this - it would have
saved me some work to just say "Yes, please check that in" ;)

On behalf of the Biopython project, welcome (fully) to the
development team Cymon, and thanks again for all your
work to date.

Regards,

Peter


From bugzilla-daemon at portal.open-bio.org  Mon Dec  7 11:38:27 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 06:38:27 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912071138.nB7BcROx026201@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


------- Comment #3 from Daniel.Nicorici at gmail.com  2009-12-07 06:38 EST -------
(In reply to comment #2)
> I'm guessing you are talking about GenomeDiagram? If so, yes, tracks default to
> having the x-axis line at the middle y-value (center or centre=None). Try
> setting 
> center to zero when you create the Graph object. If you could give a cut down
> example it would be easier to help.

Yes, I am referring to GenomeDiagram.

If one sets the center to zero, then the lower half of the track (below the
x-axis) is always empty and unused when all values are positive, as is the
case for e.g. GC content or short-read coverage.

The proposed feature allows one to use the entire track for plotting, rather
than only the half of it that is used when center is set to zero.

Best Regards,
Daniel

> 
> Peter
> 


From bugzilla-daemon at portal.open-bio.org  Mon Dec  7 11:48:32 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 06:48:32 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912071148.nB7BmW8A026423@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


------- Comment #4 from Daniel.Nicorici at gmail.com  2009-12-07 06:48 EST -------
Here is the cut down example of what I mean:

=====================================================
import Bio.SeqFeature
import Bio.Graphics.GenomeDiagram
import random

gdd=Bio.Graphics.GenomeDiagram.Diagram('Test diagram')


gdt_features=gdd.new_track(1)
gds_features=gdt_features.new_set()


feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(25,125),strand=+1)
gds_features.add_feature(feature,name="Forward",label=True)

feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(150,250),strand=None)
gds_features.add_feature(feature,name="Forward",label=True)

feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(275,375),strand=-1)
gds_features.add_feature(feature,name="Forward",label=True)

# Add graph
gdt_features=gdd.new_track(2)
gds_features=gdt_features.new_set('graph')
# generate some random values for plotting
coverage=[]
coverage.append((50,float(0)))
coverage.extend( [ (i, random.uniform(0,100)) for i in xrange(51,100)])
coverage.append((100,float(0)))
coverage.append((250,float(0)))
coverage.extend( [ (i, random.uniform(50,400)) for i in xrange(251,400)])
coverage.append((400,float(0)))
gds_features.new_graph(coverage, 'coverage', style='bar',center=0)

gdd.draw(format='linear',orientation='landscape',pagesize='A4',fragments=1,start=1,end=500)
gdd.write("Test_gaph.pdf","pdf")
===========================================

The values plotted here are in the range 0 to 400, but GenomeDiagram's y-axis
range is from -400 to 400 when center is set to 0. A y-axis range of -n to n
is a really odd choice when all the values which are to be plotted are in the
range 0 to n.
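
The arithmetic behind this can be sketched in a few lines. This is not
GenomeDiagram's actual code, only an illustration of the behaviour described
above; both function names are hypothetical.

```python
def symmetric_range(values, center):
    """Y-range that keeps `center` in the middle of the track."""
    half = max(abs(v - center) for v in values)
    return (center - half, center + half)


def bottom_anchored_range(values):
    """Y-range with the x-axis at the lowest value, using the full track."""
    return (min(values), max(values))


values = [0.0, 120.0, 400.0]
print(symmetric_range(values, center=0))  # (-400.0, 400.0)
print(bottom_anchored_range(values))      # (0.0, 400.0)
```

With all values in 0 to 400, the symmetric range spans 800 units and half of
it can never be used, while the bottom-anchored range spans only the 400 units
that the data occupy.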

The feature proposed here allows the entire track to be used instead of only
half of it, and also gives a better range for the y-axis.


(In reply to comment #2)
> I'm guessing you are talking about GenomeDiagram? If so, yes, tracks default to
> having the x-axis line at the middle y-value (center or centre=None). Try
> setting 
> center to zero when you create the Graph object. If you could give a cut down
> example it would be easier to help.
> 
> Peter
> 


From bugzilla-daemon at portal.open-bio.org  Mon Dec  7 11:59:33 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 06:59:33 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track in GenomeDiagram
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912071159.nB7BxXs5026717@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
            Summary|placing x-axis of graph     |placing x-axis of graph
                   |track at the bottom or top  |track at the bottom or top
                   |of the track                |of the track in
                   |                            |GenomeDiagram


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-07 06:59 EST -------
When I wrote comment 2, I hadn't seen comment 1 with the github link and
examples.

Leighton and I had (some time ago now) chatted about a related enhancement
allowing the user to give the y-limits. With that in mind, it makes sense to
give the x-axis vertical position in terms of a y-coordinate (rather than a few
limited options like top, middle and bottom). This would be more flexible.

Marking this as an enhancement.


From chapmanb at 50mail.com  Mon Dec  7 12:12:45 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 7 Dec 2009 07:12:45 -0500
Subject: [Biopython-dev] Biopython git access for Cymon
In-Reply-To: <320fb6e00912070334m311dd287r4a20f1e399413adc@mail.gmail.com>
References: <320fb6e00912070334m311dd287r4a20f1e399413adc@mail.gmail.com>
Message-ID: <20091207121245.GM51407@sobchak.mgh.harvard.edu>

Hi all;

> It is a little overdue, but I'm pleased to announce Cymon Cox
> now has write access to the Biopython repository.
> 
> Cymon has made contributions to Biopython over many years,
> initially with the modules Bio.Nexus and Bio.Sequencing
> (together with Frank Kauff), and more recently with
> improvements to our BioSQL wrappers (especially on
> PostgreSQL) and his recent work on alignment wrappers.

Awesome. Congrats Cymon and thanks for all your excellent work. Well
deserved.

Brad


From bugzilla-daemon at portal.open-bio.org  Mon Dec  7 12:15:03 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 07:15:03 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track in GenomeDiagram
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912071215.nB7CF3pE027513@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


------- Comment #6 from Daniel.Nicorici at gmail.com  2009-12-07 07:15 EST -------


(In reply to comment #5)
> When I wrote comment 2, I hadn't seen comment 1 with the github link and
> examples.
> 

;-)

> Leighton and I had (some time ago now) chatted about a related enhancement
> allowing the user to give the y-limits.

I think that it is a needed enhancement. Let's see if others think the same! ;-)

> With that in mind, it makes sense to
> give the x-axis vertical position in terms of a y-coordinate (rather than a few
> limited options like top, middle and bottom). This would be more flexible.

This sounds good and I agree that it is more flexible.

Indeed, options like "top, middle, bottom" are limited, but with them the
scaling is still done automatically: the user does not have to know what range
his/her values are in, what the minimum and maximum are, or what axis
position suits all the graphs which he/she wants to generate.

I am sure that this can be done better than I did it.

> 
> Marking this as an enhancement.

Ok.

Daniel


From bugzilla-daemon at portal.open-bio.org  Mon Dec  7 13:03:14 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 08:03:14 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track in GenomeDiagram
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912071303.nB7D3Esa029362@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


------- Comment #7 from lpritc at scri.sari.ac.uk  2009-12-07 08:03 EST -------
(In reply to comment #6)
> 
> (In reply to comment #5)
> > Leighton and I had (some time ago now) chatted about a related enhancement
> > allowing the user to give the y-limits.
> 
> I think that it is a needed enhancement. Let's see if others think the same! ;-)

Oh, it definitely does! ;)  Thank you for taking the time to improve it.

> > With that in mind, it makes sense to
> > give the x-axis vertical position in terms of a y-coordinate (rather than a few
> > limited options like top, middle and bottom). This would be more flexible.
> 
> This sounds good and I agree that it is more flexible.

This is my preferred option.

> Indeed, options like "top, middle, bottom" are limited, but with them the
> scaling is still done automatically: the user does not have to know what range
> his/her values are in, what the minimum and maximum are, or what axis
> position suits all the graphs which he/she wants to generate.
> 
> I am sure that this can be done better than I did it.

By allowing the position of the axis to take any value within the data range,
this still allows 'top', 'middle' and 'bottom' to be defined as functions of
the data with, e.g.

x_axis_pos = min(data)          # bottom
x_axis_pos = max(data)         # middle
x_axis_pos = median(data)   # top

and also allows for explicit placing of the axis at specified points on the
y-axis, or as other points that depend on the data (e.g. mean, quartiles, etc.)
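
As a rough sketch of how both styles could share one argument (the function
names are hypothetical and this is not existing GenomeDiagram code), the named
positions can be resolved to a y-coordinate, taking bottom as the minimum, top
as the maximum and middle as the median of the data, while plain numbers pass
straight through:

```python
def median(data):
    """Median of a non-empty sequence of numbers."""
    ordered = sorted(data)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2.0


def resolve_x_axis_pos(data, position):
    """Turn 'bottom', 'middle', 'top' or a number into a y-coordinate."""
    named = {"bottom": min, "middle": median, "top": max}
    if position in named:
        return named[position](data)
    return float(position)  # explicit y-coordinate supplied by the user


data = [0.0, 10.0, 40.0, 100.0, 400.0]
print(resolve_x_axis_pos(data, "bottom"))  # 0.0
print(resolve_x_axis_pos(data, 25))        # 25.0
```

Other data-dependent positions (mean, quartiles, etc.) would just be further
entries in the same dictionary.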

Cheers,

L.


From bugzilla-daemon at portal.open-bio.org  Mon Dec  7 13:05:11 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 08:05:11 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track in GenomeDiagram
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912071305.nB7D5B22029508@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


------- Comment #8 from lpritc at scri.sari.ac.uk  2009-12-07 08:05 EST -------
(In reply to comment #7)

> x_axis_pos = min(data)          # bottom
> x_axis_pos = max(data)         # middle
> x_axis_pos = median(data)   # top

x_axis_pos = min(data)          # bottom
x_axis_pos = max(data)         # top
x_axis_pos = median(data)   # middle

D'oh!

L.


From bugzilla-daemon at portal.open-bio.org  Mon Dec  7 13:25:29 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 08:25:29 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track in GenomeDiagram
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912071325.nB7DPTSH030274@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


------- Comment #9 from Daniel.Nicorici at gmail.com  2009-12-07 08:25 EST -------

(In reply to comment #8)

Ok.

> (In reply to comment #7)
> 
> > x_axis_pos = min(data)          # bottom
> > x_axis_pos = max(data)         # middle
> > x_axis_pos = median(data)   # top
> 
> x_axis_pos = min(data)          # bottom
> x_axis_pos = max(data)         # top
> x_axis_pos = median(data)   # middle
> 
> D'oh!
> 
> L.
> 


From biopython at maubp.freeserve.co.uk  Mon Dec  7 13:28:10 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 7 Dec 2009 13:28:10 +0000
Subject: [Biopython-dev] Plans for Biopython 1.53
Message-ID: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>

Hi all,

I would like us to do the Biopython 1.53 release this month.

We still have lots of new stuff that hasn't yet landed on the
trunk, but despite that, looking at the NEWS file we have
had plenty of improvements in the two months and a bit
since Biopython 1.52 was released:

http://biopython.open-bio.org/SRC/biopython/NEWS
http://github.com/biopython/biopython/blob/master/NEWS

One good reason for doing Biopython 1.53 soon is the
NCBI said they plan to start using the new Jan 2010 DTD
files for MedLine/PubMed as early as mid December:
http://lists.open-bio.org/pipermail/biopython-dev/2009-November/007020.html

Any comments on how things stand on the trunk - is there
anything people think needs to be fixed before the release?

Thanks,

Peter


From eric.talevich at gmail.com  Mon Dec  7 16:33:30 2009
From: eric.talevich at gmail.com (Eric Talevich)
Date: Mon, 7 Dec 2009 11:33:30 -0500
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
Message-ID: <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>

On Mon, Dec 7, 2009 at 8:28 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> Hi all,
>
> I would like us to do the Biopython 1.53 release this month.
>
> We still have lots of new stuff that hasn't yet landed on the
> trunk, but despite that, looking at the NEWS file we have
> had plenty of improvements in the two months and a bit
> since Biopython 1.52 was released:
>
> http://biopython.open-bio.org/SRC/biopython/NEWS
> http://github.com/biopython/biopython/blob/master/NEWS
>
> One good reason for doing Biopython 1.53 soon is the
> NCBI said they plan to start using the new Jan 2010 DTD
> files for MedLine/PubMed as early as mid December:
> http://lists.open-bio.org/pipermail/biopython-dev/2009-November/007020.html
>
> Any comments on how things stand on the trunk - is there
> anything people think needs to be fixed before the release?
>
>
I'll chime in about the status of the Summer of Code stuff.

For Bio.TreeIO, I've borrowed the Newick tree parsing code from Nexus.Trees
and changed it to construct Bio.Tree objects via Bio.TreeIO.NewickIO -- so
the TreeIO API will work independently of file formats. For Bio.Tree, I'm
about halfway done porting the Nexus tree methods, though it'll go faster
now that the semester's over. (I'll post the details and ask for a code
review soon.)

My phyloxml branch won't be ready to land in time for a December release,
but merging it into the trunk right after that is feasible. That would give
everyone time to try it out and suggest changes before Biopython 1.54
cements the API.

Separately: GitHub says Nick Matzke's BioGeography branch hasn't been
touched since Aug. 19. It will need some love before it can be merged into
the trunk. Is there a plan for this, Peter or Brad? If not, should I try to
rescue it after TreeIO lands?

Cheers,
Eric


From biopython at maubp.freeserve.co.uk  Mon Dec  7 16:48:34 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 7 Dec 2009 16:48:34 +0000
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
	<3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
Message-ID: <320fb6e00912070848i4153ee33w9df5c7df65a4c225@mail.gmail.com>

On Mon, Dec 7, 2009 at 4:33 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
>
> I'll chime in about the status of the Summer of Code stuff.

Thanks

> For Bio.TreeIO, I've borrowed the Newick tree parsing code from Nexus.Trees
> and changed it to construct Bio.Tree objects via Bio.TreeIO.NewickIO -- so
> the TreeIO API will work independently of file formats. For Bio.Tree, I'm
> about halfway done porting the Nexus tree methods, though it'll go faster
> now that the semester's over. (I'll post the details and ask for a code
> review soon.)
>
> My phyloxml branch won't be ready to land in time for a December release,
> but merging it into the trunk right after that is feasible. That would give
> everyone time to try it out and suggest changes before Biopython 1.54
> cements the API.

That is what I was hoping for. Fingers crossed Tiago will be able to
spare some time to go over the basics of the phyloxml and TreeIO
work - more eyes on the code would be great.

> Separately: GitHub says Nick Matzke's BioGeography branch hasn't been
> touched since Aug. 19. It will need some love before it can be merged into
> the trunk. Is there a plan for this, Peter or Brad? If not, should I try to
> rescue it after TreeIO lands?

That sounds good as a tentative plan - Nick may want to be more
involved, but you would be the next logical choice to handle this.

Cheers,

Peter


From bugzilla-daemon at portal.open-bio.org  Mon Dec  7 18:56:20 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 13:56:20 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track in GenomeDiagram
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912071856.nB7IuKI7007552@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


------- Comment #10 from Daniel.Nicorici at gmail.com  2009-12-07 13:56 EST -------


(In reply to comment #7)
> (In reply to comment #6)
> > 
> > (In reply to comment #5)
> > > Leighton and I had (some time ago now) chatted about a related enhancement
> > > allowing the user to give the y-limits.
> > 
> > I think that it needs enhancement. Let's see if others think the same! ;-)
> 
> Oh, it definitely does! ;)  Thank you for taking the time to improve it.
> 
> > > With that in mind, it makes sense to
> > > give the x-axis vertical position in terms of a y-coordinate (rather than a few
> > > limited options like top, middle and bottom). This would be more flexible.
> > 
> > This sounds good and I agree that it is more flexible.
> 
> This is my preferred option.
> 
> > Indeed, options like "top, middle, bottom" are limited, but the scaling
> > is still done automatically, and the user does not have to know what range
> > his/her values are in, what the minimum and maximum are, or what axis
> > position suits all the graphs which he/she wants to generate.
> > 
> > I am sure that this can be done better than I did it.
> 
> By allowing the position of the axis to take any value within the data range,
> this still allows 'top', 'middle' and 'bottom' to be defined as functions of
> the data with, e.g.
> 
> x_axis_pos = min(data)          # bottom
> x_axis_pos = max(data)         # middle
> x_axis_pos = median(data)   # top
> 
> and also allows for explicit placing of the axis at specified points on the
> y-axis, or as other points that depend on the data (e.g. mean, quartiles, etc.)
>

It looks a little bit confusing to me now because I see that there are two
sides of the problem (or two bugs?), as follows:
1) drawing a line orthogonal to the y-axis at any position, which represents
the x-axis (this does not affect how the values are plotted and in what interval)
2) in the case of bar plotting (this partially affects linear plotting too), the
values should be drawn automatically from zero (zero on the y-axis, i.e. x=0 and
y=-inf...+inf) unless the user specifies something else, and not be drawn by
default from some arbitrary point, e.g. median, mean, etc., as is done now.

I have the feeling that the solution presented here affects only the point 1)
and not 2).

Please, could you elaborate a bit more, so that I could perhaps implement
your suggestion?

BR,
Daniel

> Cheers,
> 
> L.
> 


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Tue Dec  8 08:49:59 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 8 Dec 2009 03:49:59 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track in GenomeDiagram
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912080849.nB88nx00030750@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


------- Comment #11 from lpritc at scri.sari.ac.uk  2009-12-08 03:49 EST -------
(In reply to comment #10)

> It looks a little bit confusing to me now because I see that there are two
> sides of the problem (or two bugs?), as follows:
> 1) drawing a line orthogonal to the y-axis at any position, which represents
> the x-axis (this does not affect how the values are plotted and in what interval)
> 2) in the case of bar plotting (this partially affects linear plotting too), the
> values should be drawn automatically from zero (zero on the y-axis, i.e. x=0 and
> y=-inf...+inf) unless the user specifies something else, and not be drawn by
> default from some arbitrary point, e.g. median, mean, etc., as is done now.
> 
> I have the feeling that the solution presented here affects only the point 1)
> and not 2).
> 
> Please, could you elaborate more such that maybe I could implement your
> suggestion?

I see why you've distinguished between the two cases, but I think they can be
handled by the earlier suggestion to implement the location of the x-axis in
the context of also allowing the user to set y-axis limits (see comment #5). 
It's the combination of allowing y-axis limits and the location of x-axis
crossing that gives the greatest flexibility.  For example, if y-limit
selection and x-axis crossing point were under user control...

...if you wanted to continue with the current behaviour, you'd not set any
y-limits, and not specify the location of the x-axis.

...if you wanted to draw short read coverage, you'd set the lower y-limit to 0,
and set the location of the x-axis to zero (if that was not the default).  This
should draw bars with their bases on the bottom/inner of the track, and the
scale running along the bottom/inner of the track. 

...if you wanted to represent some data as a bar graph, with a special meaning
for the mean (or median) value, you could optionally set y-limits, but have the
x-axis cross at mean(data) or median(data).  This should draw bars with their
bases on the x-axis, and the axis located at the mean/median value for the
data.
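
To make the three cases concrete, here is a rough sketch in plain Python. It
is not GenomeDiagram code: the function name resolve_axis and its y_limits
and x_axis_pos arguments are hypothetical, and the median fallback is only
an assumption standing in for whatever the current default is.

```python
from statistics import median


def resolve_axis(data, y_limits=(None, None), x_axis_pos=None):
    """Return (y_min, y_max, axis_y) for a graph track.

    Hypothetical helper, not part of GenomeDiagram. A y-limit left as
    None falls back to the data range; an unset x_axis_pos falls back
    to the median (an assumed default, for illustration only).
    """
    low, high = y_limits
    if low is None:
        low = min(data)
    if high is None:
        high = max(data)
    if x_axis_pos is None:
        x_axis_pos = median(data)
    # Keep the axis inside the visible y-range.
    axis_y = min(max(x_axis_pos, low), high)
    return low, high, axis_y


coverage = [3, 12, 7, 0, 25]

# Case 1: nothing set, so limits and axis come from the data.
print(resolve_axis(coverage))
# Case 2: short read coverage, lower limit and axis both at zero.
print(resolve_axis(coverage, y_limits=(0, None), x_axis_pos=0))
# Case 3: bars drawn from the mean of the data.
print(resolve_axis(coverage, x_axis_pos=sum(coverage) / len(coverage)))
```

The point of the sketch is that 'bottom', 'middle' and 'top' need no special
handling: they are just particular values passed as x_axis_pos.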

Does this help clarify what I meant, above?

L.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From chapmanb at 50mail.com  Tue Dec  8 13:33:12 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 8 Dec 2009 08:33:12 -0500
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com>
References: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
	<317375.58712.qm@web62401.mail.re1.yahoo.com>
	<20091203142534.GF51407@sobchak.mgh.harvard.edu>
	<320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com>
	<20091204134010.GK51407@sobchak.mgh.harvard.edu>
	<320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com>
Message-ID: <20091208133312.GE74538@sobchak.mgh.harvard.edu>

Peter and Michiel;
Thanks for the thoughts. Tried to combine these below:

Michiel:
> I didn't realize that the GFF parser returns SeqRecords. I agree with
> Peter that a parser returning SeqRecords should be accessed through
> Bio.SeqIO, while a lower-level parser can live in Bio.GFF.

Peter:
> My point is the moment you include GFF -> SeqRecord
> code (even if not explicitly via the Bio.SeqIO namespace)
> it opens us up to people giving these SeqRecord objects
> to SeqIO for output (e.g. as GenBank).
[...]
> Worthy goals, but if by "Produce Biopython objects from
> GFF3/GTF/GFF2 files" you mean SeqRecords with
> SeqFeatures, (as I said above) we are opening up the
> GFF to GenBank can of worms. There is no "later" :(

We seem to have a very different view of SeqRecords/SeqFeatures. To
me, they are a convenient well thought out object model to capture
annotations and features associated with a sequence. They have the
advantage that people who have used Biopython will be familiar with
the object model. That's why I chose to use them for representing GFF,
as opposed to a GFF specific class.

You are adding on two extra conditions:

- If something produces SeqRecords, it needs to come from SeqIO.
- If you have a SeqRecord, it has to be compatible with GenBank
  output.

This quickly ties us up to the not-that-great GenBank way of
representing features and locations, and makes it hard to add on more
flexible formats like GFF. Converting between very different feature
representations is going to be complex and a whole new problem; 
why do you have to support that to use a SeqRecord in your code?

Overall, I'd like to see it be simpler for people to contribute and
add parsers to Biopython.

> I still think it would be useful to have Bio/GFF/Parser.py (or
> similar) as the low level parser, and Bio/SeqIO/GffIO.py (or
> similar) to turn this into SeqRecord and SeqFeature objects.

This appears to be about where the code lives. Personally, I prefer
having things under the GFF namespace and then building thin
wrappers around it in SeqIO if desired. Practically, I want to leave
SeqIO inclusion out right now and try to argue only for getting the
GFF specific parser in.

> It's the nested features that worry me. Perhaps the existing
> location operator (e.g. "join") could be set to something
> like "parent/child" if the subfeatures is used to hold child
> features rather than the elements of a join? We need
> the GenBank output code etc to be able to tell these
> apart reliably.

Right now I don't set the location operator at all. The parent/child
model is much more flexible than the GenBank operator stuff, so
maybe the right way to go is to phase out using the operator at all.
If it is set to nothing then parent/child is assumed, and GenBank
output can add in all of the operators at output time.

Brad


From chapmanb at 50mail.com  Tue Dec  8 14:03:54 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 8 Dec 2009 09:03:54 -0500
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
	<3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
Message-ID: <20091208140354.GG74538@sobchak.mgh.harvard.edu>

Hi Eric;

> I'll chime in about the status of the Summer of Code stuff.
> 
> For Bio.TreeIO, I've borrowed the Newick tree parsing code from Nexus.Trees
> and changed it to construct Bio.Tree objects via Bio.TreeIO.NewickIO -- so
> the TreeIO API will work independently of file formats. For Bio.Tree, I'm
> about halfway done porting the Nexus tree methods, though it'll go faster
> now that the semester's over. (I'll post the details and ask for a code
> review soon.)
> 
> My phyloxml branch won't be ready to land in time for a December release,
> but merging it into the trunk right after that is feasible. That would give
> everyone time to try it out and suggest changes before Biopython 1.54
> cements the API.

This sounds awesome. Thanks for keeping up with the code; looking
forward to seeing it get in to the main branch.

> Separately: GitHub says Nick Matzke's BioGeography branch hasn't been
> touched since Aug. 19. It will need some love before it can be merged into
> the trunk. Is there a plan for this, Peter or Brad? If not, should I try to
> rescue it after TreeIO lands?

No plan from my end; hopefully Nick will chime in. If Nick doesn't
have time, it would be beyond great if you could finalize and merge the
most useful parts. Thanks for volunteering on this.

Brad


From biopython at maubp.freeserve.co.uk  Tue Dec  8 14:15:30 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Dec 2009 14:15:30 +0000
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <20091208133312.GE74538@sobchak.mgh.harvard.edu>
References: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
	<317375.58712.qm@web62401.mail.re1.yahoo.com>
	<20091203142534.GF51407@sobchak.mgh.harvard.edu>
	<320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com>
	<20091204134010.GK51407@sobchak.mgh.harvard.edu>
	<320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com>
	<20091208133312.GE74538@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00912080615k641cfc15v1c80b26132de83eb@mail.gmail.com>

On Tue, Dec 8, 2009 at 1:33 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> We seem to have a very different view of SeqRecords/SeqFeatures. To
> me, they are a convenient well thought out object model to capture
> annotations and features associated with a sequence. They have the
> advantage that people who have used Biopython will be familiar with
> the object model. That's why I chose to use them for representing GFF,
> as opposed to a GFF specific class.

OK, but (as I expand on below), your planned use of the SeqFeature
(while legitimate) appears to risk being inconsistent with existing parts
of the Biopython code base (in particular, GenBank output, and maybe
GenomeDiagram).

> You are adding on two extra conditions:
>
> - If something produces SeqRecords, it needs to come from SeqIO.

It was more of an aim than a rule. It isn't true of all the existing code for
historical reasons, e.g. Bio.SeqIO "genbank" support acts as a thin
wrapper to Bio.GenBank which does offer SeqRecord objects. From
a user perspective, if you want a SeqRecord from a sequence file,
the first port of call should be Bio.SeqIO.

> - If you have a SeqRecord, it has to be compatible with GenBank
>   output.
>
> This quickly ties us up to the not-that-great GenBank way of
> representing features and locations, and makes it hard to add on more
> flexible formats like GFF. Converting between very different feature
> representations is going to be complex and a whole new problem;
> why do you have to support that to use a SeqRecord in your code?

The big aim of Bio.SeqIO was to allow using many different file
formats with the same object representation. Implicitly (assuming
the required data is present), input from one file format could be
output in another format. The problem is that lots of current code in
Biopython uses SeqRecord/SeqFeatures in a particular way
(GenBank/EMBL parsers, GenomeDiagram, GenBank output).
Unfortunately, for GFF files it seems this isn't the most natural
way to use SeqFeature objects (where you need real nesting).

> Overall, I'd like to see it be simpler for people to contribute and
> add parsers to Biopython.

I hope that for simple file formats this is already the case. But for
annotation rich file formats, if we want SeqIO to continue to be
useful for conversion, this by necessity requires some
awareness of how the other parsers/writers will represent
the same data.

One option for contributions is to offer a "low level" parser
using basic Python datatypes or simple file-type specific
records. Then someone more familiar with SeqIO and the
other file formats can write a SeqRecord converter in order
to integrate it into Bio.SeqIO.  This is basically how Ace,
Phred, SwissProt (and probably others) were done.
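
As a rough illustration of what such a "low level" parser could look like
for GFF3 (this is not Brad's parser; GFFLine and parse_gff3_line are
hypothetical names), using only basic Python datatypes:

```python
from collections import namedtuple
from urllib.parse import unquote

# Simple file-type specific record: one field per GFF3 column.
GFFLine = namedtuple(
    "GFFLine",
    "seqid source type start end score strand phase attributes",
)


def parse_gff3_line(line):
    """Split one GFF3 feature line into a GFFLine (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %i" % len(cols))
    attributes = {}
    if cols[8] != ".":
        for pair in cols[8].split(";"):
            if pair:
                key, _, value = pair.partition("=")
                # Values may hold several comma-separated entries;
                # percent-escapes are decoded after splitting.
                attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return GFFLine(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                   cols[5], cols[6], cols[7], attributes)


record = parse_gff3_line(
    "chr1\texample\texon\t100\t200\t.\t+\t.\tID=exon1;Parent=mRNA1,mRNA2\n"
)
print(record.type, record.start, record.end, record.attributes["Parent"])
```

A separate converter would then turn these records into SeqRecord and
SeqFeature objects, which is where the decisions about nesting and location
operators would have to be made.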

>> I still think it would be useful to have Bio/GFF/Parser.py (or
>> similar) as the low level parser, and Bio/SeqIO/GffIO.py (or
>> similar) to turn this into SeqRecord and SeqFeature objects.
>
> This appears to be about where the code lives. Personally, I prefer
> having things under the GFF namespace and then building thin
> wrappers around it in SeqIO if desired. Practically, I want to leave
> SeqIO inclusion out right now and try to argue only for getting the
> GFF specific parser in.

Where the code lives isn't a big issue. You can do a thin
wrapper in Bio.SeqIO calling Bio.GFF (where Bio.GFF makes
SeqRecords), or a fat wrapper (where Bio.GFF does not make
SeqRecords).

The problem (as I see it) is SeqIO integration and how your
desired use of SeqFeatures will impact this.

>> It's the nested features that worry me. Perhaps the existing
>> location operator (e.g. "join") could be set to something
>> like "parent/child" if the subfeatures is used to hold child
>> features rather than the elements of a join? We need
>> the GenBank output code etc to be able to tell these
>> apart reliably.
>
> Right now I don't set the location operator at all. The parent/child
> model is much more flexible than the GenBank operator stuff, so
> maybe the right way to go is to phase out using the operator at all.
> If it is set to nothing then parent/child is assumed, and GenBank
> output can add in all of the operators at output time.

I agree that using SeqFeature sub-features for parent/child
relationships makes a lot of sense. BUT, we have a lot of
existing code which follows the GenBank/EMBL parser
route of using this for joins (and a few other corner cases).

There are other annoyances with the current SeqFeature
and FeatureLocation model - the strand and location operator
are part of the SeqFeature not the FeatureLocation. It would
make more sense to me to move them to the FeatureLocation
(and have that handle joins itself). Or, move everything to
the SeqFeature (and get rid of the FeatureLocation object).

I think the best route forward is to plan a transition of the
SeqFeature object to allow nice handling of real nested
relationships, and a reworking of complex location handling.
Then (hopefully) we can have the GenBank/EMBL/GFF3
parsers all using the SeqFeature in a consistent way.

Peter


From bugzilla-daemon at portal.open-bio.org  Tue Dec  8 16:56:17 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 8 Dec 2009 11:56:17 -0500
Subject: [Biopython-dev] [Bug 2965] New: Updating Bio.Restriction with
	latest REBASE data
Message-ID: <bug-2965-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2965

           Summary: Updating Bio.Restriction with latest REBASE data
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


The Bio/Restriction/Restriction_Dictionary.py file hasn't been updated since
2004.

The latest REBASE restriction digest files seem to be from Nov 29 2009,
ftp://ftp.neb.com/pub/rebase/

This bug is to update Restriction_Dictionary.py to use the Nov 2009 data. I
have tried and failed as described below:

----------------------------------------------------------------------------

I manually downloaded these files to the Scripts/Restriction directory:
ftp://ftp.neb.com/pub/rebase/emboss_e.912
ftp://ftp.neb.com/pub/rebase/emboss_r.912
ftp://ftp.neb.com/pub/rebase/emboss_s.912

And then ran ranacompiler.py which generated a new Restriction_Dictionary.py

As an aside, module sre is deprecated; re is suggested instead. Other
interesting output:

WARNING : HaeIV cut twice with different overhang length each time.            
        Unable to deal with this behaviour.             
        This enzyme will not be included in the database. Sorry.
        Checking : Anyway, HaeIV is not commercially available.


WARNING : TaqII has two different sites.


The new database contains 753 enzymes.

So far so good, but using the new Restriction_Dictionary.py the unit tests
fail:

$ python test_Restriction.py
Traceback (most recent call last):
  File "test_Restriction.py", line 6, in <module>
    from Bio.Restriction import *
  File
"/Users/myusername/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Restriction/__init__.py",
line 61, in <module>
    from Bio.Restriction.Restriction import *
  File
"/Users/myusername/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Restriction/Restriction.py",
line 2358, in <module>
    newenz = T(k, bases, enzymedict[k])
  File
"/Users/myusername/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Restriction/Restriction.py",
line 216, in __init__
    cls.compsite = re.compile(cls.compsite)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py",
line 188, in compile
    return _compile(pattern, flags)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py",
line 241, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character in group name


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Tue Dec  8 17:02:42 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 8 Dec 2009 12:02:42 -0500
Subject: [Biopython-dev] [Bug 2965] Updating Bio.Restriction with latest
	REBASE data
In-Reply-To: <bug-2965-42@http.bugzilla.open-bio.org/>
Message-ID: <200912081702.nB8H2g4b014553@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2965


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-08 12:02 EST -------
To be more precise, running Bio/Restriction/Restriction.py in IDLE and looking
at the stack trace, the regular expression failing is for enzyme CviKI-1,

(?P<CviKI-1>[AG]GC[CT])|(?P<CviKI-1_as>[AG]GC[CT])

The problem seems to be the hyphen/minus sign in the enzyme name which is
being used as a group name in the regular expression. I think this is the
only enzyme with a hyphen in its name. Since it can't be used as a python name either,
we should probably map it to an underscore:

>>> import re
>>> re.compile('(?P<CviKI\-1>[AG]GC[CT])|(?P<CviKI\-1_as>[AG]GC[CT])')
...
error: bad character in group name
>>> re.compile('(?P<CviKI_1>[AG]GC[CT])|(?P<CviKI_1_as>[AG]GC[CT])')
<_sre.SRE_Pattern object at 0xe8d700>
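
A minimal sketch of that mapping, for illustration only (the helper names
group_name and site_pattern are hypothetical and this is not necessarily how
Bio.Restriction builds its patterns):

```python
import re


def group_name(enzyme_name):
    """Turn an enzyme name into a valid regex group name (illustrative)."""
    return enzyme_name.replace("-", "_")


def site_pattern(enzyme_name, site, antisense_site):
    """Compile a two-group pattern in the style shown above."""
    name = group_name(enzyme_name)
    return re.compile(
        "(?P<%s>%s)|(?P<%s_as>%s)" % (name, site, name, antisense_site)
    )


pattern = site_pattern("CviKI-1", "[AG]GC[CT]", "[AG]GC[CT]")
match = pattern.search("TTAGCCTT")
print(match.lastgroup)  # CviKI_1
print(match.group(0))   # AGCC
```

Anything that looks an enzyme up by its group name would need to apply the
same mapping, so that the REBASE name and the Python name stay in step.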


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Tue Dec  8 17:50:29 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 8 Dec 2009 12:50:29 -0500
Subject: [Biopython-dev] [Bug 2965] Updating Bio.Restriction with latest
	REBASE data
In-Reply-To: <bug-2965-42@http.bugzilla.open-bio.org/>
Message-ID: <200912081750.nB8HoTDW016476@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2965


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-08 12:50 EST -------
Fixed by mapping hyphen to an underscore.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From kellrott at gmail.com  Tue Dec  8 22:00:11 2009
From: kellrott at gmail.com (Kyle Ellrott)
Date: Tue, 8 Dec 2009 14:00:11 -0800
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <20091208140354.GG74538@sobchak.mgh.harvard.edu>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
	<3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
	<20091208140354.GG74538@sobchak.mgh.harvard.edu>
Message-ID: <bb02be080912081400x61d565c8wc2848606fc3496dd@mail.gmail.com>

Speaking of stuff that may not be ready for 1.53, but should start speeding
up for 1.54, I've translated a bunch of HMMER3 / PfamScan code into the
Bio.HMMER and Bio.Pfam modules in my github branch (right now it's sitting
in the jython branch, but I can spin it into a separate branch).
Right now it's missing the code to parse HMMER2, there needs to be more
extensive unit testing, and the API needs to be nailed down with some
documentation.
Is there anybody else that needs HMMER and Pfam support?

Kyle


From biopython at maubp.freeserve.co.uk  Tue Dec  8 22:18:03 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Dec 2009 22:18:03 +0000
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <bb02be080912081400x61d565c8wc2848606fc3496dd@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
	<3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
	<20091208140354.GG74538@sobchak.mgh.harvard.edu>
	<bb02be080912081400x61d565c8wc2848606fc3496dd@mail.gmail.com>
Message-ID: <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com>

On Tue, Dec 8, 2009 at 10:00 PM, Kyle Ellrott <kellrott at gmail.com> wrote:
>
> Speaking of stuff that may not be ready for 1.53, but should start speeding
> up for 1.54, I've translated a bunch of HMMER3 / PfamScan code in the
> Bio.HMMER and Bio.Pfam modules in my github branch (right now it's sitting
> in the jython branch, but I can spin it into a separate branch).
> Right now it's missing the code to parse HMMER2, there needs to be more
> extensive unit testing, and the API needs to be nailed down with some
> documentation.
> Is there anybody else that needs HMMER and Pfam support?
>
> Kyle

That had caught my eye, and it is potentially of direct interest to
me personally. I will probably skip HMMER2 and go straight to
HMMER3 though ;)

On a related point, I am reasonably confident we can get most
of Biopython running on Jython 2.5.1 in time for the release.
Other than things that Jython doesn't support at all, i.e. the C
code, DTD parsing (needed for Bio.Entrez), and the lack of a
buffer function (not important, only used in deprecated code
now), the only remaining hurdle is Bio.Restriction, and I think
I have solved that. I will be testing this tomorrow (time
permitting). Your groundwork has been very useful here Kyle.

Thanks,

Peter


From biopython at maubp.freeserve.co.uk  Tue Dec  8 22:30:20 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Dec 2009 22:30:20 +0000
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <320fb6e00912080615k641cfc15v1c80b26132de83eb@mail.gmail.com>
References: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
	<317375.58712.qm@web62401.mail.re1.yahoo.com>
	<20091203142534.GF51407@sobchak.mgh.harvard.edu>
	<320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com>
	<20091204134010.GK51407@sobchak.mgh.harvard.edu>
	<320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com>
	<20091208133312.GE74538@sobchak.mgh.harvard.edu>
	<320fb6e00912080615k641cfc15v1c80b26132de83eb@mail.gmail.com>
Message-ID: <320fb6e00912081430q6db93d55l6de4a02baefd6c12@mail.gmail.com>

On Tue, Dec 8, 2009 at 2:15 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> I agree that using SeqFeature sub-features for parent/child
> relationships makes a lot of sense. BUT, we have a lot of
> existing code which follows the GenBank/EMBL parser
> route of using this for joins (and a few other corner cases).
>
> There are other annoyances with the current SeqFeature
> and FeatureLocation model - the strand and location operator
> are part of the SeqFeature not the FeatureLocation. It would
> make more sense to me to move them to the FeatureLocation
> (and have that handle joins itself). Or, move everything to
> the SeqFeature (and get rid of the FeatureLocation object).
>
> I think the best route forward is to plan a transition of the
> SeqFeature object to allow nice handling of real nested
> relationships, and a reworking of complex location handling.
> Then (hopefully) we can have the GenBank/EMBL/GFF3
> parsers all using the SeqFeature in a consistent way.
>

Just to add some ideas to this thread for discussion,
on possible ways forward without breaking backwards
compatibility... hopefully this is clear, I did have a glass
of wine with dinner ;)

Given the way the existing SeqFeature list property
subfeatures is used (by the GenBank/EMBL parser
etc), would it make sense for the GFF needs to add
a new list for child features (say property "children"),
and perhaps another property (maybe "parent") which
can point back at the parent SeqFeature? i.e. a sort
of tree, allowing us to represent genes, exons, etc.

Note we may want to use weak references in the above
(children/parent references) to assist the python GC.
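
A minimal sketch of the tree idea above, using a hypothetical
TreeFeature class rather than the real SeqFeature (the names
"children", "parent" and "add_child" are illustrative assumptions,
not an existing Biopython API). Here only the link back to the
parent is a weak reference, so the parent/child cycle does not
keep objects alive:

```python
import weakref


class TreeFeature(object):
    """Hypothetical stand-in for a SeqFeature with tree links."""

    def __init__(self, feature_type):
        self.feature_type = feature_type
        self.children = []       # strong references to child features
        self._parent_ref = None  # weak reference back to the parent

    @property
    def parent(self):
        # None if there is no parent, or if it has been garbage collected
        if self._parent_ref is None:
            return None
        return self._parent_ref()

    def add_child(self, child):
        # Link both directions; only the downward link is a strong reference
        child._parent_ref = weakref.ref(self)
        self.children.append(child)


gene = TreeFeature("gene")
exon = TreeFeature("exon")
gene.add_child(exon)
```

With this layout exon.parent gives back the gene object while it
is alive, and a top level feature simply has parent set to None.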

Given the above, potentially the GenBank/EMBL
parser could be enhanced to use these new properties
(e.g. for linking gene and CDS features in bacteria,
or CDS and mat_peptide features in viruses etc).

[This still leaves the ontology issues - which might
be best dealt with by the GenBank output code]

Peter


From kellrott at gmail.com  Tue Dec  8 22:42:54 2009
From: kellrott at gmail.com (Kyle Ellrott)
Date: Tue, 8 Dec 2009 14:42:54 -0800
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
	<3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
	<20091208140354.GG74538@sobchak.mgh.harvard.edu>
	<bb02be080912081400x61d565c8wc2848606fc3496dd@mail.gmail.com>
	<320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com>
Message-ID: <bb02be080912081442j7dfb1c8hf909527883eeb4f4@mail.gmail.com>

>
> On a related point, I am reasonably confident we can get most
> of Biopython running on Jython 2.5.1 in time for the release.
> Other than things that Jython doesn't support at all, i.e. the C
> code, DTD parsing (needed for Bio.Entrez), and the lack of a
> buffer function (not important, only used in deprecated code
> now), the only remaining hurdle is Bio.Restriction, and I think
> I have solved that. I will be testing this tomorrow (time
> permitting).


The last bit for 'full' jython support is getting BioSQL working.
Unfortunately MySQLdb links directly to the C mysql API, and doesn't work in
Jython.  My jython port also has work that moves the BioSQL interface from
the internal ORM to a SqlAlchemy interface.  Of course that is a little
controversial because it introduces a dependency on another python package.
Of course it takes care of sqlite and Java MySql connector support at the
same time, so it does have some pluses.

Kyle


From biopython at maubp.freeserve.co.uk  Tue Dec  8 22:46:19 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Dec 2009 22:46:19 +0000
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <bb02be080912081442j7dfb1c8hf909527883eeb4f4@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
	<3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
	<20091208140354.GG74538@sobchak.mgh.harvard.edu>
	<bb02be080912081400x61d565c8wc2848606fc3496dd@mail.gmail.com>
	<320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com>
	<bb02be080912081442j7dfb1c8hf909527883eeb4f4@mail.gmail.com>
Message-ID: <320fb6e00912081446w303edd73qe3a5dad964314487@mail.gmail.com>

On Tue, Dec 8, 2009 at 10:42 PM, Kyle Ellrott <kellrott at gmail.com> wrote:
>
> The last bit for 'full' jython support is getting BioSQL working.
> Unfortunately MySQLdb links directly to the C mysql API, and doesn't work in
> Jython.  My jython port also has work that moves the BioSQL interface from
> the internal ORM to a SqlAlchemy interface.  Of course that is a little
> controversial because it introduces a dependency on another python package.
> Of course it takes care of sqlite and Java MySql connector support at the
> same time, so it does have some pluses.

Fair point w.r.t. "full" jython support ;)

I would be more comfortable with BioSQL on Jython working
directly with sqlite (once we add that to BioSQL) and the Java
MySql connector directly (without the extra dependency on
SQLAlchemy).
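
As a rough illustration of working against a driver directly
through the Python DB-API (no ORM layer), here is a sketch using
the standard library sqlite3 module under C Python. The table
"demo_entry" and its contents are made up for the example and are
not part of the BioSQL schema:

```python
import sqlite3

# In-memory database, so nothing is written to disk
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table, NOT the BioSQL schema
cur.execute(
    "CREATE TABLE demo_entry (accession TEXT PRIMARY KEY, description TEXT)"
)
# sqlite3 uses "?" placeholders (DB-API paramstyle "qmark")
cur.execute("INSERT INTO demo_entry VALUES (?, ?)", ("X00001", "toy record"))
conn.commit()

cur.execute(
    "SELECT description FROM demo_entry WHERE accession = ?", ("X00001",)
)
row = cur.fetchone()
conn.close()
```

One wrinkle with this route is that DB-API drivers differ in
placeholder style (sqlite3 uses "?", MySQLdb uses "%s"), so some
per-driver handling of the SQL strings would still be needed.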

Peter


From biopython at maubp.freeserve.co.uk  Tue Dec  8 23:38:04 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Dec 2009 23:38:04 +0000
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <320fb6e00912081430q6db93d55l6de4a02baefd6c12@mail.gmail.com>
References: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
	<317375.58712.qm@web62401.mail.re1.yahoo.com>
	<20091203142534.GF51407@sobchak.mgh.harvard.edu>
	<320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com>
	<20091204134010.GK51407@sobchak.mgh.harvard.edu>
	<320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com>
	<20091208133312.GE74538@sobchak.mgh.harvard.edu>
	<320fb6e00912080615k641cfc15v1c80b26132de83eb@mail.gmail.com>
	<320fb6e00912081430q6db93d55l6de4a02baefd6c12@mail.gmail.com>
Message-ID: <320fb6e00912081538o635347ceh8e10aa4863e538e9@mail.gmail.com>

On Tue, Dec 8, 2009 at 2:15 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>>
>> There are other annoyances with the current SeqFeature
>> and FeatureLocation model - the strand and location operator
>> are part of the SeqFeature not the FeatureLocation. It would
>> make more sense to me to move them to the FeatureLocation
>> (and have that handle joins itself). Or, move everything to
>> the SeqFeature (and get rid of the FeatureLocation object).
>>

In addition to the strand and location operator, there is also
(sometimes) a database cross reference (properties ref and
db_ref, e.g. in contig files). Again, this is conceptually part
of the feature location (and stored that way in BioSQL if I
recall correctly).

One example of where it would make sense to move things
like the database, operator and strand to the FeatureLocation
is the coded_by information in some GenPept file annotation,
a use case very recently raised on the main mailing list:
http://lists.open-bio.org/pipermail/biopython/2009-December/005910.html
The current FeatureLocation simply can't be used here -
although a full SeqFeature could be.
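
To make the suggestion concrete, here is a sketch of a location
object which carries its own strand and database cross reference.
The class name RichLocation and its arguments are assumptions for
illustration only (this is not the current FeatureLocation), and
the accession used is a placeholder:

```python
class RichLocation(object):
    """Hypothetical location holding strand and cross reference itself."""

    def __init__(self, start, end, strand=None, ref=None, ref_db=None):
        self.start = start
        self.end = end
        self.strand = strand  # +1, -1 or None
        self.ref = ref        # identifier of another record, if any
        self.ref_db = ref_db  # database that identifier belongs to

    def __str__(self):
        text = "[%i:%i]" % (self.start, self.end)
        if self.ref:
            text = "%s%s" % (self.ref, text)
        if self.strand == -1:
            text = "complement(%s)" % text
        return text


# A coded_by style location pointing into another (placeholder) record
loc = RichLocation(86, 1109, strand=+1, ref="XX000000.1")
rev = RichLocation(0, 10, strand=-1)
```

A coded_by annotation could then be held as a single location
object, without needing a full SeqFeature around it.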

Peter


From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 09:56:34 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 04:56:34 -0500
Subject: [Biopython-dev] [Bug 2966] New: Primer3Commandline does not use
	EMBOSS 6.1.0 arguments
Message-ID: <bug-2966-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2966

           Summary: Primer3Commandline does not use EMBOSS 6.1.0 arguments
           Product: Biopython
           Version: 1.52
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: lpritc at scri.sari.ac.uk


Several arguments for EMBOSS eprimer3 are different in version 6.1.0 from those
used in Primer3Commandline.  I have updated Primer3Commandline locally (and
added documentation strings), and will make this available via github with some
other proposed changes shortly, after talking to Peter.

This revealed another bug, which I will submit separately.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 10:07:14 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 05:07:14 -0500
Subject: [Biopython-dev] [Bug 2967] New: AbstractCommandline silently
	accepts invalid parameter options
Message-ID: <bug-2967-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2967

           Summary: AbstractCommandline silently accepts invalid parameter
                    options
           Product: Biopython
           Version: 1.52
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: lpritc at scri.sari.ac.uk


While investigating Bug 2996 I noticed that AbstractCommandline was silently
accepting invalid parameter options when passed by setting attributes.  For
example:

    cline = Primer3Commandline(bogus=True)
    cline.sequence = filename

raises the appropriate ValueError, as the parameter name 'bogus' is being
compared to the self.parameters list when setting, and is found not to be
valid.  However, the following code:

    cline = Primer3Commandline()
    cline.sequence = filename
    cline.bogus = True    # Invalid argument not flagged up
    cline.sequnce = True  # Mistyped argument not flagged up


silently sets the invalid cline.bogus and cline.sequnce attributes without
warning.  Parameters set via attribute are not validated with the
setter/getters defined for the properties in AbstractCommandline.__init__.  This
could (did!) lead the user to think that parameters are set when they are not,
under at least two circumstances:

1) Typos in the parameter name
2) Using a parameter unsupported by the interface (see Bug 2996).

L.




From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 10:08:12 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 05:08:12 -0500
Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts
	invalid parameter options
In-Reply-To: <bug-2967-42@http.bugzilla.open-bio.org/>
Message-ID: <200912091008.nB9A8Cc5008147@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2967


------- Comment #1 from lpritc at scri.sari.ac.uk  2009-12-09 05:08 EST -------
Sorry, I'm referring to bug 2966 in the post above.  My bad.

L.




From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 10:46:11 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 05:46:11 -0500
Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts
	invalid parameter options
In-Reply-To: <bug-2967-42@http.bugzilla.open-bio.org/>
Message-ID: <200912091046.nB9AkBXi009268@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2967


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-09 05:46 EST -------
(In reply to comment #0)
> While investigating Bug 2996 I noticed that AbstractCommandline was silently
> accepting invalid parameter options when passed by setting attributes.  For
> example:
> 
>     cline = Primer3Commandline(bogus=True)
>     cline.sequence = filename
> 
> raises the appropriate ValueError, as the parameter name 'bogus' is being
> compared to the self.parameters list when setting, and is found not to be
> valid.  However, the following code:
> 
>     cline = Primer3Commandline()
>     cline.sequence = filename
>     cline.bogus = True    # Invalid argument not flagged up
>     cline.sequnce = True  # Mistyped argument not flagged up
> 
> 
> silently sets the invalid cline.bogus and cline.sequnce attributes without
> warning.  Parameters set via attribute are not validated with the
> setter/getters defined for the properties in AbstractCommandline.__init__
> This could (did!) lead the user to think that parameters are set when they
> are not, under at least two circumstances:
> 
> 1) Typos in the parameter name
> 2) Using a parameter unsupported by the interface

This is normal Python object behaviour - you can add any "property" like this
at run time,

>>> class Dummy(object) :
...     pass
... 
>>> d = Dummy()
>>> d.name = "Fred"
>>> dir(d)
['__class__', '__delattr__', '__dict__', '__doc__', '__getattribute__',
'__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__str__', '__weakref__', 'name']
>>> d.name
'Fred'

We might still be able to block this via __setattr__, this needs some
experimentation.




From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 12:23:34 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 07:23:34 -0500
Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts
	invalid parameter options
In-Reply-To: <bug-2967-42@http.bugzilla.open-bio.org/>
Message-ID: <200912091223.nB9CNYtT012354@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2967


------- Comment #3 from lpritc at scri.sari.ac.uk  2009-12-09 07:23 EST -------
(In reply to comment #2)

> This is normal Python object behaviour - you can add any "property" like this
> at run time,

[...]

Oddly enough, I was already aware of that... ;)

The issue is that the setting of parameters via attributes fails silently, but
is demonstrated in the tutorial and is in any case often rather more convenient
than declaring the parameters on instantiation, so is very likely to be used in
anger.  This potentially (and *actually* in my case, when attempting to use
EMBOSS 6.1.0 parameter names with eprimer3) leads to cases where the user might
expect that command-line options have been set, when they in fact haven't.  

> We might still be able to block this via __setattr__, this needs some
> experimentation.

That seems to be the best route to me, initially.  It might be worth removing
the property magic in the AbstractCommandline.__init__(), and instead use
__setattr__, __getattr__, and __delattr__, having them behave appropriately for
known parameter names.

I'll have a go at doing that and put it in with the EMBOSS stuff I'm working
on, just now.

L.




From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 12:28:07 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 07:28:07 -0500
Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts
	invalid parameter options
In-Reply-To: <bug-2967-42@http.bugzilla.open-bio.org/>
Message-ID: <200912091228.nB9CS7vS012457@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2967


------- Comment #4 from lpritc at scri.sari.ac.uk  2009-12-09 07:28 EST -------
(In reply to comment #3)
> (In reply to comment #2)
>
> > We might still be able to block this via __setattr__, this needs some
> > experimentation.
> 
> That seems to be the best route to me, initially.  It might be worth removing
> the property magic in the AbstractCommandline.__init__(), and instead use
> __setattr__, __getattr__, and __delattr__, having them behave appropriately for
> known parameter names.
> 
> I'll have a go at doing that and put it in with the EMBOSS stuff I'm working
> on, just now.

Peter has pointed out that he'd like to retain discoverability, and so restrict
the change to a validating __setattr__ - which seems reasonable.

L.




From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 12:53:00 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 07:53:00 -0500
Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts
	invalid parameter options
In-Reply-To: <bug-2967-42@http.bugzilla.open-bio.org/>
Message-ID: <200912091253.nB9Cr0cP013048@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2967


------- Comment #5 from lpritc at scri.sari.ac.uk  2009-12-09 07:53 EST -------
This works for me, at the moment:

    def __setattr__(self, name, value):
        """ Workaround for a user interface issue.  Without this __setattr__
            attribute-based assignment of parameters will silently accept
            invalid parameters, leading to known instances of the user
            assuming that parameters for the application are set, when they
            are not.
            This workaround uses a whitelist of object attributes, and
            sets the object attribute list as normal, for these.  Other
            attributes are assumed to be parameters, and passed to the
            self.set_parameter method for validation and assignment.
        """
        attr_whitelist = ['parameters', 'program_name']  # Allowed attributes
        if name not in attr_whitelist:       # If not in whitelist, treat
            self.set_parameter(name, value)  # as parameter
        else:
            self.__dict__[name] = value
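
For anyone wanting to try the behaviour in isolation, the following
is a self-contained sketch of the same whitelist idea.  The class
DemoCommandline, its parameter names and the error message are
stand-ins invented for this example; it is not the
AbstractCommandline code itself:

```python
class DemoCommandline(object):
    """Hypothetical stand-in for an AbstractCommandline subclass."""

    def __init__(self, **kwargs):
        self.program_name = "demo"
        self.parameters = {"sequence": None, "outfile": None}
        for name, value in kwargs.items():
            self.set_parameter(name, value)

    def set_parameter(self, name, value):
        # Reject any name which is not a known parameter
        if name not in self.parameters:
            raise ValueError("Option name %s was not found." % name)
        self.parameters[name] = value

    def __setattr__(self, name, value):
        if name in ("parameters", "program_name"):
            # Ordinary instance attributes bypass validation
            self.__dict__[name] = value
        else:
            # Everything else must be a known parameter
            self.set_parameter(name, value)


cline = DemoCommandline()
cline.sequence = "example.fasta"  # accepted, known parameter
try:
    cline.sequnce = True          # typo, now rejected
except ValueError as err:
    message = str(err)
```

The mistyped name now raises ValueError at assignment time rather
than being stored as a stray attribute on the object.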




From biopython at maubp.freeserve.co.uk  Wed Dec  9 13:21:50 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Dec 2009 13:21:50 +0000
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
	<3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
	<20091208140354.GG74538@sobchak.mgh.harvard.edu>
	<bb02be080912081400x61d565c8wc2848606fc3496dd@mail.gmail.com>
	<320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com>
Message-ID: <320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com>

On Tue, Dec 8, 2009 at 10:18 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> On a related point, I am reasonably confident we can get most
> of Biopython running on Jython 2.5.1 in time for the release.
> Other than things that Jython doesn't support at all, i.e. the C
> code, DTD parsing (needed for Bio.Entrez), and the lack of a
> buffer function (not important, only used in deprecated code
> now), the only remaining hurdle is Bio.Restriction, and I think
> I have solved that. I will be testing this tomorrow (time
> permitting). Your groundwork has been very useful here Kyle.

I'm stuck again with Bio.Restriction under Jython. I've got the
Bio/Restriction/Restriction_Dictionary.py to load under Jython
(just - the Nov 2009 update isn't helping to keep the code
size down), but doing test_Restriction.py hits the JVM limit.

Furthermore, there is a little bit of C code in Bio.Restriction
(which I think we can replace with plain python).

Peter


From biopython at maubp.freeserve.co.uk  Wed Dec  9 14:18:19 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Dec 2009 14:18:19 +0000
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
	<3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
	<20091208140354.GG74538@sobchak.mgh.harvard.edu>
	<bb02be080912081400x61d565c8wc2848606fc3496dd@mail.gmail.com>
	<320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com>
	<320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com>
Message-ID: <320fb6e00912090618y43add6f9v5cee8fb044b27eab@mail.gmail.com>

On Wed, Dec 9, 2009 at 1:21 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> Furthermore, there is a little bit of C code in Bio.Restriction
> (which I think we can replace with plain python).
>

I've replaced the C module Bio.Restriction.DNAUtils with
Python code, and deprecated it. I am surprised it was
written in C in the first place!

Peter


From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 15:04:10 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 10:04:10 -0500
Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts
	invalid parameter options
In-Reply-To: <bug-2967-42@http.bugzilla.open-bio.org/>
Message-ID: <200912091504.nB9F4AUM017626@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2967


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-09 10:04 EST -------
Fix committed - almost as is, I also added a doctest.

Thanks!




From biopython at maubp.freeserve.co.uk  Wed Dec  9 15:57:20 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Dec 2009 15:57:20 +0000
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <320fb6e00912090618y43add6f9v5cee8fb044b27eab@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
	<3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
	<20091208140354.GG74538@sobchak.mgh.harvard.edu>
	<bb02be080912081400x61d565c8wc2848606fc3496dd@mail.gmail.com>
	<320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com>
	<320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com>
	<320fb6e00912090618y43add6f9v5cee8fb044b27eab@mail.gmail.com>
Message-ID: <320fb6e00912090757s6efbd2acpcb197e8e77cd298f@mail.gmail.com>

Good news:

I've tweaked the RestrictionCompiler.py code to modify how it generates
Bio/Restriction/Restriction_Dictionary.py in order to build the dictionaries
incrementally. Together with the removal of the C code DNAUtils, this
means (after a clean install) that Jython likes Bio.Restriction and that
test_Restriction.py passes on Jython 2.5.1 (and C Python too).
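
To illustrate what building a dictionary incrementally can look
like in generated code, here is a sketch of a generator which emits
one assignment per entry instead of a single huge dictionary
literal. The function write_dictionary and the toy enzyme records
are assumptions for this example, and are not the actual
RestrictionCompiler.py code:

```python
def write_dictionary(name, data):
    """Return Python source that builds dict `name` one entry at a time."""
    lines = ["%s = {}" % name]
    for key in sorted(data):
        # One small statement per entry rather than one giant literal
        lines.append("%s[%r] = %r" % (name, key, data[key]))
    return "\n".join(lines) + "\n"


# Toy records, for illustration only
enzymes = {
    "EcoRI": {"site": "GAATTC", "size": 6},
    "BamHI": {"site": "GGATCC", "size": 6},
}
source = write_dictionary("rest_dict", enzymes)

# The generated module would normally be imported; exec is used here
# only to check that the source rebuilds the same dictionary
namespace = {}
exec(source, namespace)
```

Whether this exact layout is enough to stay under the JVM limit
would need checking under Jython itself.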

Bad news:

I think I have broken test_CAPS.py (under both Jython and Python).
It looks like it hits some bits of Bio.Restriction that are not covered by
test_Restriction.py

I'm working on it still ...

Peter


From biopython at maubp.freeserve.co.uk  Wed Dec  9 16:25:28 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Dec 2009 16:25:28 +0000
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <320fb6e00912090757s6efbd2acpcb197e8e77cd298f@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
	<3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
	<20091208140354.GG74538@sobchak.mgh.harvard.edu>
	<bb02be080912081400x61d565c8wc2848606fc3496dd@mail.gmail.com>
	<320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com>
	<320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com>
	<320fb6e00912090618y43add6f9v5cee8fb044b27eab@mail.gmail.com>
	<320fb6e00912090757s6efbd2acpcb197e8e77cd298f@mail.gmail.com>
Message-ID: <320fb6e00912090825t45d2ac1atfaba7159d75aa6fc@mail.gmail.com>

On Wed, Dec 9, 2009 at 3:57 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Good news:
>
> I've tweaked the RestrictionCompiler.py code to modify how it generates
> Bio/Restriction/Restriction_Dictionary.py in order to build the dictionaries
> incrementally. Together with the removal of the C code DNAUtils, this
> means (after a clean install) that Jython likes Bio.Restriction and that
> test_Restriction.py passes on Jython 2.5.1 (and C Python too).
>
> Bad news:
>
> I think I have broken test_CAPS.py (under both Jython and Python).
> It looks like it hits some bits of Bio.Restriction that are not covered by
> test_Restriction.py
>
> I'm working on it still ...

Solved: the check_bases function in Bio.Restriction also used to
make things uppercase (but the docstring didn't make this clear
and the C code was non-obvious).

I think this means the whole test suite passes on Jython 2.5.1
(barring those bits with C code dependencies, BioSQL, or the
known Jython issues with DTD parsing or the missing buffer
function).

Kyle - could you confirm this on your machine please?

Thanks,

Peter


From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 17:57:37 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 12:57:37 -0500
Subject: [Biopython-dev] [Bug 2968] New: Modifications to Emboss eprimer3
	parser and associated files
Message-ID: <bug-2968-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2968

           Summary: Modifications to Emboss eprimer3 parser and associated
                    files
           Product: Biopython
           Version: 1.52
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: lpritc at scri.sari.ac.uk


The existing Emboss primer3/eprimer3 code has a couple of issues, and some
scope for improvement:

- The existing Primer3.py parser code can only parse output when eprimer3 is
applied to a single sequence.  When eprimer3 is applied to multiple sequence
input, it groups all primers for all sequences into a single record, which may
incorrectly associate primers with the wrong sequences in downstream analysis.
- The current parser lacks an iterator for iterating over multiple sequence
output
- The current parser creates 'ghost' primers for all primer pairs, with length
zero and sequence as an empty string; it does not do this for internal oligos. 
A more intuitive solution might be to return None for absent primers/oligos
- The current data model stores all primer data as individual attributes.  It
might be more useful to group the attributes of individual primers into their
natural associations

I have written new code for Emboss/Primer3.py that adds iterator/multiple
sequence parsing functionality to the parser, and extensively revises the
object model for the data.  The Record and Primers objects are retained, but
each primer/oligo is now represented by a Primer object that collects the
relevant data together.  The Record object has a new attribute that allows the
sequence to be recorded directly, rather than having to be parsed from the
comments attribute.  The new data model retains the old attribute-based access
for compatibility, but adds direct access to the Primer objects (where present)
by .forward, .reverse and .oligo attributes, and by keywords.
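
As a rough picture of the kind of data model described above, the
following sketch groups each primer's values into one object,
reports absent primers as None, and keeps an old style flat
attribute as a read-only property.  The class and attribute names
(DemoPrimer, DemoPrimers, forward_seq) are assumptions made for
this example and should not be read as the code in the linked
commit:

```python
class DemoPrimer(object):
    """Hypothetical container for one primer or internal oligo."""

    def __init__(self, start, length, tm, gc, seq):
        self.start = start
        self.length = length
        self.tm = tm
        self.gc = gc
        self.seq = seq


class DemoPrimers(object):
    """Hypothetical primer set; an absent primer or oligo is None."""

    def __init__(self, forward=None, reverse=None, oligo=None):
        self.forward = forward
        self.reverse = reverse
        self.oligo = oligo

    @property
    def forward_seq(self):
        # Flat attribute kept for compatibility with older code
        if self.forward is None:
            return ""
        return self.forward.seq


# Made-up values, for illustration only
pair = DemoPrimers(
    forward=DemoPrimer(10, 20, 60.0, 50.0, "ACGTACGTACGTACGTACGT"),
    reverse=DemoPrimer(200, 20, 59.5, 45.0, "TTGCATGCATGCATGCATGC"),
)
```

Code that wants to know whether an internal oligo was reported can
then test pair.oligo against None instead of checking for a zero
length 'ghost' entry.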

One change was required to the unit test, to account for the reporting of
absent primers as None, rather than having 'null' attributes.  I've added two
further test output files, which may be rather large for the distribution (60kb
total), and doctests that use these.

The code can be inspected at my GitHub repository:

http://github.com/widdowquinn/biopython/commit/b4701079ced297d7af5aa75b46738280c8783fe0

This enhancement request also relates to bug 2966.




From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 17:59:14 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 12:59:14 -0500
Subject: [Biopython-dev] [Bug 2968] Modifications to Emboss eprimer3 parser
	and associated files
In-Reply-To: <bug-2968-42@http.bugzilla.open-bio.org/>
Message-ID: <200912091759.nB9HxErQ022462@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2968


------- Comment #1 from lpritc at scri.sari.ac.uk  2009-12-09 12:59 EST -------
I forgot to mention - the new code still passes the test_EmbossPrimer.py unit
test.




From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 18:01:13 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 13:01:13 -0500
Subject: [Biopython-dev] [Bug 2966] Primer3Commandline does not use EMBOSS
	6.1.0 arguments
In-Reply-To: <bug-2966-42@http.bugzilla.open-bio.org/>
Message-ID: <200912091801.nB9I1DMe022568@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2966


------- Comment #1 from lpritc at scri.sari.ac.uk  2009-12-09 13:01 EST -------
I have made changes to Primer3Commandline that involve adding the EMBOSS 6.1.0
arguments, and docstrings to each argument.  I have also added doctests.

The proposed code can be inspected at my GitHub repository:

http://github.com/widdowquinn/biopython/commit/9c0643e333b0cafb4e356426fb4902e0e9d2385c




From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 18:03:30 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 13:03:30 -0500
Subject: [Biopython-dev] [Bug 2969] New: Addition of SeqmatchallCommandline
	to Emboss/Applications.py
Message-ID: <bug-2969-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2969

           Summary: Addition of SeqmatchallCommandline to
                    Emboss/Applications.py
           Product: Biopython
           Version: 1.52
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: lpritc at scri.sari.ac.uk


I thought it would be useful to have a command line wrapper to the EMBOSS
seqmatchall application, and have added this to Emboss/Applications.py, with
doctests.

The proposed code can be inspected at my GitHub repository:

http://github.com/widdowquinn/biopython/commit/ced72a34b2565b97f3ad2c77a66e1083375cff02




From kellrott at gmail.com  Wed Dec  9 19:22:01 2009
From: kellrott at gmail.com (Kyle Ellrott)
Date: Wed, 9 Dec 2009 11:22:01 -0800
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <320fb6e00912090825t45d2ac1atfaba7159d75aa6fc@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
	<3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com>
	<20091208140354.GG74538@sobchak.mgh.harvard.edu>
	<bb02be080912081400x61d565c8wc2848606fc3496dd@mail.gmail.com>
	<320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com>
	<320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com>
	<320fb6e00912090618y43add6f9v5cee8fb044b27eab@mail.gmail.com>
	<320fb6e00912090757s6efbd2acpcb197e8e77cd298f@mail.gmail.com>
	<320fb6e00912090825t45d2ac1atfaba7159d75aa6fc@mail.gmail.com>
Message-ID: <bb02be080912091122j36c53354k32144b75ae0bb28e@mail.gmail.com>

> Kyle - could you confirm this on your machine please?
>

It looks like the master branch is working well.
I guess the next step will be looking into the zxJDBC to expand the BioSQL
ORM.
Intro can be found at: http://www.informit.com/articles/article.aspx?p=26143

Kyle


From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 21:53:42 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 16:53:42 -0500
Subject: [Biopython-dev] [Bug 2969] Addition of SeqmatchallCommandline to
	Emboss/Applications.py
In-Reply-To: <bug-2969-42@http.bugzilla.open-bio.org/>
Message-ID: <200912092153.nB9LrgYN027652@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2969


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-09 16:53 EST -------
A nice easy one to wrap at first glance. I would like to also include the
"aformat" output to set the output alignment format (useful to set to pair or
simple for AlignIO to parse it as the "emboss" alignment format - see the
needle and water wrappers). You could then also add a run time test to
test_Emboss.py piping this to AlignIO... ;)
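
As a rough illustration of what such a wrapper boils down to, here is a
minimal sketch that only builds the argument list, without Biopython and
without running EMBOSS. The helper name build_seqmatchall_cmd and the file
names are hypothetical; the option names (-sequence, -wordsize, -aformat,
-outfile) are assumed to match the seqmatchall documentation.

```python
def build_seqmatchall_cmd(sequence, outfile, wordsize=4, aformat="pair"):
    """Return an argument list for EMBOSS seqmatchall (hypothetical helper)."""
    return [
        "seqmatchall",
        "-sequence", sequence,      # input file with the sequences to compare
        "-wordsize", str(wordsize), # word size used for matching
        "-aformat", aformat,        # alignment output format, e.g. pair or simple
        "-outfile", outfile,        # where the alignments are written
    ]


# Placeholder file names, purely for illustration.
cmd = build_seqmatchall_cmd("input.fasta", "matches.txt", wordsize=9)
print(" ".join(cmd))
```

With -aformat set to pair or simple, the output file should be what AlignIO
reads as the "emboss" format, as noted above.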


From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 22:42:26 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 17:42:26 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912092242.nB9MgQS9028588@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #8 from chapmanb at 50mail.com  2009-12-09 17:42 EST -------
Great idea Peter -- happy to get this in. It's now on a branch here:

http://github.com/chapmanb/biopython/tree/biosql-sqlite

It would be excellent if you, Cymon or anyone else interested could review and
merge it in.

This also includes a small typo fix on Bio/SeqIO/InsdcIO.py which isn't really
related but came up when I was running the BioSQL tests.


From bugzilla-daemon at portal.open-bio.org  Wed Dec  9 23:51:14 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 18:51:14 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912092351.nB9NpESn030303@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-09 18:51 EST -------
Hi Brad,

My only immediate comment is it might make sense to split the BioSQL tests in
two, one for SQLite which we can try and make 100% automatic (at least on
Python 2.5+), and one for a user specified back end (MySQL, PostgreSQL etc)
which requires a username and password.

It's midnight here in the UK, so feel free to tweak things this evening your
time and I'll take a full look tomorrow.

Thanks,

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Dec 10 11:12:36 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Dec 2009 06:12:36 -0500
Subject: [Biopython-dev] [Bug 2969] Addition of SeqmatchallCommandline to
	Emboss/Applications.py
In-Reply-To: <bug-2969-42@http.bugzilla.open-bio.org/>
Message-ID: <200912101112.nBABCaRr015734@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2969


------- Comment #2 from lpritc at scri.sari.ac.uk  2009-12-10 06:12 EST -------
(In reply to comment #1)
> A nice easy one to wrap at first glance. I would like to also include the
> "aformat" output to set the output alignment format (useful to set to pair or
> simple for AlignIO to parse it as the "emboss" alignment format - see the
> needle and water wrappers). You could then also add a run time test to
> test_Emboss.py piping this to AlignIO... ;)

That shouldn't take too long to do (though probably won't get done by me this
week).  Do we want to set any particular policy for the sequence-associated and
outfile-associated arguments?  Their inclusion in the command-line wrappers is
pretty inconsistent, which is why I left them out in the first place.


From bugzilla-daemon at portal.open-bio.org  Thu Dec 10 11:15:09 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Dec 2009 06:15:09 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the
	bottom or top of the track in GenomeDiagram
In-Reply-To: <bug-2964-42@http.bugzilla.open-bio.org/>
Message-ID: <200912101115.nBABF90t015907@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964


------- Comment #12 from Daniel.Nicorici at gmail.com  2009-12-10 06:15 EST -------
(In reply to comment #11)
> (In reply to comment #10)
> 
> > It looks a little bit confusing to me now because I see that there are two
> > sides of the problem (or two bugs?), as follows:
> > 1) drawing a line orthogonal on y-axis at any position which represents the
> > x-axis (this does not affect how the values are plotted and in what interval)
> > 2) in the case of bar plotting (partially affects also linear plotting), the
> > values should be drawn automatically from zero (zero on y-axis, i.e. x=0 and
> > y=-inf...+inf) unless the user specify something else and not to be drawn by
> > default from some arbitrary point, e.g. median, mean, etc., as it is done now. 
> > 
> > I have the feeling that the solution presented here affects only the point 1)
> > and not 2).
> > 
> > Please, could you elaborate more such that maybe I could implement your
> > suggestion?
> 
> I see why you've distinguished between the two cases, but I think they can be
> handled by the earlier suggestion to implement the location of the x-axis in
> the context of also allowing the user to set y-axis limits (see comment #5). 
> It's the combination of allowing y-axis limits and the location of x-axis
> crossing that gives the greatest flexibility.  For example, if y-limit
> selection and x-axis crossing point were under user control...
> 
> ...if you wanted to continue with the current behaviour, you'd not set any
> y-limits, and not specify the location of the x-axis.
> 
> ...if you wanted to draw short read coverage, you'd set the lower y-limit to 0,
> and set the location of the x-axis to zero (if that was not the default).  This
> should draw bars with their bases on the bottom/inner of the track, and the
> scale running along the bottom/inner of the track. 
> 
> ...if you wanted to represent some data as a bar graph, with a special meaning
> for the mean (or median) value, you could optionally set y-limits, but have the
> x-axis cross at mean(data) or median(data).  This should draw bars with their
> bases on the x-axis, and the axis located at the mean/median value for the
> data.


I submitted the changes, which do roughly what is described above. By default
the x-axis is still drawn in the middle of the track (left like this for now
so as not to change the default behavior of GenomeDiagram). If the x-axis is
specified to be drawn at the bottom or top of the track, it is drawn there,
and the bars/lines in the graph are drawn from zero (if some values are
positive and others are negative), from the minimum (if all values are
positive), or from the maximum (if all values are negative). Hence the graph
and plotting behavior are affected only when the x-axis is specified to be
drawn at the bottom or top of the track. The limits are computed
automatically.
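
The baseline rule described above can be sketched as follows. This is not the
submitted GenomeDiagram code, just a minimal illustration; the function name
bar_baseline is hypothetical.

```python
def bar_baseline(values):
    """Pick the value that bars/lines are drawn from (hypothetical helper)."""
    if not values:
        raise ValueError("need at least one data value")
    lowest, highest = min(values), max(values)
    if lowest < 0 < highest:
        return 0        # mixed signs: draw from zero
    if lowest >= 0:
        return lowest   # no negative values: draw from the minimum
    return highest      # no positive values: draw from the maximum


print(bar_baseline([-2, 3, 5]))
print(bar_baseline([2, 3, 5]))
print(bar_baseline([-5, -3]))
```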

> 
> Does this help clarify what I meant, above?

It helped. Thanks!

BR,
Daniel


From biopython at maubp.freeserve.co.uk  Thu Dec 10 12:20:55 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 10 Dec 2009 12:20:55 +0000
Subject: [Biopython-dev] Removing C implementation of deprecated listfns,
	mathfns, stringfns
Message-ID: <320fb6e00912100420o74dc84efhe3af0aa278386ec8@mail.gmail.com>

Hi all,

The modules listfns, mathfns, stringfns are now all deprecated. They
all have both a C implementation and a pure Python implementation.

We could wait for the complete deprecation process, and remove
the C code when the Python code gets removed. However, I would
like to remove their C implementations for the next release, as this will
simplify our code base.

The only downside is anyone still using these modules will get
a deprecation warning and a possible slow down (as the C code
wouldn't exist any more). Also anyone using the C code directly
will be in trouble (but no-one should be doing that...).
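
For anyone unsure what this looks like from the user side, here is a minimal
sketch of a pure-Python function that still works but warns on use. It is not
the actual listfns code; the function name count and the warning text are
illustrative assumptions.

```python
import warnings


def count(items):
    """Pure-Python stand-in with no C accelerator behind it (illustrative)."""
    warnings.warn(
        "This function is deprecated and will be removed in a future release.",
        DeprecationWarning,
        stacklevel=2,
    )
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts


# Capture the warning to show that the call still works but is flagged.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = count("AACG")

print(result)
print(caught[0].category.__name__)
```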

Any comments? Objections?

Thanks,

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Dec 10 12:39:15 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Dec 2009 07:39:15 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912101239.nBACdFtu018207@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #10 from chapmanb at 50mail.com  2009-12-10 07:39 EST -------
Thanks Peter. All of the tests will run on SQLite provided sqlite3 is
installed, so there is no need to split them. I enabled SQLite by default, so
they will run automatically if a user has sqlite3 and fail gracefully with a
dependency error if not.


From bugzilla-daemon at portal.open-bio.org  Thu Dec 10 12:43:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Dec 2009 07:43:28 -0500
Subject: [Biopython-dev] [Bug 2969] Addition of SeqmatchallCommandline to
	Emboss/Applications.py
In-Reply-To: <bug-2969-42@http.bugzilla.open-bio.org/>
Message-ID: <200912101243.nBAChSHg018300@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2969


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-10 07:43 EST -------
(In reply to comment #2)
> Do we want to set any particular policy for the sequence-associated and
> outfile-associated arguments?  Their inclusion in the command-line wrappers
> is pretty inconsistent, which is why I left them out in the first place.

In the long term, I'd like us to look at generating the wrappers automatically
from the EMBOSS ACD files which define their tool options. For now, since some
EMBOSS tools have so many options, they have been added on a somewhat ad hoc
basis, based on what the coder thought most important, or on user feedback.

Fix checked in with addition of aformat option.

Thanks! Peter


From bugzilla-daemon at portal.open-bio.org  Thu Dec 10 12:52:16 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Dec 2009 07:52:16 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912101252.nBACqGp6018512@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-10 07:52 EST -------
(In reply to comment #10)
> Thanks Peter. All of the tests will run on SQLite provided sqlite3 is
> installed, so there is no need to split them. I enabled SQLite by default, so
> they will run automatically if a user has sqlite3 and fail gracefully with a
> dependency error if not.

That's great as is. I was thinking about something more: What I meant was, I
want to be able to run all the tests on SQLite (by default) AND on another back
end (e.g. MySQL) if the user has configured it. Otherwise we (as developers)
have to manually switch the BioSQL settings and rerun the BioSQL unit tests.

I will be able to test the effect of your changes on MySQL, hopefully Cymon
can do this on PostgreSQL - not that I anticipate any regressions, but best
to be sure ;)

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Dec 10 12:56:44 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Dec 2009 07:56:44 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912101256.nBACuheQ018635@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-10 07:56 EST -------
(In reply to comment #11)
> 
> That's great as is. I was thinking about something more: What I meant was, I
> want to be able to run all the tests on SQLite (by default) AND on another
> back end (e.g. MySQL) if the user has configured it. Otherwise we (as
> developers) have to manually switch the BioSQL settings and rerun the BioSQL
> unit tests.
> 

On reflection, that kind of improvement can wait until after Biopython 1.53 is
out. It would be great to make it completely general so that if you have all
the backends installed the test suite could check on SQLite, MySQL, PostgreSQL
etc.
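
One possible starting point, sketched under assumptions: detect which database
drivers can be imported and run the BioSQL tests once per available back end.
The helper name available_backends is hypothetical, and the driver module
names (sqlite3, MySQLdb, psycopg2) are the usual ones but are not taken from
the test suite.

```python
import importlib

# Back end name -> Python driver module (assumed mapping for illustration).
CANDIDATE_DRIVERS = {
    "sqlite": "sqlite3",
    "mysql": "MySQLdb",
    "postgresql": "psycopg2",
}


def available_backends(candidates=CANDIDATE_DRIVERS):
    """Return the back ends whose driver module can be imported."""
    found = []
    for backend, module_name in candidates.items():
        try:
            importlib.import_module(module_name)
        except ImportError:
            continue  # driver not installed, skip this back end
        found.append(backend)
    return found


print(available_backends())
```

A test runner could then loop over this list, with user names and passwords
for the server-based back ends still coming from the existing settings.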

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Dec 10 13:15:45 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Dec 2009 08:15:45 -0500
Subject: [Biopython-dev] [Bug 2495] parse element symbols for ATOM/HETATM
	records (Bio.PDB.PDBParser)
In-Reply-To: <bug-2495-42@http.bugzilla.open-bio.org/>
Message-ID: <200912101315.nBADFj7O019533@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2495


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-10 08:15 EST -------
(In reply to comment #2)
> 
> Leaving bug open to deal with the output as well.
> 

Marking bug as fixed. I've just committed a change based on a patch from
Frederik Gwinner via GitHub - Bio.PDB.PDBIO should now save the element
on output.

Please reopen this bug if there is any problem.

Thanks,

Peter


From biopython at maubp.freeserve.co.uk  Thu Dec 10 14:25:53 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 10 Dec 2009 14:25:53 +0000
Subject: [Biopython-dev] Removing C implementation of deprecated listfns,
	mathfns, stringfns
In-Reply-To: <320fb6e00912100420o74dc84efhe3af0aa278386ec8@mail.gmail.com>
References: <320fb6e00912100420o74dc84efhe3af0aa278386ec8@mail.gmail.com>
Message-ID: <320fb6e00912100625s48ba290cj1234d757da0b94f@mail.gmail.com>

On Thu, Dec 10, 2009 at 12:20 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Hi all,
>
> The modules listfns, mathfns, stringfns are now all deprecated. They
> all have both a C implementation and a pure Python implementation.
>
> We could wait for the complete deprecation process, and remove
> the C code when the Python code gets removed. However, I would
> like to remove their C implementations for the next release, as this will
> simplify our code base.
>
> The only downside is anyone still using these modules will get
> a deprecation warning and a possible slow down (as the C code
> wouldn't exist any more). Also anyone using the C code directly
> will be in trouble (but no-one should be doing that...).
>
> Any comments? Objections?

I hope there are no objections as I've just done this on the trunk ;)

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Dec 10 14:54:17 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Dec 2009 09:54:17 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912101454.nBAEsHdi023376@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-10 09:54 EST -------
(In reply to comment #11)
> 
> I will be able to test the effect of your changes on MySQL, hopefully Cymon
> can do this on PostgreSQL - not that I anticipate any regressions, but best
> to be sure ;)
> 

The branch still merges cleanly onto the trunk (I had already manually applied
the Bio/SeqIO/InsdcIO.py date fix to the trunk). Testing "as is" on Mac OS X
10.5 with Apple's Python 2.5.2 uses SQLite, and works. Changing setup_BioSQL.py
to use MySQL also works fine :)

I have not yet tried this on Windows.


From bugzilla-daemon at portal.open-bio.org  Sat Dec 12 18:12:23 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 12 Dec 2009 13:12:23 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912121812.nBCICNWt003206@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #14 from cymon.cox at gmail.com  2009-12-12 13:12 EST -------
(In reply to comment #11)
> I will be able to test the effect of your changes on MySQL, hopefully Cymon
> can do this on PostgreSQL - not that I anticipate any regressions, but best
> to be sure ;)

Is SQLite ":memory:" TESTDB working for you on Brad's branch?

It fails for me, all else is fine (incl the SQLite file db).

[cymon at spiro Tests]$ python test_BioSQL_SeqIO.py
Connecting to database
Removing existing sub-database 'biosql-seqio-test' (if exists)
Traceback (most recent call last):
  File "test_BioSQL_SeqIO.py", line 134, in <module>
    if db_name in server.keys():
  File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 123, in keys
    return self.adaptor.list_biodatabase_names()
  File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 306, in list_biodatabase_names
    "SELECT name FROM biodatabase")
  File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 355, in execute_and_fetch_col0
    self.execute(sql, args or ())
  File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 336, in execute
    self.dbutils.execute(self.cursor, sql, args)
  File "/home/cymon/git/biopython-github-master/BioSQL/DBUtils.py", line 53, in execute
    cursor.execute(sql, args or ())
sqlite3.OperationalError: no such table: biodatabase


Perhaps it's my sqlite installation - I'm not familiar with it:

[cymon at spiro BioSQL]$ dpkg -l|egrep sqlite
ii  libmono-sqlite2.0-cil   2.4.2.3+dfsg-2    Mono Sqlite library (for CLI 2.0)
ii  libsqlite0              2.8.17-6build1    SQLite shared library
ii  libsqlite3-0            3.6.16-1ubuntu1   SQLite 3 shared library
ii  sqlite3                 3.6.16-1ubuntu1   A command line interface for SQLite 3

C.


From bugzilla-daemon at portal.open-bio.org  Sat Dec 12 18:33:15 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 12 Dec 2009 13:33:15 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912121833.nBCIXFCH003747@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-12 13:33 EST -------
(In reply to comment #14)
> (In reply to comment #11)
> > I will be able to test the effect of your changes on MySQL, hopefully Cymon
> > can do this on PostgreSQL - not that I anticipate any regressions, but best
> > to be sure ;)
> 
> Is SQLite ":memory:" TESTDB working for you on Brad's branch?

I didn't try that specifically - just SQLite on disk. Brad?

>
> It fails for me, all else is fine (incl the SQLite file db)
>

But the good news is Brad's changes to BioSQL/*.py haven't caused any
regressions on PostgreSQL :)

Thanks Cymon,

Peter


From bugzilla-daemon at portal.open-bio.org  Sat Dec 12 18:39:07 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 12 Dec 2009 13:39:07 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912121839.nBCId7U6003831@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #16 from cymon.cox at gmail.com  2009-12-12 13:39 EST -------
(In reply to comment #15)
> (In reply to comment #14)
> > (In reply to comment #11)
> > > I will be able to test the effect of your changes on MySQL, hopefully Cymon
> > > can do this on PostgreSQL - not that I anticipate any regressions, but best
> > > to be sure ;)
> > 
> > Is SQLite ":memory:" TESTDB working for you on Brad's branch?
> 
> I didn't try that specifically - just SQLite on disk. Brad?
> 
> >
> > It fails for me, all else is fine (incl the SQLite file db)
> >
> 
> But the good news is Brad's changes to BioSQL/*.py haven't caused any
> regressions on PostgreSQL :)

Yep, no problems, although I only tried the psycopg2 driver (with and without
rules deletion).

Psycopg version 1 support has had a deprecation warning since version 1.53
http://bugzilla.open-bio.org/show_bug.cgi?id=2851#c4 - when can we drop it?

C.


From bugzilla-daemon at portal.open-bio.org  Sat Dec 12 19:05:02 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 12 Dec 2009 14:05:02 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912121905.nBCJ52Nn004276@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #17 from chapmanb at 50mail.com  2009-12-12 14:05 EST -------
Thanks Cymon -- glad nothing is broken on Postgres. 

The in memory database (:memory:) doesn't work for the tests, because they
assume a database created by previous test cases. Since the memory one keeps
going away, they will get plenty of errors about non-existing tables. It would
work in theory with some test re-writing, but it's not too necessary.
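
A minimal sketch of the effect, using only the standard sqlite3 module and a
one-column stand-in for the real biodatabase table (the actual BioSQL schema
is larger): each new connection to ":memory:" starts with an empty database,
so a table created over one connection is gone for the next.

```python
import sqlite3

# First "test case": create a table in an in-memory database, then disconnect.
first = sqlite3.connect(":memory:")
first.execute("CREATE TABLE biodatabase (name TEXT)")
first.execute("INSERT INTO biodatabase VALUES ('biosql-seqio-test')")
first.commit()
first.close()

# Second "test case": a fresh connection gets a brand new, empty database.
second = sqlite3.connect(":memory:")
try:
    second.execute("SELECT name FROM biodatabase")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
finally:
    second.close()

print(error)  # no such table: biodatabase
```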

Sorry, should have added a note about this. Thanks again for double checking
that everything works.


From bugzilla-daemon at portal.open-bio.org  Sat Dec 12 19:41:12 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 12 Dec 2009 14:41:12 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912121941.nBCJfCXr004756@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-12 14:41 EST -------
(In reply to comment #16)
> 
> Yep, no problems, although I only tried the psycopg2 driver (with and
> without rules deletion).
> 
> Psycopg version 1 support has had a deprecation warning since version 1.53
> http://bugzilla.open-bio.org/show_bug.cgi?id=2851#c4 - when can we drop it?
> 
> C.
> 

Minor typo - Psycopg v1 support was deprecated in Biopython 1.51 (August 2009).
In line with the current deprecation policy, we aim for two releases with the
warning (which has happened already, 1.51 and 1.52) plus at least one year -
which means we can drop Psycopg v1 in summer 2010. Given that in this case it's
a fairly simple task for someone to just install Psycopg v2, we might look at
dropping the Psycopg v1 support a little quicker (say Biopython 1.54?).

See: http://www.biopython.org/wiki/Deprecation_policy
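
The rule of thumb above (two releases with the warning, plus at least one
year) can be written down as a small check. This is only a sketch; the
function name can_drop is hypothetical and the day of the month for the 1.51
release is a placeholder.

```python
from datetime import date


def can_drop(deprecated_on, releases_with_warning, today):
    """Two releases carrying the warning AND at least one year elapsed."""
    # Same calendar day one year later (a 29 February would need special care).
    one_year_later = deprecated_on.replace(year=deprecated_on.year + 1)
    return releases_with_warning >= 2 and today >= one_year_later


# Biopython 1.51 came out in August 2009; the day used here is a placeholder.
deprecated_on = date(2009, 8, 1)
print(can_drop(deprecated_on, 2, date(2009, 12, 12)))
print(can_drop(deprecated_on, 2, date(2010, 8, 1)))
```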

(In reply to comment #17)
> Thanks Cymon -- glad nothing is broken on Postgres. 
> 
> The in memory database (:memory:) doesn't work for the tests, because they
> assume a database created by previous test cases. Since the memory one keeps
> going away, they will get plenty of errors about non-existing tables. It would
> work in theory with some test re-writing, but it's not too necessary.
> 
> Sorry, should have added a note about this. Thanks again for double checking
> that everything works.

OK then - Brad, would you like to merge this to the trunk now (or in the next
few days), add a note about not using :memory: in Tests/setup_BioSQL.py, and
something to the NEWS file (with a proviso about the SQLite schema not yet
being official)?

Thanks,

Peter


From bugzilla-daemon at portal.open-bio.org  Mon Dec 14 12:48:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 14 Dec 2009 07:48:28 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To: <bug-2866-42@http.bugzilla.open-bio.org/>
Message-ID: <200912141248.nBECmS6b007714@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


chapmanb at 50mail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


------- Comment #19 from chapmanb at 50mail.com  2009-12-14 07:48 EST -------
Peter and Cymon -- thanks again for the help. Merged into the main trunk and
marking this as resolved.


From biopython at maubp.freeserve.co.uk  Mon Dec 14 16:24:44 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 14 Dec 2009 16:24:44 +0000
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com>
Message-ID: <320fb6e00912140824x3bfa58cfy8520142c0fea3a45@mail.gmail.com>

On Mon, Dec 7, 2009 at 1:28 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> One good reason for doing Biopython 1.53 soon is the
> NCBI said they plan to start using the new Jan 2010 DTD
> files for MedLine/PubMed as early as mid December:
> http://lists.open-bio.org/pipermail/biopython-dev/2009-November/007020.html

I've just checked the PubMed XML from efetch, and the
NCBI are still using the old 2009 DTD file. I guess it is
only midday in the USA, so plenty of time for them to
make the switch on 14 Dec as announced...

Once that happens (hopefully within hours), and I've
checked the Entrez parser is still happy, we can do
the Biopython release.

Until then, only documentation and unit test fixes
on the trunk please.

Thanks,

Peter


From biopython at maubp.freeserve.co.uk  Tue Dec 15 10:45:31 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 15 Dec 2009 10:45:31 +0000
Subject: [Biopython-dev] Code freeze for Biopython 1.53
Message-ID: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com>

Hello all,

I plan to do the Biopython 1.53 release this afternoon (in a few hours' time).

If there are any last minute changes anyone wants to make on the trunk,
please email first. Ideally just documentation or additional unit tests at this
point ;)

Thanks

Peter


From biopython at maubp.freeserve.co.uk  Tue Dec 15 15:29:48 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 15 Dec 2009 15:29:48 +0000
Subject: [Biopython-dev] Code freeze for Biopython 1.53
In-Reply-To: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com>
References: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com>
Message-ID: <320fb6e00912150729g36fd5e8dp924f07c1eec0d1cb@mail.gmail.com>

On Tue, Dec 15, 2009 at 10:45 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Hello all,
>
> I plan to do the Biopython 1.53 release this afternoon (in a few hours time).
>

OK - Everything looks good on the code side, git has been tagged, source
archives and windows installers uploaded. If anyone could double check
the installers work on your machines that would be great.

Brad - could you run a sanity test before uploading to pypi?

David - did you manage to draft a release announcement? If not, don't
worry, I'll make one up ;)

Peter


From biopython at maubp.freeserve.co.uk  Tue Dec 15 16:28:13 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 15 Dec 2009 16:28:13 +0000
Subject: [Biopython-dev] Code freeze for Biopython 1.53
In-Reply-To: <320fb6e00912150729g36fd5e8dp924f07c1eec0d1cb@mail.gmail.com>
References: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com>
	<320fb6e00912150729g36fd5e8dp924f07c1eec0d1cb@mail.gmail.com>
Message-ID: <320fb6e00912150828q5d3901deq162f14db458f980d@mail.gmail.com>

On Tue, Dec 15, 2009 at 3:29 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Tue, Dec 15, 2009 at 10:45 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> Hello all,
>>
>> I plan to do the Biopython 1.53 release this afternoon (in a few hours time).
>>
>
> OK - Everything looks good on the code side, git has been tagged, source
> archives and windows installers uploaded. If anyone could double check
> the installers work on your machines that would be great.
>
> Brad - could you run a sanity test before uploading to pypi?
>
> David - did you manage to draft a release announcement? If not, don't
> worry, I'll make one up ;)

Draft text below - any comments?

Thanks,

Peter

----

We are pleased to announce the availability of Biopython 1.53, a new
stable release of the Biopython library, three months after the
release of Biopython 1.52. This is our first release since migrating
from CVS to git for source code control.

There have been some additions to our core objects - the Seq (and
related UnknownSeq) objects gained upper and lower methods (like the
string methods of the same name but alphabet aware) plus a new ungap
method. The SeqFeature object now has an extract method to get the
region of sequence it describes (useful for getting CDS nucleotide
sequences from GenBank files). Also SeqRecord objects now support
addition, giving a new SeqRecord with the combined sequence, all the
SeqFeatures, and any common annotation.

SQLite support (built into Python 2.5+) was added to our BioSQL
interface. This is still a little experimental as we are using a draft
BioSQL SQLite schema, but this should be merged into the next BioSQL
release.

Biopython now includes wrappers for the new NCBI BLAST C++ tools,
which will be replacing the old NCBI "legacy" BLAST tools written in
C. The plain text BLAST parser has been updated to cope as well.
Nevertheless, we (and the NCBI) still recommend using the XML output
for parsing.

Bio.Entrez includes the new (Jan 2010) DTD files from the NCBI for
parsing MedLine/PubMed data.

The NCBI codon tables have been updated from version 3.4 to 3.9, which
adds a few extra start codons, and a few new tables (Tables 16, 21, 22
and 23).

The restriction enzyme list in Bio.Restriction has been updated to the
Nov 2009 release of REBASE.

The Bio.PDB parser and output code has been updated to understand the
element column in ATOM and HETATM lines, and Bio.PDB.PDBList has been
updated for recent changes to the PDB FTP site.

Finally, support for running Biopython under Jython (using the Java
Virtual Machine) has been much improved. Note that Jython does not
support C code, and currently Jython does not parse DTD files (needed
for the Bio.Entrez XML parser). However, most of the Biopython modules
seem fine from testing Jython 2.5.0 and 2.5.1.

Sources and Windows Installers are available from our downloads page.

Thanks to the Biopython development team and to everyone who has
reported bugs or contributed patches since our last release.


From bugzilla-daemon at portal.open-bio.org  Tue Dec 15 16:32:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 Dec 2009 11:32:28 -0500
Subject: [Biopython-dev] [Bug 2895] Bio.Restriction.Restriction_Dictionary
	Jython Error Fix+Patch
In-Reply-To: <bug-2895-42@http.bugzilla.open-bio.org/>
Message-ID: <200912151632.nBFGWS6a022173@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2895


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-15 11:32 EST -------
Fixed in Biopython 1.53, using a similar technique but complicated because
this file is generated by a separate script.




From bugzilla-daemon at portal.open-bio.org  Tue Dec 15 16:32:46 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 Dec 2009 11:32:46 -0500
Subject: [Biopython-dev] [Bug 2892] Jython MatrixInfo.py fix+patch
In-Reply-To: <bug-2892-42@http.bugzilla.open-bio.org/>
Message-ID: <200912151632.nBFGWkSA022203@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2892


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-15 11:32 EST -------
Fixed in Biopython 1.53 using a similar technique.




From bugzilla-daemon at portal.open-bio.org  Tue Dec 15 16:32:48 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 Dec 2009 11:32:48 -0500
Subject: [Biopython-dev] [Bug 2895] Bio.Restriction.Restriction_Dictionary
	Jython Error Fix+Patch
In-Reply-To: <bug-2895-42@http.bugzilla.open-bio.org/>
Message-ID: <200912151632.nBFGWm0Q022215@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2895


Bug 2895 depends on bug 2892, which changed state.

Bug 2892 Summary: Jython MatrixInfo.py fix+patch
http://bugzilla.open-bio.org/show_bug.cgi?id=2892

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




From bugzilla-daemon at portal.open-bio.org  Tue Dec 15 16:32:51 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 Dec 2009 11:32:51 -0500
Subject: [Biopython-dev] [Bug 2893] Jython test_prosite fix+patch
In-Reply-To: <bug-2893-42@http.bugzilla.open-bio.org/>
Message-ID: <200912151632.nBFGWpCp022227@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2893


Bug 2893 depends on bug 2892, which changed state.

Bug 2892 Summary: Jython MatrixInfo.py fix+patch
http://bugzilla.open-bio.org/show_bug.cgi?id=2892

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




From bugzilla-daemon at portal.open-bio.org  Tue Dec 15 16:33:13 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 Dec 2009 11:33:13 -0500
Subject: [Biopython-dev] [Bug 2893] Jython test_prosite fix+patch
In-Reply-To: <bug-2893-42@http.bugzilla.open-bio.org/>
Message-ID: <200912151633.nBFGXD3Y022254@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2893


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-15 11:33 EST -------
Fixed in Biopython 1.53 using a similar technique




From bugzilla-daemon at portal.open-bio.org  Tue Dec 15 16:33:15 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 Dec 2009 11:33:15 -0500
Subject: [Biopython-dev] [Bug 2895] Bio.Restriction.Restriction_Dictionary
	Jython Error Fix+Patch
In-Reply-To: <bug-2895-42@http.bugzilla.open-bio.org/>
Message-ID: <200912151633.nBFGXFa7022266@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2895


Bug 2895 depends on bug 2893, which changed state.

Bug 2893 Summary: Jython test_prosite fix+patch
http://bugzilla.open-bio.org/show_bug.cgi?id=2893

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




From bugzilla-daemon at portal.open-bio.org  Tue Dec 15 16:41:30 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 Dec 2009 11:41:30 -0500
Subject: [Biopython-dev] [Bug 2807] Clustalw return codes
In-Reply-To: <bug-2807-42@http.bugzilla.open-bio.org/>
Message-ID: <200912151641.nBFGfUpS022532@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2807


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-15 11:41 EST -------
Bio.Clustalw was declared obsolete in Release 1.52, so there is no reason to
add better support for return codes. With the new alignment wrappers and
subprocess this is a non-issue.

Marking as "won't fix".
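
For context, here is a minimal sketch of why subprocess makes this a
non-issue: the exit status of any child process is available directly,
with no tool-specific parsing. The child command below is only a stand-in
(a Python one-liner that exits with status 3), not a real ClustalW call.

```python
import subprocess
import sys

# Stand-in for an external alignment tool: a child process that
# writes a message to stderr and exits with a non-zero status.
cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('bad input\\n'); sys.exit(3)",
]

child = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = child.communicate()

# The exit status is an attribute of the finished process object.
return_code = child.returncode
if return_code != 0:
    print("Tool failed with return code %i" % return_code)
```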




From bugzilla-daemon at portal.open-bio.org  Tue Dec 15 16:46:17 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 Dec 2009 11:46:17 -0500
Subject: [Biopython-dev] [Bug 2820] Convert test_PDB.py to unittest
In-Reply-To: <bug-2820-42@http.bugzilla.open-bio.org/>
Message-ID: <200912151646.nBFGkHAG022705@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2820


------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-15 11:46 EST -------
(In reply to comment #1)
> 
> I've checked in a slightly modified version as test_PDB_unit.py - I think
> having both this and the original test_PDB.py is sensible in the short term.
> 

I've just removed the old print-and-compare test_PDB.py, then renamed
test_PDB_unit.py to test_PDB.py.
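
To illustrate the difference between the two test styles, here is a small
sketch. The function under test (gc_fraction) and the test class are made
up for this example and have nothing to do with the real test_PDB.py.

```python
import unittest


def gc_fraction(seq):
    """Toy function under test (hypothetical, not part of Bio.PDB)."""
    seq = seq.upper()
    return float(seq.count("G") + seq.count("C")) / len(seq)


# Print-and-compare style: the script prints values, and the test harness
# compares the printed text against a stored expected-output file.
print("%0.2f" % gc_fraction("GATTACA"))


# unittest style: the expected values sit next to the code as assertions.
class GCFractionTests(unittest.TestCase):
    def test_simple(self):
        self.assertAlmostEqual(gc_fraction("GGCC"), 1.0)

    def test_mixed_case(self):
        self.assertAlmostEqual(gc_fraction("gAtC"), 0.5)


suite = unittest.TestLoader().loadTestsFromTestCase(GCFractionTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

With unittest a failure points at a specific assertion, rather than at a
line of differing output.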




From biopython at maubp.freeserve.co.uk  Tue Dec 15 17:01:38 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 15 Dec 2009 17:01:38 +0000
Subject: [Biopython-dev] Biopython 1.53 released
Message-ID: <320fb6e00912150901k138ae04bmc5d5af9c867340ec@mail.gmail.com>

Dear Biopythoneers,

We are pleased to announce the availability of Biopython 1.53, a new
stable release of the Biopython library, three months after the
release of Biopython 1.52. This is our first release since migrating
from CVS to git for source code control.

There have been some additions to our core objects - the Seq (and
related UnknownSeq) objects gained upper and lower methods (like the
string methods of the same name but alphabet aware) plus a new ungap
method. The SeqFeature object now has an extract method to get the
region of sequence it describes (useful for getting CDS nucleotide
sequences from GenBank files). Also SeqRecord objects now support
addition, giving a new SeqRecord with the combined sequence, all the
SeqFeatures, and any common annotation.

SQLite support (built into Python 2.5+) was added to our BioSQL
interface. This is still a little experimental as we are using a draft
BioSQL SQLite schema, but this should be merged into the next
BioSQL release.

Biopython now includes wrappers for the new NCBI BLAST C++ tools,
which will be replacing the old NCBI "legacy" BLAST tools written in
C. The plain text BLAST parser has been updated to cope as well.
Nevertheless, we (and the NCBI) still recommend using the XML output
for parsing.

Bio.Entrez includes the new (Jan 2010) DTD files from the NCBI for
parsing MedLine/PubMed data.

The NCBI codon tables have been updated from version 3.4 to 3.9, which
adds a few extra start codons, and a few new tables (Tables 16, 21, 22
and 23).

The restriction enzyme list in Bio.Restriction has been updated to the
Nov 2009 release of REBASE.

The Bio.PDB parser and output code has been updated to understand the
element column in ATOM and HETATM lines, and Bio.PDB.PDBList has been
updated for recent changes to the PDB FTP site.
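
For readers unfamiliar with the element column: in the PDB file format it
occupies columns 77-78 of ATOM and HETATM records, right-justified. The
sketch below reads it with plain string slicing. It is not the Bio.PDB
parser code, the helper name element_from_line is hypothetical, and the
sample atom is invented.

```python
def element_from_line(line):
    """Return the element symbol from a PDB ATOM/HETATM line, or "" if blank."""
    if not line.startswith(("ATOM  ", "HETATM")):
        raise ValueError("Not an ATOM or HETATM record")
    # Columns 77-78 (1-based) hold the right-justified element symbol.
    # Older files may stop before column 77, which gives an empty string.
    return line[76:78].strip()


# Build a sample record field by field so the fixed widths are explicit.
sample = (
    "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f          %2s%2s"
    % ("ATOM", 1, " N  ", " ", "MET", "A", 1, " ",
       27.340, 24.430, 2.614, 1.00, 9.67, "N", "")
)
print(repr(element_from_line(sample)))
```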

Finally, support for running Biopython under Jython (using the Java
Virtual Machine) has been much improved. Note that Jython does not
support C code, and currently Jython does not parse DTD files (needed
for the Bio.Entrez XML parser). However, most of the Biopython modules
seem fine from testing Jython 2.5.0 and 2.5.1.

Sources and Windows Installers are available from our downloads page.

Thanks to the Biopython development team and to everyone who has
reported bugs or contributed patches since our last release.

--Peter, on behalf of the Biopython developers

P.S. This news post is online at
http://news.open-bio.org/news/2009/12/biopython-release-153/

You may wish to subscribe to our news feed.  For RSS links etc, see:
http://biopython.org/wiki/News

Biopython news is also on twitter:
http://twitter.com/biopython


From chapmanb at 50mail.com  Wed Dec 16 12:42:35 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 16 Dec 2009 07:42:35 -0500
Subject: [Biopython-dev] Code freeze for Biopython 1.53
In-Reply-To: <320fb6e00912150828q5d3901deq162f14db458f980d@mail.gmail.com>
References: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com>
	<320fb6e00912150729g36fd5e8dp924f07c1eec0d1cb@mail.gmail.com>
	<320fb6e00912150828q5d3901deq162f14db458f980d@mail.gmail.com>
Message-ID: <20091216124235.GK78379@sobchak.mgh.harvard.edu>

Hi Peter;

> >> I plan to do the Biopython 1.53 release this afternoon (in a few hours time).

Sorry I am too slow with your mails. Thanks for the hard work
getting this together. Great stuff.

> > Brad - could you run a sanity test before uploading to pypi?

Looks good to me, and uploaded to pypi.

> Draft text below - any comments?

As a thought for next time, what do you think about adding the
names of people who have worked on the items mentioned in the
release? This would give a bit more public recognition for the
contributions, especially to people who only look at the release
notes and not mailing list traffic.

Thanks again,
Brad


From biopython at maubp.freeserve.co.uk  Wed Dec 16 22:43:16 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 16 Dec 2009 22:43:16 +0000
Subject: [Biopython-dev] Code freeze for Biopython 1.53
In-Reply-To: <20091216124235.GK78379@sobchak.mgh.harvard.edu>
References: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com>
	<320fb6e00912150729g36fd5e8dp924f07c1eec0d1cb@mail.gmail.com>
	<320fb6e00912150828q5d3901deq162f14db458f980d@mail.gmail.com>
	<20091216124235.GK78379@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00912161443q30f82120of1c98b073136c3f6@mail.gmail.com>

Brad wrote:
>> Brad - could you run a sanity test before uploading to pypi?
>
> Looks good to me, and uploaded to pypi.

Great, thank you.

>> Draft text below - any comments?
>
> As a thought for next time, what do you think about adding the
> names of people who have worked on the items mentioned in the
> release? This would give a bit more public recognition for the
> contributions, especially to people who only look at the release
> notes and not mailing list traffic.

It's too late for the emails and the source code bundles, but
the nice thing about the NEWS file (in the repository) and
the OBF news server is we can update them even now.

Of course, quite where to draw the line is debatable - a simple
patch probably doesn't warrant it (or does it?), but solving a
more complex bug or adding some new functionality does.
If any existing core developers want more "recognition" we
can do that too.

For example, Kyle, would you have liked to be named with
regards to the Jython work? I almost put you in anyway,
but in the end just mentioned it on twitter:
http://twitter.com/Biopython/statuses/6502469425

Another idea to showcase new features would be for the
author(s) to prepare a (credited) blog post with some
examples (to put on our news server). I have already done
a few like this, and think it would also be a good thing in
general.

Peter


From kellrott at gmail.com  Thu Dec 17 01:39:49 2009
From: kellrott at gmail.com (Kyle Ellrott)
Date: Wed, 16 Dec 2009 17:39:49 -0800
Subject: [Biopython-dev] zxJDBC support for BioSQL
Message-ID: <bb02be080912161739k69e63916rbb488a6d6f35948d@mail.gmail.com>

I've pushed a patch to the BioSQL code that enables zxJDBC support.
This means that Jython can now run BioSQL through MySQL.  (SQLite hasn't
been ported to Java yet.)
zxJDBC is a Jython module included in the standard distribution that
provides a PythonDB interface through the Java SQL interfaces.  I've only
run the unit tests using the mysql-connector, but it should theoretically
work with Oracle as well.
The biggest issues for changing code:
 - Java expects ? instead of %s, so SQL strings have to be altered (I
override the execute method in the DBUtils to run a regular expression before
execution)
 - A SQL string with a=? works, one with a='?' does not (Loader.py had some
examples of this)
 - Java returns unicode, not strings (recent patch to the mainline fixes
this)
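
As an illustration of the first two points, here is a minimal sketch of
rewriting %s placeholders to ? with a regular expression before execution.
This is not the code from the patch: the helper name to_qmark is
hypothetical, the taxon table is a toy table rather than the BioSQL schema,
and the standard library sqlite3 module is used only because it also
expects ? placeholders, standing in for a zxJDBC connection.

```python
import re
import sqlite3

# Matches a %s placeholder, optionally wrapped in single quotes ('%s'),
# so that a='%s' becomes a=? rather than a='?'.
_PLACEHOLDER = re.compile(r"'%s'|%s")


def to_qmark(sql):
    """Rewrite format-style placeholders (%s) as qmark-style ones (?)."""
    return _PLACEHOLDER.sub("?", sql)


# sqlite3 stands in here for any driver that expects ? placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxon (taxon_id INTEGER, name TEXT)")
conn.execute(
    to_qmark("INSERT INTO taxon VALUES (%s, '%s')"), (9606, "Homo sapiens")
)
row = conn.execute(
    to_qmark("SELECT name FROM taxon WHERE taxon_id = %s"), (9606,)
).fetchone()
print(row[0])
```

This simple pattern would also rewrite a literal %s that is meant to be
part of a string value, so it only suits SQL where every %s is a
placeholder.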

Code can be found at http://github.com/kellrott/biopython

Kyle


From biopython at maubp.freeserve.co.uk  Thu Dec 17 10:46:37 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 17 Dec 2009 10:46:37 +0000
Subject: [Biopython-dev] zxJDBC support for BioSQL
In-Reply-To: <bb02be080912161739k69e63916rbb488a6d6f35948d@mail.gmail.com>
References: <bb02be080912161739k69e63916rbb488a6d6f35948d@mail.gmail.com>
Message-ID: <320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com>

On Thu, Dec 17, 2009 at 1:39 AM, Kyle Ellrott <kellrott at gmail.com> wrote:
>
> I've pushed a patch to the BioSQL code that enables zxJDBC support.
> This means that Jython can now run BioSQL through MySQL.  (SQLite hasn't
> been ported to Java yet.)
> zxJDBC is a Jython module included in the standard distribution that
> provides a PythonDB interface through the Java SQL interfaces.  I've only
> run the unit tests using the mysql-connector, but it should theoretically
> work with Oracle as well.

Sounds good, and ought to work on PostgreSQL too in theory.

I should be able to test it on MySQL.

> The biggest issues for changing code:
>  - Java expects ? instead of %s, so SQL strings have to be altered (I
> override the execute method in the DBUtils to run a regular expression
> before execution)
>  - A SQL string with a=? works, one with a='?' does not (Loader.py had some
> examples of this)
>  - Java returns unicode, not strings (recent patch to the mainline fixes
> this)

Some of those issues applied to SQLite (hence the changes on the
trunk from Brad).

> Code can be found at http://github.com/kellrott/biopython

Lovely. That's on your jython branch (along with lots of your other work)?

Peter


From biopython at maubp.freeserve.co.uk  Thu Dec 17 13:31:30 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 17 Dec 2009 13:31:30 +0000
Subject: [Biopython-dev] zxJDBC support for BioSQL
In-Reply-To: <320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com>
References: <bb02be080912161739k69e63916rbb488a6d6f35948d@mail.gmail.com>
	<320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com>
Message-ID: <320fb6e00912170531j3f9c9b38n123e0464fa536e45@mail.gmail.com>

On Thu, Dec 17, 2009 at 10:46 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Thu, Dec 17, 2009 at 1:39 AM, Kyle Ellrott <kellrott at gmail.com> wrote:
>>
>> I've pushed a patch to the BioSQL code that enables zxJDBC support.
>> This means that Jython can now run BioSQL through MySQL.  (SQLite hasn't
>> been ported to Java yet.)
>> zxJDBC is a Jython module included in the standard distribution that
>> provides a PythonDB interface through the Java SQL interfaces.  I've only
>> run the unit tests using the mysql-connector, but it should theoretically
>> work with Oracle as well.
>
> Sounds good, and ought to work on PostgreSQL too in theory.
>
> I should be able to test it on MySQL.

I worked out I needed to install MySQL Connector/J  so that
org.gjt.mm.mysql.Driver works in Jython, get it from here:
http://dev.mysql.com/downloads/connector/j/

Installation seems to be just unzipping this and updating your
CLASSPATH environment variable to point at the jar file.

Peter


From biopython at maubp.freeserve.co.uk  Thu Dec 17 14:54:13 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 17 Dec 2009 14:54:13 +0000
Subject: [Biopython-dev] zxJDBC support for BioSQL
In-Reply-To: <320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com>
References: <bb02be080912161739k69e63916rbb488a6d6f35948d@mail.gmail.com>
	<320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com>
Message-ID: <320fb6e00912170654g41bc8c4eyce0f56b4472076f9@mail.gmail.com>

On Thu, Dec 17, 2009 at 10:46 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Thu, Dec 17, 2009 at 1:39 AM, Kyle Ellrott <kellrott at gmail.com> wrote:
>>
>> I've pushed a patch to the BioSQL code that enables zxJDBC support.
>> This means that Jython can now run BioSQL through MySQL.  (SQLite hasn't
>> been ported to Java yet.)

Maybe one day Jython will have a Python sqlite3 like library built in:
http://bugs.jython.org/issue1682864

For now it looks like we could probably use SQLite via zxJDBC instead
(see links on that Jython issue).

Peter


From kellrott at gmail.com  Thu Dec 17 18:03:38 2009
From: kellrott at gmail.com (Kyle Ellrott)
Date: Thu, 17 Dec 2009 10:03:38 -0800
Subject: [Biopython-dev] zxJDBC support for BioSQL
In-Reply-To: <320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com>
References: <bb02be080912161739k69e63916rbb488a6d6f35948d@mail.gmail.com>
	<320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com>
Message-ID: <bb02be080912171003n58ba38dej8a9aeed15a289223@mail.gmail.com>

> > Code can be found at http://github.com/kellrott/biopython
>
> Lovely. That's on your jython branch (along with lots of your other work)?
>

Yes, but all of the zxJDBC work has been done in the past 2 weeks (just the
last three commits), so it should be easy to cherry-pick out the relevant
patches.

Kyle


From mhampton at d.umn.edu  Thu Dec 17 18:42:33 2009
From: mhampton at d.umn.edu (Marshall Hampton)
Date: Thu, 17 Dec 2009 12:42:33 -0600 (CST)
Subject: [Biopython-dev] code credits
In-Reply-To: <mailman.9.1261069205.19597.biopython-dev@lists.open-bio.org>
References: <mailman.9.1261069205.19597.biopython-dev@lists.open-bio.org>
Message-ID: <Pine.SOC.4.64.0912171237220.16381@ub.d.umn.edu>


I strongly encourage you to list anyone who has contributed a patch, no 
matter how small.  This has worked very well for the Sage project 
(www.sagemath.org) where credit is given to all contributors and reviewers 
(every patch must be reviewed by at least one other person).  For example 
see:

http://groups.google.com/group/sage-announce/msg/bcf5591837068b5f

Marshall Hampton
Department of Mathematics and Statistics
University of Minnesota, Duluth

> Message: 1
> Date: Wed, 16 Dec 2009 22:43:16 +0000
> From: Peter <biopython at maubp.freeserve.co.uk>
> Subject: Re: [Biopython-dev] Code freeze for Biopython 1.53
> To: Brad Chapman <chapmanb at 50mail.com>, biopython-dev at biopython.org
> Message-ID:
> 	<320fb6e00912161443q30f82120of1c98b073136c3f6 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Brad wrote:
>>> Brad - could you run a sanity test before uploading to pypi?
>>
>> Looks good to me, and uploaded to pypi.
>
> Great, thank you.
>
>>> Draft text below - any comments?
>>
>> As a thought for next time, what do you think about adding the
>> names of people who have worked on the items mentioned in the
>> release? This would give a bit more public recognition for the
>> contributions, especially to people who only look at the release
>> notes and not mailing list traffic.
>
> It's too late for the emails and the source code bundles, but
> the nice thing about the NEWS file (in the repository) and
> the OBF news server is we can update them even now.
>
> Of course, quite where to draw the line is debatable - a simple
> patch probably doesn't warrant it (or does it?), but solving a
> more complex bug or adding some new functionality does.
> If any existing core developers want more "recognition" we
> can do that too.
>
> For example, Kyle, would you have liked to be named with
> regards to the Jython work? I almost put you in anyway,
> but in the end just mentioned it on twitter:
> http://twitter.com/Biopython/statuses/6502469425
>
> Another idea to showcase new features would be for the
> author(s) to prepare a (credited) blog post with some
> examples (to put on our news server). I have already done
> a few like this, and think it would also be a good thing in
> general.
>
> Peter


From kellrott at gmail.com  Thu Dec 17 21:20:10 2009
From: kellrott at gmail.com (Kyle Ellrott)
Date: Thu, 17 Dec 2009 13:20:10 -0800
Subject: [Biopython-dev] code credits
In-Reply-To: <Pine.SOC.4.64.0912171237220.16381@ub.d.umn.edu>
References: <mailman.9.1261069205.19597.biopython-dev@lists.open-bio.org>
	<Pine.SOC.4.64.0912171237220.16381@ub.d.umn.edu>
Message-ID: <bb02be080912171320u480fe461r1f517970f08e091b@mail.gmail.com>

I would agree with that.  Drawing from broad stereotypes, I would think that
a majority of contributors are academic and would be most interested in
adding things to their CV.  So acknowledgment would be of great value to
them at no real cost to the Biopython project.  Plus there's the old idea
that the more authors a paper has the more important it must be.

Kyle

I strongly encourage you to list anyone who has contributed a patch, no
> matter how small.  This has worked very well for the Sage project (
> www.sagemath.org) where credit is given to all contributors and reviewers
> (every patch must be reviewed by at least one other person).  For example
> see:
>
> http://groups.google.com/group/sage-announce/msg/bcf5591837068b5f
>
> Marshall Hampton
> Department of Mathematics and Statistics
> University of Minnesota, Duluth
>
>


From tallpaulinjax at yahoo.com  Thu Dec 17 21:48:25 2009
From: tallpaulinjax at yahoo.com (Paul B)
Date: Thu, 17 Dec 2009 13:48:25 -0800 (PST)
Subject: [Biopython-dev] code credits
In-Reply-To: <bb02be080912171320u480fe461r1f517970f08e091b@mail.gmail.com>
Message-ID: <928490.72367.qm@web30708.mail.mud.yahoo.com>

I also agree completely. Adding value to the code deserves some form of credit, if desired by the contributor. I fixed a bit of code in a couple of the modules and received no credit... that made me a good bit less gung ho about contributing more.

--- On Thu, 12/17/09, Kyle Ellrott <kellrott at gmail.com> wrote:

From: Kyle Ellrott <kellrott at gmail.com>
Subject: Re: [Biopython-dev] code credits
To: "Marshall Hampton" <mhampton at d.umn.edu>
Cc: biopython-dev at lists.open-bio.org
Date: Thursday, December 17, 2009, 4:20 PM


I would agree with that.  Drawing from broad stereotypes, I would think that
a majority of contributors are academic and would be most interested in
adding things to their CV.  So acknowledgment would be of great value to
them at no real cost to the Biopython project.  Plus there's the old idea
that the more authors a paper has the more important it must be.

Kyle

I strongly encourage you to list anyone who has contributed a patch, no
> matter how small.  This has worked very well for the Sage project (
> www.sagemath.org) where credit is given to all contributors and reviewers
> (every patch must be reviewed by at least one other person).  For example
> see:
>
> http://groups.google.com/group/sage-announce/msg/bcf5591837068b5f
>
> Marshall Hampton
> Department of Mathematics and Statistics
> University of Minnesota, Duluth
>
>


From biopython at maubp.freeserve.co.uk  Thu Dec 17 22:54:40 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 17 Dec 2009 22:54:40 +0000
Subject: [Biopython-dev] code credits
In-Reply-To: <928490.72367.qm@web30708.mail.mud.yahoo.com>
References: <bb02be080912171320u480fe461r1f517970f08e091b@mail.gmail.com>
	<928490.72367.qm@web30708.mail.mud.yahoo.com>
Message-ID: <320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com>

Hi all,

Marshall Hampton's description of how they do it on Sage
sounds worth trying - if we keep track as things are checked
in, it won't be too much work either. Do you (sage) have a
list of guidelines for what qualifies for a credit?

On Thu, Dec 17, 2009 at 9:48 PM, Paul B <tallpaulinjax at yahoo.com> wrote:
>
> I also agree completely. Adding value to the code deserves
> some form of credit, if desired by the contributor. I fixed a bit
> of code in a couple of the modules and received no credit...
> that made me a good bit less gung ho about contributing more.
>

Sorry :(  You didn't get no credit at all though, you were
named in the commit:
http://github.com/biopython/biopython/commit/225fb0eb92c99018c3710c3ec5ac0b22e9706208

Also people who offer changes via github that can be
merged cleanly onto the trunk, or cherry-picked would
also automatically get a credit in the repository history.

Would someone like to go through the git log for Biopython
1.53 for a full list? e.g. Hongbo Zhu and Frederik Gwinner
contributed to a PDB enhancement (Bug 2495), and as he
pointed out, so did Paul B (again, PDB stuff). These were
the "border line" cases I had in mind here:
http://lists.open-bio.org/pipermail/biopython-dev/2009-December/007161.html
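
One way to compile such a list, as a rough sketch only: count commits per
author between two release tags. The function names are hypothetical and
the tag names in the usage comment are assumptions (check "git tag" in the
repository). Only commit authors are counted, so people credited just in a
commit message would still need to be picked out by hand.

```python
import subprocess
from collections import Counter


def count_authors(log_text):
    """Count commits per author from `git log --format=%an` output."""
    names = [line.strip() for line in log_text.splitlines() if line.strip()]
    return Counter(names)


def authors_between(old_tag, new_tag, repo_dir="."):
    """Run git log for old_tag..new_tag and tally the author names."""
    output = subprocess.check_output(
        ["git", "log", "--format=%an", "%s..%s" % (old_tag, new_tag)],
        cwd=repo_dir,
    )
    return count_authors(output.decode("utf-8"))


# Canned output with invented names, so no repository is needed to try it.
# Real usage might look like (tag names are an assumption):
#     authors_between("biopython-152", "biopython-153")
example = "Alice Example\nBob Example\nAlice Example\nAlice Example\n"
for name, commits in count_authors(example).most_common():
    print("%s: %i" % (name, commits))
```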

From personal experience contributing to other open
source projects, getting a credit in release notes even for
a small bug fix/enhancement as in Sage is rare. So while
I thought I was following open source norms in writing the
last release notes, we can certainly do this differently in
future.

Regards,

Peter


From mhampton at d.umn.edu  Fri Dec 18 01:54:00 2009
From: mhampton at d.umn.edu (Marshall Hampton)
Date: Thu, 17 Dec 2009 19:54:00 -0600 (CST)
Subject: [Biopython-dev] code credits
In-Reply-To: <320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com>
References: <bb02be080912171320u480fe461r1f517970f08e091b@mail.gmail.com> 
	<928490.72367.qm@web30708.mail.mud.yahoo.com>
	<320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com>
Message-ID: <Pine.SOC.4.64.0912171946120.13591@ub.d.umn.edu>


On Thu, 17 Dec 2009, Peter wrote:
> Marshall Hampton's description of how they do it on Sage
> sounds worth trying - if we keep track as things are checked
> in, it won't be too much work either. Do you (sage) have a
> list of guidelines for what qualifies for a credit?

I don't think we have formal guidelines, but the process is pretty simple. 
Whoever works on a patch in our bug/feature tracker has to flag it for 
review.  Both the person who implements the patch and the reviewer get 
credit.  It doesn't matter if it's a 1-character change to the
documentation, they're listed in the release notes.  Basically, the idea
is to err (if that's the right word) on the side of acknowledging any
contribution.  I think that Sage (really William Stein initially) adopting
that philosophy is one of the reasons it's gone from 1 to 150 or so
developers.

I'm cc'ing sage-devel in case anyone there wants to comment on this.

Cheers,

Marshall Hampton
Department of Mathematics and Statistics
University of Minnesota, Duluth


From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 09:44:02 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 04:44:02 -0500
Subject: [Biopython-dev] [Bug 2938] Bio.Entrez.read() returns empty string
	for HTML (not an error)
In-Reply-To: <bug-2938-42@http.bugzilla.open-bio.org/>
Message-ID: <200912180944.nBI9i22n007947@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2938


------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp  2009-12-18 04:44 EST -------
The offending XML file (the one that does not start with <?xml) is created by
efetch from the journals database. Upon reading the EUtils documentation more
carefully, it seems that XML output from the journals database is not
officially supported; only text and html output are supported. One option is to
simply remove the offending XML file from the tests, and raise an error
whenever Entrez.read is presented with data that do not start with <?xml.
Additionally, we could add a parser for the text output generated by efetch
from the journals database.
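
A minimal sketch of the proposed check, assuming it is enough to test the
first five characters of the data. The function names are hypothetical and
this is not the actual Bio.Entrez code:

```python
def starts_with_xml_declaration(data):
    """Return True if data (bytes or str) begins with '<?xml'."""
    if isinstance(data, bytes):
        return data.startswith(b"<?xml")
    return data.startswith("<?xml")


def check_is_xml(data):
    """Raise ValueError for input that does not start with '<?xml'."""
    if not starts_with_xml_declaration(data):
        raise ValueError("Input does not start with an XML declaration")
    return data
```

With a check like this, HTML or plain text returned by efetch would raise an
error up front rather than being parsed into an empty result.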




From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 09:46:45 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 04:46:45 -0500
Subject: [Biopython-dev] [Bug 2943] MMCIFParser only handling a single model.
In-Reply-To: <bug-2943-42@http.bugzilla.open-bio.org/>
Message-ID: <200912180946.nBI9kjFA008009@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2943


------- Comment #11 from mdehoon at ims.u-tokyo.ac.jp  2009-12-18 04:46 EST -------
Peter, are you still looking at this bug report?
Otherwise I could have a look at it.




From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 10:00:50 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 05:00:50 -0500
Subject: [Biopython-dev] [Bug 2698] Attempt at a unit test for MaxEntrophy
In-Reply-To: <bug-2698-42@http.bugzilla.open-bio.org/>
Message-ID: <200912181000.nBIA0opL008316@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2698


------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp  2009-12-18 05:00 EST -------
Thanks for your test!

I would like to simplify the code a bit though.
How about replacing

ix, iy= expand_count([0, 0, 1],'C', 40)
xm.extend(ix)
ym.extend(iy)

by

xm.extend([0,0,1] * 40)
ym.extend(['C'] * 40)

Or, you could replace this whole section by
xm = ([0,0,1]*40 + [0,0,1]*60 + [0,1,0]*75 + [0,1,0]*25 + [1,0,0]*90 +
      [1,0,0]*10)

and similarly for ym.
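
One thing worth checking before making this substitution: list repetition
flattens. The sketch below assumes that expand_count (not shown in this
comment) returns 40 separate three-element observation vectors; if so, the
one-line form needs an extra pair of brackets:

```python
flat = [0, 0, 1] * 40        # one flat list of 120 integers
nested = [[0, 0, 1]] * 40    # 40 references to the same 3-element vector

print(len(flat), len(nested))

# Independent copies, in case the vectors are modified later
xm = [[0, 0, 1] for _ in range(40)]
ym = ["C"] * 40
```

If the test really wants a flat list, the original suggestion is fine as it
stands.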




From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 10:08:24 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 05:08:24 -0500
Subject: [Biopython-dev] [Bug 2693] LogisticRegression convergence criterion
	is too lenient
In-Reply-To: <bug-2693-42@http.bugzilla.open-bio.org/>
Message-ID: <200912181008.nBIA8Ogf008537@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2693


------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp  2009-12-18 05:08 EST -------
Sorry for not getting back to this bug report earlier.

(In reply to comment #3)
> > Also, it is not necessary to pass old_llik to update_fn; if needed, update_fn
> > can store the value of llik on each call.
> 
> I guess this is all how you define the purpose of the update_fn function.
> 
Do you have an example of the update_fn function where old_llik is needed?
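
To illustrate the point about storing llik on each call, here is a minimal
sketch of a callback that remembers the previous value itself. The call
signature (iteration number, current log-likelihood) is an assumption made
for this example, as is the class name:

```python
class ProgressTracker:
    """Callable that remembers the log-likelihood from the previous call."""

    def __init__(self):
        self.old_llik = None
        self.changes = []

    def __call__(self, iteration, llik):
        # Compare against the value stored on the previous call, if any
        if self.old_llik is not None:
            self.changes.append(llik - self.old_llik)
        self.old_llik = llik


tracker = ProgressTracker()
for i, value in enumerate([-10.0, -7.5, -7.0]):
    tracker(i, value)
print(tracker.changes)
```

An instance of such a class could be passed wherever update_fn is expected,
so the training loop would not need to hand over old_llik.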




From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 10:17:12 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 05:17:12 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200912181017.nBIAHCJN008837@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


------- Comment #12 from mdehoon at ims.u-tokyo.ac.jp  2009-12-18 05:17 EST -------
One option is to store these variables inside the function. As an example, if
this is a module mymodule.py:

def f(x=None):
    if x is None:
        x = f.x
    print(x)

f.x = 3

then we can do the following:

>>> import mymodule
>>> mymodule.f()
3
>>> mymodule.f(5)
5
>>> mymodule.f.x = 9
>>> mymodule.f(5)
5
>>> mymodule.f()
9
>>> 

But personally, I think that having module-level defaults is not really
necessary. We typically don't have that for other functions, and the only
reason for having them here is that once upon a time this module had such
module-level defaults.




From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 10:24:35 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 05:24:35 -0500
Subject: [Biopython-dev] [Bug 2943] MMCIFParser only handling a single model.
In-Reply-To: <bug-2943-42@http.bugzilla.open-bio.org/>
Message-ID: <200912181024.nBIAOZj6009054@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2943


------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-18 05:24 EST -------
(In reply to comment #11)
> Peter, are you still looking at this bug report?
> Otherwise I could have a look at it.

Thanks Michiel - Please feel free. I didn't feel we had time to get this into
Biopython 1.53, as I think it is going to be a lot of work to assess, but needs
to be done.

I think there are two issues here, poor support for multiple models, and
re-writing the flex parser in pure python. Given time (!) I would want to take
Paul's python parser and use it to replace the flex code (which is currently
not compiled or installed by default, Bug 2619) and verify it is backwards
compatible, and then add in the model support. If we have enough test coverage
already, then doing it in one go might be OK. Up to you.

Other relevant issues include Bug 2626 (files the current parser can't read -
it may turn out that these are also multi-model CIF files).

Also regarding the model support, for PDB files we currently index them
0,1,2,... as found in the file. There are also names given in the PDB file
itself, which need not be continuous etc. See Bug 2950 and Bug 2951 for this.

Thanks,

Peter




From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 10:44:13 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 05:44:13 -0500
Subject: [Biopython-dev] [Bug 2938] Bio.Entrez.read() returns empty string
	for HTML (not an error)
In-Reply-To: <bug-2938-42@http.bugzilla.open-bio.org/>
Message-ID: <200912181044.nBIAiDD6009554@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2938


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-18 05:44 EST -------
(In reply to comment #6)
> The offending XML file (the one that does not start with <?xml) is created by
> efetch from the journals database. Upon reading the EUtils documentation more
> carefully, it seems that XML output from the journals database is not
> officially supported; only text and html output are supported. One option is
> to simply remove the offending XML file from the tests, and raise an error
> whenever Entrez.read is presented with data that do not start with <?xml.
> Additionally, we could add a parser for the text output generated by efetch
> from the journals database.

Hmm - sounds like a plan, but maybe drop the Entrez team a query about this.
Does the current funny XML file have anything useful in it?

Peter




From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 10:50:03 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 05:50:03 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200912181050.nBIAo39q009740@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-18 05:50 EST -------
(In reply to comment #12)
> 
> But personally, I think that having module-level defaults is not really
> necessary. We typically don't have that for other functions, and the only
> reason for having them here is that once upon a time this module had such
> module-level defaults.

I agree the module-level defaults are not necessary - but it would be "nice"
to have a transition where both can be used. In reality, I may be being overly
cautious - I doubt it would affect many (any?) users to just make a clean switch
(which would keep the code simple). I'm happy to leave this to your judgement
Michiel.
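
For reference, a transition where both can be used might look roughly like
this. It is only a sketch: MAX_ITERATIONS and resolve_max_iterations are
hypothetical names, not the ones in the MaxEntropy module:

```python
import warnings

# Hypothetical legacy module-level default, kept only during the transition
_SHIPPED_MAX_ITERATIONS = 10000
MAX_ITERATIONS = _SHIPPED_MAX_ITERATIONS


def resolve_max_iterations(max_iterations=None):
    """Prefer the explicit argument, else fall back to the module value."""
    if max_iterations is not None:
        return max_iterations
    if MAX_ITERATIONS != _SHIPPED_MAX_ITERATIONS:
        # Someone changed the module-level value: honour it, but warn
        warnings.warn(
            "Setting MAX_ITERATIONS on the module is deprecated; "
            "pass max_iterations explicitly instead.",
            DeprecationWarning,
            stacklevel=2,
        )
    return MAX_ITERATIONS
```

The module-level value is read at call time, so existing code that assigns to
it keeps working, while new code can simply pass the argument.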

Peter




From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 10:54:24 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 05:54:24 -0500
Subject: [Biopython-dev] [Bug 2947] Bio.HMM calculates wrong viterbi path
In-Reply-To: <bug-2947-42@http.bugzilla.open-bio.org/>
Message-ID: <200912181054.nBIAsOIw009914@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2947


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-18 05:54 EST -------
(In reply to comment #0)
> 
> Thus it appears to me that the viterbi algorithm is not robust enough
> and biased towards the last letter of the state alphabet.

Quite possibly. Might there be a bug in our code, or do you think this
is just an inherent algorithm limitation?

Peter




From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 11:53:14 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 06:53:14 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200912181153.nBIBrEQi011286@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #14 from mdehoon at ims.u-tokyo.ac.jp  2009-12-18 06:53 EST -------
Fixed in github.




From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 14:12:26 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 09:12:26 -0500
Subject: [Biopython-dev] [Bug 2947] Bio.HMM calculates wrong viterbi path
In-Reply-To: <bug-2947-42@http.bugzilla.open-bio.org/>
Message-ID: <200912181412.nBIECQ59014801@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2947


------- Comment #2 from georg.lipps at fhnw.ch  2009-12-18 09:12 EST -------
Hi Peter,

I am not an expert on the Viterbi algorithm. But as such the algorithm does not
do what it is expected to do. So I guess it is indeed an error in the
implementation.
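
One way to separate an implementation bug from an inherent limitation would
be to compare against a small independent reference. The sketch below is a
textbook Viterbi in log space, not the Bio.HMM code; it assumes a non-empty
observation sequence and nested dictionaries for the probabilities. Ties are
resolved in favour of the earliest state in the list, so any bias towards the
last state would show up as a difference:

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log probability, most likely state path)."""

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # scores[s]: best log probability of any path ending in state s
    scores = {}
    for s in states:
        scores[s] = log(start_p[s]) + log(emit_p[s][observations[0]])
    backpointers = []
    for obs in observations[1:]:
        new_scores = {}
        pointers = {}
        for s in states:
            # max() returns the first maximal item, so ties go to the
            # earliest state in `states`
            best_prev = max(
                states, key=lambda r: scores[r] + log(trans_p[r][s])
            )
            new_scores[s] = (
                scores[best_prev]
                + log(trans_p[best_prev][s])
                + log(emit_p[s][obs])
            )
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state
    last = max(states, key=lambda s: scores[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return scores[last], path
```

Running the failing example from the bug report through both this function
and Bio.HMM should show whether the two disagree.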

I would be very happy if it can be fixed.

Greetings,

Georg




From bugzilla-daemon at portal.open-bio.org  Fri Dec 18 16:15:24 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 11:15:24 -0500
Subject: [Biopython-dev] [Bug 2943] MMCIFParser only handling a single model.
In-Reply-To: <bug-2943-42@http.bugzilla.open-bio.org/>
Message-ID: <200912181615.nBIGFOD2017597@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2943


------- Comment #13 from TallPaulInJax at yahoo.com  2009-12-18 11:15 EST -------
Michiel, if you have any questions please feel free to contact me!




From rjalves at igc.gulbenkian.pt  Fri Dec 18 23:39:28 2009
From: rjalves at igc.gulbenkian.pt (Renato Alves)
Date: Fri, 18 Dec 2009 23:39:28 +0000
Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions
In-Reply-To: <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com>
References: <4B2BB938.5030709@igc.gulbenkian.pt>
	<320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com>
Message-ID: <4B2C12B0.9060806@igc.gulbenkian.pt>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sorry to take this to the discussion list, took a bit longer than I
expected to get the approval.

Bringing now the subject to the right place. Leaving full quote history
to help the reading.

Quoting Peter on 12/18/2009 09:39 PM:
> Hi Renato,
> 
> I'm cooking dinner while writing this, so it won't be as in depth as
> usual...
> 
> On Fri, Dec 18, 2009 at 5:17 PM, Renato Alves <rjalves at igc.gulbenkian.pt> wrote:
>> [I tried submitting this message to the dev mailing list, but got
>> rejected since I'm not yet authorized to post there, so here it goes]
> 
> Have you definitely subscribed to the dev list? That should be all that
> is required to post there, and this discussion would be better suited
> there.
> 
>> Hi everyone,
>>
>> I'm working on changes to the Bio.SeqIO.index() function to make it more
>> consistent with the .read and .parse i.e. accept a filehandle instead of
>> a filename and also to include a way to cache the index into a file to
>> speed up the process.
>>
>> The reason why we are implementing these two is because we were going to
>> implement our own index solution until we realized this was added to 1.52.
>>
>> However the implementation in 1.52 has a few limitations.
> 
> Yes, this was designed to cover basic use cases in a general way,
> but with the option in future to do other things - and in particular
> saving the index to disk was kept in mind.
> 
>> One limitation is that we are using a gzipped database for the sake of
>> space and using gzip.open() to create the file-handle that would then be
>> passed to .parse(). The same was not doable with .index().
>> This is already implemented in
>> http://github.com/Unode/biopython/commit/6fc390151452e3ddf26a117269132125a3ffb3fe
> 
> That was a deliberate choice in that the index code wants to "own"
> the handle. If other code has access to the handle, there is a risk
> of different bits of code moving the handle pointer etc. But, if you
> are careful it could be done.
The way I approached it was to reset the handle pointer to the first
position, since we would like to index the full file. But I understand
that if the user uses the same handle on different files weird results
may happen.
Something that could be a simple workaround would be to copy the
filehandle object in such a way that its properties are maintained
(like being a gzip.open() filehandle) but its use doesn't affect the
use of the original handle. However I don't know if this is possible.

> 
> There are also issues here in combination with saving the index.
> With a filename, the code can easily reopen the file in the same
> mode. With a handle, things are more tricky. You have non-file
> handles to consider - such as the gzip example. There is also the
> problem of recording the file mode (normal text, universal text,
> or binary - which we will need for SFF files - code already written).
> 
I see, only after your comment I realized handle.name and handle.mode
are only available in normal filehandles. The gzip.open() example stores
the filename in .filename while the .mode seems to have a different meaning.
> If we do change the code to allow handles, it would have to be
> to allow handles OR filenames to be compatible with Biopython
> 1.52 and 1.53 (which take just filenames). This could be handled
> as in Bio.SeqIO.convert(), which also allows both (which was the
> subject of some discussion!).
> 
I'll have to look more on the example and consider the fact that my
current implementation breaks compatibility with previous code and that
not everything needed (filename, mode,...) is accessible in filehandles.
>> The second is that we are going to use this feature to quick search the
>> database in a web application. Here we have the limitation that we don't
>> have persistence across web requests, which means that we would need to
>> recalculate the index on every web request.
>>
>> The details of how we plan to implement this are the following:
>>
>> cPickle the internal dictionary of offsets and save it on the database
>> folder with the same name as the database + .index. The consistency
>> check on whether the file has changed will be performed based on name
>> and timestamp. By default .index() will search for this file, check the
>> timestamp and use the cache if they match, otherwise they will be
>> recalculated. The save function will be available like:
>>
>>>>> d = SeqIO.index(...)
>>>>> d.save(filename)
>> where filename is optional and defaults to "%s.index" % _handle.name
>>
>> We already have a solution like this implemented with subclasses of
>> SeqIO._index, it's just a matter of reworking that and merge it into
>> BioPython if you consider a good addition to the code.
>>
>> I would like to hear your comments and suggestions on this.
> 
> Yes, saving indexes is an obvious addition. I have explored
> using pickle via shelve, and also SQLite - there are
> implementations of this on my github repository, plus
> begun to look into the existing OBF Open Biological
> Database Access (OBDA) specification for cross project
> compatibility. Other potential benefits here are reduced
> memory usage if we don't keep the dictionary
> of offsets in RAM.
I did try to use pickle directly on the dict like object that is
returned from SeqIO.index() but pickle was not happy with it. The SQLite
approach also crossed my mind and also BioSQL or just some custom SQL
database, but the RAM approach seemed good enough, at least for our
current uses. I can see though that some file formats will require a lot
more RAM depending on what is indexed and their size. In the end it came
out as cPickled dictionaries for faster access.
> 
> http://github.com/peterjc/biopython/tree/index-shelve
> http://github.com/peterjc/biopython/tree/index-sqlite
> 
> There is a potential complication with index sub-classes
> which do more specialised indexing (e.g. GenBank files,
> and for a more extreme case, SFF files). See:
> http://github.com/peterjc/biopython/tree/sff-seqio
For these I would have to do it on a unittest base, I'm not familiar
with the formats. Also the implementation I did was based on the current
master branch of biopython. I now realize a lot more has been done
outside of it that I should look into.
> 
> Anyway - great to see you are finding the code useful,
> and have some quite similar ideas for how to extend
> it further.
> 
> Peter
Thanks for all that info, I have a lot to dig into and see if I can
actually contribute with something. You seem to have pretty much
everything sorted ;)

Renato
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkssEqkACgkQYh11EUYTX9QWHwCeOIuuaEGA3qLvB1EHamDohpZ3
bj0AnRAkP9jOGpvTnSc0W7YgFyX/Ard/
=S45W
-----END PGP SIGNATURE-----


From biopython at maubp.freeserve.co.uk  Sat Dec 19 09:57:25 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 19 Dec 2009 09:57:25 +0000
Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions
In-Reply-To: <4B2C12B0.9060806@igc.gulbenkian.pt>
References: <4B2BB938.5030709@igc.gulbenkian.pt>
	<320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com>
	<4B2C12B0.9060806@igc.gulbenkian.pt>
Message-ID: <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com>

On Fri, Dec 18, 2009 at 11:39 PM, Renato Alves wrote:
> Sorry to take this to the discussion list, took a bit longer than I
> expected to get the approval.
>
> Bringing now the subject to the right place. Leaving full quote history
> to help the reading.

Thanks.

>> That was a deliberate choice in that the index code wants to "own"
>> the handle. If other code has access to the handle, there is a risk
>> of different bits of code moving the handle pointer etc. But, if you
>> are careful it could be done.
>
> The way I approached it was to reset the handle pointer to the first
> position, since we would like to index the full file. But I understand
> that if the user uses the same handle on different files weird results
> may happen.

OK

> Something that could be a simple workaround would be to copy the
> filehandle object in such a way that its properties are maintained
> (like being a gzip.open() filehandle) but its use doesn't affect the
> use of the original handle. However I don't know if this is possible.

That may work for some handles but not others. Worth trying.

>> There are also issues here in combination with saving the index.
>> With a filename, the code can easily reopen the file in the same
>> mode. With a handle, things are more tricky. You have non-file
>> handles to consider - such as the gzip example. There is also the
>> problem of recording the file mode (normal text, universal text,
>> or binary - which we will need for SFF files - code already written).
>
> I see, only after your comment I realized handle.name and handle.mode
> are only available in normal filehandles. The gzip.open() example stores
> the filename in .filename while the .mode seems to have a different
> meaning.

That would make finding out the filename from a handle tricky.

>> If we do change the code to allow handles, it would have to be
>> to allow handles OR filenames to be compatible with Biopython
>> 1.52 and 1.53 (which take just filenames). This could be handled
>> as in Bio.SeqIO.convert(), which also allows both (which was the
>> subject of some discussion!).
>
> I'll have to look more on the example and consider the fact that my
> current implementation breaks compatibility with previous code and that
> not everything needed (filename, mode,...) is accessible in filehandles.

OK.
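
As a rough illustration of accepting handles OR filenames, a helper along
these lines keeps ownership clear: files opened from a name are closed again,
while handles passed in are left to the caller. The name open_source is
hypothetical and this is not how Bio.SeqIO.convert() is implemented:

```python
import contextlib


@contextlib.contextmanager
def open_source(source, mode="r"):
    """Yield a handle for either a filename or an already open handle."""
    if isinstance(source, str):
        handle = open(source, mode)
        try:
            yield handle
        finally:
            handle.close()  # we opened it, so we close it
    else:
        yield source  # the caller owns this handle; leave it open
```

Usage would be "with open_source(name_or_handle) as handle:", with the rest
of the code only ever seeing a handle.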

>> Yes, saving indexes is an obvious addition. I have explored
>> using pickle via shelve, and also SQLite - there are
>> implementations of this on my github repository, plus
>> begun to look into the existing OBF Open Biological
>> Database Access (OBDA) specification for cross project
>> compatibility. Other potential benefits here are reduced
>> memory usage if we don't keep the dictionary
>> of offsets in RAM.
>
> I did try to use pickle directly on the dict like object that is
> returned from SeqIO.index() but pickle was not happy with it. The SQLite
> approach also crossed my mind and also BioSQL or just some custom SQL
> database, but the RAM approach seemed good enough, at least for our
> current uses. I can see though that some file formats will require a lot
> more RAM depending on what is indexed and their size. In the end it came
> out as cPickled dictionaries for faster access.

I agree that an in RAM dictionary works pretty well, even for
very large sequence files. In terms of speed, I would expect
a two step build index in memory, then save to disk, to be
faster than building the index database on disk (which was
a bit slow).
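
The second step of that two-step approach could be as small as the sketch
below. It assumes the index is available as a plain dict mapping record ids
to file offsets, and it follows the "%s.index" naming and timestamp check
from Renato's description; save_offsets and load_offsets are hypothetical
names, not part of Bio.SeqIO:

```python
import os
import pickle


def _default_index_name(seq_filename):
    return "%s.index" % seq_filename


def save_offsets(offsets, seq_filename, index_filename=None):
    """Pickle a dict of record offsets together with the file's mtime."""
    if index_filename is None:
        index_filename = _default_index_name(seq_filename)
    payload = {
        "mtime": os.path.getmtime(seq_filename),
        "offsets": dict(offsets),
    }
    with open(index_filename, "wb") as handle:
        pickle.dump(payload, handle, pickle.HIGHEST_PROTOCOL)
    return index_filename


def load_offsets(seq_filename, index_filename=None):
    """Return the cached offsets, or None if the cache is missing or stale."""
    if index_filename is None:
        index_filename = _default_index_name(seq_filename)
    if not os.path.exists(index_filename):
        return None
    with open(index_filename, "rb") as handle:
        payload = pickle.load(handle)
    if payload["mtime"] != os.path.getmtime(seq_filename):
        return None  # sequence file changed since the index was saved
    return payload["offsets"]
```

Since unpickling can run arbitrary code, a cache like this should only be
loaded from a location the application itself controls.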

>> There is a potential complication with index sub-classes
>> which do more specialised indexing (e.g. GenBank files,
>> and for a more extreme case, SFF files). See:
>> http://github.com/peterjc/biopython/tree/sff-seqio
>
> For these I would have to do it on a unittest base, I'm not familiar
> with the formats. Also the implementation I did was based on
> the current master branch of biopython. I now realize a lot more
> has been done outside of it that I should look into.

I'm sorry if the discussion on the (dev) mailing list wasn't
clearer - but having a fresh set of eyes looking at the topic
is very useful.

>> Anyway - great to see you are finding the code useful,
>> and have some quite similar ideas for how to extend
>> it further.
>
> Thanks for all that info, I have a lot to dig into and see if I can
> actually contribute with something. You seem to have pretty much
> everything sorted ;)

Well, I hadn't been thinking about gzipped files (or any archives).
How does gzip behave with memory use? I assume it doesn't
load everything into RAM, but does allow you random access
(seek and tell).

This is a vague idea (which I haven't tried yet), but maybe the
Bio.SeqIO.index() function could take an optional argument
(gzip=True, or something more general like archive=...) which
would cause the file to be opened via the gzip module instead?
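
Something like the following is what I have in mind, as an untested sketch
with hypothetical names (open_for_index, use_gzip). The gzip module's file
objects do support seek() and tell() on uncompressed offsets, although a seek
is emulated by decompressing up to the target, so it can be slow on big
files:

```python
import gzip


def open_for_index(filename, use_gzip=False):
    """Open filename in binary mode, via the gzip module if requested."""
    if use_gzip:
        return gzip.open(filename, "rb")
    return open(filename, "rb")


def read_line_at(handle, offset):
    """Seek to offset (in uncompressed bytes) and return one line."""
    handle.seek(offset)
    return handle.readline()
```

The index code would still own the handle, since it is the one opening the
file.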

Regards,

Peter


From bugzilla-daemon at portal.open-bio.org  Sat Dec 19 11:02:44 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 19 Dec 2009 06:02:44 -0500
Subject: [Biopython-dev] [Bug 2927] Problem parsing PSI-BLAST plain text
	output with NCBStandalone.PSIBlastParser
In-Reply-To: <bug-2927-42@http.bugzilla.open-bio.org/>
Message-ID: <200912191102.nBJB2iOb014900@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2927


------- Comment #6 from robfsouza at gmail.com  2009-12-19 06:02 EST -------
Created an attachment (id=1412)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1412&action=view)
Testcase for NCBI's BLAST alignment with errors

This is a sequence from Naegleria gruberi and blastpgp output which reproduces
a reported bug in NCBI's blastpgp output at the first iteration (see hit
against sequence gi|156552846|ref|XP_001600053.1). Search parameters were

blastpgp -d nr -i Ngru1000013938.fa -o Ngru1000013938.fa.br -a 8 -j 1 -b 10000
-v 10000 -h 0.01 -I T -m 0 -M BLOSUM62 -F F




From bugzilla-daemon at portal.open-bio.org  Sat Dec 19 11:21:13 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 19 Dec 2009 06:21:13 -0500
Subject: [Biopython-dev] [Bug 2927] Problem parsing PSI-BLAST plain text
	output with NCBStandalone.PSIBlastParser
In-Reply-To: <bug-2927-42@http.bugzilla.open-bio.org/>
Message-ID: <200912191121.nBJBLDax015457@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2927


robfsouza at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1412 is|0                           |1
           obsolete|                            |


------- Comment #7 from robfsouza at gmail.com  2009-12-19 06:21 EST -------
Created an attachment (id=1413)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1413&action=view)
Testcase for NCBI's BLAST alignment with errors

Sending the right query sequence now (my mistake! :))




From bugzilla-daemon at portal.open-bio.org  Sat Dec 19 12:09:57 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 19 Dec 2009 07:09:57 -0500
Subject: [Biopython-dev] [Bug 2927] Problem parsing PSI-BLAST plain text
	output with NCBStandalone.PSIBlastParser
In-Reply-To: <bug-2927-42@http.bugzilla.open-bio.org/>
Message-ID: <200912191209.nBJC9vxr016459@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2927


------- Comment #8 from ibdeno at gmail.com  2009-12-19 07:09 EST -------
(In reply to comment #7)
Just to confirm that I can reproduce the 'Query: 0' with blastpgp 2.2.22 using
Robson's test case.
Thanks to Robson for this and apologies for not having been able to send a test
case.




From rjalves at igc.gulbenkian.pt  Sat Dec 19 21:48:10 2009
From: rjalves at igc.gulbenkian.pt (Renato Alves)
Date: Sat, 19 Dec 2009 21:48:10 +0000
Subject: [Biopython-dev] SeqIO.index improvement suggestions
In-Reply-To: <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com>
References: <4B2BB938.5030709@igc.gulbenkian.pt>	
	<320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com>	
	<4B2C12B0.9060806@igc.gulbenkian.pt>
	<320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com>
Message-ID: <4B2D4A1A.6@igc.gulbenkian.pt>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> Well, i hadn't been thinking about gzipped files (or any archives).
> How does gzip behave with memory use? I assume it doesn't
> load everything into RAM, but does allow you random access
> (seek and tell).

From what I can tell, in terms of RAM it behaves the same way as a
normal open(): it only decompresses the segments as they are accessed but
doesn't cache them. A reasonable trade-off between space and access time.
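
A minimal sketch of that seek/tell behaviour, assuming a current Python 3
and a made-up two-record FASTA file (the file name and records are
invented for illustration). The offsets refer to the uncompressed data,
and as far as I know a backward seek is emulated by rewinding and
decompressing again, so it can be slow on large files:

```python
import gzip
import os
import tempfile

# Build a small gzipped FASTA file to experiment with (made-up records).
records = b">seq1\nACGT\n>seq2\nGGCC\n"
path = os.path.join(tempfile.mkdtemp(), "example.fasta.gz")
with gzip.open(path, "wb") as handle:
    handle.write(records)

with gzip.open(path, "rb") as handle:
    handle.readline()        # skip the ">seq1" header line
    handle.readline()        # skip the "ACGT" sequence line
    offset = handle.tell()   # position within the *uncompressed* data
    handle.seek(0)           # rewind to the start
    handle.seek(offset)      # jump straight back to the second record
    second_header = handle.readline()

print(offset, second_header)
```

An index would only need to store offsets like this one for each record.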

> This is a vague idea (which I haven't tried yet), but maybe the
> Bio.SeqIO.index() function could take an optional argument
> (gzip=True, or something more general like archive=...) which
> would cause the file to be opened via the gzip module instead?

I thought about something similar but using a combination of the file
extension and magic (or actually python-magic[1]). The first one is
potentially messy, although it's how things are mostly done on Windows.
For the second one, I couldn't confirm whether it is available for Windows,
but it is widely present on Linux (and I suppose MacOS too).
In the end I dislike the idea of 'having' to use one approach or the
other depending on the OS the code is running on; however, this would fit
in without breaking any compatibility with current code.

1 - http://pypi.python.org/pypi/python-magic/0.1

Renato
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.11 (GNU/Linux)

iEYEARECAAYFAkstShgACgkQYh11EUYTX9Tu3wCglh6d3rt/ANU5J45bsceqcQ78
TQ0AnjgIlNhYRMqdzl4jBGYOPdMKOY7D
=rqsi
-----END PGP SIGNATURE-----


From eric.talevich at gmail.com  Sat Dec 19 22:42:23 2009
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sat, 19 Dec 2009 14:42:23 -0800
Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions
In-Reply-To: <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com>
References: <4B2BB938.5030709@igc.gulbenkian.pt>
	<320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> 
	<4B2C12B0.9060806@igc.gulbenkian.pt>
	<320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com>
Message-ID: <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com>

On Sat, Dec 19, 2009 at 1:57 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> This is a vague idea (which I haven't tried yet), but maybe the
> Bio.SeqIO.index() function could take an optional argument
> (gzip=True, or something more general like archive=...) which
> would cause the file to be opened via the gzip module instead?
>

Or: open=open -- accept a function that opens the file; by default, the
built-in open function, but easily replaced by gzip.open, bz2.BZ2File, or a
user-defined function to open zip files (since that's less straightforward).

Otherwise, since the variety of archive formats supported by the Python
standard library is limited, archive='gzip'|'bz2'|'zip' sounds good.
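
As a rough illustration of the open=open idea (a sketch only:
read_first_line and its open_function argument are made-up names, not
part of Bio.SeqIO, and the test files are invented), any callable that
takes a filename and a mode can be swapped in:

```python
import bz2
import gzip
import os
import tempfile


def read_first_line(filename, open_function=open):
    """Hypothetical helper: open `filename` with the supplied callable."""
    # Binary mode is requested explicitly so that the built-in open,
    # gzip.open and bz2.BZ2File all return bytes.
    handle = open_function(filename, "rb")
    try:
        return handle.readline()
    finally:
        handle.close()


# Write the same made-up record as plain, gzip and bz2 files.
data = b">seq1\nACGT\n"
plain = os.path.join(tempfile.mkdtemp(), "example.fasta")
with open(plain, "wb") as handle:
    handle.write(data)
with gzip.open(plain + ".gz", "wb") as handle:
    handle.write(data)
with bz2.BZ2File(plain + ".bz2", "wb") as handle:
    handle.write(data)

lines = [
    read_first_line(plain),
    read_first_line(plain + ".gz", open_function=gzip.open),
    read_first_line(plain + ".bz2", open_function=bz2.BZ2File),
]
print(lines)
```

The calling code stays the same for every format; only the callable
changes, which is what makes this more flexible than a fixed list of
archive names.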

-Eric


From rjalves at igc.gulbenkian.pt  Sun Dec 20 00:08:42 2009
From: rjalves at igc.gulbenkian.pt (Renato Alves)
Date: Sun, 20 Dec 2009 00:08:42 +0000
Subject: [Biopython-dev] SeqIO.index improvement suggestions
In-Reply-To: <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com>
References: <4B2BB938.5030709@igc.gulbenkian.pt>
	<320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com>
	<4B2C12B0.9060806@igc.gulbenkian.pt>
	<320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com>
	<3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com>
Message-ID: <4B2D6B0A.4040008@igc.gulbenkian.pt>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

From Eric Talevich on 12/19/2009 10:42 PM:
> Or: open=open -- accept a function that opens the file; by default, the
> built-in open function, but easily replaced by gzip.open, bz2.BZ2File,
> or a user-defined function to open zip files (since that's less
> straightforward).
>
> Otherwise, since the variety of archive formats supported by the Python
> standard library is limited, archive='gzip'|'bz2'|'zip' sounds good.

I prefer the first option. Flexible, backwards compatible, fits all
mentioned cases so far and allows inclusion of other formats. Got my
vote on that one.

Renato

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.11 (GNU/Linux)

iEYEARECAAYFAkstawUACgkQYh11EUYTX9TJbwCgi4fQGQcfaBdJNLbMRsubjz82
4LQAnRgY0IKjwznjtiQzRNd0k8SH4oMN
=YNHc
-----END PGP SIGNATURE-----


From biopython at maubp.freeserve.co.uk  Sun Dec 20 18:06:33 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 20 Dec 2009 18:06:33 +0000
Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions
In-Reply-To: <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com>
References: <4B2BB938.5030709@igc.gulbenkian.pt>
	<320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com>
	<4B2C12B0.9060806@igc.gulbenkian.pt>
	<320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com>
	<3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com>
Message-ID: <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com>

On Sat, Dec 19, 2009 at 10:42 PM, Eric Talevich wrote:
> On Sat, Dec 19, 2009 at 1:57 AM, Peter wrote:
>>
>> This is a vague idea (which I haven't tried yet), but maybe the
>> Bio.SeqIO.index() function could take an optional argument
>> (gzip=True, or something more general like archive=...) which
>> would cause the file to be opened via the gzip module instead?
>
> Or: open=open -- accept a function that opens the file; by default, the
> built-in open function, but easily replaced by gzip.open, bz2.BZ2File, or a
> user-defined function to open zip files (since that's less straightforward).

That's what I had in mind with the "archive=..." bit (I should have
been clearer), but "open" is probably a better name for it (assuming
it isn't going to become a reserved word in future versions of Python).

> Otherwise, since the variety of archive formats supported by the Python
> standard library is limited, archive='gzip'|'bz2'|'zip' sounds good.

That would work, but as you say, it is rather limited.

Peter


From biopython at maubp.freeserve.co.uk  Mon Dec 21 11:57:51 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 21 Dec 2009 11:57:51 +0000
Subject: [Biopython-dev] code credits
In-Reply-To: <Pine.SOC.4.64.0912171946120.13591@ub.d.umn.edu>
References: <bb02be080912171320u480fe461r1f517970f08e091b@mail.gmail.com>
	<928490.72367.qm@web30708.mail.mud.yahoo.com>
	<320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com>
	<Pine.SOC.4.64.0912171946120.13591@ub.d.umn.edu>
Message-ID: <320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com>

Hello all,

This email has been sent to the Biopython developers list, where
we are proposing to include a list of contributors in the Biopython
1.53 and future release notes.

I have specifically CC'd Chris Lasher, Hongbo Zhu and Paul B as
"new contributors". I don't have an email address for Frederik
Gwinner but will send him this message via github instead.

On Fri, Dec 18, 2009 at 1:54 AM, Marshall Hampton wrote:
>
> On Thu, 17 Dec 2009, Peter wrote:
>>
>> Marshall Hampton's description of how they do it on Sage
>> sounds worth trying - if we keep track as things are checked
>> in, it won't be too much work either. Do you (sage) have a
>> list of guidelines for what qualifies for a credit?
>
> I don't think we have formal guidelines, but the process is pretty simple.
> Whoever works on a patch in our bug/feature tracker has to flag it for
> review. Both the person who implements the patch and the reviewer get
> credit. It doesn't matter if it's a 1-character change to the documentation,
> they're listed in the release notes. Basically, the idea is to err (if
> that's the right word) on the side of acknowledging any contribution. ...

On that basis, this is a (partial?) list for Biopython 1.53, given
alphabetically as done by Sage:

Bartek Wilczyns
Brad Chapman
Chris Lasher (first contribution?)
Cymon Cox
Frank Kauff
Frederik Gwinner (first contribution?)
Hongbo Zhu (first contribution?)
Kyle Ellrott
Leighton Pritchard
Michiel de Hoon
Paul B (first contribution?)
Peter Cock

Am I missing anyone? Have I spelt all the names right? (Actually a
serious question - I recently made a typo on a git commit comment
and mistyped Leighton's surname).

We can update the release note on the news server/blog to include this,
and send round another announcement email describing this plan. For
the source code, I have two suggestions:

(1) Include this in the NEWS file for each release, and continue adding
names to the single alphabetical list in the CONTRIBUTORS file.

(2) Don't include the list of names in the NEWS file, but instead put
them in the CONTRIBUTORS file. This can have a section for each
future release, with all the existing entries listed as contributors up to
and including Biopython 1.52.

I prefer the second option - the NEWS file is already quite long, and can
refer to the CONTRIBUTORS file (e.g. for Biopython 1.53 we can have a
line "(At least) 12 people contributed to this release, including 4 first time
contributors").

Peter


From chapmanb at 50mail.com  Mon Dec 21 13:23:39 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 21 Dec 2009 08:23:39 -0500
Subject: [Biopython-dev] code credits
In-Reply-To: <320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com>
References: <bb02be080912171320u480fe461r1f517970f08e091b@mail.gmail.com>
	<928490.72367.qm@web30708.mail.mud.yahoo.com>
	<320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com>
	<Pine.SOC.4.64.0912171946120.13591@ub.d.umn.edu>
	<320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com>
Message-ID: <20091221132339.GC21580@sobchak.mgh.harvard.edu>

Hi Peter;
Awesome. Nice to see all the new and familiar names from this latest
release.

> (1) Include this in the NEWS file for each release, and continue adding
> names to the single alphabetical list in the CONTRIBUTORS file.

I'd rather see it this way, which is a bit more informal and in
context. Something along the lines of:

Bob Jones added the FooBar module for parsing the latest NCBI
file format.

or:

Several bug fixes were committed to the PDB module. Thanks to Joe
Smith, Steve P and Jorge Garcia for their patches.

If people contributed to something that didn't make the new cut, then we
could just list additional contributors near the end. The goal should
be to recognize people if they contributed to a release by having
their name somewhere in the release notes. For core contributors like
yourself, you probably don't want your name next to everything so pick a
couple of your favorites for attribution.

Brad


From biopython at maubp.freeserve.co.uk  Mon Dec 21 14:34:38 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 21 Dec 2009 14:34:38 +0000
Subject: [Biopython-dev] code credits
In-Reply-To: <20091221132339.GC21580@sobchak.mgh.harvard.edu>
References: <bb02be080912171320u480fe461r1f517970f08e091b@mail.gmail.com>
	<928490.72367.qm@web30708.mail.mud.yahoo.com>
	<320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com>
	<Pine.SOC.4.64.0912171946120.13591@ub.d.umn.edu>
	<320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com>
	<20091221132339.GC21580@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00912210634o77d9eb9ex21e4ec3630dd1ed6@mail.gmail.com>

On Mon, Dec 21, 2009 at 1:23 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Hi Peter;
> Awesome. Nice to see all the new and familiar names from this latest
> release.
>
>> (1) Include this in the NEWS file for each release, and continue adding
>> names to the single alphabetical list in the CONTRIBUTORS file.
>
> I'd rather see it this way, which is a bit more informal and in
> context. Something along the lines of:
>
> Bob Jones added the FooBar module for parsing the latest NCBI
> file format.
>
> or:
>
> Several bug fixes were committed to the PDB module. Thanks to Joe
> Smith, Steve P and Jorge Garcia for their patches.
>
> If people contributed to something that didn't make the new cut, then we
> could just list additional contributors near the end. The goal should
> be to recognize people if they contributed to a release by having
> their name somewhere in the release notes. For core contributors like
> yourself, you probably don't want your name next to everything so pick a
> couple of your favorites for attribution.

OK - some under your option (3?), the CONTRIBOTORS file is kept
in the existing style, and the NEWS file also continues in a similar
*style* to before, but making a more concious effort to include names
next to noteworthy features, and ensure any other contributors get
included at the end (e.g. "Plus miscelaneous bug fixes from X, Y
and Z").

That seems fine.

Peter


From bugzilla-daemon at portal.open-bio.org  Mon Dec 21 15:34:22 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Dec 2009 10:34:22 -0500
Subject: [Biopython-dev] [Bug 2927] Problem parsing PSI-BLAST plain text
	output with NCBStandalone.PSIBlastParser
In-Reply-To: <bug-2927-42@http.bugzilla.open-bio.org/>
Message-ID: <200912211534.nBLFYMKt002285@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2927


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2009-12-21 10:34 EST -------
(In reply to comment #8)
> (In reply to comment #7)
> Just to confirm that I can reproduce the 'Query: 0' with blastpgp 2.2.22
> using Robson's test case. Thanks to Robson for this and apologies for not
> having been able to send a test case.

I was also able to confirm that the problem is present in blastpgp 2.2.22,
however it seems to have been fixed in the "new" BLAST+ suite, psiblast
2.2.22+ as described here:
http://lists.open-bio.org/pipermail/bioperl-l/2009-December/031811.html

Given this new information, this does look like an NCBI BLAST bug, and not
a problem in Biopython itself. We *might* be able to get our parser to cope
with the funny BLAST output, but it does look difficult and risky to me.

Miguel - Is it possible the BLAST bug is relatively recent and first showed
up when you updated blastpgp to 2.2.18?

Regards,

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From biopython at maubp.freeserve.co.uk  Mon Dec 21 16:48:50 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 21 Dec 2009 16:48:50 +0000
Subject: [Biopython-dev] code credits
In-Reply-To: <320fb6e00912210634o77d9eb9ex21e4ec3630dd1ed6@mail.gmail.com>
References: <bb02be080912171320u480fe461r1f517970f08e091b@mail.gmail.com>
	<928490.72367.qm@web30708.mail.mud.yahoo.com>
	<320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com>
	<Pine.SOC.4.64.0912171946120.13591@ub.d.umn.edu>
	<320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com>
	<20091221132339.GC21580@sobchak.mgh.harvard.edu>
	<320fb6e00912210634o77d9eb9ex21e4ec3630dd1ed6@mail.gmail.com>
Message-ID: <320fb6e00912210848x449fd73al4e97d3c9e21cf4@mail.gmail.com>

Peter wrote this (with spelling fixed):
>
> OK - some under your option (3?), the CONTRIBUTORS file is kept
> in the existing style, and the NEWS file also continues in a similar
> *style* to before, but making a more conscious effort to include names
> next to noteworthy features, and ensure any other contributors get
> included at the end (e.g. "Plus miscellaneous bug fixes from X, Y
> and Z").
>

Actually, looking over this again, if we want to include a "Sage style"
list of names in the release notes (which looks good), it really would
be easier if we kept this list of names in that format within the
repository (updating it as needed when new code is checked in).
The NEWS and CONTRIBUTORS files are the obvious places to
do this.

With Brad's outline (3), or at least how I understood it (and maybe
I misunderstood you Brad), the NEWS file would have the contributor
names for each release, but not in a format where they can be
copy and pasted to put together a release notice. Meanwhile the
CONTRIBUTORS file would continue as a single list of all
contributions to date. This means whoever writes the release
notice has to synthesise the contributor list by hand, which is
tedious and risks omitting people.

My earlier suggestions had the list of names in the NEWS file for
each release (1), or in the CONTRIBUTORS file broken down by
release (2). These options seem better to me just from a practical
point of view - and we can still also credit people in the main text
of the NEWS file as we do now if appropriate.

So, how about a merger of (1) and (3)? i.e.

* The CONTRIBUTORS file remains a single alphabetical list
of all contributors to date (no change).
* Entries in the NEWS file for new features etc may continue
to credit authors as appropriate.
* The NEWS file will include at the end of each release section
an alphabetical list of contributors for that release (with new
contributors flagged). This will be re-used in the release notice.

Peter


From bugzilla-daemon at portal.open-bio.org  Mon Dec 21 16:49:29 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Dec 2009 11:49:29 -0500
Subject: [Biopython-dev] [Bug 2966] Primer3Commandline does not use EMBOSS
	6.1.0 arguments
In-Reply-To: <bug-2966-42@http.bugzilla.open-bio.org/>
Message-ID: <200912211649.nBLGnTed003915@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2966


------- Comment #2 from lpritc at scri.sari.ac.uk  2009-12-21 11:49 EST -------
I also found an issue with the PrimerSearchCommandline.  The command line
options -sequences and -primers do not appear to be used in EMBOSS6.1.0, having
been replaced by -seqall and -infile, respectively.  I changed the options
accordingly, and the modified files are available at
http://github.com/widdowquinn/biopython/tree/emboss-branch.
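
To make the renaming concrete, here is a small sketch that builds the
primersearch argument list either way. This is not the Biopython wrapper:
primersearch_args and its new_style flag are hypothetical names, the file
names are invented, and the -mismatchpercent and -outfile options are
included on the assumption that they are unchanged between EMBOSS versions.

```python
def primersearch_args(seq_file, primer_file, out_file,
                      mismatch_percent, new_style=True):
    """Hypothetical helper: build a primersearch command as a list."""
    if new_style:
        # Option names reported above for EMBOSS 6.1.0.
        seq_opt, primer_opt = "-seqall", "-infile"
    else:
        # Older option names, as used before the change.
        seq_opt, primer_opt = "-sequences", "-primers"
    return [
        "primersearch",
        seq_opt, seq_file,
        primer_opt, primer_file,
        "-mismatchpercent", str(mismatch_percent),
        "-outfile", out_file,
    ]


new_cmd = primersearch_args("genome.fasta", "primers.txt", "hits.out", 10)
old_cmd = primersearch_args("genome.fasta", "primers.txt", "hits.out", 10,
                            new_style=False)
print(" ".join(new_cmd))
print(" ".join(old_cmd))
```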


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From p.j.a.cock at googlemail.com  Tue Dec 22 09:25:27 2009
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 22 Dec 2009 09:25:27 +0000
Subject: [Biopython-dev] Fwd: Debian python-biopython packaging for
	Biopython 1.53
In-Reply-To: <E3E486F3-8609-4291-BE56-D9D760FB8C87@ini.phys.ethz.ch>
References: <320fb6e00908110407x2c4132f8va17e19aaf2b224d0@mail.gmail.com>
	<48B3E023-F75A-4F50-90CE-6FDA7DDA9E4C@ini.phys.ethz.ch>
	<320fb6e00908120308w5077f598i428b6011912c6f37@mail.gmail.com>
	<783F8F61-58D6-4638-B2C7-5C206C321C13@ini.phys.ethz.ch>
	<320fb6e00908190305o3cb4523ct1645b98f4b284d43@mail.gmail.com>
	<4151f0acb1da52f12d3f08419d3171e9@ini.phys.ethz.ch>
	<320fb6e00908200748g78485c64kc19cea88c7c4cee@mail.gmail.com>
	<E3E486F3-8609-4291-BE56-D9D760FB8C87@ini.phys.ethz.ch>
Message-ID: <320fb6e00912220125w50a600c1xcf5e4750d70b39ca@mail.gmail.com>

Hi all,

Do any of our C experts know if this compilation warning is
important? (This is under Debian Linux; the query was raised by
Philipp Benner, who kindly packages Biopython for Debian, and that
package also gets used on Ubuntu.)

Thanks,

Peter

---------- Forwarded message ----------
From: Philipp Benner <philipp.benner at ini.phys.ethz.ch>
Date: Mon, Dec 21, 2009 at 6:34 PM
Subject: Debian python-biopython packaging for Biopython 1.53
To: Peter Cock <p.j.a.cock at googlemail.com>


Hey Peter,

I just uploaded the new release. Just a minor question:

dpkg-shlibdeps: warning: dependency on libpthread.so.0 could be
avoided if "debian/python-biopython/usr/lib/pyshared/python2.5/Bio/Cluster/cluster.so
debian/python-biopython/usr/lib/pyshared/python2.5/Bio/Motif/_pwm.so
debian/python-biopython/usr/lib/pyshared/python2.5/Bio/KDTree/_CKDTree.so
debian/python-biopython/usr/lib/pyshared/python2.4/Bio/trie.so
debian/python-biopython/usr/lib/pyshared/python2.5/Bio/PDB/mmCIF/MMCIFlex.so
debian/python-biopython/usr/lib/pyshared/python2.4/Bio/Cluster/cluster.so
debian/python-biopython/usr/lib/pyshared/python2.5/Bio/trie.so
debian/python-biopython/usr/lib/pyshared/python2.4/Bio/Motif/_pwm.so
debian/python-biopython/usr/lib/pyshared/python2.5/Bio/cMarkovModel.so
debian/python-biopython/usr/lib/pyshared/python2.4/Bio/PDB/mmCIF/MMCIFlex.so
debian/python-biopython/usr/lib/pyshared/python2.4/Bio/Nexus/cnexus.so
debian/python-biopython/usr/lib/pyshared/python2.4/Bio/cpairwise2.so
debian/python-biopython/usr/lib/pyshared/python2.4/Bio/KDTree/_CKDTree.so
debian/python-biopython/usr/lib/pyshared/python2.5/Bio/Nexus/cnexus.so
debian/python-biopython/usr/lib/pyshared/python2.5/Bio/cpairwise2.so
debian/python-biopython/usr/lib/pyshared/python2.4/Bio/cMarkovModel.so"
were not uselessly linked against it (they use none of its symbols).

is this true? It might also be an error of dpkg-shlibdeps.

Regards,
Philipp


From biopython at maubp.freeserve.co.uk  Tue Dec 22 12:14:32 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 22 Dec 2009 12:14:32 +0000
Subject: [Biopython-dev] code credits
In-Reply-To: <320fb6e00912210848x449fd73al4e97d3c9e21cf4@mail.gmail.com>
References: <bb02be080912171320u480fe461r1f517970f08e091b@mail.gmail.com>
	<928490.72367.qm@web30708.mail.mud.yahoo.com>
	<320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com>
	<Pine.SOC.4.64.0912171946120.13591@ub.d.umn.edu>
	<320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com>
	<20091221132339.GC21580@sobchak.mgh.harvard.edu>
	<320fb6e00912210634o77d9eb9ex21e4ec3630dd1ed6@mail.gmail.com>
	<320fb6e00912210848x449fd73al4e97d3c9e21cf4@mail.gmail.com>
Message-ID: <320fb6e00912220414t6429f1e5n792e5feeecbe633f@mail.gmail.com>

On Mon, Dec 21, 2009 at 4:48 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> So, how about a merger of (1) and (3)? i.e.
>
> * The CONTRIBUTORS file remains a single alphabetical list
> of all contributors to date (no change).
> * Entries in the NEWS file for new features etc may continue
> to credit authors as appropriate.
> * The NEWS file will include at the end of each release section
> an alphabetical list of contributors for that release (with new
> contributors flagged). This will be re-used in the release notice.

I've done that in github - how do the NEWS and CONTRIB files look?

http://github.com/biopython/biopython/commit/86d8d99aab894ab5f32a0e7a0c45d63a441da645

I haven't automatically included email addresses for the new contributors
since there is a risk of them being harvested for spam, so I figure that
should be "opt in".

Peter


From biopython at maubp.freeserve.co.uk  Tue Dec 22 15:34:37 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 22 Dec 2009 15:34:37 +0000
Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions
In-Reply-To: <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com>
References: <4B2BB938.5030709@igc.gulbenkian.pt>
	<320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com>
	<4B2C12B0.9060806@igc.gulbenkian.pt>
	<320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com>
	<3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com>
	<320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com>
Message-ID: <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com>

On Sun, Dec 20, 2009 at 6:06 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Sat, Dec 19, 2009 at 10:42 PM, Eric Talevich wrote:
>> On Sat, Dec 19, 2009 at 1:57 AM, Peter wrote:
>>>
>>> This is a vague idea (which I haven't tried yet), but maybe the
>>> Bio.SeqIO.index() function could take an optional argument
>>> (gzip=True, or something more general like archive=...) which
>>> would cause the file to be opened via the gzip module instead?
>>
>> Or: open=open -- accept a function that opens the file; by default, the
>> built-in open function, but easily replaced by gzip.open, bz2.BZ2File, or a
>> user-defined function to open zip files (since that's less straightforward).
>
> That's what I had in mind with the "archive=..." bit (I should have
> been clearer), but "open" is probably a better name for it (assuming
> it isn't going to become a reserved word in future versions of Python).

Proof of concept on github:
http://github.com/peterjc/biopython/tree/index-zip

This is using open_function as the new argument name (to match
the existing key_function and avoid any confusion with the built in
name open). I'm open to debate on this.

Points to note, this is untested on Windows. In particular we need
to look at gzipped plain text files using DOS/Windows new lines
(rare case?) plus gzipped plain text files using Unix new lines
(likely to be the more common of the two I'd expect). From my
initial checks, while gzip.open() does take a mode argument it
doesn't seem to support the "rU" value for universal new line
read mode. This spoils my plan to give the open_function both
the filename and the desired mode (generally "rU", but for SFF
files etc we will want to use "rb").
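
For comparison, a sketch of how this might look on a current Python 3,
which is an assumption well outside the Python 2.5 used in this thread:
since Python 3.3, gzip.open accepts a text mode ("rt") and a newline
argument, so an open_function can ask for newline translation itself.
The helper name open_gzip_text and the test files are made up.

```python
import gzip
import os
import tempfile


def open_gzip_text(filename):
    """Hypothetical open_function: gzipped text with universal newlines."""
    # newline=None asks the text layer to translate "\r\n" and "\r" to "\n".
    return gzip.open(filename, "rt", encoding="ascii", newline=None)


# The same made-up record with DOS/Windows and with Unix line endings.
tmp = tempfile.mkdtemp()
dos_path = os.path.join(tmp, "dos.fasta.gz")
unix_path = os.path.join(tmp, "unix.fasta.gz")
with gzip.open(dos_path, "wb") as handle:
    handle.write(b">seq1\r\nACGT\r\n")
with gzip.open(unix_path, "wb") as handle:
    handle.write(b">seq1\nACGT\n")

with open_gzip_text(dos_path) as handle:
    dos_text = handle.read()
with open_gzip_text(unix_path) as handle:
    unix_text = handle.read()

print(repr(dos_text), repr(unix_text))
```

One caveat with this design: tell() on a text-mode handle returns an
opaque value rather than a byte offset, so an index that stores offsets
would probably still want to read in binary mode and handle the line
endings itself.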

Peter


From biopython at maubp.freeserve.co.uk  Tue Dec 22 16:08:50 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 22 Dec 2009 16:08:50 +0000
Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions
In-Reply-To: <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com>
References: <4B2BB938.5030709@igc.gulbenkian.pt>
	<320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com>
	<4B2C12B0.9060806@igc.gulbenkian.pt>
	<320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com>
	<3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com>
	<320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com>
	<320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com>
Message-ID: <320fb6e00912220808w53485af8s801e5a24666d9627@mail.gmail.com>

On Tue, Dec 22, 2009 at 3:34 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> Points to note, this is untested on Windows. In particular we need
> to look at gzipped plain text files using DOS/Windows new lines
> (rare case?) plus gzipped plain text files using Unix new lines
> (likely to be the more common of the two I'd expect). From my
> initial checks, while gzip.open() does take a mode argument it
> doesn't seem to support the "rU" value for universal new line
> read mode. This spoils my plan to give the open_function both
> the filename and the desired mode (generally "rU", but for SFF
> files etc we will want to use "rb").

The gzip mode issue is interesting... running on the Mac,
Leopard 10.5, using the Apple provided Python 2.5.2,
looking at a gzipped QUAL file everything is fine:

Python 2.5.2 (r252:60911, Feb 22 2008, 07:57:53)
[GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> gzip.open("Quality/example.qual.gz", "r").read()
'>EAS54_6_R1_2_1_413_324\n26 26 18 26 26 26 26 26 26 26 26 26 26 26 26
22 26 26 26 26\n26 26 26 23 23\n>EAS54_6_R1_2_1_540_792\n26 26 26 26
26 26 26 26 26 26 26 22 26 26 26 26 26 12 26 26\n26 18 26 23
18\n>EAS54_6_R1_2_1_443_348\n26 26 26 26 26 26 26 26 26 26 26 24 26 22
26 26 13 22 26 18\n24 18 18 18 18\n'
>>> gzip.open("Quality/example.qual.gz", "rb").read()
'>EAS54_6_R1_2_1_413_324\n26 26 18 26 26 26 26 26 26 26 26 26 26 26 26
22 26 26 26 26\n26 26 26 23 23\n>EAS54_6_R1_2_1_540_792\n26 26 26 26
26 26 26 26 26 26 26 22 26 26 26 26 26 12 26 26\n26 18 26 23
18\n>EAS54_6_R1_2_1_443_348\n26 26 26 26 26 26 26 26 26 26 26 24 26 22
26 26 13 22 26 18\n24 18 18 18 18\n'
>>> gzip.open("Quality/example.qual.gz", "rU").read()
'>EAS54_6_R1_2_1_413_324\n26 26 18 26 26 26 26 26 26 26 26 26 26 26 26
22 26 26 26 26\n26 26 26 23 23\n>EAS54_6_R1_2_1_540_792\n26 26 26 26
26 26 26 26 26 26 26 22 26 26 26 26 26 12 26 26\n26 18 26 23
18\n>EAS54_6_R1_2_1_443_348\n26 26 26 26 26 26 26 26 26 26 26 24 26 22
26 26 13 22 26 18\n24 18 18 18 18\n'

Looking at a gzipped FASTA file everything is fine:

>>> gzip.open("Quality/example.fasta.gz", "r").read()
'>EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n>EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n>EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n'
>>> gzip.open("Quality/example.fasta.gz", "rb").read()
'>EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n>EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n>EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n'
>>> gzip.open("Quality/example.fasta.gz", "rU").read()
'>EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n>EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n>EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n'

But, there is a problem with my gzipped FASTQ file:

>>> gzip.open("Quality/example.fastq.gz", "r").read()
'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n@EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n@EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n'
>>> gzip.open("Quality/example.fastq.gz", "rb").read()
'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n@EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n@EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n'
>>> gzip.open("Quality/example.fastq.gz", "rU").read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/gzip.py",
line 220, in read
    self._read(readsize)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/gzip.py",
line 292, in _read
    self._read_eof()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/gzip.py",
line 311, in _read_eof
    raise IOError, "CRC check failed"
IOError: CRC check failed

I may have stumbled on a bug in the Python gzip library :(

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Dec 24 12:00:56 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 24 Dec 2009 07:00:56 -0500
Subject: [Biopython-dev] [Bug 2927] Problem parsing PSI-BLAST plain text
	output with NCBStandalone.PSIBlastParser
In-Reply-To: <bug-2927-42@http.bugzilla.open-bio.org/>
Message-ID: <200912241200.nBOC0ukq031745@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2927


------- Comment #10 from ibdeno at gmail.com  2009-12-24 07:00 EST -------
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #7)
> I was also able to confirm that the problem is present in blastpgp 2.2.22,
> however it seems to have been fixed in the "new" BLAST+ suite, psiblast
> 2.2.22+ as described here:
> http://lists.open-bio.org/pipermail/bioperl-l/2009-December/031811.html
> 
> Given this new information, this does look like an NCBI BLAST bug, and not
> a problem in Biopython itself. We *might* be able to get our parser to cope
> with the funny BLAST output, but it does look difficult and risky to me.
> 

I think the best strategy will be to use the BLAST+ suite, since the "old"
programs will be abandoned, as I learnt from NCBI. Also, I think we should use
XML output.  I know I promised to work on testing that, but I don't think I
will be able to do the tests before February...

> Miguel - Is it possible the BLAST bug is relatively recent and first showed
> up when you updated blastpgp to 2.2.18?
> 

I had been using 2.2.18 for quite a while (months) and never had a problem. I
think I initially thought it might be a problem with the actual databases, more
than with the program...

Best regards,


Miguel


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Dec 24 15:25:15 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 24 Dec 2009 10:25:15 -0500
Subject: [Biopython-dev] [Bug 2943] MMCIFParser only handling a single model.
In-Reply-To: <bug-2943-42@http.bugzilla.open-bio.org/>
Message-ID: <200912241525.nBOFPFxH003980@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2943


------- Comment #14 from mdehoon at ims.u-tokyo.ac.jp  2009-12-24 10:25 EST -------
From testing the current flex-based MMCIF parser, it seems that it is not quite
complete. I don't think it is necessary to be backwards compatible with it. I
rather have a well-designed MMCIF parser written independently, like the one by
Paul, and have it replace the current MMCIF parser over time. This also allows
us to have the design of the new parser more consistent with other Biopython
modules.

To do so, I suggest to have the new MMCIF parser in a new module MMCIF.py under
Bio.PDB, and let it coexist with the current MMCIF parser for the time being.

Since the new MMCIF parser does not use flex, I would think that the previous
division into MMCIF2Dict and MMCIFParser may not be needed for the new parser.
Paul, do you agree? Can the new parser live in a single MMCIF.py module?




From bugzilla-daemon at portal.open-bio.org  Thu Dec 24 15:37:08 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 24 Dec 2009 10:37:08 -0500
Subject: [Biopython-dev] [Bug 2943] MMCIFParser only handling a single model.
In-Reply-To: <bug-2943-42@http.bugzilla.open-bio.org/>
Message-ID: <200912241537.nBOFb83e004255@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2943


------- Comment #15 from TallPaulInJax at yahoo.com  2009-12-24 10:37 EST -------
Hi Michiel,

"I have a well-designed MMCIF parser written independently": Very interesting!
Is it written solely in Python as well? I will say the parser I wrote is slower
than I would like, so if you have an alternative?

"Since the new MMCIF parser does not use flex, I would think that the previous
division into MMCIF2Dict and MMCIFParser may not be needed for the new parser."
I'm not expert enough in Python and in BioPython to know the correct call here.
Perhaps Peter could answer this? I personally like the separation of concerns
so that if someone else wanted to write their own parser, the code is modular
in nature and supports doing that.
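
To make the separation concrete, here is a minimal sketch of the two-stage
idea: one function turns the file text into a plain dictionary, and a second
one builds models from that dictionary. This is only a toy under stated
assumptions, not the code attached to this bug: the function names
cif_to_dict and dict_to_models are hypothetical, the tokenizer handles a
single loop_ block with whitespace-separated values (no quoting), and only
the atom name and model number columns are read.

```python
def cif_to_dict(text):
    """Stage 1 (toy): turn one loop_ block into {tag: [value, ...]}.

    Assumes a single loop_ block and unquoted, whitespace-separated values.
    """
    tags = []
    columns = {}
    in_loop = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "loop_":
            in_loop = True
        elif in_loop and line.startswith("_"):
            tags.append(line)  # a column header such as _atom_site.label_atom_id
            columns[line] = []
        elif in_loop:
            for tag, value in zip(tags, line.split()):
                columns[tag].append(value)
    return columns


def dict_to_models(columns):
    """Stage 2 (toy): group atom names by model number."""
    models = {}
    model_nums = columns["_atom_site.pdbx_PDB_model_num"]
    atom_names = columns["_atom_site.label_atom_id"]
    for model_num, atom_name in zip(model_nums, atom_names):
        models.setdefault(int(model_num), []).append(atom_name)
    return models


# Made-up two-model sample, for illustration only.
SAMPLE = """\
loop_
_atom_site.label_atom_id
_atom_site.pdbx_PDB_model_num
N 1
CA 1
N 2
CA 2
"""

models = dict_to_models(cif_to_dict(SAMPLE))
```

With this layout, someone could replace the second stage with their own
structure builder and still reuse the first stage unchanged.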

Thanks for your help, Michiel!

Paul




From bugzilla-daemon at portal.open-bio.org  Sat Dec 26 10:08:05 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 26 Dec 2009 05:08:05 -0500
Subject: [Biopython-dev] [Bug 2943] MMCIFParser only handling a single model.
In-Reply-To: <bug-2943-42@http.bugzilla.open-bio.org/>
Message-ID: <200912261008.nBQA85So025649@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2943


------- Comment #16 from mdehoon at ims.u-tokyo.ac.jp  2009-12-26 05:08 EST -------
(In reply to comment #15)
> "I have a well-designed MMCIF parser written independently": Very interesting!
Actually I wrote "I *rather* have....". I don't have an MMCIF parser myself; I
was referring to your parser.

Btw, could you add a test case for the MMCIF parser (using some small data file
that we can include with the Biopython distribution)? Tests are not just
important to make sure everything works; often they are a very good example of
how the code works.




From eric.talevich at gmail.com  Tue Dec 29 01:51:40 2009
From: eric.talevich at gmail.com (Eric Talevich)
Date: Mon, 28 Dec 2009 17:51:40 -0800
Subject: [Biopython-dev] Code review request for phyloxml branch
In-Reply-To: <3f6baf360909232048u54a63ce5q2adbd0e18ebd7036@mail.gmail.com>
References: <3f6baf360909232048u54a63ce5q2adbd0e18ebd7036@mail.gmail.com>
Message-ID: <3f6baf360912281751g5152a945p951dbbbcbffbddb1@mail.gmail.com>

Hi folks,

Here's an update on the status of Bio.Tree and TreeIO. I think I've taken
care of most of the blockers since the last review in September.

First, some links:
http://github.com/etal/biopython/tree/phyloxml/Bio/Tree/
http://github.com/etal/biopython/tree/phyloxml/Bio/TreeIO/
http://github.com/etal/biopython/tree/phyloxml/Tests/test_PhyloXML.py
http://github.com/etal/biopython/tree/phyloxml/Tests/test_Tree.py
http://biopython.org/wiki/PhyloXML

Discussion:

*TreeIO*
Conversion between Nexus, Newick and phyloXML tree file formats works; the
read/parse/write functions for each IO format use the same object types.
Neat!

The tree annotations (e.g. id) aren't preserved perfectly during conversions
-- I'll keep working on this, but I don't think it's a blocker. The taxon
names of terminal nodes are kept as "clade" names in phyloXML for
round-tripping. Tree topology and branch lengths seem OK.

Under the hood:
-- PhyloXMLIO is from GSoC
-- NewickIO is ported from the Bio.Nexus.Trees parser. I think it works the
same way.
-- NexusIO relies on Bio.Nexus.Nexus for parsing, then converts the
resulting Nexus.Trees.Tree objects to Bio.Tree.Newick objects. One day, when
Nexus.Trees is replaced by NewickIO in the main Nexus parser, then this
conversion can be dropped and NexusIO will be very simple.

*Tree*
The BaseTree object structure looks like this:

-- BaseTree.Tree contains global tree information, like whether the tree
is rooted, and a reference to the root clade. The phyloXML Phylogeny object
inherits from this.

-- BaseTree.Subtree contains local (clade- or node-specific) information,
and references to each of its direct descendents, recursively. The phyloXML
Clade object inherits from this. Nodes are implicit. I could add references
to the ancestor of each sub-tree without too much difficulty, but I haven't
needed them yet.

The same methods (get_terminals et al.) generally apply to both classes, so
I created a separate TreeMixin class from which both BaseTree.Tree and
BaseTree.Subtree inherit.
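
A minimal sketch of that layout, for illustration only. The class names and
get_terminals come from the description above; the constructor arguments,
the "clades" attribute and the _root_clade helper are assumptions made for
this example and may not match the code in the branch.

```python
class TreeMixin(object):
    """Methods shared by Tree and Subtree (sketch only)."""

    def _root_clade(self):
        # Hypothetical hook: each class says where traversal should start.
        raise NotImplementedError

    def get_terminals(self):
        """Return the leaf sub-trees, left to right (depth-first)."""
        terminals = []
        stack = [self._root_clade()]
        while stack:
            clade = stack.pop()
            if clade.clades:
                # Push children in reverse so the leftmost is visited first.
                stack.extend(reversed(clade.clades))
            else:
                terminals.append(clade)
        return terminals


class Subtree(TreeMixin):
    """Local (clade-specific) information plus direct descendents."""

    def __init__(self, name=None, branch_length=None, clades=None):
        self.name = name
        self.branch_length = branch_length
        self.clades = clades or []

    def _root_clade(self):
        return self


class Tree(TreeMixin):
    """Global tree information and a reference to the root clade."""

    def __init__(self, root, rooted=False):
        self.root = root
        self.rooted = rooted

    def _root_clade(self):
        return self.root


tree = Tree(
    Subtree(clades=[Subtree("A"), Subtree(clades=[Subtree("B"), Subtree("C")])]),
    rooted=True,
)
names = [t.name for t in tree.get_terminals()]
```

The same get_terminals call works on the whole tree or on any sub-tree,
for example tree.root.clades[1].get_terminals().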

Bio.Tree.Newick contains simple subclasses of Tree and Subtree, and an
incomplete set of shims that track Bio.Nexus.Trees.Tree (minus the I/O).
This is to ease the deprecation and eventual replacement of Bio.Nexus.Trees,
as I imagine it:
(1) Port methods from Nexus.Trees to Bio.Tree, simplifying arguments where
reasonable (since the node IDs and adjacency list lookup are no longer
needed)
(2) Implement methods in Bio.Tree.Newick with the original argument lists,
but triggering a deprecation warning indicating the newer replacement method
(3) Replace Nexus.Trees with an import of Bio.Tree.Newick(IO) and a few more
shims to duplicate the original API -- so test_Nexus.py should still pass,
ideally (with deprecation warnings)
(4) In Nexus.Nexus, replace all usage of Nexus.Trees with proper usage of
NexusIO and Bio.Tree methods.
(5) Eventually delete Nexus.Trees and the shims in Bio.Tree.Newick.
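
Step (2) could be sketched roughly as follows. This is a hedged example, not
the code in the branch: Subtree here is a stripped-down stand-in,
NewickSubtree is a hypothetical name, and get_taxa(node_id) merely stands in
for an old-style call, so treat its name and signature as illustrative.

```python
import warnings


class Subtree(object):
    """Stripped-down stand-in for the new-style clade class."""

    def __init__(self, name=None, clades=None):
        self.name = name
        self.clades = clades or []

    def get_terminals(self):
        """New-style method: return the leaf sub-trees, left to right."""
        if not self.clades:
            return [self]
        terminals = []
        for clade in self.clades:
            terminals.extend(clade.get_terminals())
        return terminals


class NewickSubtree(Subtree):
    """Hypothetical subclass carrying the backwards-compatibility shims."""

    def get_taxa(self, node_id=None):
        """Old-style call kept as a shim; warns and delegates.

        node_id is accepted only for signature compatibility and is
        ignored in this sketch.
        """
        warnings.warn(
            "get_taxa is deprecated; use get_terminals instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return [clade.name for clade in self.get_terminals()]


# Record warnings so the deprecation notice can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    tree = NewickSubtree(clades=[NewickSubtree("A"), NewickSubtree("B")])
    taxa = tree.get_taxa()
```

Old callers keep working and see a DeprecationWarning, while new code calls
get_terminals directly; deleting the shim later corresponds to step (5).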

I'm currently doing (1) and (2), with more emphasis on getting (1) right.
Not all of the important methods have been ported, but I'm happy with the
tree traversal methods.

*Tests*
I created test_Tree.py to test the methods in Bio.Tree.BaseTree;
test_PhyloXML.py tests Bio.Tree.PhyloXML objects and Bio.TreeIO.PhyloXMLIO
parsing/writing.

I noticed that in Tests/Nexus/, the example file for internal node labels is
actually in Newick/NH format, not Nexus. That was briefly confusing, so
maybe that file should be renamed.

What do you think?

All the best,
Eric