From p.j.a.cock at googlemail.com  Wed Aug  1 05:27:14 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 1 Aug 2012 10:27:14 +0100
Subject: [Biopython-dev] Genome Diagram Default Behavior
In-Reply-To: <2054694E-0D60-4F16-A7EE-ABC8AD59F344@rockefeller.edu>
References: <2054694E-0D60-4F16-A7EE-ABC8AD59F344@rockefeller.edu>
Message-ID: <CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>

On Wed, Aug 1, 2012 at 1:37 AM, Zachary Charlop-Powers wrote:
> Hello Biopython,
>
> I am writing about a small feature that I would like to see implemented
> (and could possibly help to implement it: I haven't contributed before and
> am not sure exactly how tough this will be).  When using Genome Diagram to
> draw features you can specify which strand to put a feature on. If the
> strand is positive it will go above the track in the positive-facing
> direction and if negative it will go below the track in the negative facing
> direction (see http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc200). That's
> great behavior.

Yep - all fine so far.

> However if you use  strand="None", Genome Diagram will draw
> the features inline with the track and always in the positive direction.
> For myself, and probably others, keeping the direction of the features is
> immensely useful as you can often get a sense of operon structure in
> prokaryote genomes just by looking at the genes. Of course the forward and
> the minus strands can be drawn but condensing small sections of genes to a
> single track saves space when making images.
>
> So, would it be possible to change the default behavior of Genome Diagram
> to draw features inline (strand="None"), but to preserve their orientation?

I think I know what you mean - that kind of picture is quite common
e.g. for viruses - but only where there are no overlapping genes
on opposite strands. GenomeDiagram was written originally primarily
for bacteria, where overlapping genes on opposite strands are more
common, which may explain the design choices made.

Currently strand controls both orientation (for arrows, no effect on
box sigils) and vertical placement (above, below, or straddling the
line). Basically you want to override the vertical placement only?
Note this is sigil dependent - it makes sense for the arrow, but not
the default box (which was originally the only sigil supported).

The good news is the underlying drawing code can do this - the
arrow drawing is just given a bounding box and the requested
orientation (left or right) argument set by the get_feature_sigil
method of the LinearDrawer or CircularDrawer.

If you need this right now, a careful hack in get_feature_sigil is
the way to proceed.
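
As a rough illustration of the point above (an arrow is fully determined
by a bounding box plus a left/right orientation), here is a minimal
sketch in plain Python. The function name arrow_points and the
head_fraction parameter are hypothetical and are not taken from the
GenomeDiagram source; the real arrow sigil also has a narrower shaft,
which this sketch leaves out.

```python
def arrow_points(x0, x1, y_bottom, y_top, orientation="right", head_fraction=0.25):
    """Return polygon vertices for a simple arrow filling a bounding box.

    Hypothetical helper for illustration only, not GenomeDiagram code.
    """
    if orientation not in ("left", "right"):
        raise ValueError("orientation must be 'left' or 'right'")
    y_mid = (y_bottom + y_top) / 2.0
    head = (x1 - x0) * head_fraction  # horizontal length of the arrow head
    if orientation == "right":
        neck = x1 - head
        return [(x0, y_bottom), (neck, y_bottom), (x1, y_mid), (neck, y_top), (x0, y_top)]
    neck = x0 + head
    return [(x1, y_bottom), (neck, y_bottom), (x0, y_mid), (neck, y_top), (x1, y_top)]


# The same code draws a half-height or a full-height arrow: only the
# bounding box handed to it changes.
print(arrow_points(0, 100, 0, 10, "right"))    # above the axis only
print(arrow_points(0, 100, -10, 10, "left"))   # straddling the axis
```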

The question is how to most cleanly expose this to the user while
not breaking anything else (e.g. cross links), and ideally allow for
a related option which Leighton and I have considered (but not
had a pressing need to implement) for frame specific placement.
i.e. Rather than treating the vertical drawing spaces as two regions
(above the axis line for the forward strand, below the line for the
reverse strand), treat it as six regions (three frames above and
below the axis line). I'm picturing something a bit like the view
in the Artemis annotation editor.

One question which constrains this design choice is would you
want to mix these placements on the same track? I think yes -
using plain strandless BOX features (at the bottom of the z-order
stack) is a really useful way to highlight a region of interest
(which could have multiple genes drawn on top of it).

That suggests this setting might be best at the GenomeDiagram
feature level. Perhaps a new attribute/argument 'strand_mode',
(a) ignore strand for vertical placement (what you want)
(b) divide vertical space in two (current behaviour)
(c) divide vertical space in six (frame specific placement)
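
The three options above could be sketched roughly as follows. This is
only an illustration of the idea: the function vertical_band, the mode
names "ignore"/"two"/"six" and the frame argument are all hypothetical,
and nothing like this exists in GenomeDiagram at the time of writing.

```python
def vertical_band(strand, btm, ctr, top, strand_mode="two", frame=0):
    """Return the (low, high) vertical extent for a feature on a track.

    Hypothetical sketch: btm/ctr/top are the bottom, centre line and top
    of the track; frame (0, 1 or 2) is only used in six-region mode.
    """
    if strand_mode not in ("ignore", "two", "six"):
        raise ValueError("unknown strand_mode %r" % strand_mode)
    # (a) ignore strand, and strand-less features in any mode: straddle the line.
    if strand_mode == "ignore" or strand not in (+1, -1):
        return (btm, top)
    # (b) two regions: forward above the line, reverse below it.
    if strand_mode == "two":
        return (ctr, top) if strand == +1 else (btm, ctr)
    # (c) six regions: three frames above the line, three below.
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    if strand == +1:
        step = (top - ctr) / 3.0
        return (ctr + frame * step, ctr + (frame + 1) * step)
    step = (ctr - btm) / 3.0
    return (ctr - (frame + 1) * step, ctr - frame * step)


print(vertical_band(+1, -30, 0, 30, "two"))
print(vertical_band(-1, -30, 0, 30, "ignore"))
print(vertical_band(-1, -30, 0, 30, "six", frame=1))
```

Because the mode is an argument per feature, strand-less highlight boxes
and stranded genes could share one track, which is the mixing case
raised above.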

Hmm. Leighton?

Peter

P.S. Frame specific placement would work best with an
overhaul of how we draw multi-fragment features like genes
with exons. Here a whole new sigil class for linking sub-parts
of a feature might make sense. That is again something we
only chatted about so far, but would make GenomeDiagram
more useful for drawing eukaryotic annotation.

From p.j.a.cock at googlemail.com  Wed Aug  1 06:43:59 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 1 Aug 2012 11:43:59 +0100
Subject: [Biopython-dev] back_table in Bio.Data.CodonTable
In-Reply-To: <CAKN_8Do6cHn8ZJyriNppqSyoRGRiY-JYiLzwxEyd1jh=PqHx7Q@mail.gmail.com>
References: <CAKN_8DrVV1RfWHXb4bJHF9LtrmnxKya4Y0r=iG6GMEafemUUrA@mail.gmail.com>
	<CAKVJ-_7prWz2M3OCGHYrk4a2P0GZYxSyJfacHjHovwQ2K3HNEA@mail.gmail.com>
	<CAKN_8Do6cHn8ZJyriNppqSyoRGRiY-JYiLzwxEyd1jh=PqHx7Q@mail.gmail.com>
Message-ID: <CAKVJ-_7daCMtiYULJjEcv-mzpKgC7pMAMpBdW3N75M104k4VNg@mail.gmail.com>

On Tue, Jul 31, 2012 at 8:07 PM, Jeff Hussmann <jeff.hussmann at gmail.com> wrote:
> It seems desirable to have each amino acid's list of codons be given
> in a deterministic order. I have been sorting lexicographically using
> the ordering 'TCAG'. This is referred to as the 'conventional
> ordering' in CodonTable.__str__.

Lexical sorting (i.e. using Python's sort on a list of codons) seems best,
it is simple and predictable.

> The most flexible solution would be
> to take the ordering from self.nucleotide_alphabet.letters, but this
> would give 'GATC' for any CodonTable using IUPAC.unambiguous_dna as
> its nucleotide alphabet. Are there any Biopython-wide conventions
> here?

I'm not sure why the alphabets used that particular order over another.
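
To make the difference between the orderings discussed above concrete,
here is a small stand-alone sketch. The helper sort_codons is
hypothetical (it is not part of Bio.Data.CodonTable); the 'GATC' string
is simply the letter order quoted above for IUPAC.unambiguous_dna.

```python
# The six leucine codons of the standard genetic code, in arbitrary order.
leucine = ["CTG", "TTA", "CTT", "TTG", "CTA", "CTC"]


def sort_codons(codons, letters=None):
    """Sort codons lexicographically, optionally under a custom letter order."""
    if letters is None:
        # Plain Python sort: alphabetical, i.e. A < C < G < T.
        return sorted(codons)
    rank = {letter: index for index, letter in enumerate(letters)}
    return sorted(codons, key=lambda codon: [rank[base] for base in codon])


print(sort_codons(leucine))
# ['CTA', 'CTC', 'CTG', 'CTT', 'TTA', 'TTG']
print(sort_codons(leucine, "TCAG"))
# ['TTA', 'TTG', 'CTT', 'CTC', 'CTA', 'CTG']
print(sort_codons(leucine, "GATC"))
# ['TTG', 'TTA', 'CTG', 'CTA', 'CTT', 'CTC']
```

All three are deterministic; they differ only in which letter order is
treated as the alphabet.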

Peter

From Leighton.Pritchard at hutton.ac.uk  Wed Aug  1 06:53:19 2012
From: Leighton.Pritchard at hutton.ac.uk (Leighton Pritchard)
Date: Wed, 1 Aug 2012 10:53:19 +0000
Subject: [Biopython-dev] Genome Diagram Default Behavior
In-Reply-To: <CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>
References: <2054694E-0D60-4F16-A7EE-ABC8AD59F344@rockefeller.edu>
	<CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>
Message-ID: <089BCE07-D9CB-4657-800C-8E0ACABED1A9@hutton.ac.uk>

Hi all,

On 1 Aug 2012, at Wednesday, August 1, 10:27, Peter Cock wrote:

> On Wed, Aug 1, 2012 at 1:37 AM, Zachary Charlop-Powers wrote:
>> However if you use strand="None", Genome Diagram will draw
>> the features inline with the track and always in the positive direction.
>> For myself, and probably others, keeping the direction of the features is
>> immensely useful as you can often get a sense of operon structure in
>> prokaryote genomes just by looking at the genes.

That's true. I find it easiest to identify operon structure in that way (i.e. visually and approximately) by noting where the features swap between positive and negative strands. Other approaches might include colouring positive/negative/None strand features differently.

>> Of course the forward and
>> the minus strands can be drawn but condensing small sections of genes to a
>> single track saves space when making images.

It doesn't, if the single track is the same height as before - what differs is whether the features on that track are half, or full, track height.

>> So, would it be possible to change the default behavior of Genome Diagram
>> to draw features inline (strand="None"), but to preserve their orientation?

I think there's a better way to get what you're after. Changing the default setting here would modify more than whether the arrow spans the whole track, and it would also mean that GenomeDiagram does not respect the strand data of features by default. I think that's a bad thing.

> I think I know what you mean - that kind of picture is quite common
> e.g. for viruses - but only where there are no overlapping genes
> on opposite strands. GenomeDiagram was written originally primarily
> for bacteria, where overlapping genes on opposite strands are more
> common, which may explain the design choices made.

My original choice was made for a combination of reasons:

- I wanted to respect the strand information in the source data
- The 'box' sigil was easiest to draw, and was the first to be available (this carries no inherent directional information as an image)

The overlapping gene issue is relevant but, since the resolution of a drawn image is often such that boxes slightly overlap even when there is no feature overlap, it didn't feature in my consideration.

> Currently strand controls both orientation (for arrows, no effect on
> box sigils) and vertical placement (above, below, or straddling the
> line). Basically you want to override the vertical placement only?
> Note this is sigil dependent - it makes sense for the arrow, but not
> the default box (which was originally the only sigil supported).

That's how I understand Zachary's suggestion: to draw an arrow with orientation preserved, but across the positive and negative strands of the track.

> The good news is the underlying drawing code can do this - the
> arrow drawing is just given a bounding box and the requested
> orientation (left or right) argument set by the get_feature_sigil
> method of the LinearDrawer or CircularDrawer.
>
> If you need this right now, a careful hack in get_feature_sigil is
> the way to proceed.
>
> The question is how to most cleanly expose this to the user while
> not breaking anything else (e.g. cross links), and ideally allow for
> a related option which Leighton and I have considered [...]

My original plan was to have more sigils available, implemented as draw_X() functions in the AbstractDrawer module. This would seem to be a good case for a draw_large_arrow() (or somesuch) function. The issue then would be a slight change to the prototypes for the existing draw_box and draw_arrow functions. Basically, we'd pass the overall bounding box and strand (x0, x1, btm, ctr, top, strand) information to the new functions, and let them decide where to place the sigil - above, below, or straddling the centre line.

Then, we could choose whether draw_arrow() takes an additional argument (e.g. straddle=True) for the behaviour that Zachary wants, or whether we use a new sigil ('large_arrow'), which could have its own function - just like that of draw_arrow() - but would probably be better implemented by just passing the straddle=True (or whatever) argument.

This way, the change is transparent to the user, except for perhaps choosing 'large_arrow' rather than 'arrow' as a sigil.
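
A minimal sketch of the proposal above, assuming the draw functions
receive the whole track extent (x0, x1, btm, ctr, top, strand) and
choose their own placement. The names and return value here are
hypothetical stand-ins, not the actual AbstractDrawer functions, and
the sketch only reports what would be drawn rather than drawing it.

```python
from functools import partial


def draw_arrow(x0, x1, btm, ctr, top, strand, straddle=False):
    """Describe an arrow sigil; the sigil itself picks its vertical extent.

    Hypothetical stand-in for illustration, not GenomeDiagram code.
    """
    if straddle or strand not in (+1, -1):
        y0, y1 = btm, top   # full track height, across the centre line
    elif strand == -1:
        y0, y1 = btm, ctr   # below the centre line
    else:
        y0, y1 = ctr, top   # above the centre line
    # Strand-less arrows point right, like a forward-strand arrow.
    orientation = "left" if strand == -1 else "right"
    return {"box": (x0, y0, x1, y1), "orientation": orientation}


# 'large_arrow' is the same function with straddle=True, as suggested above.
SIGILS = {
    "arrow": draw_arrow,
    "large_arrow": partial(draw_arrow, straddle=True),
}

print(SIGILS["arrow"](10, 50, -5, 0, 5, -1))
print(SIGILS["large_arrow"](10, 50, -5, 0, 5, -1))
```

Orientation is preserved in both cases; only the vertical extent of the
box changes, so the user-visible difference is just the sigil name.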

> That suggests this setting might be best at the GenomeDiagram
> feature level. Perhaps a new attribute/argument 'strand_mode',
> (a) ignore strand for vertical placement (what you want)
> (b) divide vertical space in two (current behaviour)
> (c) divide vertical space in six (frame specific placement)
>
> Hmm. Leighton?

I'm choosing to leave frame-specificity out of the discussion, for now ;)

Cheers,

L.

--
Dr Leighton Pritchard
Information and Computing Sciences Group; Weeds, Pests and Diseases Theme
DG31, James Hutton Institute (Dundee)
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:leighton.pritchard at hutton.ac.uk       w:http://www.hutton.ac.uk/staff/leighton-pritchard
gpg/pgp: 0xFEFC205C tel: +44(0)844 928 5428 x8827 or +44(0)1382 568827


________________________________________________________

This email is from the James Hutton Institute, however the views
expressed by the sender are not necessarily the views of the James Hutton
Institute and its subsidiaries. This email and any attachments are confidential and 
are intended solely for the use of the recipient(s) to whom they are addressed.
If you are not the intended recipient, you should not read, copy, disclose or rely on 
any information contained in this email, and we would ask you to contact the 
sender immediately and delete the email from your system.  Although the James 
Hutton Institute has taken reasonable precautions to ensure no viruses are present 
in this email, neither the Institute nor the sender accepts any responsibility for any 
viruses, and it is your responsibility to scan the email and any attachments.

The James Hutton Institute is a Scottish charitable company limited by guarantee.
Registered in Scotland No. SC374831
Registered Office: The James Hutton Institute, Invergowrie Dundee DD2 5DA. 
Charity No. SC041796

From p.j.a.cock at googlemail.com  Wed Aug  1 07:05:51 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 1 Aug 2012 12:05:51 +0100
Subject: [Biopython-dev] Genome Diagram Default Behavior
In-Reply-To: <089BCE07-D9CB-4657-800C-8E0ACABED1A9@hutton.ac.uk>
References: <2054694E-0D60-4F16-A7EE-ABC8AD59F344@rockefeller.edu>
	<CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>
	<089BCE07-D9CB-4657-800C-8E0ACABED1A9@hutton.ac.uk>
Message-ID: <CAKVJ-_467QvsoA1ank0ArnE3Ef3BoV_ctz=e-d1dknBaGYPaHA@mail.gmail.com>

On Wed, Aug 1, 2012 at 11:53 AM, Leighton Pritchard
<Leighton.Pritchard at hutton.ac.uk> wrote:
>
> It doesn't, if the single track is the same height as before - what differs
> is whether the features on that track are half, or full, track height.

Yes, but once you've configured the arrows to straddle the axis, you can
then allocate less vertical space to that track. i.e. it needs less space.

>> The question is how to most cleanly expose this to the user while
>> not breaking anything else (e.g. cross links), and ideally allow for
>> a related option which Leighton and I have considered [...]
>
> My original plan was to have more sigils available, implemented as draw_X()
> functions in the AbstractDrawer module. This would seem to be a good case
> for a draw_large_arrow() (or somesuch) function. The issue then would be a
> slight change to the prototypes for the existing draw_box and draw_arrow
> functions. Basically, we'd pass the overall bounding box and strand (x0, x1,
> btm, ctr, top, strand) information to the new functions, and let them decide
> where to place the sigil - above, below, or straddling the centre line.
>
> Then, we could choose whether draw_arrow() takes an additional argument
> (e.g. straddle=True) for the behaviour that Zachary wants, or whether we use
> a new sigil ('large_arrow'), which could have its own function - just like
> that of draw_arrow() - but would probably be better implemented by just
> passing the straddle=True (or whatever) argument.
>
> This way, the change is transparent to the user, except for perhaps choosing
> 'large_arrow' rather than 'arrow' as a sigil.

That was another idea I was considering. Under this model, the sigils
could be given the full strand straddling bounding box, and decide if
they will use all of this (i.e. the new 'large_arrow', or the current sigils
when strand-less), or just half as in the stranded current 'arrow' and
'box' sigils where the strand is known.

That could work quite well, and the end user API is quite clean.

Peter


From Leighton.Pritchard at hutton.ac.uk  Wed Aug  1 07:23:48 2012
From: Leighton.Pritchard at hutton.ac.uk (Leighton Pritchard)
Date: Wed, 1 Aug 2012 11:23:48 +0000
Subject: [Biopython-dev] Genome Diagram Default Behavior
In-Reply-To: <CAKVJ-_467QvsoA1ank0ArnE3Ef3BoV_ctz=e-d1dknBaGYPaHA@mail.gmail.com>
References: <2054694E-0D60-4F16-A7EE-ABC8AD59F344@rockefeller.edu>
	<CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>
	<089BCE07-D9CB-4657-800C-8E0ACABED1A9@hutton.ac.uk>
	<CAKVJ-_467QvsoA1ank0ArnE3Ef3BoV_ctz=e-d1dknBaGYPaHA@mail.gmail.com>
Message-ID: <93ED1DEB-1C9B-4D34-A898-D326ED5F8C2F@hutton.ac.uk>


On 1 Aug 2012, at Wednesday, August 1, 12:05, Peter Cock wrote:

> On Wed, Aug 1, 2012 at 11:53 AM, Leighton Pritchard
> <Leighton.Pritchard at hutton.ac.uk> wrote:
>
>> It doesn't, if the single track is the same height as before - what differs
>> is whether the features on that track are half, or full, track height.
>
> Yes, but once you've configured the arrows to straddle the axis, you can
> then allocate less vertical space to that track. i.e. it needs less space.

I understand that - and maybe I'm being (over) pedantic - but you can allocate less vertical space to the track in either case: the question is what kind of feature representation gives you the desired information legibly at those settings ;)

> That was another idea I was considering. Under this model, the sigils
> could be given the full strand straddling bounding box, and decide if
> they will use all of this (i.e. the new 'large_arrow', or the current sigils
> when strand-less), or just half as in the stranded current 'arrow' and
> 'box' sigils where the strand is known.
>
> That could work quite well, and the end user API is quite clean.

This option gets my vote.

L.

--
Dr Leighton Pritchard
Information and Computing Sciences Group; Weeds, Pests and Diseases Theme
DG31, James Hutton Institute (Dundee)
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:leighton.pritchard at hutton.ac.uk       w:http://www.hutton.ac.uk/staff/leighton-pritchard
gpg/pgp: 0xFEFC205C tel: +44(0)844 928 5428 x8827 or +44(0)1382 568827



From zcharlop at mail.rockefeller.edu  Wed Aug  1 10:27:32 2012
From: zcharlop at mail.rockefeller.edu (Zachary Charlop-Powers)
Date: Wed, 1 Aug 2012 14:27:32 +0000
Subject: [Biopython-dev] Genome Diagram Default Behavior
In-Reply-To: <CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>
References: <2054694E-0D60-4F16-A7EE-ABC8AD59F344@rockefeller.edu>
	<CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>
Message-ID: <5D18E2FE-3756-44E1-9DB3-4BAC690DFD78@rockefeller.edu>

Leighton,
Peter,

I love that we're not in the same timezone; I ask a question when I leave work and - lo and behold - when I return in the morning there is a well thought out response.  Thank you both.


> The good news is the underlying drawing code can do this - the
> arrow drawing is just given a bounding box and the requested
> orientation (left or right) argument set by the get_feature_sigil
> method of the LinearDrawer or CircularDrawer.
>
> If you need this right now, a careful hack in get_feature_sigil is
> the way to proceed.

I will take a look at this for a quick hack for some drawing I am working on.


>> That was another idea I was considering. Under this model, the sigils
>> could be given the full strand straddling bounding box, and decide if
>> they will use all of this (i.e. the new 'large_arrow', or the current sigils
>> when strand-less), or just half as in the stranded current 'arrow' and
>> 'box' sigils where the strand is known.
>>
>> That could work quite well, and the end user API is quite clean.
>
> This option gets my vote.
>
> L.


If you are both in agreement that this option is desirable and that it can be implemented in the sigil style, now we face the question of coding it. Would either of you consider working on it? If not this might be a problem I could tackle with a small amount of mentoring. Please let me know - I am happy to take a stab at it.

best regards,
zach cp


From p.j.a.cock at googlemail.com  Wed Aug  1 13:15:31 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 1 Aug 2012 18:15:31 +0100
Subject: [Biopython-dev] Genome Diagram Default Behavior
In-Reply-To: <5D18E2FE-3756-44E1-9DB3-4BAC690DFD78@rockefeller.edu>
References: <2054694E-0D60-4F16-A7EE-ABC8AD59F344@rockefeller.edu>
	<CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>
	<5D18E2FE-3756-44E1-9DB3-4BAC690DFD78@rockefeller.edu>
Message-ID: <CAKVJ-_4WETMxCxRzKExfqEAWQs_cjC6UewCniM=6V9hJp584yQ@mail.gmail.com>

On Wed, Aug 1, 2012 at 3:27 PM, Zachary Charlop-Powers
<zcharlop at mail.rockefeller.edu> wrote:
> Leighton,
> Peter,
>
> I love that we're not in the same timezone; I ask a question when I leave
> work and - lo and behold - when I return in the morning there is a well
> thought out response.  Thank you both.

:)

Peter wrote:
>>> The good news is the underlying drawing code can do this - the
>>> arrow drawing is just given a bounding box and the requested
>>> orientation (left or right) argument set by the get_feature_sigil
>>> method of the LinearDrawer or CircularDrawer.
>>>
>>> If you need this right now, a careful hack in get_feature_sigil is
>>> the way to proceed.

Zachary wrote:
> I will take a look at this for a quick hack for some drawing I am
> working on.

I hope you found any effort spent useful for understanding the
codebase... even if it doesn't turn out to be needed (see below).

Peter wrote:
>>> That was another idea I was considering. Under this model, the sigils
>>> could be given the full strand straddling bounding box, and decide if
>>> they will use all of this (i.e. the new 'large_arrow', or the current sigils
>>> when strand-less), or just half as in the stranded current 'arrow' and
>>> 'box' sigils where the strand is known.
>>>
>>> That could work quite well, and the end user API is quite clean.

Leighton wrote:
>> This option gets my vote.
>>
>> L.

Zachary wrote:
> If you are both in agreement that this option is desirable and that it can
> be implemented in the sigil style, now we face the question of coding it.
> Would either of you consider working on it? If not this might be a problem I
> could tackle with a small amount of mentoring. Please let me know - I am
> happy to take a stab at it.

I had a go this afternoon (a quiet moment between rushes - grin), and
it wasn't as bad as I feared. This is on a git branch at the moment,

https://github.com/peterjc/biopython/tree/gd-big

Thus far, just two commits. The first refactors the current code to move
the strand handling into the sigil code (but should, I hope, have no
side effects):

https://github.com/peterjc/biopython/commit/d9c416be7dd2c7081bd66bd553c9feb0174ecc13

The second commit implements the new axis straddling arrow (for
both linear and circular diagrams) plus a minimal test:

https://github.com/peterjc/biopython/commit/b58903d5c455416028a8ae410b2063d536448d59

To match the current sigil argument names BOX and ARROW, I have
provisionally called it BIGARROW. Any better ideas?

Also, to match the current arrow's behaviour, strand-less features
get an arrow pointing to the right (like a forward strand arrow).
Leighton and I had a little debate about this - with hindsight, the
original arrow sigil might have raised an error or drawn a box in
this situation - but I'm not willing to change this and break existing
code.

It would be great if you (Zachary) could give this a test, both to look
for regressions (anything that broke) and try the new sigil out. Are
you familiar with git, and installing Biopython from source?

Regards,

Peter

From zcharlop at mail.rockefeller.edu  Wed Aug  1 18:10:55 2012
From: zcharlop at mail.rockefeller.edu (Zachary Charlop-Powers)
Date: Wed, 1 Aug 2012 22:10:55 +0000
Subject: [Biopython-dev] Genome Diagram Default Behavior
In-Reply-To: <CAKVJ-_4WETMxCxRzKExfqEAWQs_cjC6UewCniM=6V9hJp584yQ@mail.gmail.com>
References: <2054694E-0D60-4F16-A7EE-ABC8AD59F344@rockefeller.edu>
	<CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>
	<5D18E2FE-3756-44E1-9DB3-4BAC690DFD78@rockefeller.edu>
	<CAKVJ-_4WETMxCxRzKExfqEAWQs_cjC6UewCniM=6V9hJp584yQ@mail.gmail.com>
Message-ID: <4A304FFF-48C9-42D8-9A26-1A9FBDA9AAA7@rockefeller.edu>

Peter wrote:
> It would be great if you (Zachary) could give this a test, both to look
> for regressions (anything that broke) and try the new sigil out. Are
> you familiar with git, and installing Biopython from source?


Just reran my previous image-generation scripts with your BioPython. I used sigil="BIGARROW" instead of "ARROW" and it worked like a charm.  Awesome.
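
For readers who want to try the same switch, a minimal sketch follows.
It assumes a Biopython build that includes the BIGARROW sigil (such as
the gd-big branch mentioned earlier in the thread) together with
ReportLab; the diagram name, track name and coordinates are made up for
illustration, and build_demo_diagram is a hypothetical helper.

```python
def build_demo_diagram(sigil="BIGARROW"):
    """Build a one-track diagram with one forward and one reverse feature."""
    # Imported here so the sketch can be loaded even without Biopython installed.
    from Bio.Graphics import GenomeDiagram
    from Bio.SeqFeature import FeatureLocation, SeqFeature

    diagram = GenomeDiagram.Diagram("BIGARROW demo")
    track = diagram.new_track(1, name="Genes")
    feature_set = track.new_set()
    # Made-up coordinates: one gene on each strand.
    for start, end, strand in [(100, 400, +1), (500, 900, -1)]:
        feature = SeqFeature(FeatureLocation(start, end, strand=strand))
        # The only change from a strand-specific drawing is the sigil name.
        feature_set.add_feature(feature, sigil=sigil)
    return diagram
```

Passing sigil="ARROW" instead gives the existing strand-specific
placement, so the two renderings can be compared from one script.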

Would you want to add the "BIGARROW" option to the tutorial?


best,
zach cp


From p.j.a.cock at googlemail.com  Wed Aug  1 18:33:14 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 1 Aug 2012 23:33:14 +0100
Subject: [Biopython-dev] Genome Diagram Default Behavior
In-Reply-To: <4A304FFF-48C9-42D8-9A26-1A9FBDA9AAA7@rockefeller.edu>
References: <2054694E-0D60-4F16-A7EE-ABC8AD59F344@rockefeller.edu>
	<CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>
	<5D18E2FE-3756-44E1-9DB3-4BAC690DFD78@rockefeller.edu>
	<CAKVJ-_4WETMxCxRzKExfqEAWQs_cjC6UewCniM=6V9hJp584yQ@mail.gmail.com>
	<4A304FFF-48C9-42D8-9A26-1A9FBDA9AAA7@rockefeller.edu>
Message-ID: <CAKVJ-_7i+20yAtHjnLjjFGAsM0smTY4RGdmNhi1Ws19NL7nrZA@mail.gmail.com>

On Wed, Aug 1, 2012 at 11:10 PM, Zachary Charlop-Powers wrote:
>> Peter wrote:
>>
>> It would be great if you (Zachary) could give this a test, both to look
>> for regressions (anything that broke) and try the new sigil out. Are
>> you familiar with git, and installing Biopython from source?
>>
>
> Just reran my previous image-generation scripts with your BioPython.
> I used sigil="BIGARROW" instead of "ARROW" and it worked like a
> charm.  Awesome.

Great. Thanks for quickly testing this.

>
> Would you want to add the "BIGARROW" option to the tutorial?
>

Yes, if/when we merge this (and I'll try to talk to Leighton
about it tomorrow), then I would also want to update the
Tutorial to describe this new feature. There is almost no point
writing new code if we don't document it.

Peter

From tiagoantao at gmail.com  Wed Aug  1 23:39:43 2012
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 1 Aug 2012 20:39:43 -0700
Subject: [Biopython-dev] Away Re: buildbot failure in Biopython on Linux -
	Python 3.1
Message-ID: <CAA9RGEP545zevW7qaUsTqR-qFfWS+bbKJSbKCj+mwhiZufvFzw@mail.gmail.com>

I am currently away from office. I will respond back on the 20th of August.

Regards,
Tiago

-- 
"Liberty for wolves is death to the lambs" - Isaiah Berlin

From Leighton.Pritchard at hutton.ac.uk  Thu Aug  2 03:42:47 2012
From: Leighton.Pritchard at hutton.ac.uk (Leighton Pritchard)
Date: Thu, 2 Aug 2012 07:42:47 +0000
Subject: [Biopython-dev] Genome Diagram Default Behavior
In-Reply-To: <CAKVJ-_4WETMxCxRzKExfqEAWQs_cjC6UewCniM=6V9hJp584yQ@mail.gmail.com>
References: <2054694E-0D60-4F16-A7EE-ABC8AD59F344@rockefeller.edu>
	<CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>
	<5D18E2FE-3756-44E1-9DB3-4BAC690DFD78@rockefeller.edu>
	<CAKVJ-_4WETMxCxRzKExfqEAWQs_cjC6UewCniM=6V9hJp584yQ@mail.gmail.com>
Message-ID: <ED9ED5E9-F108-470F-B825-145BE4F039B5@hutton.ac.uk>

Hi,

On 1 Aug 2012, at Wednesday, August 1, 18:15, Peter Cock wrote:

> On Wed, Aug 1, 2012 at 3:27 PM, Zachary Charlop-Powers
> <zcharlop at mail.rockefeller.edu> wrote:
>> Leighton,
>> Peter,
>>
>> I love that we're not in the same timezone; I ask a question when I leave
>> work and - lo and behold - when I return in the morning there is a well
>> thought out response.  Thank you both.

No worries.

> I had a go this afternoon (a quiet moment between rushes - grin),

Good job getting it done so quickly!

> and it wasn't as bad as I feared.
> [...]
> To match the current sigil argument names BOX and ARROW, I have
> provisionally called it BIGARROW. Any better ideas?

BIGARROW sounds fine to me. I like literal names.

> Leighton and I had a little debate about this - with hindsight, the
> original arrow sigil might have raised an error or drawn a box in
> this situation - but I'm not willing to change this and break existing
> code.

Likewise - now it's been there so long, I think it would be inconsistent at this point to change it. Arguably, the default setting has to choose a direction simply because (single-headed) arrows have a direction. For those figures where you're being precise, users can use a box for a feature with no direction; if it's pointing the wrong way, users can set the feature strand. Left-to-right as a default is arbitrary, though.

Cheers,

L.

--
Dr Leighton Pritchard
Information and Computing Sciences Group; Weeds, Pests and Diseases Theme
DG31, James Hutton Institute (Dundee)
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:leighton.pritchard at hutton.ac.uk       w:http://www.hutton.ac.uk/staff/leighton-pritchard
gpg/pgp: 0xFEFC205C tel: +44(0)844 928 5428 x8827 or +44(0)1382 568827



From p.j.a.cock at googlemail.com  Thu Aug  2 12:12:54 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 2 Aug 2012 17:12:54 +0100
Subject: [Biopython-dev] Genome Diagram Default Behavior
In-Reply-To: <ED9ED5E9-F108-470F-B825-145BE4F039B5@hutton.ac.uk>
References: <2054694E-0D60-4F16-A7EE-ABC8AD59F344@rockefeller.edu>
	<CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>
	<5D18E2FE-3756-44E1-9DB3-4BAC690DFD78@rockefeller.edu>
	<CAKVJ-_4WETMxCxRzKExfqEAWQs_cjC6UewCniM=6V9hJp584yQ@mail.gmail.com>
	<ED9ED5E9-F108-470F-B825-145BE4F039B5@hutton.ac.uk>
Message-ID: <CAKVJ-_5s_MutuaxysAuyk6Xnfoeih6E-1a50MqnvboUBqjSMWA@mail.gmail.com>

On Thu, Aug 2, 2012 at 8:42 AM, Leighton Pritchard
<Leighton.Pritchard at hutton.ac.uk> wrote:
>Peter wrote:
>>
>> To match the current sigil argument names BOX and ARROW, I have
>> provisionally called it BIGARROW. Any better ideas?
>>
>
> BIGARROW sounds fine to me. I like literal names.
>

Great. Checked into the master, and I updated the Tutorial and
the Proux et al 2002 Figure 6 reproduction example to use this:

Before (cross-links with strand specific ARROW sigil):
http://biopython.org/DIST/docs/tutorial/images/three_track_cl2.png

After (cross-links with strand straddling BIGARROW sigil):
http://biopython.org/DIST/docs/tutorial/images/three_track_cl2a.png

Original (I don't know what was used to draw this):
http://dx.doi.org/10.1128/JB.184.21.6026-6036.2002

Regards,

Peter

From clements at galaxyproject.org  Fri Aug  3 19:23:25 2012
From: clements at galaxyproject.org (Dave Clements)
Date: Fri, 3 Aug 2012 16:23:25 -0700
Subject: [Biopython-dev] Galaxy is Hiring Postdocs
Message-ID: <CA+He-X9fND6V3MjszsFfJPotiRMobKgv2g_wjxTW=2aa+Z4gxg@mail.gmail.com>

Hello all,

The Galaxy Project <http://galaxyproject.org/>, a highly successful high
throughput data analysis platform for Life Sciences with over 23,000 users
worldwide <http://wiki.g2.bx.psu.edu/Galaxy%20Project/Statistics>, is
hiring:

The Taylor Lab <http://bx.mathcs.emory.edu/> in Biology
<http://www.biology.emory.edu> and Mathematics & Computer Science
<http://www.mathcs.emory.edu> at Emory University <http://emory.edu/> is
looking for *postdoctoral scholars
<http://bx.mathcs.emory.edu/joining/postdocs/>* to work on the Galaxy
Project. Postdoctoral applicants should have expertise in Bioinformatics
and Computational Biology and research interests that complement but
extend the lab's current interests <http://bx.mathcs.emory.edu/research/>:
the Galaxy project; distributed and high-performance computing for data
intensive science; vertebrate functional genomics; and genomics and
epigenomic mechanisms of gene regulation, the role of transcription factors
and chromatin structure in global gene expression, development, and
differentiation. See the announcement for full details
(http://bx.mathcs.emory.edu/joining/postdocs/).

The Nekrutenko Lab <http://www.bx.psu.edu/%7Eanton/> at the Huck Institutes
of Life Sciences <http://www.huck.psu.edu/> at Penn State <http://psu.edu/>
is seeking *highly opinionated and biologically inclined* *postdoctoral
researchers* within the Galaxy Project to develop best practices for
analysis of next-generation sequencing data in all areas of Life Sciences
where NGS is used. Successful candidates will join a vibrant research group
at the core of the Galaxy Project and will work on setting trends in modern
data-driven life sciences.

Please send your CV and names/e-mail addresses of three references to
jobs at galaxyproject.org.

Thanks,

Dave C.
-- 
http://galaxyproject.org/
http://getgalaxy.org/
http://usegalaxy.org/
http://galaxyproject.org/wiki/

From arklenna at gmail.com  Tue Aug  7 01:11:04 2012
From: arklenna at gmail.com (Lenna Peterson)
Date: Tue, 7 Aug 2012 01:11:04 -0400
Subject: [Biopython-dev] GSoC python variant update
Message-ID: <CALfq9tJxsTqkq3E6KN-iirEns7FhC2G7aXEX+4MNm+WTPiXw2Q@mail.gmail.com>

Full post: http://arklenna.tumblr.com/post/28890255191/

Summary:

* I'm working on the coordinate mapper Reece contributed:
http://biopython.org/pipermail/biopython/2010-June/006598.html

* I'm representing intron locations relative to CDS coords using the
HGVS standards: http://www.hgvs.org/mutnomen/refseq_figure.html
I'd like to know if there are other common ways of representing such positions.

* In order to customize the display of positions (e.g. 0-based or
1-based), I'm using a class as a configuration container. I've read on
StackOverflow that attempts to use globals or a singleton class are
discouraged in Python, but I have not found practical suggestions for
how to implement module-wide configurations. Suggestions are welcome.

* Any advice about circular genomes or strandedness is also welcome.

* This mapper will work for SeqRecords, SeqFeatures, FeatureLocations,
etc. Are there other Biopython objects that store sequence coordinates
and thus should be mappable?

Regards,

Lenna

From mjldehoon at yahoo.com  Tue Aug  7 02:40:13 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 6 Aug 2012 23:40:13 -0700 (PDT)
Subject: [Biopython-dev] Parsing TRANSFAC matrices with Bio.Motif
Message-ID: <1344321613.96095.YahooMailClassic@web164001.mail.gq1.yahoo.com>

Dear all,

Currently Bio.Motif has some support for writing TRANSFAC files but not for reading TRANSFAC files. I would like to add such a parser to Bio.Motif. Do you all agree that it fits in this module? Note that the TRANSFAC files very much look like EMBL files, and therefore contain much more information than what is currently in a Bio.Motif._Motif.Motif object (the object to be generated by Bio.Motif.read(handle, "transfac")). Perhaps the easiest is to add an attribute .annotations to Bio.Motif._Motif.Motif objects, and use it as a dictionary to store the EMBL-like annotations under their 2-letter keys.

On a related note, currently Bio.Motif._Motif.Motif objects also perform functions that are more appropriate for a separate PWM (position-weight matrix) class within Bio.Motif. It may be a good idea to have a separate PWM class for this functionality.

Best,
-Michiel.


From bartek at rezolwenta.eu.org  Tue Aug  7 03:18:43 2012
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Tue, 7 Aug 2012 09:18:43 +0200
Subject: [Biopython-dev] Parsing TRANSFAC matrices with Bio.Motif
In-Reply-To: <1344321613.96095.YahooMailClassic@web164001.mail.gq1.yahoo.com>
References: <1344321613.96095.YahooMailClassic@web164001.mail.gq1.yahoo.com>
Message-ID: <CABHxouXrpENNt3DFjJ1jUu+tDfC8z7=n1GdXXkAtU-snVNwcGQ@mail.gmail.com>

Hi Michiel,

On Tue, Aug 7, 2012 at 8:40 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Dear all,
>
> Currently Bio.Motif has some support for writing TRANSFAC files but not for reading TRANSFAC files. I would like to add such a parser to Bio.Motif. Do you all agree that it fits in this module? Note that the TRANSFAC files very much look like EMBL files, and therefore contain much more information than what is currently in a Bio.Motif._Motif.Motif object (the object to be generated by Bio.Motif.read(handle, "transfac")). Perhaps the easiest is to add an attribute .annotations to Bio.Motif._Motif.Motif objects, and use it as a dictionary to store the EMBL-like annotations under their 2-letter keys.
>
That would certainly be a valuable addition. I didn't add it as a
format because it might get a bit confusing for users. TRANSFAC
itself (trademarked, afaik) is distributed by the BIObase company and
is not available unless you pay them for a license (you have to register
even for the "publicly available" version, which comes with a license
too). If you do, then you get access to a number of interconnected
datasets, including information about what they call "matrices", "sites",
"transcription factors" and "classes". I think that if we want to
support their filetypes, we should probably think about whether we should
support the matrix file only or maybe the other ones as well. The
confusing part is that many programs use "transfac-like" formats, i.e.
files very similar to the part in the "matrix" file that corresponds
to the PWM itself. (For example see
http://www.benoslab.pitt.edu/stamp/help.html).

> On a related note, currently Bio.Motif._Motif.Motif objects also perform functions that are more appropriate for a separate PWM (position-weight matrix) class within Bio.Motif. It may be a good idea to have a separate PWM class for this functionality.

Currently, the Bio.Motif.Motif class represents something sequence-like.
It can either be seen as a set of instances (.add_instance(),
.search_instance()) or as a PWM (.log_odds(), .search_pwm(), etc.). It
can hold some annotation part (i.e. name etc.); however, in my mind, it
is the core of the functionality for "motif" analysis. I can imagine
other types of motifs (we discussed regExp or HMM based motifs) that
could subclass Motif, but I think this should be the role of the Motif
class. Then comes the thing with annotations. I would rather vote for
something more similar to SeqRecord and Seq, where a new class
(MotifRecord?) would hold all the annotation data from TRANSFAC or
somesuch DB, and the Motif would remain more sequence-like. With
respect to moving the PWM-related functionality to a separate class,
I'm not sure. I think it is valuable to be able to load instances from
a file and then convert them to a PWM. It could be done with separate
classes, but I'm not sure it would be easier then...

best
Bartek
-- 
Bartek Wilczynski


From mjldehoon at yahoo.com  Tue Aug  7 04:39:15 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 7 Aug 2012 01:39:15 -0700 (PDT)
Subject: [Biopython-dev] Parsing TRANSFAC matrices with Bio.Motif
In-Reply-To: <CABHxouXrpENNt3DFjJ1jUu+tDfC8z7=n1GdXXkAtU-snVNwcGQ@mail.gmail.com>
Message-ID: <1344328755.85288.YahooMailClassic@web164002.mail.gq1.yahoo.com>

Hi Bartek,

Thanks for your reply.

--- On Tue, 8/7/12, Bartek Wilczynski <bartek at rezolwenta.eu.org> wrote:
> If you do, then you get access to a number of interconnected
> datasets, including information about what they call "matrices",
> "sites" and "transcription factors" and "classes". I think that if
> we want to support their filetypes, we probably should think whether
> we should support the matrix file only or maybe the other ones as
> well.

I would suggest to just support the matrices for now.

> The confusing part is that many programs use "transfac-like"
> formats, i.e. files very similar to the part in the "matrix"
> file that corresponds to the PWM itself. (For example see
> http://www.benoslab.pitt.edu/stamp/help.html).

This also means that if Bio.Motif can parse TRANSFAC files, then it
can parse the transfac-like formats, at least to some degree. Personally
I am actually more interested in the SwissRegulon database, which uses a
transfac-like format.

> Then comes the thing with annotations. I would rather
> vote for something more similar to SeqRecord and Seq,
> where a new class (MotifRecord?) would hold all the
> annotation data from TRANSFAC or somesuch DB, and the
> Motif would remain more sequence-like.

Are you suggesting that MotifRecord subclasses Bio.Motif._Motif.Motif?
For example we could have a Bio.Motif.Parsers.TRANSFAC.Motif class that subclasses Bio.Motif._Motif.Motif. Then  Bio.Motif._Motif.Motif remains sequence-like, and Bio.Motif.Parsers.TRANSFAC.Motif takes care of the annotations.

Alternatively we could say that Bio.Motif.Parsers.TRANSFAC.read returns a Bio.Motif.Parsers.TRANSFAC.Record object that contains the motif information as an attribute (so record.motif would be an instance of Bio.Motif._Motif.Motif).
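
The two designs discussed above can be compared side by side with a
minimal sketch. All class names below are hypothetical stand-ins, not the
real Bio.Motif classes, and the annotation value "M00001" is made up for
illustration only.

```python
# Hypothetical stand-ins; none of these are the real Bio.Motif classes.

class Motif:
    """Sequence-like core: it only knows about its instances."""

    def __init__(self, instances=()):
        self.instances = list(instances)

    def __len__(self):
        # Motif length is the length of one instance (0 if empty).
        return len(self.instances[0]) if self.instances else 0


class TransfacMotif(Motif):
    """Option 1: a parser-specific subclass that carries the annotations."""

    def __init__(self, instances=(), annotations=None):
        super().__init__(instances)
        self.annotations = dict(annotations or {})


class TransfacRecord:
    """Option 2: a record that wraps a plain Motif as an attribute."""

    def __init__(self, motif, annotations=None):
        self.motif = motif
        self.annotations = dict(annotations or {})


# Annotations are stored under their 2-letter keys in both options.
sub = TransfacMotif(["ACGT", "ACGA"], {"ID": "M00001"})
rec = TransfacRecord(Motif(["ACGT", "ACGA"]), {"ID": "M00001"})
```

With the subclass, anything that accepts a Motif also accepts the
annotated object; with the record, callers have to reach through
rec.motif, but the Motif class itself stays free of annotation code.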

Best,
-Michiel


From mjldehoon at yahoo.com  Tue Aug  7 10:47:00 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 7 Aug 2012 07:47:00 -0700 (PDT)
Subject: [Biopython-dev] Fw: Re:  Parsing TRANSFAC matrices with Bio.Motif
Message-ID: <1344350820.11922.YahooMailClassic@web164006.mail.gq1.yahoo.com>

Forwarding Bartek's email to the list ..
I am pretty much OK with his suggestions, but feel free to comment or suggest other solutions before we start implementing this.

Best,
-Michiel.

--- On Tue, 8/7/12, Bartek Wilczynski <bartek at rezolwenta.eu.org> wrote:

> From: Bartek Wilczynski <bartek at rezolwenta.eu.org>
> Subject: Re: [Biopython-dev] Parsing TRANSFAC matrices with Bio.Motif
> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> Date: Tuesday, August 7, 2012, 5:16 AM
> On Tue, Aug 7, 2012 at 10:39 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> > Hi Bartek,
> >
> > Thanks for your reply.
> >
> > --- On Tue, 8/7/12, Bartek Wilczynski <bartek at rezolwenta.eu.org> wrote:
> >> If you do, then you get access to a number of interconnected
> >> datasets, including information about what they call "matrices",
> >> "sites" and "transcription factors" and "classes". I think that if
> >> we want to support their filetypes, we probably should think whether
> >> we should support the matrix file only or maybe the other ones as
> >> well.
> >
> > I would suggest to just support the matrices for now.
> >
> I'm fine with that. Some links between the files might be less
> useful, but that might be added later.
> 
> >> The confusing part is that many programs use "transfac-like"
> >> formats, i.e. files very similar to the part in the "matrix"
> >> file that corresponds to the PWM itself. (For example see
> >> http://www.benoslab.pitt.edu/stamp/help.html).
> >
> > This also means that if Bio.Motif can parse TRANSFAC files, then it
> > can parse the transfac-like formats, at least to some degree.
> > Personally I am actually more interested in the SwissRegulon
> > database, which uses a transfac-like format
> >
>
> In principle yes, but there are slight variants making things "almost
> working". That's the main reason I didn't put the code I was using
> myself into the Biopython repository, as it might cause some weird
> breakages. For example, some formats drop the P0 column (the
> "transfac-like" in STAMP, for one), which makes it impossible to figure
> out whether you are interpreting the numbers right unless you agree on
> some ordering of nucleotides. I would suggest that we should support
> databases named directly and, maybe, think about generic methods for
> "raw PSSM" files, that would require the user to give the nucleotide
> order...
> 
> >> Then comes the thing with annotations. I would rather
> >> vote for something more similar to SeqRecord and Seq,
> >> where a new class (MotifRecord?) would hold all the
> >> annotation data from TRANSFAC or somesuch DB, and the
> >> Motif would remain more sequence-like.
> >
> > Are you suggesting that MotifRecord subclasses Bio.Motif._Motif.Motif?
> > For example we could have a Bio.Motif.Parsers.TRANSFAC.Motif class
> > that subclasses Bio.Motif._Motif.Motif. Then Bio.Motif._Motif.Motif
> > remains sequence-like, and Bio.Motif.Parsers.TRANSFAC.Motif takes
> > care of the annotations.
> >
> > Alternatively we could say that Bio.Motif.Parsers.TRANSFAC.read
> > returns a Bio.Motif.Parsers.TRANSFAC.Record object that contains the
> > motif information as an attribute (so record.motif would be an
> > instance of Bio.Motif._Motif.Motif).
> >
>
> For me, personally, the version where the transfac motif is a subclass
> of Motif is the more useful one. It is simpler, and it adds annotations
> as attributes of a motif. However, if we decided that we want the whole
> TRANSFAC db with all its annotations, the more natural way would be
> to have separate classes for instances and motifs and maybe even
> separate record classes representing a database record (there might be
> more transfac records referencing the same matrix). I don't think that
> there is so much need for supporting all the stuff from TRANSFAC (I
> don't know anybody who would be using all their annotations; people
> seem to care only about matrices anyway), so I'd vote for the simpler
> way of subclassing Motif.
> 
> best
> Bartek
> -- 
> Bartek Wilczynski
> 
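
The generic "raw PSSM" reader suggested in the forwarded reply, where the
user has to state the nucleotide order because the file does not, could
look roughly like this. It is only a sketch under assumptions: the
function name parse_raw_pssm, the whitespace-separated row layout, and the
sample counts are all hypothetical and do not describe any existing
Bio.Motif function or real database file.

```python
def parse_raw_pssm(lines, order):
    """Parse whitespace-separated count rows into one dict per position.

    `order` says which nucleotide each column holds, e.g. "ACGT".
    (Hypothetical helper, not part of Bio.Motif.)
    """
    if sorted(order) != sorted("ACGT"):
        raise ValueError("order must be a permutation of 'ACGT'")
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(order):
            raise ValueError(
                "expected %d columns, got %d" % (len(order), len(fields))
            )
        # Pair each column with the nucleotide the caller assigned to it.
        rows.append(dict(zip(order, map(float, fields))))
    return rows


# The same made-up numbers read under two different column orders.
text = ["10 0 0 0", "0 5 5 0", "1 1 1 7"]
as_acgt = parse_raw_pssm(text, "ACGT")
as_tgca = parse_raw_pssm(text, "TGCA")
```

The two results describe different motifs even though the input is
identical, which is the ambiguity described above when a file gives no
indication of the column order.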


From w.arindrarto at gmail.com  Tue Aug  7 13:56:26 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 7 Aug 2012 19:56:26 +0200
Subject: [Biopython-dev] GSoC Project Update -- 11
Message-ID: <CADEGkF7Z46PKT2g28xyRPKYi0wKhNOpRm5+Q6wHz=KtKiBibBQ@mail.gmail.com>

Hello everyone,

I have just posted my latest update on my project here:
http://bow.web.id/blog/2012/08/back-on-the-main-branch/

It has been quite a while since my last update, because there has
been a considerable change to the SearchIO object model I'm
using. The details are in my blog post, but to keep it short, it was
because the previous model (QueryResult, Hit, and HSP) was inadequate
in handling files that have multiple sequences in their HSP (so far
seen in files output by BLAT and Exonerate). In my previous updates,
I've been using simple Python lists to store attributes related to
these multiple sequences, but that turned out to be problematic as it
may make the object have inconsistent attributes.

After trying out several different implementations and discussing them
with Peter, we've finally settled on a new model. The new model
changes the HSP object into a container that stores a new object:
HSPFragment. HSPFragment represents a single, contiguous alignment of
the hit and query sequence. It only stores the sequence, coordinates,
frames, and strands. Other attributes made by the search program (such
as evalues or scores) are stored in the HSP object.
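
As a rough illustration of the container relationship described above,
here is a minimal sketch using dataclasses. It is not the SearchIO code
itself: the field names (query_start, bitscore, is_fragmented and so on)
and the sample values are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HSPFragment:
    """One contiguous alignment block: sequences, coordinates, frame, strand."""
    query: str
    hit: str
    query_start: int
    hit_start: int
    query_strand: int = 1
    hit_strand: int = 1
    hit_frame: int = 0


@dataclass
class HSP:
    """Container of fragments plus the statistics the search program reports."""
    fragments: List[HSPFragment] = field(default_factory=list)
    evalue: Optional[float] = None
    bitscore: Optional[float] = None

    def __len__(self):
        return len(self.fragments)

    @property
    def is_fragmented(self):
        # True for multi-block alignments such as those from BLAT or Exonerate.
        return len(self.fragments) > 1


# A BLAST-like HSP holds a single fragment ...
simple = HSP([HSPFragment("ACGT", "ACGT", 0, 100)], evalue=1e-5, bitscore=42.0)
# ... while a spliced alignment holds one fragment per block.
spliced = HSP([
    HSPFragment("ACGT", "ACGT", 0, 100),
    HSPFragment("TTGA", "TTGA", 4, 250),
])
```

Keeping the per-block data in the fragments means each block always has a
complete, consistent set of attributes, which was the problem with the
earlier list-based approach.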

This change required some modifications to all of the current parsers,
but for users working with file formats other than BLAT or Exonerate,
the changes should be minimal.

Aside from this, there's also a small update on the main API which
lets it accept keyword arguments. The arguments modify behaviors of
the parser, and they are different for each parser. Currently, this is
only used by the BLAST tabular parser, but I imagine more parsers will
use this in the future.

Finally, having settled on a firmer object model, I'll spend the
rest of my time focusing on the documentation. There may still be
small fixes to the code, but I expect nothing as major as this one.

regards,
Bow

From chapmanb at 50mail.com  Wed Aug  8 09:55:36 2012
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 08 Aug 2012 09:55:36 -0400
Subject: [Biopython-dev] GSoC python variant update
In-Reply-To: <CALfq9tJxsTqkq3E6KN-iirEns7FhC2G7aXEX+4MNm+WTPiXw2Q@mail.gmail.com>
References: <CALfq9tJxsTqkq3E6KN-iirEns7FhC2G7aXEX+4MNm+WTPiXw2Q@mail.gmail.com>
Message-ID: <874nodh4iv.fsf@fastmail.fm>


Lenna;
This all sounds great and will be a nice practical addition to
Biopython. Thanks for taking it on. Some specific thoughts on your questions:

> * I'm representing intron locations relative to CDS coords using the
> HGVS standards: http://www.hgvs.org/mutnomen/refseq_figure.html
> I'd like to know if there are other common ways of representing such
> positions.

I don't know of one myself, so it's great to be following a standard
rather than reinventing something. Nice work.

> * In order to customize the display of positions (e.g. 0-based or
> 1-based), I'm using a class as a configuration container. I've read on
> StackOverflow that attempts to use globals or a singleton class are
> discouraged in Python, but I have not found practical suggestions for
> how to implement module-wide configurations. Suggestions are welcome.

With configuration items like this, you have two choices:

- A global variable.
- Pass the configuration to every function that needs it.

There are tradeoffs with both approaches, but for this case I agree with
your decision to use globals. Most people will want 0-based/Biopython
style but it gives those who don't a knob to switch over.

> * Any advice about circular genomes or strandedness is also welcome.

Circular handling is an unresolved issue in Biopython:

https://redmine.open-bio.org/issues/2578

It's a bit tricky, especially with features that span the origin.

I'd prioritize handling strandedness since you're going to have plenty
of reverse strand coding sequences. You're mapping not only within the
coding region but also back to the original sequence on the reverse
strand. So in your g2c mapping, the original gene goes from
e1 -> s1 -> e0 -> s0 as you read 5' to 3' across the sequence. The best
place to get started is to pick a reverse strand gene and then work
through the mappings, thinking through the orientations. I find drawing
it out to be the easiest way.
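
To make the reverse-strand walk concrete, here is a small sketch of a
genomic-to-CDS mapping. It is not the coordinate mapper under discussion;
the function name genome_to_cds, the 0-based half-open (start, end) exon
convention, and the example exons are assumptions for illustration.

```python
def genome_to_cds(pos, exons, strand):
    """Map a 0-based genomic position to a 0-based CDS offset.

    `exons` are (start, end) pairs, 0-based and half-open, in genomic
    coordinates. Returns None if `pos` is not inside any exon.
    (Hypothetical helper for illustration only.)
    """
    # Visit exons in the order they are transcribed: ascending on the
    # forward strand, descending on the reverse strand.
    ordered = sorted(exons, reverse=(strand == -1))
    offset = 0  # CDS bases consumed by the exons already visited
    for start, end in ordered:
        if start <= pos < end:
            if strand == 1:
                return offset + (pos - start)
            # Reverse strand: the first CDS base of this exon is at end - 1.
            return offset + (end - 1 - pos)
        offset += end - start
    return None


# Two exons: exon 0 at [10, 20) and exon 1 at [30, 40).
exons = [(10, 20), (30, 40)]
forward = [genome_to_cds(p, exons, 1) for p in (10, 19, 30, 39)]
reverse = [genome_to_cds(p, exons, -1) for p in (39, 30, 19, 10)]
intron = genome_to_cds(25, exons, -1)
```

Reading the reverse-strand gene 5' to 3' visits the genome in the order
e1 -> s1 -> e0 -> s0, so both lists above come out in increasing CDS
order, and the intronic position maps to None.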

> * This mapper will work for SeqRecords, SeqFeatures, FeatureLocations,
> etc. Are there other Biopython objects that store sequence coordinates
> and thus should be mappable?

That sounds like a great start. Thanks again for this,
Brad

From p.j.a.cock at googlemail.com  Wed Aug  8 10:33:05 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 8 Aug 2012 15:33:05 +0100
Subject: [Biopython-dev] GSoC python variant update
In-Reply-To: <874nodh4iv.fsf@fastmail.fm>
References: <CALfq9tJxsTqkq3E6KN-iirEns7FhC2G7aXEX+4MNm+WTPiXw2Q@mail.gmail.com>
	<874nodh4iv.fsf@fastmail.fm>
Message-ID: <CAKVJ-_5r9-dBQMqo95huP4kd7fmXEs_ZL8KJwTOCja+DWzmyoQ@mail.gmail.com>

On Wed, Aug 8, 2012 at 2:55 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>Lenna wrote:
>> * Any advice about circular genomes or strandedness is also welcome.
>
> Circular handling is an unresolved issue in Biopython:
>
> https://redmine.open-bio.org/issues/2578
>
> It's a bit tricky, especially with features that span the origin.
>
> I'd prioritize handling strandedness since you're going to have plenty
> of reverse strand coding sequences. You're mapping not only within the
> coding region but also back to the original sequence on the reverse
> strand. So in your g2c mapping, the original gene goes from
> e1 -> s1 -> e0 -> s0 as you read 5' to 3' across the sequence. The best
> place to get started is to pick a reverse strand gene and then work
> through the mappings, thinking through the orientations. I find drawing
> it out to be the easiest way.

And then think about mixed strand genes; trans-spliced tRNA is a good
example, and there is a GenBank example in our unit tests.

Peter

From lgautier at gmail.com  Wed Aug  8 12:37:35 2012
From: lgautier at gmail.com (Laurent Gautier)
Date: Wed, 08 Aug 2012 18:37:35 +0200
Subject: [Biopython-dev] GSoC python variant update
In-Reply-To: <mailman.7.1344441604.19335.biopython-dev@lists.open-bio.org>
References: <mailman.7.1344441604.19335.biopython-dev@lists.open-bio.org>
Message-ID: <502295CF.3020103@gmail.com>

On 2012-08-08 18:00, biopython-dev-request at lists.open-bio.org wrote:
> Lenna;
> This all sounds great and will be a nice practical addition to
> Biopython. Thanks for taking it on. Some specific thoughts on your questions:
>
>> * I'm representing intron locations relative to CDS coords using the
>> HGVS standards: http://www.hgvs.org/mutnomen/refseq_figure.html
>> I'd like to know if there are other common ways of representing such
>> positions.
> I don't know of one myself, so it's great to be following a standard
> rather than reinventing something. Nice work.
>
>> * In order to customize the display of positions (e.g. 0-based or
>> 1-based), I'm using a class as a configuration container. I've read on
>> StackOverflow that attempts to use globals or a singleton class are
>> discouraged in Python, but I have not found practical suggestions for
>> how to implement module-wide configurations. Suggestions are welcome.

Module-wide configuration can be implemented as variables, as long as 
they are declared before the functions using them.
If considering a package rather than a single module, options can be 
stored in a module dedicated to options (since Python modules are 
singletons).
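
The point about modules being singletons can be checked with a short,
self-contained sketch. The module name mapper_config and its index
attribute are hypothetical; the module is built in memory here only so the
example runs without a separate file on disk.

```python
import sys
import types

# Stand-in for a real file such as mapper_config.py (hypothetical name).
_cfg = types.ModuleType("mapper_config")
_cfg.index = 0  # default: 0-based positions
sys.modules["mapper_config"] = _cfg

# Two imports, as two different modules of a package would do.
import mapper_config as first
import mapper_config as second

first.index = 1                  # change the option in one place ...
same_object = first is second    # ... both names refer to one module object
seen_by_second = second.index    # ... so the change is visible everywhere
```

Python caches imported modules in sys.modules, so every importer gets the
same object and therefore the same option values.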

> With configuration items like this, you have two choices:
>
> - A global variable.
> - Pass the configuration to every function that needs it.
>
> There are tradeoffs with both approaches, but for this case I agree with
> your decision to use globals. Most people will want 0-based/Biopython
> style but it gives those who don't a knob to switch over.

I'd argue that allowing a switch is an invitation to spectacular issues 
down the road.
An easy, yet frightening, example would be the case where 
third-party code (such as a module) changes this without you knowing.

Another scary thought is that this would amount to bringing the 
infamous Perl variable "$[" to Python. Go explain again that folks 
should use Python for its elegance and simplicity after that.


Best,

L.

From arklenna at gmail.com  Wed Aug  8 14:44:33 2012
From: arklenna at gmail.com (Lenna Peterson)
Date: Wed, 8 Aug 2012 14:44:33 -0400
Subject: [Biopython-dev] GSoC python variant update
In-Reply-To: <502295CF.3020103@gmail.com>
References: <mailman.7.1344441604.19335.biopython-dev@lists.open-bio.org>
	<502295CF.3020103@gmail.com>
Message-ID: <CALfq9tLt-n+YVL6BCEFFFokQ6RY4v6=i5USkb6+OqdD0QSKg4Q@mail.gmail.com>

On Wed, Aug 8, 2012 at 12:37 PM, Laurent Gautier <lgautier at gmail.com> wrote:
> On 2012-08-08 18:00, biopython-dev-request at lists.open-bio.org wrote:
>
>>
>>> >* In order to customize the display of positions (e.g. 0-based or
>>> >1-based), I'm using a class as a configuration container. I've read on
>>> >StackOverflow that attempts to use globals or a singleton class are
>>> >discouraged in Python, but I have not found practical suggestions for
>>> >how to implement module-wide configurations. Suggestions are welcome.
>
>
> Module-wide configuration can be implemented as variables, as long as they
> are declared before the functions using them.
> If considering a package rather than a single module, options can be stored
> in a module dedicated to options (since Python modules are singletons).
>

Hi Laurent,

I really like the idea of a configuration module. I will definitely
move in that direction.

>
>> With configuration items like this, you have two choices:
>>
>> - A global variable.
>> - Pass the configuration to every function that needs it.
>>
>> There are tradeoffs with both approaches, but for this case I agree with
>> your decision to use globals. Most people will want 0-based/Biopython
>> style but it gives those who don't a knob to switch over.
>
>
> I'd argue that allowing a switch is an invitation to spectacular issues
> down the road.
> An easy, yet frightening, example would be the case where third-party
> code (such as a module) changes this without you knowing.
>
> Another scary thought is that this would amount to bringing the infamous
> Perl variable "$[" to Python. Go explain again that folks should use
> Python for its elegance and simplicity after that.
>
>

Yikes. My approach will not be comparable to $[. For starters, it
wouldn't modify the behavior of every sequence-like object.

My current thought would be to store the 0-based position in an
attribute `pos`, have a property `pos_str` that returns `pos` +
`Config.index`. For representations, `__str__` will return `pos_str`,
and `__repr__` will return `pos` (always 0-based). Math would always
use the 0-based position.
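
A minimal sketch of that idea follows. The class names Config and
Position are assumptions, and __repr__ here wraps the 0-based value in a
constructor-style string rather than returning the bare number.

```python
class Config:
    """Hypothetical configuration container; `index` is the display offset."""
    index = 0  # 0 for 0-based output, 1 for 1-based output


class Position:
    def __init__(self, pos):
        self.pos = pos  # always stored 0-based

    @property
    def pos_str(self):
        # Only the displayed value is shifted by the configured offset.
        return str(self.pos + Config.index)

    def __str__(self):
        return self.pos_str

    def __repr__(self):
        # Always 0-based, regardless of Config.index.
        return "Position(%d)" % self.pos

    def __add__(self, other):
        # Arithmetic uses the stored 0-based value only.
        return Position(self.pos + other)


p = Position(9)
zero_based = str(p)   # display with the default offset
Config.index = 1
one_based = str(p)    # same object, 1-based display
moved = p + 5         # math is unaffected by the display setting
```

Because the offset is applied only in pos_str, switching Config.index
changes what is printed but never what is computed.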

I intend to keep the influence of the hypothetical mapping Config
module limited to Biopython Seq* objects. It should also be possible
to make a kill switch, namely, a version of the Config module where
all of the settings are neutral to adding (i.e. `def __add__(self,
other): return other`).

Please let me know if this would not fully address your concerns.

Cheers,

Lenna

From lgautier at gmail.com  Wed Aug  8 17:58:26 2012
From: lgautier at gmail.com (Laurent Gautier)
Date: Wed, 08 Aug 2012 23:58:26 +0200
Subject: [Biopython-dev] GSoC python variant update
In-Reply-To: <CALfq9tLt-n+YVL6BCEFFFokQ6RY4v6=i5USkb6+OqdD0QSKg4Q@mail.gmail.com>
References: <mailman.7.1344441604.19335.biopython-dev@lists.open-bio.org>
	<502295CF.3020103@gmail.com>
	<CALfq9tLt-n+YVL6BCEFFFokQ6RY4v6=i5USkb6+OqdD0QSKg4Q@mail.gmail.com>
Message-ID: <5022E102.9010509@gmail.com>

On 2012-08-08 20:44, Lenna Peterson wrote:
> On Wed, Aug 8, 2012 at 12:37 PM, Laurent Gautier <lgautier at gmail.com> wrote:
>> On 2012-08-08 18:00, biopython-dev-request at lists.open-bio.org wrote:
>>
>>>>> * In order to customize the display of positions (e.g. 0-based or
>>>>> 1-based), I'm using a class as a configuration container. I've read on
>>>>> StackOverflow that attempts to use globals or a singleton class are
>>>>> discouraged in Python, but I have not found practical suggestions for
>>>>> how to implement module-wide configurations. Suggestions are welcome.
>>
>> Module-wide configuration can be implemented as variables, as long as they
>> are declared before the functions using them.
>> If considering a package rather than a single module, options can be stored
>> in a module dedicated to options (since Python modules are singletons).
>>
> Hi Laurent,
>
> I really like the idea of a configuration module. I will definitely
> move in that direction.
>
>>> With configuration items like this, you have two choices:
>>>
>>> - A global variable.
>>> - Pass the configuration to every function that needs it.
>>>
>>> There are tradeoffs with both approaches, but for this case I agree with
>>> your decision to use globals. Most people will want 0-based/Biopython
>>> style but it gives those who don't a knob to switch over.
>>
>> I'd argue that allowing a switch is an invitation to spectacular issues
>> down the road.
>> An easy, yet frightening, example would be the case where third-party
>> code (such as a module) changes this without you knowing.
>>
>> Another scary thought is that this would amount to bringing the infamous
>> Perl variable "$[" to Python. Go explain again that folks should use
>> Python for its elegance and simplicity after that.
>>
>>
> Yikes. My approach will not be comparable to $[. For starters, it
> wouldn't modify the behavior of every sequence-like object.
>
> My current thought would be to store the 0-based position in an
> attribute `pos`, have a property `pos_str` that returns `pos` +
> `Config.index`. For representations, `__str__` will return `pos_str`,
> and `__repr__` will return `pos` (always 0-based). Math would always
> use the 0-based position.
>
> I intend to keep the influence of the hypothetical mapping Config
> module limited to Biopython Seq* objects. It should also be possible
> to make a kill switch, namely, a version of the Config module where
> all of the settings are neutral to adding (i.e. `def __add__(self,
> other): return other`).

What about making the design decision that string representations 
are 1-based, then, and going beyond a kill switch by just killing the 
switch? You'd document it, and folks who want 0-based positions would 
cook their own function(s).

I think that configuration modules can be very useful for an application 
(an example here:
http://flask.pocoo.org/snippets/2/ ), but I am more reserved about its 
use in a library.

But do not let me stop you from pursuing this; I am only expressing an 
opinion. One last point though.
Let me describe a possible scenario:

3rd-party module "foo" is using the Biopython Seq* part, and its author 
thinks that Config.index should be 1, so he/she sets it accordingly.
An early line in foo.py is:
from somewhere.in.biopython.seq import config
config.index = 1

There is another piece of code (let's call it bar.py), written by 
someone else or by the same person at a different time. Now the hype is 
all about 0-based indexes, so the author sets it to be sure:
from somewhere.in.biopython.seq import config
config.index = 0

To complete the scenario, bar.py is using foo.py, or the other way 
around. The dependency of one on the other does not even have to be 
direct. Now config.index will be whatever the last piece of code set it 
to, although other parts of the code might assume it is set to something 
else.

That sort of situation is not prevented from happening with any sort of 
module in Python (e.g., import sys; sys.stdout = sys.stderr), but people 
know they should not do it. Here the config.index would appear as 
something people should change if they like.
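
The foo/bar scenario above can be reproduced in a few lines. Everything
here is a stand-in: a SimpleNamespace plays the role of the shared config
module, and two functions play the role of importing foo.py and bar.py.

```python
from types import SimpleNamespace

# Stand-in for the shared, library-wide setting (hypothetical).
config = SimpleNamespace(index=0)


def show(pos):
    """Library function whose output depends on the global setting."""
    return str(pos + config.index)


def import_foo():
    config.index = 1  # foo's author wants 1-based output


def import_bar():
    config.index = 0  # bar's author wants 0-based output


import_foo()
foo_expects = show(41)  # what foo's code sees right after its own setup
import_bar()            # a later import silently flips the setting
foo_gets = show(41)     # what foo's code sees from then on
```

Neither author did anything unusual, yet foo's output changes depending
on import order. Passing the offset explicitly to the functions that need
it, the second option listed earlier in the thread, avoids this at the
cost of a longer call signature.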

Again, that's just an opinion. Others might differ.

Best,


Laurent

>
> Please let me know if this would not fully address your concerns.
>
> Cheers,
>
> Lenna


From arklenna at gmail.com  Wed Aug  8 18:39:48 2012
From: arklenna at gmail.com (Lenna Peterson)
Date: Wed, 8 Aug 2012 18:39:48 -0400
Subject: [Biopython-dev] GSoC python variant update
In-Reply-To: <5022E102.9010509@gmail.com>
References: <mailman.7.1344441604.19335.biopython-dev@lists.open-bio.org>
	<502295CF.3020103@gmail.com>
	<CALfq9tLt-n+YVL6BCEFFFokQ6RY4v6=i5USkb6+OqdD0QSKg4Q@mail.gmail.com>
	<5022E102.9010509@gmail.com>
Message-ID: <CALfq9tJMTUc_FJ9tTv=AR2WN+GuVoiGg3tbhN2fZKzr0DfZYXg@mail.gmail.com>

On Wed, Aug 8, 2012 at 5:58 PM, Laurent Gautier <lgautier at gmail.com> wrote:
> On 2012-08-08 20:44, Lenna Peterson wrote:
>>
>> On Wed, Aug 8, 2012 at 12:37 PM, Laurent Gautier <lgautier at gmail.com>
>> wrote:
>>>
>>> On 2012-08-08 18:00, biopython-dev-request at lists.open-bio.org wrote:
>>>
>>>>>> * In order to customize the display of positions (e.g. 0-based or
>>>>>> 1-based), I'm using a class as a configuration container. I've read on
>>>>>> StackOverflow that attempts to use globals or a singleton class are
>>>>>> discouraged in Python, but I have not found practical suggestions for
>>>>>> how to implement module-wide configurations. Suggestions are welcome.
>>>
>>>
>>> Module-wide configuration can be implemented as variables, as long as
>>> they are declared before the functions using them.
>>> If considering a package rather than a single module, options can be
>>> stored in a module dedicated to options (since Python modules are
>>> singletons).
>>>
>> Hi Laurent,
>>
>> I really like the idea of a configuration module. I will definitely
>> move in that direction.
>>
>>>> With configuration items like this, you have two choices:
>>>>
>>>> - A global variable.
>>>> - Pass the configuration to every function that needs it.
>>>>
>>>> There are tradeoffs with both approaches, but for this case I agree with
>>>> your decision to use globals. Most people will want 0-based/Biopython
>>>> style but it gives those who don't a knob to switch over.
>>>
>>>
>>> I'd argue that allowing a switch is an invitation to spectacular issues
>>> down the road.
>>> An easy, yet frightening, example would be the case where using
>>> third-party code (such as a module) changes this without you knowing.
>>>
>>> Another scary thought is that this would amount to bringing the infamous
>>> Perl variable "$[" to Python. Go explain again that folks should use
>>> Python for its elegance and simplicity after that.
>>>
>>>
>> Yikes. My approach will not be comparable to $[. For starters, it
>> wouldn't modify the behavior of every sequence-like object.
>>
>> My current thought would be to store the 0-based position in an
>> attribute `pos`, have a property `pos_str` that returns `pos` +
>> `Config.index`. For representations, `__str__` will return `pos_str`,
>> and `__repr__` will return `pos` (always 0-based). Math would always
>> use the 0-based position.
>>
>> I intend to keep the influence of the hypothetical mapping Config
>> module limited to Biopython Seq* objects. It should also be possible
>> to make a kill switch, namely, a version of the Config module where
>> all of the settings are neutral to adding (i.e. `def __add__(self,
>> other): return other`).
>
>
> What about making the design decision that string representations are
> 1-based then, and going beyond making a kill switch by just killing the
> switch? You'd document it, and folks who want 0-based positions would cook
> their own function(s).
>
> I think that configuration modules can be very useful for an application (an
> example here:
> http://flask.pocoo.org/snippets/2/ ), but I am more reserved about its use
> in a library.
>
> But do not let me stop you from pursuing this; I am only expressing an
> opinion. One last point though.
> Let me describe a possible scenario:
>
> 3rd-party module "foo" is using the Biopython Seq* part, and its author
> thinks that Config.index should be 1, so he/she sets it accordingly.
> An early line in foo.py is:
> from somewhere.in.biopython.seq import config
> config.index = 1
>
> There is another piece of code (let's call it bar.py), written by someone
> else or by the same person at a different time. Now the hype is all about
> 0-based indexes, so the author sets it to be sure:
> from somewhere.in.biopython.seq import config
> config.index = 0
>
> To complete the scenario, bar.py is using foo.py, or the other way around.
> The dependency of one on the other does not even have to be direct. Now
> config.index will be whatever the last piece of code set it to, although
> other parts of the code might assume it is set to something else.
>
> That sort of situation is not prevented from happening with any sort of
> module in Python (e.g., import sys; sys.stdout = sys.stderr), but people
> know they should not do it. Here the config.index would appear as something
> people should change if they like.
>
> Again, that's just an opinion. Others might differ.
>
> Best,
>
>
> Laurent
>
>
>>
>> Please let me know if this would not fully address your concerns.
>>
>> Cheers,
>>
>> Lenna
>
>


Laurent,

I must thank you again for your foresight. I am realizing I may have
gotten carried away with configurability. My initial goal with the
index setting was to enable both GenBank and HGVS representations of
genomic positions; a much simpler and safer approach would be to have
`to_genbank()` and `to_hgvs()` methods. A user could set the relevant
objects' __str__ to either of those.
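
A rough sketch of that idea, not the actual mapper code: GenomicPosition
is a hypothetical class, and the output formats are deliberately
simplified (single positions only). The one firm point it relies on is
that the stored value is always 0-based and only the two methods add 1.

```python
class GenomicPosition(object):
    """Hypothetical position class that stores a 0-based coordinate."""

    def __init__(self, pos):
        self.pos = pos  # always 0-based internally

    def __repr__(self):
        return "GenomicPosition(%d)" % self.pos

    def __str__(self):
        return str(self.pos)  # default string form stays 0-based

    def to_genbank(self):
        # GenBank-style positions count from 1 (simplified: one base only)
        return str(self.pos + 1)

    def to_hgvs(self):
        # HGVS genomic positions also count from 1, with a "g." prefix
        return "g.%d" % (self.pos + 1)


p = GenomicPosition(9)
default_text = str(p)

# Opt in to HGVS output by rebinding __str__ on the class. Assigning to
# p.__str__ on a single instance would not change what str(p) returns,
# because Python looks special methods up on the type.
GenomicPosition.__str__ = GenomicPosition.to_hgvs
hgvs_text = str(p)
```

No module-level setting is involved, so one library cannot change the
arithmetic or the repr seen by another.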

Cheers,

Lenna

From p.j.a.cock at googlemail.com  Thu Aug  9 05:07:15 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 9 Aug 2012 10:07:15 +0100
Subject: [Biopython-dev] GSoC python variant update
In-Reply-To: <5022E102.9010509@gmail.com>
References: <mailman.7.1344441604.19335.biopython-dev@lists.open-bio.org>
	<502295CF.3020103@gmail.com>
	<CALfq9tLt-n+YVL6BCEFFFokQ6RY4v6=i5USkb6+OqdD0QSKg4Q@mail.gmail.com>
	<5022E102.9010509@gmail.com>
Message-ID: <CAKVJ-_5sZX3YjORj_4q9wAnyAQ0us-kzZGA3gvstf=cYBdZeTw@mail.gmail.com>

On Wed, Aug 8, 2012 at 10:58 PM, Laurent Gautier <lgautier at gmail.com> wrote:
>
> What about making the design decision that string representations are
> 1-based then, and going beyond making a kill switch by just killing the
> switch? You'd document it, and folks who want 0-based positions would cook
> their own function(s).
>
> I think that configuration modules can be very useful for an application ...

I agree that a module level config setting is unwise.

However, I'd much prefer the string representation was 0-based for
consistency, both internal to the module and with most of Biopython.
(The restriction module uses 1-based counting which I find very
annoying.)

You could still provide something like a format method to give a
string in common representations (e.g. GenBank/EMBL/INSDC
style location strings).

Peter

From mjldehoon at yahoo.com  Thu Aug  9 07:07:20 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 9 Aug 2012 04:07:20 -0700 (PDT)
Subject: [Biopython-dev] Bio.Motif._Motif has_instances, has_counts
Message-ID: <1344510440.89823.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Hi guys,

In the Motif class in Bio.Motif._Motif, there is an attribute self.has_instances to identify whether the attribute self.instances is defined. I think that we can remove the self.has_instances attribute from the code and simply set self.instances=None when it is undefined. Same thing for self.counts and self.has_counts.
Any objections?

Best,
-Michiel.

From bartek at rezolwenta.eu.org  Thu Aug  9 08:26:33 2012
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Thu, 9 Aug 2012 14:26:33 +0200
Subject: [Biopython-dev] Bio.Motif._Motif has_instances, has_counts
In-Reply-To: <1344510440.89823.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <1344510440.89823.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <CABHxouW164vGk-_Jw-F+vrKNjuKRR+M+6nWDovOmj=r2-JojFg@mail.gmail.com>

On Thu, Aug 9, 2012 at 1:07 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi guys,
>
> In the Motif class in Bio.Motif._Motif, there is an attribute self.has_instances to identify whether the attribute self.instances is defined. I think that we can remove the self.has_instances attribute from the code and simply set self.instances=None when it is undefined. Same thing for self.counts and self.has_counts.
> Any objections?

Makes sense to me. +1

-- 
Bartek Wilczynski


From mjldehoon at yahoo.com  Thu Aug  9 12:00:14 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 9 Aug 2012 09:00:14 -0700 (PDT)
Subject: [Biopython-dev] Bio.Motif._Motif has_instances, has_counts
In-Reply-To: <CABHxouW164vGk-_Jw-F+vrKNjuKRR+M+6nWDovOmj=r2-JojFg@mail.gmail.com>
Message-ID: <1344528014.32936.YahooMailClassic@web164006.mail.gq1.yahoo.com>

OK, done. Thanks!
-Michiel.

--- On Thu, 8/9/12, Bartek Wilczynski <bartek at rezolwenta.eu.org> wrote:

> From: Bartek Wilczynski <bartek at rezolwenta.eu.org>
> Subject: Re: [Biopython-dev] Bio.Motif._Motif has_instances, has_counts
> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> Cc: biopython-dev at biopython.org
> Date: Thursday, August 9, 2012, 8:26 AM
> On Thu, Aug 9, 2012 at 1:07 PM,
> Michiel de Hoon <mjldehoon at yahoo.com>
> wrote:
> > Hi guys,
> >
> > In the Motif class in Bio.Motif._Motif, there is an
> attribute self.has_instances to identify whether the
> attribute self.instances is defined. I think that we can
> remove the self.has_instances attribute from the code and
> simply set self.instances=None when it is undefined. Same
> thing for self.counts and self.has_counts.
> > Any objections?
> 
> Makes sense to me. +1
> 
> -- 
> Bartek Wilczynski
> 

From tiagoantao at gmail.com  Thu Aug  9 23:04:53 2012
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Thu, 9 Aug 2012 20:04:53 -0700
Subject: [Biopython-dev] Away Re: buildbot failure in Biopython on Linux 64
	- Python 2.7
Message-ID: <CAA9RGENG=-Qbi549wkWxdU7FLpsjTsoVtDnDSsAVceYfYZJiRQ@mail.gmail.com>

I am currently away from office. I will respond back on the 20th of August.

Regards,
Tiago

-- 
"Liberty for wolves is death to the lambs" - Isaiah Berlin

From p.j.a.cock at googlemail.com  Fri Aug 10 04:33:43 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 10 Aug 2012 09:33:43 +0100
Subject: [Biopython-dev] Bio.Motif._Motif has_instances, has_counts
In-Reply-To: <1344528014.32936.YahooMailClassic@web164006.mail.gq1.yahoo.com>
References: <CABHxouW164vGk-_Jw-F+vrKNjuKRR+M+6nWDovOmj=r2-JojFg@mail.gmail.com>
	<1344528014.32936.YahooMailClassic@web164006.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_5fQye_iHLzuyFZrMWiS2xKeZqrGAWwmBoUFdoO7gZyOA@mail.gmail.com>

On Thu, Aug 9, 2012 at 5:00 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> OK, done. Thanks!
> -Michiel.

You'll also need to update the example in the Tutorial, quote:

    The arnt and srf motifs can both do the same things for us,
    but they use different internal representations of the motif.
    We can tell that by inspecting the \verb|has_counts| and
    has_instances properties:

    >>> arnt.has_instances
    True
    >>> srf.has_instances
    False
    >>> srf.has_counts
    True

This means test_Tutorial.py is failing (across all platforms).
Presumably we would suggest switching these to something like:

    >>> arnt.instances is None
    False

etc? In fact given the old methods were documented like
this, I would be happier if we could phase them out with
a deprecation warning via a read-only property method,

    @property
    def has_instances(self):
        """Does this motif have instances (DEPRECATED)."""
        import warnings
        from Bio import BiopythonDeprecationWarning
        warnings.warn("Check if motif.instances is None or not instead",
            BiopythonDeprecationWarning)
        return self.instances is not None

(untested, but something like that)
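
A self-contained variant of the same idea that runs without Biopython
installed. This is only a sketch: the Motif class here is a minimal
stand-in, and the built-in DeprecationWarning is swapped in for
BiopythonDeprecationWarning.

```python
import warnings


class Motif(object):
    """Hypothetical minimal stand-in for the real Motif class."""

    def __init__(self, instances=None):
        self.instances = instances  # None means "no instances defined"

    @property
    def has_instances(self):
        """Does this motif have instances (DEPRECATED)."""
        warnings.warn(
            "Check if motif.instances is None or not instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        return self.instances is not None


# Record the warnings so they can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    empty = Motif().has_instances
    filled = Motif(instances=["ACGT"]).has_instances
```

Because the property has no setter, old code that reads has_instances
keeps working with a warning, while code that tries to assign to it
fails with an AttributeError.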

Peter

From p.j.a.cock at googlemail.com  Fri Aug 10 16:04:54 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 10 Aug 2012 21:04:54 +0100
Subject: [Biopython-dev] Bio.Motif._Motif has_instances, has_counts
In-Reply-To: <CAKVJ-_5fQye_iHLzuyFZrMWiS2xKeZqrGAWwmBoUFdoO7gZyOA@mail.gmail.com>
References: <CABHxouW164vGk-_Jw-F+vrKNjuKRR+M+6nWDovOmj=r2-JojFg@mail.gmail.com>
	<1344528014.32936.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_5fQye_iHLzuyFZrMWiS2xKeZqrGAWwmBoUFdoO7gZyOA@mail.gmail.com>
Message-ID: <CAKVJ-_6H5+t4OCwFS=h-+Cd96rV8btO7zqSjrYdST0CjfwmcMQ@mail.gmail.com>

On Fri, Aug 10, 2012 at 9:33 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Aug 9, 2012 at 5:00 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>> OK, done. Thanks!
>> -Michiel.
>
> You'll also need to update the example in the Tutorial, quote:
>
>     The arnt and srf motifs can both do the same things for us,
>     but they use different internal representations of the motif.
>     We can tell that by inspecting the \verb|has_counts| and
>     has_instances properties:
>
>     >>> arnt.has_instances
>     True
>     >>> srf.has_instances
>     False
>     >>> srf.has_counts
>     True
>
> This means test_Tutorial.py is failing (across all platforms).
> Presumably we would suggest switching these to something like:
>
>     >>> arnt.instances is None
>     False

Fixed:
https://github.com/biopython/biopython/commit/b866e74dc9b6162517588ea4c0e4d1ecde5ed87c

> etc? In fact given the old methods were documented like
> this, I would be happier if we could phase them out with
> a deprecation warning via a read-only property method,
>
>     @property
>     def has_instances(self):
>         """Does this motif have instances (DEPRECATED)."""
>         import warnings
>         from Bio import BiopythonDeprecationWarning
>         warnings.warn("Check if motif.instances is None or not instead",
>             BiopythonDeprecationWarning)
>         return self.instances is not None
>
> (untested, but something like that)

Done:
https://github.com/biopython/biopython/commit/fd2223d118227c921524e070c803b97bc979a70f

Although since that won't work on old Biopython either (you'd
get an AttributeError), perhaps we should label these new
backwards compatible properties as obsolete with a pending
deprecation warning for the next release (delay the deprecation)?

Peter

From mjldehoon at yahoo.com  Fri Aug 10 23:48:29 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 10 Aug 2012 20:48:29 -0700 (PDT)
Subject: [Biopython-dev] Bio.Motif._Motif has_instances, has_counts
In-Reply-To: <CAKVJ-_6H5+t4OCwFS=h-+Cd96rV8btO7zqSjrYdST0CjfwmcMQ@mail.gmail.com>
Message-ID: <1344656909.14019.YahooMailClassic@web164002.mail.gq1.yahoo.com>

Hi Peter,

--- On Fri, 8/10/12, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > This means test_Tutorial.py is failing (across all platforms).
> > Presumably we would suggest switching these to something like:
> >
> >     >>> arnt.instances is None
> >     False
> 
> Fixed:
> https://github.com/biopython/biopython/commit/b866e74dc9b6162517588ea4c0e4d1ecde5ed87c

Thanks for fixing this! Sorry I missed this when I was making these changes.

> Although since that won't work on old Biopython either
> (you'd
> get an AttributeError), perhaps we should label these new
> backwards compatible properties as obsolete with a pending
> deprecation warning for the next release (delay the
> deprecation)?
> 

I think we are being way too careful. Requiring proper deprecation warnings each time we make a change in Biopython will slow down its development and improvement. In the past when making changes to the existing code, we have gotten very few complaints; also in this case I doubt that anybody will miss has_counts, has_instances.

Best,
-Michiel.


From mjldehoon at yahoo.com  Sat Aug 11 00:25:05 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 10 Aug 2012 21:25:05 -0700 (PDT)
Subject: [Biopython-dev] Bio.Motif AlignAce parser
Message-ID: <1344659105.5874.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Hi guys,

Looking some more at the parsers in Bio.Motif.

In the Record class in Bio/Motif/Parsers/AlignAce.py, we have an attribute self.current_motif that points to the motif currently being parsed by the parser (or, after the parser finishes, the last motif that was parsed). As far as I can tell, using a temporary variable current_motif within the read() function would be sufficient; we don't need to store it in the record.

I would also suggest for the read() function to strip() all lines. Currently the end-of-line markers are kept. For example the version and the command line are stored as "AlignACE 4.0 05/13/04\n" and "./AlignACE -i test.fa \n" respectively.

The version of the AlignACE program is stored in record.ver. The MEME and Mast parsers in Bio.Motif instead use record.version. For consistency I would suggest to use record.version also in the AlignACE parser.

The command line is stored in record.cmd_line. The MEME parser uses record.command instead. I think both are fine, but I would also prefer this to be consistent.
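
A toy sketch of the strip() and renaming suggestions, not the real
parser: it assumes, purely for illustration, that the version is on the
first line and the command line on the second, and it uses the proposed
attribute names version and command.

```python
import io


class Record(object):
    """Hypothetical record using the proposed attribute names."""

    def __init__(self):
        self.version = None
        self.command = None


def read(handle):
    """Read the two header lines, dropping end-of-line markers."""
    record = Record()
    # strip() removes the trailing newline and any trailing spaces
    record.version = handle.readline().strip()
    record.command = handle.readline().strip()
    return record


# The two example strings quoted above, joined into one fake file.
handle = io.StringIO(u"AlignACE 4.0 05/13/04\n./AlignACE -i test.fa \n")
record = read(handle)
```

Note that strip() also drops the trailing space in the command line, so
code comparing against the old raw values would need updating.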

Then there are two attributes param_dict and seq_dict. The former is a dictionary that stores the parameters used in the run. The latter is not a dictionary but a list of sequence-related information. Since usually we don't put the type of the object in the attribute names, I would suggest to call these simply parameters and sequences. For comparison, the Mast parser uses record.sequences for an analogous attribute; MEME uses record.sequence_names. For consistency I would suggest to use record.sequences for all three.

This would create some backward-incompatible changes that may confuse users. Now currently the parsers are located in Bio.Motif.Parsers.AlignAce, Bio.Motif.Parsers.MEME, and Bio.Motif.Parsers.Mast. I would prefer Bio.Motif.AlignAce, Bio.Motif.MEME, Bio.Motif.Mast. Currently to parse the AlignAce output one would do
>>> from Bio.Motif.Parsers import AlignAce
>>> record = AlignAce.read(handle)
>>> record
<Bio.Motif.Parsers.AlignAce.Record object at 0x10058c7d0>
If we move the parsers one level up, this would be
>>> from Bio.Motif import AlignAce
>>> record = AlignAce.read(handle)
>>> record
<Bio.Motif.AlignAce.Record object at 0x10058c7d0>
which looks a bit more straightforward to me. In addition, this allows us to put a deprecation warning on the Bio.Motif.Parsers.AlignAce, Bio.Motif.Parsers.MEME, and Bio.Motif.Parsers.Mast modules as a whole, and we won't have to put deprecation warnings on each change separately.

Any comments, objections?

Best,
-Michiel.

From p.j.a.cock at googlemail.com  Sat Aug 11 06:50:07 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 11 Aug 2012 11:50:07 +0100
Subject: [Biopython-dev] Bio.Motif._Motif has_instances, has_counts
In-Reply-To: <1344656909.14019.YahooMailClassic@web164002.mail.gq1.yahoo.com>
References: <CAKVJ-_6H5+t4OCwFS=h-+Cd96rV8btO7zqSjrYdST0CjfwmcMQ@mail.gmail.com>
	<1344656909.14019.YahooMailClassic@web164002.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_70nRt1oxF4M8HAaxrARgW7BSenZQZntOAgicXBCmZbxg@mail.gmail.com>

On Saturday, August 11, 2012, Michiel de Hoon wrote:

> Hi Peter,
>
> > Although since that won't work on old Biopython either
> > (you'd
> > get an AttributeError), perhaps we should label these new
> > backwards compatible properties as obsolete with a pending
> > deprecation warning for the next release (delay the
> > deprecation)?
> >
>
> I think we are being way too careful. Requiring proper deprecation
> warnings each time we make a change in Biopython will slow down its
> development and improvement. In the past when making changes to the
> existing code, we have gotten very few complaints; also in this case I
> doubt that anybody will miss has_counts, has_instances.
>
> Best,
> -Michiel.
>

In this case you're probably right about it not causing too much
inconvenience - this is a relatively new module after all.

Peter

From arklenna at gmail.com  Mon Aug 13 01:00:41 2012
From: arklenna at gmail.com (Lenna Peterson)
Date: Mon, 13 Aug 2012 01:00:41 -0400
Subject: [Biopython-dev] GSoC python variant update 10
Message-ID: <CALfq9tLZzvgL8pK=2sA_KRCJMwANMqbXjDxZxDp=FU5026GCqA@mail.gmail.com>

Link: http://arklenna.tumblr.com/post/29317968106/

Post:

Following extensive
[discussion](http://biopython.org/pipermail/biopython-dev/2012-August/009849.html)
on the dev list of the pros and cons of configuration classes/modules,
I have refactored my [coordinate
mapper](https://gist.github.com/3172753) to keep configuration as
isolated as possible.

All mapping functions use base 0 internally. Transformation to and
from 1-based coords is handled by custom MapPosition objects (they
are currently separate from the Seq* positions but could probably
subclass ExactPosition). The MapPosition objects have to_dialect and
from_dialect methods that automatically handle conversion between
bases and other formatting details.

There are two different ways a user can convert a coordinate from HGVS:

    # ... assuming cm is an instance of CoordinateMapper
    # Manually construct position from HGVS
    CDS_coord = CDSPosition.from_hgvs("6+1")
    genomic_coord = cm.c2g(CDS_coord)
    print genomic_coord.to_hgvs()

    # Pass dialect argument to mapping function
    genomic_coord = cm.c2g("6+1", dialect="HGVS")
    print genomic_coord.to_hgvs()

Furthermore, the inheritance hierarchy is designed to allow a user to
set a default string representation:

    # Set MapPositions to print as HGVS by default
    def use_hgvs(self):
        return str(self.to_hgvs())
    MapPosition.__str__ = use_hgvs

The [version](https://gist.github.com/3172753/577b7c383e057b78cdcee64be33f18117a46faaf)
as of this writing is passing tests using base 0. I have not yet
implemented tests for `from_hgvs` or `to_hgvs`, but that's next on my
list. I'm hoping to have time for strand and mixed strand, too.


Cheers,

Lenna

From bartek at rezolwenta.eu.org  Mon Aug 13 09:12:35 2012
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Mon, 13 Aug 2012 15:12:35 +0200
Subject: [Biopython-dev] Bio.Motif AlignAce parser
In-Reply-To: <1344659105.5874.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <1344659105.5874.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <CABHxouWmVRddd-Yz09m-B52oKi+jxKk+XJCU=vfDU_xOGtCRtw@mail.gmail.com>

Sounds great to me.

Bartek

On Sat, Aug 11, 2012 at 6:25 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi guys,
>
> Looking some more at the parsers in Bio.Motif.
>
> In the Record class in Bio/Motif/Parsers/AlignAce.py, we have an attribute self.current_motif that points to the motif currently being parsed by the parser (or, after the parser finishes, the last motif that was parsed). As far as I can tell, using a temporary variable current_motif within the read() function would be sufficient; we don't need to store it in the record.
>
> I would also suggest for the read() function to strip() all lines. Currently the end-of-line markers are kept. For example the version and the command line are stored as "AlignACE 4.0 05/13/04\n" and "./AlignACE -i test.fa \n" respectively.
>
> The version of the AlignACE program is stored in record.ver. The MEME and Mast parsers in Bio.Motif instead use record.version. For consistency I would suggest to use record.version also in the AlignACE parser.
>
> The command line is stored in record.cmd_line. The MEME parser uses record.command instead. I think both are fine, but I would also prefer this to be consistent.
>
> Then there are two attributes param_dict and seq_dict. The former is a dictionary that stores the parameters used in the run. The latter is not a dictionary but a list of sequence-related information. Since usually we don't put the type of the object in the attribute names, I would suggest to call these simply parameters and sequences. For comparison, the Mast parser uses record.sequences for an analogous attribute; MEME uses record.sequence_names. For consistency I would suggest to use record.sequences for all three.
>
> This would create some backward-incompatible changes that may confuse users. Now currently the parsers are located in Bio.Motif.Parsers.AlignAce, Bio.Motif.Parsers.MEME, and Bio.Motif.Parsers.Mast. I would prefer Bio.Motif.AlignAce, Bio.Motif.MEME, Bio.Motif.Mast. Currently to parse the AlignAce output one would do
> >>> from Bio.Motif.Parsers import AlignAce
> >>> record = AlignAce.read(handle)
> >>> record
> <Bio.Motif.Parsers.AlignAce.Record object at 0x10058c7d0>
> If we move the parsers one level up, this would be
> >>> from Bio.Motif import AlignAce
> >>> record = AlignAce.read(handle)
> >>> record
> <Bio.Motif.AlignAce.Record object at 0x10058c7d0>
> which looks a bit more straightforward to me. In addition, this allows us to put a deprecation warning on the Bio.Motif.Parsers.AlignAce, Bio.Motif.Parsers.MEME, and Bio.Motif.Parsers.Mast modules as a whole, and we won't have to put deprecation warnings on each change separately.
>
> Any comments, objections?
>
> Best,
> -Michiel.


-- 
Bartek Wilczynski


From arnaud.poret at gmail.com  Mon Aug 13 10:07:39 2012
From: arnaud.poret at gmail.com (Arnaud Poret)
Date: Mon, 13 Aug 2012 16:07:39 +0200
Subject: [Biopython-dev] obo parser
Message-ID: <CACYTbOT=NPqjpTnz4mAa-r1FaQH4X1gBVvr8HDvR-CYYE5iKRA@mail.gmail.com>

Hi everyone,

I'm a newcomer and I'm writing an OBO parser for importing ontologies
into Python. I'm not sure: does Biopython already have an OBO parser?

If yes, I'm reinventing the wheel...

If no, I'll be glad to contribute.

From tiagoantao at gmail.com  Mon Aug 13 23:23:01 2012
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Mon, 13 Aug 2012 20:23:01 -0700
Subject: [Biopython-dev] Away Re: buildbot failure in Biopython on Windows
	XP - Python 2.5
Message-ID: <CAA9RGEP7OuWO7cEAiya46d6nJFj2ukVPwKednQoSQbUhHKeOPg@mail.gmail.com>

I am currently away from office. I will respond back on the 20th of August.

Regards,
Tiago

-- 
"Liberty for wolves is death to the lambs" - Isaiah Berlin

From p.j.a.cock at googlemail.com  Tue Aug 14 07:06:32 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Aug 2012 12:06:32 +0100
Subject: [Biopython-dev] Genome Diagram Sigils, was: Default Behavior
Message-ID: <CAKVJ-_7XxVStvWrte03kQ00HwVD771EdzV8PN3dSTtSM2p0fsA@mail.gmail.com>

On Thu, Aug 2, 2012 at 5:12 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Aug 2, 2012 at 8:42 AM, Leighton Pritchard wrote:
>>Peter wrote:
>>>
>>> To match the current sigil argument names BOX and ARROW, I have
>>> provisionally called it BIGARROW. Any better ideas?
>>>
>>
>> BIGARROW sounds fine to me. I like literal names.
>>
>
> Great. Checked into the master, and I updated the Tutorial and
> the Proux et al 2002 Figure 6 reproduction example to use this:
>
> Before (cross-links with strand specific ARROW sigil):
> http://biopython.org/DIST/docs/tutorial/images/three_track_cl2.png
>
> After (cross-links with strand straddling BIGARROW sigil):
> http://biopython.org/DIST/docs/tutorial/images/three_track_cl2a.png
>
> Original (I don't know what was used to draw this):
> http://dx.doi.org/10.1128/JB.184.21.6026-6036.2002
>
> Regards,
>
> Peter

Further to that work, I updated some older code for a JAGGY
sigil, and also an OCTO sigil (names open to suggestions),
which are on my gd-sigils branch which has documentation
in the tutorial, including this image of the expanded sigil set:
https://github.com/peterjc/biopython/blob/e09e264dd73953554609498c15b67d86686592fb/Doc/images/GD_sigils.png

This is a slight simplification of the old JAGGY code in that it
does not (yet) allow control of the teeth length (e.g. to have just
teeth on one end). I am thinking this could be exposed like
the existing arrow-specific options.

I originally created the JAGGY sigil for marking a break point
in a contig/scaffold. For instance, you might want to mark a
run of NNNNN bases in a scaffold with a jaggy sigil (straddling
both strands) as a clear visual marker to explain why there
were no genes.

Other sigil ideas I pondered include an OVAL, which should
be quite easy for the linear diagrams, but rather more work to
implement for circular diagrams due to the distorted curves.

Peter

From p.j.a.cock at googlemail.com  Tue Aug 14 15:49:23 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Aug 2012 20:49:23 +0100
Subject: [Biopython-dev] GSoC SearchIO project
In-Reply-To: <87lim4h07o.fsf@fastmail.fm>
References: <1334014051.14489.YahooMailClassic@web161204.mail.bf1.yahoo.com>
	<87lim4h07o.fsf@fastmail.fm>
Message-ID: <CAKVJ-_7b=RGpDGX3v0x5PJFWgW5dB3Otfg8Sq2Gehhg4SU2bUg@mail.gmail.com>

On Tue, Apr 10, 2012 at 1:58 AM, Brad Chapman wrote:
> Michiel;
>> Hi Eric, Peter,
>>
>> > How about Bio.Search, for now?
>>
>> I would prefer Bio.Pairwise or Bio.Align.Pairwise, since that tells
>> users something about what the module is for. Bio.Search could be
>> anything (search PubMed? search the Entrez databases? search Google?
>> anyway Bio.Search does not suggest that this module is about pairwise
>> alignments). But Peter previously mentioned that he doesn't like
>> Bio.Pairwise; can we convince you?
>
> I agree with Peter on this one. The module is primarily about searching
> a sequence database with an input via multiple methods, not about
> pairwise alignment of two sequences, which is what Bio.Align.Pairwise
> suggests to me.
>
> Brad

One potential problem with Bio.Search (on top of the concerns raised
here about vagueness), which Bow and I were just talking about during
our weekly GSoC video call, is the existence of Bio/Search.py,
which is obsolete and long overdue for removal. I have just
deprecated it (something I forgot to do before the last release):
https://github.com/biopython/biopython/commit/5a275ccd1df3def40df1eef517af755d373dadd8

We'd earlier talked about using Bio.Search as the namespace. I was
worried about the potential existence on a user's machine of both
Bio/Search.py (the old obsolete code) and Bio/Search/__init__.py
(aka SearchIO, the new module) and which would take precedence
when doing: from Bio import Search

Given how Python module installations work, that seems highly
likely to occur. The good news is that the package would take
priority - see http://www.python.org/doc/essays/packages.html

>>>> What If I Have a Module and a Package With The Same Name?
>>>>
>>>> You may have a directory (on sys.path) which has both a module
>>>> spam.py and a subdirectory spam that contains an __init__.py
>>>> (without the __init__.py, a directory is not recognized as a package).
>>>> In this case, the subdirectory has precedence, and importing spam
>>>> will ignore the spam.py file, loading the package spam instead. If
>>>> you want the module spam.py to have precedence, it must be
>>>> placed in a directory that comes earlier in sys.path.

So there is no technical reason to avoid Bio.Search as an
option for the Bio.SearchIO namespace. We could then
have Bio.Search.Applications for command line wrappers,
consistent with Bio.Phylo.Applications, Bio.Motif.Applications
and Bio.Align.Applications.
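
The precedence rule quoted above can be checked with a small throwaway
experiment. This is a sketch only: it builds a temporary directory with
a module and a package of the same (made-up) name and sees which one
the import picks up, rather than touching a real Biopython install.

```python
import importlib
import os
import sys
import tempfile

name = "spam_precedence_demo"  # hypothetical name, chosen to avoid clashes
tmpdir = tempfile.mkdtemp()

# A plain module: <tmpdir>/spam_precedence_demo.py
with open(os.path.join(tmpdir, name + ".py"), "w") as handle:
    handle.write("kind = 'module'\n")

# A package with the same name: <tmpdir>/spam_precedence_demo/__init__.py
os.mkdir(os.path.join(tmpdir, name))
with open(os.path.join(tmpdir, name, "__init__.py"), "w") as handle:
    handle.write("kind = 'package'\n")

# Both live in the same sys.path entry, as Bio/Search.py and
# Bio/Search/__init__.py would inside an installed Bio directory.
sys.path.insert(0, tmpdir)
try:
    loaded = importlib.import_module(name)
finally:
    sys.path.remove(tmpdir)

print(loaded.kind)
```

The imported object reports itself as the package, so a leftover
Search.py next to a new Search/ directory would simply be ignored.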

Of course, Bio.Search is still perhaps too broad a name... but
on balance perhaps it is still better than Bio.SearchIO?

Regards,

Peter

From tiagoantao at gmail.com  Tue Aug 14 16:39:12 2012
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 14 Aug 2012 21:39:12 +0100
Subject: [Biopython-dev] jython/testing
Message-ID: <CAA9RGEOrdz2Bm_hs0M-Zod4g0J9rK=K574DjM0BDx51KjBH4sw@mail.gmail.com>

Hi,

I have been trying to use biopython with jython 2.7 alpha 2. Here
follows a report.


There are still a few problems (with SeqIO only):
test_SeqIO ... ERROR
test_SeqIO_QualityIO ... FAIL
test_SeqIO_index ... FAIL

The errors are something like (all the same kind of stuff really):

SeqIO
======================================================================
ERROR: test_SeqIO
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 341, in runTest
    suite = unittest.TestLoader().loadTestsFromName(name)
  File "/home/tr353/local/jython/Lib/unittest/loader.py", line 91, in
loadTestsFromName
    module = __import__('.'.join(parts_copy))
  File "/home/tr353/local/jython/Lib/unittest/loader.py", line 91, in
loadTestsFromName
    module = __import__('.'.join(parts_copy))
  File "/home/tr353/tmp/biopython/Tests/test_SeqIO.py", line 627, in <module>
    check_simple_write_read(records)
  File "/home/tr353/tmp/biopython/Tests/test_SeqIO.py", line 352, in
check_simple_write_read
    records2 = list(SeqIO.parse(handle=handle, format=format))
  File "/home/tr353/tmp/biopython/Tests/test_SeqIO.py", line 352, in
check_simple_write_read
    records2 = list(SeqIO.parse(handle=handle, format=format))
  File "/home/tr353/tmp/biopython/Bio/SeqIO/__init__.py", line 537, in parse
    for r in i:
  File "/home/tr353/tmp/biopython/Bio/SeqIO/SffIO.py", line 828, in SffIterator
    header_length, index_offset, index_length, number_of_reads, \
  File "/home/tr353/tmp/biopython/Bio/SeqIO/SffIO.py", line 285, in
_sff_file_header
    magic_number, ver0, ver1, ver2, ver3, index_offset, index_length, \
error: unpack str size does not match format


SeqIO_QualityIO
======================================================================
ERROR: test_E3MFGYR02 (test_SeqIO_QualityIO.TestWriteRead)
Write and read back E3MFGYR02_random_10_reads.sff
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tr353/tmp/biopython/Tests/test_SeqIO_QualityIO.py", line
551, in test_E3MFGYR02
    self.check(os.path.join("Roche", "E3MFGYR02_random_10_reads.sff"), "sff",
  File "/home/tr353/tmp/biopython/Tests/test_SeqIO_QualityIO.py", line
477, in check
    write_read(filename, format, f)
  File "/home/tr353/tmp/biopython/Tests/test_SeqIO_QualityIO.py", line
52, in write_read
    records2 = list(SeqIO.parse(handle,out_format))
  File "/home/tr353/tmp/biopython/Bio/SeqIO/__init__.py", line 537, in parse
    for r in i:
  File "/home/tr353/tmp/biopython/Bio/SeqIO/SffIO.py", line 828, in SffIterator
    header_length, index_offset, index_length, number_of_reads, \
  File "/home/tr353/tmp/biopython/Bio/SeqIO/SffIO.py", line 285, in
_sff_file_header
    magic_number, ver0, ver1, ver2, ver3, index_offset, index_length, \
error: unpack str size does not match format

SeqIO.index
======================================================================
ERROR: test_sff_Roche_greek_sff_get_raw (test_SeqIO_index.IndexDictTests)
Index sff file Roche/greek.sff get_raw
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tr353/tmp/biopython/Tests/test_SeqIO_index.py", line
430, in <lambda>
    f = lambda x : x.get_raw_check(fn, fmt, alpha, c)
  File "/home/tr353/tmp/biopython/Tests/test_SeqIO_index.py", line
301, in get_raw_check
    rec2 = SeqIO.SffIO._sff_read_seq_record(handle,
  File "/home/tr353/tmp/biopython/Bio/SeqIO/SffIO.py", line 561, in
_sff_read_seq_record
    read_header_length, name_length, seq_len, clip_qual_left, \
error: unpack str size does not match format


I suppose this is because of issues with the alpha version of jython 2.7.

Tiago
PS - I do not have all external dependencies installed on my machine,
so a few modules are untested.

-- 
"Liberty for wolves is death to the lambs" - Isaiah Berlin

From p.j.a.cock at googlemail.com  Wed Aug 15 07:18:50 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 15 Aug 2012 12:18:50 +0100
Subject: [Biopython-dev] jython/testing
In-Reply-To: <CAA9RGEOrdz2Bm_hs0M-Zod4g0J9rK=K574DjM0BDx51KjBH4sw@mail.gmail.com>
References: <CAA9RGEOrdz2Bm_hs0M-Zod4g0J9rK=K574DjM0BDx51KjBH4sw@mail.gmail.com>
Message-ID: <CAKVJ-_77GP9-GqKVn4fovL3cv-tbK3EcrkUb=pp9Y4Gp2db8_g@mail.gmail.com>

On Tue, Aug 14, 2012 at 9:39 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
> Hi,
>
> I have been trying to use biopython with jython 2.7 alpha 2. Here
> follows a report.
>
>
> There are still a few problems (with SeqIO only):
> test_SeqIO ... ERROR
> test_SeqIO_QualityIO ... FAIL
> test_SeqIO_index ... FAIL
>
> The errors are something like (all the same kind of stuff really):
>
> ...

I see that on my machine too. From looking at the tracebacks and
the associated code, the failures all involve BytesIO (or StringIO
depending on the Python version). Note that BytesIO is new in
Python 2.6, and thus also new in Jython 2.7 compared to Jython 2.5.

This is enough to demonstrate a bug in Jython 2.7a2, which explains
some if not all of our unit test failures:

Expected behaviour:

$ python
Python 2.7.2 (default, Jun 20 2012, 16:23:33)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from io import BytesIO
>>> raw = open("Roche/E3MFGYR02_random_10_reads.sff", "rb").read()
>>> raw == BytesIO(raw).read()
True
>>> len(raw)
17592
>>> quit()

Broken behaviour:

$ ~/jython2.7a2/jython
Jython 2.7a2 (default:9c148a201233, May 24 2012, 15:49:00)
[Java HotSpot(TM) 64-Bit Server VM (Apple Inc.)] on java1.6.0_33
Type "help", "copyright", "credits" or "license" for more information.
>>> from io import BytesIO
>>> raw = open("Roche/E3MFGYR02_random_10_reads.sff", "rb").read()
>>> raw == BytesIO(raw).read()
False
>>> len(raw)
17592
>>> len(BytesIO(raw).read())
51577
>>> BytesIO(raw).read()[:100]
"bytearray(b'.sff\\x00\\x00\\x00\\x01\\x00\\x00\\x00\\x00\\x00\\x00A\\xb8\\x00\\x00\\x02\\xfc\\x00\\x00\\x00\\n\\x01\\xb8\\"
>>> raw[:100]
'.sff\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00A\xb8\x00\x00\x02\xfc\x00\x00\x00\n\x01\xb8\x00\x04\x01\x90\x01TACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT'
>>> quit()
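
For anyone reproducing this, here is a minimal sketch of the two checks involved: a BytesIO round trip, and unpacking a fixed-size binary header with struct. The format string is an assumption based on the 31-byte SFF file header layout rather than a copy of the code in SffIO.py, and the helper names are made up for illustration. The header values are the ones visible in the raw[:100] output above.

```python
import struct
from io import BytesIO

# Assumed big-endian SFF file header layout (31 bytes): 4-byte magic,
# four 1-byte version fields, 8-byte index offset, 4-byte index length,
# 4-byte read count, 2-byte header length, 2-byte key length,
# 2-byte flow count, 1-byte flowgram format code.
FMT = ">4s4BQIIHHHB"

# Header values decoded by hand from the raw[:100] output above.
raw = struct.pack(FMT, b".sff", 0, 0, 0, 1, 16824, 764, 10, 440, 4, 400, 1)


def roundtrip_ok(data):
    """Return True if BytesIO hands back exactly the bytes it was given."""
    return BytesIO(data).read() == data


def read_header(handle):
    """Read and unpack one fixed-size header from a binary handle."""
    size = struct.calcsize(FMT)
    # struct.unpack raises struct.error if fewer than `size` bytes arrive.
    return struct.unpack(FMT, handle.read(size))
```

On a working interpreter roundtrip_ok(raw) is true and read_header(BytesIO(raw)) returns the twelve header fields. If the handle returns data of the wrong length, struct.unpack raises struct.error, which is the exception type seen in the tracebacks.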


I will report this.

Peter


From p.j.a.cock at googlemail.com  Wed Aug 15 07:26:19 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 15 Aug 2012 12:26:19 +0100
Subject: [Biopython-dev] jython/testing
In-Reply-To: <CAKVJ-_77GP9-GqKVn4fovL3cv-tbK3EcrkUb=pp9Y4Gp2db8_g@mail.gmail.com>
References: <CAA9RGEOrdz2Bm_hs0M-Zod4g0J9rK=K574DjM0BDx51KjBH4sw@mail.gmail.com>
	<CAKVJ-_77GP9-GqKVn4fovL3cv-tbK3EcrkUb=pp9Y4Gp2db8_g@mail.gmail.com>
Message-ID: <CAKVJ-_48f_q0AGstNRDzTuv3QDMJxHrzeZ5wfSC-Hcash8E3pw@mail.gmail.com>

On Wed, Aug 15, 2012 at 12:18 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Aug 14, 2012 at 9:39 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
>> Hi,
>>
>> I have been trying to use biopython with jython 2.7 alpha 2. Here
>> follows a report.
>>
>>
>> There are still a few problems (with SeqIO only):
>> test_SeqIO ... ERROR
>> test_SeqIO_QualityIO ... FAIL
>> test_SeqIO_index ... FAIL
>>
>> The errors are something like (all the same kind of stuff really):
>>
>> ...
>
> I see that on my machine too. From looking at the tracebacks and
> the associated code, the failures all involve BytesIO (or StringIO
> depending on the Python version). Note that BytesIO is new in
> Python 2.6, and thus also new in Jython 2.7 compared to Jython 2.5.
>
> This is enough to demonstrate a bug in Jython 2.7a2, which explains
> some if not all of our unit test failures:
>
> ...
>
> I will report this.

Filed as http://bugs.jython.org/issue1959 with a shorter test case.

Peter


From arklenna at gmail.com  Thu Aug 16 21:58:46 2012
From: arklenna at gmail.com (Lenna Peterson)
Date: Thu, 16 Aug 2012 21:58:46 -0400
Subject: [Biopython-dev] GSoC Python variant (penultimate) update
Message-ID: <CALfq9t+hXQV+9Gjvn7x_nVgX0t2yXGtwtuk4Xzz3m3a4FA2wpw@mail.gmail.com>

Post: http://arklenna.tumblr.com/post/29592108099/

I have been considering how to handle gene strandedness. As long as
I'm correctly interpreting the following position, my coordinate
mapper should produce the correct coordinates with negative strand or
mixed strand features.

GenBank: join(complement(25..30), 36..40)
Biopython: FeatureLocation(24, 30, -1) + FeatureLocation(35, 40)

(please click through to post for monospaced font)

    23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
       <----------------                ------------->
       5  4  3  2  1  0                 6  7  8  9  10

I have to admit that it wasn't until I read a BioStar
[post](http://biostars.org/post/show/3423/forward-and-reverse-strand-conventions/)
earlier this week that I fully understood the relationship between
plus/minus forward/reverse sense/antisense coding/template strands. So
please let me know as soon as possible if I've made a mistake in the
above code.

`c2g` yields the correct genome position, but not the strand. I still
need to integrate strand information into my `GenomePosition` object
and/or partially merge it with `ExactLocation`. This weekend I intend
to expand documentation and write a brief cookbook entry.
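
To make the diagram above concrete, here is a rough sketch of the lookup it describes, written in plain Python so it runs without Biopython. It is not the `c2g` implementation from the branch; the function name and the tuple representation of the parts are hypothetical. Unlike `c2g` as described, it also returns the strand of the part that was hit.

```python
def cds_to_genome(parts, index):
    """Map a 0-based CDS index to (0-based genome position, strand).

    parts: list of (start, end, strand) tuples, 0-based and half-open,
    listed in the order in which they are transcribed.
    """
    if index < 0:
        raise IndexError("CDS index must be non-negative")
    offset = index
    for start, end, strand in parts:
        length = end - start
        if offset < length:
            if strand == -1:
                # Minus strand parts are read from their right-hand end.
                return end - 1 - offset, strand
            return start + offset, strand
        offset -= length
    raise IndexError("CDS index beyond the end of the feature")


# join(complement(25..30), 36..40) converted to 0-based half-open parts
parts = [(24, 30, -1), (35, 40, +1)]
```

With these parts, index 0 maps to genome position 29 on the minus strand and index 6 maps to position 35 on the plus strand, which agrees with the numbering in the diagram.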

Cheers,

Lenna

From arnaud.poret at gmail.com  Fri Aug 17 03:38:28 2012
From: arnaud.poret at gmail.com (Arnaud Poret)
Date: Fri, 17 Aug 2012 09:38:28 +0200
Subject: [Biopython-dev] obo parser
Message-ID: <CACYTbORMZHX9vP5DfpamzikCKcP15M8OLqqGpNshViHRb-qPLw@mail.gmail.com>

Hi everyone,

I'm a newcomer and I'm writing an obo parser for importing ontologies
into python.

I'm not sure, but does Biopython already have an OBO parser?

If yes, I'm reinventing the wheel...

If no, I'll be glad to contribute.

Arnaud.

From p.j.a.cock at googlemail.com  Fri Aug 17 04:15:10 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 17 Aug 2012 09:15:10 +0100
Subject: [Biopython-dev] obo parser
In-Reply-To: <CACYTbOT=NPqjpTnz4mAa-r1FaQH4X1gBVvr8HDvR-CYYE5iKRA@mail.gmail.com>
References: <CACYTbOT=NPqjpTnz4mAa-r1FaQH4X1gBVvr8HDvR-CYYE5iKRA@mail.gmail.com>
Message-ID: <CAKVJ-_68nurF0d6OvS=ne5MDLANkCNRXQfvZFfqHwqH-ZqOFLw@mail.gmail.com>

On Mon, Aug 13, 2012 at 3:07 PM, Arnaud Poret <arnaud.poret at gmail.com> wrote:
> Hi everyone,
>
> I'm a newcomer and I'm writing an obo parser for importing ontologies
> into python. I'm not sure, but does Biopython already have an OBO parser?
>
> If yes, I'm reinventing the wheel...
>
> If no, I'll be glad to contribute.

There does seem to be interest, questions about ontologies,
GO and OBO crop up every so often. There were some
people actually working on this too, but it has gone quiet.

e.g.
http://lists.open-bio.org/pipermail/biopython-dev/2012-February/009384.html
http://lists.open-bio.org/pipermail/biopython-dev/2011-July/009031.html

Chris Lasher's repository has vanished, but Eric's older
work is still online (CC'd):
https://github.com/kellrott/biopython/tree/gosupport

Eric & Chris - where do things stand?

Regards,

Peter

From p.j.a.cock at googlemail.com  Fri Aug 17 04:21:01 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 17 Aug 2012 09:21:01 +0100
Subject: [Biopython-dev] [GSoC] GSoC Python variant (penultimate) update
In-Reply-To: <CALfq9t+hXQV+9Gjvn7x_nVgX0t2yXGtwtuk4Xzz3m3a4FA2wpw@mail.gmail.com>
References: <CALfq9t+hXQV+9Gjvn7x_nVgX0t2yXGtwtuk4Xzz3m3a4FA2wpw@mail.gmail.com>
Message-ID: <CAKVJ-_5dxeVMvpemNDfUUSQLzTba9Bpt=LgwTLBzPRy-EtVavA@mail.gmail.com>

On Fri, Aug 17, 2012 at 2:58 AM, Lenna Peterson <arklenna at gmail.com> wrote:
>
> I have to admit that it wasn't until I read a BioStar
> [post](http://biostars.org/post/show/3423/forward-and-reverse-strand-conventions/)
> earlier this week that I fully understood the relationship between
> plus/minus forward/reverse sense/antisense coding/template strands. So
> please let me know as soon as possible if I've made a mistake in the
> above code.

Given this is nice and fresh in your mind, can you suggest any
clarifications to the Biopython Tutorial section talking about
this issue? The section on transcription & translation starting:

"Before talking about transcription, I want to try and clarify
the strand issue. Consider the following (made up) stretch
of double stranded DNA which encodes a short peptide: ..."

Hmm. That should probably say "I want to try to clarify...".

Peter

From p.j.a.cock at googlemail.com  Fri Aug 17 12:42:57 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 17 Aug 2012 17:42:57 +0100
Subject: [Biopython-dev] BioSQL tests
Message-ID: <CAKVJ-_6b3FBfncEz_u16xKhKL4fAdKvRfFM1==8CRs-HHedOfg@mail.gmail.com>

Dear all,

I realised this week that I didn't have a working BioSQL
test setup under either MySQL or PostgreSQL, and
the buildbot machines are not testing these either.
Therefore I have re-factored the BioSQL unit tests as
follows:

First I turned my print-and-compare test_BioSQL_SeqIO.py
script into proper UnitTest based tests, so that all the BioSQL
tests could be combined in one file, test_BioSQL.py.

This allowed a further reorganisation to allow any one
machine to test all the supported back ends one after
the other - previously the setup only tested one backend
(defaulting to SQLite3). We now have three test scripts
named after the backend library used to connect to the
database:

test_BioSQL_MySQLdb.py
test_BioSQL_psycopg2.py
test_BioSQL_sqlite3.py

Subsequently I modified our TravisCI configuration to
install the required dependencies to run all these tests.
The default usernames and passwords for MySQLdb
and postgresql are set to match those under TravisCI.
Local users would probably have to adjust these values
(in the same way they used to prior to the refactoring).

Note that psycopg2 only works on C Python 2 & 3 for
now (there is a PyPy alternative I have not looked into).

MySQLdb only works on C Python 2 (there is a problem
installing it under Python 3.2).

This did show I'd broken using BioSQL under MySQLdb,
at least under this particular version, fixed now:
https://github.com/biopython/biopython/commit/4a67d851d1eda0a138b604c8aeffc151d331a29b

So the good news is that now TravisCI will run the
BioSQL tests on all three database backends, on
several versions of Python (but just on Linux).
http://travis-ci.org/biopython/biopython/

What I have not addressed is if/how we should deal
with test database setting under buildbot - perhaps by
environment variable overrides?
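
As a starting point for discussion, environment variable overrides could look roughly like the sketch below. The setting names, the default values and the BIOSQL_TEST_ prefix are all hypothetical; nothing here reflects what the test scripts currently do.

```python
import os

# Hypothetical defaults; real values depend on the local database setup.
DEFAULTS = {
    "DBHOST": "localhost",
    "DBUSER": "root",
    "DBPASSWD": "",
    "TESTDB": "biosql_test",
}


def load_settings(defaults, environ=None, prefix="BIOSQL_TEST_"):
    """Return a copy of defaults, overridden by PREFIX + name variables."""
    if environ is None:
        environ = os.environ
    settings = {}
    for key, value in defaults.items():
        # An environment variable wins over the built-in default if set.
        settings[key] = environ.get(prefix + key, value)
    return settings
```

A buildbot slave could then export, say, BIOSQL_TEST_DBUSER before running the tests, while local users and TravisCI keep the defaults.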

If anyone would like to look into using MySQLdb and/or
psycopg2 under PyPy and Jython, that would also be
useful.

Regards,

Peter

From arklenna at gmail.com  Mon Aug 20 00:22:36 2012
From: arklenna at gmail.com (Lenna Peterson)
Date: Mon, 20 Aug 2012 00:22:36 -0400
Subject: [Biopython-dev] GSoC python variant final update
Message-ID: <CALfq9t+quDjby4Dvg1iscs-oSAutAR9yzmT3G_b4-6oFVQxFVw@mail.gmail.com>

Post: http://arklenna.tumblr.com/post/29808300789/

The coordinate mapper, with updated documentation, is now located on
this branch: https://github.com/lennax/biopython/tree/f_loc4
It awaits the merging of Peter's f_loc4 branch.

I've written an entry on coordinate mapping for the Cookbook:
http://biopython.org/wiki/Coordinate_mapping

Additionally, at Peter's suggestion, I've written a clarification of
strand as it relates to transcription and translation. It's available
here: https://docs.google.com/document/d/11R7EOJXn90lN5_SmaPOyN5rFfPQybbCbUBo6EY0R0pA/edit

It's been a great experience working with this project this summer.
Thank you to everyone involved.

Cheers,

Lenna

From mjldehoon at yahoo.com  Mon Aug 20 08:38:37 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 20 Aug 2012 05:38:37 -0700 (PDT)
Subject: [Biopython-dev] Bio.Cluster in the main Biopython documentation
Message-ID: <1345466317.39160.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Dear all,

Previously the documentation for Bio.Cluster was only available as a separate PDF on the Biopython website. I have now integrated this documentation into the Biopython Tutorial. The new tutorial is already uploaded to the repository, and will be visible at http://biopython.org/DIST/docs/tutorial/Tutorial-dev.html once the nightly build is done.

Since the documentation for Bio.Cluster contains many references to the literature, I started using the LaTeX \cite command, which is understood and formatted properly by Hevea. While at it, I also converted the references I could find in other parts of the Tutorial to \cite references. This creates a list of references at the end of the Tutorial. Please let us know if you don't like this approach.

The documentation for Bio.Cluster is fairly long, and while modifying it for inclusion into the Tutorial some mistakes may have crept in. Please let me know if you find any such mistakes (or feel free to fix them yourself, if it is clear what the text should be).

For now we can leave the PDF with the separate description of Bio.Cluster on the website as is for users of Biopython 1.60, but once the next version of Biopython is out I would like to replace it with a PDF referring to the main Tutorial.

Thanks,
-Michiel.

From chapmanb at 50mail.com  Mon Aug 20 08:45:49 2012
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 20 Aug 2012 08:45:49 -0400
Subject: [Biopython-dev] [GSoC] GSoC python variant final update
In-Reply-To: <CALfq9t+quDjby4Dvg1iscs-oSAutAR9yzmT3G_b4-6oFVQxFVw@mail.gmail.com>
References: <CALfq9t+quDjby4Dvg1iscs-oSAutAR9yzmT3G_b4-6oFVQxFVw@mail.gmail.com>
Message-ID: <87harxzq82.fsf@fastmail.fm>


Lenna;
Thanks for the documentation and getting all that code moved into a
branch. This looks great, and I'm looking forward to having it merged
when Peter's work goes in.

Thanks also for all the great work this summer and good luck on the
first day of PhD school,
Brad


> Post: http://arklenna.tumblr.com/post/29808300789/
>
> The coordinate mapper, with updated documentation, is now located on
> this branch: https://github.com/lennax/biopython/tree/f_loc4
> It awaits the merging of Peter's f_loc4 branch.
>
> I've written an entry on coordinate mapping for the Cookbook:
> http://biopython.org/wiki/Coordinate_mapping
>
> Additionally, at Peter's suggestion, I've written a clarification of
> strand as it relates to transcription and translation. It's available
> here: https://docs.google.com/document/d/11R7EOJXn90lN5_SmaPOyN5rFfPQybbCbUBo6EY0R0pA/edit
>
> It's been a great experience working with this project this summer.
> Thank you to everyone involved.
>
> Cheers,
>
> Lenna
> _______________________________________________
> GSoC mailing list
> GSoC at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/gsoc

From redmine at redmine.open-bio.org  Tue Aug 21 06:27:14 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 21 Aug 2012 10:27:14 +0000
Subject: [Biopython-dev] [Biopython - Bug #3379] (New) PDBParser fails to
	parse PDBs produced by PatchDock
Message-ID: <redmine.issue-3379.20120821102714@redmine.open-bio.org>


Issue #3379 has been reported by David Cain.

----------------------------------------
Bug #3379: PDBParser fails to parse PDBs produced by PatchDock
https://redmine.open-bio.org/issues/3379

Author: David Cain
Status: New
Priority: Low
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 1.57
URL: 


I apologize in advance if this technically doesn't count as a bug, as the problem is arising out of improperly formatted PDBs.


h3. Background

Protein docking utilities can generally create a complex PDB from two input files. Depending on the rotation algorithm, at least one of the PDB files is rotated (its ATOM coordinates modified in-place), then the two files are concatenated to create a protein complex file.

h3. Why PDBParser fails

Utilities like ZDOCK strip a lot of data from the input files, creating a poorly-formed PDB file that raises PDBConstructionWarnings but that PDBParser can ultimately parse. PatchDock, however, preserves the input PDB files as they were; the only thing that changes is the ATOM coordinates. This is problematic when the receptor PDB has an @END@ record or @CONECT@ records: PDBParser's current behavior is to consider anything after an @END@ or @CONECT@ to be trailer data, and to cease parsing when they're encountered. This means that many complexes parse cleanly, but completely exclude the ligand.

h3. How to fix the problem

Now, in an ideal world, the responsibility would be on the creators of the docking utilities to create well-formed complex PDB files. However, this quick concatenation seems to be pretty common (complexes are often created by very short, hackish Perl scripts). Should PDBParser be able to parse these badly formed PDB files?

h3. Potential change to @PDBParser._parse_coordinates@?

If a modification to PDBParser is on the table, my thought would be to still consider anything after @END@ or @CONECT@ to be part of the trailer, but make an attempt to parse extra coordinate data from this trailer before returning (probably through a recursive call). If records are found in the trailer, a PDBConstructionWarning is raised, but they're added to the structure.

If this approach is reasonable, let me know and I'd be happy to mock something up and push it to my branch on GitHub. Otherwise, I'll just write scripts to clean ugly complexes for parsing.
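
For reference, a cleaning script of the kind mentioned above might look roughly like this sketch. It is only one possible approach, under the assumption that dropping connectivity information is acceptable: it removes every @END@, @CONECT@ and @MASTER@ record and appends a single @END@ at the bottom. The function name is hypothetical, and the example records are abbreviated rather than column-exact.

```python
def clean_concatenated_pdb(lines):
    """Drop END, CONECT and MASTER records, then finish with one END.

    Record names are taken from the first six columns, so ENDMDL
    records are left alone.
    """
    dropped = {"END", "CONECT", "MASTER"}
    kept = [line.rstrip("\n") for line in lines
            if line[:6].strip() not in dropped]
    kept.append("END")
    return kept


# Abbreviated toy records standing in for a receptor followed by a ligand.
complex_lines = [
    "ATOM      1  N   MET A   1",
    "TER",
    "CONECT    1    2",
    "END",
    "ATOM      1  N   GLY B   1",
    "TER",
    "END",
]
cleaned = clean_concatenated_pdb(complex_lines)
```

Serial numbers from the two input files can still clash after this step, so the result is only a first pass at something PDBParser will read through to the ligand.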

My only thought is that most users of docking software are probably not able or willing to write such a script, and thus can't use BioPython to parse the PDB output.


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Tue Aug 21 06:36:07 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 21 Aug 2012 10:36:07 +0000
Subject: [Biopython-dev] [Biopython - Bug #3379] PDBParser fails to parse
	PDBs produced by PatchDock
References: <redmine.issue-3379.20120821102714@redmine.open-bio.org>
Message-ID: <redmine.journal-14935.20120821103607@redmine.open-bio.org>


Issue #3379 has been updated by Peter Cock.


If, as I understand it, PatchDock is producing invalid PDB files, have you raised the issue with them too?

I accept that out of practicality, a little lenience in our parsers can be helpful, and may be appropriate in this case. Do you have any sample data files you could share - for example a valid PDB file before processing, and the problematic PDB file after processing with PatchDock?


From redmine at redmine.open-bio.org  Tue Aug 21 07:08:53 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 21 Aug 2012 11:08:53 +0000
Subject: [Biopython-dev] [Biopython - Bug #3379] PDBParser fails to parse
	PDBs produced by PatchDock
References: <redmine.issue-3379.20120821102714@redmine.open-bio.org>
Message-ID: <redmine.journal-14936.20120821110853@redmine.open-bio.org>


Issue #3379 has been updated by João Rodrigues.


Disclaimer: I am a HADDOCK team member and therefore in direct competition with PATCHDOCK.

I totally disagree with this. This is not compliant with the PDB format at all: "Each file should terminate with a line containing only the word END". Having data beyond END is just bad practice in my opinion. There are two statements to close a chain/model, ENDMDL and TER, and these should be used. Sorry to be a pain, but if we fix this we are just encouraging a bad practice. Standards are there to be respected.



From redmine at redmine.open-bio.org  Tue Aug 21 07:21:57 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 21 Aug 2012 11:21:57 +0000
Subject: [Biopython-dev] [Biopython - Bug #3379] PDBParser fails to parse
	PDBs produced by PatchDock
References: <redmine.issue-3379.20120821102714@redmine.open-bio.org>
Message-ID: <redmine.journal-14937.20120821112157@redmine.open-bio.org>


Issue #3379 has been updated by Peter Cock.


Given Joao's comments, lenience does not sound appropriate in this case.

If the parser's current behaviour is to silently ignore data after an END line, that seems less than ideal.

How about we add a clear error/warning to the parser if there is content in the file after an END line? i.e. Treat it as an exception in strict mode, treat it as a warning in permissive mode (and continue to ignore anything after the END line)?

A sample file would be helpful to verify this, and could even be used for a unit test (with your permission).
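
To illustrate the strict/permissive idea, here is a rough standalone sketch. It does not touch Bio.PDB: the warning class and the ValueError are hypothetical stand-ins for PDBConstructionWarning and the parser's own exception, and the function name is made up.

```python
import warnings


class TrailingDataWarning(UserWarning):
    """Hypothetical stand-in for Bio.PDB's PDBConstructionWarning."""


def check_after_end(lines, permissive=True):
    """Return lines up to and including END, flagging any content after it.

    lines must be a list of strings. In permissive mode a warning is
    issued and the trailing content is ignored; otherwise a ValueError
    is raised.
    """
    for i, line in enumerate(lines):
        # Compare the six-column record name so ENDMDL is not matched.
        if line[:6].strip() == "END":
            trailing = [rest for rest in lines[i + 1:] if rest.strip()]
            if trailing:
                msg = "%d non-blank line(s) found after END record" % len(trailing)
                if not permissive:
                    raise ValueError(msg)
                warnings.warn(msg, TrailingDataWarning)
            return lines[:i + 1]
    return lines
```

A file that ends cleanly with END passes through without a warning, so well-formed input would see no change in behaviour.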
----------------------------------------
Bug #3379: PDBParser fails to parse PDBs produced by PatchDock
https://redmine.open-bio.org/issues/3379

Author: David Cain
Status: New
Priority: Low
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 1.57
URL: 


I apoligize in advance if this technically doesn't count as a bug, as the problem is arising out of improperly formatted PDBs.


h3. Background

Protein docking utilities can generally create a complex PDB from two input files. Depending on the rotation algorithm, at least one of the PDB files is rotated (its ATOM coordinates modified in-place), then the two files are concatenated to create a protein complex file.

h3. Why PDBParser fails

Utilities like ZDOCK strip a lot of data from the input files, creating a poorly formed PDB file that raises PDBConstructionWarnings but that PDBParser can ultimately parse. PatchDock, however, preserves the input PDB files as they were; the only thing that changes is the ATOM coordinates. This is problematic when the receptor PDB has an @END@ record or @CONECT@ records: PDBParser's current behavior is to consider anything after an @END@ or @CONECT@ to be trailer data, and to cease parsing when one is encountered. This means that many complexes parse cleanly but completely exclude the ligand.

h3. How to fix the problem

In an ideal world, the responsibility would lie with the creators of the docking utilities to produce well-formed complex PDB files. However, this quick concatenation seems to be pretty common (complexes are often created by very short, hackish Perl scripts). Should PDBParser be able to parse these badly formed PDB files?

h3. Potential change to @PDBParser._parse_coordinates@?

If a modification to PDBParser is on the table, my thought would be to still consider anything after @END@ or @CONECT@ to be part of the trailer, but make an attempt to parse extra coordinate data from this trailer before returning (probably through a recursive call). If records are found in the trailer, a PDBConstructionWarning is raised, but they're added to the structure.

If this approach is reasonable, let me know and I'd be happy to mock something up and push it to my branch on GitHub. Otherwise, I'll just write scripts to clean ugly complexes for parsing.

My only thought is that most users of docking software are probably not able or willing to write such a script, and thus can't use BioPython to parse the PDB output.
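
For reference, the kind of clean-up script meant here could be as small as the sketch below. It assumes the complex is a plain concatenation of two PDB files; @clean_concatenated_pdb@ is a hypothetical helper name, and the file name in the usage comment is only an example.

```python
def clean_concatenated_pdb(lines):
    """Drop every END and CONECT record, then finish with a single END.

    Expects lines without trailing newlines, e.g. from str.splitlines().
    """
    trailer_records = {"END", "CONECT"}
    # The record name occupies the first six columns, so ENDMDL is kept.
    kept = [line for line in lines if line[:6].strip() not in trailer_records]
    kept.append("END")
    return kept


# Example use (hypothetical file name):
# with open("complex.pdb") as handle:
#     cleaned = clean_concatenated_pdb(handle.read().splitlines())
```

Note that this throws away the connectivity information in the @CONECT@ records and does nothing about chain identifiers that clash between receptor and ligand, so it is a workaround rather than a fix.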




From redmine at redmine.open-bio.org  Tue Aug 21 07:26:48 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 21 Aug 2012 11:26:48 +0000
Subject: [Biopython-dev] [Biopython - Bug #3379] PDBParser fails to parse
	PDBs produced by PatchDock
References: <redmine.issue-3379.20120821102714@redmine.open-bio.org>
Message-ID: <redmine.journal-14938.20120821112648@redmine.open-bio.org>


Issue #3379 has been updated by David Cain.


I completely agree with João, actually: disrespecting the file spec is a bad idea. I just figured I'd bring this up for discussion.

I very much think a warning of some sort should be raised, though. Half the structure silently failing to parse is a big problem. I think your solution is perfect, and I'd be very happy to write the unit test.

I'll upload a sample file in just a bit.


From redmine at redmine.open-bio.org  Tue Aug 21 08:05:37 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 21 Aug 2012 12:05:37 +0000
Subject: [Biopython-dev] [Biopython - Bug #3379] PDBParser fails to parse
	PDBs produced by PatchDock
References: <redmine.issue-3379.20120821102714@redmine.open-bio.org>
Message-ID: <redmine.journal-14939.20120821120537@redmine.open-bio.org>


Issue #3379 has been updated by David Cain.

File complex.1.pdb added

I ran PatchDock's antigen-antibody complex mode on an antigen and antibody file (2fgw and 5ebx) that individually parse without warnings. (Note that I chose these files at random; their docking is useful only as an example).

I've attached the complex file produced by @PatchDock/transOutput.pl@ (only the top-scoring conformation is considered). As you can see, the @CONECT@ and @END@ records of the antibody will stop the rest of the file from being parsed.

I'd be happy to take a stab at writing the error/warning message for premature @END@/@CONECT@ records in addition to the unit test that checks for this behavior.



From redmine at redmine.open-bio.org  Tue Aug 21 08:35:07 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 21 Aug 2012 12:35:07 +0000
Subject: [Biopython-dev] [Biopython - Bug #3379] PDBParser fails to parse
	PDBs produced by PatchDock
References: <redmine.issue-3379.20120821102714@redmine.open-bio.org>
Message-ID: <redmine.journal-14940.20120821123507@redmine.open-bio.org>


Issue #3379 has been updated by João Rodrigues.


Agreed with Peter that it should raise an exception/warning. This is really pure concatenation of the two PDBs. If you could have a go at it, I could test it too. Thanks David.


From w.arindrarto at gmail.com  Tue Aug 21 12:01:21 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 21 Aug 2012 18:01:21 +0200
Subject: [Biopython-dev] GSoC SearchIO project
In-Reply-To: <CAKVJ-_7b=RGpDGX3v0x5PJFWgW5dB3Otfg8Sq2Gehhg4SU2bUg@mail.gmail.com>
References: <1334014051.14489.YahooMailClassic@web161204.mail.bf1.yahoo.com>
	<87lim4h07o.fsf@fastmail.fm>
	<CAKVJ-_7b=RGpDGX3v0x5PJFWgW5dB3Otfg8Sq2Gehhg4SU2bUg@mail.gmail.com>
Message-ID: <CADEGkF4URxn5zwXOwU1J6s21U22aLwTdUw3aU6G0=MRt+LbfOA@mail.gmail.com>

On Tue, Aug 14, 2012 at 9:49 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Apr 10, 2012 at 1:58 AM, Brad Chapman wrote:
>> Michiel;
>>> Hi Eric, Peter,
>>>
>>> > How about Bio.Search, for now?
>>>
>>> I would prefer Bio.Pairwise or Bio.Align.Pairwise, since that tells
>>> users something about what the module is for. Bio.Search could be
>>> anything (search PubMed? search the Entrez databases? search Google?
>>> anyway Bio.Search does not suggest that this module is about pairwise
>>> alignments). But Peter previously mentioned that he doesn't like
>>> Bio.Pairwise; can we convince you?
>>
>> I agree with Peter on this one. The module is primarily about searching
>> a sequence database with an input via multiple methods, not about
>> pairwise alignment of two sequences which is what Bio.Align.Pairwise
>> suggests to me.
>>
>> Brad
>
> One potential problem with Bio.Search (on top of the concerns raised
> here about vagueness) that Bow and I were just talking about during
> our weekly GSoC video call is the existence of Bio/Search.py
> which is obsolete and long overdue for removal. I have just
> deprecated it (something I forgot to do before the last release):
> https://github.com/biopython/biopython/commit/5a275ccd1df3def40df1eef517af755d373dadd8
>
> We'd earlier talked about using Bio.Search as the namespace. I was
> worried about the potential existence on a user's machine of both
> Bio/Search.py (the old obsolete code) and Bio/Search/__init__.py
> (aka SearchIO, the new module) and which would take precedence
> when doing: from Bio import Search
>
> Given how Python module installations work, that seems highly
> likely to occur. The good news is that the package would take
> priority - see http://www.python.org/doc/essays/packages.html
>
>>>>> What If I Have a Module and a Package With The Same Name?
>>>>>
>>>>> You may have a directory (on sys.path) which has both a module
>>>>> spam.py and a subdirectory spam that contains an __init__.py
>>>>> (without the __init__.py, a directory is not recognized as a package).
>>>>> In this case, the subdirectory has precedence, and importing spam
>>>>> will ignore the spam.py file, loading the package spam instead. If
>>>>> you want the module spam.py to have precedence, it must be
>>>>> placed in a directory that comes earlier in sys.path.
>
> So there is no technical reason to avoid Bio.Search as an
> option for the Bio.SearchIO namespace. We could then
> have Bio.Search.Applications for command line wrappers,
> consistent with Bio.Phylo.Applications, Bio.Motif.Applications
> and Bio.Align.Applications.
>
> Of course, Bio.Search is still perhaps too broad a name... but
> on balance perhaps it is still better than Bio.SearchIO?
>
> Regards,
>
> Peter

Hi everyone,

If I may add my two cents, for now I am in favor of putting the module
under Bio.Search. It is not the best name out there (it does sound a
bit vague), but it's the one that seems the most intuitive (until
a better alternative comes up). There were some other alternatives
that Peter and I discussed, but they seemed less appealing to us.
You're free to add your thoughts on these of course :) :

- Bio.SeqSearch. This sounds ok, but when you consider we have
Bio.Seq, Bio.SeqRecord, Bio.SeqFeature, and Bio.SeqUtils, it becomes
quite confusing quickly.

- Bio.PSearch ('p' for pairwise). This one seemed the least intuitive
of the three options, so I'm not so keen on it.

For now, I'm still writing everything (code, docstrings, tutorial)
using SearchIO. I suppose it's better if we could agree on a more
suitable name, though.

On another note, I'm also in favor of using the Bio.Phylo module
skeleton for Bio.SearchIO / Bio.Search. We may then group all sequence
search-related application wrappers under Applications (I actually
prefer 'app' for better PEP8 compliance, but that's another
discussion) and perhaps even refactor our remote search calls (e.g.
the 'qblast' module) under Bio.Search as well.
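
For anyone who wants to check the package-over-module precedence quoted above on their own machine, here is a small sketch. It assumes a Python 3 interpreter (the thread predates that), and the name spam_demo and the helper which_wins are made up for the test; nothing here touches Bio itself.

```python
import importlib
import os
import sys
import tempfile


def which_wins(name="spam_demo"):
    """Create NAME.py and NAME/__init__.py side by side, then import NAME."""
    with tempfile.TemporaryDirectory() as folder:
        # The plain module, playing the role of the old Bio/Search.py
        with open(os.path.join(folder, name + ".py"), "w") as handle:
            handle.write("KIND = 'module'\n")
        # The package, playing the role of Bio/Search/__init__.py
        os.mkdir(os.path.join(folder, name))
        with open(os.path.join(folder, name, "__init__.py"), "w") as handle:
            handle.write("KIND = 'package'\n")
        sys.path.insert(0, folder)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(name)
            return module.KIND
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(folder)
            sys.modules.pop(name, None)


print(which_wins())
```

If the essay's rule holds on your interpreter, this reports the package rather than the module, which is the behaviour we would be relying on for a leftover Bio/Search.py.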

cheers,
Bow

From w.arindrarto at gmail.com  Tue Aug 21 12:09:07 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 21 Aug 2012 18:09:07 +0200
Subject: [Biopython-dev] GSoC Project Update -- 10
In-Reply-To: <CADEGkF7LHf=qVQ4pF4HzmuRp5ddEZ0skSwGmGe5j0j0JJBsjgw@mail.gmail.com>
References: <CADEGkF7LHf=qVQ4pF4HzmuRp5ddEZ0skSwGmGe5j0j0JJBsjgw@mail.gmail.com>
Message-ID: <CADEGkF7=dnCkaDuBX0QeLr_z7jN3N2=5HqvCAh-2KvbOqJv51A@mail.gmail.com>

Hi everyone,

I've just posted my last entry for my Google Summer of Code project
this year: http://bow.web.id/blog/2012/08/summers-over/

I want to say thank you to the Biopython community, especially Peter
for mentoring me this summer :), to OBF for accepting my proposal, and
to everyone who has helped and given me valuable input
throughout the project :).

It's been a priceless learning experience, and I only hope that my
code will be useful in return.

There are still some things to do before the code is merge-ready and
even more when the code is included in an official release, so I'll
still be around.

cheers,
Bow

From mictadlo at gmail.com  Tue Aug 21 20:55:30 2012
From: mictadlo at gmail.com (Mic)
Date: Wed, 22 Aug 2012 10:55:30 +1000
Subject: [Biopython-dev] [BioRuby] Final GSoC report
In-Reply-To: <CADKP5Cm-ka5LVrNERv-TrJZxevrREN2oM2PkekdyBjnauVNKrA@mail.gmail.com>
References: <CADKP5Cm-ka5LVrNERv-TrJZxevrREN2oM2PkekdyBjnauVNKrA@mail.gmail.com>
Message-ID: <CAOP6n=j9HzYzfDewwYjBWPUXkDt0JY-5_=OS90ta3h-y-Uhbag@mail.gmail.com>

Hi,
Python is able to connect to D with help of http://pyd.dsource.org/ .

Maybe it would be something for Biopython

Cheers,
Mic

On Wed, Aug 22, 2012 at 5:11 AM, Marjan Povolni <marian.povolny at gmail.com> wrote:

> http://blog.mpthecoder.com/post/29910330225/final-gsoc-report
>
> *Summary*
>
> Yesterday I tagged the 0.4 release of gff3-pltools, and that marks the end
> of the summer. At least in GSoC terms. Should I say end of the project? I
> don?t think so. The tools can still be improved, and the Ruby bindings
> should follow.
>
> The major changes since the last release include the following:
>
>    - filtering functionality has been moved to a separate utility:
>    gff3-filter, along with a new language for specifying filtering
>    expressions,
>    - conversion to table format of selected fields has been moved to a
>    separate utility: gff3-select. However, the --select option is still
>    part of gff3-filter,
>    - gff3-ffetch is now fetching FASTA sequences from GFF3 and FASTA files
>    for CDS and mRNA records and features,
>    - man pages for utilities.
>
> The original idea was to create a GFF3/GTF parser in D and Ruby bindings.
> The Ruby bindings part didn't work out because there is still no support
> for D shared libraries in Linux, but instead there are now a few useful
> command-line tools for processing GFF3 which can be used without
> programming knowledge.
>
> To me, the summer was fun, challenging, and a great experience. I even got
> to meet my mentor in person, and other community members too, and to make
> my first steps in bioinformatics. I even gave a small presentation at the
> EU-codefest. What a summer it was!
>
> Thanks to everybody who made it possible: Google, Open Bioinformatics
> Foundation and my mentor Pjotr Prins.
>
> --
> Marjan
>
> _______________________________________________
> BioRuby Project - http://www.bioruby.org/
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
>


From p.j.a.cock at googlemail.com  Wed Aug 22 04:42:03 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 22 Aug 2012 09:42:03 +0100
Subject: [Biopython-dev] [GSoC] GSoC Project Update -- 10
In-Reply-To: <CADEGkF7=dnCkaDuBX0QeLr_z7jN3N2=5HqvCAh-2KvbOqJv51A@mail.gmail.com>
References: <CADEGkF7LHf=qVQ4pF4HzmuRp5ddEZ0skSwGmGe5j0j0JJBsjgw@mail.gmail.com>
	<CADEGkF7=dnCkaDuBX0QeLr_z7jN3N2=5HqvCAh-2KvbOqJv51A@mail.gmail.com>
Message-ID: <CAKVJ-_4ttoJscDgAe57tqo_p=LpEaKNs0CazTZV_0MjpfkkQHw@mail.gmail.com>

On Tue, Aug 21, 2012 at 5:09 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi everyone,
>
> I've just posted my last entry for my Google Summer of Code project
> this year: http://bow.web.id/blog/2012/08/summers-over/
>
> I want to say thank you to the Biopython community, especially Peter
> for mentoring me this summer :), to OBF for accepting my proposal, and
> to anyone who has helped and given me valuable inputs for me
> throughout the project :).
>
> It's been a priceless learning experience, and I only hope that my
> code will be useful in return.
>
> There are still some things to do before the code is merge-ready and
> even more when the code is included in an official release, so I'll
> still be around.
>
> cheers,
> Bow

Thank you Bow,

It has been a pleasure to mentor you, and I'm excited about
getting this (and Lenna's and other branches) into Biopython.

Now, back to the module naming discussion... ;)
http://lists.open-bio.org/pipermail/biopython-dev/2012-August/009868.html
http://lists.open-bio.org/pipermail/biopython-dev/2012-August/009888.html

Peter

From p.j.a.cock at googlemail.com  Wed Aug 22 07:07:11 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 22 Aug 2012 12:07:11 +0100
Subject: [Biopython-dev] Beta code in the official releases?
Message-ID: <CAKVJ-_6_8JUXmCx5q-eghSczNxqPmSbaaTc_GJ_QCqQOtjGUbg@mail.gmail.com>

Hi all,

One of the ideas I discussed with Bow during this GSoC
project was introducing a new warning, something like
Bio.BiopythonBetaCode (the exact name isn't important),
to be used to label new experimental modules for which
we *expect* there to be changes in the next release.

The idea is to combine the simplicity of distribution and
installation of the 'monolithic' Biopython library with some
of the flexibility offered by a more modular approach.
This would be particularly helpful for those on Windows,
where installing a Biopython branch from git is quite a
daunting task.

The idea is that in one of the next releases you'd be able
to try Bio.SearchIO (or Bio.Struct or GFF or Variants or ...)
and see something like this:

>>> from Bio import SearchIO
Bio/SearchIO/__init__.py:16: BiopythonBetaCode: Bio.SearchIO is in beta, and likely to change
  warnings.warn("Bio.SearchIO is in beta, and likely to change", BiopythonBetaCode)

By using a specific warning class, any keen beta tester can
silence all the BiopythonBetaCode warnings if they wish.
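
As a rough sketch of what that silencing would look like, using only the standard warnings module: the class below is a stand-in defined locally (the real name and location are still undecided), and import_beta_module is a hypothetical function that mimics what a beta module's __init__.py would do on import.

```python
import warnings


class BiopythonBetaCode(Warning):
    """Stand-in warning class; the final name is not settled."""


def import_beta_module():
    """Mimic the warning a beta module would issue when imported."""
    warnings.warn("Bio.SearchIO is in beta, and likely to change",
                  BiopythonBetaCode)


# A beta tester who accepts the risk silences just this category...
warnings.simplefilter("ignore", BiopythonBetaCode)
import_beta_module()

# ...while warnings of any other category are still handled as before.
```

Because the filter matches on the class, one line covers every beta module at once without hiding deprecation or other warnings.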

Is anyone familiar enough with Linux packaging policies to
have any thoughts on how they would treat this? Provided
we only use this for self contained modules, they could
potentially split the beta-modules into a sub-package (in the
same way that Biopython and its BioSQL support are split
in Debian).

I envision using this as a way to encourage wider 'beta testing'
of self contained modules which are close to a stable release.
Does anyone think this is a good idea? Are there any downsides
I'm overlooking?

Thanks,

Peter

From p.j.a.cock at googlemail.com  Wed Aug 22 07:10:56 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 22 Aug 2012 12:10:56 +0100
Subject: [Biopython-dev] [BioRuby] [GSoC] Final GSoC report
In-Reply-To: <20120822104352.GA11847@thebird.nl>
References: <CADKP5Cm-ka5LVrNERv-TrJZxevrREN2oM2PkekdyBjnauVNKrA@mail.gmail.com>
	<CAOP6n=j9HzYzfDewwYjBWPUXkDt0JY-5_=OS90ta3h-y-Uhbag@mail.gmail.com>
	<CAE8u=e7f=ZkTWhxL0VgCanoKkq4kPHzWA-dwmnbqfNqUMFeP5A@mail.gmail.com>
	<20120822104352.GA11847@thebird.nl>
Message-ID: <CAKVJ-_4ehZR-vW6otB8=cxrgQ1TGLsG5+X1+qOrT=UAk9=XfVQ@mail.gmail.com>

On Wed, Aug 22, 2012 at 11:43 AM, Pjotr Prins <pjotr.public14 at thebird.nl> wrote:
> Yes, linking to D from an interpreted language is not hard, basically
> it is the same calling convention as that of C. So a D shared library
> looks the same as a C shared library to the calling code - all
> existing foreign function interfaces (FFI) work. That is the good
> news.

How do things stand from a cross-platform perspective?
i.e. When might this be doable on Linux, Mac OS X, and Windows?
(and other Unix like platforms of potential interest)

> The bad news, as Artem points out, is that there is a problem in the
> D garbage collector. Items get collected, which should not. This will
> be fixed sooner or later. The commitment is there, and it is moving
> up the priority list.

Is there a D issue/bug tracker for this?

Thanks,

Peter

From chapmanb at 50mail.com  Wed Aug 22 20:42:09 2012
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 22 Aug 2012 20:42:09 -0400
Subject: [Biopython-dev] Beta code in the official releases?
In-Reply-To: <CAKVJ-_6_8JUXmCx5q-eghSczNxqPmSbaaTc_GJ_QCqQOtjGUbg@mail.gmail.com>
References: <CAKVJ-_6_8JUXmCx5q-eghSczNxqPmSbaaTc_GJ_QCqQOtjGUbg@mail.gmail.com>
Message-ID: <877gsq8mn2.fsf@fastmail.fm>


Peter;
+1. I'm for making the process of getting new code into
Biopython a bit quicker and this seems like a nice step in that
direction. With code that has been well designed, tested and documented, this
will help speed the transition into releases and get more eyes on it
quicker, while allowing some potential breaking changes as beta
functionality gets finalized.

Thanks for the good suggestion,
Brad



From redmine at redmine.open-bio.org  Mon Aug 27 00:24:16 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 27 Aug 2012 04:24:16 +0000
Subject: [Biopython-dev] [Biopython - Bug #3379] PDBParser fails to parse
	PDBs produced by PatchDock
References: <redmine.issue-3379.20120821102714@redmine.open-bio.org>
Message-ID: <redmine.journal-14944.20120827042416@redmine.open-bio.org>


Issue #3379 has been updated by David Cain.


Regarding "pure concatenation," I wasn't exaggerating when I said really ugly Perl scripts. =)

I created a "pull request on the Biopython GitHub repository":https://github.com/biopython/biopython/pull/60. Could you give me some feedback on my solution? If the devs agree on a certain behavior, I'll start writing some unit tests.



From Andrew.Sczesnak at med.nyu.edu  Wed Aug 29 13:54:08 2012
From: Andrew.Sczesnak at med.nyu.edu (Sczesnak, Andrew)
Date: Wed, 29 Aug 2012 17:54:08 +0000
Subject: [Biopython-dev] Beta code in the official releases?
In-Reply-To: <877gsq8mn2.fsf@fastmail.fm>
References: <CAKVJ-_6_8JUXmCx5q-eghSczNxqPmSbaaTc_GJ_QCqQOtjGUbg@mail.gmail.com>,
	<877gsq8mn2.fsf@fastmail.fm>
Message-ID: <1F36894B170C114F9C902C20BC5129981AD23835@MSGWCDCPMB25.nyumc.org>

+1

It's been over a year since I first submitted my MAF code!
________________________________________
From: biopython-dev-bounces at lists.open-bio.org [biopython-dev-bounces at lists.open-bio.org] on behalf of Brad Chapman [chapmanb at 50mail.com]
Sent: Wednesday, August 22, 2012 8:42 PM
To: Peter Cock; Biopython-Dev Mailing List
Subject: Re: [Biopython-dev] Beta code in the official releases?

Peter;
+1. I'm for making the process of getting new code into
Biopython a bit quicker and this seems like a nice step in that
direction. With code that has been well designed, tested and documented,
this will help speed the transition into releases and get more eyes on it
quicker, while allowing some potential breaking changes as beta
functionality gets finalized.

Thanks for the good suggestion,
Brad


> Hi all,
>
> One of the ideas I discussed with Bow during this GSoC
> project was introducing a new warning, something like
> Bio.BiopythonBetaCode (the exact name isn't important),
> to be used to label new experimental modules for which
> we *expect* there to be changes in the next release.
>
> The idea is to combine the simplicity of distribution and
> installation of the 'monolithic' Biopython library with some
> of the flexibility offered by a more modular approach.
> This would be particularly helpful for those on Windows,
> where installing a Biopython branch from git is quite a
> daunting task.
>
> The idea is that in one of the next releases you'd be able
> to try Bio.SearchIO (or Bio.Struct or GFF or Variants or ...)
> and see something like this:
>
>>>> from Bio import SearchIO
> Bio/SearchIO/__init__.py:16: BiopythonBetaCode: Bio.SearchIO is in
> beta, and likely to change
>   warnings.warn("Bio.SearchIO is in beta, and likely to change",
> BiopythonBetaCode)
>
> By using a specific warning class, any keen beta tester can
> silence all the BiopythonBetaCode warnings if they wish to.
>
> Is anyone familiar enough with Linux packaging policies to
> have any thoughts on how they would treat this? Provided
> we only use this for self contained modules, they could
> potentially split the beta-modules into a sub-package (in the
> same way that Biopython and its BioSQL support are split
> in Debian).
>
> I envision using this as a way to encourage wider 'beta testing'
> of self contained modules which are close to a stable release.
> Does anyone think this is a good idea? Are there any downsides
> I'm overlooking?
>
> Thanks,
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
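
For readers unfamiliar with per-class warning filters, here is a minimal sketch of the idea in the quoted proposal. The class name BiopythonBetaCode is taken from the proposal and is defined locally here as a stand-in; the helper import_beta_module only imitates what a beta module might do at import time. Neither is presented as actual Biopython code.

```python
import warnings


class BiopythonBetaCode(Warning):
    """Stand-in for the proposed warning class (name taken from the proposal)."""


def import_beta_module():
    # Imitates the warning a beta module would emit when imported.
    warnings.warn("Bio.SearchIO is in beta, and likely to change",
                  BiopythonBetaCode)


# Default situation: the warning is reported.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    import_beta_module()
shown = len(caught)

# A keen beta tester silences this one class and nothing else.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.simplefilter("ignore", BiopythonBetaCode)
    import_beta_module()
silenced = len(caught)
```

Because the filter is keyed on the class, other warnings raised by the library would still be shown.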


From p.j.a.cock at googlemail.com  Thu Aug 30 04:16:13 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 30 Aug 2012 09:16:13 +0100
Subject: [Biopython-dev] Genome Diagram Sigils, was: Default Behavior
In-Reply-To: <CAKVJ-_7XxVStvWrte03kQ00HwVD771EdzV8PN3dSTtSM2p0fsA@mail.gmail.com>
References: <CAKVJ-_7XxVStvWrte03kQ00HwVD771EdzV8PN3dSTtSM2p0fsA@mail.gmail.com>
Message-ID: <CAKVJ-_7h_JYG+bDs20_A=YYrGsTQPfyWtWB4pzY4REanYr7haQ@mail.gmail.com>

On Tue, Aug 14, 2012 at 12:06 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Aug 2, 2012 at 5:12 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> Further to that work, I updated some older code for a JAGGY
> sigil, and also an OCTO sigil (names open to suggestions),
> which are on my gd-sigils branch which has documentation
> in the tutorial, including this image of the expanded sigil set:
> https://github.com/peterjc/biopython/blob/e09e264dd73953554609498c15b67d86686592fb/Doc/images/GD_sigils.png
>
> This is a slight simplification of the old JAGGY code in that it
> does not (yet) allow control of the teeth length (e.g. to have just
> teeth on one end). I am thinking this could be exposed like
> the existing arrow specific options.
>
> I originally created the JAGGY sigil for marking a break point
> in a contig/scaffold. For instance, you might want to mark a
> run of NNNNN bases in a scaffold with a jaggy sigil (straddling
> both strands) as a clear visual marker to explain why there
> were no genes.
>
> Other sigil ideas I pondered include an OVAL, which should
> be quite easy for the linear diagrams, but rather more work to
> implement for circular diagrams due to the distorted curves.
>
> Peter

Do people think (either of) these two sigils are worth adding
to the main branch?

Potentially they can be generalised - the JAGGY sigil in
particular would be much more flexible if the head & tail
teeth presence (or tooth length?) could be controlled. e.g.
to draw a sigil with a flat edge on the left, and a jagged
edge on the right.

Peter
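
To make the head and tail control concrete, here is a rough geometric sketch of a jagged box outline where each end can be flat or toothed. It is not the GenomeDiagram implementation: the function names, the parameters head_length and tail_length, and the choice of cutting notches inwards are all assumptions for illustration.

```python
def _edge(x, y_start, y_end, teeth, depth):
    """Points along a vertical edge at x; depth 0 gives a straight edge."""
    if depth == 0:
        return [(x, y_start), (x, y_end)]
    step = (y_end - y_start) / teeth
    points = []
    for i in range(teeth):
        points.append((x, y_start + i * step))                # on the edge
        points.append((x + depth, y_start + (i + 0.5) * step))  # notch
    points.append((x, y_end))
    return points


def jaggy_outline(x0, x1, y0, y1, teeth=3, tail_length=0.0, head_length=0.0):
    """Polygon for a box whose left (tail) and right (head) edges may be jagged.

    A length of zero leaves that edge flat. Hypothetical helper only.
    """
    right = _edge(x1, y0, y1, teeth, -head_length)  # bottom to top, notches cut left
    left = _edge(x0, y1, y0, teeth, tail_length)    # top to bottom, notches cut right
    return right + left


# Flat edge on the left, jagged edge on the right.
outline = jaggy_outline(0, 10, 0, 6, teeth=3, head_length=2)
```

Setting both lengths to zero gives a plain rectangle, so a single function could cover the box and both one-sided variants.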

From Leighton.Pritchard at hutton.ac.uk  Thu Aug 30 04:51:50 2012
From: Leighton.Pritchard at hutton.ac.uk (Leighton Pritchard)
Date: Thu, 30 Aug 2012 08:51:50 +0000
Subject: [Biopython-dev] Genome Diagram Sigils, was: Default Behavior
In-Reply-To: <CAKVJ-_7h_JYG+bDs20_A=YYrGsTQPfyWtWB4pzY4REanYr7haQ@mail.gmail.com>
References: <CAKVJ-_7XxVStvWrte03kQ00HwVD771EdzV8PN3dSTtSM2p0fsA@mail.gmail.com>
	<CAKVJ-_7h_JYG+bDs20_A=YYrGsTQPfyWtWB4pzY4REanYr7haQ@mail.gmail.com>
Message-ID: <E33884CD-F69C-48DA-8AE6-B6A31644FA81@hutton.ac.uk>


On 30 Aug 2012, at Thursday, August 30, 09:16, Peter Cock wrote:

> On Tue, Aug 14, 2012 at 12:06 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> On Thu, Aug 2, 2012 at 5:12 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>
>> Further to that work, I updated some older code for a JAGGY
>> sigil, and also an OCTO sigil (names open to suggestions),
>> which are on my gd-sigils branch which has documentation
>> in the tutorial, including this image of the expanded sigil set:
>> https://github.com/peterjc/biopython/blob/e09e264dd73953554609498c15b67d86686592fb/Doc/images/GD_sigils.png
>
> [...]
>
> Do people think (either of) these two sigils are worth adding
> to the main branch?

Yes - I do.

L.

--
Dr Leighton Pritchard
Information and Computing Sciences Group; Weeds, Pests and Diseases Theme
DG31, James Hutton Institute (Dundee)
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:leighton.pritchard at hutton.ac.uk       w:http://www.hutton.ac.uk/staff/leighton-pritchard
gpg/pgp: 0xFEFC205C tel: +44(0)844 928 5428 x8827 or +44(0)1382 568827


________________________________________________________

This email is from the James Hutton Institute, however the views
expressed by the sender are not necessarily the views of the James Hutton
Institute and its subsidiaries. This email and any attachments are confidential and 
are intended solely for the use of the recipient(s) to whom they are addressed.
If you are not the intended recipient, you should not read, copy, disclose or rely on 
any information contained in this email, and we would ask you to contact the 
sender immediately and delete the email from your system.  Although the James 
Hutton Institute has taken reasonable precautions to ensure no viruses are present 
in this email, neither the Institute nor the sender accepts any responsibility for any 
viruses, and it is your responsibility to scan the email and any attachments.

The James Hutton Institute is a Scottish charitable company limited by guarantee.
Registered in Scotland No. SC374831
Registered Office: The James Hutton Institute, Invergowrie Dundee DD2 5DA. 
Charity No. SC041796

From p.j.a.cock at googlemail.com  Thu Aug 30 06:18:57 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 30 Aug 2012 11:18:57 +0100
Subject: [Biopython-dev] Genome Diagram Sigils, was: Default Behavior
In-Reply-To: <E33884CD-F69C-48DA-8AE6-B6A31644FA81@hutton.ac.uk>
References: <CAKVJ-_7XxVStvWrte03kQ00HwVD771EdzV8PN3dSTtSM2p0fsA@mail.gmail.com>
	<CAKVJ-_7h_JYG+bDs20_A=YYrGsTQPfyWtWB4pzY4REanYr7haQ@mail.gmail.com>
	<E33884CD-F69C-48DA-8AE6-B6A31644FA81@hutton.ac.uk>
Message-ID: <CAKVJ-_6vrwse+6vtrdGBEnjbabb4==Yr++uL4cis2kji=vDqnw@mail.gmail.com>

On Thu, Aug 30, 2012 at 9:51 AM, Leighton Pritchard
<Leighton.Pritchard at hutton.ac.uk> wrote:
>
> On 30 Aug 2012, at Thursday, August 30, 09:16, Peter Cock wrote:
>> Do people think (either of) these two sigils are worth adding
>> to the main branch?
>
> Yes - I do.
>
> L.

Done. Branch rebased and applied to master.

Peter

From p.j.a.cock at googlemail.com  Thu Aug 30 07:46:05 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 30 Aug 2012 12:46:05 +0100
Subject: [Biopython-dev] Genome Diagram Sigils, was: Default Behavior
In-Reply-To: <CAKVJ-_6vrwse+6vtrdGBEnjbabb4==Yr++uL4cis2kji=vDqnw@mail.gmail.com>
References: <CAKVJ-_7XxVStvWrte03kQ00HwVD771EdzV8PN3dSTtSM2p0fsA@mail.gmail.com>
	<CAKVJ-_7h_JYG+bDs20_A=YYrGsTQPfyWtWB4pzY4REanYr7haQ@mail.gmail.com>
	<E33884CD-F69C-48DA-8AE6-B6A31644FA81@hutton.ac.uk>
	<CAKVJ-_6vrwse+6vtrdGBEnjbabb4==Yr++uL4cis2kji=vDqnw@mail.gmail.com>
Message-ID: <CAKVJ-_70t56rm+7LYcC4TU6jhdFK_UzULgSOhS44=_iEisbY5A@mail.gmail.com>

On Thu, Aug 30, 2012 at 11:18 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Aug 30, 2012 at 9:51 AM, Leighton Pritchard
> <Leighton.Pritchard at hutton.ac.uk> wrote:
>>
>> On 30 Aug 2012, at Thursday, August 30, 09:16, Peter Cock wrote:
>>> Do people think (either of) these two sigils are worth adding
>>> to the main branch?
>>
>> Yes - I do.
>>
>> L.
>
> Done. Branch rebased and applied to master.
>
> Peter

And you can see the example in the Tutorial here,
http://biopython.org/DIST/docs/tutorial/Tutorial-dev.html#sec:gd_sigils

(These sigils all work on circular diagrams too, see
the examples made by test_GenomeDiagram.py)

Peter

From zcharlop at mail.rockefeller.edu  Wed Aug  1 00:37:27 2012
From: zcharlop at mail.rockefeller.edu (Zachary Charlop-Powers)
Date: Wed, 1 Aug 2012 00:37:27 +0000
Subject: [Biopython-dev] Genome Diagram Default Behavior
Message-ID: <2054694E-0D60-4F16-A7EE-ABC8AD59F344@rockefeller.edu>

Hello Biopython,

I am writing about a small feature that I would like to see implemented
(and could possibly help to implement it: I haven't contributed before and
am not sure exactly how tough this will be).  When using Genome Diagram to
draw features you can specify which strand to put a feature on. If the
strand is positive it will go above the track in the positive-facing
direction and if negative it will go below the track in the negative facing
direction (see http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc200). That's a
great behavior. However if you use  strand="None", Genome Diagram will draw
the features inline with the track and always in the positive direction.
For myself, and probably others, keeping the direction of the features is
immensely useful as you can often get a sense of operon structure in
prokaryote genomes just by looking at the genes. Of course the forward and
the minus strands can be drawn but condensing small sections of genes to a
single track saves space when making images.

So, would it be possible to change the default behavior of Genome Diagram
to draw features inline (strand="None"), but to preserve their orientation?

best,
zach cp


From p.j.a.cock at googlemail.com  Wed Aug  1 09:27:14 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 1 Aug 2012 10:27:14 +0100
Subject: [Biopython-dev] Genome Diagram Default Behavior
In-Reply-To: <2054694E-0D60-4F16-A7EE-ABC8AD59F344@rockefeller.edu>
References: <2054694E-0D60-4F16-A7EE-ABC8AD59F344@rockefeller.edu>
Message-ID: <CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>

On Wed, Aug 1, 2012 at 1:37 AM, Zachary Charlop-Powers wrote:
> Hello Biopython,
>
> I am writing about a small feature that I would like to see implemented
> (and could possibly help to implement it: I haven't contributed before and
> am not sure exactly how tough this will be).  When using Genome Diagram to
> draw features you can specify which strand to put a feature on. If the
> strand is positive it will go above the track in the positive-facing
> direction and if negative it will go below the track in the negative facing
> direction (see http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc200). That's a
> great behavior.

Yep - all fine so far.

> However if you use  strand="None", Genome Diagram will draw
> the features inline with the track and always in the positive direction.
> For myself, and probably others, keeping the direction of the features is
> immensely useful as you can often get a sense of operon structure in
> prokaryote genomes just by looking at the genes. Of course the forward and
> the minus strands can be drawn but condensing small sections of genes to a
> single track saves space when making images.
>
> So, would it be possible to change the default behavior of Genome Diagram
> to draw features inline (strand="None"), but to preserve their orientation?

I think I know what you mean - that kind of picture is quite common
e.g. for viruses - but only where there are no overlapping genes
on opposite strands. GenomeDiagram was written originally primarily
for bacteria, where overlapping genes on opposite strands are more
common, which may explain the design choices made.

Currently strand controls both orientation (for arrows, no effect on
box sigils) and vertical placement (above, below, or straddling the
line). Basically you want to override the vertical placement only?
Note this is sigil dependent - it makes sense for the arrow, but not
the default box (which was originally the only sigil supported).

The good news is the underlying drawing code can do this - the
arrow drawing is just given a bounding box and the requested
orientation (left or right) argument set by the get_feature_sigil
method of the LinearDrawer or CircularDrawer.

If you need this right now, a careful hack in get_feature_sigil is
the way to proceed.

The question is how to most cleanly expose this to the user while
not breaking anything else (e.g. cross links), and ideally allow for
a related option which Leighton and I have considered (but not
had a pressing need to implement) for frame specific placement.
i.e. Rather than treating the vertical drawing spaces as two regions
(above the axis line for the forward strand, below the line for the
reverse strand), treat it as six regions (three frames above and
below the axis line). I'm picturing something a bit like the view
in the Artemis annotation editor.

One question which constrains this design choice is would you
want to mix these placements on the same track? I think yes -
using plain strandless BOX features (at the bottom of the z-order
stack) is a really useful way to highlight a region of interest
(which could have multiple genes drawn on top of it).

That suggests this setting might be best at the GenomeDiagram
feature level. Perhaps a new attribute/argument 'strand_mode',
(a) ignore strand for vertical placement (what you want)
(b) divide vertical space in two (current behaviour)
(c) divide vertical space in six (frame specific placement)

Hmm. Leighton?

Peter

P.S. Frame specific placement would work best with an
overhaul of how we draw multi-fragment features like genes
with exons. Here a whole new sigil class for linking sub-parts
of a feature might make sense. That is again something we
only chatted about so far, but would make GenomeDiagram
more useful for drawing eukaryotic annotation.
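
The three 'strand_mode' options listed above could map onto vertical drawing regions roughly as in this sketch. Everything here is an assumption for illustration: the function name, the mode strings, the frame numbering (0, 1, 2 counted outwards from the axis) and the btm/ctr/top arguments, which just stand for the bottom, centre line and top of a track.

```python
def vertical_bounds(btm, ctr, top, strand, mode="two", frame=0):
    """Return (low, high) limits for a feature under a given placement mode.

    mode "ignore": use the full track height whatever the strand  (option a)
    mode "two":    forward above the axis, reverse below it       (option b)
    mode "six":    three frame bands on each side of the axis     (option c)
    Strandless features always straddle the axis.
    """
    if mode == "ignore" or strand is None:
        return (btm, top)
    if mode == "two":
        return (ctr, top) if strand == 1 else (btm, ctr)
    if mode == "six":
        if strand == 1:
            band = (top - ctr) / 3
            return (ctr + frame * band, ctr + (frame + 1) * band)
        band = (ctr - btm) / 3
        return (ctr - (frame + 1) * band, ctr - frame * band)
    raise ValueError("unknown strand_mode: %r" % (mode,))


forward_two = vertical_bounds(0, 3, 6, 1, mode="two")
reverse_flat = vertical_bounds(0, 3, 6, -1, mode="ignore")
forward_frame1 = vertical_bounds(0, 3, 6, 1, mode="six", frame=1)
```

Keeping the choice per feature, as suggested above, would let strandless highlight boxes and stranded genes share a track.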


From p.j.a.cock at googlemail.com  Wed Aug  1 10:43:59 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 1 Aug 2012 11:43:59 +0100
Subject: [Biopython-dev] back_table in Bio.Data.CodonTable
In-Reply-To: <CAKN_8Do6cHn8ZJyriNppqSyoRGRiY-JYiLzwxEyd1jh=PqHx7Q@mail.gmail.com>
References: <CAKN_8DrVV1RfWHXb4bJHF9LtrmnxKya4Y0r=iG6GMEafemUUrA@mail.gmail.com>
	<CAKVJ-_7prWz2M3OCGHYrk4a2P0GZYxSyJfacHjHovwQ2K3HNEA@mail.gmail.com>
	<CAKN_8Do6cHn8ZJyriNppqSyoRGRiY-JYiLzwxEyd1jh=PqHx7Q@mail.gmail.com>
Message-ID: <CAKVJ-_7daCMtiYULJjEcv-mzpKgC7pMAMpBdW3N75M104k4VNg@mail.gmail.com>

On Tue, Jul 31, 2012 at 8:07 PM, Jeff Hussmann <jeff.hussmann at gmail.com> wrote:
> It seems desirable to have each amino acid's list of codons be given
> in a deterministic order. I have been sorting lexicographically using
> the ordering 'TCAG'. This is referred to as the 'conventional
> ordering' in CodonTable.__str__.

Lexical sorting (i.e. using Python's sort on a list of codons) seems best,
it is simple and predictable.

> The most flexible solution would be
> to take the ordering from self.nucleotide_alphabet.letters, but this
> would give 'GATC' for any CodonTable using IUPAC.unambiguous_dna as
> its nucleotide alphabet. Are there any Biopython-wide conventions
> here?

I'm not sure why the alphabets used that particular order over another.

Peter
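
The difference between the orderings discussed above is easy to see on a small example. This is only a sketch: sort_codons is a hypothetical helper, not part of Bio.Data.CodonTable, and the six leucine codons of the standard genetic code are used as sample data.

```python
def sort_codons(codons, letters="TCAG"):
    """Sort codons position by position using the given base ordering."""
    rank = {base: i for i, base in enumerate(letters)}
    return sorted(codons, key=lambda codon: [rank[base] for base in codon])


leucine = ["CTG", "TTA", "CTA", "TTG", "CTT", "CTC"]

plain = sorted(leucine)                # Python's default string ordering
conventional = sort_codons(leucine)    # 'TCAG', as printed by CodonTable.__str__
alphabet = sort_codons(leucine, "GATC")  # ordering quoted for unambiguous DNA
```

All three are deterministic; they differ only in which order a reader sees.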


From Leighton.Pritchard at hutton.ac.uk  Wed Aug  1 10:53:19 2012
From: Leighton.Pritchard at hutton.ac.uk (Leighton Pritchard)
Date: Wed, 1 Aug 2012 10:53:19 +0000
Subject: [Biopython-dev] Genome Diagram Default Behavior
In-Reply-To: <CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>
References: <2054694E-0D60-4F16-A7EE-ABC8AD59F344@rockefeller.edu>
	<CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>
Message-ID: <089BCE07-D9CB-4657-800C-8E0ACABED1A9@hutton.ac.uk>

Hi all,

On 1 Aug 2012, at Wednesday, August 1, 10:27, Peter Cock wrote:

> On Wed, Aug 1, 2012 at 1:37 AM, Zachary Charlop-Powers wrote:
>> However if you use  strand="None", Genome Diagram will draw
>> the features inline with the track and always in the positive direction.
>> For myself, and probably others, keeping the direction of the features is
>> immensely useful as you can often get a sense of operon structure in
>> prokaryote genomes just by looking at the genes.

That's true. I find it easiest to identify operon structure in that way (i.e. visually and approximately) by noting where the features swap between positive and negative strands. Other approaches might include colouring positive/negative/None strand features differently.

>> Of course the forward and
>> the minus strands can be drawn but condensing small sections of genes to a
>> single track saves space when making images.

It doesn't, if the single track is the same height as before - what differs is whether the features on that track are half, or full, track height.

>> So, would it be possible to change the default behavior of Genome Diagram
>> to draw features inline (strand="None"), but to preserve their orientation?

I think there's a better way to get what you're after. Changing the default setting here would modify more than whether the arrow spans the whole track, and it would also mean that GenomeDiagram does not respect the strand data of features by default. I think that's a bad thing.

> I think I know what you mean - that kind of picture is quite common
> e.g. for viruses - but only where there are no overlapping genes
> on opposite strands. GenomeDiagram was written originally primarily
> for bacteria, where overlapping genes on opposite strands are more
> common, which may explain the design choices made.

My original choice was made for a combination of reasons:

- I wanted to respect the strand information in the source data
- The 'box' sigil was easiest to draw, and was the first to be available (this carries no inherent directional information as an image)

The overlapping gene issue is relevant but, since the resolution of a drawn image is often such that boxes slightly overlap even when there is no feature overlap, it didn't feature in my consideration.

> Currently strand controls both orientation (for arrows, no effect on
> box sigils) and vertical placement (above, below, or straddling the
> line). Basically you want to override the vertical placement only?
> Note this is sigil dependent - it makes sense for the arrow, but not
> the default box (which was originally the only sigil supported).

That's how I understand Zachary's suggestion: to draw an arrow with orientation preserved, but across the positive and negative strands of the track.

> The good news is the underlying drawing code can do this - the
> arrow drawing is just given a bounding box and the requested
> orientation (left or right) argument set by the get_feature_sigil
> method of the LinearDrawer or CircularDrawer.
>
> If you need this right now, a careful hack in get_feature_sigil is
> the way to proceed.
>
> The question is how to most cleanly expose this to the user while
> not breaking anything else (e.g. cross links), and ideally allow for
> a related option which Leighton and I have considered [...]

My original plan was to have more sigils available, implemented as draw_X() functions in the AbstractDrawer module. This would seem to be a good case for a draw_large_arrow() (or somesuch) function. The issue then would be a slight change to the prototypes for the existing draw_box and draw_arrow functions. Basically, we'd pass the overall bounding box and strand (x0, x1, btm, ctr, top, strand) information to the new functions, and let them decide where to place the sigil - above, below, or straddling the centre line.

Then, we could choose whether draw_arrow() takes an additional argument (e.g. straddle=True) for the behaviour that Zachary wants, or whether we use a new sigil ('large_arrow'), which could have its own function - just like that of draw_arrow() - but would probably be better implemented by just passing the straddle=True (or whatever) argument.

This way, the change is transparent to the user, except for perhaps choosing 'large_arrow' rather than 'arrow' as a sigil.
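
As a sketch of the proposal above, a sigil function could receive the whole bounding box plus the strand and decide for itself where to draw. The code below is not the AbstractDrawer code: arrow_outline, the straddle flag and head_fraction are illustrative names, and the arrow is simplified to a five-point polygon with no narrower shaft.

```python
def arrow_outline(x0, x1, btm, ctr, top, strand, straddle=False,
                  head_fraction=0.25):
    """Return polygon points for an arrow, choosing its own vertical placement.

    With straddle=True (or no strand) the arrow spans btm..top; otherwise it
    sits above the centre line for strand +1 and below it for strand -1.
    The arrow points left only when strand is -1.
    """
    if straddle or strand is None:
        low, high = btm, top
    elif strand == 1:
        low, high = ctr, top
    else:
        low, high = btm, ctr
    mid = (low + high) / 2
    head = (x1 - x0) * head_fraction
    if strand == -1:
        # Tip on the left-hand side.
        return [(x1, low), (x1, high), (x0 + head, high), (x0, mid),
                (x0 + head, low)]
    # Tip on the right-hand side.
    return [(x0, low), (x0, high), (x1 - head, high), (x1, mid),
            (x1 - head, low)]


half_height = arrow_outline(0, 8, 0, 2, 4, 1)                    # current 'arrow'
full_height = arrow_outline(0, 8, 0, 2, 4, -1, straddle=True)    # 'large_arrow'
```

A 'large_arrow' sigil would then be nothing more than the same function called with straddle=True, which is the thin-wrapper option described above.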

> That suggests this setting might be best at the GenomeDiagram
> feature level. Perhaps a new attribute/argument 'strand_mode',
> (a) ignore strand for vertical placement (what you want)
> (b) divide vertical space in two (current behaviour)
> (c) divide vertical space in six (frame specific placement)
>
> Hmm. Leighton?

I'm choosing to leave frame-specificity out of the discussion, for now ;)

Cheers,

L.

--
Dr Leighton Pritchard
Information and Computing Sciences Group; Weeds, Pests and Diseases Theme
DG31, James Hutton Institute (Dundee)
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:leighton.pritchard at hutton.ac.uk       w:http://www.hutton.ac.uk/staff/leighton-pritchard
gpg/pgp: 0xFEFC205C tel: +44(0)844 928 5428 x8827 or +44(0)1382 568827



From p.j.a.cock at googlemail.com  Wed Aug  1 11:05:51 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 1 Aug 2012 12:05:51 +0100
Subject: [Biopython-dev] Genome Diagram Default Behavior
In-Reply-To: <089BCE07-D9CB-4657-800C-8E0ACABED1A9@hutton.ac.uk>
References: <2054694E-0D60-4F16-A7EE-ABC8AD59F344@rockefeller.edu>
	<CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>
	<089BCE07-D9CB-4657-800C-8E0ACABED1A9@hutton.ac.uk>
Message-ID: <CAKVJ-_467QvsoA1ank0ArnE3Ef3BoV_ctz=e-d1dknBaGYPaHA@mail.gmail.com>

On Wed, Aug 1, 2012 at 11:53 AM, Leighton Pritchard
<Leighton.Pritchard at hutton.ac.uk> wrote:
>
> It doesn't, if the single track is the same height as before - what differs
> is whether the features on that track are half, or full, track height.

Yes, but once you've configured the arrows to straddle the axis, you can
then allocate less vertical space to that track. i.e. it needs less space.

>> The question is how to most cleanly expose this to the user while
>> not breaking anything else (e.g. cross links), and ideally allow for
>> a related option which Leighton and I have considered [...]
>
> My original plan was to have more sigils available, implemented as draw_X()
> functions in the AbstractDrawer module. This would seem to be a good case
> for a draw_large_arrow() (or somesuch) function. The issue then would be a
> slight change to the prototypes for the existing draw_box and draw_arrow
> functions. Basically, we'd pass the overall bounding box and strand (x0, x1,
> btm, ctr, top, strand) information to the new functions, and let them decide
> where to place the sigil - above, below, or straddling the centre line.
>
> Then, we could choose whether draw_arrow() takes an additional argument
> (e.g. straddle=True) for the behaviour that Zachary wants, or whether we use
> a new sigil ('large_arrow'), which could have its own function - just like
> that of draw_arrow() - but would probably be better implemented by just
> passing the straddle=True (or whatever) argument.
>
> This way, the change is transparent to the user, except for perhaps choosing
> 'large_arrow' rather than 'arrow' as a sigil.

That was another idea I was considering. Under this model, the sigils
could be given the full strand straddling bounding box, and decide if
they will use all of this (i.e. the new 'large_arrow', or the current sigils
when strand-less), or just half as in the stranded current 'arrow' and
'box' sigils where the strand is known.

That could work quite well, and the end user API is quite clean.

Peter


From Leighton.Pritchard at hutton.ac.uk  Wed Aug  1 11:23:48 2012
From: Leighton.Pritchard at hutton.ac.uk (Leighton Pritchard)
Date: Wed, 1 Aug 2012 11:23:48 +0000
Subject: [Biopython-dev] Genome Diagram Default Behavior
In-Reply-To: <CAKVJ-_467QvsoA1ank0ArnE3Ef3BoV_ctz=e-d1dknBaGYPaHA@mail.gmail.com>
References: <2054694E-0D60-4F16-A7EE-ABC8AD59F344@rockefeller.edu>
	<CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>
	<089BCE07-D9CB-4657-800C-8E0ACABED1A9@hutton.ac.uk>
	<CAKVJ-_467QvsoA1ank0ArnE3Ef3BoV_ctz=e-d1dknBaGYPaHA@mail.gmail.com>
Message-ID: <93ED1DEB-1C9B-4D34-A898-D326ED5F8C2F@hutton.ac.uk>


On 1 Aug 2012, at Wednesday, August 1, 12:05, Peter Cock wrote:

> On Wed, Aug 1, 2012 at 11:53 AM, Leighton Pritchard
> <Leighton.Pritchard at hutton.ac.uk> wrote:
>>
>> It doesn't, if the single track is the same height as before - what differs
>> is whether the features on that track are half, or full, track height.
>
> Yes, but once you've configured the arrows to straddle the axis, you can
> then allocate less vertical space to that track. i.e. it needs less space.

I understand that  - and maybe I'm being (over) pedantic - but you can allocate less vertical space to the track in either case: the question is what kind of feature representation gives you the desired information legibly at those settings ;)

> That was another idea I was considering. Under this model, the sigils
> could be given the full strand straddling bounding box, and decide if
> they will use all of this (i.e. the new 'large_arrow', or the current sigils
> when strand-less), or just half as in the stranded current 'arrow' and
> 'box' sigils where the strand is known.
>
> That could work quite well, and the end user API is quite clean.

This option gets my vote.

L.

--
Dr Leighton Pritchard
Information and Computing Sciences Group; Weeds, Pests and Diseases Theme
DG31, James Hutton Institute (Dundee)
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:leighton.pritchard at hutton.ac.uk       w:http://www.hutton.ac.uk/staff/leighton-pritchard
gpg/pgp: 0xFEFC205C tel: +44(0)844 928 5428 x8827 or +44(0)1382 568827



From zcharlop at mail.rockefeller.edu  Wed Aug  1 14:27:32 2012
From: zcharlop at mail.rockefeller.edu (Zachary Charlop-Powers)
Date: Wed, 1 Aug 2012 14:27:32 +0000
Subject: [Biopython-dev] Genome Diagram Default Behavior
In-Reply-To: <CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>
References: <2054694E-0D60-4F16-A7EE-ABC8AD59F344@rockefeller.edu>
	<CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>
Message-ID: <5D18E2FE-3756-44E1-9DB3-4BAC690DFD78@rockefeller.edu>

Leighton,
Peter,

I love that we're not in the same timezone; I ask a question when I leave work and - lo and behold - when I return in the morning there is a well thought out response.  Thank you both.


>> The good news is the underlying drawing code can do this - the
>> arrow drawing is just given a bounding box and the requested
>> orientation (left or right) argument set by the get_feature_sigil
>> method of the LinearDrawer or CircularDrawer.
>>
>> If you need this right now, a careful hack in get_feature_sigil is
>> the way to proceed.

I will take a look at this for a quick hack for some drawing I am working on.


>> That was another idea I was considering. Under this model, the sigils
>> could be given the full strand straddling bounding box, and decide if
>> they will use all of this (i.e. the new 'large_arrow', or the current sigils
>> when strand-less), or just half as in the stranded current 'arrow' and
>> 'box' sigils where the strand is known.
>>
>> That could work quite well, and the end user API is quite clean.
>
> This option gets my vote.
>
> L.


If you are both in agreement that this option is desirable and that it can be implemented in the sigil style, now we face the question of coding it. Would either of you consider working on it? If not this might be a problem I could tackle with a small amount of mentoring. Please let me know - I am happy to take a stab at it.

best regards,
zach cp


From p.j.a.cock at googlemail.com  Wed Aug  1 17:15:31 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 1 Aug 2012 18:15:31 +0100
Subject: [Biopython-dev] Genome Diagram Default Behavior
In-Reply-To: <5D18E2FE-3756-44E1-9DB3-4BAC690DFD78@rockefeller.edu>
References: <2054694E-0D60-4F16-A7EE-ABC8AD59F344@rockefeller.edu>
	<CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>
	<5D18E2FE-3756-44E1-9DB3-4BAC690DFD78@rockefeller.edu>
Message-ID: <CAKVJ-_4WETMxCxRzKExfqEAWQs_cjC6UewCniM=6V9hJp584yQ@mail.gmail.com>

On Wed, Aug 1, 2012 at 3:27 PM, Zachary Charlop-Powers
<zcharlop at mail.rockefeller.edu> wrote:
> Leighton,
> Peter,
>
> I love that we're not in the same timezone; I ask a question when I leave
> work and - lo and behold - when I return in the morning there is a well
> thought out response.  Thank you both.

:)

Peter wrote:
>>> The good news is the underlying drawing code can do this - the
>>> arrow drawing is just given a bounding box and the requested
>>> orientation (left or right) argument set by the get_feature_sigil
>>> method of the LinearDrawer or CircularDrawer.
>>>
>>> If you need this right now, a careful hack in get_feature_sigil is
>>> the way to proceed.

Zachary wrote:
> I will take a look at this for a quick hack for some drawing I am
> working on.

I hope you found any effort spent useful for understanding the
codebase... even if it doesn't turn out to be needed (see below).

Peter wrote:
>>> That was another idea I was considering. Under this model, the sigils
>>> could be given the full strand straddling bounding box, and decide if
>>> they will use all of this (i.e. the new 'large_arrow', or the current sigils
>>> when strand-less), or just half as in the stranded current 'arrow' and
>>> 'box' sigils where the strand is known.
>>>
>>> That could work quite well, and the end user API is quite clean.

Leighton wrote:
>> This option gets my vote.
>>
>> L.

Zachary wrote:
> If you are both in agreement that this option is desirable and that it can
> be implemented in the sigil style, now we face the question of coding it.
> Would either of you consider working on it? If not this might be a problem I
> could tackle with a small amount of mentoring. Please let me know - I am
> happy to take a stab at it.

I had a go this afternoon (a quiet moment between rushes - grin), and
it wasn't as bad as I feared. This is on a git branch at the moment,

https://github.com/peterjc/biopython/tree/gd-big

Thus far, just two commits. The first refactors the current code to move
the strand handling into the sigil code (but should, I hope, have no
side effects):

https://github.com/peterjc/biopython/commit/d9c416be7dd2c7081bd66bd553c9feb0174ecc13

The second commit implements the new axis straddling arrow (for
both linear and circular diagrams) plus a minimal test:

https://github.com/peterjc/biopython/commit/b58903d5c455416028a8ae410b2063d536448d59

To match the current sigil argument names BOX and ARROW, I have
provisionally called it BIGARROW. Any better ideas?

Also, to match the current arrow's behaviour, strand-less features
get an arrow pointing to the right (like a forward strand arrow).
Leighton and I had a little debate about this - with hindsight, the
original arrow sigil might have raised an error or drawn a box in
this situation - but I'm not willing to change this and break existing
code.
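
The rule described in this message (each sigil is handed the full
strand-straddling box and decides how much of it to use) can be
sketched in plain Python. This is not the GenomeDiagram code from the
gd-big branch: the function name sigil_extent, its arguments and its
return layout are hypothetical and only illustrate the behaviour as
stated here, including the choice that strand-less features point right.

```python
def sigil_extent(bottom, center, top, strand, sigil):
    """Return (y_low, y_high, orientation) for one feature.

    Hypothetical helper, not part of Biopython. bottom/center/top describe
    the full strand-straddling track box; strand is +1, -1 or None.
    """
    if sigil not in ("BOX", "ARROW", "BIGARROW"):
        raise ValueError("Unknown sigil: %r" % sigil)
    if strand not in (1, -1, None):
        raise ValueError("strand must be +1, -1 or None")
    # Strand-less features point right, like forward-strand arrows.
    orientation = "left" if strand == -1 else "right"
    if sigil == "BIGARROW" or strand is None:
        # Use the whole box, straddling the axis.
        return bottom, top, orientation
    if strand == 1:
        return center, top, orientation  # upper half only
    return bottom, center, orientation  # lower half only
```

Keeping this decision inside the sigil code is what allows a new sigil
to opt in to the full box without changing what ARROW and BOX draw.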

It would be great if you (Zachary) could give this a test, both to look
for regressions (anything that broke) and try the new sigil out. Are
you familiar with git, and installing Biopython from source?

Regards,

Peter


From zcharlop at mail.rockefeller.edu  Wed Aug  1 22:10:55 2012
From: zcharlop at mail.rockefeller.edu (Zachary Charlop-Powers)
Date: Wed, 1 Aug 2012 22:10:55 +0000
Subject: [Biopython-dev] Genome Diagram Default Behavior
In-Reply-To: <CAKVJ-_4WETMxCxRzKExfqEAWQs_cjC6UewCniM=6V9hJp584yQ@mail.gmail.com>
References: <2054694E-0D60-4F16-A7EE-ABC8AD59F344@rockefeller.edu>
	<CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>
	<5D18E2FE-3756-44E1-9DB3-4BAC690DFD78@rockefeller.edu>
	<CAKVJ-_4WETMxCxRzKExfqEAWQs_cjC6UewCniM=6V9hJp584yQ@mail.gmail.com>
Message-ID: <4A304FFF-48C9-42D8-9A26-1A9FBDA9AAA7@rockefeller.edu>

Peter wrote:
> It would be great if you (Zachary) could give this a test, both to look
> for regressions (anything that broke) and try the new sigil out. Are
> you familiar with git, and installing Biopython from source?


Just reran my previous image-generation scripts with your BioPython. I used sigil="BIGARROW" instead of "ARROW" and it worked like a charm.  Awesome.

Would you want to add the "BIGARROW" option to the tutorial?


best,
zach cp


From p.j.a.cock at googlemail.com  Wed Aug  1 22:33:14 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 1 Aug 2012 23:33:14 +0100
Subject: [Biopython-dev] Genome Diagram Default Behavior
In-Reply-To: <4A304FFF-48C9-42D8-9A26-1A9FBDA9AAA7@rockefeller.edu>
References: <2054694E-0D60-4F16-A7EE-ABC8AD59F344@rockefeller.edu>
	<CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>
	<5D18E2FE-3756-44E1-9DB3-4BAC690DFD78@rockefeller.edu>
	<CAKVJ-_4WETMxCxRzKExfqEAWQs_cjC6UewCniM=6V9hJp584yQ@mail.gmail.com>
	<4A304FFF-48C9-42D8-9A26-1A9FBDA9AAA7@rockefeller.edu>
Message-ID: <CAKVJ-_7i+20yAtHjnLjjFGAsM0smTY4RGdmNhi1Ws19NL7nrZA@mail.gmail.com>

On Wed, Aug 1, 2012 at 11:10 PM, Zachary Charlop-Powers wrote:
>> Peter wrote:
>>
>> It would be great if you (Zachary) could give this a test, both to look
>> for regressions (anything that broke) and try the new sigil out. Are
>> you familiar with git, and installing Biopython from source?
>>
>
> Just reran my previous image-generation scripts with your BioPython.
> I used sigil="BIGARROW" instead of "ARROW" and it worked like a
> charm.  Awesome.

Great. Thanks for quickly testing this.

>
> Would you want to add the "BIGARROW" option to the tutorial?
>

Yes, if/when we merge this (and I'll try to talk to Leighton
about it tomorrow), then I would also want to update the
Tutorial to describe this new feature. There is almost no point
writing new code if we don't document it.

Peter


From tiagoantao at gmail.com  Thu Aug  2 03:39:43 2012
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 1 Aug 2012 20:39:43 -0700
Subject: [Biopython-dev] Away Re: buildbot failure in Biopython on Linux -
	Python 3.1
Message-ID: <CAA9RGEP545zevW7qaUsTqR-qFfWS+bbKJSbKCj+mwhiZufvFzw@mail.gmail.com>

I am currently away from office. I will respond back on the 20th of August.

Regards,
Tiago

-- 
"Liberty for wolves is death to the lambs" - Isaiah Berlin


From Leighton.Pritchard at hutton.ac.uk  Thu Aug  2 07:42:47 2012
From: Leighton.Pritchard at hutton.ac.uk (Leighton Pritchard)
Date: Thu, 2 Aug 2012 07:42:47 +0000
Subject: [Biopython-dev] Genome Diagram Default Behavior
In-Reply-To: <CAKVJ-_4WETMxCxRzKExfqEAWQs_cjC6UewCniM=6V9hJp584yQ@mail.gmail.com>
References: <2054694E-0D60-4F16-A7EE-ABC8AD59F344@rockefeller.edu>
	<CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>
	<5D18E2FE-3756-44E1-9DB3-4BAC690DFD78@rockefeller.edu>
	<CAKVJ-_4WETMxCxRzKExfqEAWQs_cjC6UewCniM=6V9hJp584yQ@mail.gmail.com>
Message-ID: <ED9ED5E9-F108-470F-B825-145BE4F039B5@hutton.ac.uk>

Hi,

On Wednesday, 1 Aug 2012, at 18:15, Peter Cock wrote:

> On Wed, Aug 1, 2012 at 3:27 PM, Zachary Charlop-Powers
> <zcharlop at mail.rockefeller.edu> wrote:
>> Leighton,
>> Peter,
>>
>> I love that we're not in the same timezone; I ask a question when I leave
>> work and - lo and behold - when I return in the morning there is a well
>> thought out response.  Thank you both.

No worries.

> I had a go this afternoon (a quiet moment between rushes - grin),

Good job getting it done so quickly!

> and it wasn't as bad as I feared.
> [...]
> To match the current sigil argument names BOX and ARROW, I have
> provisionally called it BIGARROW. Any better ideas?

BIGARROW sounds fine to me. I like literal names.

> Leighton and I had a little debate about this - with hindsight, the
> original arrow sigil might have raised an error or drawn a box in
> this situation - but I'm not willing to change this and break existing
> code.

Likewise - now it's been there so long, I think it would be inconsistent at this point to change it. Arguably, the default setting has to choose a direction simply because (single-headed) arrows have a direction. For those figures where you're being precise, users can use a box for a feature with no direction; if it's pointing the wrong way, users can set the feature strand. Left-to-right as a default is arbitrary, though.

Cheers,

L.

--
Dr Leighton Pritchard
Information and Computing Sciences Group; Weeds, Pests and Diseases Theme
DG31, James Hutton Institute (Dundee)
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:leighton.pritchard at hutton.ac.uk       w:http://www.hutton.ac.uk/staff/leighton-pritchard
gpg/pgp: 0xFEFC205C tel: +44(0)844 928 5428 x8827 or +44(0)1382 568827




From p.j.a.cock at googlemail.com  Thu Aug  2 16:12:54 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 2 Aug 2012 17:12:54 +0100
Subject: [Biopython-dev] Genome Diagram Default Behavior
In-Reply-To: <ED9ED5E9-F108-470F-B825-145BE4F039B5@hutton.ac.uk>
References: <2054694E-0D60-4F16-A7EE-ABC8AD59F344@rockefeller.edu>
	<CAKVJ-_5R8SjK3QCmc48UXnDMy5vdsYO_UgZ8W0EH3inPWq6jtA@mail.gmail.com>
	<5D18E2FE-3756-44E1-9DB3-4BAC690DFD78@rockefeller.edu>
	<CAKVJ-_4WETMxCxRzKExfqEAWQs_cjC6UewCniM=6V9hJp584yQ@mail.gmail.com>
	<ED9ED5E9-F108-470F-B825-145BE4F039B5@hutton.ac.uk>
Message-ID: <CAKVJ-_5s_MutuaxysAuyk6Xnfoeih6E-1a50MqnvboUBqjSMWA@mail.gmail.com>

On Thu, Aug 2, 2012 at 8:42 AM, Leighton Pritchard
<Leighton.Pritchard at hutton.ac.uk> wrote:
>Peter wrote:
>>
>> To match the current sigil argument names BOX and ARROW, I have
>> provisionally called it BIGARROW. Any better ideas?
>>
>
> BIGARROW sounds fine to me. I like literal names.
>

Great. Checked into the master, and I updated the Tutorial and
the Proux et al 2002 Figure 6 reproduction example to use this:

Before (cross-links with strand specific ARROW sigil):
http://biopython.org/DIST/docs/tutorial/images/three_track_cl2.png

After (cross-links with strand straddling BIGARROW sigil):
http://biopython.org/DIST/docs/tutorial/images/three_track_cl2a.png

Original (I don't know what was used to draw this):
http://dx.doi.org/10.1128/JB.184.21.6026-6036.2002

Regards,

Peter


From clements at galaxyproject.org  Fri Aug  3 23:23:25 2012
From: clements at galaxyproject.org (Dave Clements)
Date: Fri, 3 Aug 2012 16:23:25 -0700
Subject: [Biopython-dev] Galaxy is Hiring Postdocs
Message-ID: <CA+He-X9fND6V3MjszsFfJPotiRMobKgv2g_wjxTW=2aa+Z4gxg@mail.gmail.com>

Hello all,

The Galaxy Project <http://galaxyproject.org/>, a highly successful high
throughput data analysis platform for Life Sciences with over 23,000 users
worldwide <http://wiki.g2.bx.psu.edu/Galaxy%20Project/Statistics>, is
hiring:

The Taylor Lab <http://bx.mathcs.emory.edu/> in Biology
<http://www.biology.emory.edu> and Mathematics & Computer Science
<http://www.mathcs.emory.edu> at Emory University <http://emory.edu/> is
looking for *postdoctoral scholars* to work on the Galaxy Project.
Postdoctoral applicants should have expertise in Bioinformatics and
Computational Biology and research interests that complement but extend
the lab's current interests <http://bx.mathcs.emory.edu/research/>:
The Galaxy project; distributed and high-performance computing for data
intensive science; vertebrate functional genomics; and genomics and
epigenomic mechanisms of gene regulation, the role of transcription factors
and chromatin structure in global gene expression, development, and
differentiation. See the announcement for full details
(http://bx.mathcs.emory.edu/joining/postdocs/).

The Nekrutenko Lab <http://www.bx.psu.edu/%7Eanton/> at the Huck Institutes
of Life Sciences <http://www.huck.psu.edu/> at Penn State <http://psu.edu/>
is seeking *highly opinionated and biologically inclined* *Postdoctoral
researchers* within the Galaxy Project to develop best practices for
analysis of next-generation sequencing data in all areas of Life Sciences
where NGS is used. Successful candidates will join a vibrant research group
at the core of the Galaxy Project and will work on setting trends in modern
data-driven life-sciences.
Please send your CV and names/e-mail addresses of three references to
jobs at galaxyproject.org.

Thanks,

Dave C.
-- 
http://galaxyproject.org/wiki/GCC2012
http://galaxyproject.org/
http://getgalaxy.org/
http://usegalaxy.org/
http://galaxyproject.org/wiki/


From arklenna at gmail.com  Tue Aug  7 05:11:04 2012
From: arklenna at gmail.com (Lenna Peterson)
Date: Tue, 7 Aug 2012 01:11:04 -0400
Subject: [Biopython-dev] GSoC python variant update
Message-ID: <CALfq9tJxsTqkq3E6KN-iirEns7FhC2G7aXEX+4MNm+WTPiXw2Q@mail.gmail.com>

Full post: http://arklenna.tumblr.com/post/28890255191/

Summary:

* I'm working on the coordinate mapper Reece contributed:
http://biopython.org/pipermail/biopython/2010-June/006598.html

* I'm representing intron locations relative to CDS coords using the
HGVS standards: http://www.hgvs.org/mutnomen/refseq_figure.html
I'd like to know if there are other common ways of representing such positions.

* In order to customize the display of positions (e.g. 0-based or
1-based), I'm using a class as a configuration container. I've read on
StackOverflow that attempts to use globals or a singleton class are
discouraged in Python, but I have not found practical suggestions for
how to implement module-wide configurations. Suggestions are welcome.

* Any advice about circular genomes or strandedness is also welcome.

* This mapper will work for SeqRecords, SeqFeatures, FeatureLocations,
etc. Are there other Biopython objects that store sequence coordinates
and thus should be mappable?
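
The "class as a configuration container" idea in the third bullet could
look roughly like the sketch below. The class name MapperConfig and its
one_based attribute are hypothetical (they are not taken from the GSoC
branch); the only assumption carried over from the post is that
positions are stored 0-based and converted for display.

```python
class MapperConfig(object):
    """Hypothetical container for display settings (not a Biopython class)."""

    def __init__(self, one_based=False):
        self.one_based = one_based

    def display(self, position):
        """Format an internally 0-based position for output."""
        return str(position + 1 if self.one_based else position)


# Two independent configurations can coexist, unlike a single global flag.
python_style = MapperConfig()
hgvs_style = MapperConfig(one_based=True)
```

Passing a MapperConfig instance to the code that prints positions keeps
the stored coordinates untouched; only the rendering changes.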

Regards,

Lenna


From mjldehoon at yahoo.com  Tue Aug  7 06:40:13 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 6 Aug 2012 23:40:13 -0700 (PDT)
Subject: [Biopython-dev] Parsing TRANSFAC matrices with Bio.Motif
Message-ID: <1344321613.96095.YahooMailClassic@web164001.mail.gq1.yahoo.com>

Dear all,

Currently Bio.Motif has some support for writing TRANSFAC files but not for reading TRANSFAC files. I would like to add such a parser to Bio.Motif. Do you all agree that it fits in this module? Note that the TRANSFAC files very much look like EMBL files, and therefore contain much more information than what is currently in a Bio.Motif._Motif.Motif object (the object to be generated by Bio.Motif.read(handle, "transfac")). Perhaps the easiest is to add an attribute .annotations to Bio.Motif._Motif.Motif objects, and use it as a dictionary to store the EMBL-like annotations under their 2-letter keys.
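
A minimal sketch of the proposed .annotations dictionary, assuming an
EMBL-like layout in which every line starts with a two-letter code.
The function read_annotations and the example record are invented for
illustration; this is not the parser that was later added to Biopython,
and storing values as lists is just one way to cope with repeated keys.

```python
def read_annotations(lines):
    """Collect 'XX  value' style lines into a dict keyed by 2-letter code."""
    annotations = {}
    for line in lines:
        line = line.rstrip()
        key, value = line[:2], line[2:].strip()
        if key == "//":
            break  # end of record
        if not line or key == "XX":
            continue  # blank or separator line
        if key == "P0" or key.isdigit():
            continue  # matrix header and count rows are not annotations
        # Repeated keys accumulate in a list under the same code.
        annotations.setdefault(key, []).append(value)
    return annotations


# Invented record laid out in the TRANSFAC matrix style.
example = """AC  M99999
XX
ID  EXAMPLE_01
XX
P0      A      C      G      T
01      1      2      2      0      S
02      2      1      2      0      R
XX
//
""".splitlines()
```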

On a related note, currently Bio.Motif._Motif.Motif objects also perform functions that are more appropriate for a separate PWM (position-weight matrix) class within Bio.Motif. It may be a good idea to have a separate PWM class for this functionality.

Best,
-Michiel.


From bartek at rezolwenta.eu.org  Tue Aug  7 07:18:43 2012
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Tue, 7 Aug 2012 09:18:43 +0200
Subject: [Biopython-dev] Parsing TRANSFAC matrices with Bio.Motif
In-Reply-To: <1344321613.96095.YahooMailClassic@web164001.mail.gq1.yahoo.com>
References: <1344321613.96095.YahooMailClassic@web164001.mail.gq1.yahoo.com>
Message-ID: <CABHxouXrpENNt3DFjJ1jUu+tDfC8z7=n1GdXXkAtU-snVNwcGQ@mail.gmail.com>

Hi Michiel,

On Tue, Aug 7, 2012 at 8:40 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Dear all,
>
> Currently Bio.Motif has some support for writing TRANSFAC files but not for reading TRANSFAC files. I would like to add such a parser to Bio.Motif. Do you all agree that it fits in this module? Note that the TRANSFAC files very much look like EMBL files, and therefore contain much more information than what is currently in a Bio.Motif._Motif.Motif object (the object to be generated by Bio.Motif.read(handle, "transfac")). Perhaps the easiest is to add an attribute .annotations to Bio.Motif._Motif.Motif objects, and use it as a dictionary to store the EMBL-like annotations under their 2-letter keys.
>
That would certainly be a valuable addition. I didn't add it as a
format because it might get a bit confusing for users. TRANSFAC
itself (trademarked, afaik) is distributed by the BIObase company and
is not available unless you pay them for a license (you have to register
even for the "publicly available" one, which comes with a license too).
If you do, then you get access to a number of interconnected datasets,
including information about what they call "matrices", "sites",
"transcription factors" and "classes". I think that if we want to
support their filetypes, we probably should think about whether we should
support the matrix file only or maybe the other ones as well. The
confusing part is that many programs use "transfac-like" formats, i.e.
files very similar to the part in the "matrix" file that corresponds
to the PWM itself. (For example see
http://www.benoslab.pitt.edu/stamp/help.html).

> On a related note, currently Bio.Motif._Motif.Motif objects also perform functions that are more appropriate for a separate PWM (position-weight matrix) class within Bio.Motif. It may be a good idea to have a separate PWM class for this functionality.

Currently, the Bio.Motif.Motif class represents something sequence-like.
It can be seen either as a set of instances (.add_instance(),
.search_instance()) or as a PWM (.log_odds(), .search_pwm(), etc.). It
can hold some annotation (i.e. name etc.); however, in my mind, it
is the core of the functionality for "motif" analysis. I can imagine
other types of motifs (we discussed regExp or HMM based motifs) that
could subclass Motif, but I think this should be the role of the Motif
class. Then comes the thing with annotations. I would rather vote for
something more similar to SeqRecord and Seq, where a new class
(MotifRecord?) would hold all the annotation data from TRANSFAC or
somesuch DB, and the Motif would remain more sequence-like. With
respect to moving the PWM-related functionality to a separate class,
I'm not sure. I think it is valuable to be able to load instances from
a file and then convert them to a PWM. It could be done with separate
classes, but I'm not sure it would be easier then...

best
Bartek
-- 
Bartek Wilczynski


From mjldehoon at yahoo.com  Tue Aug  7 08:39:15 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 7 Aug 2012 01:39:15 -0700 (PDT)
Subject: [Biopython-dev] Parsing TRANSFAC matrices with Bio.Motif
In-Reply-To: <CABHxouXrpENNt3DFjJ1jUu+tDfC8z7=n1GdXXkAtU-snVNwcGQ@mail.gmail.com>
Message-ID: <1344328755.85288.YahooMailClassic@web164002.mail.gq1.yahoo.com>

Hi Bartek,

Thanks for your reply.

--- On Tue, 8/7/12, Bartek Wilczynski <bartek at rezolwenta.eu.org> wrote:
> If you do, then you get access to a number of interconnected
> datasets, including information about what they call "matrices",
> "sites" and "transcription factors" and "classes". I think that if
> we want to support their filetypes, we probably should think whether
> we should support the matrix file only or maybe the other ones as
> well.

I would suggest to just support the matrices for now.

> The confusing part is that many programs use "transfac-like"
> formats, i.e. files very similar to the part in the "matrix"
> file that corresponds to the PWM itself. (For example see
> http://www.benoslab.pitt.edu/stamp/help.html).

This also means that if Bio.Motif can parse TRANSFAC files, then it
can parse the transfac-like formats, at least to some degree.
Personally I am actually more interested in the SwissRegulon database,
which uses a transfac-like format.

> Then comes the thing with annotations. I would rather
> vote for something more similar to SeqRecord and Seq,
> where a new class (MotifRecord?) would hold all the
> annotation data from TRANSFAC or somesuch DB, and the
> Motif would remain more sequence-like.

Are you suggesting that MotifRecord subclasses Bio.Motif._Motif.Motif?
For example we could have a Bio.Motif.Parsers.TRANSFAC.Motif class that subclasses Bio.Motif._Motif.Motif. Then Bio.Motif._Motif.Motif remains sequence-like, and Bio.Motif.Parsers.TRANSFAC.Motif takes care of the annotations.

Alternatively we could say that Bio.Motif.Parsers.TRANSFAC.read returns a Bio.Motif.Parsers.TRANSFAC.Record object that contains the motif information as an attribute (so record.motif would be an instance of Bio.Motif._Motif.Motif).
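
The two alternatives above can be put side by side in a short sketch.
All three classes here are hypothetical stand-ins written for this
comparison (the real Bio.Motif._Motif.Motif has a much larger
interface); they only show where the annotations would live in each
design.

```python
class Motif(object):
    """Stand-in for the sequence-like motif class (hypothetical, minimal)."""

    def __init__(self, instances=None):
        self.instances = list(instances or [])


# Option 1: a parser-specific subclass carries the annotations itself.
class TransfacMotif(Motif):
    def __init__(self, instances=None, annotations=None):
        Motif.__init__(self, instances)
        self.annotations = dict(annotations or {})


# Option 2: a separate record wraps a plain Motif, as SeqRecord wraps Seq.
class TransfacRecord(object):
    def __init__(self, motif, annotations=None):
        self.motif = motif
        self.annotations = dict(annotations or {})


sub = TransfacMotif(["ACGT"], {"ID": "EXAMPLE_01"})
rec = TransfacRecord(Motif(["ACGT"]), {"ID": "EXAMPLE_01"})
```

With option 1 the parsed object can be used anywhere a Motif is
expected; with option 2 callers reach the motif through rec.motif.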

Best,
-Michiel


From mjldehoon at yahoo.com  Tue Aug  7 14:47:00 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 7 Aug 2012 07:47:00 -0700 (PDT)
Subject: [Biopython-dev] Fw: Re:  Parsing TRANSFAC matrices with Bio.Motif
Message-ID: <1344350820.11922.YahooMailClassic@web164006.mail.gq1.yahoo.com>

Forwarding Bartek's email to the list ..
I am pretty much OK with his suggestions, but feel free to comment or suggest other solutions before we start implementing this.

Best,
-Michiel.

--- On Tue, 8/7/12, Bartek Wilczynski <bartek at rezolwenta.eu.org> wrote:

> From: Bartek Wilczynski <bartek at rezolwenta.eu.org>
> Subject: Re: [Biopython-dev] Parsing TRANSFAC matrices with Bio.Motif
> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> Date: Tuesday, August 7, 2012, 5:16 AM
> On Tue, Aug 7, 2012 at 10:39 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> > Hi Bartek,
> >
> > Thanks for your reply.
> >
> > --- On Tue, 8/7/12, Bartek Wilczynski <bartek at rezolwenta.eu.org> wrote:
> >> If you do, then you get access to a number of interconnected
> >> datasets, including information about what they call "matrices",
> >> "sites" and "transcription factors" and "classes". I think that if
> >> we want to support their filetypes, we probably should think whether
> >> we should support the matrix file only or maybe the other ones as
> >> well.
> >
> > I would suggest to just support the matrices for now.
> >
> I'm fine with that. Some links between the files might be less
> useful, but that might be added later.
> 
> >> The confusing part is that many programs use "transfac-like"
> >> formats, i.e. files very similar to the part in the "matrix"
> >> file that corresponds to the PWM itself. (For example see
> >> http://www.benoslab.pitt.edu/stamp/help.html).
> >
> > This also means that if Bio.Motif can parse TRANSFAC files, then it
> > can parse the transfac-like formats, at least to some degree.
> > Personally I am actually more interested in the SwissRegulon
> > database, which uses a transfac-like format
> >
> 
> In principle yes, but there are slight variants making things "almost
> working". That's the main reason I didn't put the code I was using
> myself into the Biopython repository, as it might cause some weird
> breakages. For example, some formats drop the P0 column (the
> "transfac-like" in STAMP, for one) which makes it impossible to figure
> out whether you are interpreting the numbers right unless you agree on
> some ordering of nucleotides. I would suggest that we should support
> databases named directly and, maybe, think about generic methods for
> "raw PSSM" files, that would require the user to give the nucleotide
> order...
> 
> >> Then comes the thing with annotations. I would rather
> >> vote for something more similar to SeqRecord and Seq,
> >> where a new class (MotifRecord?) would hold all the
> >> annotation data from TRANSFAC or somesuch DB, and the
> >> Motif would remain more sequence-like.
> >
> > Are you suggesting that MotifRecord subclasses Bio.Motif._Motif.Motif?
> > For example we could have a Bio.Motif.Parsers.TRANSFAC.Motif class
> > that subclasses Bio.Motif._Motif.Motif. Then Bio.Motif._Motif.Motif
> > remains sequence-like, and Bio.Motif.Parsers.TRANSFAC.Motif takes
> > care of the annotations.
> >
> > Alternatively we could say that Bio.Motif.Parsers.TRANSFAC.read
> > returns a Bio.Motif.Parsers.TRANSFAC.Record object that contains the
> > motif information as an attribute (so record.motif would be an
> > instance of Bio.Motif._Motif.Motif).
> >
> >
> 
> For me, personally, the version where transfac motif is a subclass of
> Motif is a more useful one. It is simpler, and it adds annotations as
> attributes of a motif. However, if we decided that we want the whole
> TRANSFAC db with all its annotations, the more natural way would be
> to have separate classes for instances and motifs and maybe even
> separate record classes representing a database record (there might be
> more transfac records referencing the same matrix). I don't think that
> there is so much need for supporting all the stuff from TRANSFAC (I
> don't know anybody who would be using all their annotations, people
> seem to care only about matrices anyway) so I'd vote for the simpler
> way of subclassing Motif.
> 
> best
> Bartek
> -- 
> Bartek Wilczynski
> 
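
Bartek's point about files that omit the P0 header can be illustrated
with a small sketch of a "raw PSSM" reader that makes the caller state
the nucleotide order. The function parse_raw_counts and the bare
whitespace-separated layout are assumptions made for this example; no
specific program's format is being reproduced.

```python
def parse_raw_counts(lines, order):
    """Parse whitespace-separated count rows into per-nucleotide columns.

    `order` is the nucleotide order of the columns, supplied by the caller
    because the input has no P0 header row to state it.
    """
    counts = dict((letter, []) for letter in order)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(order):
            raise ValueError("Expected %d columns: %r" % (len(order), line))
        for letter, value in zip(order, fields):
            counts[letter].append(float(value))
    return counts


rows = ["1 2 2 0", "2 1 2 0"]
as_acgt = parse_raw_counts(rows, "ACGT")
as_actg = parse_raw_counts(rows, "ACTG")  # same numbers, different meaning
```

The same two rows give G counts of 2 under one ordering and 0 under the
other, which is the ambiguity described in the forwarded message.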


From w.arindrarto at gmail.com  Tue Aug  7 17:56:26 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 7 Aug 2012 19:56:26 +0200
Subject: [Biopython-dev] GSoC Project Update -- 11
Message-ID: <CADEGkF7Z46PKT2g28xyRPKYi0wKhNOpRm5+Q6wHz=KtKiBibBQ@mail.gmail.com>

Hello everyone,

I have just posted my latest update on my project here:
http://bow.web.id/blog/2012/08/back-on-the-main-branch/

It's been quite a while since I posted my last update because
there has been a considerable change to the SearchIO object model I'm
using. The details are in my blog post, but to keep it short, it was
because the previous model (QueryResult, Hit, and HSP) was inadequate
in handling files that have multiple sequences in their HSP (so far
seen in files output by BLAT and Exonerate). In my previous updates,
I've been using simple Python lists to store attributes related to
these multiple sequences, but that turned out to be problematic as it
may make the object have inconsistent attributes.

After trying out several different implementations and discussing them
with Peter, we've finally settled on a new model. The new model
changes the HSP object into a container that stores a new object:
HSPFragment. HSPFragment represents a single, contiguous alignment of
the hit and query sequence. It only stores the sequence, coordinates,
frames, and strands. Other attributes made by the search program (such
as evalues or scores) are stored in the HSP object.
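
The new container model can be sketched with two small classes. The
attribute names below are hypothetical and chosen for readability; they
are not claimed to match the SearchIO branch. The sketch only shows the
split described above: per-block data on the fragment, program-level
scores on the HSP.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HSPFragment:
    """One contiguous alignment block: sequences, coordinates, frame, strand."""
    query_seq: str
    hit_seq: str
    query_start: int
    query_end: int
    hit_start: int
    hit_end: int
    strand: int = 1
    frame: int = 0


@dataclass
class HSP:
    """Container of fragments plus the scores reported by the search program."""
    fragments: List[HSPFragment] = field(default_factory=list)
    evalue: Optional[float] = None
    score: Optional[float] = None

    def __len__(self):
        return len(self.fragments)

    def __iter__(self):
        return iter(self.fragments)


# A BLAT-like HSP made of two blocks that share a single score.
hsp = HSP(score=50.0)
hsp.fragments.append(HSPFragment("ACGT", "ACGT", 0, 4, 100, 104))
hsp.fragments.append(HSPFragment("GGCC", "GGCC", 4, 8, 200, 204))
```

Because every fragment carries its own coordinates, the parallel lists
mentioned above are no longer needed.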

This change required some modifications on all of the current parsers,
but from a user's perspective working with file formats other than
BLAT or Exonerate, the changes should be minimal.

Aside from this, there's also a small update on the main API which
lets it accept keyword arguments. The arguments modify behaviors of
the parser, and they are different for each parser. Currently, this is
only used by the BLAST tabular parser, but I imagine more parsers will
use this in the future.
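
One way a top-level function can forward parser-specific keyword
arguments is sketched below. The format name "toy-tab", the comments
flag and both functions are invented for this example and are not the
SearchIO API; the point is only the **kwargs hand-off.

```python
def parse_tabular(lines, comments=False):
    """Toy tabular parser; `comments` says whether '#' lines are expected."""
    for line in lines:
        if line.startswith("#"):
            if not comments:
                raise ValueError("Unexpected comment line: %r" % line)
            continue
        yield line.rstrip("\n").split("\t")


# Registry mapping format names to parser functions (hypothetical).
_PARSERS = {"toy-tab": parse_tabular}


def parse(lines, fmt, **kwargs):
    """Look up the parser for `fmt` and forward any extra keyword arguments."""
    try:
        parser = _PARSERS[fmt]
    except KeyError:
        raise ValueError("Unknown format: %r" % fmt)
    return parser(lines, **kwargs)
```

A keyword that a given parser does not accept fails with a TypeError at
the call, so unsupported options are not silently ignored.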

Finally, having settled on a firmer object model, I'll be spending the
rest of my time to focus on the documentation. There may still be
small fixes to the code, but I expect nothing as major as this one.

regards,
Bow


From chapmanb at 50mail.com  Wed Aug  8 13:55:36 2012
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 08 Aug 2012 09:55:36 -0400
Subject: [Biopython-dev] GSoC python variant update
In-Reply-To: <CALfq9tJxsTqkq3E6KN-iirEns7FhC2G7aXEX+4MNm+WTPiXw2Q@mail.gmail.com>
References: <CALfq9tJxsTqkq3E6KN-iirEns7FhC2G7aXEX+4MNm+WTPiXw2Q@mail.gmail.com>
Message-ID: <874nodh4iv.fsf@fastmail.fm>


Lenna;
This all sounds great and will be a nice practical addition to
Biopython. Thanks for taking it on. Some specific thoughts on your questions:

> * I'm representing intron locations relative to CDS coords using the
> HGVS standards: http://www.hgvs.org/mutnomen/refseq_figure.html
> I'd like to know if there are other common ways of representing such
> positions.

I don't know of one myself, so it's great to be following a standard
rather than reinventing something. Nice work.

> * In order to customize the display of positions (e.g. 0-based or
> 1-based), I'm using a class as a configuration container. I've read on
> StackOverflow that attempts to use globals or a singleton class are
> discouraged in Python, but I have not found practical suggestions for
> how to implement module-wide configurations. Suggestions are welcome.

With configuration items like this, you have two choices:

- A global variable.
- Pass the configuration to every function that needs it.

There are tradeoffs with both approaches, but for this case I agree with
your decision to use globals. Most people will want 0-based/Biopython
style but it gives those who don't a knob to switch over.

> * Any advice about circular genomes or strandedness is also welcome.

Circular handling is an unresolved issue in Biopython:

https://redmine.open-bio.org/issues/2578

It's a bit tricky, especially with features that span the origin.

I'd prioritize handling strandedness since you're going to have plenty
of reverse strand coding sequences. You're mapping not only within the
coding region but also back to the original sequence on the reverse
strand. So in your g2c mapping, the original gene goes from
e1 -> s1 -> e0 -> s0 as you read 5' to 3' across the sequence. The best
place to get started is to pick a reverse strand gene and then work
through the mappings, thinking through the orientations. I find drawing
it out to be the easiest way.
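
The reverse-strand walk (e1 -> s1 -> e0 -> s0) can be worked through
with a small function. This is a sketch under stated assumptions, not
the coordinate mapper itself: exons are 0-based, end-exclusive
(start, end) pairs sorted by genomic start, the exons are treated as
entirely coding, and the name genomic_to_cds is hypothetical.

```python
def genomic_to_cds(pos, exons, strand):
    """Map a 0-based genomic position to a 0-based CDS position.

    Returns None for positions that fall outside every exon.
    """
    if strand not in (1, -1):
        raise ValueError("strand must be +1 or -1")
    # On the reverse strand the last exon in genomic order is read first.
    ordered = exons if strand == 1 else exons[::-1]
    offset = 0  # CDS bases consumed by the exons already walked
    for start, end in ordered:
        if start <= pos < end:
            within = pos - start if strand == 1 else end - 1 - pos
            return offset + within
        offset += end - start
    return None


exons = [(10, 20), (30, 40)]  # e0 = (s0, e0), e1 = (s1, e1)
```

With these exons on the reverse strand, genomic position 39 (the last
base of the second exon) is the first CDS base and position 10 is the
last, which matches the order given above.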

> * This mapper will work for SeqRecords, SeqFeatures, FeatureLocations,
> etc. Are there other Biopython objects that store sequence coordinates
> and thus should be mappable?

That sounds like a great start. Thanks again for this,
Brad


From p.j.a.cock at googlemail.com  Wed Aug  8 14:33:05 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 8 Aug 2012 15:33:05 +0100
Subject: [Biopython-dev] GSoC python variant update
In-Reply-To: <874nodh4iv.fsf@fastmail.fm>
References: <CALfq9tJxsTqkq3E6KN-iirEns7FhC2G7aXEX+4MNm+WTPiXw2Q@mail.gmail.com>
	<874nodh4iv.fsf@fastmail.fm>
Message-ID: <CAKVJ-_5r9-dBQMqo95huP4kd7fmXEs_ZL8KJwTOCja+DWzmyoQ@mail.gmail.com>

On Wed, Aug 8, 2012 at 2:55 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>Lenna wrote:
>> * Any advice about circular genomes or strandedness is also welcome.
>
> Circular handling is an unresolved issue in Biopython:
>
> https://redmine.open-bio.org/issues/2578
>
> It's a bit tricky, especially with features that span the origin.
>
> I'd prioritize handling strandedness since you're going to have plenty
> of reverse strand coding sequences. You're mapping not only within the
> coding region but also back to the original sequence on the reverse
> strand. So in your g2c mapping, the original gene goes from
> e1 -> s1 -> e0 -> s0 as you read 5' to 3' across the sequence. The best
> place to get started is to pick a reverse strand gene and then work
> through the mappings, thinking through the orientations. I find drawing
> it out to be the easiest way.

And then think about mixed strand genes, e.g. trans-spliced tRNA is
a good example - there is a GenBank example in our unit tests.

Peter


From lgautier at gmail.com  Wed Aug  8 16:37:35 2012
From: lgautier at gmail.com (Laurent Gautier)
Date: Wed, 08 Aug 2012 18:37:35 +0200
Subject: [Biopython-dev] GSoC python variant update
In-Reply-To: <mailman.7.1344441604.19335.biopython-dev@lists.open-bio.org>
References: <mailman.7.1344441604.19335.biopython-dev@lists.open-bio.org>
Message-ID: <502295CF.3020103@gmail.com>

On 2012-08-08 18:00, biopython-dev-request at lists.open-bio.org wrote:
> Lenna;
> This all sounds great and will be a nice practical addition to
> Biopython. Thanks for taking it on. Some specific thoughts on your questions:
>
>> * I'm representing intron locations relative to CDS coords using the
>> HGVS standards: http://www.hgvs.org/mutnomen/refseq_figure.html
>> I'd like to know if there are other common ways of representing such
>> positions.
> I don't know of one myself, so it's great to be following a standard
> rather than reinventing something. Nice work.
>
>> * In order to customize the display of positions (e.g. 0-based or
>> 1-based), I'm using a class as a configuration container. I've read on
>> StackOverflow that attempts to use globals or a singleton class are
>> discouraged in Python, but I have not found practical suggestions for
>> how to implement module-wide configurations. Suggestions are welcome.

Module-wide configuration can be implemented as variables, as long as 
they are declared before the functions using them.
If considering a package rather than a single module, options can be 
stored in a module dedicated to options (since Python modules are 
singletons).
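
A minimal sketch of the options-module idea, for illustration only. The
module name `mapper_options` and the setting `index_base` are hypothetical,
and the module is built in memory only so that the example runs on its own;
in a real package it would be an ordinary file.

```python
import importlib
import sys
import types

# Hypothetical options module, built in memory so the sketch runs as-is.
options = types.ModuleType("mapper_options")
options.index_base = 0  # hypothetical setting
sys.modules["mapper_options"] = options

# Every import of the same name returns the same module object,
# so a change made through one reference is visible through all others.
first = importlib.import_module("mapper_options")
second = importlib.import_module("mapper_options")
first.index_base = 1

print(first is second)    # True
print(second.index_base)  # 1
```

The same property that makes this convenient is what makes it shared
state: any importer can change the value for everyone else.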

> With configuration items like this, you have two choices:
>
> - A global variable.
> - Pass the configuration to every function that needs it.
>
> There are tradeoffs with both approaches, but for this case I agree with
> your decision to use globals. Most people will want 0-based/Biopython
> style but it gives those who don't a knob to switch over.

I'd argue that allowing a switch is an invitation to spectacular issues
down the road.
An easy, yet frightening, example would be the case where
third-party code (such as a module) changes this without you knowing.

Another scary thought is that this would amount to bringing the
infamous Perl variable "$[" to Python. Go explain again that folks
should use Python for its elegance and simplicity after that.


Best,

L.


From arklenna at gmail.com  Wed Aug  8 18:44:33 2012
From: arklenna at gmail.com (Lenna Peterson)
Date: Wed, 8 Aug 2012 14:44:33 -0400
Subject: [Biopython-dev] GSoC python variant update
In-Reply-To: <502295CF.3020103@gmail.com>
References: <mailman.7.1344441604.19335.biopython-dev@lists.open-bio.org>
	<502295CF.3020103@gmail.com>
Message-ID: <CALfq9tLt-n+YVL6BCEFFFokQ6RY4v6=i5USkb6+OqdD0QSKg4Q@mail.gmail.com>

On Wed, Aug 8, 2012 at 12:37 PM, Laurent Gautier <lgautier at gmail.com> wrote:
> On 2012-08-08 18:00, biopython-dev-request at lists.open-bio.org wro
>
>>
>>> >* In order to customize the display of positions (e.g. 0-based or
>>> >1-based), I'm using a class as a configuration container. I've read on
>>> >StackOverflow that attempts to use globals or a singleton class are
>>> >discouraged in Python, but I have not found practical suggestions for
>>> >how to implement module-wide configurations. Suggestions are welcome.
>
>
> Module-wide configuration can be implemented as variables, as long as they
> are declared before the functions using them.
> If considering a package rather than a single module, options can be stored
> in a module dedicated to options (since Python modules are singletons).
>

Hi Laurent,

I really like the idea of a configuration module. I will definitely
move in that direction.

>
>> With configuration items like this, you have two choices:
>>
>> - A global variable.
>> - Pass the configuration to every function that needs it.
>>
>> There are tradeoffs with both approaches, but for this case I agree with
>> your decision to use globals. Most people will want 0-based/Biopython
>> style but it gives those who don't a knob to switch over.
>
>
> I'd argue that allowing to switch is an invitation to spectacular issues
> down the road.
> An easy, yet frightening, example would be the case where using third-party
> code (such a module) changes this without you knowing.
>
> An other scary thought is that this would amount to bringing the infamous
> Perl variable "$[" to Python. Go explain again that folks should Python for
> its elegance and simplicity after that.
>
>

Yikes. My approach will not be comparable to $[. For starters, it
wouldn't modify the behavior of every sequence-like object.

My current thought would be to store the 0-based position in an
attribute `pos`, have a property `pos_str` that returns `pos` +
`Config.index`. For representations, `__str__` will return `pos_str`,
and `__repr__` will return `pos` (always 0-based). Math would always
use the 0-based position.
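
A rough sketch of the design described above, not the actual mapper code.
The class names `Config` and `Position` and the `__add__` behaviour are
assumptions made for illustration; only `pos`, `pos_str`, `__str__` and
`__repr__` come from the description.

```python
class Config:
    """Hypothetical stand-in for the configuration container."""
    index = 1  # offset added for display only


class Position:
    """Hypothetical position that always stores a 0-based value."""

    def __init__(self, pos):
        self.pos = pos  # 0-based, used for all arithmetic

    @property
    def pos_str(self):
        # Display value: the 0-based position shifted by the configured offset.
        return str(self.pos + Config.index)

    def __str__(self):
        return self.pos_str

    def __repr__(self):
        # Always shows the stored 0-based value.
        return "Position(%d)" % self.pos

    def __add__(self, offset):
        # Arithmetic ignores Config.index entirely.
        return Position(self.pos + offset)


p = Position(9) + 1
print(str(p))   # 11 while Config.index is 1
print(repr(p))  # Position(10)
```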

I intend to keep the influence of the hypothetical mapping Config
module limited to Biopython Seq* objects. It should also be possible
to make a kill switch, namely, a version of the Config module where
all of the settings are neutral to adding (i.e. `def __add__(self,
other): return other`).

Please let me know if this would not fully address your concerns.

Cheers,

Lenna


From lgautier at gmail.com  Wed Aug  8 21:58:26 2012
From: lgautier at gmail.com (Laurent Gautier)
Date: Wed, 08 Aug 2012 23:58:26 +0200
Subject: [Biopython-dev] GSoC python variant update
In-Reply-To: <CALfq9tLt-n+YVL6BCEFFFokQ6RY4v6=i5USkb6+OqdD0QSKg4Q@mail.gmail.com>
References: <mailman.7.1344441604.19335.biopython-dev@lists.open-bio.org>
	<502295CF.3020103@gmail.com>
	<CALfq9tLt-n+YVL6BCEFFFokQ6RY4v6=i5USkb6+OqdD0QSKg4Q@mail.gmail.com>
Message-ID: <5022E102.9010509@gmail.com>

On 2012-08-08 20:44, Lenna Peterson wrote:
> On Wed, Aug 8, 2012 at 12:37 PM, Laurent Gautier <lgautier at gmail.com> wrote:
>> On 2012-08-08 18:00, biopython-dev-request at lists.open-bio.org wro
>>
>>>>> * In order to customize the display of positions (e.g. 0-based or
>>>>> 1-based), I'm using a class as a configuration container. I've read on
>>>>> StackOverflow that attempts to use globals or a singleton class are
>>>>> discouraged in Python, but I have not found practical suggestions for
>>>>> how to implement module-wide configurations. Suggestions are welcome.
>>
>> Module-wide configuration can be implemented as variables, as long as they
>> are declared before the functions using them.
>> If considering a package rather than a single module, options can be stored
>> in a module dedicated to options (since Python modules are singletons).
>>
> Hi Laurent,
>
> I really like the idea of a configuration module. I will definitely
> move in that direction.
>
>>> With configuration items like this, you have two choices:
>>>
>>> - A global variable.
>>> - Pass the configuration to every function that needs it.
>>>
>>> There are tradeoffs with both approaches, but for this case I agree with
>>> your decision to use globals. Most people will want 0-based/Biopython
>>> style but it gives those who don't a knob to switch over.
>>
>> I'd argue that allowing to switch is an invitation to spectacular issues
>> down the road.
>> An easy, yet frightening, example would be the case where using third-party
>> code (such a module) changes this without you knowing.
>>
>> An other scary thought is that this would amount to bringing the infamous
>> Perl variable "$[" to Python. Go explain again that folks should Python for
>> its elegance and simplicity after that.
>>
>>
> Yikes. My approach will not be comparable to $[. For starters, it
> wouldn't modify the behavior of every sequence-like object.
>
> My current thought would be to store the 0-based position in an
> attribute `pos`, have a property `pos_str` that returns `pos` +
> `Config.index`. For representations, `__str__` will return `pos_str`,
> and `__repr__` will return `pos` (always 0-based). Math would always
> use the 0-based position.
>
> I intend to keep the influence of the hypothetical mapping Config
> module limited to Biopython Seq* objects. It should also be possible
> to make a kill switch, namely, a version of the Config module where
> all of the settings are neutral to adding (i.e. `def __add__(self,
> other): return other`).

What about making the design decision that string representations are
1-based, then, and going beyond making a kill switch by just killing the
switch? You'd document it, and folks who want 0-based positions would cook
their own function(s).

I think that configuration modules can be very useful for an application
(an example here:
http://flask.pocoo.org/snippets/2/ ), but I am more reserved about their
use in a library.

But do not let me stop you from pursuing this; I am only expressing an 
opinion. One last point though.
Let me describe a possible scenario:

3rd-party module "foo" is using the Biopython Seq* part, and its author
thinks that Config.index should be 1, so he/she sets it accordingly.
An early line in foo.py is:
from somewhere.in.biopython.seq import config
config.index = 1

There is another piece of code (let's call it bar.py), written by
someone else or by the same person at a different time. Now the hype is 
all about 0-based indexes, so the author sets it to be sure:
from somewhere.in.biopython.seq import config
config.index = 0

To complete the scenario, bar.py is using foo.py, or the other way
around. The dependency of one on the other does not even have to be
direct. Now config.index will be whatever the last piece of code set it to,
although other parts of the code might assume it is set to something else.
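
The scenario can be reproduced in a few lines. This is only a sketch: the
`config` object below is a stand-in for the hypothetical shared module, and
`foo_setup`, `bar_setup`, `foo_report` and `display` are made-up names that
play the roles of the import-time code in foo.py and bar.py.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the shared configuration module.
config = SimpleNamespace(index=0)


def display(pos):
    """Format a 0-based position using the shared setting."""
    return str(pos + config.index)


def foo_setup():
    # What foo.py does at import time: its author wants 1-based output.
    config.index = 1


def bar_setup():
    # What bar.py does at import time: its author wants 0-based output.
    config.index = 0


def foo_report(pos):
    # foo's author assumes 1-based output here.
    return display(pos)


foo_setup()
before = foo_report(9)  # "10", as foo's author expects
bar_setup()             # bar is imported later, somewhere else
after = foo_report(9)   # "9": foo's output changed although foo did not
print(before, after)
```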

Python does not prevent that sort of situation for any module
(e.g., import sys; sys.stdout = sys.stderr), but people
know they should not do it. Here, config.index would look like
something people are welcome to change if they like.

Again, that's just an opinion. Others might differ.

Best,


Laurent

>
> Please let me know if this would not fully address your concerns.
>
> Cheers,
>
> Lenna


From arklenna at gmail.com  Wed Aug  8 22:39:48 2012
From: arklenna at gmail.com (Lenna Peterson)
Date: Wed, 8 Aug 2012 18:39:48 -0400
Subject: [Biopython-dev] GSoC python variant update
In-Reply-To: <5022E102.9010509@gmail.com>
References: <mailman.7.1344441604.19335.biopython-dev@lists.open-bio.org>
	<502295CF.3020103@gmail.com>
	<CALfq9tLt-n+YVL6BCEFFFokQ6RY4v6=i5USkb6+OqdD0QSKg4Q@mail.gmail.com>
	<5022E102.9010509@gmail.com>
Message-ID: <CALfq9tJMTUc_FJ9tTv=AR2WN+GuVoiGg3tbhN2fZKzr0DfZYXg@mail.gmail.com>

On Wed, Aug 8, 2012 at 5:58 PM, Laurent Gautier <lgautier at gmail.com> wrote:
> On 2012-08-08 20:44, Lenna Peterson wrote:
>>
>> On Wed, Aug 8, 2012 at 12:37 PM, Laurent Gautier <lgautier at gmail.com>
>> wrote:
>>>
>>> On 2012-08-08 18:00, biopython-dev-request at lists.open-bio.org wro
>>>
>>>>>> * In order to customize the display of positions (e.g. 0-based or
>>>>>> 1-based), I'm using a class as a configuration container. I've read on
>>>>>> StackOverflow that attempts to use globals or a singleton class are
>>>>>> discouraged in Python, but I have not found practical suggestions for
>>>>>> how to implement module-wide configurations. Suggestions are welcome.
>>>
>>>
>>> Module-wide configuration can be implemented as variables, as long as
>>> they
>>> are declared before the functions using them.
>>> If considering a package rather than a single module, options can be
>>> stored
>>> in a module dedicated to options (since Python modules are singletons).
>>>
>> Hi Laurent,
>>
>> I really like the idea of a configuration module. I will definitely
>> move in that direction.
>>
>>>> With configuration items like this, you have two choices:
>>>>
>>>> - A global variable.
>>>> - Pass the configuration to every function that needs it.
>>>>
>>>> There are tradeoffs with both approaches, but for this case I agree with
>>>> your decision to use globals. Most people will want 0-based/Biopython
>>>> style but it gives those who don't a knob to switch over.
>>>
>>>
>>> I'd argue that allowing to switch is an invitation to spectacular issues
>>> down the road.
>>> An easy, yet frightening, example would be the case where using
>>> third-party
>>> code (such a module) changes this without you knowing.
>>>
>>> An other scary thought is that this would amount to bringing the infamous
>>> Perl variable "$[" to Python. Go explain again that folks should Python
>>> for
>>> its elegance and simplicity after that.
>>>
>>>
>> Yikes. My approach will not be comparable to $[. For starters, it
>> wouldn't modify the behavior of every sequence-like object.
>>
>> My current thought would be to store the 0-based position in an
>> attribute `pos`, have a property `pos_str` that returns `pos` +
>> `Config.index`. For representations, `__str__` will return `pos_str`,
>> and `__repr__` will return `pos` (always 0-based). Math would always
>> use the 0-based position.
>>
>> I intend to keep the influence of the hypothetical mapping Config
>> module limited to Biopython Seq* objects. It should also be possible
>> to make a kill switch, namely, a version of the Config module where
>> all of the settings are neutral to adding (i.e. `def __add__(self,
>> other): return other`).
>
>
> What about making the design decision that string representations that are
> 1-based then, and go beyond making a kill switch by just kill the switch ?
> You'd document it, folks that want 0-based positions would cook their own
> function(s).
>
> I think that configuration modules can be very useful for an application (an
> example here:
> http://flask.pocoo.org/snippets/2/ ), but I am more reserved about its use
> in a library.
>
> But do not let me stop you from pursuing this; I am only expressing an
> opinion. One last point though.
> Let me describe a possible scenario:
>
> 3rd-party module "foo" is using the Biopython Seq* part, and its author
> thinks that Config.index should at 1 one, so he/she sets it accordingly.
> An early line in foo.py is:
> from somewhere.in.biopython.seq import config
> config.index = 1
>
> There is an other piece of code (let's call it bar.py), written by someone
> else or by the same person at a different time. Now the hype is all about
> 0-based indexes, so the author sets it to be sure:
> from somewhere.in.biopython.seq import config
> config.index = 0
>
> To complete the scenario bar.py is using foo.py, or the other way around.
> The requirement for one an other does not even have to be direct. Now
> config.index will be what the last piece of code sets it to, although other
> parts of the code might assume it is set to something else.
>
> That sort of situation is not prevented from happening with any sort of
> module in Python (e.g., import sys; sys.stdout = sys.stderr), but people
> know they should not do it. Here the config.index would appear as something
> people should change if they like.
>
> Again, that's just an opinion. Others might differ.
>
> Best,
>
>
> Laurent
>
>
>>
>> Please let me know if this would not fully address your concerns.
>>
>> Cheers,
>>
>> Lenna
>
>


Laurent,

I must thank you again for your foresight. I am realizing I may have
gotten carried away with configurability. My initial goal with the
index setting was to enable both GenBank and HGVS representations of
genomic positions; a much simpler and safer approach would be to have
`to_genbank()` and `to_hgvs()` methods. A user could set the relevant
objects' __str__ to either of those.
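
A small sketch of what such methods might look like. The class name
`GenomePosition` is hypothetical and the output formats are deliberately
simplified (a bare 1-based number for GenBank, a "g." prefix for HGVS);
real HGVS and GenBank strings carry more detail than this.

```python
class GenomePosition:
    """Hypothetical genomic position, stored 0-based."""

    def __init__(self, pos):
        self.pos = pos

    def to_genbank(self):
        # GenBank feature tables count from 1.
        return str(self.pos + 1)

    def to_hgvs(self):
        # HGVS genomic positions also count from 1; "g." marks a genomic coordinate.
        return "g.%d" % (self.pos + 1)

    def __repr__(self):
        return "GenomePosition(%d)" % self.pos


p = GenomePosition(41)
print(p.to_genbank())  # 42
print(p.to_hgvs())     # g.42

# Opting in to a default string form for this class only.
GenomePosition.__str__ = GenomePosition.to_hgvs
print(str(p))          # g.42
```

The stored value never changes; only the explicitly requested string does,
so there is no shared setting for third-party code to flip.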

Cheers,

Lenna


From p.j.a.cock at googlemail.com  Thu Aug  9 09:07:15 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 9 Aug 2012 10:07:15 +0100
Subject: [Biopython-dev] GSoC python variant update
In-Reply-To: <5022E102.9010509@gmail.com>
References: <mailman.7.1344441604.19335.biopython-dev@lists.open-bio.org>
	<502295CF.3020103@gmail.com>
	<CALfq9tLt-n+YVL6BCEFFFokQ6RY4v6=i5USkb6+OqdD0QSKg4Q@mail.gmail.com>
	<5022E102.9010509@gmail.com>
Message-ID: <CAKVJ-_5sZX3YjORj_4q9wAnyAQ0us-kzZGA3gvstf=cYBdZeTw@mail.gmail.com>

On Wed, Aug 8, 2012 at 10:58 PM, Laurent Gautier <lgautier at gmail.com> wrote:
>
> What about making the design decision that string representations that are
> 1-based then, and go beyond making a kill switch by just kill the switch ?
> You'd document it, folks that want 0-based positions would cook their own
> function(s).
>
> I think that configuration modules can be very useful for an application ...

I agree that a module level config setting is unwise.

However, I'd much prefer the string representation was 0-based for
consistency, both internal to the module and with most of Biopython.
(The restriction module uses 1-based counting which I find very
annoying.)

You could still provide something like a format method to give a
string in common representations (e.g. GenBank/EMBL/INSDC
style location strings).
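
As an illustration of that split, here is a sketch with a hypothetical
`Location` class: `__str__` stays 0-based, and a `format` method (the
method name and the "insdc" style key are assumptions) produces a 1-based
INSDC-style string on request.

```python
class Location:
    """Hypothetical location using 0-based, end-exclusive coordinates."""

    def __init__(self, start, end):
        self.start = start
        self.end = end

    def __str__(self):
        # Default string stays 0-based, like Python slicing.
        return "[%d:%d]" % (self.start, self.end)

    def format(self, style):
        if style == "insdc":
            # 1-based, end-inclusive: only the start needs shifting.
            return "%d..%d" % (self.start + 1, self.end)
        raise ValueError("unknown style: %r" % style)


loc = Location(4, 10)
print(str(loc))             # [4:10]
print(loc.format("insdc"))  # 5..10
```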

Peter


From mjldehoon at yahoo.com  Thu Aug  9 11:07:20 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 9 Aug 2012 04:07:20 -0700 (PDT)
Subject: [Biopython-dev] Bio.Motif._Motif has_instances, has_counts
Message-ID: <1344510440.89823.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Hi guys,

In the Motif class in Bio.Motif._Motif, there is an attribute self.has_instances to identify whether the attribute self.instances is defined. I think that we can remove the self.has_instances attribute from the code and simply set self.instances=None when it is undefined. Same thing for self.counts and self.has_counts.
Any objections?

Best,
-Michiel.


From bartek at rezolwenta.eu.org  Thu Aug  9 12:26:33 2012
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Thu, 9 Aug 2012 14:26:33 +0200
Subject: [Biopython-dev] Bio.Motif._Motif has_instances, has_counts
In-Reply-To: <1344510440.89823.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <1344510440.89823.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <CABHxouW164vGk-_Jw-F+vrKNjuKRR+M+6nWDovOmj=r2-JojFg@mail.gmail.com>

On Thu, Aug 9, 2012 at 1:07 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi guys,
>
> In the Motif class in Bio.Motif._Motif, there is an attribute self.has_instances to identify whether the attributes self.instances is defined. I think that we can remove the self.has_instances attribute from the code and simply set self.instances=None when it is undefined. Same thing for self.counts and self.has_counts.
> Any objections?

Makes sense to me. +1

-- 
Bartek Wilczynski


From mjldehoon at yahoo.com  Thu Aug  9 16:00:14 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 9 Aug 2012 09:00:14 -0700 (PDT)
Subject: [Biopython-dev] Bio.Motif._Motif has_instances, has_counts
In-Reply-To: <CABHxouW164vGk-_Jw-F+vrKNjuKRR+M+6nWDovOmj=r2-JojFg@mail.gmail.com>
Message-ID: <1344528014.32936.YahooMailClassic@web164006.mail.gq1.yahoo.com>

OK, done. Thanks!
-Michiel.

--- On Thu, 8/9/12, Bartek Wilczynski <bartek at rezolwenta.eu.org> wrote:

> From: Bartek Wilczynski <bartek at rezolwenta.eu.org>
> Subject: Re: [Biopython-dev] Bio.Motif._Motif has_instances, has_counts
> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> Cc: biopython-dev at biopython.org
> Date: Thursday, August 9, 2012, 8:26 AM
> On Thu, Aug 9, 2012 at 1:07 PM,
> Michiel de Hoon <mjldehoon at yahoo.com>
> wrote:
> > Hi guys,
> >
> > In the Motif class in Bio.Motif._Motif, there is an
> attribute self.has_instances to identify whether the
> attributes self.instances is defined. I think that we can
> remove the self.has_instances attribute from the code and
> simply set self.instances=None when it is undefined. Same
> thing for self.counts and self.has_counts.
> > Any objections?
> 
> Makes sense to me. +1
> 
> -- 
> Bartek Wilczynski
> 


From tiagoantao at gmail.com  Fri Aug 10 03:04:53 2012
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Thu, 9 Aug 2012 20:04:53 -0700
Subject: [Biopython-dev] Away Re: buildbot failure in Biopython on Linux 64
	- Python 2.7
Message-ID: <CAA9RGENG=-Qbi549wkWxdU7FLpsjTsoVtDnDSsAVceYfYZJiRQ@mail.gmail.com>

I am currently away from office. I will respond back on the 20th of August.

Regards,
Tiago

-- 
"Liberty for wolves is death to the lambs" - Isaiah Berlin


From p.j.a.cock at googlemail.com  Fri Aug 10 08:33:43 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 10 Aug 2012 09:33:43 +0100
Subject: [Biopython-dev] Bio.Motif._Motif has_instances, has_counts
In-Reply-To: <1344528014.32936.YahooMailClassic@web164006.mail.gq1.yahoo.com>
References: <CABHxouW164vGk-_Jw-F+vrKNjuKRR+M+6nWDovOmj=r2-JojFg@mail.gmail.com>
	<1344528014.32936.YahooMailClassic@web164006.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_5fQye_iHLzuyFZrMWiS2xKeZqrGAWwmBoUFdoO7gZyOA@mail.gmail.com>

On Thu, Aug 9, 2012 at 5:00 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> OK, done. Thanks!
> -Michiel.

You'll also need to update the example in the Tutorial, quote:

    The arnt and srf motifs can both do the same things for us,
    but they use different internal representations of the motif.
    We can tell that by inspecting the \verb|has_counts| and
    has_instances properties:

    >>> arnt.has_instances
    True
    >>> srf.has_instances
    False
    >>> srf.has_counts
    True

This means test_Tutorial.py is failing (across all platforms).
Presumably we would suggest switching these to something like:

    >>> arnt.instances is None
    False

etc.? In fact, given the old methods were documented like
this, I would be happier if we could phase them out with
a deprecation warning via a read-only property method,

    @property
    def has_instances(self):
        """Does this motif have instances (DEPRECATED)."""
        import warnings
        from Bio import BiopythonDeprecationWarning
        warnings.warn("Check if motif.instances is None or not instead",
                      BiopythonDeprecationWarning)
        return self.instances is not None

(untested, but something like that)
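
For anyone who wants to try the pattern without a Biopython checkout, here
is a self-contained sketch. The `Motif` class is a cut-down, hypothetical
stand-in, and `BiopythonDeprecationWarning` is redefined locally as a
placeholder for the real class in `Bio`.

```python
import warnings


class BiopythonDeprecationWarning(UserWarning):
    """Local stand-in so the sketch runs without Biopython installed."""


class Motif:
    """Hypothetical cut-down motif class."""

    def __init__(self, instances=None):
        self.instances = instances  # None means "no instances"

    @property
    def has_instances(self):
        """Does this motif have instances (DEPRECATED)."""
        warnings.warn(
            "Check if motif.instances is None or not instead",
            BiopythonDeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        return self.instances is not None


# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = Motif(instances=["TACGC"]).has_instances

print(result)                       # True
print(caught[0].category.__name__)  # BiopythonDeprecationWarning
```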

Peter


From p.j.a.cock at googlemail.com  Fri Aug 10 20:04:54 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 10 Aug 2012 21:04:54 +0100
Subject: [Biopython-dev] Bio.Motif._Motif has_instances, has_counts
In-Reply-To: <CAKVJ-_5fQye_iHLzuyFZrMWiS2xKeZqrGAWwmBoUFdoO7gZyOA@mail.gmail.com>
References: <CABHxouW164vGk-_Jw-F+vrKNjuKRR+M+6nWDovOmj=r2-JojFg@mail.gmail.com>
	<1344528014.32936.YahooMailClassic@web164006.mail.gq1.yahoo.com>
	<CAKVJ-_5fQye_iHLzuyFZrMWiS2xKeZqrGAWwmBoUFdoO7gZyOA@mail.gmail.com>
Message-ID: <CAKVJ-_6H5+t4OCwFS=h-+Cd96rV8btO7zqSjrYdST0CjfwmcMQ@mail.gmail.com>

On Fri, Aug 10, 2012 at 9:33 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Aug 9, 2012 at 5:00 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>> OK, done. Thanks!
>> -Michiel.
>
> You'll also need to update the example in the Tutorial, quote:
>
>     The arnt and srf motifs can both do the same things for us,
>     but they use different internal representations of the motif.
>     We can tell that by inspecting the \verb|has_counts| and
>     has_instances properties:
>
>     >>> arnt.has_instances
>    True
>     >>> srf.has_instances
>     False
>     >>> srf.has_counts
>     True
>
> This means test_Tutorial.py is failing (across all platforms).
> Presumably we would suggest switching these to somethinglike:
>
>     >>> arnt.instances is None
>     False

Fixed:
https://github.com/biopython/biopython/commit/b866e74dc9b6162517588ea4c0e4d1ecde5ed87c

> etc? In fact given the old methods were documents like
> this, I would be happier if we could phase them out with
> a deprecation warning via a read only property method,
>
>     @property
>     def has_instances(self):
>         """"Does this motif have instances (DEPRECATED)."""
>         import warnings
>         from Bio import BiopythonDeprecationWarning
>         warnings.warn("Check if motif.instance is None or not instead",
>             BiopythonDeprecationWarning)
>         return self.instances is not None
>
> (untested, but something like that)

Done:
https://github.com/biopython/biopython/commit/fd2223d118227c921524e070c803b97bc979a70f

Although since that won't work on old Biopython either (you'd
get an AttributeError), perhaps we should label these new
backwards compatible properties as obsolete with a pending
deprecation warning for the next release (delay the deprecation)?

Peter


From mjldehoon at yahoo.com  Sat Aug 11 03:48:29 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 10 Aug 2012 20:48:29 -0700 (PDT)
Subject: [Biopython-dev] Bio.Motif._Motif has_instances, has_counts
In-Reply-To: <CAKVJ-_6H5+t4OCwFS=h-+Cd96rV8btO7zqSjrYdST0CjfwmcMQ@mail.gmail.com>
Message-ID: <1344656909.14019.YahooMailClassic@web164002.mail.gq1.yahoo.com>

Hi Peter,

--- On Fri, 8/10/12, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > This means test_Tutorial.py is failing (across all platforms).
> > Presumably we would suggest switching these to something like:
> >
> >     >>> arnt.instances is None
> >     False
> 
> Fixed:
> https://github.com/biopython/biopython/commit/b866e74dc9b6162517588ea4c0e4d1ecde5ed87c

Thanks for fixing this! Sorry I missed doing this when I was making these changes.

> Although since that won't work on old Biopython either
> (you'd
> get an AttributeError), perhaps we should label these new
> backwards compatible properties as obsolete with a pending
> deprecation warning for the next release (delay the
> deprecation)?
> 

I think we are being way too careful. Requiring proper deprecation warnings each time we make a change in Biopython will slow down its development and improvement. In the past when making changes to the existing code, we have gotten very few complaints; also in this case I doubt that anybody will miss has_counts, has_instances.

Best,
-Michiel.


From mjldehoon at yahoo.com  Sat Aug 11 04:25:05 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 10 Aug 2012 21:25:05 -0700 (PDT)
Subject: [Biopython-dev] Bio.Motif AlignAce parser
Message-ID: <1344659105.5874.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Hi guys,

Looking some more at the parsers in Bio.Motif.

In the Record class in Bio/Motif/Parsers/AlignAce.py, we have an attribute self.current_motif that points to the motif currently being parsed by the parser (or, after the parser finishes, the last motif that was parsed). As far as I can tell, using a temporary variable current_motif within the read() function would be sufficient; we don't need to store it in the record.

I would also suggest for the read() function to strip() all lines. Currently the end-of-line markers are kept. For example the version and the command line are stored as "AlignACE 4.0 05/13/04\n" and "./AlignACE -i test.fa \n" respectively.
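
A quick illustration of the effect, using the two strings quoted above.
The variable names are made up for the example; this is not the parser
code itself.

```python
# The two values as currently stored, with end-of-line markers kept.
raw_lines = ["AlignACE 4.0 05/13/04\n", "./AlignACE -i test.fa \n"]

# strip() removes the newline and any other leading/trailing whitespace.
version, command = (line.strip() for line in raw_lines)

print(repr(version))  # 'AlignACE 4.0 05/13/04'
print(repr(command))  # './AlignACE -i test.fa'
```

Note that strip() also drops the trailing space in the command line, which
seems harmless here.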

The version of the AlignACE program is stored in record.ver. The MEME and Mast parsers in Bio.Motif instead use record.version. For consistency I would suggest to use record.version also in the AlignACE parser.

The command line is stored in record.cmd_line. The MEME parser uses record.command instead. I think both are fine, but I would also prefer this to be consistent.

Then there are two attributes param_dict and seq_dict. The former is a dictionary that stores the parameters used in the run. The latter is not a dictionary but a list of sequence-related information. Since usually we don't put the type of the object in the attribute names, I would suggest to call these simply parameters and sequences. For comparison, the Mast parser uses record.sequences for an analogous attribute; MEME uses record.sequence_names. For consistency I would suggest to use record.sequences for all three.

This would create some backward-incompatible changes that may confuse users. Currently the parsers are located in Bio.Motif.Parsers.AlignAce, Bio.Motif.Parsers.MEME, and Bio.Motif.Parsers.Mast. I would prefer Bio.Motif.AlignAce, Bio.Motif.MEME, and Bio.Motif.Mast. At present, to parse the AlignAce output one would do
>>> from Bio.Motif.Parsers import AlignAce
>>> record = AlignAce.read(handle)
>>> record
<Bio.Motif.Parsers.AlignAce.Record object at 0x10058c7d0>
If we move the parsers one level up, this would be
>>> from Bio.Motif import AlignAce
>>> record = AlignAce.read(handle)
>>> record
<Bio.Motif.AlignAce.Record object at 0x10058c7d0>
which looks a bit more straightforward to me. In addition, this allows us to put a deprecation warning on the Bio.Motif.Parsers.AlignAce, Bio.Motif.Parsers.MEME, and Bio.Motif.Parsers.Mast modules as a whole, and we won't have to put deprecation warnings on each change separately.
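
A sketch of the whole-module deprecation idea, under stated assumptions:
the legacy module is simulated by executing a source string, and the names
`LEGACY_SOURCE`, `load_legacy` and `legacy_parser` are invented for the
example. In a real module the `warnings.warn` call would simply sit at the
top of the old file.

```python
import types
import warnings

# Source of a hypothetical legacy module; the warning fires once, at import time.
LEGACY_SOURCE = '''
import warnings
warnings.warn(
    "This module is deprecated; please use the relocated parser instead.",
    DeprecationWarning,
)

def read(handle):
    return [line.strip() for line in handle]
'''


def load_legacy(name):
    """Simulate importing the legacy module by executing its source."""
    module = types.ModuleType(name)
    exec(LEGACY_SOURCE, module.__dict__)
    return module


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = load_legacy("legacy_parser")
    lines = legacy.read(["a \n", "b\n"])

print(len(caught))  # 1: one warning for the whole module
print(lines)        # ['a', 'b']
```

Calling functions in the old module afterwards raises no further
warnings, so individual renames need no separate deprecation code.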

Any comments, objections?

Best,
-Michiel.


From p.j.a.cock at googlemail.com  Sat Aug 11 10:50:07 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 11 Aug 2012 11:50:07 +0100
Subject: [Biopython-dev] Bio.Motif._Motif has_instances, has_counts
In-Reply-To: <1344656909.14019.YahooMailClassic@web164002.mail.gq1.yahoo.com>
References: <CAKVJ-_6H5+t4OCwFS=h-+Cd96rV8btO7zqSjrYdST0CjfwmcMQ@mail.gmail.com>
	<1344656909.14019.YahooMailClassic@web164002.mail.gq1.yahoo.com>
Message-ID: <CAKVJ-_70nRt1oxF4M8HAaxrARgW7BSenZQZntOAgicXBCmZbxg@mail.gmail.com>

On Saturday, August 11, 2012, Michiel de Hoon wrote:

> Hi Peter,
>
> > Although since that won't work on old Biopython either
> > (you'd
> > get an AttributeError), perhaps we should label these new
> > backwards compatible properties as obsolete with a pending
> > deprecation warning for the next release (delay the
> > deprecation)?
> >
>
> I think we are being way too careful. Requiring proper deprecation
> warnings each time we make a change in Biopython will slow down its
> development and improvement. In the past when making changes to the
> existing code, we have gotten very few complaints; also in this case I
> doubt that anybody will miss has_counts, has_instances.
>
> Best,
> -Michiel.
>

In this case you're probably right about it not causing too much
inconvenience - this is a relatively new module after all.

Peter


From arklenna at gmail.com  Mon Aug 13 05:00:41 2012
From: arklenna at gmail.com (Lenna Peterson)
Date: Mon, 13 Aug 2012 01:00:41 -0400
Subject: [Biopython-dev] GSoC python variant update 10
Message-ID: <CALfq9tLZzvgL8pK=2sA_KRCJMwANMqbXjDxZxDp=FU5026GCqA@mail.gmail.com>

Link: http://arklenna.tumblr.com/post/29317968106/

Post:

Following extensive
[discussion](http://biopython.org/pipermail/biopython-dev/2012-August/009849.html)
on the dev list of the pros and cons of configuration classes/modules,
I have refactored my [coordinate
mapper](https://gist.github.com/3172753) to keep configuration as
isolated as possible.

All mapping functions use base 0 internally. Transformation to and
from 1-based coords is handled by custom MapPosition objects (they
are currently separate from the Seq* positions but could probably
subclass ExactPosition). The MapPosition objects have to_dialect and
from_dialect methods that automatically handle conversion between
bases and other formatting details.

There are two different ways a user can convert a coordinate from HGVS:

    # ... assuming cm is an instance of CoordinateMapper
    # Manually construct position from HGVS
    CDS_coord = CDSPosition.from_hgvs("6+1")
    genomic_coord = cm.c2g(CDS_coord)
    print genomic_coord.to_hgvs()

    # Pass dialect argument to mapping function
    genomic_coord = cm.c2g("6+1", dialect="HGVS")
    print genomic_coord.to_hgvs()

Furthermore, the inheritance hierarchy is designed to allow a user to
set a default string representation:

    # Set MapPositions to print as HGVS by default
    def use_hgvs(self):
        return str(self.to_hgvs())
    MapPosition.__str__ = use_hgvs

The [version](https://gist.github.com/3172753/577b7c383e057b78cdcee64be33f18117a46faaf)
as of this writing is passing tests using base 0. I have not yet
implemented tests for `from_hgvs` or `to_hgvs`, but that's next on my
list. I'm hoping to have time for strand and mixed strand, too.


Cheers,

Lenna


From bartek at rezolwenta.eu.org  Mon Aug 13 13:12:35 2012
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Mon, 13 Aug 2012 15:12:35 +0200
Subject: [Biopython-dev] Bio.Motif AlignAce parser
In-Reply-To: <1344659105.5874.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <1344659105.5874.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID: <CABHxouWmVRddd-Yz09m-B52oKi+jxKk+XJCU=vfDU_xOGtCRtw@mail.gmail.com>

Sounds great to me.

Bartek

On Sat, Aug 11, 2012 at 6:25 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi guys,
>
> Looking some more at the parsers in Bio.Motif.
>
> In the Record class in Bio/Motif/Parsers/AlignAce.py, we have an attribute self.current_motif that points to the motif currently being parsed by the parser (or, after the parser finishes, the last motif that was parsed). As far as I can tell, using a temporary variable current_motif within the read() function would be sufficient; we don't need to store it in the record.
>
> I would also suggest for the read() function to strip() all lines. Currently the end-of-line markers are kept. For example the version and the command line are stored as "AlignACE 4.0 05/13/04\n" and "./AlignACE -i test.fa \n" respectively.
>
> The version of the AlignACE program is stored in record.ver. The MEME and Mast parsers in Bio.Motif instead use record.version. For consistency I would suggest to use record.version also in the AlignACE parser.
>
> The command line is stored in record.cmd_line. The MEME parser uses record.command instead. I think both are fine, but I would also prefer this to be consistent.
>
> Then there are two attributes param_dict and seq_dict. The former is a dictionary that stores the parameters used in the run. The latter is not a dictionary but a list of sequence-related information. Since usually we don't put the type of the object in the attribute names, I would suggest to call these simply parameters and sequences. For comparison, the Mast parser uses record.sequences for an analogous attribute; MEME uses record.sequence_names. For consistency I would suggest to use record.sequences for all three.
>
> This would create some backward-incompatible changes that may confuse users. Now currently the parsers are located in Bio.Motif.Parsers.AlignAce, Bio.Motif.Parsers.MEME, and Bio.Motif.Parsers.Mast. I would prefer Bio.Motif.AlignAce, Bio.Motif.MEME, Bio.Motif.Mast. Currently to parse the AlignAce output one would do
>>>> from Bio.Motif.Parsers import AlignAce
>>>> record = AlignAce.read(handle)
>>>> record
> <Bio.Motif.Parsers.AlignAce.Record object at 0x10058c7d0>
> If we move the parsers one level up, this would be
>>>> from Bio.Motif import AlignAce
>>>> record = AlignAce.read(handle)
>>>> record
> <Bio.Motif.AlignAce.Record object at 0x10058c7d0>
> which looks a bit more straightforward to me. In addition, this allows us to put a deprecation warning on the Bio.Motif.Parsers.AlignAce, Bio.Motif.Parsers.MEME, and Bio.Motif.Parsers.Mast modules as a whole, and we won't have to put deprecation warnings on each change separately.
>
> Any comments, objections?
>
> Best,
> -Michiel.
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>
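
For illustration, a module-wide deprecation warning of the kind Michiel
describes could look roughly like the sketch below. It uses only the
standard `warnings` module; the helper name `warn_moved` is made up for
this example, and Biopython itself may prefer its own warning class.

```python
import warnings


def warn_moved(old_name, new_name):
    """Emit one DeprecationWarning pointing users at the new location."""
    warnings.warn(
        "%s is deprecated; please use %s instead." % (old_name, new_name),
        DeprecationWarning,
        stacklevel=2,
    )


# In the old module this call would sit at import time, for example at the
# top of Bio/Motif/Parsers/AlignAce.py. Here the warning is captured so the
# message can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_moved("Bio.Motif.Parsers.AlignAce", "Bio.Motif.AlignAce")
print(caught[0].message)
```

A single warning at import time covers every renamed attribute at once,
which is the advantage of moving the modules rather than patching each
attribute separately.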


-- 
Bartek Wilczynski


From arnaud.poret at gmail.com  Mon Aug 13 14:07:39 2012
From: arnaud.poret at gmail.com (Arnaud Poret)
Date: Mon, 13 Aug 2012 16:07:39 +0200
Subject: [Biopython-dev] obo parser
Message-ID: <CACYTbOT=NPqjpTnz4mAa-r1FaQH4X1gBVvr8HDvR-CYYE5iKRA@mail.gmail.com>

Hi everyone,

I'm a newcomer and I'm writing an obo parser for importing ontologies
into python. I'm not sure, but does Biopython already have an obo parser?

If yes, I'm reinventing the wheel...

If no, I'll be glad to contribute.


From tiagoantao at gmail.com  Tue Aug 14 03:23:01 2012
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Mon, 13 Aug 2012 20:23:01 -0700
Subject: [Biopython-dev] Away Re: buildbot failure in Biopython on Windows
	XP - Python 2.5
Message-ID: <CAA9RGEP7OuWO7cEAiya46d6nJFj2ukVPwKednQoSQbUhHKeOPg@mail.gmail.com>

I am currently away from office. I will respond back on the 20th of August.

Regards,
Tiago

-- 
"Liberty for wolves is death to the lambs" - Isaiah Berlin


From p.j.a.cock at googlemail.com  Tue Aug 14 11:06:32 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Aug 2012 12:06:32 +0100
Subject: [Biopython-dev] Genome Diagram Sigils, was: Default Behavior
Message-ID: <CAKVJ-_7XxVStvWrte03kQ00HwVD771EdzV8PN3dSTtSM2p0fsA@mail.gmail.com>

On Thu, Aug 2, 2012 at 5:12 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Aug 2, 2012 at 8:42 AM, Leighton Pritchard wrote:
>>Peter wrote:
>>>
>>> To match the current sigil argument names BOX and ARROW, I have
>>> provisionally called it BIGARROW. Any better ideas?
>>>
>>
>> BIGARROW sounds fine to me. I like literal names.
>>
>
> Great. Checked into the master, and I updated the Tutorial and
> the Proux et al 2002 Figure 6 reproduction example to use this:
>
> Before (cross-links with strand specific ARROW sigil):
> http://biopython.org/DIST/docs/tutorial/images/three_track_cl2.png
>
> After (cross-links with strand straddling BIGARROW sigil):
> http://biopython.org/DIST/docs/tutorial/images/three_track_cl2a.png
>
> Original (I don't know what was used to draw this):
> http://dx.doi.org/10.1128/JB.184.21.6026-6036.2002
>
> Regards,
>
> Peter

Further to that work, I updated some older code for a JAGGY
sigil, and also an OCTO sigil (names open to suggestions),
which are on my gd-sigils branch which has documentation
in the tutorial, including this image of the expanded sigil set:
https://github.com/peterjc/biopython/blob/e09e264dd73953554609498c15b67d86686592fb/Doc/images/GD_sigils.png

This is a slight simplification of the old JAGGY code in that it
does not (yet) allow control of the teeth length (e.g. to have
teeth on just one end). I am thinking this could be exposed like
the existing arrow specific options.

I originally created the JAGGY sigil for marking a break point
in a contig/scaffold. For instance, you might want to mark a
run of NNNNN bases in a scaffold with a jaggy sigil (straddling
both strands) as a clear visual marker to explain why there
were no genes.

Other sigil ideas I pondered include an OVAL, which should
be quite easy for the linear diagrams, but rather more work to
implement for circular diagrams due to the distorted curves.

Peter


From p.j.a.cock at googlemail.com  Tue Aug 14 19:49:23 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Aug 2012 20:49:23 +0100
Subject: [Biopython-dev] GSoC SearchIO project
In-Reply-To: <87lim4h07o.fsf@fastmail.fm>
References: <1334014051.14489.YahooMailClassic@web161204.mail.bf1.yahoo.com>
	<87lim4h07o.fsf@fastmail.fm>
Message-ID: <CAKVJ-_7b=RGpDGX3v0x5PJFWgW5dB3Otfg8Sq2Gehhg4SU2bUg@mail.gmail.com>

On Tue, Apr 10, 2012 at 1:58 AM, Brad Chapman wrote:
> Michiel;
>> Hi Eric, Peter,
>>
>> > How about Bio.Search, for now?
>>
>> I would prefer Bio.Pairwise or Bio.Align.Pairwise, since that tells
>> users something about what the module is for. Bio.Search could be
>> anything (search PubMed? search the Entrez databases? search Google?
>> anyway Bio.Search does not suggest that this module is about pairwise
>> alignments). But Peter previously mentioned that he doesn't like
>> Bio.Pairwise; can we convince you?
>
> I agree with Peter on this one. The module is primarily about searching
> a sequence database with an input via multiple methods, not about
> pairwise alignment of two sequences, which is what Bio.Align.Pairwise
> suggests to me.
>
> Brad

One potential problem with Bio.Search (on top of the concerns raised
here about vagueness), which Bow and I were just talking about during
our weekly GSoC video call, is the existence of Bio/Search.py,
which is obsolete and long overdue for removal. I have just
deprecated it (something I forgot to do before the last release):
https://github.com/biopython/biopython/commit/5a275ccd1df3def40df1eef517af755d373dadd8

We'd earlier talked about using Bio.Search as the namespace. I was
worried about the potential existence on a user's machine of both
Bio/Search.py (the old obsolete code) and Bio/Search/__init__.py
(aka SearchIO, the new module) and which would take precedence
when doing: from Bio import Search

Given how Python module installations work, that seems highly
likely to occur. The good news is that the package would take
priority - see http://www.python.org/doc/essays/packages.html

>>>> What If I Have a Module and a Package With The Same Name?
>>>>
>>>> You may have a directory (on sys.path) which has both a module
>>>> spam.py and a subdirectory spam that contains an __init__.py
>>>> (without the __init__.py, a directory is not recognized as a package).
>>>> In this case, the subdirectory has precedence, and importing spam
>>>> will ignore the spam.py file, loading the package spam instead. If
>>>> you want the module spam.py to have precedence, it must be
>>>> placed in a directory that comes earlier in sys.path.

So there is no technical reason to avoid Bio.Search as an
option for the Bio.SearchIO namespace. We could then
have Bio.Search.Applications for command line wrappers,
consistent with Bio.Phylo.Applications, Bio.Motif.Applications
and Bio.Align.Applications.
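
The precedence rule quoted above is easy to check empirically. The sketch
below builds a throwaway directory holding both a module file and a
package of the same name, then imports that name. The name `spam_demo`
and the helper `which_wins` are invented for this example, and the check
only says something about the interpreter it is run on.

```python
import importlib
import os
import shutil
import sys
import tempfile


def which_wins(name="spam_demo"):
    """Create NAME.py and NAME/__init__.py side by side, then import NAME."""
    root = tempfile.mkdtemp()
    try:
        with open(os.path.join(root, name + ".py"), "w") as handle:
            handle.write("KIND = 'module'\n")
        os.mkdir(os.path.join(root, name))
        with open(os.path.join(root, name, "__init__.py"), "w") as handle:
            handle.write("KIND = 'package'\n")
        sys.path.insert(0, root)
        importlib.invalidate_caches()
        return importlib.import_module(name).KIND
    finally:
        # Undo every side effect so the check can be repeated.
        if root in sys.path:
            sys.path.remove(root)
        sys.modules.pop(name, None)
        shutil.rmtree(root, ignore_errors=True)


print(which_wins())
```

If the package takes priority, as the essay says, this reports `package`.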

Of course, Bio.Search is still perhaps too broad a name... but
on balance perhaps it is still better than Bio.SearchIO?

Regards,

Peter


From tiagoantao at gmail.com  Tue Aug 14 20:39:12 2012
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 14 Aug 2012 21:39:12 +0100
Subject: [Biopython-dev] jython/testing
Message-ID: <CAA9RGEOrdz2Bm_hs0M-Zod4g0J9rK=K574DjM0BDx51KjBH4sw@mail.gmail.com>

Hi,

I have been trying to use biopython with jython 2.7 alpha 2. Here
follows a report.


There are still a few problems (with SeqIO only):
test_SeqIO ... ERROR
test_SeqIO_QualityIO ... FAIL
test_SeqIO_index ... FAIL

The errors are something like (all the same kind of stuff really):

SeqIO
======================================================================
ERROR: test_SeqIO
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 341, in runTest
    suite = unittest.TestLoader().loadTestsFromName(name)
  File "/home/tr353/local/jython/Lib/unittest/loader.py", line 91, in
loadTestsFromName
    module = __import__('.'.join(parts_copy))
  File "/home/tr353/local/jython/Lib/unittest/loader.py", line 91, in
loadTestsFromName
    module = __import__('.'.join(parts_copy))
  File "/home/tr353/tmp/biopython/Tests/test_SeqIO.py", line 627, in <module>
    check_simple_write_read(records)
  File "/home/tr353/tmp/biopython/Tests/test_SeqIO.py", line 352, in
check_simple_write_read
    records2 = list(SeqIO.parse(handle=handle, format=format))
  File "/home/tr353/tmp/biopython/Tests/test_SeqIO.py", line 352, in
check_simple_write_read
    records2 = list(SeqIO.parse(handle=handle, format=format))
  File "/home/tr353/tmp/biopython/Bio/SeqIO/__init__.py", line 537, in parse
    for r in i:
  File "/home/tr353/tmp/biopython/Bio/SeqIO/SffIO.py", line 828, in SffIterator
    header_length, index_offset, index_length, number_of_reads, \
  File "/home/tr353/tmp/biopython/Bio/SeqIO/SffIO.py", line 285, in
_sff_file_header
    magic_number, ver0, ver1, ver2, ver3, index_offset, index_length, \
error: unpack str size does not match format


SeqIO_QualityIO
======================================================================
ERROR: test_E3MFGYR02 (test_SeqIO_QualityIO.TestWriteRead)
Write and read back E3MFGYR02_random_10_reads.sff
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tr353/tmp/biopython/Tests/test_SeqIO_QualityIO.py", line
551, in test_E3MFGYR02
    self.check(os.path.join("Roche", "E3MFGYR02_random_10_reads.sff"), "sff",
  File "/home/tr353/tmp/biopython/Tests/test_SeqIO_QualityIO.py", line
477, in check
    write_read(filename, format, f)
  File "/home/tr353/tmp/biopython/Tests/test_SeqIO_QualityIO.py", line
52, in write_read
    records2 = list(SeqIO.parse(handle,out_format))
  File "/home/tr353/tmp/biopython/Bio/SeqIO/__init__.py", line 537, in parse
    for r in i:
  File "/home/tr353/tmp/biopython/Bio/SeqIO/SffIO.py", line 828, in SffIterator
    header_length, index_offset, index_length, number_of_reads, \
  File "/home/tr353/tmp/biopython/Bio/SeqIO/SffIO.py", line 285, in
_sff_file_header
    magic_number, ver0, ver1, ver2, ver3, index_offset, index_length, \
error: unpack str size does not match format

SeqIO.index
======================================================================
ERROR: test_sff_Roche_greek_sff_get_raw (test_SeqIO_index.IndexDictTests)
Index sff file Roche/greek.sff get_raw
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tr353/tmp/biopython/Tests/test_SeqIO_index.py", line
430, in <lambda>
    f = lambda x : x.get_raw_check(fn, fmt, alpha, c)
  File "/home/tr353/tmp/biopython/Tests/test_SeqIO_index.py", line
301, in get_raw_check
    rec2 = SeqIO.SffIO._sff_read_seq_record(handle,
  File "/home/tr353/tmp/biopython/Bio/SeqIO/SffIO.py", line 561, in
_sff_read_seq_record
    read_header_length, name_length, seq_len, clip_qual_left, \
error: unpack str size does not match format


I suppose this is because of issues with the alpha version of jython 2.7.
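
For context on the error itself: `struct.unpack` raises it whenever the
number of bytes it is given differs from what the format string requires.
The sketch below is a standalone illustration, not the Biopython code. The
format string `>4s4BQIIHHHB` is an assumption based on the fixed part of
the SFF common header, and the 31 sample bytes are the start of
E3MFGYR02_random_10_reads.sff as quoted later in this thread.

```python
import struct
from io import BytesIO

FMT = ">4s4BQIIHHHB"  # assumed SFF header layout, big-endian, 31 bytes
HEADER = (b".sff\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00A\xb8"
          b"\x00\x00\x02\xfc\x00\x00\x00\n\x01\xb8\x00\x04\x01\x90\x01")


def read_header(handle):
    """Read and unpack the fixed-size header from a binary handle."""
    data = handle.read(struct.calcsize(FMT))
    return struct.unpack(FMT, data)


fields = read_header(BytesIO(HEADER))
print(fields[0], fields[7])  # magic number and number of reads

try:
    read_header(BytesIO(HEADER[:20]))  # too few bytes for the format
except struct.error as err:
    print("struct.error:", err)
```

Any handle whose read() returns something other than the exact bytes
requested will therefore fail at the first unpack call, which matches all
three tracebacks.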

Tiago
PS - I do not have all external dependencies installed on my machine,
so a few modules are untested.

-- 
"Liberty for wolves is death to the lambs" - Isaiah Berlin


From p.j.a.cock at googlemail.com  Wed Aug 15 11:18:50 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 15 Aug 2012 12:18:50 +0100
Subject: [Biopython-dev] jython/testing
In-Reply-To: <CAA9RGEOrdz2Bm_hs0M-Zod4g0J9rK=K574DjM0BDx51KjBH4sw@mail.gmail.com>
References: <CAA9RGEOrdz2Bm_hs0M-Zod4g0J9rK=K574DjM0BDx51KjBH4sw@mail.gmail.com>
Message-ID: <CAKVJ-_77GP9-GqKVn4fovL3cv-tbK3EcrkUb=pp9Y4Gp2db8_g@mail.gmail.com>

On Tue, Aug 14, 2012 at 9:39 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
> Hi,
>
> I have been trying to use biopython with jython 2.7 alpha 2. Here
> follows a report.
>
>
> There are still a few problems (with SeqIO only):
> test_SeqIO ... ERROR
> test_SeqIO_QualityIO ... FAIL
> test_SeqIO_index ... FAIL
>
> The errors are something like (all the same kind of stuff really):
>
> ...

I see that on my machine too. From looking at the tracebacks and
the associated code, the failures all involve BytesIO (or StringIO
depending on the Python version). Note that BytesIO is new in
Python 2.6, and thus also new in Jython 2.7 compared to Jython 2.5.

This is enough to demonstrate a bug in Jython 2.7a2, which explains
some if not all of our unit test failures:

Expected behaviour:

$ python
Python 2.7.2 (default, Jun 20 2012, 16:23:33)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from io import BytesIO
>>> raw = open("Roche/E3MFGYR02_random_10_reads.sff", "rb").read()
>>> raw == BytesIO(raw).read()
True
>>> len(raw)
17592
>>> quit()

Broken behaviour:

$ ~/jython2.7a2/jython
Jython 2.7a2 (default:9c148a201233, May 24 2012, 15:49:00)
[Java HotSpot(TM) 64-Bit Server VM (Apple Inc.)] on java1.6.0_33
Type "help", "copyright", "credits" or "license" for more information.
>>> from io import BytesIO
>>> raw = open("Roche/E3MFGYR02_random_10_reads.sff", "rb").read()
>>> raw == BytesIO(raw).read()
False
>>> len(raw)
17592
>>> len(BytesIO(raw).read())
51577
>>> BytesIO(raw).read()[:100]
"bytearray(b'.sff\\x00\\x00\\x00\\x01\\x00\\x00\\x00\\x00\\x00\\x00A\\xb8\\x00\\x00\\x02\\xfc\\x00\\x00\\x00\\n\\x01\\xb8\\"
>>> raw[:100]
'.sff\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00A\xb8\x00\x00\x02\xfc\x00\x00\x00\n\x01\xb8\x00\x04\x01\x90\x01TACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT'
>>> quit()


I will report this.
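
The comparison above boils down to a round-trip check that can be run
against any byte string, which may be handy when retesting later Jython
builds. This is only a sketch: the helper name and the sample bytes are
made up here, and it is not the test case filed upstream.

```python
from io import BytesIO


def round_trips(raw):
    """Return True if BytesIO hands back exactly the bytes it was given."""
    return BytesIO(raw).read() == raw


# Every possible byte value once, preceded by an SFF-like magic number.
sample = b".sff\x00\x00\x00\x01" + bytes(bytearray(range(256)))
print(round_trips(sample))
```

On a working interpreter this should print True.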

Peter


From p.j.a.cock at googlemail.com  Wed Aug 15 11:26:19 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 15 Aug 2012 12:26:19 +0100
Subject: [Biopython-dev] jython/testing
In-Reply-To: <CAKVJ-_77GP9-GqKVn4fovL3cv-tbK3EcrkUb=pp9Y4Gp2db8_g@mail.gmail.com>
References: <CAA9RGEOrdz2Bm_hs0M-Zod4g0J9rK=K574DjM0BDx51KjBH4sw@mail.gmail.com>
	<CAKVJ-_77GP9-GqKVn4fovL3cv-tbK3EcrkUb=pp9Y4Gp2db8_g@mail.gmail.com>
Message-ID: <CAKVJ-_48f_q0AGstNRDzTuv3QDMJxHrzeZ5wfSC-Hcash8E3pw@mail.gmail.com>

On Wed, Aug 15, 2012 at 12:18 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Aug 14, 2012 at 9:39 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
>> Hi,
>>
>> I have been trying to use biopython with jython 2.7 alpha 2. Here
>> follows a report.
>>
>>
>> There are still a few problems (with SeqIO only):
>> test_SeqIO ... ERROR
>> test_SeqIO_QualityIO ... FAIL
>> test_SeqIO_index ... FAIL
>>
>> The errors are something like (all the same kind of stuff really):
>>
>> ...
>
> I see that on my machine too. From looking at the tracebacks and
> the associated code, the failures all involve BytesIO (or StringIO
> depending on the Python version). Note that BytesIO is new in
> Python 2.6, and thus also new in Jython 2.7 compared to Jython 2.5.
>
> This is enough to demonstrate a bug in Jython 2.7a2, which explains
> some if not all of our unit test failures:
>
> ...
>
> I will report this.

Filed as http://bugs.jython.org/issue1959 with a shorter test case.

Peter


From arklenna at gmail.com  Fri Aug 17 01:58:46 2012
From: arklenna at gmail.com (Lenna Peterson)
Date: Thu, 16 Aug 2012 21:58:46 -0400
Subject: [Biopython-dev] GSoC Python variant (penultimate) update
Message-ID: <CALfq9t+hXQV+9Gjvn7x_nVgX0t2yXGtwtuk4Xzz3m3a4FA2wpw@mail.gmail.com>

Post: http://arklenna.tumblr.com/post/29592108099/

I have been considering how to handle gene strandedness. As long as
I'm correctly interpreting the following position, my coordinate
mapper should produce the correct coordinates with negative strand or
mixed strand features.

GenBank: join(complement(25..30), 36..40)
Biopython: FeatureLocation(24, 30, -1) + FeatureLocation(35, 40)

(please click through to post for monospaced font)

    23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
       <----------------                ------------->
       5  4  3  2  1  0                 6  7  8  9  10

I have to admit that it wasn't until I read a BioStar
[post](http://biostars.org/post/show/3423/forward-and-reverse-strand-conventions/)
earlier this week that I fully understood the relationship between
plus/minus forward/reverse sense/antisense coding/template strands. So
please let me know as soon as possible if I've made a mistake in the
above code.

`c2g` yields the correct genome position, but not the strand. I still
need to integrate strand information into my `GenomePosition` object
and/or partially merge it with `ExactLocation`. This weekend I intend
to expand documentation and write a brief cookbook entry.
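
The numbering in the diagram above can be reproduced with a few lines of
plain Python. This is a simplified sketch rather than the CoordinateMapper
code; the function name `cds_to_genome` and the tuple layout are invented
for the example, and all coordinates are 0-based and end-exclusive, as in
FeatureLocation.

```python
def cds_to_genome(parts):
    """Map each CDS index to a 0-based genomic position.

    parts is a list of (start, end, strand) tuples in transcript order,
    with 0-based, end-exclusive coordinates.
    """
    positions = []
    for start, end, strand in parts:
        block = list(range(start, end))
        if strand == -1:
            block.reverse()  # minus-strand exons are read right to left
        positions.extend(block)
    return positions


# join(complement(25..30), 36..40)
mapping = cds_to_genome([(24, 30, -1), (35, 40, +1)])
for cds_index in (0, 5, 6, 10):
    print(cds_index, mapping[cds_index])
```

CDS index 0 lands on genomic position 29 and index 5 on position 24, so
the minus-strand exon is traversed in reverse before the plus-strand exon
continues from 35.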

Cheers,

Lenna


From arnaud.poret at gmail.com  Fri Aug 17 07:38:28 2012
From: arnaud.poret at gmail.com (Arnaud Poret)
Date: Fri, 17 Aug 2012 09:38:28 +0200
Subject: [Biopython-dev] obo parser
Message-ID: <CACYTbORMZHX9vP5DfpamzikCKcP15M8OLqqGpNshViHRb-qPLw@mail.gmail.com>

Hi everyone,

I'm a newcomer and I'm writing an obo parser for importing ontologies
into python.

I'm not sure, but does Biopython already have an obo parser?

If yes, I'm reinventing the wheel...

If no, I'll be glad to contribute.

Arnaud.


From p.j.a.cock at googlemail.com  Fri Aug 17 08:15:10 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 17 Aug 2012 09:15:10 +0100
Subject: [Biopython-dev] obo parser
In-Reply-To: <CACYTbOT=NPqjpTnz4mAa-r1FaQH4X1gBVvr8HDvR-CYYE5iKRA@mail.gmail.com>
References: <CACYTbOT=NPqjpTnz4mAa-r1FaQH4X1gBVvr8HDvR-CYYE5iKRA@mail.gmail.com>
Message-ID: <CAKVJ-_68nurF0d6OvS=ne5MDLANkCNRXQfvZFfqHwqH-ZqOFLw@mail.gmail.com>

On Mon, Aug 13, 2012 at 3:07 PM, Arnaud Poret <arnaud.poret at gmail.com> wrote:
> Hi everyone,
>
> I'm a newcomer and I'm writing an obo parser for importing ontologies
> into python. I'm not sure, but does Biopython already have an obo parser?
>
> If yes, I'm reinventing the wheel...
>
> If no, I'll be glad to contribute.

There does seem to be interest; questions about ontologies,
GO and OBO crop up every so often. There were some
people actually working on this too, but it has gone quiet.

e.g.
http://lists.open-bio.org/pipermail/biopython-dev/2012-February/009384.html
http://lists.open-bio.org/pipermail/biopython-dev/2011-July/009031.html

Chris Lasher's repository has vanished, but Eric's older
work is still online (CC'd):
https://github.com/kellrott/biopython/tree/gosupport

Eric & Chris - where do things stand?

Regards,

Peter


From p.j.a.cock at googlemail.com  Fri Aug 17 08:21:01 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 17 Aug 2012 09:21:01 +0100
Subject: [Biopython-dev] [GSoC] GSoC Python variant (penultimate) update
In-Reply-To: <CALfq9t+hXQV+9Gjvn7x_nVgX0t2yXGtwtuk4Xzz3m3a4FA2wpw@mail.gmail.com>
References: <CALfq9t+hXQV+9Gjvn7x_nVgX0t2yXGtwtuk4Xzz3m3a4FA2wpw@mail.gmail.com>
Message-ID: <CAKVJ-_5dxeVMvpemNDfUUSQLzTba9Bpt=LgwTLBzPRy-EtVavA@mail.gmail.com>

On Fri, Aug 17, 2012 at 2:58 AM, Lenna Peterson <arklenna at gmail.com> wrote:
>
> I have to admit that it wasn't until I read a BioStar
> [post](http://biostars.org/post/show/3423/forward-and-reverse-strand-conventions/)
> earlier this week that I fully understood the relationship between
> plus/minus forward/reverse sense/antisense coding/template strands. So
> please let me know as soon as possible if I've made a mistake in the
> above code.

Given this is nice and fresh in your mind, can you suggest any
clarifications to the Biopython Tutorial section talking about
this issue? The section on transcription & translation starting:

"Before talking about transcription, I want to try and clarify
the strand issue. Consider the following (made up) stretch
of double stranded DNA which encodes a short peptide: ..."

Hmm. That should probably say "I want to try to clarify...".

Peter


From p.j.a.cock at googlemail.com  Fri Aug 17 16:42:57 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 17 Aug 2012 17:42:57 +0100
Subject: [Biopython-dev] BioSQL tests
Message-ID: <CAKVJ-_6b3FBfncEz_u16xKhKL4fAdKvRfFM1==8CRs-HHedOfg@mail.gmail.com>

Dear all,

I realised this week that I didn't have a working BioSQL
test setup under either MySQL or PostgreSQL, and
the buildbot machines are not testing these either.
Therefore I have re-factored the BioSQL unit tests as
follows:

First I turned my print-and-compare test_BioSQL_SeqIO.py
script into proper unittest-based tests, so that all the BioSQL
tests could be combined in one file, test_BioSQL.py.

This allowed a further reorganisation to allow any one
machine to test all the supported back ends one after
the other - previously the setup only tested one backend
(defaulting to SQLite3). We now have three test scripts
named after the backend library used to connect to the
database:

test_BioSQL_MySQLdb.py
test_BioSQL_psycopg2.py
test_BioSQL_sqlite3.py

Subsequently I modified our TravisCI configuration to
install the required dependencies to run all these tests.
The default usernames and passwords for MySQLdb
and postgresql are set to match those under TravisCI.
Local users would probably have to adjust these values
(in the same way they used to prior to the refactoring).

Note that psycopg2 only works on C Python 2 & 3 for
now (there is a PyPy alternative I have not looked into).

MySQLdb only works on C Python 2 (there is a problem
installing it under Python 3.2).

This did show I'd broken using BioSQL under MySQLdb,
at least under this particular version, fixed now:
https://github.com/biopython/biopython/commit/4a67d851d1eda0a138b604c8aeffc151d331a29b

So the good news is that now TravisCI will run the
BioSQL tests on all three database backends, on
several versions of Python (but just on Linux).
http://travis-ci.org/biopython/biopython/

What I have not addressed is if/how we should deal
with test database settings under buildbot - perhaps by
environment variable overrides?
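
One possible shape for such overrides, sketched with invented names: the
`BIOSQL_TEST_*` variables and the default values below are assumptions for
illustration only, not settings that exist in the test suite.

```python
import os

# Hypothetical defaults; real values would match the local database setup.
DEFAULTS = {
    "host": "localhost",
    "user": "biosql_user",
    "password": "",
    "name": "biosql_test",
}


def load_db_settings(environ=None):
    """Return test database settings, letting environment variables win."""
    if environ is None:
        environ = os.environ
    settings = {}
    for key, default in DEFAULTS.items():
        settings[key] = environ.get("BIOSQL_TEST_" + key.upper(), default)
    return settings


print(load_db_settings({"BIOSQL_TEST_USER": "buildbot"}))
```

Each buildbot machine could then export only the values that differ from
the defaults, leaving the test scripts themselves unchanged.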

If anyone would like to look into using MySQLdb and/or
psycopg2 under PyPy and Jython, that would be
useful too.

Regards,

Peter


From arklenna at gmail.com  Mon Aug 20 04:22:36 2012
From: arklenna at gmail.com (Lenna Peterson)
Date: Mon, 20 Aug 2012 00:22:36 -0400
Subject: [Biopython-dev] GSoC python variant final update
Message-ID: <CALfq9t+quDjby4Dvg1iscs-oSAutAR9yzmT3G_b4-6oFVQxFVw@mail.gmail.com>

Post: http://arklenna.tumblr.com/post/29808300789/

The coordinate mapper, with updated documentation, is now located on
this branch: https://github.com/lennax/biopython/tree/f_loc4
It awaits the merging of Peter's f_loc4 branch.

I've written an entry on coordinate mapping for the Cookbook:
http://biopython.org/wiki/Coordinate_mapping

Additionally, at Peter's suggestion, I've written a clarification of
strand as it relates to transcription and translation. It's available
here: https://docs.google.com/document/d/11R7EOJXn90lN5_SmaPOyN5rFfPQybbCbUBo6EY0R0pA/edit

It's been a great experience working with this project this summer.
Thank you to everyone involved.

Cheers,

Lenna


From mjldehoon at yahoo.com  Mon Aug 20 12:38:37 2012
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 20 Aug 2012 05:38:37 -0700 (PDT)
Subject: [Biopython-dev] Bio.Cluster in the main Biopython documentation
Message-ID: <1345466317.39160.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Dear all,

Previously the documentation for Bio.Cluster was only available as a separate PDF on the Biopython website. I have now integrated this documentation into the Biopython Tutorial. The new tutorial is already uploaded to the repository, and will be visible at http://biopython.org/DIST/docs/tutorial/Tutorial-dev.html once the nightly build is done.

Since the documentation for Bio.Cluster contains many references to the literature, I started using the LaTeX \cite command, which is understood and formatted properly by Hevea. While at it, I also converted the references I could find in other parts of the Tutorial to \cite references. This creates a list of references at the end of the Tutorial. Please let us know if you don't like this approach.

The documentation for Bio.Cluster is fairly long, and while modifying it for inclusion into the Tutorial some mistakes may have crept in. Please let me know if you find any such mistakes (or feel free to fix them yourself, if it is clear what the text should be).

For now we can leave the PDF with the separate description of Bio.Cluster on the website as is for users of Biopython 1.60, but once the next version of Biopython is out I would like to replace it with a PDF referring to the main Tutorial.

Thanks,
-Michiel.


From chapmanb at 50mail.com  Mon Aug 20 12:45:49 2012
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 20 Aug 2012 08:45:49 -0400
Subject: [Biopython-dev] [GSoC] GSoC python variant final update
In-Reply-To: <CALfq9t+quDjby4Dvg1iscs-oSAutAR9yzmT3G_b4-6oFVQxFVw@mail.gmail.com>
References: <CALfq9t+quDjby4Dvg1iscs-oSAutAR9yzmT3G_b4-6oFVQxFVw@mail.gmail.com>
Message-ID: <87harxzq82.fsf@fastmail.fm>


Lenna;
Thanks for the documentation and getting all that code moved into a
branch. This looks great and I'm looking forward to having it merged when
Peter's work goes in.

Thanks also for all the great work this summer and good luck on the
first day of PhD school,
Brad


> Post: http://arklenna.tumblr.com/post/29808300789/
>
> The coordinate mapper, with updated documentation, is now located on
> this branch: https://github.com/lennax/biopython/tree/f_loc4
> It awaits the merging of Peter's f_loc4 branch.
>
> I've written an entry on coordinate mapping for the Cookbook:
> http://biopython.org/wiki/Coordinate_mapping
>
> Additionally, at Peter's suggestion, I've written a clarification of
> strand as it relates to transcription and translation. It's available
> here: https://docs.google.com/document/d/11R7EOJXn90lN5_SmaPOyN5rFfPQybbCbUBo6EY0R0pA/edit
>
> It's been a great experience working with this project this summer.
> Thank you to everyone involved.
>
> Cheers,
>
> Lenna
> _______________________________________________
> GSoC mailing list
> GSoC at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/gsoc


From redmine at redmine.open-bio.org  Tue Aug 21 10:27:14 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 21 Aug 2012 10:27:14 +0000
Subject: [Biopython-dev] [Biopython - Bug #3379] (New) PDBParser fails to
	parse PDBs produced by PatchDock
Message-ID: <redmine.issue-3379.20120821102714@redmine.open-bio.org>


Issue #3379 has been reported by David Cain.

----------------------------------------
Bug #3379: PDBParser fails to parse PDBs produced by PatchDock
https://redmine.open-bio.org/issues/3379

Author: David Cain
Status: New
Priority: Low
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 1.57
URL: 


I apologize in advance if this technically doesn't count as a bug, as the problem is arising out of improperly formatted PDBs.


h3. Background

Protein docking utilities can generally create a complex PDB from two input files. Depending on the rotation algorithm, at least one of the PDB files is rotated (its ATOM coordinates modified in-place), then the two files are concatenated to create a protein complex file.

h3. Why PDBParser fails

Utilities like ZDOCK strip a lot of data from the input files, creating a poorly-formed PDB file that raises PDBConstructionWarnings, but PDBParser can ultimately parse. PatchDock, however, preserves the input PDB files as they were- the only thing that changes is ATOM coordinates. This is problematic when the receptor PDB has an @END@ record or @CONECT@ records: PDBParser's current behavior is to consider anything after an @END@ or @CONECT@ to be trailer data, and cease parsing when they're encountered. This means that many complexes parse cleanly, but completely exclude the ligand.

h3. How to fix the problem

Now, in an ideal world- the responsibility would be on the creators of the docking utilities to create well-formed complex PDB files. However, this quick concatenation seems to be pretty common (complexes are often created by very short, hackish Perl scripts). Should PDBParser be able to parse these badly formed PDB files?

h3. Potential change to @PDBParser._parse_coordinates@?

If a modification to PDBParser is on the table, my thought would be to still consider anything after @END@ or @CONECT@ to be part of the trailer, but make an attempt to parse extra coordinate data from this trailer before returning (probably through a recursive call). If records are found in the trailer, a PDBConstructionWarning is raised, but they're added to the structure.

If this approach is reasonable, let me know and I'd be happy to mock something up and push it to my branch on GitHub. Otherwise, I'll just write scripts to clean ugly complexes for parsing.

My only thought is that most users of docking software are probably not able or willing to write such a script, and thus can't use BioPython to parse the PDB output.
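
A cleaning script of the kind mentioned above could be quite short. The
following is only a sketch with a made-up function name. It assumes that
dropping every END record and moving CONECT records behind the coordinates
is acceptable, and it does not renumber atoms, so CONECT serials from the
two input files may clash.

```python
def clean_concatenated_pdb(lines):
    """Reorder a concatenated complex so no coordinates follow END/CONECT."""
    body = []
    conect = []
    for line in lines:
        record = line[:6].strip()
        if record == "END":
            continue  # dropped here, a single END is appended below
        if record == "CONECT":
            conect.append(line)
        else:
            body.append(line)
    return body + conect + ["END\n"]


example = [
    "ATOM      1  N   ALA A   1\n",  # truncated records, for illustration
    "CONECT    1    2\n",
    "END\n",
    "ATOM      1  N   GLY B   1\n",
    "END\n",
]
print("".join(clean_concatenated_pdb(example)))
```

Because the record name is read from the first six columns, ENDMDL lines
are left alone and only bare END records are removed.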


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Tue Aug 21 10:27:14 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 21 Aug 2012 10:27:14 +0000
Subject: [Biopython-dev] [Biopython - Bug #3379] (New) PDBParser fails to
	parse PDBs produced by PatchDock
Message-ID: <redmine.issue-3379.20120821102714@redmine.open-bio.org>


Issue #3379 has been reported by David Cain.

----------------------------------------
Bug #3379: PDBParser fails to parse PDBs produced by PatchDock
https://redmine.open-bio.org/issues/3379

Author: David Cain
Status: New
Priority: Low
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 1.57
URL: 


I apologize in advance if this technically doesn't count as a bug, as the problem arises from improperly formatted PDBs.


h3. Background

Protein docking utilities can generally create a complex PDB from two input files. Depending on the rotation algorithm, at least one of the PDB files is rotated (its ATOM coordinates modified in-place), then the two files are concatenated to create a protein complex file.

h3. Why PDBParser fails

Utilities like ZDOCK strip a lot of data from the input files, creating a poorly formed PDB file that raises PDBConstructionWarnings but that PDBParser can ultimately parse. PatchDock, however, preserves the input PDB files as they were; the only thing that changes is the ATOM coordinates. This is problematic when the receptor PDB has an @END@ record or @CONECT@ records: PDBParser's current behavior is to consider anything after an @END@ or @CONECT@ to be trailer data, and to cease parsing when one is encountered. This means that many complexes parse cleanly, but completely exclude the ligand.

h3. How to fix the problem

Now, in an ideal world, the responsibility would be on the creators of the docking utilities to create well-formed complex PDB files. However, this quick concatenation seems to be pretty common (complexes are often created by very short, hackish Perl scripts). Should PDBParser be able to parse these badly formed PDB files?

h3. Potential change to @PDBParser._parse_coordinates@?

If a modification to PDBParser is on the table, my thought would be to still consider anything after @END@ or @CONECT@ to be part of the trailer, but make an attempt to parse extra coordinate data from this trailer before returning (probably through a recursive call). If records are found in the trailer, a PDBConstructionWarning is raised, but they're added to the structure.

If this approach is reasonable, let me know and I'd be happy to mock something up and push it to my branch on GitHub. Otherwise, I'll just write scripts to clean ugly complexes for parsing.

My only thought is that most users of docking software are probably not able or willing to write such a script, and thus can't use BioPython to parse the PDB output.
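
As a rough illustration of the kind of cleanup script mentioned above, here is a minimal sketch in plain Python. It assumes the complex is a straight concatenation of two PDB files, drops every END and CONECT record, and appends a single END. The function name merge_concatenated_pdb is hypothetical and not part of Biopython. Dropping CONECT records loses connectivity information, and whether the cleaned file then parses without warnings has not been checked here.

```python
def merge_concatenated_pdb(lines):
    """Return PDB lines with all END/CONECT records removed and one END appended.

    Assumption: the input is a plain concatenation of two PDB files, so an
    END or CONECT record may be followed by further coordinate records.
    """
    cleaned = []
    for line in lines:
        # The record name occupies the first six columns of a PDB line.
        record = line[:6].strip()
        # ENDMDL is a different record name and is deliberately kept.
        if record in ("END", "CONECT"):
            continue
        cleaned.append(line.rstrip("\n"))
    # Terminate the merged file with a single END record.
    cleaned.append("END")
    return cleaned
```

The result could be written to a temporary file and passed to the parser; chain identifiers that clash between receptor and ligand would still need separate handling.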




From redmine at redmine.open-bio.org  Tue Aug 21 10:36:07 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 21 Aug 2012 10:36:07 +0000
Subject: [Biopython-dev] [Biopython - Bug #3379] PDBParser fails to parse
	PDBs produced by PatchDock
References: <redmine.issue-3379.20120821102714@redmine.open-bio.org>
Message-ID: <redmine.journal-14935.20120821103607@redmine.open-bio.org>


Issue #3379 has been updated by Peter Cock.


If, as I understand it, PatchDock is producing invalid PDB files, have you raised the issue with them too?

I accept that out of practicality, a little lenience in our parsers can be helpful, and may be appropriate in this case. Do you have any sample data files you could share - for example a valid PDB file before processing, and the problematic PDB file after processing with PatchDock?


From redmine at redmine.open-bio.org  Tue Aug 21 11:08:53 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 21 Aug 2012 11:08:53 +0000
Subject: [Biopython-dev] [Biopython - Bug #3379] PDBParser fails to parse
	PDBs produced by PatchDock
References: <redmine.issue-3379.20120821102714@redmine.open-bio.org>
Message-ID: <redmine.journal-14936.20120821110853@redmine.open-bio.org>


Issue #3379 has been updated by João Rodrigues.


Disclaimer: I am a HADDOCK team member and therefore in direct competition with PATCHDOCK.

I totally disagree with this. This is not compliant with the PDB format at all: "Each file should terminate with a line containing only the word END". Having data beyond END is just bad practice in my opinion. There are two statements to close a chain/model, ENDMDL and TER, and these should be used. Sorry to be a pain, but if we fix this we are just encouraging a bad practice. Standards are there to be respected.



From redmine at redmine.open-bio.org  Tue Aug 21 11:21:57 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 21 Aug 2012 11:21:57 +0000
Subject: [Biopython-dev] [Biopython - Bug #3379] PDBParser fails to parse
	PDBs produced by PatchDock
References: <redmine.issue-3379.20120821102714@redmine.open-bio.org>
Message-ID: <redmine.journal-14937.20120821112157@redmine.open-bio.org>


Issue #3379 has been updated by Peter Cock.


Given Joao's comments, lenience does not sound appropriate in this case.

If the parser's current behaviour is to silently ignore data after an END line, that seems less than ideal.

How about we add a clear error/warning to the parser if there is content in the file after an END line? i.e. Treat it as an exception in strict mode, treat it as a warning in permissive mode (and continue to ignore anything after the END line)?

A sample file would be helpful to verify this, and could even be used for a unit test (with your permission).
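
A minimal sketch of the strict/permissive idea above, in plain Python and independent of the real parser. The names check_after_end, TrailingDataWarning and TrailingDataError are hypothetical stand-ins (Bio.PDB has its own PDBConstructionWarning and PDBConstructionException); only END is checked here, and only ATOM/HETATM records count as content.

```python
import warnings


class TrailingDataWarning(UserWarning):
    """Hypothetical stand-in for a parser construction warning."""


class TrailingDataError(Exception):
    """Hypothetical stand-in for a parser construction exception."""


def check_after_end(lines, permissive=True):
    """Count coordinate records that appear after the first END line.

    In permissive mode a warning is issued; in strict mode an exception
    is raised. The records themselves are not parsed either way.
    """
    seen_end = False
    trailing = 0
    for line in lines:
        record = line[:6].strip()
        if seen_end and record in ("ATOM", "HETATM"):
            trailing += 1
        elif record == "END":
            seen_end = True
    if trailing:
        message = "%d coordinate record(s) found after END" % trailing
        if not permissive:
            raise TrailingDataError(message)
        warnings.warn(message, TrailingDataWarning)
    return trailing
```

In the actual parser the check would presumably live where the trailer is detected, so that the file does not need to be scanned twice.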


From redmine at redmine.open-bio.org  Tue Aug 21 11:26:48 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 21 Aug 2012 11:26:48 +0000
Subject: [Biopython-dev] [Biopython - Bug #3379] PDBParser fails to parse
	PDBs produced by PatchDock
References: <redmine.issue-3379.20120821102714@redmine.open-bio.org>
Message-ID: <redmine.journal-14938.20120821112648@redmine.open-bio.org>


Issue #3379 has been updated by David Cain.


I completely agree with João, actually: disrespecting the file spec is a bad idea. I just figured I'd bring this up for discussion.

I very much think a warning of some sort should be raised, though. Half the structure silently failing to parse is a big problem. I think your solution is perfect, and I'd be very happy to write the unit test.

I'll upload a sample file in just a bit.


From redmine at redmine.open-bio.org  Tue Aug 21 12:05:37 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 21 Aug 2012 12:05:37 +0000
Subject: [Biopython-dev] [Biopython - Bug #3379] PDBParser fails to parse
	PDBs produced by PatchDock
References: <redmine.issue-3379.20120821102714@redmine.open-bio.org>
Message-ID: <redmine.journal-14939.20120821120537@redmine.open-bio.org>


Issue #3379 has been updated by David Cain.

File complex.1.pdb added

I ran PatchDock's antigen-antibody complex mode on an antigen and antibody file (2fgw and 5ebx) that individually parse without warnings. (Note that I chose these files at random; their docking is useful only as an example).

I've attached the complex file produced by @PatchDock/transOutput.pl@ (only the top-scoring conformation is considered). As you can see, the @CONECT@ and @END@ records of the antibody will stop the rest of the file from being parsed.

I'd be happy to take a stab at writing the error/warning message for premature @END@/@CONECT@ records in addition to the unit test that checks for this behavior.



From redmine at redmine.open-bio.org  Tue Aug 21 12:35:07 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 21 Aug 2012 12:35:07 +0000
Subject: [Biopython-dev] [Biopython - Bug #3379] PDBParser fails to parse
	PDBs produced by PatchDock
References: <redmine.issue-3379.20120821102714@redmine.open-bio.org>
Message-ID: <redmine.journal-14940.20120821123507@redmine.open-bio.org>


Issue #3379 has been updated by João Rodrigues.


Agreed with Peter that it should raise an exception/warning. This is really pure concatenation of the two PDBs. If you could have a go at it, I could test it too. Thanks David.


From w.arindrarto at gmail.com  Tue Aug 21 16:01:21 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 21 Aug 2012 18:01:21 +0200
Subject: [Biopython-dev] GSoC SearchIO project
In-Reply-To: <CAKVJ-_7b=RGpDGX3v0x5PJFWgW5dB3Otfg8Sq2Gehhg4SU2bUg@mail.gmail.com>
References: <1334014051.14489.YahooMailClassic@web161204.mail.bf1.yahoo.com>
	<87lim4h07o.fsf@fastmail.fm>
	<CAKVJ-_7b=RGpDGX3v0x5PJFWgW5dB3Otfg8Sq2Gehhg4SU2bUg@mail.gmail.com>
Message-ID: <CADEGkF4URxn5zwXOwU1J6s21U22aLwTdUw3aU6G0=MRt+LbfOA@mail.gmail.com>

On Tue, Aug 14, 2012 at 9:49 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Apr 10, 2012 at 1:58 AM, Brad Chapman wrote:
>> Michiel;
>>> Hi Eric, Peter,
>>>
>>> > How about Bio.Search, for now?
>>>
>>> I would prefer Bio.Pairwise or Bio.Align.Pairwise, since that tells
>>> users something about what the module is for. Bio.Search could be
>>> anything (search PubMed? search the Entrez databases? search Google?
>>> anyway Bio.Search does not suggest that this module is about pairwise
>>> alignments). But Peter previously mentioned that he doesn't like
>>> Bio.Pairwise; can we convince you?
>>
>> I agree with Peter on this one. The module is primarily about searching
>> a sequence database with an input via multiple methods, not about
>> pairwise alignment of two sequences, which is what Bio.Align.Pairwise
>> suggests to me.
>>
>> Brad
>
> One potential problem with Bio.Search (on top of concerns raised
> here about vagueness), which Bow and I were just talking about during
> our weekly GSoC video call, is the existence of Bio/Search.py,
> which is obsolete and long overdue for removal. I have just
> deprecated it (something I forgot to do before the last release):
> https://github.com/biopython/biopython/commit/5a275ccd1df3def40df1eef517af755d373dadd8
>
> We'd earlier talked about using Bio.Search as the namespace. I was
> worried about the potential existence on a user's machine of both
> Bio/Search.py (the old obsolete code) and Bio/Search/__init__.py
> (aka SearchIO, the new module) and which would take precedence
> when doing: from Bio import Search
>
> Given how Python module installations work, that seems highly
> likely to occur. The good news is that the package would take
> priority - see http://www.python.org/doc/essays/packages.html
>
>>>>> What If I Have a Module and a Package With The Same Name?
>>>>>
>>>>> You may have a directory (on sys.path) which has both a module
>>>>> spam.py and a subdirectory spam that contains an __init__.py
>>>>> (without the __init__.py, a directory is not recognized as a package).
>>>>> In this case, the subdirectory has precedence, and importing spam
>>>>> will ignore the spam.py file, loading the package spam instead. If
>>>>> you want the module spam.py to have precedence, it must be
>>>>> placed in a directory that comes earlier in sys.path.
>
> So there is no technical reason to avoid Bio.Search as an
> option for the Bio.SearchIO namespace. We could then
> have Bio.Search.Applications for command line wrappers,
> consistent with Bio.Phylo.Applications, Bio.Motif.Applications
> and Bio.Align.Applications.
>
> Of course, Bio.Search is still perhaps too broad a name... but
> on balance perhaps it is still better than Bio.SearchIO?
>
> Regards,
>
> Peter

Hi everyone,

If I may add my two cents, for now I am in favor of putting the module
under Bio.Search. It is not the best name out there (it does sound a
bit vague), but it's the one that seems to be the most intuitive (until
a better alternative comes along). There were some other alternatives
that Peter and I have discussed, but they seem less appealing to us.
You're free to add your thoughts on these of course :) :

- Bio.SeqSearch. This sounds ok, but when you consider we have
Bio.Seq, Bio.SeqRecord, Bio.SeqFeature, and Bio.SeqUtils, it becomes
quite confusing quickly.

- Bio.PSearch ('p' for pairwise). This one seemed the least intuitive
among the three options, so I'm not so big on this.

For now, I'm still writing everything (code, docstrings, tutorial)
using SearchIO. I suppose it's better if we could agree on a more
suitable name, though.

On another note, I'm also in favor of using the Bio.Phylo module
skeleton for Bio.SearchIO / Bio.Search. We may then group all sequence
search-related application wrappers under Applications (I actually
prefer 'app' for better PEP8 compliance, but that's another
discussion) and perhaps even refactor our remote search calls (e.g.
the 'qblast' module) under Bio.Search as well.

cheers,
Bow
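
The package-over-module precedence quoted above can be checked with a small sketch. It builds a throwaway directory holding both a module file and a package of the same name, imports that name, and reports which one was loaded. The helper which_wins and the module name spam_precedence_demo are made up for this illustration; on the Python versions I am aware of the package should win, matching the quoted essay, but running it is the way to confirm on a given interpreter.

```python
import importlib
import os
import sys
import tempfile


def which_wins(name="spam_precedence_demo"):
    """Create NAME.py and NAME/__init__.py side by side, then import NAME."""
    with tempfile.TemporaryDirectory() as root:
        # Plain module: NAME.py
        with open(os.path.join(root, name + ".py"), "w") as handle:
            handle.write("KIND = 'module'\n")
        # Package with the same name: NAME/__init__.py
        os.mkdir(os.path.join(root, name))
        with open(os.path.join(root, name, "__init__.py"), "w") as handle:
            handle.write("KIND = 'package'\n")
        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            return importlib.import_module(name).KIND
        finally:
            # Leave the interpreter state as it was found.
            sys.path.remove(root)
            sys.modules.pop(name, None)


if __name__ == "__main__":
    print(which_wins())
```

This mirrors the situation of an old Bio/Search.py left behind next to a new Bio/Search/__init__.py in the same site-packages directory.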


From w.arindrarto at gmail.com  Tue Aug 21 16:09:07 2012
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 21 Aug 2012 18:09:07 +0200
Subject: [Biopython-dev] GSoC Project Update -- 10
In-Reply-To: <CADEGkF7LHf=qVQ4pF4HzmuRp5ddEZ0skSwGmGe5j0j0JJBsjgw@mail.gmail.com>
References: <CADEGkF7LHf=qVQ4pF4HzmuRp5ddEZ0skSwGmGe5j0j0JJBsjgw@mail.gmail.com>
Message-ID: <CADEGkF7=dnCkaDuBX0QeLr_z7jN3N2=5HqvCAh-2KvbOqJv51A@mail.gmail.com>

Hi everyone,

I've just posted my last entry for my Google Summer of Code project
this year: http://bow.web.id/blog/2012/08/summers-over/

I want to say thank you to the Biopython community, especially Peter
for mentoring me this summer :), to OBF for accepting my proposal, and
to anyone who has helped and given me valuable input
throughout the project :).

It's been a priceless learning experience, and I only hope that my
code will be useful in return.

There are still some things to do before the code is merge-ready and
even more when the code is included in an official release, so I'll
still be around.

cheers,
Bow


From mictadlo at gmail.com  Wed Aug 22 00:55:30 2012
From: mictadlo at gmail.com (Mic)
Date: Wed, 22 Aug 2012 10:55:30 +1000
Subject: [Biopython-dev] [BioRuby] Final GSoC report
In-Reply-To: <CADKP5Cm-ka5LVrNERv-TrJZxevrREN2oM2PkekdyBjnauVNKrA@mail.gmail.com>
References: <CADKP5Cm-ka5LVrNERv-TrJZxevrREN2oM2PkekdyBjnauVNKrA@mail.gmail.com>
Message-ID: <CAOP6n=j9HzYzfDewwYjBWPUXkDt0JY-5_=OS90ta3h-y-Uhbag@mail.gmail.com>

Hi,
Python is able to connect to D with help of http://pyd.dsource.org/ .

Maybe it would be something for Biopython

Cheers,
Mic

On Wed, Aug 22, 2012 at 5:11 AM, Marjan Povolni <marian.povolny at gmail.com> wrote:

> http://blog.mpthecoder.com/post/29910330225/final-gsoc-report
>
> *Summary*
>
> Yesterday I tagged the 0.4 release of gff3-pltools, and that marks the end
> of the summer. At least in GSoC terms. Should I say end of the project? I
> don't think so. The tools can still be improved, and the Ruby bindings
> should follow.
>
> The major changes since the last release include the following:
>
>    - filtering functionality has been moved to a separate utility:
>    gff3-filter, along with a new language for specifying filtering
> expressions,
>    - conversion to table format of selected fields has been moved to a
>    separate utility: gff3-select. However, the ?select option is still
> part of
>    gff3-filter,
>    - gff3-ffetch is now fetching FASTA sequences from GFF3 and FASTA files
>    for CDS and mRNA records and features,
>    - man pages for utilities.
>
> The original idea was to create a GFF3/GTF parser in D and Ruby bindings.
> The Ruby bindings part didn't work out because there is still no support
> for D shared libraries in Linux, but instead there are now a few useful
> command-line tools for processing GFF3 which can be used without
> programming knowledge.
>
> To me, the summer was fun, challenging, and a great experience. I even got
> to meet my mentor in person, and other community members too, and to make
> my first steps in bioinformatics. I even gave a small presentation at the
> EU-codefest. What a summer it was!
>
> Thanks to everybody who made it possible: Google, Open Bioinformatics
> Foundation and my mentor Pjotr Prins.
>
> --
> Marjan
>
> _______________________________________________
> BioRuby Project - http://www.bioruby.org/
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
>


From p.j.a.cock at googlemail.com  Wed Aug 22 08:42:03 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 22 Aug 2012 09:42:03 +0100
Subject: [Biopython-dev] [GSoC] GSoC Project Update -- 10
In-Reply-To: <CADEGkF7=dnCkaDuBX0QeLr_z7jN3N2=5HqvCAh-2KvbOqJv51A@mail.gmail.com>
References: <CADEGkF7LHf=qVQ4pF4HzmuRp5ddEZ0skSwGmGe5j0j0JJBsjgw@mail.gmail.com>
	<CADEGkF7=dnCkaDuBX0QeLr_z7jN3N2=5HqvCAh-2KvbOqJv51A@mail.gmail.com>
Message-ID: <CAKVJ-_4ttoJscDgAe57tqo_p=LpEaKNs0CazTZV_0MjpfkkQHw@mail.gmail.com>

On Tue, Aug 21, 2012 at 5:09 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi everyone,
>
> I've just posted my last entry for my Google Summer of Code project
> this year: http://bow.web.id/blog/2012/08/summers-over/
>
> I want to say thank you to the Biopython community, especially Peter
> for mentoring me this summer :), to OBF for accepting my proposal, and
> to anyone who has helped and given me valuable inputs for me
> throughout the project :).
>
> It's been a priceless learning experience, and I only hope that my
> code will be useful in return.
>
> There are still some things to do before the code is merge-ready and
> even more when the code is included in an official release, so I'll
> still be around.
>
> cheers,
> Bow

Thank you Bow,

It has been a pleasure to mentor you, and I'm excited about
getting this (and Lenna's and other branches) into Biopython.

Now, back to the module naming discussion... ;)
http://lists.open-bio.org/pipermail/biopython-dev/2012-August/009868.html
http://lists.open-bio.org/pipermail/biopython-dev/2012-August/009888.html

Peter


From p.j.a.cock at googlemail.com  Wed Aug 22 11:07:11 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 22 Aug 2012 12:07:11 +0100
Subject: [Biopython-dev] Beta code in the official releases?
Message-ID: <CAKVJ-_6_8JUXmCx5q-eghSczNxqPmSbaaTc_GJ_QCqQOtjGUbg@mail.gmail.com>

Hi all,

One of the ideas I discussed with Bow during this GSoC
project was introducing a new warning, something like
Bio.BiopythonBetaCode (the exact name isn't important),
to be used to label new experimental modules for which
we *expect* there to be changes in the next release.

The idea is to combine the simplicity of distribution and
installation of the 'monolithic' Biopython library with some
of the flexibility offered by a more modular approach.
This would be particularly helpful for those on Windows,
where installing a Biopython branch from git is quite a
daunting task.

The idea is that in one of the next releases you'd be able
to try Bio.SearchIO (or Bio.Struct or GFF or Variants or ...)
and see something like this:

>>> from Bio import SearchIO
Bio/SearchIO/__init__.py:16: BiopythonBetaCode: Bio.SearchIO is in
beta, and likely to change
  warnings.warn("Bio.SearchIO is in beta, and likely to change",
BiopythonBetaCode)

By using a specific warning class, any keen beta tester can
silence all the BiopythonBetaCode warnings if they wished to.
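
A minimal sketch of how that could work, using only the standard
library. The class name BiopythonBetaCode is just the placeholder from
this email (it is not an existing Biopython name), and
import_beta_module is a hypothetical stand-in for what a beta module's
__init__.py might do on import:

```python
import warnings


class BiopythonBetaCode(Warning):
    """Placeholder warning category for experimental modules (assumed name)."""


def import_beta_module():
    # Hypothetical stand-in for the top of a beta module's __init__.py.
    warnings.warn("Bio.SearchIO is in beta, and likely to change",
                  BiopythonBetaCode)


def count_beta_warnings(silence):
    """Return how many beta warnings are seen, optionally silencing them."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        if silence:
            # A keen beta tester would put this one line in their script.
            warnings.simplefilter("ignore", BiopythonBetaCode)
        import_beta_module()
    return len([w for w in caught
                if issubclass(w.category, BiopythonBetaCode)])
```

Because the filter is keyed on the class, one "ignore" line would cover
every beta module without hiding other warnings.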

Is anyone familiar enough with Linux packaging policies to
have any thoughts on how they would treat this? Provided
we only use this for self contained modules, they could
potentially split the beta-modules into a sub-package (in the
same way that Biopython and its BioSQL support are split
in Debian).

I envision using this as a way to encourage wider 'beta testing'
of self contained modules which are close to a stable release.
Does anyone think this is a good idea? Are there any downsides
I'm overlooking?

Thanks,

Peter


From p.j.a.cock at googlemail.com  Wed Aug 22 11:10:56 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 22 Aug 2012 12:10:56 +0100
Subject: [Biopython-dev] [BioRuby] [GSoC] Final GSoC report
In-Reply-To: <20120822104352.GA11847@thebird.nl>
References: <CADKP5Cm-ka5LVrNERv-TrJZxevrREN2oM2PkekdyBjnauVNKrA@mail.gmail.com>
	<CAOP6n=j9HzYzfDewwYjBWPUXkDt0JY-5_=OS90ta3h-y-Uhbag@mail.gmail.com>
	<CAE8u=e7f=ZkTWhxL0VgCanoKkq4kPHzWA-dwmnbqfNqUMFeP5A@mail.gmail.com>
	<20120822104352.GA11847@thebird.nl>
Message-ID: <CAKVJ-_4ehZR-vW6otB8=cxrgQ1TGLsG5+X1+qOrT=UAk9=XfVQ@mail.gmail.com>

On Wed, Aug 22, 2012 at 11:43 AM, Pjotr Prins <pjotr.public14 at thebird.nl> wrote:
> Yes, linking to D from an interpreted language is not hard, basically
> it is the same calling convention as that of C. So a D shared library
> looks the same as a C shared library to the calling code - all
> existing foreign function interfaces (FFI) work. That is the good
> news.

How do things stand from a cross-platform perspective?
i.e. When might this be doable on Linux, Mac OS X, and Windows?
(and other Unix like platforms of potential interest)
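
For reference, this is roughly what the C-style route looks like from
Python with ctypes. It is only a sketch against the C library's strlen
on a POSIX system; I am assuming a D shared library exporting
extern(C) functions would be loaded the same way, and the helper names
here are made up for illustration:

```python
import ctypes
import ctypes.util


def load_c_library():
    # find_library may return None; on POSIX, CDLL(None) then exposes the
    # symbols already loaded into the running process.
    return ctypes.CDLL(ctypes.util.find_library("c"))


def c_strlen(text):
    """Call the C library's strlen through the foreign function interface."""
    libc = load_c_library()
    # Declare the C signature so ctypes converts arguments correctly.
    libc.strlen.argtypes = [ctypes.c_char_p]
    libc.strlen.restype = ctypes.c_size_t
    return libc.strlen(text.encode("ascii"))
```

The library lookup is the platform-specific part, which is why I am
asking about Windows and Mac OS X.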

> The bad news, as Artem points out, is that there is a problem in the
> D garbage collector. Items get collected, which should not. This will
> be fixed sooner or later. The commitment is there, and it is moving
> up the priority list.

Is there a D issue/bug tracker for this?

Thanks,

Peter


From chapmanb at 50mail.com  Thu Aug 23 00:42:09 2012
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 22 Aug 2012 20:42:09 -0400
Subject: [Biopython-dev] Beta code in the official releases?
In-Reply-To: <CAKVJ-_6_8JUXmCx5q-eghSczNxqPmSbaaTc_GJ_QCqQOtjGUbg@mail.gmail.com>
References: <CAKVJ-_6_8JUXmCx5q-eghSczNxqPmSbaaTc_GJ_QCqQOtjGUbg@mail.gmail.com>
Message-ID: <877gsq8mn2.fsf@fastmail.fm>


Peter;
+1. I'm for making the process of getting new code into
Biopython a bit quicker and this seems like a nice step in that
direction. With code that has been well designed, tested and documented, this
will help speed the transition into releases and get more eyes on it
quicker, while allowing some potential breaking changes as beta
functionality gets finalized.

Thanks for the good suggestion,
Brad


> Hi all,
>
> One of the ideas I discussed with Bow during this GSoC
> project was introducing a new warning, something like
> Bio.BiopythonBetaCode (the exact name isn't important),
> to be used to label new experimental modules for which
> we *expect* there to be changes in the next release.
>
> The idea is to combine the simplicity of distribution and
> installation of the 'monolithic' Biopython library with some
> of the flexibility offered by a more modular approach.
> This would be particularly helpful for those on Windows,
> where installing a Biopython branch from git is quite a
> daunting task.
>
> The idea is that in one of the next releases you'd be able
> to try Bio.SearchIO (or Bio.Struct or GFF or Variants or ...)
> and see something like this:
>
>>>> from Bio import SearchIO
> Bio/SearchIO/__init__.py:16: BiopythonBetaCode: Bio.SearchIO is in
> beta, and likely to change
>   warnings.warn("Bio.SearchIO is in beta, and likely to change",
> BiopythonBetaCode)
>
> By using a specific warning class, any keen beta tester can
> silence all the BiopythonBetaCode warnings if they wished to.
>
> Is anyone familiar enough with Linux packaging policies to
> have any thoughts on how they would treat this? Provided
> we only use this for self contained modules, they could
> potentially split the beta-modules into a sub-package (in the
> same way that Biopython and its BioSQL support are split
> in Debian).
>
> I envision using this as a way to encourage wider 'beta testing'
> of self contained modules which are close to a stable release.
> Does anyone think this is a good idea? Are there any downsides
> I'm overlooking?
>
> Thanks,
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev


From redmine at redmine.open-bio.org  Mon Aug 27 04:24:16 2012
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 27 Aug 2012 04:24:16 +0000
Subject: [Biopython-dev] [Biopython - Bug #3379] PDBParser fails to parse
	PDBs produced by PatchDock
References: <redmine.issue-3379.20120821102714@redmine.open-bio.org>
Message-ID: <redmine.journal-14944.20120827042416@redmine.open-bio.org>


Issue #3379 has been updated by David Cain.


Regarding "pure concatenation," I wasn't exaggerating when I said really ugly Perl scripts. =)

I created a "pull request on the Biopython GitHub repository":https://github.com/biopython/biopython/pull/60. Could you give me some feedback on my solution? If the devs agree on a certain behavior, I'll start writing some unit tests.

----------------------------------------
Bug #3379: PDBParser fails to parse PDBs produced by PatchDock
https://redmine.open-bio.org/issues/3379

Author: David Cain
Status: New
Priority: Low
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 1.57
URL: 


I apologize in advance if this technically doesn't count as a bug, as the problem is arising out of improperly formatted PDBs.


h3. Background

Protein docking utilities can generally create a complex PDB from two input files. Depending on the rotation algorithm, at least one of the PDB files is rotated (its ATOM coordinates modified in-place), then the two files are concatenated to create a protein complex file.

h3. Why PDBParser fails

Utilities like ZDOCK strip a lot of data from the input files, creating a poorly-formed PDB file that raises PDBConstructionWarnings but that PDBParser can ultimately parse. PatchDock, however, preserves the input PDB files as they were; the only thing that changes is the ATOM coordinates. This is problematic when the receptor PDB has an @END@ record or @CONECT@ records: PDBParser's current behavior is to consider anything after an @END@ or @CONECT@ to be trailer data, and to cease parsing when one is encountered. This means that many complexes parse cleanly, but completely exclude the ligand.

h3. How to fix the problem

Now, in an ideal world, the responsibility would be on the creators of the docking utilities to create well-formed complex PDB files. However, this quick concatenation seems to be pretty common (complexes are often created by very short, hackish Perl scripts). Should PDBParser be able to parse these badly formed PDB files?

h3. Potential change to @PDBParser._parse_coordinates@?

If a modification to PDBParser is on the table, my thought would be to still consider anything after @END@ or @CONECT@ to be part of the trailer, but make an attempt to parse extra coordinate data from this trailer before returning (probably through a recursive call). If records are found in the trailer, a PDBConstructionWarning is raised, but they're added to the structure.
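
A rough standalone sketch of that behavior, not the actual PDBParser code. It uses a plain loop rather than the recursive call mentioned above, the function name is hypothetical, and the warning class is a local stand-in that only shares its name with Biopython's:

```python
import warnings


class PDBConstructionWarning(UserWarning):
    """Local stand-in for Biopython's warning class (assumption)."""


def collect_coordinate_lines(lines):
    """Collect ATOM/HETATM lines, including any found after END/CONECT."""
    coords = []
    in_trailer = False
    rescued = 0
    for line in lines:
        # The record name occupies the first six columns of a PDB line.
        record = line[:6].strip()
        if record in ("END", "CONECT"):
            in_trailer = True
        elif record in ("ATOM", "HETATM"):
            if in_trailer:
                rescued += 1
            coords.append(line.rstrip("\n"))
    if rescued:
        warnings.warn("%d coordinate record(s) found after END/CONECT"
                      % rescued, PDBConstructionWarning)
    return coords
```

With a concatenated receptor and ligand file, the ligand's atoms would be kept and a single warning would tell the user the file was malformed.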

If this approach is reasonable, let me know and I'd be happy to mock something up and push it to my branch on GitHub. Otherwise, I'll just write scripts to clean ugly complexes for parsing.

My only thought is that most users of docking software are probably not able or willing to write such a script, and thus can't use BioPython to parse the PDB output.


From Andrew.Sczesnak at med.nyu.edu  Wed Aug 29 17:54:08 2012
From: Andrew.Sczesnak at med.nyu.edu (Sczesnak, Andrew)
Date: Wed, 29 Aug 2012 17:54:08 +0000
Subject: [Biopython-dev] Beta code in the official releases?
In-Reply-To: <877gsq8mn2.fsf@fastmail.fm>
References: <CAKVJ-_6_8JUXmCx5q-eghSczNxqPmSbaaTc_GJ_QCqQOtjGUbg@mail.gmail.com>,
	<877gsq8mn2.fsf@fastmail.fm>
Message-ID: <1F36894B170C114F9C902C20BC5129981AD23835@MSGWCDCPMB25.nyumc.org>

+1

It's been over a year since I first submitted my MAF code!


From p.j.a.cock at googlemail.com  Thu Aug 30 08:16:13 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 30 Aug 2012 09:16:13 +0100
Subject: [Biopython-dev] Genome Diagram Sigils, was: Default Behavior
In-Reply-To: <CAKVJ-_7XxVStvWrte03kQ00HwVD771EdzV8PN3dSTtSM2p0fsA@mail.gmail.com>
References: <CAKVJ-_7XxVStvWrte03kQ00HwVD771EdzV8PN3dSTtSM2p0fsA@mail.gmail.com>
Message-ID: <CAKVJ-_7h_JYG+bDs20_A=YYrGsTQPfyWtWB4pzY4REanYr7haQ@mail.gmail.com>

On Tue, Aug 14, 2012 at 12:06 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Aug 2, 2012 at 5:12 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> Further to that work, I updated some older code for a JAGGY
> sigil, and also an OCTO sigil (names open to suggestions),
> which are on my gd-sigils branch which has documentation
> in the tutorial, including this image of the expanded sigil set:
> https://github.com/peterjc/biopython/blob/e09e264dd73953554609498c15b67d86686592fb/Doc/images/GD_sigils.png
>
> This is a slight simplification of the old JAGGY code in that it
> does not (yet) allow control of the teeth length (e.g. to have just
> teeth on one end). I am thinking this could be exposed like
> the existing arrow specific options.
>
> I originally created the JAGGY sigil for marking a break point
> in a contig/scaffold. For instance, you might want to mark a
> run of NNNNN bases in a scaffold with a jaggy sigil (straddling
> both strands) as a clear visual marker to explain why there
> were no genes.
>
> Other sigil ideas I pondered include an OVAL, which should
> be quite easy for the linear diagrams, but rather more work to
> implement for circular diagrams due to the distorted curves.
>
> Peter

Do people think (either of) these two sigils are worth adding
to the main branch?

Potentially they can be generalised - the JAGGY sigil in
particular would be much more flexible if the head & tail
teeth presence (or tooth length?) could be controlled. e.g.
to draw a sigil with a flat edge on the left, and a jagged
edge on the right.
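
To make the idea concrete, here is a small geometry sketch. It is not
the code on the branch; the function names and the head/tail/teeth/
tooth_length parameters are hypothetical, and it only computes the
outline of a box whose ends can each be flat or jagged:

```python
def _zigzag(x, inset, y_start, y_end, teeth):
    """Points from (x, y_start) to (x, y_end) with notches reaching x + inset."""
    step = (y_end - y_start) / (2.0 * teeth)
    points = []
    for i in range(2 * teeth + 1):
        # Even points sit on the edge, odd points are pulled inwards.
        xi = x if i % 2 == 0 else x + inset
        points.append((xi, y_start + i * step))
    return points


def jaggy_outline(x0, x1, y0, y1, teeth=3, tooth_length=1.0,
                  tail=True, head=True):
    """Return polygon vertices, starting bottom-left and going up the left edge."""
    if tail:
        left = _zigzag(x0, tooth_length, y0, y1, teeth)
    else:
        left = [(x0, y0), (x0, y1)]
    if head:
        right = _zigzag(x1, -tooth_length, y1, y0, teeth)
    else:
        right = [(x1, y1), (x1, y0)]
    return left + right
```

Calling jaggy_outline(0, 10, 0, 4, teeth=2, tail=False) would give the
flat-left, jagged-right shape described above.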

Peter


From Leighton.Pritchard at hutton.ac.uk  Thu Aug 30 08:51:50 2012
From: Leighton.Pritchard at hutton.ac.uk (Leighton Pritchard)
Date: Thu, 30 Aug 2012 08:51:50 +0000
Subject: [Biopython-dev] Genome Diagram Sigils, was: Default Behavior
In-Reply-To: <CAKVJ-_7h_JYG+bDs20_A=YYrGsTQPfyWtWB4pzY4REanYr7haQ@mail.gmail.com>
References: <CAKVJ-_7XxVStvWrte03kQ00HwVD771EdzV8PN3dSTtSM2p0fsA@mail.gmail.com>
	<CAKVJ-_7h_JYG+bDs20_A=YYrGsTQPfyWtWB4pzY4REanYr7haQ@mail.gmail.com>
Message-ID: <E33884CD-F69C-48DA-8AE6-B6A31644FA81@hutton.ac.uk>


On 30 Aug 2012, at Thursday, August 30, 09:16, Peter Cock wrote:

> On Tue, Aug 14, 2012 at 12:06 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> On Thu, Aug 2, 2012 at 5:12 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>
>> Further to that work, I updated some older code for a JAGGY
>> sigil, and also an OCTO sigil (names open to suggestions),
>> which are on my gd-sigils branch which has documentation
>> in the tutorial, including this image of the expanded sigil set:
>> https://github.com/peterjc/biopython/blob/e09e264dd73953554609498c15b67d86686592fb/Doc/images/GD_sigils.png
>
> [...]
>
> Do people think (either of) these two sigils are worth adding
> to the main branch?

Yes - I do.

L.

--
Dr Leighton Pritchard
Information and Computing Sciences Group; Weeds, Pests and Diseases Theme
DG31, James Hutton Institute (Dundee)
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:leighton.pritchard at hutton.ac.uk       w:http://www.hutton.ac.uk/staff/leighton-pritchard
gpg/pgp: 0xFEFC205C tel: +44(0)844 928 5428 x8827 or +44(0)1382 568827



From p.j.a.cock at googlemail.com  Thu Aug 30 10:18:57 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 30 Aug 2012 11:18:57 +0100
Subject: [Biopython-dev] Genome Diagram Sigils, was: Default Behavior
In-Reply-To: <E33884CD-F69C-48DA-8AE6-B6A31644FA81@hutton.ac.uk>
References: <CAKVJ-_7XxVStvWrte03kQ00HwVD771EdzV8PN3dSTtSM2p0fsA@mail.gmail.com>
	<CAKVJ-_7h_JYG+bDs20_A=YYrGsTQPfyWtWB4pzY4REanYr7haQ@mail.gmail.com>
	<E33884CD-F69C-48DA-8AE6-B6A31644FA81@hutton.ac.uk>
Message-ID: <CAKVJ-_6vrwse+6vtrdGBEnjbabb4==Yr++uL4cis2kji=vDqnw@mail.gmail.com>

On Thu, Aug 30, 2012 at 9:51 AM, Leighton Pritchard
<Leighton.Pritchard at hutton.ac.uk> wrote:
>
> On 30 Aug 2012, at Thursday, August 30, 09:16, Peter Cock wrote:
>> Do people think (either of) these two sigils are worth adding
>> to the main branch?
>
> Yes - I do.
>
> L.

Done. Branch rebased and applied to master.

Peter


From p.j.a.cock at googlemail.com  Thu Aug 30 11:46:05 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 30 Aug 2012 12:46:05 +0100
Subject: [Biopython-dev] Genome Diagram Sigils, was: Default Behavior
In-Reply-To: <CAKVJ-_6vrwse+6vtrdGBEnjbabb4==Yr++uL4cis2kji=vDqnw@mail.gmail.com>
References: <CAKVJ-_7XxVStvWrte03kQ00HwVD771EdzV8PN3dSTtSM2p0fsA@mail.gmail.com>
	<CAKVJ-_7h_JYG+bDs20_A=YYrGsTQPfyWtWB4pzY4REanYr7haQ@mail.gmail.com>
	<E33884CD-F69C-48DA-8AE6-B6A31644FA81@hutton.ac.uk>
	<CAKVJ-_6vrwse+6vtrdGBEnjbabb4==Yr++uL4cis2kji=vDqnw@mail.gmail.com>
Message-ID: <CAKVJ-_70t56rm+7LYcC4TU6jhdFK_UzULgSOhS44=_iEisbY5A@mail.gmail.com>

On Thu, Aug 30, 2012 at 11:18 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Aug 30, 2012 at 9:51 AM, Leighton Pritchard
> <Leighton.Pritchard at hutton.ac.uk> wrote:
>>
>> On 30 Aug 2012, at Thursday, August 30, 09:16, Peter Cock wrote:
>>> Do people think (either of) these two sigils are worth adding
>>> to the main branch?
>>
>> Yes - I do.
>>
>> L.
>
> Done. Branch rebased and applied to master.
>
> Peter

And you can see the example in the Tutorial here,
http://biopython.org/DIST/docs/tutorial/Tutorial-dev.html#sec:gd_sigils

(These sigils all work on circular diagrams too, see
the examples made by test_GenomeDiagram.py)

Peter