From p.j.a.cock at googlemail.com  Mon Oct  3 07:20:21 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 3 Oct 2011 12:20:21 +0100
Subject: [Biopython-dev] Enhancements to Bio.Graphics.BasicChromosome
Message-ID: <CAKVJ-_5qY_D9ki+FTikqWugfRTivNypjaQXJTVRkE896gfH4hA@mail.gmail.com>

Hi Brad (et al),

You might have seen on Twitter at the end of last week I mentioned
some work to extend Brad's Bio.Graphics.BasicChromosome to allow
features within a chromosome segment, optionally with labels.

The branch is here:
https://github.com/peterjc/biopython/tree/chr_diag

I put together a non-trivial example of showing the tRNA genes in
Arabidopsis as a unit test in test_GraphicsChromosome.py - this is
deliberately showing too many features in order to check the label
placement algorithm:

http://twitpic.com/6sgr1m

This kind of figure is also used for showing SNP placement and genetic
marker loci used in breeding etc.

If I had put more (or a more uniform set of) features you'd get
something worthy of the nickname "millipede diagram", looking like a
segmented body (the chromosome) with thousands of legs (the lines for
the labels).
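
To give a feel for what a label placement check has to cope with, here is a minimal sketch of one simple approach: sort the features, then nudge neighbouring labels apart until they no longer overlap. This is an illustration only, not the algorithm used on the branch; the function name spread_labels and its parameters are made up for this example.

```python
def spread_labels(positions, min_gap, low, high):
    """Return label positions that keep feature order and stay min_gap apart.

    positions: feature positions along the chromosome (any order).
    min_gap:   smallest allowed distance between neighbouring labels.
    low, high: limits of the area available for labels.
    """
    wanted = sorted(positions)
    if (len(wanted) - 1) * min_gap > high - low:
        raise ValueError("too many labels for the available space")
    placed = []
    # Forward pass: move each label just far enough to clear the previous one.
    for pos in wanted:
        y = max(pos, low)
        if placed and y < placed[-1] + min_gap:
            y = placed[-1] + min_gap
        placed.append(y)
    # Backward pass: pull labels back if the forward pass ran past the limit.
    if placed and placed[-1] > high:
        placed[-1] = high
        for i in range(len(placed) - 2, -1, -1):
            if placed[i] > placed[i + 1] - min_gap:
                placed[i] = placed[i + 1] - min_gap
    return placed
```

With a dense cluster of features the labels fan out away from their true positions, which is where the "legs" of the millipede come from.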

This isn't quite backwards compatible - the old code draws the
chromosomes left aligned within their allocated space, while I put
them centrally in order to draw labels on either side.
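
The difference between the two alignments comes down to a little arithmetic. The sketch below assumes each chromosome is given a horizontal slot and has a fixed drawing width; chromosome_x_range and its arguments are hypothetical names for illustration, not taken from Bio.Graphics.BasicChromosome.

```python
def chromosome_x_range(slot_start, slot_end, chr_width, centred=True):
    """Return (x_start, x_end) for a chromosome drawn inside its slot."""
    if chr_width > slot_end - slot_start:
        raise ValueError("chromosome is wider than its slot")
    if centred:
        # Equal margins on both sides leave room for labels left and right.
        margin = (slot_end - slot_start - chr_width) / 2.0
        x_start = slot_start + margin
    else:
        # Old behaviour as described above: flush against the left edge.
        x_start = slot_start
    return x_start, x_start + chr_width
```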

Iddo sounded enthusiastic on Twitter. Does this look worth including
as is? Would someone (doesn't have to be Brad) like to test/review it
please?

Thanks,

Peter

From bioinformed at gmail.com  Mon Oct  3 17:28:21 2011
From: bioinformed at gmail.com (Kevin Jacobs <jacobs@bioinformed.com>)
Date: Mon, 3 Oct 2011 17:28:21 -0400
Subject: [Biopython-dev] Enhancements to Bio.Graphics.BasicChromosome
In-Reply-To: <CAKVJ-_5qY_D9ki+FTikqWugfRTivNypjaQXJTVRkE896gfH4hA@mail.gmail.com>
References: <CAKVJ-_5qY_D9ki+FTikqWugfRTivNypjaQXJTVRkE896gfH4hA@mail.gmail.com>
Message-ID: <CAD=vDipyUuJmV_MtpQNVRiWSivSOy-J7BjQuqn-jfyCh1p-XWQ@mail.gmail.com>

On Mon, Oct 3, 2011 at 7:20 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> You might have seen on Twitter at the end of last week I mentioned
> some work to extend Brad's Bio.Graphics.BasicChromosome to allow
> features within a chromosome segment, optionally with labels.
>
>

This looks to be extremely useful.  Is there any support for layouts to
stack or pack chromosomes?  I'm thinking of diagrams for humans, where we
don't fit as well in linear displays.  I also think supporting chromosome
bands would be extremely useful.  These could include full cytobands,
centromeres, euchromatic vs heterochromatic regions, user configurable bands
(e.g. linkage regions, IBD blocks, etc.)

The figure shows off what I'm thinking about the banding and layout, even
though it uses colored circles instead of text labels:
http://www.genome.gov/multimedia/illustrations/GWAS_2011_1.pdf

If there is interest, I may have some time to work on these features once
the basic infrastructure is stable.

Best regards,
-Kevin

From p.j.a.cock at googlemail.com  Mon Oct  3 18:24:12 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 3 Oct 2011 23:24:12 +0100
Subject: [Biopython-dev] Enhancements to Bio.Graphics.BasicChromosome
In-Reply-To: <CAD=vDipyUuJmV_MtpQNVRiWSivSOy-J7BjQuqn-jfyCh1p-XWQ@mail.gmail.com>
References: <CAKVJ-_5qY_D9ki+FTikqWugfRTivNypjaQXJTVRkE896gfH4hA@mail.gmail.com>
	<CAD=vDipyUuJmV_MtpQNVRiWSivSOy-J7BjQuqn-jfyCh1p-XWQ@mail.gmail.com>
Message-ID: <CAKVJ-_4cQkgqptt2x_rgmumoJ0o6Jkema-Dga_gykj39-Xt3mA@mail.gmail.com>

On Monday, October 3, 2011, Kevin Jacobs <jacobs at bioinformed.com>
<bioinformed at gmail.com> wrote:
> On Mon, Oct 3, 2011 at 7:20 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
>> You might have seen on Twitter at the end of last week I mentioned
>> some work to extend Brad's Bio.Graphics.BasicChromosome to allow
>> features within a chromosome segment, optionally with labels.
>>
>>
>
> This looks to be extremely useful.  Is there any support for layouts to
> stack or pack chromosomes?  I'm thinking of diagrams for humans, where we
> don't fit as well in linear displays.  I also think supporting chromosome
> bands would be extremely useful.  These could include full cytobands,
> centromeres, euchromatic vs heterochromatic regions, user configurable bands
> (e.g. linkage regions, IBD blocks, etc.)
>
> The figure shows off what I'm thinking about the banding and layout, even
> though it uses colored circles instead of text labels:
> http://www.genome.gov/multimedia/illustrations/GWAS_2011_1.pdf
>
> If there is interest, I may have some time to work on these features once
> the basic infrastructure is stable.
>
> Best regards,
> -Kevin

Hi Kevin,

I'm glad to hear there is some interest in this :)

That example you linked to is interesting - there are several
things of specific interest - and it helps, as I'm not yet
familiar with all the technical terms you used.

Notches in the chromosome which I assume are centromeres
(I can see how that might be added to the Bio code as another
segment type, similar to the telomeres).

Coloured background regions in the chromosome (should be
able to do this already), some of which are hatched (not possible
right now... would have to look into ReportLab's capabilities here).
This is what you meant by banding?

Multiple coloured dots for labels. Doable, but a nice API might
be tricky.

For layout did you mean the fact this isn't just a row of
chromosomes left to right, but here there are two rows?
I'm inclined to say the user should just move things in
the PDF for a final version using Adobe or Inkscape ;)

Regards,

Peter

From keith.hughitt at gmail.com  Tue Oct  4 07:31:51 2011
From: keith.hughitt at gmail.com (Keith Hughitt)
Date: Tue, 4 Oct 2011 07:31:51 -0400
Subject: [Biopython-dev] Creating a NCBIFastaIterator
Message-ID: <CAOJcpR8F_syzYF7gk9F-9dFUSVtFgXvTVo4Zh77N5cPFLG7-ow@mail.gmail.com>

Hi all,

I was thinking recently that it would be nice if the FASTA file reader were
able to check for known formats (e.g. NCBI) and then use that information to
choose better values for name, id, etc.

After some discussion with Peter Cock on GitHub, however, he convinced me
that this would be problematic in terms of backwards compatibility, and that
instead a better approach might be to add a new sub-format ("fasta-ncbi") to
the list of supported format readers.

This could go something like:

1. Create a new function in SeqIO.FastaIO for parsing NCBI-formatted FASTA
files. Add it to the mapping of iterators.
2. FastaIO.NCBIFastaIterator will simply call FastaIterator and then modify
the result by assigning a new id, name, etc (other suggestions?)
3. FastaIO.NCBIFastaWriter (modify and subclass FastaIO.FastaWriter?)
4. Modify code that interacts with NCBI services which return FASTA files
and have it return a NCBIFastaIterator (First use a deprecation/warning to
let users know of the pending change?)
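
Steps 1 and 2 could be sketched roughly as follows. This is a self-contained stand-in using plain dicts rather than SeqRecord objects, so that it runs without Biopython; the names fasta_iterator, ncbi_fasta_iterator, FORMAT_TO_ITERATOR and parse are assumptions for illustration and not the actual SeqIO internals. Which value ends up in id and which in name is still an open question, so the choice below is only one possibility.

```python
from io import StringIO


def _make_record(title, chunks):
    words = title.split(None, 1)
    first_word = words[0] if words else ""
    return {"id": first_word, "name": first_word, "description": title,
            "seq": "".join(chunks), "annotations": {}}


def fasta_iterator(handle):
    """Stand-in plain FASTA reader yielding one dict per record."""
    title, chunks = None, []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if title is not None:
                yield _make_record(title, chunks)
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line)
    if title is not None:
        yield _make_record(title, chunks)


def ncbi_fasta_iterator(handle):
    """Reuse the plain reader, then refine the fields of each record."""
    for record in fasta_iterator(handle):
        fields = record["id"].split("|")
        # Only the gi|<gi_num>|ref|<accession>| style is handled here.
        if len(fields) >= 4 and fields[0] == "gi":
            record["annotations"]["gi"] = fields[1]
            record["id"] = record["name"] = fields[3]
        yield record


# Hypothetical registry: format name -> iterator function.
FORMAT_TO_ITERATOR = {
    "fasta": fasta_iterator,
    "fasta-ncbi": ncbi_fasta_iterator,
}


def parse(handle, format_name):
    """Look up the iterator registered for format_name and run it."""
    try:
        iterator = FORMAT_TO_ITERATOR[format_name]
    except KeyError:
        raise ValueError("Unknown format %r" % format_name)
    return iterator(handle)


# Example header only; any gi|...|ref|...| style title would do.
example = StringIO(">gi|45478711|ref|NC_005816.1| Yersinia pestis plasmid\n"
                   "TGTAACGA\nACGG\n")
for record in parse(example, "fasta-ncbi"):
    print(record["id"], record["annotations"]["gi"], len(record["seq"]))
```

Because the new format only adds an entry to the mapping, existing callers asking for plain "fasta" see no change in behaviour.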

Does this sound like it would be a useful feature? What about the basic
approach outlined above? Any suggestions?

Keith

From p.j.a.cock at googlemail.com  Tue Oct  4 07:46:19 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 4 Oct 2011 12:46:19 +0100
Subject: [Biopython-dev] Creating a NCBIFastaIterator
In-Reply-To: <CAOJcpR8F_syzYF7gk9F-9dFUSVtFgXvTVo4Zh77N5cPFLG7-ow@mail.gmail.com>
References: <CAOJcpR8F_syzYF7gk9F-9dFUSVtFgXvTVo4Zh77N5cPFLG7-ow@mail.gmail.com>
Message-ID: <CAKVJ-_4CxFvUUH9VOWhm2fnqcZgkygFZrq6u-MgeOx_w3xV-nw@mail.gmail.com>

On Tue, Oct 4, 2011 at 12:31 PM, Keith Hughitt <keith.hughitt at gmail.com> wrote:
> Hi all,
>
> I was thinking recently that it would be nice if the FASTA file reader were
> able to check for known formats (e.g. NCBI) and then use that information to
> choose better values for name, id, etc.
>
> After some discussion with Peter Cock on GitHub, however, he convinced me
> that this would be problematic in terms of backwards compatibility, and that
> instead a better approach might be to add a new sub-format ("fasta-ncbi") to
> the list of supported format readers.
>
> This could go something like:
>
> 1. Create a new function in SeqIO.FastaIO for parsing NCBI-formatted FASTA
> files. Add it to the mapping of iterators.

Yes.

> 2. FastaIO.NCBIFastaIterator will simply call FastaIterator and then modify
> the result by assigning a new id, name, etc (other suggestions?)

Store the GI number in the SeqRecord's annotation under key "gi"
to match the GenBank parser. There may be other things like this.

If the FASTA header does not match the NCBI style, that should
probably trigger an exception.
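
As a rough sketch of those two points (GI number stored under "gi", and an exception for anything that does not look like an NCBI header), assuming only the gi|<gi_num>|ref|<accession>| style and using a plain dict in place of a SeqRecord. The function name parse_ncbi_title and the choice of ValueError are assumptions for illustration.

```python
import re

# Matches only the gi|<gi_num>|ref|<accession>| style; other NCBI styles
# would need their own patterns.
_GI_REF = re.compile(r"^gi\|(\d+)\|ref\|([^|\s]+)\|\s*(.*)$")


def parse_ncbi_title(title):
    """Split an NCBI style FASTA title line (without the leading '>')."""
    match = _GI_REF.match(title)
    if match is None:
        raise ValueError("Not an NCBI style FASTA header: %r" % title)
    gi, accession, description = match.groups()
    return {
        "id": accession,
        "description": description,
        # Key "gi" chosen to match the GenBank parser, as suggested above.
        "annotations": {"gi": gi},
    }
```

Failing loudly on an unrecognised header makes it obvious when a non-NCBI file has been given to the stricter format by mistake.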

> 3. FastaIO.NCBIFastaWriter (modify and subclass FastaIO.FastaWriter?)

This will be harder, but yes in principle.

> 4. Modify code that interacts with NCBI services which return FASTA files
> and have it return a NCBIFastaIterator (First use a deprecation/warning to
> let users know of the pending change?)

No need. I'm pretty sure all the NCBI code (like Bio.Entrez) returns
handles so it is up to the end user to decide what to do with the
data, e.g. parse it with the current SeqIO "fasta" format, or save it
straight to disk.

> Does this sound like it would be a useful feature? What about the basic
> approach outlined above? Any suggestions?
>
> Keith

Yes, it sounds useful. I'm not sure where the most current NCBI
documentation is, but this is a good start:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/formatdb_fastacmd.html

Peter

From chapmanb at 50mail.com  Wed Oct  5 08:03:31 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 05 Oct 2011 08:03:31 -0400
Subject: [Biopython-dev] Enhancements to Bio.Graphics.BasicChromosome
In-Reply-To: <CAKVJ-_4cQkgqptt2x_rgmumoJ0o6Jkema-Dga_gykj39-Xt3mA@mail.gmail.com>
References: <CAKVJ-_5qY_D9ki+FTikqWugfRTivNypjaQXJTVRkE896gfH4hA@mail.gmail.com>
	<CAD=vDipyUuJmV_MtpQNVRiWSivSOy-J7BjQuqn-jfyCh1p-XWQ@mail.gmail.com>
	<CAKVJ-_4cQkgqptt2x_rgmumoJ0o6Jkema-Dga_gykj39-Xt3mA@mail.gmail.com>
Message-ID: <87k48j8x2k.fsf@sobchak.i-did-not-set--mail-host-address--so-tickle-me>


Peter;

> >> You might have seen on Twitter at the end of last week I mentioned
> >> some work to extend Brad's Bio.Graphics.BasicChromosome to allow
> >> features within a chromosome segment, optionally with labels.

This is awesome, thanks for extending it. All of your tweaks are good
improvements, and I'm +1 for including it in the next release. Please
improve away.

Thanks much,
Brad

From bioinformed at gmail.com  Wed Oct  5 09:16:56 2011
From: bioinformed at gmail.com (Kevin Jacobs <jacobs@bioinformed.com>)
Date: Wed, 5 Oct 2011 09:16:56 -0400
Subject: [Biopython-dev] Enhancements to Bio.Graphics.BasicChromosome
In-Reply-To: <CAKVJ-_4cQkgqptt2x_rgmumoJ0o6Jkema-Dga_gykj39-Xt3mA@mail.gmail.com>
References: <CAKVJ-_5qY_D9ki+FTikqWugfRTivNypjaQXJTVRkE896gfH4hA@mail.gmail.com>
	<CAD=vDipyUuJmV_MtpQNVRiWSivSOy-J7BjQuqn-jfyCh1p-XWQ@mail.gmail.com>
	<CAKVJ-_4cQkgqptt2x_rgmumoJ0o6Jkema-Dga_gykj39-Xt3mA@mail.gmail.com>
Message-ID: <CAD=vDipujFdqOSm5VmdDV+_HD3NwL=E=611Fvrz-6Z5AXNy2cQ@mail.gmail.com>

On Mon, Oct 3, 2011 at 6:24 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Notches in the chromosome which I assume are centromeres
> (I can see how that might be added to the Bio code as another
> segment type, similar to the telomeres).
>

Yes-- although the visual style for centromeres need not be precisely as
shown in my example.


> Coloured background regions in the chromosome (should be
> able to do this already), some of which are hatched (not possible
> right now... would have to look into ReportLab's capabilities here).
> This is what you meant by banding?
>

Yes-- being able to show cytobands and custom bands to designate regions
will be very useful for me.  As before, I'm not wed to the cross-hatching,
in fact the standard displays use only grayscale.

> Multiple coloured dots for labels. Doable, but a nice API might
> be tricky.
>

I don't much care about those -- I'd be happy with text labels.


> For layout did you mean the fact this isn't just a row of
> chromosomes left to right, but here there are two rows?
> I'm inclined to say the user should just move things in
> the PDF for a final version using Adobe or Inkscape ;)
>

Correct.  I'd prefer to have some programmatic control of layout, since I'd
hate to have to manually edit every whole-genome plot.  Since I'm working
exclusively with human data for now, it would be possible to pre-specify a
few standard layouts and avoid the trouble of supporting dynamic features.

Just let me know when the code is stable enough to start poking around.
I'll float a proposal for what I think could be done to obtain feedback
before I commit much time to coding.

Thanks,
-Kevin

From p.j.a.cock at googlemail.com  Wed Oct  5 09:32:34 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 5 Oct 2011 14:32:34 +0100
Subject: [Biopython-dev] Enhancements to Bio.Graphics.BasicChromosome
In-Reply-To: <CAD=vDipujFdqOSm5VmdDV+_HD3NwL=E=611Fvrz-6Z5AXNy2cQ@mail.gmail.com>
References: <CAKVJ-_5qY_D9ki+FTikqWugfRTivNypjaQXJTVRkE896gfH4hA@mail.gmail.com>
	<CAD=vDipyUuJmV_MtpQNVRiWSivSOy-J7BjQuqn-jfyCh1p-XWQ@mail.gmail.com>
	<CAKVJ-_4cQkgqptt2x_rgmumoJ0o6Jkema-Dga_gykj39-Xt3mA@mail.gmail.com>
	<CAD=vDipujFdqOSm5VmdDV+_HD3NwL=E=611Fvrz-6Z5AXNy2cQ@mail.gmail.com>
Message-ID: <CAKVJ-_4G2LDL-t5x-MtoQ6wDT1C4Z4_LwDW3H5TO2ae_9keUrg@mail.gmail.com>

On Wed, Oct 5, 2011 at 2:16 PM, Kevin Jacobs wrote:
> On Mon, Oct 3, 2011 at 6:24 PM, Peter Cock wrote:
>>
>> Notches in the chromosome which I assume are centromeres
>> (I can see how that might be added to the Bio code as another
>> segment type, similar to the telomeres).
>
> Yes-- although the visual style for centromeres need not be precisely as
> shown in my example.
>
>>
>> Coloured background regions in the chromosome (should be
>> able to do this already), some of which are hatched (not possible
>> right now... would have to look into ReportLab's capabilities here).
>> This is what you meant by banding?
>
> Yes-- being able to show cytobands and custom bands to designate regions
> will be very useful for me. As before, I'm not wed to the cross-hatching,
> in fact the standard displays use only grayscale.

OK - simple colours are easy, I can add that to the test case example.

>>
>> Multiple coloured dots for labels. Doable, but a nice API might
>> be tricky.
>
> I don't much care about those -- I'd be happy with text labels.
>

Good.

>>
>> For layout did you mean the fact this isn't just a row of
>> chromosomes left to right, but here there are two rows?
>> I'm inclined to say the user should just move things in
>> the PDF for a final version using Adobe or Inkscape ;)
>
> Correct. I'd prefer to have some programmatic control of layout, since I'd
> hate to have to manually edit every whole-genome plot. Since I'm working
> exclusively with human data for now, it would be possible to pre-specify a
> few standard layouts and avoid the trouble of supporting dynamic features.
> Just let me know when the code is stable enough to start poking around.
> I'll float a proposal for what I think could be done to obtain feedback
> before I commit much time to coding.

Would an option for using multiple rows be enough? It
wouldn't be quite as compact as the tweaked human
example you showed - but probably good enough to
print on a single page.
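
A multiple-row option would mostly be a matter of dividing the chromosomes between rows before drawing. A minimal sketch, assuming the chromosomes are just a list of names and that each row is then drawn as its own left-to-right strip; split_into_rows is a hypothetical helper, not part of Bio.Graphics.

```python
def split_into_rows(names, rows):
    """Split names into `rows` consecutive groups of near-equal size."""
    if rows < 1:
        raise ValueError("need at least one row")
    per_row, extra = divmod(len(names), rows)
    layout, start = [], 0
    for row in range(rows):
        # The first `extra` rows take one more chromosome each.
        size = per_row + (1 if row < extra else 0)
        layout.append(names[start:start + size])
        start += size
    return layout


# Human karyotype: 22 autosomes plus X and Y, drawn as two rows of twelve.
human = [str(i) for i in range(1, 23)] + ["X", "Y"]
for row in split_into_rows(human, 2):
    print(" ".join(row))
```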

Another option is to do the PDF editing programmatically,
for example with ReportLab. You can embed multiple
(smaller) PDF files within a larger container. It's a bit
fiddly, but would be worth the effort for a major pipeline
where you always use the same (few) organism(s).

Peter


From p.j.a.cock at googlemail.com  Wed Oct  5 10:40:56 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 5 Oct 2011 15:40:56 +0100
Subject: [Biopython-dev] Enhancements to Bio.Graphics.BasicChromosome
In-Reply-To: <87k48j8x2k.fsf@sobchak.i-did-not-set--mail-host-address--so-tickle-me>
References: <CAKVJ-_5qY_D9ki+FTikqWugfRTivNypjaQXJTVRkE896gfH4hA@mail.gmail.com>
	<CAD=vDipyUuJmV_MtpQNVRiWSivSOy-J7BjQuqn-jfyCh1p-XWQ@mail.gmail.com>
	<CAKVJ-_4cQkgqptt2x_rgmumoJ0o6Jkema-Dga_gykj39-Xt3mA@mail.gmail.com>
	<87k48j8x2k.fsf@sobchak.i-did-not-set--mail-host-address--so-tickle-me>
Message-ID: <CAKVJ-_7miX-3Ou8DBHKdFJftv8CWbrsouyopbeDNKwHNdkbBNA@mail.gmail.com>

On Wed, Oct 5, 2011 at 1:03 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Peter;
>
>> >> You might have seen on Twitter at the end of last week I mentioned
>> >> some work to extend Brad's Bio.Graphics.BasicChromosome to allow
>> >> features within a chromosome segment, optionally with labels.
>
> This is awesome, thanks for extending it. All of your tweaks are good
> improvements, and I'm +1 for including it in the next release. Please
> improve away.

Awesome. I've applied the current branch to the trunk, although
I'm not promising there won't be changes to the new stuff between
now and the next release.

In particular, doing the labels (and their placement) for the whole
of a chromosome (and not just for a segment) would allow us to
squeeze in more labels (e.g. in the example I showed using the
vertical space currently reserved for the telomeres).

Peter

From p.j.a.cock at googlemail.com  Wed Oct  5 17:17:38 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 5 Oct 2011 22:17:38 +0100
Subject: [Biopython-dev] Enhancements to Bio.Graphics.BasicChromosome
In-Reply-To: <CAKVJ-_4G2LDL-t5x-MtoQ6wDT1C4Z4_LwDW3H5TO2ae_9keUrg@mail.gmail.com>
References: <CAKVJ-_5qY_D9ki+FTikqWugfRTivNypjaQXJTVRkE896gfH4hA@mail.gmail.com>
	<CAD=vDipyUuJmV_MtpQNVRiWSivSOy-J7BjQuqn-jfyCh1p-XWQ@mail.gmail.com>
	<CAKVJ-_4cQkgqptt2x_rgmumoJ0o6Jkema-Dga_gykj39-Xt3mA@mail.gmail.com>
	<CAD=vDipujFdqOSm5VmdDV+_HD3NwL=E=611Fvrz-6Z5AXNy2cQ@mail.gmail.com>
	<CAKVJ-_4G2LDL-t5x-MtoQ6wDT1C4Z4_LwDW3H5TO2ae_9keUrg@mail.gmail.com>
Message-ID: <CAKVJ-_4TZGr4KoZmfr_7cCPcyZ1XLRdQg3ytY4JynFSz==n6SQ@mail.gmail.com>

On Wed, Oct 5, 2011 at 2:32 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Wed, Oct 5, 2011 at 2:16 PM, Kevin Jacobs wrote:
>> On Mon, Oct 3, 2011 at 6:24 PM, Peter Cock wrote:
>>> Coloured background regions in the chromosome (should be
>>> able to do this already), some of which are hatched (not possible
>>> right now... would have to look into ReportLab's capabilities here).
>>> This is what you meant by banding?
>>
>> Yes-- being able to show cytobands and custom bands to designate regions
>> will be very useful for me. As before, I'm not wed to the cross-hatching,
>> in fact the standard displays use only grayscale.
>
> OK - simple colours are easy, I can add that to the test case example.

Done, using some random placements - I didn't manage to find
the real Arabidopsis cytoband data which would have been nicer.
https://github.com/biopython/biopython/commit/24deaca63ba55e28519a4c85650ad74e849f203e

Peter


From p.j.a.cock at googlemail.com  Wed Oct  5 18:31:18 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 5 Oct 2011 23:31:18 +0100
Subject: [Biopython-dev] Enhancements to Bio.Graphics.BasicChromosome
In-Reply-To: <CAKVJ-_7miX-3Ou8DBHKdFJftv8CWbrsouyopbeDNKwHNdkbBNA@mail.gmail.com>
References: <CAKVJ-_5qY_D9ki+FTikqWugfRTivNypjaQXJTVRkE896gfH4hA@mail.gmail.com>
	<CAD=vDipyUuJmV_MtpQNVRiWSivSOy-J7BjQuqn-jfyCh1p-XWQ@mail.gmail.com>
	<CAKVJ-_4cQkgqptt2x_rgmumoJ0o6Jkema-Dga_gykj39-Xt3mA@mail.gmail.com>
	<87k48j8x2k.fsf@sobchak.i-did-not-set--mail-host-address--so-tickle-me>
	<CAKVJ-_7miX-3Ou8DBHKdFJftv8CWbrsouyopbeDNKwHNdkbBNA@mail.gmail.com>
Message-ID: <CAKVJ-_73kxfxFGB2ugeUteGSGVt9ZycRyWERB4J41HX2kneJiw@mail.gmail.com>

On Wed, Oct 5, 2011 at 3:40 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> In particular, doing the labels (and their placement) for the whole
> of a chromosome (and not just for a segment) would allow us to
> squeeze in more labels (e.g. in the example I showed using the
> vertical space currently reserved for the telomeres).
>

Done,

https://github.com/biopython/biopython/commit/d3d19440bdbaabbf4cd305e43dea627f68cf6ecf

We may want to review how chromosome segment labels work -
probably simplest to add them to the dynamically placed label
list, otherwise the two can overlap.

Peter

From tiagoantao at gmail.com  Thu Oct  6 12:17:40 2011
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 6 Oct 2011 17:17:40 +0100
Subject: [Biopython-dev] bio.expasy potential bug?
Message-ID: <CAA9RGEOpKfhYcWY3K9f49KtuR-BKER=e_KGRDP77P8sdGHz_Lg@mail.gmail.com>

Hi,

This might be a red herring but:
http://biopython.org/DIST/docs/api/Bio.ExPASy-module.html :
sprot_search_ful(text, make_wild=None, swissprot=1, trembl=None,
cgi='http://www.expasy.ch/cgi-bin/sprot-search-ful')

That cgi does not exist...

Tiago
-- 
"If you want to get laid, go to college. If you want an education, go
to the library." - Frank Zappa


From p.j.a.cock at googlemail.com  Thu Oct  6 12:23:03 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 6 Oct 2011 17:23:03 +0100
Subject: [Biopython-dev] bio.expasy potential bug?
In-Reply-To: <CAA9RGEOpKfhYcWY3K9f49KtuR-BKER=e_KGRDP77P8sdGHz_Lg@mail.gmail.com>
References: <CAA9RGEOpKfhYcWY3K9f49KtuR-BKER=e_KGRDP77P8sdGHz_Lg@mail.gmail.com>
Message-ID: <CAKVJ-_6yw5D0aVwRmZ20uHOXxfi4qz2OWV_tnRHaZciOiG3rmQ@mail.gmail.com>

2011/10/6 Tiago Antão <tiagoantao at gmail.com>:
> Hi,
>
> This might be a red herring but:
> http://biopython.org/DIST/docs/api/Bio.ExPASy-module.html :
> sprot_search_ful(text, make_wild=None, swissprot=1, trembl=None,
> cgi='http://www.expasy.ch/cgi-bin/sprot-search-ful')
>
> That cgi does not exist...
>
> Tiago

Looks like they've changed the URL or turned off a redirect :(

If you can work out what they should be, please go ahead and fix it.
A working unit test would be good (mark it as requires internet).
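
One way such an online test could be guarded is sketched below with the standard library only. It is a sketch under assumptions: the helper internet_available, the test class and the use of skipTest are illustrative and not Biopython's own mechanism for marking tests as requiring internet access, and the real assertion on the fixed search function is left as a placeholder.

```python
import socket
import unittest


def internet_available(host="www.expasy.ch", port=80, timeout=5,
                       connect=socket.create_connection):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        connection = connect((host, port), timeout)
    except OSError:
        return False
    connection.close()
    return True


class ExPASyOnlineTest(unittest.TestCase):
    """Placeholder online test, skipped when there is no network."""

    def setUp(self):
        if not internet_available():
            self.skipTest("internet connection required")

    def test_server_reachable(self):
        # The real test would call the corrected Bio.ExPASy function here
        # and check that the returned handle contains a sensible page.
        self.assertTrue(internet_available())
```

Passing the connect function in as an argument keeps the helper itself testable without touching the network.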

Peter


From tiagoantao at gmail.com  Thu Oct  6 12:33:11 2011
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 6 Oct 2011 17:33:11 +0100
Subject: [Biopython-dev] bio.expasy potential bug?
In-Reply-To: <CAKVJ-_6yw5D0aVwRmZ20uHOXxfi4qz2OWV_tnRHaZciOiG3rmQ@mail.gmail.com>
References: <CAA9RGEOpKfhYcWY3K9f49KtuR-BKER=e_KGRDP77P8sdGHz_Lg@mail.gmail.com>
	<CAKVJ-_6yw5D0aVwRmZ20uHOXxfi4qz2OWV_tnRHaZciOiG3rmQ@mail.gmail.com>
Message-ID: <CAA9RGEOTkzX6+UY019sxi2GTYM9rB3-HG8+z=KyRpQhGoM6p-A@mail.gmail.com>

2011/10/6 Peter Cock <p.j.a.cock at googlemail.com>:
> Looks like they've changed the URL or turned off a redirect :(
>
> If you can work out what they should be, please go ahead and fix it.
> A working unit test would be good (mark it as requires internet).

I will add the bug to Redmine. I am currently too pressed for time to sort
this out :(

I can have a look next week.

From redmine at redmine.open-bio.org  Thu Oct  6 13:06:26 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 6 Oct 2011 17:06:26 +0000
Subject: [Biopython-dev] [Biopython - Bug #3301] (New) Bio.ExPASy
	sprot_search_ful has wrong cgi address
Message-ID: <redmine.issue-3301.20111006170626@redmine.open-bio.org>


Issue #3301 has been reported by Tiago Antao.

----------------------------------------
Bug #3301: Bio.ExPASy sprot_search_ful has wrong cgi address
https://redmine.open-bio.org/issues/3301

Author: Tiago Antao
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


The Bio.ExPASy sprot_search_ful has a cgi of http://www.expasy.ch/cgi-bin/sprot-search-ful , but that URL is not available anymore.
See:
http://biopython.org/DIST/docs/api/Bio.ExPASy-module.html 




From keith.hughitt at gmail.com  Fri Oct  7 07:18:10 2011
From: keith.hughitt at gmail.com (Keith Hughitt)
Date: Fri, 7 Oct 2011 07:18:10 -0400
Subject: [Biopython-dev] Creating a NCBIFastaIterator
In-Reply-To: <CAKVJ-_4CxFvUUH9VOWhm2fnqcZgkygFZrq6u-MgeOx_w3xV-nw@mail.gmail.com>
References: <CAOJcpR8F_syzYF7gk9F-9dFUSVtFgXvTVo4Zh77N5cPFLG7-ow@mail.gmail.com>
	<CAKVJ-_4CxFvUUH9VOWhm2fnqcZgkygFZrq6u-MgeOx_w3xV-nw@mail.gmail.com>
Message-ID: <CAOJcpR9AbCUyzhekMZr9YaLRxNT5-As9M8Qasu7en6XmqWGtvg@mail.gmail.com>

Okay, I took a stab at it. The code is in the master branch of my fork:
https://github.com/khughitt/biopython/blob/75be77cf28d376329577adf5ec41a8880b7faf5c/Bio/SeqIO/FastaIO.py#L73

I wasn't sure what the best choices are for id/name so for now I stored the
gid in id (and also in the annotations), and the accession as name. Any
suggestions?

I also haven't written any test code yet. Should I parameterize
TitleFunctions.simple_check and multi_check, or is there another approach
you would advise?

Keith

From p.j.a.cock at googlemail.com  Fri Oct  7 08:49:30 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 7 Oct 2011 13:49:30 +0100
Subject: [Biopython-dev] Creating a NCBIFastaIterator
In-Reply-To: <CAOJcpR9AbCUyzhekMZr9YaLRxNT5-As9M8Qasu7en6XmqWGtvg@mail.gmail.com>
References: <CAOJcpR8F_syzYF7gk9F-9dFUSVtFgXvTVo4Zh77N5cPFLG7-ow@mail.gmail.com>
	<CAKVJ-_4CxFvUUH9VOWhm2fnqcZgkygFZrq6u-MgeOx_w3xV-nw@mail.gmail.com>
	<CAOJcpR9AbCUyzhekMZr9YaLRxNT5-As9M8Qasu7en6XmqWGtvg@mail.gmail.com>
Message-ID: <CAKVJ-_4aSXSq8Kqope2szZJMudH4SyiGK_jTmvAdx5ioiveYuw@mail.gmail.com>

On Fri, Oct 7, 2011 at 12:18 PM, Keith Hughitt <keith.hughitt at gmail.com> wrote:
> Okay, I took a stab at it. The code is in the master branch of my
> fork: https://github.com/khughitt/biopython/blob/75be77cf28d376329577adf5ec41a8880b7faf5c/Bio/SeqIO/FastaIO.py#L73

You are only handling gi|<gi_num>|ref|<accession>|<description>
whereas the NCBI have a *lot* of other variations to consider:

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/formatdb_fastacmd.html

This is quite an open ended bit of work...

> I wasn't sure what the best choices are for id/name so for now I stored the
> gid in id (and also in the annotations), and the accession as name. Any
> suggestions?

I suggest collecting a selection of matched NCBI FASTA and
GenBank/GenPept files, and how Biopython handles the
GenBank/GenPept version (format name "genbank" alias "gb"
in Bio.SeqIO) and try to make handling the FASTA version as
"fasta-ncbi" do the same.

e.g. From our unit tests (from the NCBI FTP site), these are
a pair:

Tests/GenBank/NC_005816.gb
Tests/GenBank/NC_005816.fna

> I also haven't written any test code yet. Should I parameterize
> TitleFunctions.simple_check and multi_check, or is there
> another approach you would advise?
> Keith

Probably write some completely new tests. e.g. Use the
existing test files mentioned above, and verify that both
the "genbank" and the "fasta-ncbi" parser give the same
results (ignoring things not in the FASTA file of course).
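
The comparison at the heart of such a test might look like the sketch below. It works on plain dicts so it runs on its own; in a real test the two records would come from parsing the paired files with the "genbank" and proposed "fasta-ncbi" formats. The helper name compare_shared_fields and the default field list are assumptions for illustration.

```python
def compare_shared_fields(genbank_record, fasta_record,
                          fields=("id", "name", "description", "seq")):
    """Return the names of fields whose values differ between the records.

    Fields missing from the FASTA side are skipped, since a FASTA file
    cannot carry everything a GenBank file does.
    """
    mismatches = []
    for field in fields:
        if field not in fasta_record:
            continue
        if genbank_record.get(field) != fasta_record[field]:
            mismatches.append(field)
    return mismatches
```

A unit test would then simply assert that the returned list is empty for each matched pair of files.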

Peter


From andrew.sczesnak at med.nyu.edu  Fri Oct  7 11:38:04 2011
From: andrew.sczesnak at med.nyu.edu (Andrew Sczesnak)
Date: Fri, 07 Oct 2011 11:38:04 -0400
Subject: [Biopython-dev] Creating a NCBIFastaIterator
In-Reply-To: <CAKVJ-_4aSXSq8Kqope2szZJMudH4SyiGK_jTmvAdx5ioiveYuw@mail.gmail.com>
References: <CAOJcpR8F_syzYF7gk9F-9dFUSVtFgXvTVo4Zh77N5cPFLG7-ow@mail.gmail.com>	<CAKVJ-_4CxFvUUH9VOWhm2fnqcZgkygFZrq6u-MgeOx_w3xV-nw@mail.gmail.com>	<CAOJcpR9AbCUyzhekMZr9YaLRxNT5-As9M8Qasu7en6XmqWGtvg@mail.gmail.com>
	<CAKVJ-_4aSXSq8Kqope2szZJMudH4SyiGK_jTmvAdx5ioiveYuw@mail.gmail.com>
Message-ID: <4E8F1CDC.8090500@med.nyu.edu>

Adding my unsolicited opinion here, what do y'all think of this 
NCBIFasta parser being a more general "callback" parser, where a 
function passed to read() or write() translates some arbitrary 
delimited-text into an (id, name, description) tuple, as in:

from Bio import SeqIO

def x(seqrec):
     # gi|<gi_num>|ref|<accession>|<description>
     y = seqrec.description.strip().split("|")

     #       gi    acc   desc
     return (y[1], y[3], y[4])

# calls x on every record in the FASTA
# (passing x as a third argument is the proposed API, not an existing one)
for seqrec in SeqIO.parse(fp, "fasta", x):
     print(seqrec)

This would be similar to key_function in SeqIO.to_dict() and would shift 
the responsibility of handling variation in formats to the user. 
Alternatively, a few functions to parse different styles of description 
lines could be included in the module.
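
To make the idea concrete without relying on any change to SeqIO, here is a self-contained sketch of a FASTA reader that accepts such a callback. It yields plain tuples rather than SeqRecord objects, and parse_fasta, title_func and gi_ref_title are hypothetical names used only for this illustration.

```python
from io import StringIO


def parse_fasta(handle, title_func=None):
    """Yield (id, name, description, sequence) tuples from a FASTA handle.

    title_func, if given, maps the title line (without '>') to an
    (id, name, description) tuple; otherwise the first word of the title
    is used as both id and name.
    """
    def default_title(title):
        words = title.split(None, 1)
        first = words[0] if words else ""
        return first, first, title

    convert = title_func or default_title
    title, chunks = None, []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if title is not None:
                yield tuple(convert(title)) + ("".join(chunks),)
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line)
    if title is not None:
        yield tuple(convert(title)) + ("".join(chunks),)


def gi_ref_title(title):
    """Callback for gi|<gi_num>|ref|<accession>|<description> titles.

    Raises IndexError if the title has fewer than five |-separated parts.
    """
    parts = title.split("|", 4)
    return parts[1], parts[3], parts[4].strip()


example = StringIO(">gi|1|ref|ABC.1| demo sequence\nAC\nGT\n")
for entry in parse_fasta(example, gi_ref_title):
    print(entry)
```

Functions like gi_ref_title are the "few functions to parse different styles of description lines" that could ship alongside the parser.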


Andrew

On 10/07/2011 08:49 AM, Peter Cock wrote:
> On Fri, Oct 7, 2011 at 12:18 PM, Keith Hughitt <keith.hughitt at gmail.com> wrote:
>> Okay, I took a stab at it. The code is in the master branch of my
>> fork: https://github.com/khughitt/biopython/blob/75be77cf28d376329577adf5ec41a8880b7faf5c/Bio/SeqIO/FastaIO.py#L73
>
> You are only handling gi|<gi_num>|ref|<accession>|<description>
> whereas the NCBI have a *lot* of other variations to consider:
>
> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/formatdb_fastacmd.html
>
> This is quite an open ended bit of work...
>
>> I wasn't sure what the best choices are for id/name so for now I stored the
>> gid in id (and also in the annotations), and the accession as name. Any
>> suggestions?
>
> I suggest collecting a selection of matched NCBI FASTA and
> GenBank/GenPept files, and how Biopython handles the
> GenBank/GenPept version (format name "genbank" alias "gb"
> in Bio.SeqIO) and try to make handling the FASTA version as
> "fasta-ncbi" do the same.
>
> e.g. From our unit tests (from the NCBI FTP site), these are
> a pair:
>
> Tests/GenBank/NC_005816.gb
> Tests/GenBank/NC_005816.fna
>
>> I also haven't written any test code yet. Should I parameterize
>> TitleFunctions.simple_check and multi_check, or is there
>> another approach you would advise?
>> Keith
>
> Probably write some completely new tests. e.g. Use the
> existing test files mentioned above, and verify that both
> the "genbank" and the "fasta-ncbi" parser give the same
> results (ignoring things not in the FASTA file of course).
>
> Peter
>

From p.j.a.cock at googlemail.com  Fri Oct  7 12:00:52 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 7 Oct 2011 17:00:52 +0100
Subject: [Biopython-dev] Creating a NCBIFastaIterator
In-Reply-To: <4E8F1CDC.8090500@med.nyu.edu>
References: <CAOJcpR8F_syzYF7gk9F-9dFUSVtFgXvTVo4Zh77N5cPFLG7-ow@mail.gmail.com>
	<CAKVJ-_4CxFvUUH9VOWhm2fnqcZgkygFZrq6u-MgeOx_w3xV-nw@mail.gmail.com>
	<CAOJcpR9AbCUyzhekMZr9YaLRxNT5-As9M8Qasu7en6XmqWGtvg@mail.gmail.com>
	<CAKVJ-_4aSXSq8Kqope2szZJMudH4SyiGK_jTmvAdx5ioiveYuw@mail.gmail.com>
	<4E8F1CDC.8090500@med.nyu.edu>
Message-ID: <CAKVJ-_62x-X5nMXpt5cSEwA502REOVs_SUUa8NQ59-EnSNzXJA@mail.gmail.com>

On Fri, Oct 7, 2011 at 4:38 PM, Andrew Sczesnak
<andrew.sczesnak at med.nyu.edu> wrote:
> Adding my unsolicited opinion here, what do y'all think of this NCBIFasta
> parser being a more general "callback" parser, where a function passed to
> read() or write() translates some arbitrary delimited-text into ...
>
> This would be similar to key_function in SeqIO.to_dict() and would shift the
> responsibility of handling variation in formats to the user. Alternatively,
> a few functions to parse different styles of description lines could be
> included in the module.
>
> Andrew

Hi Andrew,

Interesting idea, although it doesn't fit that well with the current
(deliberately) simple high level Bio.SeqIO.parse/read API. That
doesn't mean we can't do it, though (see Bio.Phylo.parse).

In this case I fail to see what benefit this gives over the current
situation, where the user can do this themselves with the
current FASTA parser,

e.g. With a function and a generator expression,

records = (do_ncbi_my_way(record) for record in SeqIO.parse(filename, "fasta"))

or more simply within a loop:

for record in SeqIO.parse(filename, "fasta"):
    do_ncbi_my_way(record)
    #Do stuff with record

etc.
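
As a hedged illustration of the do-it-yourself approach above, here is a
minimal sketch of what a do_ncbi_my_way helper could look like. The
pipe-delimited layout (gi|number|db|accession|...) and the choice to keep
the GI number as a "GI:..." string in dbxrefs are assumptions made for
illustration, not something the existing FASTA parser does. SimpleNamespace
stands in for a SeqRecord so the sketch runs without Biopython installed.

```python
from types import SimpleNamespace


def do_ncbi_my_way(record):
    """Hypothetical helper: tidy up an NCBI-style pipe-delimited record id.

    Assumes ids shaped like 'gi|6273291|gb|AF191665.1|AF191665'.
    Ids in any other layout are left untouched.
    """
    fields = record.id.split("|")
    if len(fields) < 4 or fields[0] != "gi":
        return record  # not the layout this sketch understands
    gi_number, accession = fields[1], fields[3]
    record.dbxrefs.append("GI:" + gi_number)  # keep the GI as a cross reference
    record.id = accession  # use the accession as the primary id
    return record


# Stand-in for a SeqRecord (only the attributes used above).
example = SimpleNamespace(id="gi|6273291|gb|AF191665.1|AF191665", dbxrefs=[])
do_ncbi_my_way(example)
print(example.id, example.dbxrefs)
```

With real data the same function could be applied to each record coming
out of SeqIO.parse(filename, "fasta"), in either of the two styles shown.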

Maybe it is down to personal preference of coding style?

I would much prefer a new "fasta-ncbi" parser in SeqIO
that handled all the documented NCBI FASTA identifiers.

I'm being negative here - but please don't let that deter you
from posting ideas. This is a public list and we/I welcome
constructive criticism and alternative ideas.

Regards,

Peter

From p.j.a.cock at googlemail.com  Fri Oct  7 12:16:55 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 7 Oct 2011 17:16:55 +0100
Subject: [Biopython-dev] Creating a NCBIFastaIterator
In-Reply-To: <4E8F239D.30504@med.nyu.edu>
References: <CAOJcpR8F_syzYF7gk9F-9dFUSVtFgXvTVo4Zh77N5cPFLG7-ow@mail.gmail.com>
	<CAKVJ-_4CxFvUUH9VOWhm2fnqcZgkygFZrq6u-MgeOx_w3xV-nw@mail.gmail.com>
	<CAOJcpR9AbCUyzhekMZr9YaLRxNT5-As9M8Qasu7en6XmqWGtvg@mail.gmail.com>
	<CAKVJ-_4aSXSq8Kqope2szZJMudH4SyiGK_jTmvAdx5ioiveYuw@mail.gmail.com>
	<4E8F1CDC.8090500@med.nyu.edu>
	<CAKVJ-_62x-X5nMXpt5cSEwA502REOVs_SUUa8NQ59-EnSNzXJA@mail.gmail.com>
	<4E8F239D.30504@med.nyu.edu>
Message-ID: <CAKVJ-_7Cah-dweUk-9Vmdn0n7xa3wUL9HCtkewdLZ--Kh8+hnQ@mail.gmail.com>

On Fri, Oct 7, 2011 at 5:06 PM, Andrew Sczesnak
<andrew.sczesnak at med.nyu.edu> wrote:
>>
>> Maybe it is down to personal preference of coding style?
>
> I agree, there isn't much difference between specifying the callback
> function in parse() or within the loop. To me, this points out that
> re-implementing a FASTA parser simply for a format of description
> line seems unnecessary.
>
> If a user is interested in extracting a particular piece of information
> from a FASTA description and knows the input format of the file, how
> difficult is it for them to split() it on their own? What exactly are the
> advantages of a separate parser?

Not enough of an advantage for me personally to have gone
and written it myself ;)

I can see some benefits in extracting information from the
NCBI identifier and storing them in the SeqRecord's dbxref
list and annotation dictionary (as consistently with our other
parsers as possible) if you are going to want to use those
fields yourself.

Perhaps Keith can explain his interest with some examples?

Peter

From andrew.sczesnak at med.nyu.edu  Fri Oct  7 12:06:53 2011
From: andrew.sczesnak at med.nyu.edu (Andrew Sczesnak)
Date: Fri, 07 Oct 2011 12:06:53 -0400
Subject: [Biopython-dev] Creating a NCBIFastaIterator
In-Reply-To: <CAKVJ-_62x-X5nMXpt5cSEwA502REOVs_SUUa8NQ59-EnSNzXJA@mail.gmail.com>
References: <CAOJcpR8F_syzYF7gk9F-9dFUSVtFgXvTVo4Zh77N5cPFLG7-ow@mail.gmail.com>	<CAKVJ-_4CxFvUUH9VOWhm2fnqcZgkygFZrq6u-MgeOx_w3xV-nw@mail.gmail.com>	<CAOJcpR9AbCUyzhekMZr9YaLRxNT5-As9M8Qasu7en6XmqWGtvg@mail.gmail.com>	<CAKVJ-_4aSXSq8Kqope2szZJMudH4SyiGK_jTmvAdx5ioiveYuw@mail.gmail.com>	<4E8F1CDC.8090500@med.nyu.edu>
	<CAKVJ-_62x-X5nMXpt5cSEwA502REOVs_SUUa8NQ59-EnSNzXJA@mail.gmail.com>
Message-ID: <4E8F239D.30504@med.nyu.edu>

On 10/07/2011 12:00 PM, Peter Cock wrote:
> Hi Andrew,
>
> Interesting idea. It doesn't fit that well with the current
> (deliberately) simple high level Bio.SeqIO.parse/read API, but
> that doesn't mean we can't do it (see Bio.Phylo.parse).
>
> In this case I fail to see what benefit this gives over the current
> situation, where the user can do this themselves with the
> current FASTA parser,
>
> e.g. With a function and a generator expression,
>
> records = (do_ncbi_my_way(record) for record in SeqIO.parse(filename, "fasta"))
>
> or more simply within a loop:
>
> for record in SeqIO.parse(filename, "fasta"):
>      do_ncbi_my_way(record)
>      #Do stuff with record
>
> etc.
>
> Maybe it is down to personal preference of coding style?

I agree, there isn't much difference between specifying the callback 
function in parse() or within the loop. To me, this points out that 
re-implementing a FASTA parser simply for a format of description line 
seems unnecessary.

If a user is interested in extracting a particular piece of information
from a FASTA description and knows the input format of the file, how 
difficult is it for them to split() it on their own? What exactly are 
the advantages of a separate parser?

> I would much prefer a new "fasta-ncbi" parser in SeqIO
> that handled all the documented NCBI FASTA identifiers.
>
> I'm being negative here - but please don't let that deter you
> from posting ideas. This is a public list and we/I welcome
> constructive criticism and alternative ideas.
>
> Regards,
>
> Peter

From keith.hughitt at gmail.com  Fri Oct  7 13:02:30 2011
From: keith.hughitt at gmail.com (Keith Hughitt)
Date: Fri, 7 Oct 2011 13:02:30 -0400
Subject: [Biopython-dev] Creating a NCBIFastaIterator
In-Reply-To: <CAKVJ-_7Cah-dweUk-9Vmdn0n7xa3wUL9HCtkewdLZ--Kh8+hnQ@mail.gmail.com>
References: <CAOJcpR8F_syzYF7gk9F-9dFUSVtFgXvTVo4Zh77N5cPFLG7-ow@mail.gmail.com>
	<CAKVJ-_4CxFvUUH9VOWhm2fnqcZgkygFZrq6u-MgeOx_w3xV-nw@mail.gmail.com>
	<CAOJcpR9AbCUyzhekMZr9YaLRxNT5-As9M8Qasu7en6XmqWGtvg@mail.gmail.com>
	<CAKVJ-_4aSXSq8Kqope2szZJMudH4SyiGK_jTmvAdx5ioiveYuw@mail.gmail.com>
	<4E8F1CDC.8090500@med.nyu.edu>
	<CAKVJ-_62x-X5nMXpt5cSEwA502REOVs_SUUa8NQ59-EnSNzXJA@mail.gmail.com>
	<4E8F239D.30504@med.nyu.edu>
	<CAKVJ-_7Cah-dweUk-9Vmdn0n7xa3wUL9HCtkewdLZ--Kh8+hnQ@mail.gmail.com>
Message-ID: <CAOJcpR_XMp2+fOwP6fcYtfv-ix+c8noSSJSU+rEQyC17MZi_Nw@mail.gmail.com>

It's really just meant to be a bit of "polish." Originally I was thinking
not about having a separate parser but simply extending the existing FASTA
parser to recognize common formats (e.g. NCBI) and choose better ids,
annotations, etc.

Since that would create problems in terms of backwards compatibility,
however, adding a new parser seemed like the next best option.

Part of the goal, personally, was also just to find a small but useful task
I could work on to begin to learn the code and contribute some. It shouldn't
be forced though, so I don't want to contribute something unless it's
actually an improvement.

Keith

On Fri, Oct 7, 2011 at 12:16 PM, Peter Cock <p.j.a.cock at googlemail.com>wrote:

> On Fri, Oct 7, 2011 at 5:06 PM, Andrew Sczesnak
> <andrew.sczesnak at med.nyu.edu> wrote:
> >>
> >> Maybe it is down to personal preference of coding style?
> >
> > I agree, there isn't much difference between specifying the callback
> > function in parse() or within the loop. To me, this points out that
> > re-implementing a FASTA parser simply for a format of description
> > line seems unnecessary.
> >
> > If a user is interested in extracting a particular piece of information
> > from a FASTA description and knows the input format of the file, how
> > difficult is it for them to split() it on their own? What exactly are the
> > advantages of a separate parser?
>
> Not enough of an advantage for me personally to have gone
> and written it myself ;)
>
> I can see some benefits in extracting information from the
> NCBI identifier and storing them in the SeqRecord's dbxref
> list and annotation dictionary (as consistently with our other
> parsers as possible) if you are going to want to use those
> fields yourself.
>
> Perhaps Keith can explain his interest with some examples?
>
> Peter

From b.invergo at gmail.com  Mon Oct 10 06:36:47 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Mon, 10 Oct 2011 12:36:47 +0200
Subject: [Biopython-dev] Parsing PAML supplementary output
Message-ID: <1318243007.12974.16.camel@localhost.localdomain>

Hi all,
I've received a request to implement the parsing of the main
supplementary output files of the PAML programs ('rst' files). I can't
submit a bug on Bugzilla, so I'll just announce my intention to work on
this here on the list.

One question though. The rst file for baseml includes an alignment which
is in the Phylip sequential format. I thought that it would be nice to
parse that directly into a Biopython MultipleSeqAlignment. It's my
understanding that Biopython only supports the interleaved format. Would
it be worth it for me to extend that functionality to include the
sequential format or would it be preferable to convert the alignments to
be interleaved within the parser itself?

Regards,
Brandon Invergo


From p.j.a.cock at googlemail.com  Mon Oct 10 08:21:52 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 10 Oct 2011 13:21:52 +0100
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <1318243007.12974.16.camel@localhost.localdomain>
References: <1318243007.12974.16.camel@localhost.localdomain>
Message-ID: <CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>

On Mon, Oct 10, 2011 at 11:36 AM, Brandon Invergo <b.invergo at gmail.com> wrote:
> Hi all,
> I've received a request to implement the parsing of the main
> supplementary output files of the PAML programs ('rst' files). I can't
> submit a bug on Bugzilla, so I'll just announce my intention to work on
> this here on the list.

That's because we moved to RedMine. There should have
been a link on the old Bugzilla page, but anyway it's here:
https://redmine.open-bio.org/projects/biopython

> One question though. The rst file for baseml includes an alignment which
> is in the Phylip sequential format. I thought that it would be nice to
> parse that directly into a Biopython MultipleSeqAlignment. It's my
> understanding that Biopython only supports the interleaved format. Would
> it be worth it for me to extend that functionality to include the
> sequential format or would it be preferable to convert the alignments to
> be interleaved within the parser itself?
>
> Regards,
> Brandon Invergo

If you can extend the current PHYLIP parser (strict or relaxed)
to cover interleaved and sequential, that would be nice. For
strict mode at least, we can in principle follow whatever the
original PHYLIP tools do to detect this automatically. It may
be safer to make it explicit though - from what I recall without
seeing the PHYLIP implementation's source code it was not
obvious how to do this reliably.

Peter

From b.invergo at gmail.com  Mon Oct 10 09:22:18 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Mon, 10 Oct 2011 15:22:18 +0200
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
Message-ID: <1318252938.12974.54.camel@localhost.localdomain>

Hi Peter

> That's because we moved to RedMine. There should have
> been a link on the old Bugzilla page, but anyway it's here:
> https://redmine.open-bio.org/projects/biopython

Ok, I'll file an enhancement request there. I didn't see a link on the
Bugzilla page and there are still some links to Bugzilla on the wiki,
like in the "What's being worked on" section. I missed the Issue Tracker
link on the left (incidentally, I think this is a design problem of the
typical wiki layout and not Biopython-specific...I never notice the
contents of that list), so it might be advisable to include the link
under the Contribute heading of the main page.

> If you can extend the current PHYLIP parser (strict or relaxed)
> to cover interleaved and sequential, that would be nice. For
> strict mode at least, we can in principle follow whatever the
> original PHYLIP tools do to detect this automatically. It may
> be safer to make it explicit though - from what I recall without
> seeing the PHYLIP implementation's source code it was not
> obvious how to do this reliably.

Ok, I'll take a look at the PHYLIP source code to see how they do it
there.

I'll report back with problems/notable progress/questions.

Cheers,
Brandon


From redmine at redmine.open-bio.org  Mon Oct 10 09:29:47 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 10 Oct 2011 13:29:47 +0000
Subject: [Biopython-dev] [Biopython - Feature #3303] (New) Support PHYLIP
	sequential alignment format in AlignIO
Message-ID: <redmine.issue-3303.20111010132947@redmine.open-bio.org>


Issue #3303 has been reported by Brandon Invergo.

----------------------------------------
Feature #3303: Support PHYLIP sequential alignment format in AlignIO
https://redmine.open-bio.org/issues/3303

Author: Brandon Invergo
Status: New
Priority: Normal
Assignee: Brandon Invergo
Category: 
Target version: 
URL: 


Currently only PHYLIP alignments in the interleaved format can be read by AlignIO. However, since some programs still work on the sequential format, it would be helpful to support that as well.


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Mon Oct 10 09:31:13 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 10 Oct 2011 13:31:13 +0000
Subject: [Biopython-dev] [Biopython - Feature #3304] (New) Parse PAML
	supplementary (rst) output files
Message-ID: <redmine.issue-3304.20111010133113@redmine.open-bio.org>


Issue #3304 has been reported by Brandon Invergo.

----------------------------------------
Feature #3304: Parse PAML supplementary (rst) output files
https://redmine.open-bio.org/issues/3304

Author: Brandon Invergo
Status: New
Priority: Normal
Assignee: Brandon Invergo
Category: 
Target version: 
URL: 


PAML programs create several output files, the main one of which is already parsed by the Bio.Phylo.PAML modules. The primary supplementary output files ('rst' files) also contain information that is useful for some users so they should be parsed as well.


----------------------------------------


From p.j.a.cock at googlemail.com  Mon Oct 10 12:35:15 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 10 Oct 2011 17:35:15 +0100
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <1318252938.12974.54.camel@localhost.localdomain>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
	<1318252938.12974.54.camel@localhost.localdomain>
Message-ID: <CAKVJ-_58zyuqwy8J9EbM1xDWHj6vXX35HokF8aC=KN_8f9WD7g@mail.gmail.com>

On Mon, Oct 10, 2011 at 2:22 PM, Brandon Invergo <b.invergo at gmail.com> wrote:
> Hi Peter
>
>> That's because we moved to RedMine. There should have
>> been a link on the old Bugzilla page, but anyway it's here:
>> https://redmine.open-bio.org/projects/biopython
>
> Ok, I'll file an enhancement request there. I didn't see a link on the
> Bugzilla page and there are still some links to Bugzilla on the wiki,
> like in the "What's being worked on" section.

Fixed, thanks.

> I missed the Issue Tracker
> link on the left (incidentally, I think this is a design problem of the
> typical wiki layout and not Biopython-specific...I never notice the
> contents of that list), so it might be advisable to include the link
> under the Contribute heading of the main page.

Good idea, done.

Peter

From p.j.a.cock at googlemail.com  Mon Oct 10 17:47:03 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 10 Oct 2011 22:47:03 +0100
Subject: [Biopython-dev] Moving strand & db ref from SeqFeature to
	FeatureLocation
Message-ID: <CAKVJ-_5crqEb4bGz4z1StTaugtp1uhzT6FDKNJqgzfwsJFTK-w@mail.gmail.com>

This was on the "SeqFeature start/end and making positions act
like ints" thread last month:
http://lists.open-bio.org/pipermail/biopython-dev/2011-September/009183.html

On Mon, Sep 19, 2011 at 10:03 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> Well, slightly easier - I have some more dramatic changes to
>> the SeqFeature and FeatureLocation objects planned, but I'm
>> still playing with this.
>
> One of the key changes (which can be done without
> really changing the API) is to move the database &
> accession and the strand from the SeqFeature to the
> FeatureLocation. These are intimately connected with
> the location, as much as the start/end.
>
> This is one of the things I've been working on here:
> https://github.com/peterjc/biopython/commits/f_loc
>
> The other key change on that experimental branch
> is moving away from sub_features for join locations
> (etc). Here I was trying a new CompoundLocation
> object, but am still wondering if this should be done
> in the SeqFeature or FeatureLocation object instead
> (or if SeqFeature should subclass FeatureLocation).
>
> Peter

That branch needs some manual merge conflict
resolution with the integer subclassing position
changes that landed on the trunk, which I've started:

https://github.com/peterjc/biopython/tree/f_loc2

Would someone like to review that please?

It moves the strand, ref and db_ref properties from
the SeqFeature object to the FeatureLocation object,
implementing read/write proxy methods for backward
compatibility.

Other than the commit which changes the __str__
method (the fine details of which I am happy to tweak
with discussion) this should be almost 100% back
compatible:

https://github.com/peterjc/biopython/commit/fed003821d0d223a7b3042ccc3bdf8442348f043

The one break I am aware of is that you can no longer create a
SeqFeature with an empty location and then set the strand or
db refs before setting the location object (which is what the
GenBank parser was doing).
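
To make the proxy idea concrete, here is a minimal sketch using Python
properties. SketchLocation and SketchFeature are hypothetical stand-ins,
not the classes on the branch; the sketch only shows how a feature-level
strand attribute can be forwarded to the location, and why setting it
before a location exists fails.

```python
class SketchLocation:
    """Stand-in for FeatureLocation: owns start, end and strand."""

    def __init__(self, start, end, strand=None):
        self.start = start
        self.end = end
        self.strand = strand


class SketchFeature:
    """Stand-in for SeqFeature: strand is proxied to the location."""

    def __init__(self, location=None):
        self.location = location

    @property
    def strand(self):
        return self.location.strand

    @strand.setter
    def strand(self, value):
        # Raises AttributeError while self.location is still None.
        self.location.strand = value


feature = SketchFeature(SketchLocation(10, 50, strand=+1))
feature.strand = -1  # old-style access on the feature still works
print(feature.location.strand)  # the value itself lives on the location
```

In this sketch, SketchFeature().strand = 1 raises AttributeError because
there is no location object yet to hold the value, which mirrors the one
incompatibility described above.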

The motivation is that the strand and the (optional) database
reference to which the location start/end apply are both
essential parts of the location information, and I feel they
should never have been attached to the SeqFeature directly.

Furthermore, this separation is useful as a step towards
reworking the current use of the SeqFeature's sub_feature
list for multi-part locations (e.g. joins in GenBank/EMBL),
more on this later.

Thanks,

Peter

From b.invergo at gmail.com  Tue Oct 11 03:51:26 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Tue, 11 Oct 2011 09:51:26 +0200
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
Message-ID: <1318319486.3137.19.camel@localhost.localdomain>

> If you can extend the current PHYLIP parser (strict or relaxed)
> to cover interleaved and sequential, that would be nice. For
> strict mode at least, we can in principle follow whatever the
> original PHYLIP tools do to detect this automatically. It may
> be safer to make it explicit though - from what I recall without
> seeing the PHYLIP implementation's source code it was not
> obvious how to do this reliably.
> 
I checked out the PHYLIP code and yes it's not really obvious how the
mode is detected. In fact, it seems that many of the programs ask for
user input to specify the format of the alignment.

So, regarding making it explicit, I'm not sure if this is what you meant
but I was thinking it might be simplest to add another Iterator/Writer
pair in the PhylipIO module for SequentialPhylip which inherit from the
basic Phylip classes, overriding the next() method in the iterator and
the write_alignment() method in the writer, much in the way that the
RelaxedPhylip classes work. 

This would mean that there would be no flexibility in the naming rules
(ie relaxed vs strict) for the SequentialPhylip format, unless I were to
also make a RelaxedSequentialPhylip pair of classes. PAML relaxes the
sequence name length restriction to 30 characters and since the whole
reason for embarking on this exercise was to support PAML's output of
PHYLIP alignments, if only one naming convention is to be implemented I
think it would be best to default to the relaxed rules.
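
For reference, a minimal sketch of reading the sequential layout with
relaxed naming, in plain Python. It is not the PhylipIO implementation;
the function name is hypothetical, names are assumed to end at the first
whitespace, and anything after the two header integers is ignored.

```python
def parse_sequential_phylip(text):
    """Parse sequential PHYLIP text into a list of (name, sequence) tuples."""
    lines = [line for line in text.splitlines() if line.strip()]
    n_seqs, seq_len = (int(x) for x in lines[0].split()[:2])
    records = []
    pos = 1
    for _ in range(n_seqs):
        # First line of a record: name, whitespace, first chunk of sequence.
        parts = lines[pos].split(None, 1)
        name = parts[0]
        seq = "".join(parts[1].split()) if len(parts) > 1 else ""
        pos += 1
        # Sequential layout: keep reading until this sequence is complete.
        while len(seq) < seq_len and pos < len(lines):
            seq += "".join(lines[pos].split())
            pos += 1
        if len(seq) != seq_len:
            raise ValueError("sequence %r has length %i, expected %i"
                             % (name, len(seq), seq_len))
        records.append((name, seq))
    return records


example = """2 12
seqA ACGTAC
GTACGT
seqB TTTTAA
AACCCC
"""
print(parse_sequential_phylip(example))
```

The key difference from the interleaved layout is the inner loop: each
sequence is read to its full length before the next name is expected.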


Slightly unrelated musings: I was thinking that with Biopython's support
for reading PHYLIP alignments and Newick trees into objects, at some
point it would probably be convenient to make the Bio.Phylo.PAML package
more integrated by allowing the user to pass in such objects as
arguments rather than writing them to files first; the PAML module could
write them to temp files itself. I think some minor changes might have
to be made in places (ie for PAML to accept interleaved alignments, the
header line must contain an 'I' flag after the seq # and seq len
integers) and I'd have to think about how best to allow passing such
objects while still retaining the ability to specify filenames without
using kludgy, non-pythonic type-checking. Anyway, another task for
another day, but I thought I'd throw it out there.

Regards,
Brandon


From p.j.a.cock at googlemail.com  Tue Oct 11 04:20:52 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 11 Oct 2011 09:20:52 +0100
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <1318319486.3137.19.camel@localhost.localdomain>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
	<1318319486.3137.19.camel@localhost.localdomain>
Message-ID: <CAKVJ-_7FJEeSxR9eP8b2_ePB--Adf8uCZdXM_jK3P8ko6NHCzw@mail.gmail.com>

On Tue, Oct 11, 2011 at 8:51 AM, Brandon Invergo <b.invergo at gmail.com> wrote:
>> If you can extend the current PHYLIP parser (strict or relaxed)
>> to cover interleaved and sequential, that would be nice. For
>> strict mode at least, we can in principle follow whatever the
>> original PHYLIP tools do to detect this automatically. It may
>> be safer to make it explicit though - from what I recall without
>> seeing the PHYLIP implementation's source code it was not
>> obvious how to do this reliably.
>>
> I checked out the PHYLIP code and yes it's not really obvious how the
> mode is detected. In fact, it seems that many of the programs ask for
> user input to specify the format of the alignment.
>
> So, regarding making it explicit, I'm not sure if this is what you meant
> but I was thinking it might be simplest to add another Iterator/Writer
> pair in the PhylipIO module for SequentialPhylip which inherit from the
> basic Phylip classes, overriding the next() method in the iterator and
> the write_alignment() method in the writer, much in the way that the
> RelaxedPhylip classes work.

Something like that as a new format variant, yes.

> This would mean that there would be no flexibility in the naming rules
> (ie relaxed vs strict) for the SequentialPhylip format, unless I were to
> also make a RelaxedSequentialPhylip pair of classes. PAML relaxes the
> sequence name length restriction to 30 characters and since the whole
> reason for embarking on this exercise was to support PAML's output of
> PHYLIP alignments, if only one naming convention is to be implemented I
> think it would be best to default to the relaxed rules.

Practical.

> Slightly unrelated musings: I was thinking that with Biopython's support
> for reading PHYLIP alignments and Newick trees into objects, at some
> point it would probably be convenient to make the Bio.Phylo.PAML package
> more integrated by allowing the user to pass in such objects as
> arguments rather than writing them to files first; the PAML module could
> write them to temp files itself. I think some minor changes might have
> to be made in places (ie for PAML to accept interleaved alignments, the
> header line must contain an 'I' flag after the seq # and seq len
> integers) and I'd have to think about how best to allow passing such
> objects while still retaining the ability to specify filenames without
> using kludgy, non-pythonic type-checking. Anyway, another task for
> another day, but I thought I'd throw it out there.

Do we need to write the "I" flag in our PHYLIP output?

Peter

From b.invergo at gmail.com  Tue Oct 11 05:33:13 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Tue, 11 Oct 2011 11:33:13 +0200
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <CAKVJ-_7FJEeSxR9eP8b2_ePB--Adf8uCZdXM_jK3P8ko6NHCzw@mail.gmail.com>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
	<1318319486.3137.19.camel@localhost.localdomain>
	<CAKVJ-_7FJEeSxR9eP8b2_ePB--Adf8uCZdXM_jK3P8ko6NHCzw@mail.gmail.com>
Message-ID: <1318325593.3137.51.camel@localhost.localdomain>


> Something like that as a new format variant, yes.
> 
> > ...
> 
> Practical.
> 
Ok, I'll start working on that then.


> Do we need to write the "I" flag in our PHYLIP output?

It took me a while to hunt down information on PHYLIP flags. I found
this link which mentions them:
http://www.no.embnet.org/phylipdoc/
They're only used by the program which is using the alignment as input,
corresponding to the PHYLIP programs' menu options. In general, they
have no effect on the format of the alignment (aside from the
'S'/sequential vs 'I'/interleaved flags). However, some of them might
require extra information immediately below the header line, before the
alignment starts. This complicates things. (see below for some PAML
examples)

However, since there's no real standardization to the use of the phylip
format, not all programs pay attention to these flags. In my own work,
I've used TCoffee to generate interleaved alignments and then I have to
add in the 'I' after the fact. As another example, the current Biopython
PhylipIO would not recognize a header line with options as a valid
header line, since there would be more than 2 "parts".

So, if some programs can take options flags (at least PHYLIP and PAML
programs) while other programs may not like their inclusion, they would
need to be treated specially. I would suggest that the PhylipIterator
classes be modified to recognize the existence of options, but not
necessarily do anything with them, and that the PhylipWriter classes be
modified to optionally take a string containing option flags to append
to the header line, ie 'I', 'GC', etc. 
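
A minimal sketch of that suggestion, assuming the header is two integers
optionally followed by flag letters. Both function names are hypothetical
and nothing here is the current PhylipIO behaviour.

```python
def parse_phylip_header(line):
    """Return (n_seqs, seq_len, flags) from a PHYLIP-style header line."""
    parts = line.split()
    if len(parts) < 2:
        raise ValueError("header needs at least two integers: %r" % line)
    n_seqs, seq_len = int(parts[0]), int(parts[1])
    flags = "".join(parts[2:])  # e.g. "I", "G" or "GC"; empty if absent
    return n_seqs, seq_len, flags


def format_phylip_header(n_seqs, seq_len, flags=""):
    """Build a header line, appending option flags only when given."""
    header = " %i %i" % (n_seqs, seq_len)
    if flags:
        header += " " + flags
    return header


print(parse_phylip_header("5 855 GC"))
print(format_phylip_header(5, 300, "I"))
```

A reader written this way can accept the flags without acting on them,
and a writer stays compatible with programs that reject flags because
the default is to emit none.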

As for the supplementary information for the options, I'm not sure if
those complicate matters beyond the scope of Biopython's intended
functionality, or whether there should be yet another optional string
argument to the writer. The PhylipIterators would then need to be
modified to handle the possible existence of these supplementary lines
as well.

Anyway, I don't think this is an immediate concern and I personally
wouldn't approach it until I start working on the idea of better
integrating the PAML module with the rest of Biopython.

-brandon

Here are some examples:
5 895 G
G 4 3
123123123123123123123123123123123123123123123123123123123123
123123123123123123123123123123123123123123123123123123123123
123123123123123123123123123123123123123123123123123123123123
123123123123123123123123123123123123123123123123123123123123
123123123123123123123123123123123123123123123123123123123123
123123123123123123123123123123123123123123123123123123123123
123123123123123123123123123123123123123123123123123123123123
1231231231231231231231231231231231231
444444444444444444444444444444444444444444444444444444444444
444444444444444444444444444444444444444444444444444444444444
444444444444444444444444444444444444444444444444444444444444
444444444444444444
123123123123123123123123123123123123123123123123123123123123
123123123123123123123123123123123123123123123123123123123123
123123123123123123123123123123123123123123123123123123123123
12312312312312312312312312312312312312312312312312312312312 
Human
AAGCTTCACCGGCGCAGTCATTCTCATAATCGCCCACGGACTTACATCCTCATTACTATT
CTGCCTAGCAAACTCAAACTACGAACGCACTCACAGTCGCATCATAATC........
Chimpanzee .........

"The first line of the file contains the option character G. The second
line begins with a G at the first column, followed by the number of site
classes. The following lines contain the site marks, one for each site
in the sequence (or each codon in the case of
codonml). The site mark specifies which class each site is from. If
there are g classes, the marks should be 1, 2, ..., g, and if g > 9, the
marks need to be separated by spaces. The total number of
marks must be equal to the total number of sites in each sequence."

********

5 1000 G
G 4 100 200 300 400 
Sequence 1
TCGATAGATAGGTTTTAGGGGGGGGGGTAAAAAAAAA.......

"This [alignment has 5 sequences of] 1000 nucleotides from 4 genes,
obtained from concatenating four genes with 100, 200, 300, and 400
nucleotides from genes 1, 2, 3, and 4, respectively. The"

********

5 855 GC 
human          GTG CTG TCT CCT ...

5 sequences, 855 nucleotides, length must be a multiple of three
********

5 300 G 
G2 40 60

sequence1
.....

"This data set has 5 sequences, each of 300 nucleotides (100 codons),
which are partitioned into two genes, with the first gene having 40
codons and the second gene 60 codons."


From p.j.a.cock at googlemail.com  Tue Oct 11 05:37:48 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 11 Oct 2011 10:37:48 +0100
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <1318325593.3137.51.camel@localhost.localdomain>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
	<1318319486.3137.19.camel@localhost.localdomain>
	<CAKVJ-_7FJEeSxR9eP8b2_ePB--Adf8uCZdXM_jK3P8ko6NHCzw@mail.gmail.com>
	<1318325593.3137.51.camel@localhost.localdomain>
Message-ID: <CAKVJ-_4=xD8GrmA+azHvN4frJtDN9b44EEdh_j1e-Rb_p=7wWQ@mail.gmail.com>

On Tue, Oct 11, 2011 at 10:33 AM, Brandon Invergo <b.invergo at gmail.com> wrote:
>> Do we need to write the "I" flag in our PHYLIP output?
>
> It took me a while to hunt down information on PHYLIP flags. I found
> this link which mentions them:
> http://www.no.embnet.org/phylipdoc/
> They're only used by the program which is using the alignment as input,
> corresponding to the PHYLIP programs' menu options. In general, they
> have no effect on the format of the alignment (aside from the
> 'S'/sequential vs 'I'/interleaved flags). However, some of them might
> require extra information immediately below the header line, before the
> alignment starts. This complicates things. (see below for some PAML
> examples)

Some of those examples don't really look like PHYLIP anymore to me.

If there is any simple change to allow the current parser to cope
with (but ignore) any extra meta data like this, that sounds sensible
(with unit tests of course - grin).

Peter

From b.invergo at gmail.com  Tue Oct 11 06:01:59 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Tue, 11 Oct 2011 12:01:59 +0200
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <CAKVJ-_4=xD8GrmA+azHvN4frJtDN9b44EEdh_j1e-Rb_p=7wWQ@mail.gmail.com>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
	<1318319486.3137.19.camel@localhost.localdomain>
	<CAKVJ-_7FJEeSxR9eP8b2_ePB--Adf8uCZdXM_jK3P8ko6NHCzw@mail.gmail.com>
	<1318325593.3137.51.camel@localhost.localdomain>
	<CAKVJ-_4=xD8GrmA+azHvN4frJtDN9b44EEdh_j1e-Rb_p=7wWQ@mail.gmail.com>
Message-ID: <1318327319.3137.70.camel@localhost.localdomain>


> Some of those examples don't really look like PHYLIP anymore to me.
> 
> If there is any simple change to allow the current parser to cope
> with (but ignore) any extra meta data like this, that sounds sensible
> (with unit tests of course - grin).

Agreed, it can get quite messy, though look at the link I provided; even
the PHYLIP-specific example that they give includes some supplementary
info at the top, as well as a tree at the bottom:

 4   40   W					
W         0101001111 0101110101 0101110011	
	  1101010110
dmras1    GTCGTCGTTG GACCTGGAGG CGTGGGCAAG	

spras     GTAGTTGTAG GAGATGGTGG TGTTGGTAAA
scras1    GTAGTTGTCG GTGGAGGTGG CGTTGGTAAA
scras2    GTCGTCGTTG GTGGTGGTGG TGTTGGTAAA
	  TCCGCGCTCA
	  AGTGCTTTGA
	  TCTGCTTTAA
	  TCTGCTTTGA
1						
((dmras1,ddrasa),((hschras,spras),(scras1,scras2)));


I agree that trying to shoehorn that functionality into Biopython as
written would be a mess. Another option that I can think of, however,
would be to shift such extra formatting duties to the Biopython
application interface which needs them, since that's the only place
they're relevant. So I could, for example, make a PAML-specific subclass
of PhylipWriter which handles all these weird PAML-specific options. Or
if there were to be a PHYLIP interface and the program took that above
example as input, it would be the duty of the interface to write a file
with those options, the alignment and the tree all together. 
Just a thought.

For the short term, though, when I implement the sequential format, I'll
go ahead and update the code to at least handle flags in the header
line. Handling the supp. info should be straightforward, since I
believe that each supp. line must begin with the option flag that
requires the info; if the option flag exists in the header, ignore any
following lines which begin with that flag character. 
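
A minimal sketch of that flag-skipping idea, assuming the header has the
form "<ntaxa> <nchars> [flags]" and that each supplementary line starts
with its flag character followed by whitespace. The function name
split_phylip_header is made up for illustration and is not part of
PhylipIO. It does not handle continuation lines (such as the second
weights line in the example above), and a taxon whose name is a single
flag letter would be misread as a supplementary line:

```python
def split_phylip_header(lines):
    """Return (ntaxa, nchars, flags, remaining_lines).

    Assumption: the header is "<ntaxa> <nchars> [flags]" and every
    supplementary line starts with one of the header's flag characters
    followed by whitespace.
    """
    header = lines[0].split()
    ntaxa, nchars = int(header[0]), int(header[1])
    flags = set("".join(header[2:]).upper())
    # 'I' and 'S' only describe the layout and carry no extra lines.
    skippable = flags - {"I", "S"}

    body = list(lines[1:])
    # Drop leading supplementary lines such as "W   0101001111 ...".
    while body and body[0][:1].upper() in skippable and body[0][1:2].isspace():
        body.pop(0)
    return ntaxa, nchars, flags, body
```

The remaining lines could then be handed to the ordinary alignment
parser unchanged.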

Unit tests will abound.

-brandon


From p.j.a.cock at googlemail.com  Tue Oct 11 06:13:03 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 11 Oct 2011 11:13:03 +0100
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <1318327319.3137.70.camel@localhost.localdomain>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
	<1318319486.3137.19.camel@localhost.localdomain>
	<CAKVJ-_7FJEeSxR9eP8b2_ePB--Adf8uCZdXM_jK3P8ko6NHCzw@mail.gmail.com>
	<1318325593.3137.51.camel@localhost.localdomain>
	<CAKVJ-_4=xD8GrmA+azHvN4frJtDN9b44EEdh_j1e-Rb_p=7wWQ@mail.gmail.com>
	<1318327319.3137.70.camel@localhost.localdomain>
Message-ID: <CAKVJ-_6mVPbYnOAui-cEpcy0SGu1cUzBj7RCWoW4pHM8GjTmng@mail.gmail.com>

On Tue, Oct 11, 2011 at 11:01 AM, Brandon Invergo <b.invergo at gmail.com> wrote:
>
>> Some of those examples don't really look like PHYLIP anymore to me.
>>
>> If there is any simple change to allow the current parser to cope
>> with (but ignore) any extra meta data like this, that sounds sensible
>> (with unit tests of course - grin).
>
> Agreed, it can get quite messy, though look at the link I provided; even
> the PHYLIP-specific example that they give includes some supplementary
> info at the top, as well as a tree at the bottom:
>
>  4   40   W
> W         0101001111 0101110101 0101110011
>           1101010110
> dmras1    GTCGTCGTTG GACCTGGAGG CGTGGGCAAG
>
> spras     GTAGTTGTAG GAGATGGTGG TGTTGGTAAA
> scras1    GTAGTTGTCG GTGGAGGTGG CGTTGGTAAA
> scras2    GTCGTCGTTG GTGGTGGTGG TGTTGGTAAA
>           TCCGCGCTCA
>           AGTGCTTTGA
>           TCTGCTTTAA
>           TCTGCTTTGA
> 1
> ((dmras1,ddrasa),((hschras,spras),(scras1,scras2)));
>

I would consider that to be a meta file containing a PHYLIP
alignment and a tree, but in itself it isn't a PHYLIP alignment.

That looks like exactly the kind of issue NEXUS was designed
to solve: how to embed alignments, trees and other stuff into
a single plain text file for input into a phylogenetic tool.

Doesn't PHYLIP have an XML format these days? Trying
to parse something like that text (without a formal standard)
seems like a painful exercise and long term maintenance
headache.

Peter


From b.invergo at gmail.com  Tue Oct 11 06:37:39 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Tue, 11 Oct 2011 12:37:39 +0200
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <CAKVJ-_6mVPbYnOAui-cEpcy0SGu1cUzBj7RCWoW4pHM8GjTmng@mail.gmail.com>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
	<1318319486.3137.19.camel@localhost.localdomain>
	<CAKVJ-_7FJEeSxR9eP8b2_ePB--Adf8uCZdXM_jK3P8ko6NHCzw@mail.gmail.com>
	<1318325593.3137.51.camel@localhost.localdomain>
	<CAKVJ-_4=xD8GrmA+azHvN4frJtDN9b44EEdh_j1e-Rb_p=7wWQ@mail.gmail.com>
	<1318327319.3137.70.camel@localhost.localdomain>
	<CAKVJ-_6mVPbYnOAui-cEpcy0SGu1cUzBj7RCWoW4pHM8GjTmng@mail.gmail.com>
Message-ID: <1318329459.3137.82.camel@localhost.localdomain>

> I would consider that to be a meta file containing a PHYLIP
> alignment and a tree, but in itself it isn't a PHYLIP alignment.
> 
> That looks like exactly the kind of issue NEXUS was designed
> to solve: how to embed alignments, trees and other stuff into
> a single plain text file for input into a phylogenetic tool.
> 
> Doesn't PHYLIP have an XML format these days? Trying
> to parse something like that text (without a formal standard)
> seems like a painful exercise and long term maintenance
> headache.

I'm not suggesting that Biopython parse and store the information
because I agree that it would be an unmaintainable nightmare. To bring
myself out of the clouds a bit and back to the basics of my original
intent: if I work on better integrating the PAML module so that the user
can pass a MultipleSeqAlignment object, I will need a way to write that
alignment to a file with potentially more information than the default
PhylipWriter allows. So, just as simple as that, Bio.Phylo.PAML would
need its own alignment writer....something I'm not going to worry about
right now.

With this mentality, then yes, anything containing such option flags and
info is no longer a PHYLIP alignment but is rather an input file to some
program. As such, the existing PhylipIO module should *not* be modified
to handle this metadata. Please ignore all my other half-baked ideas.

So, current, phylip-related tasks:
- implement SequentialPhylipWriter and SequentialPhylipIterator classes
in PhylipIO


That's it, I think. I'll revisit this alignment-writing stuff at some
other point. One task at a time...

-brandon


From p.j.a.cock at googlemail.com  Tue Oct 11 07:05:48 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 11 Oct 2011 12:05:48 +0100
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <1318329459.3137.82.camel@localhost.localdomain>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
	<1318319486.3137.19.camel@localhost.localdomain>
	<CAKVJ-_7FJEeSxR9eP8b2_ePB--Adf8uCZdXM_jK3P8ko6NHCzw@mail.gmail.com>
	<1318325593.3137.51.camel@localhost.localdomain>
	<CAKVJ-_4=xD8GrmA+azHvN4frJtDN9b44EEdh_j1e-Rb_p=7wWQ@mail.gmail.com>
	<1318327319.3137.70.camel@localhost.localdomain>
	<CAKVJ-_6mVPbYnOAui-cEpcy0SGu1cUzBj7RCWoW4pHM8GjTmng@mail.gmail.com>
	<1318329459.3137.82.camel@localhost.localdomain>
Message-ID: <CAKVJ-_7R4KHUFc3oOhQQ_nrorKey0GDeACAEZVv-OWn0scnuvg@mail.gmail.com>

On Tue, Oct 11, 2011 at 11:37 AM, Brandon Invergo <b.invergo at gmail.com> wrote:
>> I would consider that to be a meta file containing a PHYLIP
>> alignment and a tree, but in itself it isn't a PHYLIP alignment.
>>
>> That looks like exactly the kind of issue NEXUS was designed
>> to solve: how to embed alignments, trees and other stuff into
>> a single plain text file for input into a phylogenetic tool.
>>
>> Doesn't PHYLIP have an XML format these days? Trying
>> to parse something like that text (without a formal standard)
>> seems like a painful exercise and long term maintenance
>> headache.
>
> I'm not suggesting that Biopython parse and store the information
> because I agree that it would be an unmaintainable nightmare. To bring
> myself out of the clouds a bit and back to the basics of my original
> intent: if I work on better integrating the PAML module so that the user
> can pass a MultipleSeqAlignment object, I will need a way to write that
> alignment to a file with potentially more information than the default
> PhylipWriter allows. So, just as simple as that, Bio.Phylo.PAML would
> need its own alignment writer....something I'm not going to worry about
> right now.
>
> With this mentality, then yes, anything containing such option flags and
> info is no longer a PHYLIP alignment but is rather an input file to some
> program. As such, the existing PhylipIO module should *not* be modified
> to handle this metadata. Please ignore all my other half-baked ideas.

What you could think about is having the Bio.Phylo.PAML create this
file, and call the existing PhylipIO module with the handle to write the
alignment part - and perhaps the Bio.Phylo module with the handle to
write any tree.
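
Roughly, the shared-handle idea could look like the sketch below. The
two writer functions are hypothetical stand-ins (not the real PhylipIO
or Bio.Phylo code) so that the example runs on its own; the only point
is that the PAML wrapper opens the file and each writer just receives
the handle:

```python
from io import StringIO


def write_alignment(records, handle):
    # Hypothetical stand-in for the PhylipIO writer: a header line, then
    # one line per sequence with the name padded to ten characters.
    handle.write(" %i %i\n" % (len(records), len(records[0][1])))
    for name, seq in records:
        handle.write("%-10s%s\n" % (name[:10], seq))


def write_tree(newick, handle):
    # Hypothetical stand-in for a Bio.Phylo tree writer.
    handle.write(newick + "\n")


def write_paml_input(records, newick, handle):
    # The PAML wrapper owns the file; each writer only sees the handle.
    # Any PAML-specific flags or extra lines would be written here too.
    write_alignment(records, handle)
    handle.write("\n")
    write_tree(newick, handle)


handle = StringIO()
write_paml_input([("spras", "GTAGTTGTAG"), ("scras1", "GTAGTTGTCG")],
                 "(spras,scras1);", handle)
print(handle.getvalue())
```

In Biopython itself, Bio.AlignIO.write() and Bio.Phylo.write() both
accept an open handle, so they could take the place of the stand-ins.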

> So, current, phylip-related tasks:
> - implement SequentialPhylipWriter and SequentialPhylipIterator classes
> in PhylipIO
>
> That's it, I think. I'll revisit this alignment-writing stuff at some
> other point. One task at a time...
>
> -brandon

That sounds like a manageable step to start with :)

Peter

From chapmanb at 50mail.com  Tue Oct 11 07:20:31 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 11 Oct 2011 07:20:31 -0400
Subject: [Biopython-dev] Moving strand & db ref from SeqFeature to
	FeatureLocation
In-Reply-To: <CAKVJ-_5crqEb4bGz4z1StTaugtp1uhzT6FDKNJqgzfwsJFTK-w@mail.gmail.com>
References: <CAKVJ-_5crqEb4bGz4z1StTaugtp1uhzT6FDKNJqgzfwsJFTK-w@mail.gmail.com>
Message-ID: <8739ez4vwg.fsf@kunkel.i-did-not-set--mail-host-address--so-tickle-me>


Peter;

> https://github.com/peterjc/biopython/tree/f_loc2
> 
> It moves the strand, ref and db_ref properties from
> the SeqFeature object to the FeatureLocation object,
> implementing read/write proxy methods for backward
> compatibility.

Thanks for the integer work and for this. I'm agreed that this is a more
logical way to store the strand (and cross-ref) information. +1 from me
on checking it in,
Brad

From p.j.a.cock at googlemail.com  Tue Oct 11 07:28:35 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 11 Oct 2011 12:28:35 +0100
Subject: [Biopython-dev] Moving strand & db ref from SeqFeature to
	FeatureLocation
In-Reply-To: <8739ez4vwg.fsf@kunkel.i-did-not-set--mail-host-address--so-tickle-me>
References: <CAKVJ-_5crqEb4bGz4z1StTaugtp1uhzT6FDKNJqgzfwsJFTK-w@mail.gmail.com>
	<8739ez4vwg.fsf@kunkel.i-did-not-set--mail-host-address--so-tickle-me>
Message-ID: <CAKVJ-_75KRXMX9ABK=Wk=mJA_w_PSsUUGPR3BTA9YCKy5ZGPVw@mail.gmail.com>

On Tue, Oct 11, 2011 at 12:20 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Peter;
>
>> https://github.com/peterjc/biopython/tree/f_loc2
>>
>> It moves the strand, ref and db_ref properties from
>> the SeqFeature object to the FeatureLocation object,
>> implementing read/write proxy methods for backward
>> compatibility.
>
> Thanks for the integer work and for this. I'm agreed that this is a more
> logical way to store the strand (and cross-ref) information. +1 from me
> on checking it in,
> Brad

OK, that's done. Cheers Brad.

As I said before, if anyone doesn't like the new printing
of the FeatureLocation with how I present the strand and
database reference, we can change that. There are
examples in the SeqFeature.py and SeqRecord.py docstrings.

Regards,

Peter

From eric.talevich at gmail.com  Tue Oct 11 08:55:57 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Tue, 11 Oct 2011 08:55:57 -0400
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <CAKVJ-_6mVPbYnOAui-cEpcy0SGu1cUzBj7RCWoW4pHM8GjTmng@mail.gmail.com>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
	<1318319486.3137.19.camel@localhost.localdomain>
	<CAKVJ-_7FJEeSxR9eP8b2_ePB--Adf8uCZdXM_jK3P8ko6NHCzw@mail.gmail.com>
	<1318325593.3137.51.camel@localhost.localdomain>
	<CAKVJ-_4=xD8GrmA+azHvN4frJtDN9b44EEdh_j1e-Rb_p=7wWQ@mail.gmail.com>
	<1318327319.3137.70.camel@localhost.localdomain>
	<CAKVJ-_6mVPbYnOAui-cEpcy0SGu1cUzBj7RCWoW4pHM8GjTmng@mail.gmail.com>
Message-ID: <CAMC681mD_ibzzNrjiUuMWhimK20L7ZceW-OvTkRtr9W3-RYooA@mail.gmail.com>

On Tue, Oct 11, 2011 at 6:13 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

>
> That looks like exactly the kind of issue NEXUS was designed
> to solve: how to embed alignments, trees and other stuff into
> a single plain text file for input into a phylogenetic tool.
>
> Doesn't PHYLIP have an XML format these days? Trying
> to parse something like that text (without a formal standard)
> seems like a painful exercise and long term maintenance
> headache.
>
>
The Phylip programs seqboot and retree have XML formats that look almost
like SeqXML and phyloXML, but they're not quite compatible, e.g. attribute
names are slightly different.

This is probably because they were written before those standard formats
existed -- pretty sure the retree XML format, sort of described in Inferring
Phylogenies (2004) as an example of how a future XML tree format might look,
was an inspiration for phyloXML. There hasn't been much development on these
parts of the Phylip codebase lately, though. If someone wanted to write a
patch to bring these formats into compliance with the closest standards, I
bet Joe would accept the patch.

Discussion:
https://www.facebook.com/permalink.php?story_fbid=256082801069968&id=115402811804635

-E

From p.j.a.cock at googlemail.com  Tue Oct 11 09:04:20 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 11 Oct 2011 14:04:20 +0100
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <CAMC681mD_ibzzNrjiUuMWhimK20L7ZceW-OvTkRtr9W3-RYooA@mail.gmail.com>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
	<1318319486.3137.19.camel@localhost.localdomain>
	<CAKVJ-_7FJEeSxR9eP8b2_ePB--Adf8uCZdXM_jK3P8ko6NHCzw@mail.gmail.com>
	<1318325593.3137.51.camel@localhost.localdomain>
	<CAKVJ-_4=xD8GrmA+azHvN4frJtDN9b44EEdh_j1e-Rb_p=7wWQ@mail.gmail.com>
	<1318327319.3137.70.camel@localhost.localdomain>
	<CAKVJ-_6mVPbYnOAui-cEpcy0SGu1cUzBj7RCWoW4pHM8GjTmng@mail.gmail.com>
	<CAMC681mD_ibzzNrjiUuMWhimK20L7ZceW-OvTkRtr9W3-RYooA@mail.gmail.com>
Message-ID: <CAKVJ-_7k1wwsrhWYO7UBdpFvVE_z2cm4qnNgE90T34_+5xtJhA@mail.gmail.com>

On Tue, Oct 11, 2011 at 1:55 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> On Tue, Oct 11, 2011 at 6:13 AM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>>
>> That looks like exactly the kind of issue NEXUS was designed
>> to solve: how to embed alignments, trees and other stuff into
>> a single plain text file for input into a phylogenetic tool.
>>
>> Doesn't PHYLIP have an XML format these days? Trying
>> to parse something like that text (without a formal standard)
>> seems like a painful exercise and long term maintenance
>> headache.
>>
>
> The Phylip programs seqboot and retree have XML formats that look almost
> like SeqXML and phyloXML, but they're not quite compatible, e.g. attribute
> names are slightly different.
>
> This is probably because they were written before those standard formats
> existed -- pretty sure the retree XML format, sort of described in Inferring
> Phylogenies (2004) as an example of how a future XML tree format might look,
> was an inspiration for phyloXML. There hasn't been much development on these
> parts of the Phylip codebase lately, though. If someone wanted to write a
> patch to bring these formats into compliance with the closest standards, I
> bet Joe would accept the patch.
>
> Discussion:
> https://www.facebook.com/permalink.php?story_fbid=256082801069968&id=115402811804635
>
> -E

Good plan - anyone here familiar with the PHYLIP code base?

Peter

From chapmanb at 50mail.com  Thu Oct 13 10:05:57 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 13 Oct 2011 10:05:57 -0400
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
	automated programs
Message-ID: <871uuhm1fe.fsf@fastmail.fm>


Hi all;
Biopython's setup.py currently has an interactive question/answer
session to remind users to optionally install NumPy if it's not
present. This is useful for by-hand installations, but problematic with
automated installers.

One useful feature of setuptools is the 'install_requires' attribute in
setup.py. This allows your programs to define the requirements and have
them automatically installed from PyPI. It's a great way to include
useful libraries without having to fret excessively about users
installing dependencies.

Unfortunately if you use install_requires with Biopython, and NumPy is
not installed, automated scripts will get stuck in the question/answer
dialog. To resolve this issue, I wrote a small patch that adds NumPy to
Biopython's install_requires and skips the Q/A only in cases where it is
installed via pip or easy_install:

https://github.com/chapmanb/biopython/commit/be53d850d721fc82af81bedcd9fb9034b0a2099b

If someone is able to review this, it would be great to get it into
Biopython for the next release.

Brad

From p.j.a.cock at googlemail.com  Thu Oct 13 10:20:46 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 13 Oct 2011 15:20:46 +0100
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
In-Reply-To: <871uuhm1fe.fsf@fastmail.fm>
References: <871uuhm1fe.fsf@fastmail.fm>
Message-ID: <CAKVJ-_5506i1UmzScq+fikCattsYktOpsWeFkE-56L1BcNvsXA@mail.gmail.com>

On Thu, Oct 13, 2011 at 3:05 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Hi all;
> Biopython's setup.py currently has an interactive question/answer
> session to remind users to optionally install NumPy if it's not
> present. This is useful for by-hand installations, but problematic with
> automated installers.
>
> One useful feature of setuptools is the 'install_requires' attribute in
> setup.py. This allows your programs to define the requirements and have
> them automatically installed from PyPI. It's a great way to include
> useful libraries without having to fret excessively about users
> installing dependencies.
>
> Unfortunately if you use install_requires with Biopython, and NumPy is
> not installed, automated scripts will get stuck in the question/answer
> dialog. To resolve this issue, I wrote a small patch that adds NumPy to
> Biopython's install_requires and skips the Q/A only in cases where it is
> installed via pip or easy_install:
>
> https://github.com/chapmanb/biopython/commit/be53d850d721fc82af81bedcd9fb9034b0a2099b
>
> If someone is able to review this, it would be great to get it into
> Biopython for the next release.
>
> Brad

I can appreciate the usefulness of this, but don't know enough
about pip and easy_install to comment on the implementation.

Anyone else?

Peter

From eric.talevich at gmail.com  Thu Oct 13 14:00:22 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Thu, 13 Oct 2011 14:00:22 -0400
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
In-Reply-To: <871uuhm1fe.fsf@fastmail.fm>
References: <871uuhm1fe.fsf@fastmail.fm>
Message-ID: <CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>

On Thu, Oct 13, 2011 at 10:05 AM, Brad Chapman <chapmanb at 50mail.com> wrote:

>
> Hi all;
> Biopython's setup.py currently has an interactive question/answer
> session to remind users to optionally install NumPy if it's not
> present. This is useful for by-hand installations, but problematic with
> automated installers.
>
> One useful feature of setuptools is the 'install_requires' attribute in
> setup.py. This allows your programs to define the requirements and have
> them automatically installed from PyPI. It's a great way to include
> useful libraries without having to fret excessively about users
> installing dependencies.
>
> Unfortunately if you use install_requires with Biopython, and NumPy is
> not installed, automated scripts will get stuck in the question/answer
> dialog. To resolve this issue, I wrote a small patch that adds NumPy to
> Biopython's install_requires and skips the Q/A only in cases where it is
> installed via pip or easy_install:
>
>
> https://github.com/chapmanb/biopython/commit/be53d850d721fc82af81bedcd9fb9034b0a2099b
>
> If someone is able to review this, it would be great to get it into
> Biopython for the next release.
>
>
Hi Brad,

Looks cool to me, except the sys.argv parsing gets a little gritty
(understandably):

Line 115:

    if dist_dir.find("egg-dist-tmp") >= 0:

Could this be `if 'egg-dist-tmp' in dist_dir`?


Line 118:

    if sys.argv in [["-c", "develop", "--no-deps"],
                    ["-c", "egg_info"]]:

Does pip allow rearranging arguments? Would `--no-deps -c develop` also be
valid?
If so, should that be added as a third item in the list-of-args?
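
One way to sidestep the ordering question is to compare the arguments as
sets. This is only a sketch: the function name is invented here, and
whether pip ever reorders these arguments is not established in this
thread.

```python
def looks_like_automated_install(argv, dist_dir=""):
    """Guess whether setup.py is being driven by pip or easy_install."""
    # The patch treats "egg-dist-tmp" in the directory name as the
    # sign of an easy_install build.
    if "egg-dist-tmp" in dist_dir:
        return True
    # Compare as sets so the order of the arguments does not matter.
    known = [{"-c", "develop", "--no-deps"}, {"-c", "egg_info"}]
    return set(argv) in known
```

A call such as looks_like_automated_install(sys.argv, dist_dir) would
then cover both checks.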


-Eric

From chapmanb at 50mail.com  Fri Oct 14 06:00:37 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Fri, 14 Oct 2011 06:00:37 -0400
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
	automated programs
In-Reply-To: <CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
Message-ID: <87hb3b51ve.fsf@fastmail.fm>


Eric and Peter;
Thanks much for taking a look at this patch.

> Looks cool to me, except the sys.argv parsing gets a little gritty
> (understandably):

Absolutely. Unfortunately the python installation space is pretty
messy. Neither pip nor easy_install gives any formal declaration so you
have to resort to these hacks to infer that they are doing the
install. Luckily I don't think any of these options are something people
would do directly from the command line.

> Line 115:
> 
>     if dist_dir.find("egg-dist-tmp") >= 0:
> 
> Could this be `if 'egg-dist-tmp' in dist_dir`?

> Line 118:
> 
>     if sys.argv in [["-c", "develop", "--no-deps"],
>                     ["-c", "egg_info"]]:
> 
> Does pip allow rearranging arguments? Would `--no-deps -c develop` also be
> valid?
> If so, should that be added as a third item in the list-of-args?

Awesome, thanks for the suggestions. I checked both of these in.

Thanks again,
Brad

From p.j.a.cock at googlemail.com  Fri Oct 14 06:53:42 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 14 Oct 2011 11:53:42 +0100
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
In-Reply-To: <87hb3b51ve.fsf@fastmail.fm>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
Message-ID: <CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>

On Fri, Oct 14, 2011 at 11:00 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Awesome, thanks for the suggestions. I checked both of these in.
>

I'll test the branch today, and merge it to the trunk if it looks good
on Python 2 / 3 / Jython / PyPy.

Peter

From p.j.a.cock at googlemail.com  Fri Oct 14 06:55:52 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 14 Oct 2011 11:55:52 +0100
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
In-Reply-To: <CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
	<CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
Message-ID: <CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>

On Fri, Oct 14, 2011 at 11:53 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Fri, Oct 14, 2011 at 11:00 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
>>
>> Awesome, thanks for the suggestions. I checked both of these in.
>>
>
> I'll test the branch today, and merge it to the trunk if it looks good
> on Python 2 / 3 / Jython / PyPy.
>

$ jython setup.py install
/Users/pjcock/jython2.5.2/Lib/distutils/dist.py:263: UserWarning:
Unknown distribution option: 'install_requires'
  warnings.warn(msg)
running install
running build
running build_py
...


That's with Jython 2.5.2 under Mac OS X Snow Leopard. Same with PyPy 1.6:

$ pypy setup.py install
/Users/pjcock/Downloads/Software/pypy-1.6/lib-python/modified-2.7/distutils/dist.py:267:
UserWarning: Unknown distribution option: 'install_requires'
  warnings.warn(msg)
running install
running build
running build_py
...

Can we avoid that warning?

Peter

From chapmanb at 50mail.com  Fri Oct 14 08:26:06 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Fri, 14 Oct 2011 08:26:06 -0400
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
	automated programs
In-Reply-To: <CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
	<CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
	<CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
Message-ID: <87ehyf4v4x.fsf@fastmail.fm>


Peter;
Thanks for testing this and helping with the merge.

> $ jython setup.py install
> /Users/pjcock/jython2.5.2/Lib/distutils/dist.py:263: UserWarning:
> Unknown distribution option: 'install_requires'
>   warnings.warn(msg)
[...]
> Can we avoid that warning?

This is a warning from distutils, so you would also see this on regular
ol' Python without setuptools installed. Likewise it should go away on
jython or pypy if they have setuptools or distribute installed.

Unfortunately I don't have a way around it since this is an argument to
setup. Most modern installations should have setuptools and can take
advantage of install_requires.

If it's a problem we could use 'warnings' to ignore it.
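
A minimal sketch of what that might look like, using only the standard
library. The wrapper name quiet_setup is hypothetical; in setup.py the
function passed in would be the real setup():

```python
import warnings


def quiet_setup(setup_func, **kwargs):
    """Call setup_func with the 'unknown option' warning hidden."""
    with warnings.catch_warnings():
        # 'message' is a regular expression matched against the start
        # of the warning text.
        warnings.filterwarnings(
            "ignore",
            message="Unknown distribution option",
            category=UserWarning,
        )
        return setup_func(**kwargs)
```

Because the filter lives inside catch_warnings(), other warnings and
code outside the call are unaffected.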

Brad

From cmccoy at fhcrc.org  Fri Oct 14 13:11:15 2011
From: cmccoy at fhcrc.org (Connor McCoy)
Date: Fri, 14 Oct 2011 10:11:15 -0700
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
Message-ID: <CAChfGK0N2XcW_ht_tqwyAh9JQDmcKNJUtmaZo4tQoNjYaXw8Uw@mail.gmail.com>

Hi Brad, Eric, and Peter,

Sorry to jump in. Regarding the install_requires warnings:

If you're interested, you can include the distribute_setup.py file
from http://python-distribute.org/distribute_setup.py in BioPython,
and add a short conditional import:

try:
    from setuptools import setup, find_packages
except ImportError:
    import distribute_setup
    distribute_setup.use_setuptools()
    from setuptools import setup, find_packages

Which will download and install distribute if it isn't available in
the python installation; the remainder of the setup can assume
setuptools is available.  Sphinx
(https://bitbucket.org/birkenfeld/sphinx/src/f1f641602bb2/setup.py)
and some other projects use this.
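
A lighter alternative, not proposed in this thread and shown only as a
sketch, would be to pass install_requires only when setuptools can be
imported, so plain distutils never sees the option. The helper name
setup_keywords is made up for illustration:

```python
def setup_keywords(have_setuptools, requirements=("numpy",)):
    """Build the extra keyword arguments to pass to setup()."""
    extra = {}
    if have_setuptools:
        # Plain distutils does not know this option and would warn.
        extra["install_requires"] = list(requirements)
    return extra


try:
    import setuptools  # noqa: F401
    HAVE_SETUPTOOLS = True
except ImportError:
    HAVE_SETUPTOOLS = False

# In a real setup.py: setup(name=..., **setup_keywords(HAVE_SETUPTOOLS))
print(setup_keywords(HAVE_SETUPTOOLS))
```

The trade-off is that installs without setuptools get no automatic
NumPy dependency handling at all.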

Connor

On Fri, Oct 14, 2011 at 9:00 AM,
<biopython-dev-request at lists.open-bio.org> wrote:
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 13 Oct 2011 14:00:22 -0400
> From: Eric Talevich <eric.talevich at gmail.com>
> Subject: Re: [Biopython-dev] NumPy dialog when Biopython installed
>         from automated programs
> To: Brad Chapman <chapmanb at 50mail.com>
> Cc: Biopython-Dev Mailing List <biopython-dev at biopython.org>
> Message-ID:
>         <CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Thu, Oct 13, 2011 at 10:05 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
>>
>> Hi all;
>> Biopython's setup.py currently has an interactive question/answer
>> session to remind users to optionally install NumPy if it's not
>> present. This is useful for by-hand installations, but problematic with
>> automated installers.
>>
>> One useful feature of setuptools is the 'install_requires' attribute in
>> setup.py. This allows your programs to define the requirements and have
>> them automatically installed from PyPI. It's a great way to include
>> useful libraries without having to fret excessively about users
>> installing dependencies.
>>
>> Unfortunately if you use install_requires with Biopython, and NumPy is
>> not installed, automated scripts will get stuck in the question/answer
>> dialog. To resolve this issue, I wrote a small patch that adds NumPy to
>> Biopython's install_requires and skips the Q/A only in cases where it is
>> installed via pip or easy_install:
>>
>>
>> https://github.com/chapmanb/biopython/commit/be53d850d721fc82af81bedcd9fb9034b0a2099b
>>
>> If someone is able to review this, it would be great to get it into
>> Biopython for the next release.
>>
>>
> Hi Brad,
>
> Looks cool to me, except the sys.argv parsing gets a little gritty
> (understandably):
>
> Line 115:
>
>     if dist_dir.find("egg-dist-tmp") >= 0:
>
> Could this be `if 'egg-dist-tmp' in dist_dir`?
>
>
> Line 118:
>
>     if sys.argv in [["-c", "develop", "--no-deps"],
>                     ["-c", "egg_info"]]:
>
> Does pip allow rearranging arguments? Would `--no-deps -c develop` also be
> valid?
> If so, should that be added as a third item in the list-of-args?
>
>
> -Eric
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 14 Oct 2011 06:00:37 -0400
> From: Brad Chapman <chapmanb at 50mail.com>
> Subject: Re: [Biopython-dev] NumPy dialog when Biopython installed
>         from automated programs
> To: Eric Talevich <eric.talevich at gmail.com>
> Cc: , Biopython-Dev Mailing List <biopython-dev at biopython.org>
> Message-ID: <87hb3b51ve.fsf at fastmail.fm>
> Content-Type: text/plain; charset=us-ascii
>
>
> Eric and Peter;
> Thanks much for taking a look at this patch.
>
>> Looks cool to me, except the sys.argv parsing gets a little gritty
>> (understandably):
>
> Absolutely. Unfortunately the python installation space is pretty
> messy. Neither pip nor easy_install gives any formal declaration so you
> have to resort to these hacks to infer that they are doing the
> install. Luckily I don't think any of these options are something people
> would do directly from the command line.
>
>> Line 115:
>>
>>     if dist_dir.find("egg-dist-tmp") >= 0:
>>
>> Could this be `if 'egg-dist-tmp' in dist_dir`?
>
>> Line 118:
>>
>>     if sys.argv in [["-c", "develop", "--no-deps"],
>>                     ["-c", "egg_info"]]:
>>
>> Does pip allow rearranging arguments? Would `--no-deps -c develop` also be
>> valid?
>> If so, should that be added as a third item in the list-of-args?
>
> Awesome, thanks for the suggestions. I checked both of these in.
>
> Thanks again,
> Brad
>
>
> ------------------------------
>
> Message: 3
> Date: Fri, 14 Oct 2011 11:53:42 +0100
> From: Peter Cock <p.j.a.cock at googlemail.com>
> Subject: Re: [Biopython-dev] NumPy dialog when Biopython installed
> ? ? ? ?from automated programs
> To: Brad Chapman <chapmanb at 50mail.com>
> Cc: Biopython-Dev Mailing List <biopython-dev at biopython.org>
> Message-ID:
>         <CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Fri, Oct 14, 2011 at 11:00 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
>>
>> Awesome, thanks for the suggestions. I checked both of these in.
>>
>
> I'll test the branch today, and merge it to the trunk if it looks good
> on Python 2 / 3 / Jython / PyPy.
>
> Peter
>
>
> ------------------------------
>
> Message: 4
> Date: Fri, 14 Oct 2011 11:55:52 +0100
> From: Peter Cock <p.j.a.cock at googlemail.com>
> Subject: Re: [Biopython-dev] NumPy dialog when Biopython installed
>         from automated programs
> To: Brad Chapman <chapmanb at 50mail.com>
> Cc: Biopython-Dev Mailing List <biopython-dev at biopython.org>
> Message-ID:
>         <CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Fri, Oct 14, 2011 at 11:53 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> On Fri, Oct 14, 2011 at 11:00 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
>>>
>>> Awesome, thanks for the suggestions. I checked both of these in.
>>>
>>
>> I'll test the branch today, and merge it to the trunk if it looks good
>> on Python 2 / 3 / Jython / PyPy.
>>
>
> $ jython setup.py install
> /Users/pjcock/jython2.5.2/Lib/distutils/dist.py:263: UserWarning:
> Unknown distribution option: 'install_requires'
>   warnings.warn(msg)
> running install
> running build
> running build_py
> ...
>
>
> That's with Jython 2.5.2 under Mac OS X Snow Leopard. Same with pypy 1.6,
>
> $ pypy setup.py install
> /Users/pjcock/Downloads/Software/pypy-1.6/lib-python/modified-2.7/distutils/dist.py:267:
> UserWarning: Unknown distribution option: 'install_requires'
>   warnings.warn(msg)
> running install
> running build
> running build_py
> ...
>
> Can we avoid that warning?
>
> Peter
>
>
> ------------------------------
>
> Message: 5
> Date: Fri, 14 Oct 2011 08:26:06 -0400
> From: Brad Chapman <chapmanb at 50mail.com>
> Subject: Re: [Biopython-dev] NumPy dialog when Biopython installed
>         from automated programs
> To: Peter Cock <p.j.a.cock at googlemail.com>
> Cc: , Biopython-Dev Mailing List <biopython-dev at biopython.org>
> Message-ID: <87ehyf4v4x.fsf at fastmail.fm>
> Content-Type: text/plain; charset=us-ascii
>
>
> Peter;
> Thanks for testing this and helping with the merge
>
>> $ jython setup.py install
>> /Users/pjcock/jython2.5.2/Lib/distutils/dist.py:263: UserWarning:
>> Unknown distribution option: 'install_requires'
>>   warnings.warn(msg)
> [...]
>> Can we avoid that warning?
>
> This is a warning from distutils, so you would also see this on regular
> ol' Python without setuptools installed. Likewise it should go away on
> jython or pypy if they have setuptools or distribute installed.
>
> Unfortunately I don't have a way around it since this is an argument to
> setup. Most modern installations should have setuptools and can take
> advantage of install_requires.
>
> If it's a problem we could use 'warnings' to ignore it.
>
> Brad
>
>
> ------------------------------
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>
>
> End of Biopython-dev Digest, Vol 105, Issue 12
> **********************************************
>


From carlcrott at gmail.com  Sun Oct 16 21:24:27 2011
From: carlcrott at gmail.com (carl crott)
Date: Sun, 16 Oct 2011 21:24:27 -0400
Subject: [Biopython-dev] fixes on the tutorials
Message-ID: <CAE2MQmAgUJHy8-fkoYZQ=Q=ryNa5Gt2ie-qPw4BudpxLBDJRaQ@mail.gmail.com>

So the tutorials I'm running through have some bugs in them ...

would anyone like me to fix these?


tutorial 2.4.1 should be something like:

from Bio import SeqIO
handle = open("ls_orchid.fasta", "rU")
for seq_record in SeqIO.parse(handle, "fasta"):
    print seq_record.id
    print repr(seq_record.seq)
    print len(seq_record)
handle.close()


and tutorial 2.4.2:

from Bio import SeqIO
handle = open("ls_orchid.gbk", "rU")
for seq_record in SeqIO.parse(handle, "genbank"):
    print seq_record.id
    print repr(seq_record.seq)
    print len(seq_record)
handle.close()

From chapmanb at 50mail.com  Sun Oct 16 21:29:49 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sun, 16 Oct 2011 21:29:49 -0400
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
	automated programs
In-Reply-To: <CAChfGK0N2XcW_ht_tqwyAh9JQDmcKNJUtmaZo4tQoNjYaXw8Uw@mail.gmail.com>
References: <CAChfGK0N2XcW_ht_tqwyAh9JQDmcKNJUtmaZo4tQoNjYaXw8Uw@mail.gmail.com>
Message-ID: <8739eso16a.fsf@fastmail.fm>


Connor;
Thanks for the idea on the auto-install of setuptools/distribute. I'm
open to this or sticking with the warning, whichever everyone
prefers. Traditionally the setup has tried to be lightweight so you
could install Biopython without anything else, but having distribute
installed is pretty useful so it might be nice to encourage this.

Brad

> Sorry to jump in. Regarding the install_requires warnings:
> 
> If you're interested, you can include the distribute_setup.py file
> from http://python-distribute.org/distribute_setup.py in BioPython,
> and add a short conditional import:
> 
> try:
>     from setuptools import setup, find_packages
> except ImportError:
>     import distribute_setup
>     distribute_setup.use_setuptools()
>     from setuptools import setup, find_packages
> 
> Which will download and install distribute if it isn't available in
> the python installation; the remainder of the setup can assume
> setuptools is available.  Sphinx
> (https://bitbucket.org/birkenfeld/sphinx/src/f1f641602bb2/setup.py)
> and some other projects use this.
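
One way to sidestep the "Unknown distribution option" warning discussed
in this thread is to pass install_requires only when setuptools can be
imported. A minimal sketch, assuming a hypothetical helper called
build_setup_kwargs and placeholder package metadata; it is not the code
from the branch under review:

```python
import importlib.util


def have_setuptools():
    """Return True if setuptools can be imported in this interpreter."""
    return importlib.util.find_spec("setuptools") is not None


def build_setup_kwargs(requirements, setuptools_available=None):
    """Build keyword arguments for setup() (hypothetical helper).

    install_requires is only understood by setuptools/distribute, so it is
    left out when plain distutils would be used, which avoids the warning.
    """
    if setuptools_available is None:
        setuptools_available = have_setuptools()
    kwargs = {"name": "example_package", "version": "0.0"}  # placeholders
    if setuptools_available:
        kwargs["install_requires"] = list(requirements)
    return kwargs


# Force the "no setuptools" case to show that the option is dropped.
print(build_setup_kwargs(["numpy"], setuptools_available=False))
```

The trade-off is that a plain distutils install then silently skips the
dependency declaration instead of warning about it.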

From p.j.a.cock at googlemail.com  Mon Oct 17 03:55:54 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 17 Oct 2011 08:55:54 +0100
Subject: [Biopython-dev] fixes on the tutorials
In-Reply-To: <CAE2MQmAgUJHy8-fkoYZQ=Q=ryNa5Gt2ie-qPw4BudpxLBDJRaQ@mail.gmail.com>
References: <CAE2MQmAgUJHy8-fkoYZQ=Q=ryNa5Gt2ie-qPw4BudpxLBDJRaQ@mail.gmail.com>
Message-ID: <CAKVJ-_4w+R-=PeUdCSXOkG7G4XxPNyP2G+rnW8bcgsumKW9PXA@mail.gmail.com>

On Mon, Oct 17, 2011 at 2:24 AM, carl crott <carlcrott at gmail.com> wrote:
> So the tutorials I'm running through have some bugs in them ...
>
> would anyone like me to fix these?
>

Hi Carl,

What's the bug?
>
> tutorial 2.4.1 should be something like:
>
> from Bio import SeqIO
> handle = open("ls_orchid.fasta", "rU")
> for seq_record in SeqIO.parse(handle, "fasta"):
>     print seq_record.id
>     print repr(seq_record.seq)
>     print len(seq_record)
> handle.close()
>

Your example above looks fine (and the tutorial
used to say that), but the current version is shorter:

from Bio import SeqIO
for seq_record in SeqIO.parse("ls_orchid.fasta", "fasta"):
    print seq_record.id
    print repr(seq_record.seq)
    print len(seq_record)

We could alternatively (now that we've dropped
Python 2.4) open the handle with a with statement.
The same applies to the GenBank example.
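
The with-statement variant mentioned above might look like the
following sketch. To keep it runnable without Biopython or the
ls_orchid.fasta file, it uses a hypothetical stand-in parser called
iter_fasta and an in-memory handle with made-up records; with Biopython
installed, the loop would call SeqIO.parse(handle, "fasta") instead.

```python
import io


def iter_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA handle (stand-in parser)."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


# Made-up records; a real file would be opened with open(filename).
example = ">seq1 first record\nACGT\nAC\n>seq2\nGGTT\n"

# The with statement closes the handle even if parsing raises an exception.
with io.StringIO(example) as handle:
    records = list(iter_fasta(handle))

for name, seq in records:
    print(name, len(seq))
```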

Perhaps you are using an old version of Biopython
(where Bio.SeqIO.parse(...) does not accept a
filename)?

Could you clarify please,

Thanks,

Peter


From p.j.a.cock at googlemail.com  Mon Oct 17 06:10:54 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 17 Oct 2011 11:10:54 +0100
Subject: [Biopython-dev] [biopython-dev] SeqFeature comparison for
	equality
Message-ID: <CAKVJ-_5qZBLbDaFMqUQ_KegWLK2Pt_BsHU=pPmt_bGLzJGyUiQ@mail.gmail.com>

Hi Joshua and everyone,

It looks like Joshua's email (below) got lost in the spam filter (possibly
due to the attachment). The core of his patch was as follows (there
were also lots of white space changes).


@@ -694,6 +714,15 @@ class FeatureLocation(object):
             for i in range(self._start, self._end):
                 yield i

+    def __eq__(self, other):
+        """Compares a FeatureLocation for equality"""
+        if not isinstance(other, FeatureLocation):
+            return False
+        if self.start() == other.start() and \
+                self.end() == other.end():
+            return True
+        return False
+


@@ -255,6 +255,26 @@ class SeqFeature(object):
             qualifiers = dict(self.qualifiers.iteritems()),
             sub_features = [f._flip(length) for f in self.sub_features[::-1]])

+    def __eq__(self, other):
+        """Compare between this SeqFeature and other.
+
+        ref, ref_db and qualifiers are not needed for comparison"""
+        if not isinstance(other, SeqFeature):
+            return False
+        if (self.id != "<unknown id>"
+              and other.id != "<unknown id>" and
+              self.id == other.id):
+            return True         # Can we trust this?
+        for x in ('location', 'type', 'strand', 'location_operator'):
+            if (getattr(self, x) and getattr(other, x) and \
+                    getattr(self, x) != getattr(other, x)):
+                return False
+        for f in self.sub_features:
+            if f not in other.sub_features:
+                return False
+        else:
+            return True
+
     def extract(self, parent_sequence):
         """Extract feature sequence from the supplied parent sequence.

Note the patch will not apply to the trunk, perhaps it is against
the current release?

First (logically), is defining __eq__ for the FeatureLocation,
and second is defining __eq__ for the SeqFeature. This
hides the fact that we need to compare position objects,
e.g. is BeforePosition(5) == ExactPosition(5)? The answer
is yes, which I have now clarified in the docstrings:

https://github.com/biopython/biopython/commit/55feea75f7ab55eac4ef4e320567d746ce41120a

Other than the fact that I think the ref and ref_db should be
checked when comparing locations, adding location comparison
seems like a good idea. Note that with the recent changes on
the trunk, the strand, ref and ref_db now belong to the
FeatureLocation not the SeqFeature.

Extending this to cover the SeqFeature leaves the ID, type,
etc. and is fiddly, particularly the question of annotation.
These are essentially the same reasons why we don't support
SeqRecord equality.

Joshua - would you like to update your patch against the
code in github, just for the FeatureLocation __eq__ method,
to include the strand, ref and ref_db properties?
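
A location-level __eq__ that also takes strand, ref and ref_db into
account could be sketched as below. SimpleLocation is a hypothetical
stand-in, not Biopython's FeatureLocation, and it stores plain integers
for the start and end:

```python
class SimpleLocation:
    """Hypothetical stand-in for a feature location."""

    def __init__(self, start, end, strand=None, ref=None, ref_db=None):
        self.start = start
        self.end = end
        self.strand = strand
        self.ref = ref
        self.ref_db = ref_db

    def _key(self):
        # Everything that must match for two locations to be equal.
        return (self.start, self.end, self.strand, self.ref, self.ref_db)

    def __eq__(self, other):
        if not isinstance(other, SimpleLocation):
            return NotImplemented
        return self._key() == other._key()

    def __ne__(self, other):
        # Python 2 does not derive != from ==, so define it explicitly.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result


a = SimpleLocation(5, 10, strand=+1)
print(a == SimpleLocation(5, 10, strand=+1))  # True
print(a == SimpleLocation(5, 10, strand=-1))  # False
```

Returning NotImplemented for other types lets Python fall back to its
default comparison rather than hard-coding False.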

Thanks,

Peter


---------- Forwarded message ----------
From: "Joshua Ismael Haase Hernández" <hahj87 at gmail.com>
To: biopython-dev at biopython.org
Date: Mon, 17 Oct 2011 01:06:17 -0500
Subject: [patch] SeqFeature comparison for equality
Hi there.

I was working on a testcase for a custom program
which should extract the same features I had planned.

Since SeqFeature lacks a comparison method, there is no
easy way to test

for feature in test_gene.features:
    self.assertIn(feature, myparser(file).features)

So I added comparison methods and they work fine.

Patch attached. My changes are under Biopython license.


From p.j.a.cock at googlemail.com  Mon Oct 17 11:03:42 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 17 Oct 2011 16:03:42 +0100
Subject: [Biopython-dev] Bio.File
In-Reply-To: <CAKVJ-_6xwM_Sa4OkqzHaaGia5od_Q4ansD3swPFRBp1v1LUguw@mail.gmail.com>
References: <CAKVJ-_4RxzutuBZd_iC09=RsamAYY1z6K0Ksrsy9ukgPupvhFQ@mail.gmail.com>
	<1315493349.29125.YahooMailClassic@web161216.mail.bf1.yahoo.com>
	<CAKVJ-_6xwM_Sa4OkqzHaaGia5od_Q4ansD3swPFRBp1v1LUguw@mail.gmail.com>
Message-ID: <CAKVJ-_5PAFmgG+0axCCNnqexTnU_94ay4JTWTT_gU-aEOioFjQ@mail.gmail.com>

Hi Michiel,

Regarding code using Bio.File, which you asked about
deprecating last month:
http://lists.open-bio.org/pipermail/biopython-dev/2011-September/009144.html

I objected at the time because I was using it for the
TogoWS code I was working on,

On Thu, Sep 8, 2011 at 4:25 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
On Wed, Sep 7, 2011 at 3:36 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>> If the server could be relied on to always give an
>>> HTTP error code this wouldn't be needed:
>>>
>>> https://github.com/peterjc/biopython/blob/togows/Bio/TogoWS/__init__.py
>>>
>
> ...
>
> [Some of those TogoWS checks are probably superfluous
> right now, I'm still polishing the error handling - some of
> which will rely on TogoWS itself catching more conditions]

I've updated my TogoWS to rely on the HTTP error codes,
and removed the heuristic error detection which required
Bio.File for the UndoHandle. That seems to be working fine
now.

That leaves Bio/SCOP/__init__.py as the only existing or
imminent code using Bio.File, so if we can sort that out,
we can deprecate Bio.File as you suggested.

Regards,

Peter

From anaryin at gmail.com  Mon Oct 17 11:13:37 2011
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Mon, 17 Oct 2011 17:13:37 +0200
Subject: [Biopython-dev] Bio.File
In-Reply-To: <CAKVJ-_5PAFmgG+0axCCNnqexTnU_94ay4JTWTT_gU-aEOioFjQ@mail.gmail.com>
References: <CAKVJ-_4RxzutuBZd_iC09=RsamAYY1z6K0Ksrsy9ukgPupvhFQ@mail.gmail.com>
	<1315493349.29125.YahooMailClassic@web161216.mail.bf1.yahoo.com>
	<CAKVJ-_6xwM_Sa4OkqzHaaGia5od_Q4ansD3swPFRBp1v1LUguw@mail.gmail.com>
	<CAKVJ-_5PAFmgG+0axCCNnqexTnU_94ay4JTWTT_gU-aEOioFjQ@mail.gmail.com>
Message-ID: <CAJ9sUYNdnM1DGVCJ1WUCTfLT=Skedg=3H5YVyiSww0CbAEnbJg@mail.gmail.com>

Hey Peter, all,

Sorry to peek in. I was going over some code lately together with Eric and
he suggested I use Bio.File as it was done in plenty of Bio.*IO modules.

What is this deprecation about then?

Cheers,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao


2011/10/17 Peter Cock <p.j.a.cock at googlemail.com>

> Hi Michiel,
>
> Regarding code using Bio.File, which you asked about
> deprecating last month:
>
> http://lists.open-bio.org/pipermail/biopython-dev/2011-September/009144.html
>
> I objected at the time because I was using it for the
> TogoWS code I was working on,
>
> On Thu, Sep 8, 2011 at 4:25 PM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
> On Wed, Sep 7, 2011 at 3:36 PM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
> >>> If the server could be relied on to always give an
> >>> HTTP error code this wouldn't be needed:
> >>>
> >>>
> https://github.com/peterjc/biopython/blob/togows/Bio/TogoWS/__init__.py
> >>>
> >
> > ...
> >
> > [Some of those TogoWS checks are probably superfluous
> > right now, I'm still polishing the error handling - some of
> > which will rely on TogoWS itself catching more conditions]
>
> I've updated my TogoWS to rely on the HTTP error codes,
> and removed the heuristic error detection which required
> Bio.File for the UndoHandle. That seems to be working fine
> now.
>
> That leaves Bio/SCOP/__init__.py as the only existing or
> imminent code using Bio.File, so if we can sort that out,
> we can deprecate Bio.File as you suggested.
>
> Regards,
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


From p.j.a.cock at googlemail.com  Mon Oct 17 11:44:35 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 17 Oct 2011 16:44:35 +0100
Subject: [Biopython-dev] Bio.File
In-Reply-To: <CAJ9sUYNdnM1DGVCJ1WUCTfLT=Skedg=3H5YVyiSww0CbAEnbJg@mail.gmail.com>
References: <CAKVJ-_4RxzutuBZd_iC09=RsamAYY1z6K0Ksrsy9ukgPupvhFQ@mail.gmail.com>
	<1315493349.29125.YahooMailClassic@web161216.mail.bf1.yahoo.com>
	<CAKVJ-_6xwM_Sa4OkqzHaaGia5od_Q4ansD3swPFRBp1v1LUguw@mail.gmail.com>
	<CAKVJ-_5PAFmgG+0axCCNnqexTnU_94ay4JTWTT_gU-aEOioFjQ@mail.gmail.com>
	<CAJ9sUYNdnM1DGVCJ1WUCTfLT=Skedg=3H5YVyiSww0CbAEnbJg@mail.gmail.com>
Message-ID: <CAKVJ-_6+3h16+orc3oN47OGoXg2RFNWspxZLpvUivRZy+OKALg@mail.gmail.com>

On Mon, Oct 17, 2011 at 4:13 PM, João Rodrigues <anaryin at gmail.com> wrote:
> Hey Peter, all,
> Sorry to peek in. I was going over some code lately together with Eric and
> he suggested I use Bio.File as it was done in plenty of Bio.*IO modules.
> What is this deprecation about then?
> Cheers,

Hi João,

Perhaps you misunderstood Eric, Bio.File is not used widely at all.
See Michiel's email at the start of this thread:
http://lists.open-bio.org/pipermail/biopython-dev/2011-September/009144.html

Peter


From anaryin at gmail.com  Mon Oct 17 12:10:40 2011
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Mon, 17 Oct 2011 18:10:40 +0200
Subject: [Biopython-dev] Bio.File
In-Reply-To: <CAKVJ-_6+3h16+orc3oN47OGoXg2RFNWspxZLpvUivRZy+OKALg@mail.gmail.com>
References: <CAKVJ-_4RxzutuBZd_iC09=RsamAYY1z6K0Ksrsy9ukgPupvhFQ@mail.gmail.com>
	<1315493349.29125.YahooMailClassic@web161216.mail.bf1.yahoo.com>
	<CAKVJ-_6xwM_Sa4OkqzHaaGia5od_Q4ansD3swPFRBp1v1LUguw@mail.gmail.com>
	<CAKVJ-_5PAFmgG+0axCCNnqexTnU_94ay4JTWTT_gU-aEOioFjQ@mail.gmail.com>
	<CAJ9sUYNdnM1DGVCJ1WUCTfLT=Skedg=3H5YVyiSww0CbAEnbJg@mail.gmail.com>
	<CAKVJ-_6+3h16+orc3oN47OGoXg2RFNWspxZLpvUivRZy+OKALg@mail.gmail.com>
Message-ID: <CAJ9sUYOPZy0Lk4PnSBkHb_maAEeBDtN+q-RUynEGxQwq4PK0eA@mail.gmail.com>

Hi Peter,

To be honest, I didn't see much of a point to use the module but for
consistency's sake.

I grep'ed Bio.File in my biopython dir and I got a few more modules with
Bio.File, don't know if you were aware.

Bio/Application/__init__.py:from Bio import File
Bio/Blast/NCBIStandalone.py:from Bio import File
Bio/PDB/parse_pdb_header.py:from Bio import File
Bio/Phylo/_io.py:from Bio import File
Bio/SCOP/__init__.py:    from Bio import File


Just wanting to clear my doubts about this, thanks!

Cheers,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao


2011/10/17 Peter Cock <p.j.a.cock at googlemail.com>

> On Mon, Oct 17, 2011 at 4:13 PM, João Rodrigues <anaryin at gmail.com> wrote:
> > Hey Peter, all,
> > Sorry to peek in. I was going over some code lately together with Eric
> and
> > he suggested I use Bio.File as it was done in plenty of Bio.*IO modules.
> > What is this deprecation about then?
> > Cheers,
>
> Hi João,
>
> Perhaps you misunderstood Eric, Bio.File is not used widely at all.
> See Michiel's email at the start of this thread:
>
> http://lists.open-bio.org/pipermail/biopython-dev/2011-September/009144.html
>
> Peter
>


From p.j.a.cock at googlemail.com  Mon Oct 17 12:26:14 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 17 Oct 2011 17:26:14 +0100
Subject: [Biopython-dev] Bio.File
In-Reply-To: <CAJ9sUYOPZy0Lk4PnSBkHb_maAEeBDtN+q-RUynEGxQwq4PK0eA@mail.gmail.com>
References: <CAKVJ-_4RxzutuBZd_iC09=RsamAYY1z6K0Ksrsy9ukgPupvhFQ@mail.gmail.com>
	<1315493349.29125.YahooMailClassic@web161216.mail.bf1.yahoo.com>
	<CAKVJ-_6xwM_Sa4OkqzHaaGia5od_Q4ansD3swPFRBp1v1LUguw@mail.gmail.com>
	<CAKVJ-_5PAFmgG+0axCCNnqexTnU_94ay4JTWTT_gU-aEOioFjQ@mail.gmail.com>
	<CAJ9sUYNdnM1DGVCJ1WUCTfLT=Skedg=3H5YVyiSww0CbAEnbJg@mail.gmail.com>
	<CAKVJ-_6+3h16+orc3oN47OGoXg2RFNWspxZLpvUivRZy+OKALg@mail.gmail.com>
	<CAJ9sUYOPZy0Lk4PnSBkHb_maAEeBDtN+q-RUynEGxQwq4PK0eA@mail.gmail.com>
Message-ID: <CAKVJ-_7Luo9wz=cGrnm9kSU_3RNPSfdw-0=quiajUMVE9qQBQw@mail.gmail.com>

On Mon, Oct 17, 2011 at 5:10 PM, João Rodrigues <anaryin at gmail.com> wrote:
> Hi Peter,
> To be honest, I didn't see much of a point to use the module but for
> consistency's sake.

Michiel's point was [at that time] there was very little useful
code if any in Bio.File, so could we deprecate it?

> I grep'ed Bio.File in my biopython dir and I got a few more modules
> with Bio.File, don't know if you were aware.
>
> Bio/Application/__init__.py:from Bio import File
> Bio/Blast/NCBIStandalone.py:from Bio import File
> Bio/PDB/parse_pdb_header.py:from Bio import File
> Bio/Phylo/_io.py:from Bio import File
> Bio/SCOP/__init__.py:    from Bio import File
>
> Just wanting to clear my doubts about this, thanks!
> Cheers,

Oh - I remember now. We recently added the as_handle
context manager to Bio.File, and that is a useful bit of
functionality of general interest.

At the time I had forgotten about Michiel's suggestion
we deprecate Bio.File, which is unfortunate, but we
can still change this before our next release.

So, should we keep Bio.File for as_handle (even if
everything else in Bio.File is to be deprecated), or
should we move the new as_handle functionality
somewhere else and deprecate all of Bio.File?
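
Keeping as_handle while deprecating the rest of the module could be
done function by function. A minimal sketch using the standard warnings
module; the deprecated decorator, legacy_helper and as_handle_stub are
hypothetical names for illustration, not existing Biopython code:

```python
import functools
import warnings


def deprecated(message):
    """Decorator factory: warn on every call to the wrapped function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 points the warning at the caller's line.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated("legacy_helper is deprecated and may be removed in a future release")
def legacy_helper(text):
    return text.upper()


def as_handle_stub(handleish):
    """Stands in for the function that stays supported (no warning)."""
    return handleish


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_helper("acgt")
    as_handle_stub("example.txt")

print([str(w.message) for w in caught])
```

Only the call to legacy_helper is recorded, so existing users of the
kept function would see no change.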

Thanks for double checking João,

Peter


From anaryin at gmail.com  Mon Oct 17 13:21:28 2011
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Mon, 17 Oct 2011 19:21:28 +0200
Subject: [Biopython-dev] Bio.File
In-Reply-To: <CAKVJ-_7Luo9wz=cGrnm9kSU_3RNPSfdw-0=quiajUMVE9qQBQw@mail.gmail.com>
References: <CAKVJ-_4RxzutuBZd_iC09=RsamAYY1z6K0Ksrsy9ukgPupvhFQ@mail.gmail.com>
	<1315493349.29125.YahooMailClassic@web161216.mail.bf1.yahoo.com>
	<CAKVJ-_6xwM_Sa4OkqzHaaGia5od_Q4ansD3swPFRBp1v1LUguw@mail.gmail.com>
	<CAKVJ-_5PAFmgG+0axCCNnqexTnU_94ay4JTWTT_gU-aEOioFjQ@mail.gmail.com>
	<CAJ9sUYNdnM1DGVCJ1WUCTfLT=Skedg=3H5YVyiSww0CbAEnbJg@mail.gmail.com>
	<CAKVJ-_6+3h16+orc3oN47OGoXg2RFNWspxZLpvUivRZy+OKALg@mail.gmail.com>
	<CAJ9sUYOPZy0Lk4PnSBkHb_maAEeBDtN+q-RUynEGxQwq4PK0eA@mail.gmail.com>
	<CAKVJ-_7Luo9wz=cGrnm9kSU_3RNPSfdw-0=quiajUMVE9qQBQw@mail.gmail.com>
Message-ID: <CAJ9sUYOBqHDXKyRLbqTARFADp2xmYrrhhx+h9tWKP+opMDgiDA@mail.gmail.com>

>
> At the time I had forgotten about Michiel's suggestion
> we deprecate Bio.File, which is unfortunate, but we
> can still change this before our next release.
>
> So, should we keep Bio.File for as_handle (even if
> everything else in Bio.File is to be deprecated), or
> should we move the new as_handle functionality
> somewhere else and deprecate all of Bio.File.
>

I think it doesn't make sense to keep the module for 5 lines of code.

    if isinstance(handleish, basestring):
        with open(handleish, mode) as fp:
            yield fp
    else:
        yield handleish

I'd either place them in __init__.py or just insert them in all Bio.*IO
modules wherever needed. If we had more snippets in common with all *IOs, it
would be valuable and understandable to have a separate module, but as is
it's a bit unnecessary IMHO.
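
For reference, the five lines above only work as the body of a
generator-based context manager. A self-contained sketch of the
complete function, adapted to Python 3 (str instead of basestring); the
temporary file and the example data are made up for the demonstration:

```python
import contextlib
import io
import os
import tempfile


@contextlib.contextmanager
def as_handle(handleish, mode="r"):
    """Yield an open handle for a filename, or pass an existing handle through."""
    if isinstance(handleish, str):
        # We opened the file, so we are responsible for closing it.
        with open(handleish, mode) as fp:
            yield fp
    else:
        # The caller owns this handle, so leave it open.
        yield handleish


# Usage with an existing handle.
with as_handle(io.StringIO("ACGT\n")) as handle:
    from_handle = handle.read()

# Usage with a filename.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with as_handle(path, "w") as handle:
        handle.write("ACGT\n")
    with as_handle(path) as handle:
        from_file = handle.read()
finally:
    os.remove(path)

print(from_handle == from_file)
```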


>
> Thanks for double checking João,
>

No problem.

Cheers,

João


From hahj87 at gmail.com  Mon Oct 17 13:57:53 2011
From: hahj87 at gmail.com (=?ISO-8859-1?Q?Joshua_Ismael_Haase_Hern=E1ndez?=)
Date: Mon, 17 Oct 2011 12:57:53 -0500
Subject: [Biopython-dev] [biopython-dev] SeqFeature comparison for
	equality
In-Reply-To: <CAKVJ-_7dRPfSbwPBW0t4rOjWhy1HzVJJG-Jw47ML1hR8o+GUHw@mail.gmail.com>
References: <CAKVJ-_5qZBLbDaFMqUQ_KegWLK2Pt_BsHU=pPmt_bGLzJGyUiQ@mail.gmail.com>
	<CA+ypG2Z=ZjZkMhuaC9A61dF=oZ2s2hhuLSCi38R2CpkWFGNogQ@mail.gmail.com>
	<CAKVJ-_7dRPfSbwPBW0t4rOjWhy1HzVJJG-Jw47ML1hR8o+GUHw@mail.gmail.com>
Message-ID: <CA+ypG2Y9Qo2iXr0HEwZion-6zffSQjB1qAkqGTgG2_JLP+G0jQ@mail.gmail.com>

On 17 October 2011 at 12:15, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Hi Joshua,
>
> Could you CC the biopython-dev mailing list, unless you
> specifically want to discuss something in private?
>

Sorry about that, I thought I was replying to the mailing list.

>
> 2011/10/17 Joshua Ismael Haase Hernández <hahj87 at gmail.com>:
> > I'm on it.
> >
> > Will add __eq__ to FeatureLocation on trunk.
>
> Great.
>
> In the short term, you can just work on it directly with a copy of the
> official repository and send me a patch (use git diff > file.patch)
>
> The "best" way is to fork biopython on github, and create your
> own branch with these changes.
>
> > I think BeforeLocation should check if the second is before,
> > After check if it is after, etc, and this can be done in locations.
> >
> > Before I implement those: do you agree?
> >
> > In that case, AbstractLocation instances
> > should check if ExactLocation instances are
> > inside their range, and AbstractLocation
> > instances to be exactly the same.
>
>
These positions would be the same:

OneOfPosition(5, 11, 15),
ExactPosition(11),
AfterPosition(4),
BeforePosition(16),
WithinPosition(5, 16),


> No. Having tried this myself, it is very complicated.
>

I think I'm missing something: why is it hard?
I see it as a listing of cases.


> Also, there are constraints with the Python language
> about equality, hashing and comparisons (e.g. for
> membership in lists, or use as dictionary keys).
>

I don't think anyone should use Features as dictionary keys,
they will use the Feature ID for that, but maybe someone wants a
set of features (which just now is like a list of all sequences)...

In which cases would that be a problem? (I'm a biotechnology
engineer, so I don't see all the caveats, and I don't really have
a deep understanding of how Python works.)

> The current behaviour of simple comparison of
> the positions as an integer is at least simple.
>
> > About SeqFeature, I think they should be
> > the same if they share all locations.
>
> You don't care about feature type and ID?  ;)
>

Maybe not. A comparison could skip iterating over
the locations if we have the same type and ID;
I'm still not sure that's a good method (thus the comment
"# Can we trust this?" in my patch). But a 'CDS' feature
is sometimes equivalent to an 'mRNA' feature,
in which case the ID and type would both be different
in the SeqFeatures.

>
> Peter
>


From p.j.a.cock at googlemail.com  Mon Oct 17 14:07:27 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 17 Oct 2011 19:07:27 +0100
Subject: [Biopython-dev] [biopython-dev] SeqFeature comparison for
	equality
In-Reply-To: <CA+ypG2Y9Qo2iXr0HEwZion-6zffSQjB1qAkqGTgG2_JLP+G0jQ@mail.gmail.com>
References: <CAKVJ-_5qZBLbDaFMqUQ_KegWLK2Pt_BsHU=pPmt_bGLzJGyUiQ@mail.gmail.com>
	<CA+ypG2Z=ZjZkMhuaC9A61dF=oZ2s2hhuLSCi38R2CpkWFGNogQ@mail.gmail.com>
	<CAKVJ-_7dRPfSbwPBW0t4rOjWhy1HzVJJG-Jw47ML1hR8o+GUHw@mail.gmail.com>
	<CA+ypG2Y9Qo2iXr0HEwZion-6zffSQjB1qAkqGTgG2_JLP+G0jQ@mail.gmail.com>
Message-ID: <CAKVJ-_4LY9NBQY1TerBprjVWFHS9FYpD50CZRG0XXOoZUNtdpQ@mail.gmail.com>

2011/10/17 Joshua Ismael Haase Hernández <hahj87 at gmail.com>:
> ...
>
> These positions would be the same:
>
> OneOfPosition(5, 11, 15),
> ExactPosition(11),
> AfterPosition(4),
> BeforePosition(16),
> WithinPosition(5, 16),

I don't understand what you are asking here. Those
positions do not look the same to me.

>>
>> No. Having tried this myself, it is very complicated.
>
> I think I'm missing something: why is it hard?
> I see it as a listing of cases.

Well, try it and write lots of unit tests, and I'll review it.

>>
>> Also, there are constraints with the Python language
>> about equality, hashing and comparisons (e.g. for
>> membership in lists, or use as dictionary keys).
>
> I don't think anyone should use Features as dictionary keys,
> they will use the Feature ID for that, but maybe someone wants a
> set of features (which just now is like a list of all sequences)...
>
> In which cases would that be a problem? (I'm a biotechnology
> engineer, so I don't see all the caveats, and I don't really have
> a deep understanding of how Python works.)

Using positions as dictionary keys seems reasonable.

Using a SeqFeature as a key is not possible as they
are mutable objects.
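
The link between equality and hashing can be illustrated with a small
sketch. It assumes Python 3 semantics, where a class that defines
__eq__ without __hash__ becomes unhashable, and it uses hypothetical
classes rather than Biopython's own:

```python
from dataclasses import dataclass


class MutableFeature:
    """Mutable, with value-based equality but no __hash__."""

    def __init__(self, type, start, end):
        self.type = type
        self.start = start
        self.end = end

    def __eq__(self, other):
        if not isinstance(other, MutableFeature):
            return NotImplemented
        return (self.type, self.start, self.end) == (
            other.type, other.start, other.end)


@dataclass(frozen=True)
class FrozenPosition:
    """Immutable position; a frozen dataclass gets __eq__ and __hash__."""
    position: int


try:
    {MutableFeature("CDS", 5, 10): "value"}
    hashable = True
except TypeError:
    # Defining __eq__ alone sets __hash__ to None on Python 3.
    hashable = False

print(hashable)
print(len({FrozenPosition(5), FrozenPosition(5)}))
```

Equal immutable positions collapse to a single set member, while the
mutable feature cannot be used as a key at all.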

>> The current behaviour of simple comparison of
>> the positions as an integer is at least simple.
>>
>> > About SeqFeature, I think they should be
>> > the same if they share all locations.
>>
>> You don't care about feature type and ID?  ;)
>
> Maybe not. A comparison could skip iterating over
> the locations if we have the same type and ID;
> I'm still not sure that's a good method (thus the comment
> "# Can we trust this?" in my patch). But a 'CDS' feature
> is sometimes equivalent to an 'mRNA' feature,
> in which case the ID and type would both be different
> in the SeqFeatures.

A gene, mRNA and CDS might all have the same
position, but they are different features.

Peter


From hahj87 at gmail.com  Mon Oct 17 14:27:19 2011
From: hahj87 at gmail.com (=?ISO-8859-1?Q?Joshua_Ismael_Haase_Hern=E1ndez?=)
Date: Mon, 17 Oct 2011 13:27:19 -0500
Subject: [Biopython-dev] [biopython-dev] SeqFeature comparison for
	equality
In-Reply-To: <CAKVJ-_4LY9NBQY1TerBprjVWFHS9FYpD50CZRG0XXOoZUNtdpQ@mail.gmail.com>
References: <CAKVJ-_5qZBLbDaFMqUQ_KegWLK2Pt_BsHU=pPmt_bGLzJGyUiQ@mail.gmail.com>
	<CA+ypG2Z=ZjZkMhuaC9A61dF=oZ2s2hhuLSCi38R2CpkWFGNogQ@mail.gmail.com>
	<CAKVJ-_7dRPfSbwPBW0t4rOjWhy1HzVJJG-Jw47ML1hR8o+GUHw@mail.gmail.com>
	<CA+ypG2Y9Qo2iXr0HEwZion-6zffSQjB1qAkqGTgG2_JLP+G0jQ@mail.gmail.com>
	<CAKVJ-_4LY9NBQY1TerBprjVWFHS9FYpD50CZRG0XXOoZUNtdpQ@mail.gmail.com>
Message-ID: <CA+ypG2a8G2+fn3HxNZ62SkKGjAJeZjEpOoVHhxufhwMYd1dQ6g@mail.gmail.com>

On 17 October 2011 at 13:07, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> 2011/10/17 Joshua Ismael Haase Hernández <hahj87 at gmail.com>:
> > ...
> >
> > These positions would be the same:
> >
> > OneOfPosition(5, 11, 15),
> > ExactPosition(11),
> > AfterPosition(4),
> > BeforePosition(16),
> > WithinPosition(5, 16),
>
> I don't understand what you are asking here. Those
> positions do not look the same to me.
>
>
They are not *exactly* the same, but besides
AfterPosition and BeforePosition,
ExactPosition(11) is included in OneOfPosition(5, 11, 15),
ExactPosition(11) is after AfterPosition(4)
ExactPosition(11) is before BeforePosition(16)
ExactPosition(11) is included in WithinPosition(5, 16)
All positions in OneOfPosition are before BeforePosition,
after AfterPosition, within WithinPosition, and includes
ExactPosition.
All positions in WithinPosition are after AfterPosition,
before BeforePosition.

BeforePosition and AfterPosition can't be equal.

How should I name the TestCases?


From p.j.a.cock at googlemail.com  Mon Oct 17 15:03:15 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 17 Oct 2011 20:03:15 +0100
Subject: [Biopython-dev] [biopython-dev] SeqFeature comparison for
	equality
In-Reply-To: <CA+ypG2a8G2+fn3HxNZ62SkKGjAJeZjEpOoVHhxufhwMYd1dQ6g@mail.gmail.com>
References: <CAKVJ-_5qZBLbDaFMqUQ_KegWLK2Pt_BsHU=pPmt_bGLzJGyUiQ@mail.gmail.com>
	<CA+ypG2Z=ZjZkMhuaC9A61dF=oZ2s2hhuLSCi38R2CpkWFGNogQ@mail.gmail.com>
	<CAKVJ-_7dRPfSbwPBW0t4rOjWhy1HzVJJG-Jw47ML1hR8o+GUHw@mail.gmail.com>
	<CA+ypG2Y9Qo2iXr0HEwZion-6zffSQjB1qAkqGTgG2_JLP+G0jQ@mail.gmail.com>
	<CAKVJ-_4LY9NBQY1TerBprjVWFHS9FYpD50CZRG0XXOoZUNtdpQ@mail.gmail.com>
	<CA+ypG2a8G2+fn3HxNZ62SkKGjAJeZjEpOoVHhxufhwMYd1dQ6g@mail.gmail.com>
Message-ID: <CAKVJ-_6DRB5aN0WzuVoNF7iufFdamkOdEhZxJot=yYWR5dMZwg@mail.gmail.com>

2011/10/17 Joshua Ismael Haase Hernández <hahj87 at gmail.com>:
>
>
> On 17 October 2011 at 13:07, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>>
>> 2011/10/17 Joshua Ismael Haase Hernández <hahj87 at gmail.com>:
>> > ...
>> >
>> > These positions would be the same:
>> >
>> > OneOfPosition(5, 11, 15),
>> > ExactPosition(11),
>> > AfterPosition(4),
>> > BeforePosition(16),
>> > WithinPosition(5, 16),
>>
>> I don't understand what you are asking here. Those
>> positions do not look the same to me.
>>
>
> They are not *exactly* the same, but besides
> AfterPosition and BeforePosition,
> ExactPosition(11) is included in OneOfPosition(5, 11, 15),
> ExactPosition(11) is after AfterPosition(4)
> ExactPosition(11) is before BeforePosition(16)
> ExactPosition(11) is included in WithinPosition(5, 16)
> All positions in OneOfPosition are before BeforePosition,
> after AfterPosition, within WithinPosition, and includes
> ExactPosition.
> All positions in WithinPosition are after AfterPosition,
> before BeforePosition.
> BeforePosition and AfterPosition can't be equal.
>

It might help if you wrote these out explicitly,
e.g. currently:

    >>> from Bio.SeqFeature import *
    >>> a = BeforePosition(10)
    >>> b = AfterPosition(10)
    >>> a == b == 10
    True

Currently BeforePosition and AfterPosition act like
the integer position for comparison etc. I find this
reasonable given we have to treat them as the
integer for things like extracting the sequence.
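
One way to get this behaviour is to derive the fuzzy positions from
int, so that comparison and slicing treat them as plain integers. A
sketch with hypothetical Before and After classes (stand-ins, not the
Bio.SeqFeature implementations), and a made-up sequence:

```python
class Before(int):
    """Position known only to lie before the given integer."""

    def __repr__(self):
        return "Before(%d)" % int(self)

    def __str__(self):
        return "<%d" % int(self)


class After(int):
    """Position known only to lie after the given integer."""

    def __repr__(self):
        return "After(%d)" % int(self)

    def __str__(self):
        return ">%d" % int(self)


a = Before(10)
b = After(10)
# Equality is inherited from int, so the fuzziness is ignored.
print(a == b == 10)

# Slicing also works because both classes are integers.
seq = "ACGTACGTACGTACGT"
print(seq[Before(2):After(6)])
```

The fuzziness survives only in the string form, which is enough for
writing the location back out.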

> How should I name the TestCases?
>

Something like test_SeqFeature.py and using
unittest. Most existing tests in this area are in
doctests and test_SeqIO_feature.py

Peter


From andrea at biocomp.unibo.it  Tue Oct 18 08:59:05 2011
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Tue, 18 Oct 2011 14:59:05 +0200 (CEST)
Subject: [Biopython-dev] SeqFeature comparison for equality
In-Reply-To: <mailman.4400.1318878198.3053.biopython-dev@lists.open-bio.org>
References: <mailman.4400.1318878198.3053.biopython-dev@lists.open-bio.org>
Message-ID: <bd8739699efa70a9c038b92ac5a9c3c0.squirrel@lipid.biocomp.unibo.it>

Hi,
I don't know if this can help,
but I've been subclassing SeqFeature and SeqRecord objects to assert
equalities.
I've attached the very simple code for the SeqFeature equality.
Handling complex location equalities with a given set of rules could be
misleading.
A feature starting at position 11 is different, for me, from one located
at position 12.

Andrea


-------------- next part --------------
A non-text attachment was scrubbed...
Name: seqfeature_eq.py
Type: text/x-python-script
Size: 1505 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20111018/d7133fd9/attachment.bin>

From p.j.a.cock at googlemail.com  Tue Oct 18 09:20:34 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 18 Oct 2011 14:20:34 +0100
Subject: [Biopython-dev] SeqFeature comparison for equality
In-Reply-To: <bd8739699efa70a9c038b92ac5a9c3c0.squirrel@lipid.biocomp.unibo.it>
References: <mailman.4400.1318878198.3053.biopython-dev@lists.open-bio.org>
	<bd8739699efa70a9c038b92ac5a9c3c0.squirrel@lipid.biocomp.unibo.it>
Message-ID: <CAKVJ-_6o6b1L4fYB25JV-CJWXO+MiMJn-FoNGwjLJ5tP4SfAzw@mail.gmail.com>

On Tue, Oct 18, 2011 at 1:59 PM, Andrea Pierleoni
<andrea at biocomp.unibo.it> wrote:
> Hi,
> I don't know if this can help,
> but I've been subclassing SeqFeature and SeqRecord objects to assert
> equalities.
> I've attached the very simple code for the SeqFeature equality.
> Handling complex location equalities with a given set of rules could be
> misleading.
> A feature starting at position 11 is different, for me, from one located
> at position 12.
>
> Andrea

That looks reasonable for basic SeqFeature comparison, although
comparing the annotations in the qualifiers dict is debatable
(as with SeqRecord object's annotation).

Given the way join locations (etc) are currently handled, it
would be important to also compare the sub-features.

I think it would be more practical to first (and perhaps only)
implement equality testing for FeatureLocation (checking
start, end, strand, ref and ref_db), then you can compare
the location of a SeqFeature easily with:

f1.location == f2.location.
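
As a rough sketch of what such an equality test could look like:
SimpleLocation below is a hypothetical stand-in class, and the
attribute names (start, end, strand, ref, ref_db) are assumptions for
illustration rather than the actual FeatureLocation code.

```python
class SimpleLocation(object):
    """Hypothetical stand-in for a feature location."""

    def __init__(self, start, end, strand=None, ref=None, ref_db=None):
        self.start = start
        self.end = end
        self.strand = strand
        self.ref = ref
        self.ref_db = ref_db

    def _key(self):
        # Every attribute that takes part in the comparison, in one place.
        return (self.start, self.end, self.strand, self.ref, self.ref_db)

    def __eq__(self, other):
        if not isinstance(other, SimpleLocation):
            return NotImplemented
        return self._key() == other._key()

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # The object is mutable and compares by value, so it is deliberately
    # left unhashable (no use as a dictionary key or set member).
    __hash__ = None


loc1 = SimpleLocation(5, 16, strand=+1)
loc2 = SimpleLocation(5, 16, strand=+1)
loc3 = SimpleLocation(5, 16, strand=-1)
```

With this, loc1 == loc2 holds while loc1 == loc3 does not, and two
features can be compared through their locations without deciding
anything about type, ID or qualifiers.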

Peter

From carlcrott at gmail.com  Tue Oct 18 12:18:39 2011
From: carlcrott at gmail.com (carl crott)
Date: Tue, 18 Oct 2011 12:18:39 -0400
Subject: [Biopython-dev] fixes on the tutorials
In-Reply-To: <CAKVJ-_5y8Fn6xFJn0uU_X=9Vn+oCA6t3TFNs+GDNHVXwd-jkZQ@mail.gmail.com>
References: <CAE2MQmAgUJHy8-fkoYZQ=Q=ryNa5Gt2ie-qPw4BudpxLBDJRaQ@mail.gmail.com>
	<CAKVJ-_4w+R-=PeUdCSXOkG7G4XxPNyP2G+rnW8bcgsumKW9PXA@mail.gmail.com>
	<CAE2MQmB3OuuVHkFcoQmujccK0aKqBfFy308BkjJumsP0vVCrzg@mail.gmail.com>
	<CAKVJ-_6yUFwQ0U6XYEDvGS5OxXN7TQqD1Vwd3qB7QTWTiYoKxw@mail.gmail.com>
	<CAKVJ-_5y8Fn6xFJn0uU_X=9Vn+oCA6t3TFNs+GDNHVXwd-jkZQ@mail.gmail.com>
Message-ID: <CAE2MQmDTSCnoSEKGNLBb79fL87U+rZT0rOi3c_q9BLEJqmYHNg@mail.gmail.com>

Peter and other devs,

I'm deeply interested in any kind of HMM applications ... As I'm not quite a
biologist if you guys wanted to 'sic me' on any particular bug related to
these let me know .. however as far as the GIT stuff .. that would be more
of the control for updates and merging all the code that you guys work on
separately.

toodles!

-Carl


On Tue, Oct 18, 2011 at 5:36 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Mon, Oct 17, 2011 at 2:34 PM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
> > ...
> >
> > P.S. Don't forget to CC the mailing list ;)
>
> Apologies for posting that to the wrong development mailing list
> (samtools rather than biopython), I need to be more careful with
> autocomplete.
>
> Peter
>


-- 
Carl Crott
Web Applications Engineer
www.black-glass.com
412-610-0600

From mjldehoon at yahoo.com  Tue Oct 18 22:39:53 2011
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 18 Oct 2011 19:39:53 -0700 (PDT)
Subject: [Biopython-dev] Bio.File
In-Reply-To: <CAKVJ-_5PAFmgG+0axCCNnqexTnU_94ay4JTWTT_gU-aEOioFjQ@mail.gmail.com>
Message-ID: <1318991993.91732.YahooMailClassic@web161201.mail.bf1.yahoo.com>

Hi Peter,

> That leaves Bio/SCOP/__init__.py as the only existing or
> imminent code using Bio.File, so if we can sort that out,
> we can deprecate Bio.File as you suggested.

In Bio/SCOP/__init__.py, Bio.File.UndoHandle is used in the _open function, which is an internal function used in the "search" function in Bio.SCOP. The UndoHandle is used to wrap a handle returned by urllib.urlopen.

This search function returns a handle to data in HTML format. I don't think we have a parser for it. This suggests that there is no specific purpose for UndoHandle in Bio.SCOP._open.

So I would suggest to just remove the UndoHandle from Bio.SCOP._open and return the urllib.urlopen handle directly.
Any objections?

--Michiel.

--- On Mon, 10/17/11, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> From: Peter Cock <p.j.a.cock at googlemail.com>
> Subject: Re: [Biopython-dev] Bio.File
> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> Cc: biopython-dev at biopython.org
> Date: Monday, October 17, 2011, 11:03 AM
> Hi Michiel,
> 
> Regarding code using Bio.File, which you asked about
> deprecating last month:
> http://lists.open-bio.org/pipermail/biopython-dev/2011-September/009144.html
> 
> I objected at the time because I was using it for the
> TogoWS code I was working on,
> 
> On Thu, Sep 8, 2011 at 4:25 PM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
> On Wed, Sep 7, 2011 at 3:36 PM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
> >>> If the server could be relied on to always
> give an
> >>> HTTP error code this wouldn't be needed:
> >>>
> >>> https://github.com/peterjc/biopython/blob/togows/Bio/TogoWS/__init__.py
> >>>
> >
> > ...
> >
> > [Some of those TogoWS checks are probably superfluous
> > right now, I'm still polishing the error handling -
> some of
> > which will rely on TogoWS itself catching more
> conditions]
> 
> I've updated my TogoWS to rely on the HTTP error codes,
> and removed the heuristic error detection which required
> Bio.File for the UndoHandle. That seems to be working fine
> now.
> 
> That leaves Bio/SCOP/__init__.py as the only existing or
> imminent code using Bio.File, so if we can sort that out,
> we can deprecate Bio.File as you suggested.
> 
> Regards,
> 
> Peter
> 

From mjldehoon at yahoo.com  Tue Oct 18 22:46:33 2011
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 18 Oct 2011 19:46:33 -0700 (PDT)
Subject: [Biopython-dev] Bio.File
In-Reply-To: <CAJ9sUYOBqHDXKyRLbqTARFADp2xmYrrhhx+h9tWKP+opMDgiDA@mail.gmail.com>
Message-ID: <1318992393.44240.YahooMailClassic@web161204.mail.bf1.yahoo.com>

I agree that it doesn't make sense to have a separate module for this. Even if we put it in Bio/__init__.py, people are likely to forget about it, and we will end up with some modules that use this code in Bio/__init__.py and other modules that copy this code in their source code. As this code is very short, I would just copy it into the modules that use it.

Best,
--Michiel.


--- On Mon, 10/17/11, João Rodrigues <anaryin at gmail.com> wrote:
I think it doesn't make sense to keep the module for 5 lines of code.

    if isinstance(handleish, basestring):
        with open(handleish, mode) as fp:
            yield fp
    else:
        yield handleish


I'd either place them in __init__.py or just insert them in all Bio.*IO modules wherever needed. If we had more snippets in common with all *IOs, it would be valuable and understandable to have a separate module, but as is it's a bit unnecessary IMHO.
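
For reference, the snippet above could be completed into a runnable
context manager roughly as follows. This is a sketch under two
assumptions: the function name as_handle is chosen here for
illustration, and str replaces basestring so that it runs on Python 3.

```python
import contextlib
import io


@contextlib.contextmanager
def as_handle(handleish, mode="r"):
    """Yield an open handle for either a filename or an existing handle.

    A filename is opened here and closed again on exit; an existing
    handle is passed through and left open for the caller to manage.
    """
    if isinstance(handleish, str):
        with open(handleish, mode) as fp:
            yield fp
    else:
        yield handleish


# An already-open handle is passed through untouched.
buffer = io.StringIO(">seq1\nACGT\n")
with as_handle(buffer) as handle:
    first_line = handle.readline()
```

A parser written against this helper can accept "either a handle or a
filename" with a single with-statement, which is the uniform behaviour
being discussed in this thread.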


From p.j.a.cock at googlemail.com  Wed Oct 19 04:49:27 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 19 Oct 2011 09:49:27 +0100
Subject: [Biopython-dev] Bio.File
In-Reply-To: <1318991993.91732.YahooMailClassic@web161201.mail.bf1.yahoo.com>
References: <CAKVJ-_5PAFmgG+0axCCNnqexTnU_94ay4JTWTT_gU-aEOioFjQ@mail.gmail.com>
	<1318991993.91732.YahooMailClassic@web161201.mail.bf1.yahoo.com>
Message-ID: <CAKVJ-_7ufiA5ibrqCP1WqO+3Tt_b1JeajXJCcx9V0g+xRepLFA@mail.gmail.com>

On Wed, Oct 19, 2011 at 3:39 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi Peter,
>
>> That leaves Bio/SCOP/__init__.py as the only existing or
>> imminent code using Bio.File, so if we can sort that out,
>> we can deprecate Bio.File as you suggested.
>
> In Bio/SCOP/__init__.py, Bio.File.UndoHandle is used in the _open
> function, which is an internal function used in the "search" function
> in Bio.SCOP. The UndoHandle is used to wrap a handle returned
> by urllib.urlopen.

Should we change that to use urllib2 for better error handling,
as in Bio.Entrez's _open?

> This search function returns a handle to data in HTML format.
> I don't think we have a parser for it. This suggests that there is
> no specific purpose for UndoHandle in Bio.SCOP._open.

I wonder if that is a sign of URL rot, it would make more sense
to get plain text back. Sadly there were no unit tests for this at
all until now, and I don't yet do anything with the handle other
than confirm we get one!

https://github.com/biopython/biopython/commit/10b94a7b5611edde5fe05f95406d927e5a6a02d9

> So I would suggest to just remove the UndoHandle from
> Bio.SCOP._open and return the urllib.urlopen handle directly.
>
> Any objections?

Sounds fine.

Peter

From p.j.a.cock at googlemail.com  Wed Oct 19 04:53:25 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 19 Oct 2011 09:53:25 +0100
Subject: [Biopython-dev] Bio.File
In-Reply-To: <1318992393.44240.YahooMailClassic@web161204.mail.bf1.yahoo.com>
References: <CAJ9sUYOBqHDXKyRLbqTARFADp2xmYrrhhx+h9tWKP+opMDgiDA@mail.gmail.com>
	<1318992393.44240.YahooMailClassic@web161204.mail.bf1.yahoo.com>
Message-ID: <CAKVJ-_58X67yqFgy_PhcHgxwzvi4Nm_YjmgJN=C9LOnZOf9VTA@mail.gmail.com>

2011/10/19 Michiel de Hoon <mjldehoon at yahoo.com>
>
> I agree that it doesn't make sense to have a separate module for this.

For just the one little function, maybe not. I suspect we may want
more "File related" things like this for Python 3, what with text vs
binary handles and so on, in which case keeping Bio/File.py is
sensible.

> Even if we put it in Bio/__init__.py, people are likely to forget about
> it, and we will end up with some modules that use this code in
> Bio/__init__.py and other modules that copy this code in their
> source code. As this code is very short, I would just copy it into
> the modules that use it.

It may be short, but duplicating this function all over the place
seems like a very bad idea. I think we should just be vigilant in
making sure it is used uniformly wherever we want to accept
either a handle or a filename. Perhaps some of the historically
handle-only parsers should start using it now?

Peter

From anaryin at gmail.com  Wed Oct 19 07:46:26 2011
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Wed, 19 Oct 2011 13:46:26 +0200
Subject: [Biopython-dev] Bio.File
In-Reply-To: <CAKVJ-_58X67yqFgy_PhcHgxwzvi4Nm_YjmgJN=C9LOnZOf9VTA@mail.gmail.com>
References: <CAJ9sUYOBqHDXKyRLbqTARFADp2xmYrrhhx+h9tWKP+opMDgiDA@mail.gmail.com>
	<1318992393.44240.YahooMailClassic@web161204.mail.bf1.yahoo.com>
	<CAKVJ-_58X67yqFgy_PhcHgxwzvi4Nm_YjmgJN=C9LOnZOf9VTA@mail.gmail.com>
Message-ID: <CAJ9sUYNUuUwTWNrHQogii9pwqjy3oMoAHuxXVE4ewhFFePEAEA@mail.gmail.com>

Hey Peter,


> For just the one little function, maybe not. I suspect we may want
> more "File related" things like this for Python 3, what with text vs
> binary handles and so on, in which case keeping Bio/File.py is
> sensible.
>

What kind of "things" are we talking about here? Could they be anticipated?


>
> It may be short, but duplicating this function all over the place
> seems like a very bad idea. I think we should just be vigilant in
> making sure it is used uniformly wherever we want to accept
> either a handle or a filename. Perhaps some of the historically
> handle-only parsers should start using it now?
>

Duplicating is not a beautiful solution, I must agree, but keeping a module
and adding an import statement in every parser for only 5 lines isn't
either.

I suggest we keep Bio.File, deprecating all the other functions, and
meanwhile look at which changes we could include due to Py3.

From p.j.a.cock at googlemail.com  Wed Oct 19 08:28:03 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 19 Oct 2011 13:28:03 +0100
Subject: [Biopython-dev] Bio.File
In-Reply-To: <CAJ9sUYNUuUwTWNrHQogii9pwqjy3oMoAHuxXVE4ewhFFePEAEA@mail.gmail.com>
References: <CAJ9sUYOBqHDXKyRLbqTARFADp2xmYrrhhx+h9tWKP+opMDgiDA@mail.gmail.com>
	<1318992393.44240.YahooMailClassic@web161204.mail.bf1.yahoo.com>
	<CAKVJ-_58X67yqFgy_PhcHgxwzvi4Nm_YjmgJN=C9LOnZOf9VTA@mail.gmail.com>
	<CAJ9sUYNUuUwTWNrHQogii9pwqjy3oMoAHuxXVE4ewhFFePEAEA@mail.gmail.com>
Message-ID: <CAKVJ-_4iSZAFH8=W71R8obECEsLpNFOm9KcCnwMH5e2q5jedcA@mail.gmail.com>

On Wed, Oct 19, 2011 at 12:46 PM, João Rodrigues <anaryin at gmail.com> wrote:
> Hey Peter,
>
>>
>> For just the one little function, maybe not. I suspect we may want
>> more "File related" things like this for Python 3, what with text vs
>> binary handles and so on, in which case keeping Bio/File.py is
>> sensible.
>
> What kind of "things" are we talking about here? Could they be
> anticipated?
>

For instance, in Python 3 it might be useful when parsing text
files efficiently to use binary mode (i.e. byte strings not unicode)
but also have universal newlines (which I think happens for
you automatically in Python 3 for text mode, i.e. unicode).

Surprisingly open(filename, "rbU") is accepted in Python 3,
but it acts like "rb", typical binary read mode.
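
One way this could be handled is a small helper that reads a binary
handle in chunks and normalises the line endings itself. The sketch
below is only an illustration: iter_universal_lines and its chunk_size
parameter are hypothetical names, not existing Biopython code.

```python
import io


def iter_universal_lines(handle, chunk_size=65536):
    """Yield byte-string lines from a binary handle, each ending in b"\\n".

    Handles "\\n", "\\r\\n" and "\\r" line endings. A final line without
    a terminator is still yielded, with b"\\n" appended.
    """
    pending = b""
    while True:
        chunk = handle.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        lines = pending.splitlines(True)  # keep the line endings
        # Hold back the last piece: it may be an incomplete line, or end
        # in "\r" whose matching "\n" only arrives with the next chunk.
        pending = lines.pop() if lines else b""
        for line in lines:
            yield line.rstrip(b"\r\n") + b"\n"
    if pending:
        yield pending.rstrip(b"\r\n") + b"\n"


# Mixed line endings, read two bytes at a time to exercise the hold-back.
mixed = io.BytesIO(b"a\r\nb\rc\n")
normalised = list(iter_universal_lines(mixed, chunk_size=2))
```

The parser then works on byte strings throughout and never pays for
decoding to unicode.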

>> It may be short, but duplicating this function all over the place
>> seems like a very bad idea. I think we should just be vigilant in
>> making sure it is used uniformly wherever we want to accept
>> either a handle or a filename. Perhaps some of the historically
>> handle-only parsers should start using it now?
>
> Duplicating is not a beautiful solution, I must agree, but keeping
> a module and adding an import statement in every parser for
> only 5 lines isn't either.
> I suggest we keep Bio.File, deprecating all the other functions, and
> meanwhile look at which changes we could include due to Py3.

Yes, that's what I am suggesting.

Peter


From mjldehoon at yahoo.com  Sat Oct 22 08:17:58 2011
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 22 Oct 2011 05:17:58 -0700 (PDT)
Subject: [Biopython-dev] Bio.File
In-Reply-To: <CAKVJ-_7ufiA5ibrqCP1WqO+3Tt_b1JeajXJCcx9V0g+xRepLFA@mail.gmail.com>
Message-ID: <1319285878.88223.YahooMailClassic@web161206.mail.bf1.yahoo.com>

OK, done.

Best,
--Michiel

--- On Wed, 10/19/11, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> From: Peter Cock <p.j.a.cock at googlemail.com>
> Subject: Re: [Biopython-dev] Bio.File
> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> Cc: biopython-dev at biopython.org
> Date: Wednesday, October 19, 2011, 4:49 AM
> On Wed, Oct 19, 2011 at 3:39 AM,
> Michiel de Hoon <mjldehoon at yahoo.com>
> wrote:
> > Hi Peter,
> >
> >> That leaves Bio/SCOP/__init__.py as the only
> existing or
> >> imminent code using Bio.File, so if we can sort
> that out,
> >> we can deprecate Bio.File as you suggested.
> >
> > In Bio/SCOP/__init__.py, Bio.File.UndoHandle is used
> in the _open
> > function, which is an internal function used in the
> "search" function
> > in Bio.SCOP. The UndoHandle is used to wrap a handle
> returned
> > by urllib.urlopen.
> 
> Should we change that to use urllib2 for better error
> handling,
> as in Bio.Entrez's _open?
> 
> > This search function returns a handle to data in HTML
> format.
> > I don't think we have a parser for it. This suggests
> that there is
> > no specific purpose for UndoHandle in Bio.SCOP._open.
> 
> I wonder if that is a sign of URL rot, it would make more
> sense
> to get plain text back. Sadly there were no unit tests for
> this at
> all until now, and I don't yet do anything with the handle
> other
> than confirm we get one!
> 
> https://github.com/biopython/biopython/commit/10b94a7b5611edde5fe05f95406d927e5a6a02d9
> 
> > So I would suggest to just remove the UndoHandle from
> > Bio.SCOP._open and return the urllib.urlopen handle
> directly.
> >
> > Any objections?
> 
> Sounds fine.
> 
> Peter
> 

From p.j.a.cock at googlemail.com  Wed Oct 26 07:11:57 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 26 Oct 2011 12:11:57 +0100
Subject: [Biopython-dev] [Biopython] Pairwise alignment - is it a
	generic function?
In-Reply-To: <CAJ9sUYMfQOPbJJDb-mC3=g_WmsoS9z5JWFWtFMHsidyfVexNtw@mail.gmail.com>
References: <CAJ9sUYNTF5JJx3MPNMinqyD-zqWpHruxdH_9a+GDymkUSL0C+A@mail.gmail.com>
	<CAKVJ-_7KGjsF_MaQ-ngVSMN43T2_R2kkyYh6Cmh9a3hkk8NhuQ@mail.gmail.com>
	<CAJ9sUYMfQOPbJJDb-mC3=g_WmsoS9z5JWFWtFMHsidyfVexNtw@mail.gmail.com>
Message-ID: <CAKVJ-_5Ak8prNGztJDUu-14USQ7qAYCQ+XJ6Oo0_FaBKbX3hTA@mail.gmail.com>

On Wed, Oct 26, 2011 at 12:02 PM, João Rodrigues <anaryin at gmail.com> wrote:
> Hey Peter,
> Thanks for the answer. How do I pass the matrix and which format should it
> be in? Is there an example I could read?
> João [...] Rodrigues
> http://nmr.chem.uu.nl/~joao

Not that I know of, but adding one to the docstrings and test_pairwise2.py
would be great. I think you use it with a score matrix as a dictionary from
Bio.SubsMat.MatrixInfo
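
To illustrate the general shape of such a dictionary: the keys are
pairs of residues and the values are scores. The sketch below uses a
made-up toy matrix (the values are hypothetical, not taken from any
published matrix) and a hypothetical helper, pair_score, that tries
both key orders in case each unordered pair is stored only once.

```python
# Toy scores for illustration only; not a real substitution matrix.
toy_matrix = {
    ("A", "A"): 2,
    ("C", "C"): 3,
    ("G", "G"): 2,
    ("A", "C"): -1,
    ("A", "G"): 0,
    ("C", "G"): -2,
}


def pair_score(matrix, a, b):
    """Look up the score for residues a and b in either key order."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]


def ungapped_score(matrix, seq_a, seq_b):
    """Sum the pair scores of two equal-length, already aligned strings."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Sequences must have the same length")
    return sum(pair_score(matrix, a, b) for a, b in zip(seq_a, seq_b))


total = ungapped_score(toy_matrix, "ACG", "CAG")
```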

Peter


From eric.talevich at gmail.com  Wed Oct 26 09:27:17 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Wed, 26 Oct 2011 09:27:17 -0400
Subject: [Biopython-dev] [Biopython] Pairwise alignment - is it a
	generic function?
In-Reply-To: <CAKVJ-_5Ak8prNGztJDUu-14USQ7qAYCQ+XJ6Oo0_FaBKbX3hTA@mail.gmail.com>
References: <CAJ9sUYNTF5JJx3MPNMinqyD-zqWpHruxdH_9a+GDymkUSL0C+A@mail.gmail.com>
	<CAKVJ-_7KGjsF_MaQ-ngVSMN43T2_R2kkyYh6Cmh9a3hkk8NhuQ@mail.gmail.com>
	<CAJ9sUYMfQOPbJJDb-mC3=g_WmsoS9z5JWFWtFMHsidyfVexNtw@mail.gmail.com>
	<CAKVJ-_5Ak8prNGztJDUu-14USQ7qAYCQ+XJ6Oo0_FaBKbX3hTA@mail.gmail.com>
Message-ID: <CAMC681nYk8NNTw19F7qGOiEHPL3CSorcG+-Ugi7tL3WOZGup2Q@mail.gmail.com>

On Wed, Oct 26, 2011 at 7:11 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Wed, Oct 26, 2011 at 12:02 PM, João Rodrigues <anaryin at gmail.com>
> wrote:
> > Hey Peter,
> > Thanks for the answer. How do I pass the matrix and which format should
> > it be in? Is there an example I could read?
> > João [...] Rodrigues
> > http://nmr.chem.uu.nl/~joao
>
> Not that I know of, but adding one to the docstrings and test_pairwise2.py
> would be great. I think you use it with a score matrix as a dictionary from
> Bio.SubsMat.MatrixInfo
>
> Peter
>
>
Here's an example:

from Bio import pairwise2, SeqIO
from Bio.SubsMat.MatrixInfo import blosum62

# pairwise2 works with raw strings, not SeqRecords,
# so take the .seq of each record and convert it to a string
seq1 = str(SeqIO.read("seq1.fa", "fasta").seq)
seq2 = str(SeqIO.read("seq2.fa", "fasta").seq)

# Gap open penalty -10, gap extend penalty -0.5
results = pairwise2.align.globalds(seq1, seq2, blosum62, -10, -0.5)
# Returns a list of tuples: (seqA, seqB, score, begin, end)
score = results[0][2]


From anaryin at gmail.com  Wed Oct 26 09:31:29 2011
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Wed, 26 Oct 2011 15:31:29 +0200
Subject: [Biopython-dev] [Biopython] Pairwise alignment - is it a
	generic function?
In-Reply-To: <CAMC681nYk8NNTw19F7qGOiEHPL3CSorcG+-Ugi7tL3WOZGup2Q@mail.gmail.com>
References: <CAJ9sUYNTF5JJx3MPNMinqyD-zqWpHruxdH_9a+GDymkUSL0C+A@mail.gmail.com>
	<CAKVJ-_7KGjsF_MaQ-ngVSMN43T2_R2kkyYh6Cmh9a3hkk8NhuQ@mail.gmail.com>
	<CAJ9sUYMfQOPbJJDb-mC3=g_WmsoS9z5JWFWtFMHsidyfVexNtw@mail.gmail.com>
	<CAKVJ-_5Ak8prNGztJDUu-14USQ7qAYCQ+XJ6Oo0_FaBKbX3hTA@mail.gmail.com>
	<CAMC681nYk8NNTw19F7qGOiEHPL3CSorcG+-Ugi7tL3WOZGup2Q@mail.gmail.com>
Message-ID: <CAJ9sUYMM=TGyMCyPeCt1A1_0DSbtQv5heobkM1L46G5etNSetQ@mail.gmail.com>

Hello all,

Coming back after lunch...

I managed to load a matrix using this:

from Bio import pairwise2
from Bio.SubsMat import MatrixInfo as m
#print dir(m)
matrix = m.blosum60
pairwise2.align.localdx(seqA, seqB, matrix)


Thanks a lot for the help, it was simple after all, just a bit hard to start
with..

From redmine at redmine.open-bio.org  Thu Oct 27 00:55:53 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 27 Oct 2011 04:55:53 +0000
Subject: [Biopython-dev] [Biopython - Bug #3308] (New) SeqIO FastaIO: Blank
	Descriptor causes Index Out of Range
Message-ID: <redmine.issue-3308.20111027045553@redmine.open-bio.org>


Issue #3308 has been reported by Darren Cullerne.

----------------------------------------
Bug #3308: SeqIO FastaIO: Blank Descriptor causes Index Out of Range
https://redmine.open-bio.org/issues/3308

Author: Darren Cullerne
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


Entering a FASTA sequence with a blank descriptor:

">"
"ACTAGTACTAGATCAGACTACAGTACAGAGAGGACATCTATACTACGAGAGACATACTACTCAGCATACGATAC"

Causes the following error:
  File "C:\Python27\lib\site-packages\Bio\SeqIO\__init__.py", line 532, in parse
    for r in i:
  File "C:\Python27\lib\site-packages\Bio\SeqIO\FastaIO.py", line 49, in FastaIterator
    id   = descr.split()[0]
IndexError: list index out of range
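
The failure comes from indexing the result of split() on an empty
string, which is an empty list. A defensive version of that step could
look like the sketch below; split_title is a hypothetical helper
written for illustration and is not the fix adopted in FastaIO.py.

```python
def split_title(line):
    """Split a FASTA '>' line into (id, name, description).

    A blank title gives empty strings instead of raising IndexError.
    """
    if not line.startswith(">"):
        raise ValueError("Expected a FASTA title line starting with '>'")
    descr = line[1:].rstrip()
    words = descr.split()
    # "".split() returns [], so guard before taking the first word.
    first_word = words[0] if words else ""
    return first_word, first_word, descr


blank = split_title(">")
normal = split_title(">gi|123 some text\n")
```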

Please let me know if there is any further information you require.

Thanks,


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Thu Oct 27 10:03:42 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 27 Oct 2011 14:03:42 +0000
Subject: [Biopython-dev] [Biopython - Bug #3309] (New) GenBank Scanner
	expects sequence lines to start at position 9
Message-ID: <redmine.issue-3309.20111027140342@redmine.open-bio.org>


Issue #3309 has been reported by Liam Childs.

----------------------------------------
Bug #3309: GenBank Scanner expects sequence lines to start at position 9
https://redmine.open-bio.org/issues/3309

Author: Liam Childs
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 1.57
URL: 


Some programs (e.g. Vector NTI and Lasergene) produce GenBank files where the sequences start at an index on the line other than index 9. I don't know how tightly defined the GenBank file format is, but if the indent for the start of the sequence can be variable, it seems to me there is a simple fix.

Current version (Bio/GenBank/Scanner.py:904):
line = self.line
... 15 lines
if len(line) > 9 and line[9:10]!=' ':
    raise ValueError("Sequence line mal-formed, '%s'"% line)
seq_lines.append(line[10:]) #remove spaces later

Simple fix 1 (variable per file):
line = self.line
idx = line.find('1') + 1
... 15 lines
if len(line) > idx and line[idx:idx + 1]!=' ':
    raise ValueError("Sequence line mal-formed, '%s'"% line)
seq_lines.append(line[idx + 1:]) #remove spaces later

The index can be obtained in any number of ways, this was the simplest I could think of off the top of my head. If sequences are allowed to start at a position other than '1', then maybe a regular expression should be used instead.
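
A rough sketch of the regular expression idea, assuming each sequence line is an optional run of spaces, a base count, and then blocks of residues. The function name parse_sequence_line and the pattern are illustrative only and are not taken from Bio/GenBank/Scanner.py.

```python
import re

# Optional indent, the running base count, at least one space, then residues.
_SEQ_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")


def parse_sequence_line(line):
    """Return (offset, residues) for one GenBank sequence line."""
    match = _SEQ_LINE.match(line)
    if match is None:
        raise ValueError("Sequence line mal-formed, '%s'" % line)
    offset = int(match.group(1))
    # Remove the spaces separating the blocks of ten residues.
    residues = match.group(2).replace(" ", "")
    return offset, residues
```

This accepts the count at any column, so it would not matter whether the file came from NCBI or from a tool that uses a different indent.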




From p.j.a.cock at googlemail.com  Thu Oct 27 10:46:08 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 27 Oct 2011 15:46:08 +0100
Subject: [Biopython-dev] [Biopython] Pairwise alignment - is it a
	generic function?
In-Reply-To: <CAJ9sUYMM=TGyMCyPeCt1A1_0DSbtQv5heobkM1L46G5etNSetQ@mail.gmail.com>
References: <CAJ9sUYNTF5JJx3MPNMinqyD-zqWpHruxdH_9a+GDymkUSL0C+A@mail.gmail.com>
	<CAKVJ-_7KGjsF_MaQ-ngVSMN43T2_R2kkyYh6Cmh9a3hkk8NhuQ@mail.gmail.com>
	<CAJ9sUYMfQOPbJJDb-mC3=g_WmsoS9z5JWFWtFMHsidyfVexNtw@mail.gmail.com>
	<CAKVJ-_5Ak8prNGztJDUu-14USQ7qAYCQ+XJ6Oo0_FaBKbX3hTA@mail.gmail.com>
	<CAMC681nYk8NNTw19F7qGOiEHPL3CSorcG+-Ugi7tL3WOZGup2Q@mail.gmail.com>
	<CAJ9sUYMM=TGyMCyPeCt1A1_0DSbtQv5heobkM1L46G5etNSetQ@mail.gmail.com>
Message-ID: <CAKVJ-_4yt6Hi5W_fvmj4yFT2Z=a26kQ6FLkOg44bgnYJ7NE3mw@mail.gmail.com>

On Wed, Oct 26, 2011 at 2:31 PM, João Rodrigues <anaryin at gmail.com> wrote:
> Hello all,
> Coming back after lunch...
> I managed to load a matrix using this:
>
> from Bio import pairwise2
> from Bio.SubsMat import MatrixInfo as m
> #print dir(m)
> matrix = m.blosum60
> pairwise2.align.localdx(seqA, seqB, matrix)
>
> Thanks a lot for the help, it was simple after all, just a bit hard to start
> with..

Hi João,

Could you write a little documentation for the pairwise2 docstring?
Just something short based on the above example would be great
(ideally as a doctest).

Thanks,

Peter


From anaryin at gmail.com  Thu Oct 27 10:52:25 2011
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Thu, 27 Oct 2011 16:52:25 +0200
Subject: [Biopython-dev] [Biopython] Pairwise alignment - is it a
	generic function?
In-Reply-To: <CAKVJ-_4yt6Hi5W_fvmj4yFT2Z=a26kQ6FLkOg44bgnYJ7NE3mw@mail.gmail.com>
References: <CAJ9sUYNTF5JJx3MPNMinqyD-zqWpHruxdH_9a+GDymkUSL0C+A@mail.gmail.com>
	<CAKVJ-_7KGjsF_MaQ-ngVSMN43T2_R2kkyYh6Cmh9a3hkk8NhuQ@mail.gmail.com>
	<CAJ9sUYMfQOPbJJDb-mC3=g_WmsoS9z5JWFWtFMHsidyfVexNtw@mail.gmail.com>
	<CAKVJ-_5Ak8prNGztJDUu-14USQ7qAYCQ+XJ6Oo0_FaBKbX3hTA@mail.gmail.com>
	<CAMC681nYk8NNTw19F7qGOiEHPL3CSorcG+-Ugi7tL3WOZGup2Q@mail.gmail.com>
	<CAJ9sUYMM=TGyMCyPeCt1A1_0DSbtQv5heobkM1L46G5etNSetQ@mail.gmail.com>
	<CAKVJ-_4yt6Hi5W_fvmj4yFT2Z=a26kQ6FLkOg44bgnYJ7NE3mw@mail.gmail.com>
Message-ID: <CAJ9sUYMei67Wm9R37FpCuis6Tb16OMJ+Rp+ATTN5FPbv9GcCiw@mail.gmail.com>

Sure thing. The docstring is actually pretty explicit, it's just missing the
part that you can get the matrices from SubsMat. Or at least, not that
clear. I'll go over it this weekend, maybe earlier.

Best,

João


From p.j.a.cock at googlemail.com  Fri Oct 28 12:15:36 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 28 Oct 2011 17:15:36 +0100
Subject: [Biopython-dev] Fwd: [Utilities-announce] Upcoming Release of NCBI
	EFetch version 2.0
In-Reply-To: <A9D8BF3D8A74DF4A925FB541C0F39D2A164585570F@NIHMLBX15.nih.gov>
References: <A9D8BF3D8A74DF4A925FB541C0F39D2A164585570F@NIHMLBX15.nih.gov>
Message-ID: <CAKVJ-_5yMwCAm1=NjxbK+4YBKNiunXE==M18jiC8a4Yc-X=wyg@mail.gmail.com>

Hi all,

We may need to update Bio.Entrez for EFetch v2.0 soon, although at first
glance there is nothing that will obviously cause trouble...

Peter

---------- Forwarded message ----------
From:  <utilities-announce at ncbi.nlm.nih.gov>
Date: Fri, Oct 28, 2011 at 4:15 PM
Subject: [Utilities-announce] Upcoming Release of NCBI EFetch version 2.0
To: NLM/NCBI List utilities-announce <utilities-announce at ncbi.nlm.nih.gov>


Upcoming Release of EFetch version 2.0


In November 2011 NCBI plans to release version 2.0 of EFetch. The
major changes and updates are as follows:

* EFetch now supports the following databases: biosample,
biosystems and sra

* EFetch now has defined default values for &retmode and
&rettype for all supported databases (please see Table 1 for all
supported values of these parameters)

* EFetch no longer supports &retmode=html; requests containing
&retmode=html will return data using the default &retmode value for
the specified database (&db)

* EFetch requests including &rettype=docsum will return XML
data equivalent to ESummary output

Details about EFetch can be found at
http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch


An updated, complete listing of supported &rettype and &retmode values
can be found at
http://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.chapter4_table1/?report=objectonly


Release notes about this and future releases can be found at
http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.Release_Notes


Please write to info at ncbi.nlm.nih.gov if you have any questions about
these changes.
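
As a sketch of what a client might do in response to the changes above, the helper below builds an EFetch URL with explicit rettype and retmode values and refuses retmode=html up front. The name build_efetch_url is hypothetical and is not part of Bio.Entrez; rejecting html on the client side is a design choice made for this example, since per the announcement the server itself would fall back to the database default.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

EFETCH_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(db, ids, rettype=None, retmode=None):
    """Return an EFetch URL; no network request is made here."""
    if retmode == "html":
        # EFetch 2.0 ignores html and uses the default, so fail early instead.
        raise ValueError("retmode=html is no longer supported by EFetch 2.0")
    params = [("db", db), ("id", ",".join(ids))]
    # Only send rettype/retmode when given, otherwise NCBI's defaults apply.
    if rettype is not None:
        params.append(("rettype", rettype))
    if retmode is not None:
        params.append(("retmode", retmode))
    return EFETCH_URL + "?" + urlencode(params)
```

Passing both values explicitly means a script does not depend on whatever default NCBI defines for a given database.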


_______________________________________________
Utilities-announce mailing list
http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce

From redmine at redmine.open-bio.org  Fri Oct 28 19:45:53 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Fri, 28 Oct 2011 23:45:53 +0000
Subject: [Biopython-dev] [Biopython - Feature #3310] (New) HMMER parser
Message-ID: <redmine.issue-3310.20111028234553@redmine.open-bio.org>


Issue #3310 has been reported by J M.

----------------------------------------
Feature #3310: HMMER parser
https://redmine.open-bio.org/issues/3310

Author: J M
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


This is a parser for the output of hmmsearch from the HMMER package.  Given the output of the hmmsearch, this program can retrieve information for each of the alignments including the expected values, the starting and ending positions of each alignment, as well as insert, deletion and mismatch information for each alignment.




From redmine at redmine.open-bio.org  Fri Oct 28 22:00:07 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Sat, 29 Oct 2011 02:00:07 +0000
Subject: [Biopython-dev] [Biopython - Bug #3311] (New) GFF parser fails to
	intelligently break lines
Message-ID: <redmine.issue-3311.20111029020007@redmine.open-bio.org>


Issue #3311 has been reported by gahoo lee.

----------------------------------------
Bug #3311: GFF parser fails to intelligently break lines
https://redmine.open-bio.org/issues/3311

Author: gahoo lee
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


Moved from "BioStar":http://biostar.stackexchange.com/questions/13651/gff-parsing-in-python-is-not-so-perfect

I use BCBio.GFF to parse "chr01.gff3":ftp://ftp.plantbiology.msu.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_6.1/chr01.dir/chr01.gff3 and "all.gff3":ftp://ftp.plantbiology.msu.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_6.1/all.dir/all.gff3 . But things didn't work out as I expect. Here's the code:

@from BCBio import GFF
limits = dict(gff_type = ["gene","mRNA","CDS"])
gff_handle = open('chr01.gff3')
for rec in GFF.parse(gff_handle,target_lines=1000,limit_info=limits):
    #Chromosome seq level
    for gene_feature in rec.features:
        #gene level
        for mRNA_feature in gene_feature.sub_features:
            #mRNA level
            print mRNA_feature.type
            print mRNA_feature.qualifiers['Alias']@

And I got:

@Traceback (most recent call last):
  File "R:\Untitled 1.py", line 14, in <module>
    print mRNA_feature.qualifiers['Alias']
KeyError: 'Alias'@

And the 'type' is "CDS" which is not correct. When parsing without

@target_lines=1000@

everything is ok. But parsing all.gff3 ran into the same problem. Maybe all.gff3 is too large to parse.

The problem might be that the parser did not recognise the entry boundary correctly.
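
One way to check that guess, as a rough sketch: take a chunk of GFF3 lines and list any Parent values whose ID was not defined earlier in the same chunk. The function orphan_parents is hypothetical, is not part of BCBio.GFF, and says nothing about how that parser chooses its break points; it only shows what a badly placed break would look like.

```python
def orphan_parents(gff_lines):
    """Return the set of Parent IDs not defined earlier in these lines."""
    seen_ids = set()
    orphans = set()
    for line in gff_lines:
        # Skip blank lines, comments and directives.
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            continue
        # Column 9 holds the key=value attributes separated by semicolons.
        attributes = dict(
            item.split("=", 1) for item in columns[8].split(";") if "=" in item
        )
        if "ID" in attributes:
            seen_ids.add(attributes["ID"])
        for parent in attributes.get("Parent", "").split(","):
            if parent and parent not in seen_ids:
                orphans.add(parent)
    return orphans
```

If a chunk of 1000 lines starts with CDS lines whose mRNA was in the previous chunk, those mRNA IDs would show up in the returned set.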




From p.j.a.cock at googlemail.com  Mon Oct  3 11:20:21 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 3 Oct 2011 12:20:21 +0100
Subject: [Biopython-dev] Enhancements to Bio.Graphics.BasicChromosome
Message-ID: <CAKVJ-_5qY_D9ki+FTikqWugfRTivNypjaQXJTVRkE896gfH4hA@mail.gmail.com>

Hi Brad (et al),

You might have seen on Twitter at the end of last week I mentioned
some work to extend Brad's Bio.Graphics.BasicChromosome to allow
features within a chromosome segment, optionally with labels.

The branch is here:
https://github.com/peterjc/biopython/tree/chr_diag

I put together a non-trivial example of showing the tRNA genes in
Arabidopsis as a unit test in test_GraphicsChromosome.py - this is
deliberately showing too many features in order to check the label
placement algorithm:

http://twitpic.com/6sgr1m

This kind of figure is also used for showing SNP placement and genetic
marker loci used in breeding etc.

If I had put more (or a more uniform set of) features you'd get
something worthy of the nickname "millipede diagram", looking like a
segmented body (the chromosome) with thousands of legs (the lines for
the labels).

This isn't quite backwards compatible - the old code draws the
chromosomes left aligned within their allocated space, while I put
them centrally in order to draw labels on either side.

Iddo sounded enthusiastic on Twitter. Does this look worth including
as is? Would someone (doesn't have to be Brad) like to test/review it
please?

Thanks,

Peter


From bioinformed at gmail.com  Mon Oct  3 21:28:21 2011
From: bioinformed at gmail.com (Kevin Jacobs <jacobs@bioinformed.com>)
Date: Mon, 3 Oct 2011 17:28:21 -0400
Subject: [Biopython-dev] Enhancements to Bio.Graphics.BasicChromosome
In-Reply-To: <CAKVJ-_5qY_D9ki+FTikqWugfRTivNypjaQXJTVRkE896gfH4hA@mail.gmail.com>
References: <CAKVJ-_5qY_D9ki+FTikqWugfRTivNypjaQXJTVRkE896gfH4hA@mail.gmail.com>
Message-ID: <CAD=vDipyUuJmV_MtpQNVRiWSivSOy-J7BjQuqn-jfyCh1p-XWQ@mail.gmail.com>

On Mon, Oct 3, 2011 at 7:20 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> You might have seen on Twitter at the end of last week I mentioned
> some work to extend Brad's Bio.Graphics.BasicChromosome to allow
> features within a chromosome segment, optionally with labels.
>
>

This looks to be extremely useful.  Is there any support for layouts to
stack or pack chromosomes?  I'm thinking of diagrams for humans, where we
don't fit as well in linear displays.  I also think supporting chromosome
bands would be extremely useful.  These could include full cytobands,
centromeres, euchromatic vs heterochromatic regions, user configurable bands
(e.g. linkage regions, IBD blocks, etc.)

The figure shows off what I'm thinking about the banding and layout, even
though it uses colored circles instead of text labels:
http://www.genome.gov/multimedia/illustrations/GWAS_2011_1.pdf

If there is interest, I may have some time to work on these features once
the basic infrastructure is stable.

Best regards,
-Kevin


From p.j.a.cock at googlemail.com  Mon Oct  3 22:24:12 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 3 Oct 2011 23:24:12 +0100
Subject: [Biopython-dev] Enhancements to Bio.Graphics.BasicChromosome
In-Reply-To: <CAD=vDipyUuJmV_MtpQNVRiWSivSOy-J7BjQuqn-jfyCh1p-XWQ@mail.gmail.com>
References: <CAKVJ-_5qY_D9ki+FTikqWugfRTivNypjaQXJTVRkE896gfH4hA@mail.gmail.com>
	<CAD=vDipyUuJmV_MtpQNVRiWSivSOy-J7BjQuqn-jfyCh1p-XWQ@mail.gmail.com>
Message-ID: <CAKVJ-_4cQkgqptt2x_rgmumoJ0o6Jkema-Dga_gykj39-Xt3mA@mail.gmail.com>

On Monday, October 3, 2011, Kevin Jacobs <jacobs at bioinformed.com> <bioinformed at gmail.com> wrote:
> On Mon, Oct 3, 2011 at 7:20 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
>> You might have seen on Twitter at the end of last week I mentioned
>> some work to extend Brad's Bio.Graphics.BasicChromosome to allow
>> features within a chromosome segment, optionally with labels.
>>
>>
>
> This looks to be extremely useful.  Is there any support for layouts to
> stack or pack chromosomes?  I'm thinking of diagrams for humans, where we
> don't fit as well in linear displays.  I also think supporting chromosome
> bands would be extremely useful.  These could include full cytobands,
> centromeres, euchromatic vs heterochromatic regions, user configurable bands
> (e.g. linkage regions, IBD blocks, etc.)
>
> The figure shows off what I'm thinking about the banding and layout, even
> though it uses colored circles instead of text labels:
> http://www.genome.gov/multimedia/illustrations/GWAS_2011_1.pdf
>
> If there is interest, I may have some time to work on these features once
> the basic infrastructure is stable.
>
> Best regards,
> -Kevin

Hi Kevin,

I'm glad to hear there is some interest in this :)

That example you linked to is interesting - there are several
things of specific interest - and it helps as I'm not yet
familiar with all the technical terms you used.

Notches in the chromosome which I assume are centromeres
(I can see how that might be added to the Bio code as another
segment type, similar to the telomeres).

Coloured background regions in the chromosome (should be
able to do this already), some of which are hatched (not possible
right now... would have to look into ReportLab's capabilities here).
This is what you meant by banding?

Multiple coloured dots for labels. Doable, but a nice API might
be tricky.

For layout did you mean the fact this isn't just a row of
chromosomes left to right, but here there are two rows?
I'm inclined to say the user should just move things in
the PDF for a final version using Adobe or Inkscape ;)

Regards,

Peter


From keith.hughitt at gmail.com  Tue Oct  4 11:31:51 2011
From: keith.hughitt at gmail.com (Keith Hughitt)
Date: Tue, 4 Oct 2011 07:31:51 -0400
Subject: [Biopython-dev] Creating a NCBIFastaIterator
Message-ID: <CAOJcpR8F_syzYF7gk9F-9dFUSVtFgXvTVo4Zh77N5cPFLG7-ow@mail.gmail.com>

Hi all,

I was thinking recently that it would be nice if the FASTA file reader were
able to check for known formats (e.g. NCBI) and then use that information to
choose better values for name, id, etc.

After some discussion with Peter Cock on GitHub, however, he convinced me
that this would be problematic in terms of backwards compatibility, and that
instead a better approach might be to add a new sub-format ("fasta-ncbi") to
the list of supported format readers.

This could go something like:

1. Create a new function in SeqIO.FastaIO for parsing NCBI-formatted FASTA
files. Add it to the mapping of iterators.
2. FastaIO.NCBIFastaIterator will simply call FastaIterator and then modify
the result by assigning a new id, name, etc (other suggestions?)
3. FastaIO.NCBIFastaWriter (modify and subclass FastaIO.FastaWriter?)
4. Modify code that interacts with NCBI services which return FASTA files
and have it return a NCBIFastaIterator (First use a deprecation/warning to
let users know of the pending change?)

Does this sound like it would be a useful feature? What about the basic
approach outlined above? Any suggestions?

Keith


From p.j.a.cock at googlemail.com  Tue Oct  4 11:46:19 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 4 Oct 2011 12:46:19 +0100
Subject: [Biopython-dev] Creating a NCBIFastaIterator
In-Reply-To: <CAOJcpR8F_syzYF7gk9F-9dFUSVtFgXvTVo4Zh77N5cPFLG7-ow@mail.gmail.com>
References: <CAOJcpR8F_syzYF7gk9F-9dFUSVtFgXvTVo4Zh77N5cPFLG7-ow@mail.gmail.com>
Message-ID: <CAKVJ-_4CxFvUUH9VOWhm2fnqcZgkygFZrq6u-MgeOx_w3xV-nw@mail.gmail.com>

On Tue, Oct 4, 2011 at 12:31 PM, Keith Hughitt <keith.hughitt at gmail.com> wrote:
> Hi all,
>
> I was thinking recently that it would be nice if the FASTA file reader were
> able to check for known formats (e.g. NCBI) and then use that information to
> choose better values for name, id, etc.
>
> After some discussion with Peter Cock on GitHub, however, he convinced me
> that this would be problematic in terms of backwards compatibility, and that
> instead a better approach might be to add a new sub-format ("fasta-ncbi") to
> the list of supported format readers.
>
> This could go something like:
>
> 1. Create a new function in SeqIO.FastaIO for parsing NCBI-formatted FASTA
> files. Add it to the mapping of iterators.

Yes.

> 2. FastaIO.NCBIFastaIterator will simply call FastaIterator and then modify
> the result by assigning a new id, name, etc (other suggestions?)

Store the GI number in the SeqRecord's annotation under key "gi"
to match the GenBank parser. There may be other things like this.

If the FASTA header does not match the NCBI style, that should
probably trigger an exception.
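
Something along these lines, as a sketch only: split an NCBI style title of the form "gi|number|database|accession|locus description", keep the GI number under the key "gi", and raise an exception for anything else. The function parse_ncbi_title is hypothetical and does not exist in Bio.SeqIO.FastaIO, and real NCBI headers have more variants than this handles.

```python
def parse_ncbi_title(title):
    """Parse an NCBI style FASTA title (without the leading '>')."""
    words = title.split(None, 1)
    if not words:
        raise ValueError("Blank FASTA header")
    fields = words[0].split("|")
    # Expect at least gi|<digits>|<db>|<accession>
    if len(fields) < 4 or fields[0] != "gi" or not fields[1].isdigit():
        raise ValueError("Not an NCBI style FASTA header: %r" % title)
    if len(words) > 1:
        description = words[1]
    else:
        description = ""
    return {
        "gi": fields[1],
        "db": fields[2],
        "accession": fields[3],
        "description": description,
    }
```

A "fasta-ncbi" iterator could then use the accession as the record id and copy the "gi" value into the annotations dictionary.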

> 3. FastaIO.NCBIFastaWriter (modify and subclass FastaIO.FastaWriter?)

This will be harder, but yes in principle.

> 4. Modify code that interacts with NCBI services which return FASTA files
> and have it return a NCBIFastaIterator (First use a deprecation/warning to
> let users know of the pending change?)

No need. I'm pretty sure all the NCBI code (like Bio.Entrez) returns
handles so it is up to the end user to decide what to do with the
data, e.g. parse it with the current SeqIO "fasta" format, or save it
straight to disk.

> Does this sound like it would be a useful feature? What about the basic
> approach outlined above? Any suggestions?
>
> Keith

Yes, it sounds useful. I'm not sure where the most current NCBI
documentation is, but this is a good start:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/formatdb_fastacmd.html

Peter


From chapmanb at 50mail.com  Wed Oct  5 12:03:31 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 05 Oct 2011 08:03:31 -0400
Subject: [Biopython-dev] Enhancements to Bio.Graphics.BasicChromosome
In-Reply-To: <CAKVJ-_4cQkgqptt2x_rgmumoJ0o6Jkema-Dga_gykj39-Xt3mA@mail.gmail.com>
References: <CAKVJ-_5qY_D9ki+FTikqWugfRTivNypjaQXJTVRkE896gfH4hA@mail.gmail.com>
	<CAD=vDipyUuJmV_MtpQNVRiWSivSOy-J7BjQuqn-jfyCh1p-XWQ@mail.gmail.com>
	<CAKVJ-_4cQkgqptt2x_rgmumoJ0o6Jkema-Dga_gykj39-Xt3mA@mail.gmail.com>
Message-ID: <87k48j8x2k.fsf@sobchak.i-did-not-set--mail-host-address--so-tickle-me>


Peter;

> >> You might have seen on Twitter at the end of last week I mentioned
> >> some work to extend Brad's Bio.Graphics.BasicChromosome to allow
> >> features within a chromosome segment, optionally with labels.

This is awesome, thanks for extending it. All of your tweaks are good
improvements, and I'm +1 for including it in the next release. Please
improve away.

Thanks much,
Brad


From bioinformed at gmail.com  Wed Oct  5 13:16:56 2011
From: bioinformed at gmail.com (Kevin Jacobs <jacobs@bioinformed.com>)
Date: Wed, 5 Oct 2011 09:16:56 -0400
Subject: [Biopython-dev] Enhancements to Bio.Graphics.BasicChromosome
In-Reply-To: <CAKVJ-_4cQkgqptt2x_rgmumoJ0o6Jkema-Dga_gykj39-Xt3mA@mail.gmail.com>
References: <CAKVJ-_5qY_D9ki+FTikqWugfRTivNypjaQXJTVRkE896gfH4hA@mail.gmail.com>
	<CAD=vDipyUuJmV_MtpQNVRiWSivSOy-J7BjQuqn-jfyCh1p-XWQ@mail.gmail.com>
	<CAKVJ-_4cQkgqptt2x_rgmumoJ0o6Jkema-Dga_gykj39-Xt3mA@mail.gmail.com>
Message-ID: <CAD=vDipujFdqOSm5VmdDV+_HD3NwL=E=611Fvrz-6Z5AXNy2cQ@mail.gmail.com>

On Mon, Oct 3, 2011 at 6:24 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Notches in the chromosome which I assume are centromeres
> (I can see how that might be added to the Bio code as another
> segment type, similar to the telomeres).
>

Yes-- although the visual style for centromeres need not be precisely as
shown in my example.


> Coloured background regions in the chromosome (should be
> able to do this already), some of which are hatched (not possible
> right now... would have to look into ReportLab's capabilities here).
> This is what you meant by banding?
>

Yes-- being able to show cytobands and custom bands to designate regions
will be very useful for me.  As before, I'm not wed to the cross-hatching,
in fact the standard displays use only grayscale.

> Multiple coloured dots for labels. Doable, but a nice API might
> be tricky.
>

I don't much care about those -- I'd be happy with text labels.


> For layout did you mean the fact this isn't just a row of
> chromosomes left to right, but here there are two rows?
> I'm inclined to say the user should just move things in
> the PDF for a final version using Adobe or Inkscape ;)
>

Correct.  I'd prefer to have some programmatic control of layout, since I'd
hate to have to manually edit every whole-genome plot.  Since I'm working
exclusively with human data for now, it would be possible to pre-specify a
few standard layouts and avoid the trouble of supporting dynamic features.

Just let me know when the code is stable enough to start poking around.
 I'll float a proposal for what I think could be done to obtain feedback
before I commit much time to coding.

Thanks,
-Kevin


From p.j.a.cock at googlemail.com  Wed Oct  5 13:32:34 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 5 Oct 2011 14:32:34 +0100
Subject: [Biopython-dev] Enhancements to Bio.Graphics.BasicChromosome
In-Reply-To: <CAD=vDipujFdqOSm5VmdDV+_HD3NwL=E=611Fvrz-6Z5AXNy2cQ@mail.gmail.com>
References: <CAKVJ-_5qY_D9ki+FTikqWugfRTivNypjaQXJTVRkE896gfH4hA@mail.gmail.com>
	<CAD=vDipyUuJmV_MtpQNVRiWSivSOy-J7BjQuqn-jfyCh1p-XWQ@mail.gmail.com>
	<CAKVJ-_4cQkgqptt2x_rgmumoJ0o6Jkema-Dga_gykj39-Xt3mA@mail.gmail.com>
	<CAD=vDipujFdqOSm5VmdDV+_HD3NwL=E=611Fvrz-6Z5AXNy2cQ@mail.gmail.com>
Message-ID: <CAKVJ-_4G2LDL-t5x-MtoQ6wDT1C4Z4_LwDW3H5TO2ae_9keUrg@mail.gmail.com>

On Wed, Oct 5, 2011 at 2:16 PM, Kevin Jacobs wrote:
> On Mon, Oct 3, 2011 at 6:24 PM, Peter Cock wrote:
>>
>> Notches in the chromosome which I assume are centromeres
>> (I can see how that might be added to the Bio code as another
>> segment type, similar to the telomeres).
>
> Yes-- although the visual style for centromeres need not be precisely as
> shown in my example.
>
>>
>> Coloured background regions in the chromosome (should be
>> able to do this already), some of which are hatched (not possible
>> right now... would have to look into ReportLab's capabilities here).
>> This is what you meant by banding?
>
> Yes-- being able to show cytobands and custom bands to designate regions
> will be very useful for me.  As before, I'm not wed to the cross-hatching,
> in fact the standard displays use only grayscale.

OK - simple colours are easy, I can add that to the test case example.

>>
>> Multiple coloured dots for labels. Doable, but a nice API might
>> be tricky.
>
> I don't much care about those -- I'd be happy with text labels.
>

Good.

>>
>> For layout did you mean the fact this isn't just a row of
>> chromosomes left to right, but here there are two rows?
>> I'm inclined to say the user should just move things in
>> the PDF for a final version using Adobe or Inkscape ;)
>
> Correct.  I'd prefer to have some programmatic control of layout, since I'd
> hate to have to manually edit every whole-genome plot.  Since I'm working
> exclusively with human data for now, it would be possible to pre-specify a
> few standard layouts and avoid the trouble of supporting dynamic features.
> Just let me know when the code is stable enough to start poking around.
>  I'll float a proposal for what I think could be done to obtain feedback
> before I commit much time to coding.

Would an option for using multiple rows be enough? It
wouldn't be quite as compact as the tweaked human
example you showed - but probably good enough to
print on a single page.
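
To make the multiple rows idea concrete, a small sketch of how the chromosomes could be grouped before drawing. The helper split_into_rows is hypothetical and not part of Bio.Graphics.BasicChromosome; it only decides which chromosomes share a row and does no drawing.

```python
def split_into_rows(names, per_row):
    """Group chromosome names into consecutive rows of at most per_row each."""
    if per_row < 1:
        raise ValueError("per_row must be at least 1")
    return [names[i:i + per_row] for i in range(0, len(names), per_row)]


# Human chromosomes 1-22 plus X and Y, arranged as two rows of twelve.
human = [str(i) for i in range(1, 23)] + ["X", "Y"]
rows = split_into_rows(human, 12)
```

Each row could then be drawn as its own set of chromosomes, one set beneath the other.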

Another option is to do the PDF editing programmatically,
for example with ReportLab. You can embed multiple
(smaller) PDF files within a larger container. It's a bit
fiddly, but would be worth the effort for a major pipeline
where you always use the same (few) organism(s).

Peter


From p.j.a.cock at googlemail.com  Wed Oct  5 14:40:56 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 5 Oct 2011 15:40:56 +0100
Subject: [Biopython-dev] Enhancements to Bio.Graphics.BasicChromosome
In-Reply-To: <87k48j8x2k.fsf@sobchak.i-did-not-set--mail-host-address--so-tickle-me>
References: <CAKVJ-_5qY_D9ki+FTikqWugfRTivNypjaQXJTVRkE896gfH4hA@mail.gmail.com>
	<CAD=vDipyUuJmV_MtpQNVRiWSivSOy-J7BjQuqn-jfyCh1p-XWQ@mail.gmail.com>
	<CAKVJ-_4cQkgqptt2x_rgmumoJ0o6Jkema-Dga_gykj39-Xt3mA@mail.gmail.com>
	<87k48j8x2k.fsf@sobchak.i-did-not-set--mail-host-address--so-tickle-me>
Message-ID: <CAKVJ-_7miX-3Ou8DBHKdFJftv8CWbrsouyopbeDNKwHNdkbBNA@mail.gmail.com>

On Wed, Oct 5, 2011 at 1:03 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Peter;
>
>> >> You might have seen on Twitter at the end of last week I mentioned
>> >> some work to extend Brad's Bio.Graphics.BasicChromosome to allow
>> >> features within a chromosome segment, optionally with labels.
>
> This is awesome, thanks for extending it. All of your tweaks are good
> improvements, and I'm +1 for including it in the next release. Please
> improve away.

Awesome. I've applied the current branch to the trunk, although
I'm not promising there won't be changes to the new stuff between
now and the next release.

In particular, doing the labels (and their placement) for the whole
of a chromosome (and not just for a segment) would allow us to
squeeze in more labels (e.g. in the example I showed using the
vertical space currently reserved for the telomeres).

Peter


From p.j.a.cock at googlemail.com  Wed Oct  5 21:17:38 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 5 Oct 2011 22:17:38 +0100
Subject: [Biopython-dev] Enhancements to Bio.Graphics.BasicChromosome
In-Reply-To: <CAKVJ-_4G2LDL-t5x-MtoQ6wDT1C4Z4_LwDW3H5TO2ae_9keUrg@mail.gmail.com>
References: <CAKVJ-_5qY_D9ki+FTikqWugfRTivNypjaQXJTVRkE896gfH4hA@mail.gmail.com>
	<CAD=vDipyUuJmV_MtpQNVRiWSivSOy-J7BjQuqn-jfyCh1p-XWQ@mail.gmail.com>
	<CAKVJ-_4cQkgqptt2x_rgmumoJ0o6Jkema-Dga_gykj39-Xt3mA@mail.gmail.com>
	<CAD=vDipujFdqOSm5VmdDV+_HD3NwL=E=611Fvrz-6Z5AXNy2cQ@mail.gmail.com>
	<CAKVJ-_4G2LDL-t5x-MtoQ6wDT1C4Z4_LwDW3H5TO2ae_9keUrg@mail.gmail.com>
Message-ID: <CAKVJ-_4TZGr4KoZmfr_7cCPcyZ1XLRdQg3ytY4JynFSz==n6SQ@mail.gmail.com>

On Wed, Oct 5, 2011 at 2:32 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Wed, Oct 5, 2011 at 2:16 PM, Kevin Jacobs wrote:
>> On Mon, Oct 3, 2011 at 6:24 PM, Peter Cock wrote:
>>> Coloured background regions in the chromosome (should be
>>> able to do this already), some of which are hatched (not possible
>>> right now... would have to look into ReportLab's capabilities here).
>>> This is what you meant by banding?
>>
>> Yes-- being able to show cytobands and custom bands to designate regions
>> will be very useful for me.  As before, I'm not wed to the cross-hatching,
>> in fact the standard displays use only grayscale.
>
> OK - simple colours are easy, I can add that to the test case example.

Done, using some random placements - I didn't manage to find
the real Arabidopsis cytoband data which would have been nicer.
https://github.com/biopython/biopython/commit/24deaca63ba55e28519a4c85650ad74e849f203e

Peter


From p.j.a.cock at googlemail.com  Wed Oct  5 22:31:18 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 5 Oct 2011 23:31:18 +0100
Subject: [Biopython-dev] Enhancements to Bio.Graphics.BasicChromosome
In-Reply-To: <CAKVJ-_7miX-3Ou8DBHKdFJftv8CWbrsouyopbeDNKwHNdkbBNA@mail.gmail.com>
References: <CAKVJ-_5qY_D9ki+FTikqWugfRTivNypjaQXJTVRkE896gfH4hA@mail.gmail.com>
	<CAD=vDipyUuJmV_MtpQNVRiWSivSOy-J7BjQuqn-jfyCh1p-XWQ@mail.gmail.com>
	<CAKVJ-_4cQkgqptt2x_rgmumoJ0o6Jkema-Dga_gykj39-Xt3mA@mail.gmail.com>
	<87k48j8x2k.fsf@sobchak.i-did-not-set--mail-host-address--so-tickle-me>
	<CAKVJ-_7miX-3Ou8DBHKdFJftv8CWbrsouyopbeDNKwHNdkbBNA@mail.gmail.com>
Message-ID: <CAKVJ-_73kxfxFGB2ugeUteGSGVt9ZycRyWERB4J41HX2kneJiw@mail.gmail.com>

On Wed, Oct 5, 2011 at 3:40 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> In particular, doing the labels (and their placement) for the whole
> of a chromosome (and not just for a segment) would allow us to
> squeeze in more labels (e.g. in the example I showed using the
> vertical space currently reserved for the telomeres).
>

Done,

https://github.com/biopython/biopython/commit/d3d19440bdbaabbf4cd305e43dea627f68cf6ecf

We may want to review how chromosome segment labels work -
probably simplest to add them to the dynamically placed label
list, otherwise the two can overlap.

Peter


From tiagoantao at gmail.com  Thu Oct  6 16:17:40 2011
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Thu, 6 Oct 2011 17:17:40 +0100
Subject: [Biopython-dev] bio.expasy potential bug?
Message-ID: <CAA9RGEOpKfhYcWY3K9f49KtuR-BKER=e_KGRDP77P8sdGHz_Lg@mail.gmail.com>

Hi,

This might be a red herring but:
http://biopython.org/DIST/docs/api/Bio.ExPASy-module.html :
sprot_search_ful(text, make_wild=None, swissprot=1, trembl=None,
cgi='http://www.expasy.ch/cgi-bin/sprot-search-ful')

That cgi does not exist...

Tiago
-- 
"If you want to get laid, go to college.  If you want an education, go
to the library." - Frank Zappa


From p.j.a.cock at googlemail.com  Thu Oct  6 16:23:03 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 6 Oct 2011 17:23:03 +0100
Subject: [Biopython-dev] bio.expasy potential bug?
In-Reply-To: <CAA9RGEOpKfhYcWY3K9f49KtuR-BKER=e_KGRDP77P8sdGHz_Lg@mail.gmail.com>
References: <CAA9RGEOpKfhYcWY3K9f49KtuR-BKER=e_KGRDP77P8sdGHz_Lg@mail.gmail.com>
Message-ID: <CAKVJ-_6yw5D0aVwRmZ20uHOXxfi4qz2OWV_tnRHaZciOiG3rmQ@mail.gmail.com>

2011/10/6 Tiago Antão <tiagoantao at gmail.com>:
> Hi,
>
> This might be a red herring but:
> http://biopython.org/DIST/docs/api/Bio.ExPASy-module.html :
> sprot_search_ful(text, make_wild=None, swissprot=1, trembl=None,
> cgi='http://www.expasy.ch/cgi-bin/sprot-search-ful')
>
> That cgi does not exist...
>
> Tiago

Looks like they've changed the URL or turned off a redirect :(

If you can work out what they should be, please go ahead and fix it.
A working unit test would be good (mark it as requires internet).

Peter


From tiagoantao at gmail.com  Thu Oct  6 16:33:11 2011
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Thu, 6 Oct 2011 17:33:11 +0100
Subject: [Biopython-dev] bio.expasy potential bug?
In-Reply-To: <CAKVJ-_6yw5D0aVwRmZ20uHOXxfi4qz2OWV_tnRHaZciOiG3rmQ@mail.gmail.com>
References: <CAA9RGEOpKfhYcWY3K9f49KtuR-BKER=e_KGRDP77P8sdGHz_Lg@mail.gmail.com>
	<CAKVJ-_6yw5D0aVwRmZ20uHOXxfi4qz2OWV_tnRHaZciOiG3rmQ@mail.gmail.com>
Message-ID: <CAA9RGEOTkzX6+UY019sxi2GTYM9rB3-HG8+z=KyRpQhGoM6p-A@mail.gmail.com>

2011/10/6 Peter Cock <p.j.a.cock at googlemail.com>:
> Looks like they've changed the URL or turned off a redirect :(
>
> If you can work out what they should be, please go ahead and fix it.
> A working unit test would be good (mark it as requires internet).

I will add the bug to redmine. I currently am pressed on time to sort
this out :(

I can have a look next week.


From redmine at redmine.open-bio.org  Thu Oct  6 17:06:26 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 6 Oct 2011 17:06:26 +0000
Subject: [Biopython-dev] [Biopython - Bug #3301] (New) Bio.ExPASy
	sprot_search_ful has wrong cgi address
Message-ID: <redmine.issue-3301.20111006170626@redmine.open-bio.org>


Issue #3301 has been reported by Tiago Antao.

----------------------------------------
Bug #3301: Bio.ExPASy sprot_search_ful has wrong cgi address
https://redmine.open-bio.org/issues/3301

Author: Tiago Antao
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


The Bio.ExPASy sprot_search_ful has a cgi of http://www.expasy.ch/cgi-bin/sprot-search-ful , but that URL is not available anymore.
See:
http://biopython.org/DIST/docs/api/Bio.ExPASy-module.html 


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From keith.hughitt at gmail.com  Fri Oct  7 11:18:10 2011
From: keith.hughitt at gmail.com (Keith Hughitt)
Date: Fri, 7 Oct 2011 07:18:10 -0400
Subject: [Biopython-dev] Creating a NCBIFastaIterator
In-Reply-To: <CAKVJ-_4CxFvUUH9VOWhm2fnqcZgkygFZrq6u-MgeOx_w3xV-nw@mail.gmail.com>
References: <CAOJcpR8F_syzYF7gk9F-9dFUSVtFgXvTVo4Zh77N5cPFLG7-ow@mail.gmail.com>
	<CAKVJ-_4CxFvUUH9VOWhm2fnqcZgkygFZrq6u-MgeOx_w3xV-nw@mail.gmail.com>
Message-ID: <CAOJcpR9AbCUyzhekMZr9YaLRxNT5-As9M8Qasu7en6XmqWGtvg@mail.gmail.com>

Okay, I took a stab at it. The code is in the master branch of my fork:
https://github.com/khughitt/biopython/blob/75be77cf28d376329577adf5ec41a8880b7faf5c/Bio/SeqIO/FastaIO.py#L73

I wasn't sure what the best choices are for id/name so for now I stored the
gid in id (and also in the annotations), and the accession as name. Any
suggestions?

I also haven't written any test code yet. Should I parameterize
TitleFunctions.simple_check and multi_check, or is there another approach
you would advise?

Keith


From p.j.a.cock at googlemail.com  Fri Oct  7 12:49:30 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 7 Oct 2011 13:49:30 +0100
Subject: [Biopython-dev] Creating a NCBIFastaIterator
In-Reply-To: <CAOJcpR9AbCUyzhekMZr9YaLRxNT5-As9M8Qasu7en6XmqWGtvg@mail.gmail.com>
References: <CAOJcpR8F_syzYF7gk9F-9dFUSVtFgXvTVo4Zh77N5cPFLG7-ow@mail.gmail.com>
	<CAKVJ-_4CxFvUUH9VOWhm2fnqcZgkygFZrq6u-MgeOx_w3xV-nw@mail.gmail.com>
	<CAOJcpR9AbCUyzhekMZr9YaLRxNT5-As9M8Qasu7en6XmqWGtvg@mail.gmail.com>
Message-ID: <CAKVJ-_4aSXSq8Kqope2szZJMudH4SyiGK_jTmvAdx5ioiveYuw@mail.gmail.com>

On Fri, Oct 7, 2011 at 12:18 PM, Keith Hughitt <keith.hughitt at gmail.com> wrote:
> Okay, I took a stab at it. The code is in the master branch of my
> fork: https://github.com/khughitt/biopython/blob/75be77cf28d376329577adf5ec41a8880b7faf5c/Bio/SeqIO/FastaIO.py#L73

You are only handling gi|<gi_num>|ref|<accession>|<description>
whereas the NCBI have a *lot* of other variations to consider:

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/formatdb_fastacmd.html

This is quite an open ended bit of work...
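
To give a feel for the variations, here is a minimal sketch in plain
Python (not Biopython code) of splitting an NCBI-style title into
(database tag, fields) pairs. The names split_ncbi_title and NCBI_FIELDS
are hypothetical, the number of fields per tag is an assumption based on
the formatdb/fastacmd page linked above and should be checked against it,
and the sketch assumes the description is separated from the identifier
by whitespace.

```python
# Hypothetical helper, not part of Biopython.
# Number of "|"-separated fields following each database tag
# (assumed from the NCBI formatdb/fastacmd documentation).
NCBI_FIELDS = {
    "gi": 1, "lcl": 1, "bbs": 1,
    "gb": 2, "emb": 2, "dbj": 2, "ref": 2, "sp": 2,
    "pir": 2, "prf": 2, "pdb": 2, "pat": 2, "gnl": 2,
}


def split_ncbi_title(title):
    """Return (xrefs, description) for a FASTA title without the '>'."""
    identifier, _, description = title.partition(" ")
    parts = identifier.split("|")
    xrefs = []
    i = 0
    while i < len(parts):
        tag = parts[i]
        width = NCBI_FIELDS.get(tag)
        if width is None:
            break  # unknown tag: stop rather than guess
        fields = parts[i + 1:i + 1 + width]
        # Drop empty fields such as the one after a trailing "|"
        xrefs.append((tag, [f for f in fields if f]))
        i += 1 + width
    return xrefs, description.strip()


# Made-up title, for illustration only
xrefs, desc = split_ncbi_title("gi|12345|ref|XX_000001.1| made-up example record")
# xrefs == [("gi", ["12345"]), ("ref", ["XX_000001.1"])]
# desc == "made-up example record"
```

A title that does not start with a known tag simply yields an empty
cross-reference list, so plain FASTA identifiers pass through untouched.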

> I wasn't sure what the best choices are for id/name so for now I stored the
> gid in id (and also in the annotations), and the accession as name. Any
> suggestions?

I suggest collecting a selection of matched NCBI FASTA and
GenBank/GenPept files, and how Biopython handles the
GenBank/GenPept version (format name "genbank" alias "gb"
in Bio.SeqIO) and try to make handling the FASTA version as
"fasta-ncbi" do the same.

e.g. From our unit tests (from the NCBI FTP site), these are
a pair:

Tests/GenBank/NC_005816.gb
Tests/GenBank/NC_005816.fna

> I also haven't written any test code yet. Should I parameterize
> TitleFunctions.simple_check and multi_check, or is there
> another approach you would advise?
> Keith

Probably write some completely new tests. e.g. Use the
existing test files mentioned above, and verify that both
the "genbank" and the "fasta-ncbi" parser give the same
results (ignoring things not in the FASTA file of course).

Peter


From andrew.sczesnak at med.nyu.edu  Fri Oct  7 15:38:04 2011
From: andrew.sczesnak at med.nyu.edu (Andrew Sczesnak)
Date: Fri, 07 Oct 2011 11:38:04 -0400
Subject: [Biopython-dev] Creating a NCBIFastaIterator
In-Reply-To: <CAKVJ-_4aSXSq8Kqope2szZJMudH4SyiGK_jTmvAdx5ioiveYuw@mail.gmail.com>
References: <CAOJcpR8F_syzYF7gk9F-9dFUSVtFgXvTVo4Zh77N5cPFLG7-ow@mail.gmail.com>	<CAKVJ-_4CxFvUUH9VOWhm2fnqcZgkygFZrq6u-MgeOx_w3xV-nw@mail.gmail.com>	<CAOJcpR9AbCUyzhekMZr9YaLRxNT5-As9M8Qasu7en6XmqWGtvg@mail.gmail.com>
	<CAKVJ-_4aSXSq8Kqope2szZJMudH4SyiGK_jTmvAdx5ioiveYuw@mail.gmail.com>
Message-ID: <4E8F1CDC.8090500@med.nyu.edu>

Adding my unsolicited opinion here, what do y'all think of this 
NCBIFasta parser being a more general "callback" parser, where a 
function passed to read() or write() translates some arbitrary 
delimited-text into an (id, name, description) tuple, as in:

def x(seqrec):
    # gi|<gi_num>|ref|<accession>|<description>
    y = seqrec.description.strip().split("|")

    #       gi    acc   desc
    return (y[1], y[3], y[4])

# calls x on every record in the FASTA
for seqrec in SeqIO.parse(fp, "fasta", x):
    print(seqrec)

This would be similar to key_function in SeqIO.to_dict() and would shift 
the responsibility of handling variation in formats to the user. 
Alternatively, a few functions to parse different styles of description 
lines could be included in the module.
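
A self-contained sketch of the callback idea, in plain Python so it runs
without Biopython. Everything here is hypothetical: parse_fasta, its
title_func argument and gi_ref_title are illustration names, not an
existing SeqIO API, and the record in the usage lines is made up.

```python
from io import StringIO


def parse_fasta(handle, title_func=None):
    """Yield (id, name, description, sequence) tuples from a FASTA handle.

    title_func is the hypothetical callback: it takes the title line
    (without the leading '>') and returns (id, name, description).
    """
    if title_func is None:
        # Default: first word is both id and name, whole line is description
        def title_func(title):
            words = title.split(None, 1)
            first_word = words[0] if words else ""
            return first_word, first_word, title

    title, chunks = None, []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if title is not None:
                yield tuple(title_func(title)) + ("".join(chunks),)
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line.strip())
    if title is not None:
        yield tuple(title_func(title)) + ("".join(chunks),)


def gi_ref_title(title):
    # Assumes gi|<gi_num>|ref|<accession>|<description>
    fields = title.strip().split("|", 4)
    return fields[1], fields[3], fields[4].strip()


handle = StringIO(">gi|12345|ref|XX_000001.1| made-up record\nACGT\nTTGA\n")
records = list(parse_fasta(handle, gi_ref_title))
# records == [("12345", "XX_000001.1", "made-up record", "ACGTTTGA")]
```

Shipping a few ready-made title functions alongside the parser would
cover the second option, with users free to pass their own.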


Andrew

On 10/07/2011 08:49 AM, Peter Cock wrote:
> On Fri, Oct 7, 2011 at 12:18 PM, Keith Hughitt <keith.hughitt at gmail.com> wrote:
>> Okay, I took a stab at it. The code is in the master branch of my
>> fork: https://github.com/khughitt/biopython/blob/75be77cf28d376329577adf5ec41a8880b7faf5c/Bio/SeqIO/FastaIO.py#L73
>
> You are only handling gi|<gi_num>|ref|<accession>|<description>
> whereas the NCBI have a *lot* of other variations to consider:
>
> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/formatdb_fastacmd.html
>
> This is quite an open ended bit of work...
>
>> I wasn't sure what the best choices are for id/name so for now I stored the
>> gid in id (and also in the annotations), and the accession as name. Any
>> suggestions?
>
> I suggest collecting a selection of matched NCBI FASTA and
> GenBank/GenPept files, and how Biopython handles the
> GenBank/GenPept version (format name "genbank" alias "gb"
> in Bio.SeqIO) and try to make handling the FASTA version as
> "fasta-ncbi" do the same.
>
> e.g. From our unit tests (from the NCBI FTP site), these are
> a pair:
>
> Tests/GenBank/NC_005816.gb
> Tests/GenBank/NC_005816.fna
>
>> I also haven't written any test code yet. Should I parameterize
>> TitleFunctions.simple_check and multi_check, or is there
>> another approach you would advise?
>> Keith
>
> Probably write some completely new tests. e.g. Use the
> existing test files mentioned above, and verify that both
> the "genbank" and the "fasta-ncbi" parser give the same
> results (ignoring things not in the FASTA file of course).
>
> Peter
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev


From p.j.a.cock at googlemail.com  Fri Oct  7 16:00:52 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 7 Oct 2011 17:00:52 +0100
Subject: [Biopython-dev] Creating a NCBIFastaIterator
In-Reply-To: <4E8F1CDC.8090500@med.nyu.edu>
References: <CAOJcpR8F_syzYF7gk9F-9dFUSVtFgXvTVo4Zh77N5cPFLG7-ow@mail.gmail.com>
	<CAKVJ-_4CxFvUUH9VOWhm2fnqcZgkygFZrq6u-MgeOx_w3xV-nw@mail.gmail.com>
	<CAOJcpR9AbCUyzhekMZr9YaLRxNT5-As9M8Qasu7en6XmqWGtvg@mail.gmail.com>
	<CAKVJ-_4aSXSq8Kqope2szZJMudH4SyiGK_jTmvAdx5ioiveYuw@mail.gmail.com>
	<4E8F1CDC.8090500@med.nyu.edu>
Message-ID: <CAKVJ-_62x-X5nMXpt5cSEwA502REOVs_SUUa8NQ59-EnSNzXJA@mail.gmail.com>

On Fri, Oct 7, 2011 at 4:38 PM, Andrew Sczesnak
<andrew.sczesnak at med.nyu.edu> wrote:
> Adding my unsolicited opinion here, what do y'all think of this NCBIFasta
> parser being a more general "callback" parser, where a function passed to
> read() or write() translates some arbitrary delimited-text into ...
>
> This would be similar to key_function in SeqIO.to_dict() and would shift the
> responsibility of handling variation in formats to the user. Alternatively,
> a few functions to parse different styles of description lines could be
> included in the module.
>
> Andrew

Hi Andrew,

Interesting idea. Although it doesn't fit that well with the current
(deliberately) simple high level Bio.SeqIO.parse/read API,
that doesn't mean we can't do it (see Bio.Phylo.parse).

In this case I fail to see what benefit this gives over the current
situation, where the user can do this themselves with the
current FASTA parser,

e.g. With a function and a generator expression,

records = (do_ncbi_my_way(record) for record in SeqIO.parse(filename, "fasta"))

or more simply within a loop:

for record in SeqIO.parse(filename, "fasta"):
    do_ncbi_my_way(record)
    #Do stuff with record

etc.

Maybe it is down to personal preference of coding style?

I would much prefer a new "fasta-ncbi" parser in SeqIO
that handled all the documented NCBI FASTA identifiers.

I'm being negative here - but please don't let that deter you
from posting ideas. This is a public list and we/I welcome
constructive criticism and alternative ideas to the table.

Regards,

Peter


From p.j.a.cock at googlemail.com  Fri Oct  7 16:16:55 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 7 Oct 2011 17:16:55 +0100
Subject: [Biopython-dev] Creating a NCBIFastaIterator
In-Reply-To: <4E8F239D.30504@med.nyu.edu>
References: <CAOJcpR8F_syzYF7gk9F-9dFUSVtFgXvTVo4Zh77N5cPFLG7-ow@mail.gmail.com>
	<CAKVJ-_4CxFvUUH9VOWhm2fnqcZgkygFZrq6u-MgeOx_w3xV-nw@mail.gmail.com>
	<CAOJcpR9AbCUyzhekMZr9YaLRxNT5-As9M8Qasu7en6XmqWGtvg@mail.gmail.com>
	<CAKVJ-_4aSXSq8Kqope2szZJMudH4SyiGK_jTmvAdx5ioiveYuw@mail.gmail.com>
	<4E8F1CDC.8090500@med.nyu.edu>
	<CAKVJ-_62x-X5nMXpt5cSEwA502REOVs_SUUa8NQ59-EnSNzXJA@mail.gmail.com>
	<4E8F239D.30504@med.nyu.edu>
Message-ID: <CAKVJ-_7Cah-dweUk-9Vmdn0n7xa3wUL9HCtkewdLZ--Kh8+hnQ@mail.gmail.com>

On Fri, Oct 7, 2011 at 5:06 PM, Andrew Sczesnak
<andrew.sczesnak at med.nyu.edu> wrote:
>>
>> Maybe it is down to personal preference of coding style?
>
> I agree, there isn't much difference between specifying the callback
> function in parse() or within the loop. To me, this points out that
> re-implementing a FASTA parser simply for a format of description
> line seems unnecessary.
>
> If a user is interested in extracting a particular piece of information
> from a FASTA description and knows the input format of the file, how
> difficult is it for them to split() it on their own? What exactly are the
> advantages of a separate parser?

Not enough of an advantage for me personally to have gone
and written it myself ;)

I can see some benefits in extracting information from the
NCBI identifier and storing them in the SeqRecord's dbxref
list and annotation dictionary (as consistently with our other
parsers as possible) if you are going to want to use those
fields yourself.

Perhaps Keith can explain his interest with some examples?

Peter


From andrew.sczesnak at med.nyu.edu  Fri Oct  7 16:06:53 2011
From: andrew.sczesnak at med.nyu.edu (Andrew Sczesnak)
Date: Fri, 07 Oct 2011 12:06:53 -0400
Subject: [Biopython-dev] Creating a NCBIFastaIterator
In-Reply-To: <CAKVJ-_62x-X5nMXpt5cSEwA502REOVs_SUUa8NQ59-EnSNzXJA@mail.gmail.com>
References: <CAOJcpR8F_syzYF7gk9F-9dFUSVtFgXvTVo4Zh77N5cPFLG7-ow@mail.gmail.com>	<CAKVJ-_4CxFvUUH9VOWhm2fnqcZgkygFZrq6u-MgeOx_w3xV-nw@mail.gmail.com>	<CAOJcpR9AbCUyzhekMZr9YaLRxNT5-As9M8Qasu7en6XmqWGtvg@mail.gmail.com>	<CAKVJ-_4aSXSq8Kqope2szZJMudH4SyiGK_jTmvAdx5ioiveYuw@mail.gmail.com>	<4E8F1CDC.8090500@med.nyu.edu>
	<CAKVJ-_62x-X5nMXpt5cSEwA502REOVs_SUUa8NQ59-EnSNzXJA@mail.gmail.com>
Message-ID: <4E8F239D.30504@med.nyu.edu>

On 10/07/2011 12:00 PM, Peter Cock wrote:
> Hi Andrew,
>
> Interesting idea. Although it doesn't fit that well with the current
> (deliberately) simple high level Bio.SeqIO.parse/read API,
> that doesn't mean we can't do it (see Bio.Phylo.parse).
>
> In this case I fail to see what benefit this gives over the current
> situation, where the user can do this themselves with the
> current FASTA parser,
>
> e.g. With a function and a generator expression,
>
> records = (do_ncbi_my_way(record) for record in SeqIO.parse(filename, "fasta"))
>
> or more simply within a loop:
>
> for record in SeqIO.parse(filename, "fasta"):
>      do_ncbi_my_way(record)
>      #Do stuff with record
>
> etc.
>
> Maybe it is down to personal preference of coding style?

I agree, there isn't much difference between specifying the callback 
function in parse() or within the loop. To me, this points out that 
re-implementing a FASTA parser simply for a format of description line 
seems unnecessary.

If a user is interested in extracting a particular piece of information 
from a FASTA description and knows the input format of the file, how 
difficult is it for them to split() it on their own? What exactly are 
the advantages of a separate parser?

> I would much prefer a new "fasta-ncbi" parser in SeqIO
> that handled all the documented NCBI FASTA identifiers.
>
> I'm being negative here - but please don't let that deter you
> from posting ideas. This is a public list and we/I welcome
> constructive criticism and alternative ideas to the table.
>
> Regards,
>
> Peter


From keith.hughitt at gmail.com  Fri Oct  7 17:02:30 2011
From: keith.hughitt at gmail.com (Keith Hughitt)
Date: Fri, 7 Oct 2011 13:02:30 -0400
Subject: [Biopython-dev] Creating a NCBIFastaIterator
In-Reply-To: <CAKVJ-_7Cah-dweUk-9Vmdn0n7xa3wUL9HCtkewdLZ--Kh8+hnQ@mail.gmail.com>
References: <CAOJcpR8F_syzYF7gk9F-9dFUSVtFgXvTVo4Zh77N5cPFLG7-ow@mail.gmail.com>
	<CAKVJ-_4CxFvUUH9VOWhm2fnqcZgkygFZrq6u-MgeOx_w3xV-nw@mail.gmail.com>
	<CAOJcpR9AbCUyzhekMZr9YaLRxNT5-As9M8Qasu7en6XmqWGtvg@mail.gmail.com>
	<CAKVJ-_4aSXSq8Kqope2szZJMudH4SyiGK_jTmvAdx5ioiveYuw@mail.gmail.com>
	<4E8F1CDC.8090500@med.nyu.edu>
	<CAKVJ-_62x-X5nMXpt5cSEwA502REOVs_SUUa8NQ59-EnSNzXJA@mail.gmail.com>
	<4E8F239D.30504@med.nyu.edu>
	<CAKVJ-_7Cah-dweUk-9Vmdn0n7xa3wUL9HCtkewdLZ--Kh8+hnQ@mail.gmail.com>
Message-ID: <CAOJcpR_XMp2+fOwP6fcYtfv-ix+c8noSSJSU+rEQyC17MZi_Nw@mail.gmail.com>

It's really just meant to be a bit of "polish." Originally I was thinking
not about having a separate parser but simply extending the existing FASTA
parser to recognize common formats (e.g. NCBI) and choose better ids,
annotations, etc.

Since that would create problems in terms of backwards compatibility,
however, adding a new parser seemed like the next best option.

Part of the goal, personally, was also just to find a small but useful task
I could work on to begin to learn the code and contribute some. It shouldn't
be forced though, so I don't want to contribute something unless it's
actually an improvement.

Keith

On Fri, Oct 7, 2011 at 12:16 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Fri, Oct 7, 2011 at 5:06 PM, Andrew Sczesnak
> <andrew.sczesnak at med.nyu.edu> wrote:
> >>
> >> Maybe it is down to personal preference of coding style?
> >
> > I agree, there isn't much difference between specifying the callback
> > function in parse() or within the loop. To me, this points out that
> > re-implementing a FASTA parser simply for a format of description
> > line seems unnecessary.
> >
> > If a user is interested in extracting a particular piece of information
> > from a FASTA description and knows the input format of the file, how
> > difficult is it for them to split() it on their own? What exactly are the
> > advantages of a separate parser?
>
> Not enough of an advantage for me personally to have gone
> and written it myself ;)
>
> I can see some benefits in extracting information from the
> NCBI identifier and storing them in the SeqRecord's dbxref
> list and annotation dictionary (as consistently with our other
> parsers as possible) if you are going to want to use those
> fields yourself.
>
> Perhaps Keith can explain his interest with some examples?
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


From b.invergo at gmail.com  Mon Oct 10 10:36:47 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Mon, 10 Oct 2011 12:36:47 +0200
Subject: [Biopython-dev] Parsing PAML supplementary output
Message-ID: <1318243007.12974.16.camel@localhost.localdomain>

Hi all,
I've received a request to implement the parsing of the main
supplementary output files of the PAML programs ('rst' files). I can't
submit a bug on Bugzilla, so I'll just announce my intention to work on
this here on the list.

One question though. The rst file for baseml includes an alignment which
is in the Phylip sequential format. I thought that it would be nice to
parse that directly into a Biopython MultipleSeqAlignment. It's my
understanding that Biopython only supports the interleaved format. Would
it be worth it for me to extend that functionality to include the
sequential format or would it be preferable to convert the alignments to
be interleaved within the parser itself?

Regards,
Brandon Invergo


From p.j.a.cock at googlemail.com  Mon Oct 10 12:21:52 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 10 Oct 2011 13:21:52 +0100
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <1318243007.12974.16.camel@localhost.localdomain>
References: <1318243007.12974.16.camel@localhost.localdomain>
Message-ID: <CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>

On Mon, Oct 10, 2011 at 11:36 AM, Brandon Invergo <b.invergo at gmail.com> wrote:
> Hi all,
> I've received a request to implement the parsing of the main
> supplementary output files of the PAML programs ('rst' files). I can't
> submit a bug on Bugzilla, so I'll just announce my intention to work on
> this here on the list.

That's because we moved to RedMine, there should have
been a link on the old Bugzilla page, but anyway it's here:
https://redmine.open-bio.org/projects/biopython

> One question though. The rst file for baseml includes an alignment which
> is in the Phylip sequential format. I thought that it would be nice to
> parse that directly into a Biopython MultipleSeqAlignment. It's my
> understanding that Biopython only supports the interleaved format. Would
> it be worth it for me to extend that functionality to include the
> sequential format or would it be preferable to convert the alignments to
> be interleaved within the parser itself?
>
> Regards,
> Brandon Invergo

If you can extend the current PHYLIP parser (strict or relaxed)
to cover interleaved and sequential, that would be nice. For
strict mode at least, we can in principle follow whatever the
original PHYLIP tools do to detect this automatically. It may
be safer to make it explicit though - from what I recall without
seeing the PHYLIP implementation's source code it was not
obvious how to do this reliably.

Peter


From b.invergo at gmail.com  Mon Oct 10 13:22:18 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Mon, 10 Oct 2011 15:22:18 +0200
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
Message-ID: <1318252938.12974.54.camel@localhost.localdomain>

Hi Peter

> That's because we moved to RedMine, there should have
> been a link on the old Bugzilla page, but anyway it's here:
> https://redmine.open-bio.org/projects/biopython

Ok, I'll file an enhancement request there. I didn't see a link on the
Bugzilla page and there are still some links to Bugzilla on the wiki,
like in the "What's being worked on" section. I missed the Issue Tracker
link on the left (incidentally, I think this is a design problem of the
typical wiki layout and not Biopython-specific...I never notice the
contents of that list), so it might be advisable to include the link
under the Contribute heading of the main page.

> If you can extend the current PHYLIP parser (strict or relaxed)
> to cover interleaved and sequential, that would be nice. For
> strict mode at least, we can in principle follow whatever the
> original PHYLIP tools do to detect this automatically. It may
> be safer to make it explicit though - from what I recall without
> seeing the PHYLIP implementation's source code it was not
> obvious how to do this reliably.

Ok, I'll take a look at the PHYLIP source code to see how they do it
there.

I'll report back with problems/notable progress/questions.

Cheers,
Brandon


From redmine at redmine.open-bio.org  Mon Oct 10 13:29:47 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 10 Oct 2011 13:29:47 +0000
Subject: [Biopython-dev] [Biopython - Feature #3303] (New) Support PHYLIP
	sequential alignment format in AlignIO
Message-ID: <redmine.issue-3303.20111010132947@redmine.open-bio.org>


Issue #3303 has been reported by Brandon Invergo.

----------------------------------------
Feature #3303: Support PHYLIP sequential alignment format in AlignIO
https://redmine.open-bio.org/issues/3303

Author: Brandon Invergo
Status: New
Priority: Normal
Assignee: Brandon Invergo
Category: 
Target version: 
URL: 


Currently only PHYLIP alignments in the interleaved format can be read by AlignIO. However, since some programs still work on the sequential format, it would be helpful to support that as well.




From redmine at redmine.open-bio.org  Mon Oct 10 13:31:13 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 10 Oct 2011 13:31:13 +0000
Subject: [Biopython-dev] [Biopython - Feature #3304] (New) Parse PAML
	supplementary (rst) output files
Message-ID: <redmine.issue-3304.20111010133113@redmine.open-bio.org>


Issue #3304 has been reported by Brandon Invergo.

----------------------------------------
Feature #3304: Parse PAML supplementary (rst) output files
https://redmine.open-bio.org/issues/3304

Author: Brandon Invergo
Status: New
Priority: Normal
Assignee: Brandon Invergo
Category: 
Target version: 
URL: 


PAML programs create several output files, the main one of which is already parsed by the Bio.Phylo.PAML modules. The primary supplementary output files ('rst' files) also contain information that is useful for some users so they should be parsed as well.




From p.j.a.cock at googlemail.com  Mon Oct 10 16:35:15 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 10 Oct 2011 17:35:15 +0100
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <1318252938.12974.54.camel@localhost.localdomain>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
	<1318252938.12974.54.camel@localhost.localdomain>
Message-ID: <CAKVJ-_58zyuqwy8J9EbM1xDWHj6vXX35HokF8aC=KN_8f9WD7g@mail.gmail.com>

On Mon, Oct 10, 2011 at 2:22 PM, Brandon Invergo <b.invergo at gmail.com> wrote:
> Hi Peter
>
>> That's because we moved to RedMine, there should have
>> been a link on the old Bugzilla page, but anyway it's here:
>> https://redmine.open-bio.org/projects/biopython
>
> Ok, I'll file an enhancement request there. I didn't see a link on the
> Bugzilla page and there are still some links to Bugzilla on the wiki,
> like in the "What's being worked on" section.

Fixed, thanks.

> I missed the Issue Tracker
> link on the left (incidentally, I think this is a design problem of the
> typical wiki layout and not Biopython-specific...I never notice the
> contents of that list), so it might be advisable to include the link
> under the Contribute heading of the main page.

Good idea, done.

Peter


From p.j.a.cock at googlemail.com  Mon Oct 10 21:47:03 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 10 Oct 2011 22:47:03 +0100
Subject: [Biopython-dev] Moving strand & db ref from SeqFeature to
	FeatureLocation
Message-ID: <CAKVJ-_5crqEb4bGz4z1StTaugtp1uhzT6FDKNJqgzfwsJFTK-w@mail.gmail.com>

This was on the "SeqFeature start/end and making positions act
like ints" thread last month:
http://lists.open-bio.org/pipermail/biopython-dev/2011-September/009183.html

On Mon, Sep 19, 2011 at 10:03 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> Well, slightly easier - I have some more dramatic changes to
>> the SeqFeature and FeatureLocation objects planned, but I'm
>> still playing with this.
>
> One of the key changes (which can be done without
> really changing the API) is to move the database &
> accession and the strand from the SeqFeature to the
> FeatureLocation. These are intimately connected with
> the location, as much as the start/end.
>
> This is one of the things I've been working on here:
> https://github.com/peterjc/biopython/commits/f_loc
>
> The other key change on that experimental branch
> is moving away from sub_features for join locations
> (etc). Here I was trying a new CompoundLocation
> object, but am still wondering if this should be done
> in the SeqFeature or FeatureLocation object instead
> (or if SeqFeature should subclass FeatureLocation).
>
> Peter

That branch needs some manual merge conflict
resolution with the integer subclassing position
changes that landed on the trunk, which I've started:

https://github.com/peterjc/biopython/tree/f_loc2

Would someone like to review that please?

It moves the strand, ref and db_ref properties from
the SeqFeature object to the FeatureLocation object,
implementing read/write proxy methods for backward
compatibility.

Other than the commit which changes the __str__
method (the fine details of which I am happy to tweak
with discussion) this should be almost 100% back
compatible:

https://github.com/peterjc/biopython/commit/fed003821d0d223a7b3042ccc3bdf8442348f043

The one break I am aware of is you can't now create a
SeqFeature with an empty location and then try to set
the strand or db refs before setting the location object
(which is what the GenBank parser was doing).
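
As a rough illustration of the proxy approach and of that one break,
here is a minimal sketch using toy classes. They borrow the names
SeqFeature and FeatureLocation but are assumptions for illustration
only, not the code on the f_loc2 branch, and only the strand is shown.

```python
class FeatureLocation(object):
    """Toy location holding start/end plus the strand."""

    def __init__(self, start, end, strand=None):
        self.start = start
        self.end = end
        self.strand = strand


class SeqFeature(object):
    """Toy feature whose strand is a read/write proxy to its location."""

    def __init__(self, location=None, type=""):
        self.location = location
        self.type = type

    @property
    def strand(self):
        return self.location.strand

    @strand.setter
    def strand(self, value):
        # Raises AttributeError while self.location is still None
        self.location.strand = value


feature = SeqFeature(FeatureLocation(5, 10), type="gene")
feature.strand = -1  # old-style code keeps working
# feature.location.strand is now -1

empty = SeqFeature()
try:
    empty.strand = 1  # the break: no location object to hold the strand
except AttributeError:
    pass
```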

The motivation is that the strand and (optional) database
reference for which the location start/end apply are both
essential parts of the location information, and I feel never
should have been attached to the SeqFeature directly.

Furthermore, this separation is useful as a step towards
reworking the current use of the SeqFeature's sub_feature
list for multi-part locations (e.g. joins in GenBank/EMBL),
more on this later.

Thanks,

Peter


From b.invergo at gmail.com  Tue Oct 11 07:51:26 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Tue, 11 Oct 2011 09:51:26 +0200
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
Message-ID: <1318319486.3137.19.camel@localhost.localdomain>

> If you can extend the current PHYLIP parser (strict or relaxed)
> to cover interleaved and sequential, that would be nice. For
> strict mode at least, we can in principle follow whatever the
> original PHYLIP tools do to detect this automatically. It may
> be safer to make it explicit though - from what I recall without
> seeing the PHYLIP implementation's source code it was not
> obvious how to do this reliably.
> 
I checked out the PHYLIP code and yes it's not really obvious how the
mode is detected. In fact, it seems that many of the programs ask for
user input to specify the format of the alignment.

So, regarding making it explicit, I'm not sure if this is what you meant
but I was thinking it might be simplest to add another Iterator/Writer
pair in the PhylipIO module for SequentialPhylip which inherit from the
basic Phylip classes, overriding the next() method in the iterator and
the write_alignment() method in the writer, much in the way that the
RelaxedPhylip classes work. 

This would mean that there would be no flexibility in the naming rules
(ie relaxed vs strict) for the SequentialPhylip format, unless I were to
also make a RelaxedSequentialPhylip pair of classes. PAML relaxes the
sequence name length restriction to 30 characters and since the whole
reason for embarking on this exercise was to support PAML's output of
PHYLIP alignments, if only one naming convention is to be implemented I
think it would be best to default to the relaxed rules.
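
A minimal sketch of reading the sequential layout with relaxed names, in
plain Python rather than as a PhylipIO subclass. The function name
read_sequential_phylip is hypothetical, and it assumes the name is the
first whitespace-delimited token on each record's first line, with the
sequence continuing over following lines until the length given in the
header is reached.

```python
def read_sequential_phylip(lines):
    """Return a list of (name, sequence) pairs from sequential PHYLIP lines."""
    rows = [line.strip() for line in lines if line.strip()]
    # Header: number of sequences, then sequence length (extra flags ignored)
    n_seqs, seq_len = (int(x) for x in rows[0].split()[:2])
    records = []
    pos = 1
    for _ in range(n_seqs):
        if pos >= len(rows):
            raise ValueError("Fewer sequences than the header promised")
        fields = rows[pos].split(None, 1)
        name = fields[0]
        seq = "".join(fields[1].split()) if len(fields) > 1 else ""
        pos += 1
        # Keep consuming lines until this record has seq_len characters
        while len(seq) < seq_len:
            if pos >= len(rows):
                raise ValueError("Sequence %s is too short" % name)
            seq += "".join(rows[pos].split())
            pos += 1
        if len(seq) != seq_len:
            raise ValueError("Sequence %s is too long" % name)
        records.append((name, seq))
    return records


example = ["2 8", "alpha_long_name_01  ACGT", "ACGT", "beta  ACGTAC", "GT"]
records = read_sequential_phylip(example)
# records == [("alpha_long_name_01", "ACGTACGT"), ("beta", "ACGTACGT")]
```

Because the header length decides where each record ends, no guessing
between interleaved and sequential is needed once the caller has asked
for the sequential variant explicitly.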


Slightly unrelated musings: I was thinking that with Biopython's support
for reading PHYLIP alignments and Newick trees into objects, at some
point it would probably be convenient to make the Bio.Phylo.PAML package
more integrated by allowing the user to pass in such objects as
arguments rather than writing them to files first; the PAML module could
write them to temp files itself. I think some minor changes might have
to be made in places (ie for PAML to accept interleaved alignments, the
header line must contain an 'I' flag after the seq # and seq len
integers) and I'd have to think about how best to allow passing such
objects while still retaining the ability to specify filenames without
using kludgy, non-pythonic type-checking. Anyway, another task for
another day, but I thought I'd throw it out there.
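
A rough sketch of the temp-file idea, assuming the alignment is handed
over as a list of (name, sequence) pairs. The function names, the
10-character name field plus two spaces, and the block width are all
assumptions for illustration; only the 'I' flag after the two header
integers comes from the paragraph above, and the exact spacing PAML
expects should be checked against its documentation.

```python
import os
import tempfile


def format_interleaved_phylip(records, block=60, name_width=10):
    """Format (name, sequence) pairs as interleaved PHYLIP text with an 'I' flag."""
    seq_len = len(records[0][1])
    if any(len(seq) != seq_len for _, seq in records):
        raise ValueError("All sequences must have the same length")
    # Header: sequence count, sequence length, then the interleaved flag
    lines = [" %d %d I" % (len(records), seq_len)]
    for start in range(0, seq_len, block):
        for name, seq in records:
            if start == 0:
                prefix = name[:name_width].ljust(name_width) + "  "
            else:
                # Later blocks carry no names, only padding
                prefix = " " * (name_width + 2)
            lines.append(prefix + seq[start:start + block])
        lines.append("")
    return "\n".join(lines)


def write_temp_alignment(records):
    """Write the records to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".phy")
    with os.fdopen(fd, "w") as handle:
        handle.write(format_interleaved_phylip(records))
    return path


path = write_temp_alignment([("a", "ACGT"), ("b", "TTGA")])
os.remove(path)  # the caller is responsible for cleaning up
```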

Regards,
Brandon


From p.j.a.cock at googlemail.com  Tue Oct 11 08:20:52 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 11 Oct 2011 09:20:52 +0100
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <1318319486.3137.19.camel@localhost.localdomain>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
	<1318319486.3137.19.camel@localhost.localdomain>
Message-ID: <CAKVJ-_7FJEeSxR9eP8b2_ePB--Adf8uCZdXM_jK3P8ko6NHCzw@mail.gmail.com>

On Tue, Oct 11, 2011 at 8:51 AM, Brandon Invergo <b.invergo at gmail.com> wrote:
>> If you can extend the current PHYLIP parser (strict or relaxed)
>> to cover interleaved and sequential, that would be nice. For
>> strict mode at least, we can in principle follow whatever the
>> original PHYLIP tools do to detect this automatically. It may
>> be safer to make it explicit though - from what I recall without
>> seeing the PHYLIP implementation's source code it was not
>> obvious how to do this reliably.
>>
> I checked out the PHYLIP code and yes it's not really obvious how the
> mode is detected. In fact, it seems that many of the programs ask for
> user input to specify the format of the alignment.
>
> So, regarding making it explicit, I'm not sure if this is what you meant
> but I was thinking it might be simplest to add another Iterator/Writer
> pair in the PhylipIO module for SequentialPhylip which inherit from the
> basic Phylip classes, overriding the next() method in the iterator and
> the write_alignment() method in the writer, much in the way that the
> RelaxedPhylip classes work.

Something like that as a new format variant, yes.

> This would mean that there would be no flexibility in the naming rules
> (ie relaxed vs strict) for the SequentialPhylip format, unless I were to
> also make a RelaxedSequentialPhylip pair of classes. PAML relaxes the
> sequence name length restriction to 30 characters and since the whole
> reason for embarking on this exercise was to support PAML's output of
> PHYLIP alignments, if only one naming convention is to be implemented I
> think it would be best to default to the relaxed rules.

Practical.

> Slightly unrelated musings: I was thinking that with Biopython's support
> for reading PHYLIP alignments and Newick trees into objects, at some
> point it would probably be convenient to make the Bio.Phylo.PAML package
> more integrated by allowing the user to pass in such objects as
> arguments rather than writing them to files first; the PAML module could
> write them to temp files itself. I think some minor changes might have
> to be made in places (ie for PAML to accept interleaved alignments, the
> header line must contain an 'I' flag after the seq # and seq len
> integers) and I'd have to think about how best to allow passing such
> objects while still retaining the ability to specify filenames without
> using kludgy, non-pythonic type-checking. Anyway, another task for
> another day, but I thought I'd throw it out there.

Do we need to write the "I" flag in our PHYLIP output?

Peter


From b.invergo at gmail.com  Tue Oct 11 09:33:13 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Tue, 11 Oct 2011 11:33:13 +0200
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <CAKVJ-_7FJEeSxR9eP8b2_ePB--Adf8uCZdXM_jK3P8ko6NHCzw@mail.gmail.com>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
	<1318319486.3137.19.camel@localhost.localdomain>
	<CAKVJ-_7FJEeSxR9eP8b2_ePB--Adf8uCZdXM_jK3P8ko6NHCzw@mail.gmail.com>
Message-ID: <1318325593.3137.51.camel@localhost.localdomain>


> Something like that as a new format variant, yes.
> 
> > ...
> 
> Practical.
> 
Ok, I'll start working on that then.


> Do we need to write the "I" flag in our PHYLIP output?

It took me a while to hunt down information on PHYLIP flags. I found
this link which mentions them:
http://www.no.embnet.org/phylipdoc/
They're only used by the program which is using the alignment as input,
corresponding to the PHYLIP programs' menu options. In general, they
have no effect on the format of the alignment (aside from the
'S'/sequential vs 'I'/interleaved flags). However, some of them might
require extra information immediately below the header line, before the
alignment starts. This complicates things. (see below for some PAML
examples)

However, since there's no real standardization to the use of the phylip
format, not all programs pay attention to these flags. In my own work,
I've used TCoffee to generate interleaved alignments and then I have to
add in the 'I' after the fact. As another example, the current Biopython
PhylipIO would not recognize a header line with options as a valid
header line, since there would be more than 2 "parts".

So, if some programs can take options flags (at least PHYLIP and PAML
programs) while other programs may not like their inclusion, they would
need to be treated specially. I would suggest that the PhylipIterator
classes be modified to recognize the existence of options, but not
necessarily do anything with them, and that the PhylipWriter classes be
modified to optionally take a string containing option flags to append
to the header line, ie 'I', 'GC', etc. 
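
To make the header idea concrete, here is a rough sketch (plain Python,
not the actual PhylipIO code) of a header reader that tolerates trailing
option flags without interpreting them, plus the matching writer-side
helper. The function names parse_phylip_header and format_phylip_header
are made up for illustration only.

```python
def parse_phylip_header(line):
    """Split a PHYLIP header line into (n_seqs, seq_len, flags).

    Hypothetical helper: anything after the two integers is collected
    as option flags and returned as a string, but not acted upon.
    """
    parts = line.split()
    if len(parts) < 2:
        raise ValueError("Expected two integers in header: %r" % line)
    n_seqs = int(parts[0])
    seq_len = int(parts[1])
    # Flags may be written together ("GC") or separately ("G C")
    flags = "".join(parts[2:]).upper()
    return n_seqs, seq_len, flags


def format_phylip_header(n_seqs, seq_len, flags=""):
    """Build a header line, appending option flags only if given."""
    header = " %i %i" % (n_seqs, seq_len)
    if flags:
        header += " " + flags
    return header
```

With this, "5 855 GC" would parse to (5, 855, "GC"), and a caller that
does not care about the flags can simply discard the third value.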

As for the supplementary information for the options, I'm not sure if
those complicate matters beyond the scope of Biopython's intended
functionality, or whether there should be yet another optional string
argument to the writer. The PhylipIterators would then need to be
modified to handle the possible existence of these supplementary lines
as well.

Anyway, I don't think this is an immediate concern and I personally
wouldn't approach it until I start working on the idea of better
integrating the PAML module with the rest of Biopython.

-brandon

Here are some examples:
5 895 G
G 4 3
123123123123123123123123123123123123123123123123123123123123
123123123123123123123123123123123123123123123123123123123123
123123123123123123123123123123123123123123123123123123123123
123123123123123123123123123123123123123123123123123123123123
123123123123123123123123123123123123123123123123123123123123
123123123123123123123123123123123123123123123123123123123123
123123123123123123123123123123123123123123123123123123123123
1231231231231231231231231231231231231
444444444444444444444444444444444444444444444444444444444444
444444444444444444444444444444444444444444444444444444444444
444444444444444444444444444444444444444444444444444444444444
444444444444444444
123123123123123123123123123123123123123123123123123123123123
123123123123123123123123123123123123123123123123123123123123
123123123123123123123123123123123123123123123123123123123123
12312312312312312312312312312312312312312312312312312312312 
Human
AAGCTTCACCGGCGCAGTCATTCTCATAATCGCCCACGGACTTACATCCTCATTACTATT
CTGCCTAGCAAACTCAAACTACGAACGCACTCACAGTCGCATCATAATC........
Chimpanzee .........

"The first line of the file contains the option character G. The second
line begins with a G at the first column, followed by the number of site
classes. The following lines contain the site marks, one for each site
in the sequence (or each codon in the case of
codonml). The site mark specifies which class each site is from. If
there are g classes, the marks should be 1, 2, ..., g, and if g > 9, the
marks need to be separated by spaces. The total number of
marks must be equal to the total number of sites in each sequence."

********

5 1000 G
G 4 100 200 300 400 
Sequence 1
TCGATAGATAGGTTTTAGGGGGGGGGGTAAAAAAAAA.......

"This [alignment has 5 sequences of] 1000 nucleotides from 4 genes,
obtained from concatenating four genes with 100, 200, 300, and 400
nucleotides from genes 1, 2, 3, and 4, respectively. The"

********

5 855 GC 
human          GTG CTG TCT CCT ...

5 sequences, 855 nucleotides, length must be a multiple of three
********

5 300 G 
G2 40 60

sequence1
.....

"This data set has 5 sequences, each of 300 nucleotides (100 codons),
which are partitioned into two genes, with the first gene having 40
codons and the second gene 60 codons."


From p.j.a.cock at googlemail.com  Tue Oct 11 09:37:48 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 11 Oct 2011 10:37:48 +0100
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <1318325593.3137.51.camel@localhost.localdomain>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
	<1318319486.3137.19.camel@localhost.localdomain>
	<CAKVJ-_7FJEeSxR9eP8b2_ePB--Adf8uCZdXM_jK3P8ko6NHCzw@mail.gmail.com>
	<1318325593.3137.51.camel@localhost.localdomain>
Message-ID: <CAKVJ-_4=xD8GrmA+azHvN4frJtDN9b44EEdh_j1e-Rb_p=7wWQ@mail.gmail.com>

On Tue, Oct 11, 2011 at 10:33 AM, Brandon Invergo <b.invergo at gmail.com> wrote:
>> Do we need to write the "I" flag in our PHYLIP output?
>
> It took me a while to hunt down information on PHYLIP flags. I found
> this link which mentions them:
> http://www.no.embnet.org/phylipdoc/
> They're only used by the program which is using the alignment as input,
> corresponding to the PHYLIP programs' menu options. In general, they
> have no effect on the format of the alignment (aside from the
> 'S'/sequential vs 'I'/interleaved flags). However, some of them might
> require extra information immediately below the header line, before the
> alignment starts. This complicates things. (see below for some PAML
> examples)

Some of those examples don't really look like PHYLIP anymore to me.

If there is any simple change to allow the current parser to cope
with (but ignore) any extra meta data like this, that sounds sensible
(with unit tests of course - grin).

Peter


From b.invergo at gmail.com  Tue Oct 11 10:01:59 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Tue, 11 Oct 2011 12:01:59 +0200
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <CAKVJ-_4=xD8GrmA+azHvN4frJtDN9b44EEdh_j1e-Rb_p=7wWQ@mail.gmail.com>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
	<1318319486.3137.19.camel@localhost.localdomain>
	<CAKVJ-_7FJEeSxR9eP8b2_ePB--Adf8uCZdXM_jK3P8ko6NHCzw@mail.gmail.com>
	<1318325593.3137.51.camel@localhost.localdomain>
	<CAKVJ-_4=xD8GrmA+azHvN4frJtDN9b44EEdh_j1e-Rb_p=7wWQ@mail.gmail.com>
Message-ID: <1318327319.3137.70.camel@localhost.localdomain>


> Some of those examples don't really look like PHYLIP anymore to me.
> 
> If there is any simple change to allow the current parser to cope
> with (but ignore) any extra meta data like this, that sounds sensible
> (with unit tests of course - grin).

Agreed, it can get quite messy, though look at the link I provided; even
the PHYLIP-specific example that they give includes some supplementary
info at the top, as well as a tree at the bottom:

 4   40   W					
W         0101001111 0101110101 0101110011	
	  1101010110
dmras1    GTCGTCGTTG GACCTGGAGG CGTGGGCAAG	

spras     GTAGTTGTAG GAGATGGTGG TGTTGGTAAA
scras1    GTAGTTGTCG GTGGAGGTGG CGTTGGTAAA
scras2    GTCGTCGTTG GTGGTGGTGG TGTTGGTAAA
	  TCCGCGCTCA
	  AGTGCTTTGA
	  TCTGCTTTAA
	  TCTGCTTTGA
1						
((dmras1,ddrasa),((hschras,spras),(scras1,scras2)));


I agree that trying to shoehorn that functionality into Biopython as
written would be a mess. Another option that I can think of, however,
would be to shift such extra formatting duties to the Biopython
application interface which needs them, since that's the only place
they're relevant. So I could, for example, make a PAML-specific subclass
of PhylipWriter which handles all these weird PAML-specific options. Or
if there were to be a PHYLIP interface and the program took that above
example as input, it would be the duty of the interface to write a file
with those options, the alignment and the tree all together. 
Just a thought.

For the short term, though, when I implement the sequential format, I'll
go ahead and update the code to at least handle flags in the header
line. To handle the supp. info should be straightforward, since I
believe that each supp. line must begin with the option flag that
requires the info; if the option flag exists in the header, ignore any
following lines which begin with that flag character. 
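
As a minimal sketch of that skipping rule (assuming my belief about the
leading flag character holds; skip_option_lines is a made-up name, not
existing Biopython code):

```python
def skip_option_lines(lines, flags):
    """Drop leading lines that start with a flag named in the header.

    Hypothetical helper. `lines` are the lines following the header,
    and `flags` is the string of option characters found in the header.
    """
    flag_chars = set(flags)
    index = 0
    while index < len(lines):
        first = lines[index][:1]  # "" for an empty line
        if first and first in flag_chars:
            index += 1  # supplementary line, ignore it
        else:
            break
    return lines[index:]
```

Note the limits of such a simple rule: it would not skip continuation
lines that do not repeat the flag character (such as wrapped site
marks), and it would wrongly skip a first sequence whose name happens to
start with the flag letter.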

Unit tests will abound.

-brandon


From p.j.a.cock at googlemail.com  Tue Oct 11 10:13:03 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 11 Oct 2011 11:13:03 +0100
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <1318327319.3137.70.camel@localhost.localdomain>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
	<1318319486.3137.19.camel@localhost.localdomain>
	<CAKVJ-_7FJEeSxR9eP8b2_ePB--Adf8uCZdXM_jK3P8ko6NHCzw@mail.gmail.com>
	<1318325593.3137.51.camel@localhost.localdomain>
	<CAKVJ-_4=xD8GrmA+azHvN4frJtDN9b44EEdh_j1e-Rb_p=7wWQ@mail.gmail.com>
	<1318327319.3137.70.camel@localhost.localdomain>
Message-ID: <CAKVJ-_6mVPbYnOAui-cEpcy0SGu1cUzBj7RCWoW4pHM8GjTmng@mail.gmail.com>

On Tue, Oct 11, 2011 at 11:01 AM, Brandon Invergo <b.invergo at gmail.com> wrote:
>
>> Some of those examples don't really look like PHYLIP anymore to me.
>>
>> If there is any simple change to allow the current parser to cope
>> with (but ignore) any extra meta data like this, that sounds sensible
>> (with unit tests of course - grin).
>
> Agreed, it can get quite messy, though look at the link I provided; even
> the PHYLIP-specific example that they give includes some supplementary
> info at the top, as well as a tree at the bottom:
>
>  4   40   W
> W         0101001111 0101110101 0101110011
>           1101010110
> dmras1    GTCGTCGTTG GACCTGGAGG CGTGGGCAAG
>
> spras     GTAGTTGTAG GAGATGGTGG TGTTGGTAAA
> scras1    GTAGTTGTCG GTGGAGGTGG CGTTGGTAAA
> scras2    GTCGTCGTTG GTGGTGGTGG TGTTGGTAAA
>           TCCGCGCTCA
>           AGTGCTTTGA
>           TCTGCTTTAA
>           TCTGCTTTGA
> 1
> ((dmras1,ddrasa),((hschras,spras),(scras1,scras2)));
>

I would consider that to be a meta file containing a PHYLIP
alignment and a tree, but in itself it isn't a PHYLIP alignment.

That looks like exactly the kind of issue NEXUS was designed
to solve: how to embed alignments, trees and other stuff into
a single plain text file for input into a phylogenetic tool.

Doesn't PHYLIP have an XML format these days? Trying
to parse something like that text (without a formal standard)
seems like a painful exercise and long term maintenance
headache.

Peter


From b.invergo at gmail.com  Tue Oct 11 10:37:39 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Tue, 11 Oct 2011 12:37:39 +0200
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <CAKVJ-_6mVPbYnOAui-cEpcy0SGu1cUzBj7RCWoW4pHM8GjTmng@mail.gmail.com>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
	<1318319486.3137.19.camel@localhost.localdomain>
	<CAKVJ-_7FJEeSxR9eP8b2_ePB--Adf8uCZdXM_jK3P8ko6NHCzw@mail.gmail.com>
	<1318325593.3137.51.camel@localhost.localdomain>
	<CAKVJ-_4=xD8GrmA+azHvN4frJtDN9b44EEdh_j1e-Rb_p=7wWQ@mail.gmail.com>
	<1318327319.3137.70.camel@localhost.localdomain>
	<CAKVJ-_6mVPbYnOAui-cEpcy0SGu1cUzBj7RCWoW4pHM8GjTmng@mail.gmail.com>
Message-ID: <1318329459.3137.82.camel@localhost.localdomain>

> I would consider that to be a meta file containing a PHYLIP
> alignment and a tree, but in itself it isn't a PHYLIP alignment.
> 
> That looks like exactly the kind of issue NEXUS was designed
> to solve: how to embed alignments, trees and other stuff into
> a single plain text file for input into a phylogenetic tool.
> 
> Doesn't PHYLIP have an XML format these days? Trying
> to parse something like that text (without a formal standard)
> seems like a painful exercise and long term maintenance
> headache.

I'm not suggesting that Biopython parse and store the information
because I agree that it would be an unmaintainable nightmare. To bring
myself out of the clouds a bit and back to the basics of my original
intent: if I work on better integrating the PAML module so that the user
can pass a MultipleSeqAlignment object, I will need a way to write that
alignment to a file with potentially more information than the default
PhylipWriter allows. So, just as simple as that, Bio.Phylo.PAML would
need its own alignment writer....something I'm not going to worry about
right now.

With this mentality, then yes, anything containing such option flags and
info is no longer a PHYLIP alignment but is rather an input file to some
program. As such, the existing PhylipIO module should *not* be modified
to handle this metadata. Please ignore all my other half-baked ideas.

So, current, phylip-related tasks:
- implement SequentialPhylipWriter and SequentialPhylipIterator classes
in PhylipIO


That's it, I think. I'll revisit this alignment-writing stuff at some
other point. One task at a time...
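
For reference, the core of what a sequential reader has to do could look
roughly like the sketch below. This is plain Python rather than a
PhylipIO subclass, it assumes the relaxed naming rule (the name is the
first whitespace-delimited token), and parse_sequential_phylip is only
an illustrative name.

```python
def parse_sequential_phylip(text):
    """Return a list of (name, sequence) pairs from sequential PHYLIP.

    Sketch only: each record starts with its name, and its sequence may
    wrap over several lines until the declared length is reached.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    n_seqs, seq_len = (int(x) for x in lines[0].split()[:2])
    records = []
    pos = 1
    for _ in range(n_seqs):
        parts = lines[pos].split(None, 1)
        name = parts[0]
        # Sequence data may follow the name on the same line
        seq = "".join(parts[1].split()) if len(parts) > 1 else ""
        pos += 1
        # Keep reading lines until this record is complete
        while len(seq) < seq_len:
            seq += "".join(lines[pos].split())
            pos += 1
        if len(seq) != seq_len:
            raise ValueError("Sequence %r is longer than %i" % (name, seq_len))
        records.append((name, seq))
    return records
```

The key difference from the interleaved case is that each record is read
to completion before the next name is expected, so the declared sequence
length is what tells the reader where one record stops.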

-brandon


From p.j.a.cock at googlemail.com  Tue Oct 11 11:05:48 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 11 Oct 2011 12:05:48 +0100
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <1318329459.3137.82.camel@localhost.localdomain>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
	<1318319486.3137.19.camel@localhost.localdomain>
	<CAKVJ-_7FJEeSxR9eP8b2_ePB--Adf8uCZdXM_jK3P8ko6NHCzw@mail.gmail.com>
	<1318325593.3137.51.camel@localhost.localdomain>
	<CAKVJ-_4=xD8GrmA+azHvN4frJtDN9b44EEdh_j1e-Rb_p=7wWQ@mail.gmail.com>
	<1318327319.3137.70.camel@localhost.localdomain>
	<CAKVJ-_6mVPbYnOAui-cEpcy0SGu1cUzBj7RCWoW4pHM8GjTmng@mail.gmail.com>
	<1318329459.3137.82.camel@localhost.localdomain>
Message-ID: <CAKVJ-_7R4KHUFc3oOhQQ_nrorKey0GDeACAEZVv-OWn0scnuvg@mail.gmail.com>

On Tue, Oct 11, 2011 at 11:37 AM, Brandon Invergo <b.invergo at gmail.com> wrote:
>> I would consider that to be a meta file containing a PHYLIP
>> alignment and a tree, but in itself it isn't a PHYLIP alignment.
>>
>> That looks like exactly the kind of issue NEXUS was designed
>> to solve: how to embed alignments, trees and other stuff into
>> a single plain text file for input into a phylogenetic tool.
>>
>> Doesn't PHYLIP have an XML format these days? Trying
>> to parse something like that text (without a formal standard)
>> seems like a painful exercise and long term maintenance
>> headache.
>
> I'm not suggesting that Biopython parse and store the information
> because I agree that it would be an unmaintainable nightmare. To bring
> myself out of the clouds a bit and back to the basics of my original
> intent: if I work on better integrating the PAML module so that the user
> can pass a MultipleSeqAlignment object, I will need a way to write that
> alignment to a file with potentially more information than the default
> PhylipWriter allows. So, just as simple as that, Bio.Phylo.PAML would
> need its own alignment writer....something I'm not going to worry about
> right now.
>
> With this mentality, then yes, anything containing such option flags and
> info is no longer a PHYLIP alignment but is rather an input file to some
> program. As such, the existing PhylipIO module should *not* be modified
> to handle this metadata. Please ignore all my other half-baked ideas.

What you could think about is having the Bio.Phylo.PAML create this
file, and call the existing PhylipIO module with the handle to write the
alignment part - and perhaps the Bio.Phylo module with the handle to
write any tree.

> So, current, phylip-related tasks:
> - implement SequentialPhylipWriter and SequentialPhylipIterator classes
> in PhylipIO
>
> That's it, I think. I'll revisit this alignment-writing stuff at some
> other point. One task at a time...
>
> -brandon

That sounds like a manageable step to start with :)

Peter


From chapmanb at 50mail.com  Tue Oct 11 11:20:31 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 11 Oct 2011 07:20:31 -0400
Subject: [Biopython-dev] Moving strand & db ref from SeqFeature to
	FeatureLocation
In-Reply-To: <CAKVJ-_5crqEb4bGz4z1StTaugtp1uhzT6FDKNJqgzfwsJFTK-w@mail.gmail.com>
References: <CAKVJ-_5crqEb4bGz4z1StTaugtp1uhzT6FDKNJqgzfwsJFTK-w@mail.gmail.com>
Message-ID: <8739ez4vwg.fsf@kunkel.i-did-not-set--mail-host-address--so-tickle-me>


Peter;

> https://github.com/peterjc/biopython/tree/f_loc2
> 
> It moves the strand, ref and db_ref properties from
> the SeqFeature object to the FeatureLocation object,
> implementing read/write proxy methods for backward
> compatibility.

Thanks for the integer work and for this. I'm agreed that this is a more
logical way to store the strand (and cross-ref) information. +1 from me
on checking it in,
Brad


From p.j.a.cock at googlemail.com  Tue Oct 11 11:28:35 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 11 Oct 2011 12:28:35 +0100
Subject: [Biopython-dev] Moving strand & db ref from SeqFeature to
	FeatureLocation
In-Reply-To: <8739ez4vwg.fsf@kunkel.i-did-not-set--mail-host-address--so-tickle-me>
References: <CAKVJ-_5crqEb4bGz4z1StTaugtp1uhzT6FDKNJqgzfwsJFTK-w@mail.gmail.com>
	<8739ez4vwg.fsf@kunkel.i-did-not-set--mail-host-address--so-tickle-me>
Message-ID: <CAKVJ-_75KRXMX9ABK=Wk=mJA_w_PSsUUGPR3BTA9YCKy5ZGPVw@mail.gmail.com>

On Tue, Oct 11, 2011 at 12:20 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Peter;
>
>> https://github.com/peterjc/biopython/tree/f_loc2
>>
>> It moves the strand, ref and db_ref properties from
>> the SeqFeature object to the FeatureLocation object,
>> implementing read/write proxy methods for backward
>> compatibility.
>
> Thanks for the integer work and for this. I'm agreed that this is a more
> logical way to store the strand (and cross-ref) information. +1 from me
> on checking it in,
> Brad

OK, that's done. Cheers Brad.

As I said before, if anyone doesn't like the new printing
of the FeatureLocation with how I present the strand and
database reference, we can change that. There are
examples in the SeqFeature.py and SeqRecord.py docstrings.

Regards,

Peter


From eric.talevich at gmail.com  Tue Oct 11 12:55:57 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Tue, 11 Oct 2011 08:55:57 -0400
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <CAKVJ-_6mVPbYnOAui-cEpcy0SGu1cUzBj7RCWoW4pHM8GjTmng@mail.gmail.com>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
	<1318319486.3137.19.camel@localhost.localdomain>
	<CAKVJ-_7FJEeSxR9eP8b2_ePB--Adf8uCZdXM_jK3P8ko6NHCzw@mail.gmail.com>
	<1318325593.3137.51.camel@localhost.localdomain>
	<CAKVJ-_4=xD8GrmA+azHvN4frJtDN9b44EEdh_j1e-Rb_p=7wWQ@mail.gmail.com>
	<1318327319.3137.70.camel@localhost.localdomain>
	<CAKVJ-_6mVPbYnOAui-cEpcy0SGu1cUzBj7RCWoW4pHM8GjTmng@mail.gmail.com>
Message-ID: <CAMC681mD_ibzzNrjiUuMWhimK20L7ZceW-OvTkRtr9W3-RYooA@mail.gmail.com>

On Tue, Oct 11, 2011 at 6:13 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

>
> That looks like exactly the kind of issue NEXUS was designed
> to solve: how to embed alignments, trees and other stuff into
> a single plain text file for input into a phylogenetic tool.
>
> Doesn't PHYLIP have an XML format these days? Trying
> to parse something like that text (without a formal standard)
> seems like a painful exercise and long term maintenance
> headache.
>
>
The Phylip programs seqboot and retree have XML formats that look almost
like SeqXML and phyloXML, but they're not quite compatible, e.g. attribute
names are slightly different.

This is probably because they were written before those standard formats
existed -- pretty sure the retree XML format, sort of described in Inferring
Phylogenies (2004) as an example of how a future XML tree format might look,
was an inspiration for phyloXML. There hasn't been much development on these
parts of the Phylip codebase lately, though. If someone wanted to write a
patch to bring these formats into compliance with the closest standards, I
bet Joe would accept the patch.

Discussion:
https://www.facebook.com/permalink.php?story_fbid=256082801069968&id=115402811804635

-E


From p.j.a.cock at googlemail.com  Tue Oct 11 13:04:20 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 11 Oct 2011 14:04:20 +0100
Subject: [Biopython-dev] Parsing PAML supplementary output
In-Reply-To: <CAMC681mD_ibzzNrjiUuMWhimK20L7ZceW-OvTkRtr9W3-RYooA@mail.gmail.com>
References: <1318243007.12974.16.camel@localhost.localdomain>
	<CAKVJ-_5+nd+cxgb3UcO88KEr+H+AP7=B9eshc_UrTnDBE8VByA@mail.gmail.com>
	<1318319486.3137.19.camel@localhost.localdomain>
	<CAKVJ-_7FJEeSxR9eP8b2_ePB--Adf8uCZdXM_jK3P8ko6NHCzw@mail.gmail.com>
	<1318325593.3137.51.camel@localhost.localdomain>
	<CAKVJ-_4=xD8GrmA+azHvN4frJtDN9b44EEdh_j1e-Rb_p=7wWQ@mail.gmail.com>
	<1318327319.3137.70.camel@localhost.localdomain>
	<CAKVJ-_6mVPbYnOAui-cEpcy0SGu1cUzBj7RCWoW4pHM8GjTmng@mail.gmail.com>
	<CAMC681mD_ibzzNrjiUuMWhimK20L7ZceW-OvTkRtr9W3-RYooA@mail.gmail.com>
Message-ID: <CAKVJ-_7k1wwsrhWYO7UBdpFvVE_z2cm4qnNgE90T34_+5xtJhA@mail.gmail.com>

On Tue, Oct 11, 2011 at 1:55 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> On Tue, Oct 11, 2011 at 6:13 AM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>>
>> That looks like exactly the kind of issue NEXUS was designed
>> to solve: how to embed alignments, trees and other stuff into
>> a single plain text file for input into a phylogenetic tool.
>>
>> Doesn't PHYLIP have an XML format these days? Trying
>> to parse something like that text (without a formal standard)
>> seems like a painful exercise and long term maintenance
>> headache.
>>
>
> The Phylip programs seqboot and retree have XML formats that look almost
> like SeqXML and phyloXML, but they're not quite compatible, e.g. attribute
> names are slightly different.
>
> This is probably because they were written before those standard formats
> existed -- pretty sure the retree XML format, sort of described in Inferring
> Phylogenies (2004) as an example of how a future XML tree format might look,
> was an inspiration for phyloXML. There hasn't been much development on these
> parts of the Phylip codebase lately, though. If someone wanted to write a
> patch to bring these formats into compliance with the closest standards, I
> bet Joe would accept the patch.
>
> Discussion:
> https://www.facebook.com/permalink.php?story_fbid=256082801069968&id=115402811804635
>
> -E

Good plan - anyone here familiar with the PHYLIP code base?

Peter


From chapmanb at 50mail.com  Thu Oct 13 14:05:57 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 13 Oct 2011 10:05:57 -0400
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
	automated programs
Message-ID: <871uuhm1fe.fsf@fastmail.fm>


Hi all;
Biopython's setup.py currently has an interactive question/answer
session to remind users to optionally install NumPy if it's not
present. This is useful for by-hand installations, but problematic with
automated installers.

One useful feature of setuptools is the 'install_requires' attribute in
setup.py. This allows your programs to define the requirements and have
them automatically installed from PyPI. It's a great way to include
useful libraries without having to fret excessively about users
installing dependencies.

Unfortunately if you use install_requires with Biopython, and NumPy is
not installed, automated scripts will get stuck in the question/answer
dialog. To resolve this issue, I wrote a small patch that adds NumPy to
Biopython's install_requires and skips the Q/A only in cases where it is
installed via pip or easy_install:

https://github.com/chapmanb/biopython/commit/be53d850d721fc82af81bedcd9fb9034b0a2099b

If someone is able to review this, it would be great to get it into
Biopython for the next release.
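
To illustrate the general idea (a simplified sketch, not the code in the
commit above), the dependency can be declared only when setuptools is
actually available, since plain distutils does not know the
install_requires option. The helper name build_setup_kwargs is invented
for this example.

```python
def build_setup_kwargs(have_setuptools):
    """Return keyword arguments for setup(), as a plain dictionary.

    Hypothetical helper: install_requires is only added when setuptools
    is present, because plain distutils warns about unknown options.
    """
    kwargs = {"name": "biopython"}
    if have_setuptools:
        kwargs["install_requires"] = ["numpy"]
    return kwargs


try:
    import setuptools  # noqa: F401 (only checking availability)
    HAVE_SETUPTOOLS = True
except ImportError:
    HAVE_SETUPTOOLS = False

# In a real setup.py these would be passed on as setup(**SETUP_KWARGS)
SETUP_KWARGS = build_setup_kwargs(HAVE_SETUPTOOLS)
```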

Brad


From p.j.a.cock at googlemail.com  Thu Oct 13 14:20:46 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 13 Oct 2011 15:20:46 +0100
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
In-Reply-To: <871uuhm1fe.fsf@fastmail.fm>
References: <871uuhm1fe.fsf@fastmail.fm>
Message-ID: <CAKVJ-_5506i1UmzScq+fikCattsYktOpsWeFkE-56L1BcNvsXA@mail.gmail.com>

On Thu, Oct 13, 2011 at 3:05 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Hi all;
> Biopython's setup.py currently has an interactive question/answer
> session to remind users to optionally install NumPy if it's not
> present. This is useful for by-hand installations, but problematic with
> automated installers.
>
> One useful feature of setuptools is the 'install_requires' attribute in
> setup.py. This allows your programs to define the requirements and have
> them automatically installed from PyPI. It's a great way to include
> useful libraries without having to fret excessively about users
> installing dependencies.
>
> Unfortunately if you use install_requires with Biopython, and NumPy is
> not installed, automated scripts will get stuck in the question/answer
> dialog. To resolve this issue, I wrote a small patch that adds NumPy to
> Biopython's install_requires and skips the Q/A only in cases where it is
> installed via pip or easy_install:
>
> https://github.com/chapmanb/biopython/commit/be53d850d721fc82af81bedcd9fb9034b0a2099b
>
> If someone is able to review this, it would be great to get it into
> Biopython for the next release.
>
> Brad

I can appreciate the usefulness of this, but don't know enough
about pip and easy_install to comment on the implementation.

Anyone else?

Peter


From eric.talevich at gmail.com  Thu Oct 13 18:00:22 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Thu, 13 Oct 2011 14:00:22 -0400
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
In-Reply-To: <871uuhm1fe.fsf@fastmail.fm>
References: <871uuhm1fe.fsf@fastmail.fm>
Message-ID: <CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>

On Thu, Oct 13, 2011 at 10:05 AM, Brad Chapman <chapmanb at 50mail.com> wrote:

>
> Hi all;
> Biopython's setup.py currently has an interactive question/answer
> session to remind users to optionally install NumPy if it's not
> present. This is useful for by-hand installations, but problematic with
> automated installers.
>
> One useful feature of setuptools is the 'install_requires' attribute in
> setup.py. This allows your programs to define the requirements and have
> them automatically installed from PyPI. It's a great way to include
> useful libraries without having to fret excessively about users
> installing dependencies.
>
> Unfortunately if you use install_requires with Biopython, and NumPy is
> not installed, automated scripts will get stuck in the question/answer
> dialog. To resolve this issue, I wrote a small patch that adds NumPy to
> Biopython's install_requires and skips the Q/A only in cases where it is
> installed via pip or easy_install:
>
>
> https://github.com/chapmanb/biopython/commit/be53d850d721fc82af81bedcd9fb9034b0a2099b
>
> If someone is able to review this, it would be great to get it into
> Biopython for the next release.
>
>
Hi Brad,

Looks cool to me, except the sys.argv parsing gets a little gritty
(understandably):

Line 115:

    if dist_dir.find("egg-dist-tmp") >= 0:

Could this be `if 'egg-dist-tmp' in dist_dir`?


Line 118:

    if sys.argv in [["-c", "develop", "--no-deps"],
                    ["-c", "egg_info"]]:

Does pip allow rearranging arguments? Would `--no-deps -c develop` also be
valid?
If so, should that be added as a third item in the list-of-args?
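
One way to sidestep the ordering question, sketched here under the
assumption that only the presence of those arguments matters, is to
compare sets rather than lists. The function name
looks_like_automated_install and its dist_dir parameter are hypothetical
and not taken from the patch.

```python
def looks_like_automated_install(argv, dist_dir=""):
    """Guess whether setup.py is being driven by pip or easy_install.

    Hypothetical helper: `argv` is the argument list to inspect and
    `dist_dir` a directory name that may contain the temporary build
    marker used in the check on line 115.
    """
    if "egg-dist-tmp" in dist_dir:
        return True
    # Sets ignore argument order, so no extra permutations are needed
    known = [{"-c", "develop", "--no-deps"}, {"-c", "egg_info"}]
    return set(argv) in known
```

With this, ["--no-deps", "-c", "develop"] and ["-c", "develop",
"--no-deps"] are treated alike. The trade-off is that duplicate
arguments are also ignored, which may or may not be desirable.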


-Eric


From chapmanb at 50mail.com  Fri Oct 14 10:00:37 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Fri, 14 Oct 2011 06:00:37 -0400
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
	automated programs
In-Reply-To: <CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
Message-ID: <87hb3b51ve.fsf@fastmail.fm>


Eric and Peter;
Thanks much for taking a look at this patch.

> Looks cool to me, except the sys.argv parsing gets a little gritty
> (understandably):

Absolutely. Unfortunately the python installation space is pretty
messy. Neither pip nor easy_install gives any formal declaration so you
have to resort to these hacks to infer that they are doing the
install. Luckily I don't think any of these options are something people
would do directly from the command line.

> Line 115:
> 
>     if dist_dir.find("egg-dist-tmp") >= 0:
> 
> Could this be `if 'egg-dist-tmp' in dist_dir`?

> Line 118:
> 
>     if sys.argv in [["-c", "develop", "--no-deps"],
>                     ["-c", "egg_info"]]:
> 
> Does pip allow rearranging arguments? Would `--no-deps -c develop` also be
> valid?
> If so, should that be added as a third item in the list-of-args?

Awesome, thanks for the suggestions. I checked both of these in.

Thanks again,
Brad


From p.j.a.cock at googlemail.com  Fri Oct 14 10:53:42 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 14 Oct 2011 11:53:42 +0100
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
In-Reply-To: <87hb3b51ve.fsf@fastmail.fm>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
Message-ID: <CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>

On Fri, Oct 14, 2011 at 11:00 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Awesome, thanks for the suggestions. I checked both of these in.
>

I'll test the branch today, and merge it to the trunk if it looks good
on Python 2 / 3 / Jython / PyPy.

Peter


From p.j.a.cock at googlemail.com  Fri Oct 14 10:55:52 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 14 Oct 2011 11:55:52 +0100
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
In-Reply-To: <CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
	<CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
Message-ID: <CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>

On Fri, Oct 14, 2011 at 11:53 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Fri, Oct 14, 2011 at 11:00 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
>>
>> Awesome, thanks for the suggestions. I checked both of these in.
>>
>
> I'll test the branch today, and merge it to the trunk if it looks good
> on Python 2 / 3 / Jython / PyPy.
>

$ jython setup.py install
/Users/pjcock/jython2.5.2/Lib/distutils/dist.py:263: UserWarning:
Unknown distribution option: 'install_requires'
  warnings.warn(msg)
running install
running build
running build_py
...


That's with Jython 2.5.2 under Mac OS X Snow Leopard. Same with pypy 1.6,

$ pypy setup.py install
/Users/pjcock/Downloads/Software/pypy-1.6/lib-python/modified-2.7/distutils/dist.py:267:
UserWarning: Unknown distribution option: 'install_requires'
  warnings.warn(msg)
running install
running build
running build_py
...

Can we avoid that warning?

Peter


From chapmanb at 50mail.com  Fri Oct 14 12:26:06 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Fri, 14 Oct 2011 08:26:06 -0400
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
	automated programs
In-Reply-To: <CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
	<CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
	<CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
Message-ID: <87ehyf4v4x.fsf@fastmail.fm>


Peter;
Thanks for testing this and helping with the merge.

> $ jython setup.py install
> /Users/pjcock/jython2.5.2/Lib/distutils/dist.py:263: UserWarning:
> Unknown distribution option: 'install_requires'
>   warnings.warn(msg)
[...]
> Can we avoid that warning?

This is a warning from distutils, so you would also see this on regular
ol' Python without setuptools installed. Likewise it should go away on
jython or pypy if they have setuptools or distribute installed.

Unfortunately I don't have a way around it since this is an argument to
setup. Most modern installations should have setuptools and can take
advantage of install_requires.

If it's a problem we could use 'warnings' to ignore it.
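
A minimal sketch of that idea, assuming the filter is installed before setup() is called. The function name is hypothetical; the message text is the one shown in the Jython and PyPy output above.

```python
import warnings


def ignore_unknown_option_warning():
    """Hide the "Unknown distribution option" UserWarning.

    The message argument is a regular expression matched against the
    start of the warning text, so it covers 'install_requires' and any
    other option distutils does not recognise.
    """
    warnings.filterwarnings(
        "ignore",
        message="Unknown distribution option",
        category=UserWarning,
    )
```

The downside is that this would also hide the warning for a genuinely misspelled option.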

Brad


From cmccoy at fhcrc.org  Fri Oct 14 17:11:15 2011
From: cmccoy at fhcrc.org (Connor McCoy)
Date: Fri, 14 Oct 2011 10:11:15 -0700
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
Message-ID: <CAChfGK0N2XcW_ht_tqwyAh9JQDmcKNJUtmaZo4tQoNjYaXw8Uw@mail.gmail.com>

Hi Brad, Eric, and Peter,

Sorry to jump in. Regarding the install_requires warnings:

If you're interested, you can include the distribute_setup.py file
from http://python-distribute.org/distribute_setup.py in BioPython,
and add a short conditional import:

try:
    from setuptools import setup, find_packages
except ImportError:
    import distribute_setup
    distribute_setup.use_setuptools()
    from setuptools import setup, find_packages

This will download and install distribute if it isn't available in
the Python installation; the remainder of the setup can assume
setuptools is available.  Sphinx
(https://bitbucket.org/birkenfeld/sphinx/src/f1f641602bb2/setup.py)
and some other projects use this.

Connor


From carlcrott at gmail.com  Mon Oct 17 01:24:27 2011
From: carlcrott at gmail.com (carl crott)
Date: Sun, 16 Oct 2011 21:24:27 -0400
Subject: [Biopython-dev] fixes on the tutorials
Message-ID: <CAE2MQmAgUJHy8-fkoYZQ=Q=ryNa5Gt2ie-qPw4BudpxLBDJRaQ@mail.gmail.com>

So the tutorials I'm running through have some bugs in them ...

would anyone like me to fix these?


tutorial 2.4.1 should be something like:

from Bio import SeqIO
handle = open("ls_orchid.fasta", "rU")
for seq_record in SeqIO.parse(handle, "fasta"):
    print seq_record.id
    print repr(seq_record.seq)
    print len(seq_record)
handle.close()


and tutorial 2.4.2:

from Bio import SeqIO
handle = open("ls_orchid.gbk", "rU")
for seq_record in SeqIO.parse(handle, "genbank"):
    print seq_record.id
    print repr(seq_record.seq)
    print len(seq_record)
handle.close()


From chapmanb at 50mail.com  Mon Oct 17 01:29:49 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sun, 16 Oct 2011 21:29:49 -0400
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
	automated programs
In-Reply-To: <CAChfGK0N2XcW_ht_tqwyAh9JQDmcKNJUtmaZo4tQoNjYaXw8Uw@mail.gmail.com>
References: <CAChfGK0N2XcW_ht_tqwyAh9JQDmcKNJUtmaZo4tQoNjYaXw8Uw@mail.gmail.com>
Message-ID: <8739eso16a.fsf@fastmail.fm>


Connor;
Thanks for the idea on the auto-install of setuptools/distribute. I'm
open to this or sticking with the warning, whichever everyone
prefers. Traditionally the setup has tried to be lightweight so you
could install Biopython without anything else, but having distribute
installed is pretty useful so it might be nice to encourage this.
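
For comparison, a lightweight alternative that neither downloads anything nor triggers the warning could look like the sketch below. It is an assumption rather than anything agreed in this thread: the function name is hypothetical, and the idea is simply to pass install_requires only when setuptools is importable.

```python
import importlib.util


def build_setup_kwargs(have_setuptools=None):
    """Return keyword arguments for setup(), hypothetical helper.

    install_requires is only included when setuptools is available,
    so plain distutils never sees an option it does not know.
    """
    if have_setuptools is None:
        # Detect setuptools without importing it
        have_setuptools = importlib.util.find_spec("setuptools") is not None
    kwargs = {"name": "biopython"}
    if have_setuptools:
        kwargs["install_requires"] = ["numpy"]
    return kwargs
```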

Brad

> Sorry to jump in. Regarding the install_requires warnings:
> 
> If you're interested, you can include the distribute_setup.py file
> from http://python-distribute.org/distribute_setup.py in BioPython,
> and add a short conditional import:
> 
> try:
>     from setuptools import setup, find_packages
> except ImportError:
>     import distribute_setup
>     distribute_setup.use_setuptools()
>     from setuptools import setup, find_packages
> 
> Which will download and install distribute if it isn't available in
> the python installation; the remainder of the setup can assume
> setuptools is available.  Sphinx
> (https://bitbucket.org/birkenfeld/sphinx/src/f1f641602bb2/setup.py)
> and some other projects use this.


From p.j.a.cock at googlemail.com  Mon Oct 17 07:55:54 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 17 Oct 2011 08:55:54 +0100
Subject: [Biopython-dev] fixes on the tutorials
In-Reply-To: <CAE2MQmAgUJHy8-fkoYZQ=Q=ryNa5Gt2ie-qPw4BudpxLBDJRaQ@mail.gmail.com>
References: <CAE2MQmAgUJHy8-fkoYZQ=Q=ryNa5Gt2ie-qPw4BudpxLBDJRaQ@mail.gmail.com>
Message-ID: <CAKVJ-_4w+R-=PeUdCSXOkG7G4XxPNyP2G+rnW8bcgsumKW9PXA@mail.gmail.com>

On Mon, Oct 17, 2011 at 2:24 AM, carl crott <carlcrott at gmail.com> wrote:
> So the tutorials I'm running through have some bugs in them ...
>
> would anyone like me to fix these?
>

Hi Carl,

What's the bug?
>
> tutorial 2.4.1 should be something like:
>
> from Bio import SeqIO
> handle = open("ls_orchid.fasta", "rU")
> for seq_record in SeqIO.parse(handle, "fasta"):
>     print seq_record.id
>     print repr(seq_record.seq)
>     print len(seq_record)
> handle.close()
>

Your example above looks fine (and the tutorial
used to say that), but the current version is shorter:

from Bio import SeqIO
for seq_record in SeqIO.parse("ls_orchid.fasta", "fasta"):
    print seq_record.id
    print repr(seq_record.seq)
    print len(seq_record)

We could alternatively (now that we've dropped
Python 2.4) open the handle with a with statement.
The same applies to the GenBank example.

Perhaps you are using an old version of Biopython
(where Bio.SeqIO.parse(...) does not accept a
filename)?

Could you clarify please,

Thanks,

Peter


From p.j.a.cock at googlemail.com  Mon Oct 17 10:10:54 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 17 Oct 2011 11:10:54 +0100
Subject: [Biopython-dev] [biopython-dev] SeqFeature comparison for
	equality
Message-ID: <CAKVJ-_5qZBLbDaFMqUQ_KegWLK2Pt_BsHU=pPmt_bGLzJGyUiQ@mail.gmail.com>

Hi Joshua and everyone,

It looks like Joshua's email (below) got lost in the spam filter (possibly
due to the attachment). The core of his patch was as follows (there
were also lots of white space changes).


@@ -694,6 +714,15 @@ class FeatureLocation(object):
             for i in range(self._start, self._end):
                 yield i

+    def __eq__(self, other):
+        """Compares a FeatureLocation for equality"""
+        if not isinstance(other, FeatureLocation):
+            return False
+        if self.start() == other.start() and \
+                self.end() == other.end():
+            return True
+        return False
+


@@ -255,6 +255,26 @@ class SeqFeature(object):
             qualifiers = dict(self.qualifiers.iteritems()),
             sub_features = [f._flip(length) for f in self.sub_features[::-1]])

+    def __eq__(self, other):
+        """Compare between this SeqFeature and other.
+
+        ref, ref_db and qualifiers are not needed for comparison"""
+        if not isinstance(other, SeqFeature):
+            return False
+        if (self.id != "<unknown id>"
+              and other.id != "<unknown id>" and
+              self.id == other.id):
+            return True         # Can we trust this?
+        for x in ('location', 'type', 'strand', 'location_operator'):
+            if (getattr(self, x) and getattr(other, x) and \
+                    getattr(self, x) != getattr(other, x)):
+                return False
+        for f in self.sub_features:
+            if f not in other.sub_features:
+                return False
+        else:
+            return True
+
     def extract(self, parent_sequence):
         """Extract feature sequence from the supplied parent sequence.

Note the patch will not apply to the trunk, perhaps it is against
the current release?

First (logically) comes defining __eq__ for the FeatureLocation,
and second defining __eq__ for the SeqFeature. This
hides the fact that we need to compare position objects,
e.g. is BeforePosition(5) == ExactPosition(5)? The answer
is yes, which I have now clarified in the docstrings:

https://github.com/biopython/biopython/commit/55feea75f7ab55eac4ef4e320567d746ce41120a

Other than the fact that I think the ref and ref_db should be
checked when comparing locations, adding location comparison
seems like a good idea. Note that with the recent changes on
the trunk, the strand, ref and ref_db now belong to the
FeatureLocation not the SeqFeature.
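
To make the suggestion concrete, here is a rough sketch of a location comparison that also checks strand, ref and ref_db. SimpleLocation is a hypothetical stand-in written for this example, not the real FeatureLocation, and it treats positions as plain integers.

```python
class SimpleLocation(object):
    """Hypothetical stand-in for FeatureLocation (not Biopython code)."""

    def __init__(self, start, end, strand=None, ref=None, ref_db=None):
        self.start = start
        self.end = end
        self.strand = strand
        self.ref = ref
        self.ref_db = ref_db

    def _key(self):
        # Everything that must match for two locations to be equal
        return (self.start, self.end, self.strand, self.ref, self.ref_db)

    def __eq__(self, other):
        if not isinstance(other, SimpleLocation):
            return NotImplemented
        return self._key() == other._key()

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Keep hashing consistent with __eq__
        return hash(self._key())
```

Returning NotImplemented for other types lets Python fall back to its default comparison rather than raising an error.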

Extending this to cover the SeqFeature means also deciding how
to treat the ID, type, etc., which is fiddly, particularly the
question of annotation. These are essentially the same reasons
why we don't support SeqRecord equality.

Joshua - would you like to update your patch against the
code in github, just for the FeatureLocation __eq__ method,
to include the strand, ref and ref_db properties?

Thanks,

Peter


---------- Forwarded message ----------
From: "Joshua Ismael Haase Hernández" <hahj87 at gmail.com>
To: biopython-dev at biopython.org
Date: Mon, 17 Oct 2011 01:06:17 -0500
Subject: [patch] SeqFeature comparison for equality
Hi there.

I was working on a testcase for a custom program
which should extract the same features I had planned.

Since SeqFeature lacks a comparison method, there is no
easy way to test

for feature in test_gene.features:
    self.assertIn(feature, myparser(file).features)

So I added comparison methods and they work fine.

Patch attached. My changes are under Biopython license.


From p.j.a.cock at googlemail.com  Mon Oct 17 15:03:42 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 17 Oct 2011 16:03:42 +0100
Subject: [Biopython-dev] Bio.File
In-Reply-To: <CAKVJ-_6xwM_Sa4OkqzHaaGia5od_Q4ansD3swPFRBp1v1LUguw@mail.gmail.com>
References: <CAKVJ-_4RxzutuBZd_iC09=RsamAYY1z6K0Ksrsy9ukgPupvhFQ@mail.gmail.com>
	<1315493349.29125.YahooMailClassic@web161216.mail.bf1.yahoo.com>
	<CAKVJ-_6xwM_Sa4OkqzHaaGia5od_Q4ansD3swPFRBp1v1LUguw@mail.gmail.com>
Message-ID: <CAKVJ-_5PAFmgG+0axCCNnqexTnU_94ay4JTWTT_gU-aEOioFjQ@mail.gmail.com>

Hi Michiel,

Regarding code using Bio.File, which you asked about
deprecating last month:
http://lists.open-bio.org/pipermail/biopython-dev/2011-September/009144.html

I objected at the time because I was using it for the
TogoWS code I was working on,

On Thu, Sep 8, 2011 at 4:25 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
On Wed, Sep 7, 2011 at 3:36 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>> If the server could be relied on to always give an
>>> HTTP error code this wouldn't be needed:
>>>
>>> https://github.com/peterjc/biopython/blob/togows/Bio/TogoWS/__init__.py
>>>
>
> ...
>
> [Some of those TogoWS checks are probably superfluous
> right now, I'm still polishing the error handling - some of
> which will rely on TogoWS itself catching more conditions]

I've updated my TogoWS to rely on the HTTP error codes,
and removed the heuristic error detection which required
Bio.File for the UndoHandle. That seems to be working fine
now.

That leaves Bio/SCOP/__init__.py as the only existing or
imminent code using Bio.File, so if we can sort that out,
we can deprecate Bio.File as you suggested.

Regards,

Peter


From anaryin at gmail.com  Mon Oct 17 15:13:37 2011
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Mon, 17 Oct 2011 17:13:37 +0200
Subject: [Biopython-dev] Bio.File
In-Reply-To: <CAKVJ-_5PAFmgG+0axCCNnqexTnU_94ay4JTWTT_gU-aEOioFjQ@mail.gmail.com>
References: <CAKVJ-_4RxzutuBZd_iC09=RsamAYY1z6K0Ksrsy9ukgPupvhFQ@mail.gmail.com>
	<1315493349.29125.YahooMailClassic@web161216.mail.bf1.yahoo.com>
	<CAKVJ-_6xwM_Sa4OkqzHaaGia5od_Q4ansD3swPFRBp1v1LUguw@mail.gmail.com>
	<CAKVJ-_5PAFmgG+0axCCNnqexTnU_94ay4JTWTT_gU-aEOioFjQ@mail.gmail.com>
Message-ID: <CAJ9sUYNdnM1DGVCJ1WUCTfLT=Skedg=3H5YVyiSww0CbAEnbJg@mail.gmail.com>

Hey Peter, all,

Sorry to peek in. I was going over some code lately together with Eric and
he suggested I use Bio.File as it was done in plenty of Bio.*IO modules.

What is this deprecation about then?

Cheers,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao


2011/10/17 Peter Cock <p.j.a.cock at googlemail.com>

> Hi Michiel,
>
> Regarding code using Bio.File, which you asked about
> deprecating last month:
>
> http://lists.open-bio.org/pipermail/biopython-dev/2011-September/009144.html
>
> I objected at the time because I was using it for the
> TogoWS code I was working on,
>
> On Thu, Sep 8, 2011 at 4:25 PM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
> On Wed, Sep 7, 2011 at 3:36 PM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
> >>> If the server could be relied on to always give an
> >>> HTTP error code this wouldn't be needed:
> >>>
> >>>
> https://github.com/peterjc/biopython/blob/togows/Bio/TogoWS/__init__.py
> >>>
> >
> > ...
> >
> > [Some of those TogoWS checks are probably superfluous
> > right now, I'm still polishing the error handling - some of
> > which will rely on TogoWS itself catching more conditions]
>
> I've updated my TogoWS to rely on the HTTP error codes,
> and removed the heuristic error detection which required
> Bio.File for the UndoHandle. That seems to be working fine
> now.
>
> That leaves Bio/SCOP/__init__.py as the only existing or
> imminent code using Bio.File, so if we can sort that out,
> we can deprecate Bio.File as you suggested.
>
> Regards,
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


From p.j.a.cock at googlemail.com  Mon Oct 17 15:44:35 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 17 Oct 2011 16:44:35 +0100
Subject: [Biopython-dev] Bio.File
In-Reply-To: <CAJ9sUYNdnM1DGVCJ1WUCTfLT=Skedg=3H5YVyiSww0CbAEnbJg@mail.gmail.com>
References: <CAKVJ-_4RxzutuBZd_iC09=RsamAYY1z6K0Ksrsy9ukgPupvhFQ@mail.gmail.com>
	<1315493349.29125.YahooMailClassic@web161216.mail.bf1.yahoo.com>
	<CAKVJ-_6xwM_Sa4OkqzHaaGia5od_Q4ansD3swPFRBp1v1LUguw@mail.gmail.com>
	<CAKVJ-_5PAFmgG+0axCCNnqexTnU_94ay4JTWTT_gU-aEOioFjQ@mail.gmail.com>
	<CAJ9sUYNdnM1DGVCJ1WUCTfLT=Skedg=3H5YVyiSww0CbAEnbJg@mail.gmail.com>
Message-ID: <CAKVJ-_6+3h16+orc3oN47OGoXg2RFNWspxZLpvUivRZy+OKALg@mail.gmail.com>

On Mon, Oct 17, 2011 at 4:13 PM, João Rodrigues <anaryin at gmail.com> wrote:
> Hey Peter, all,
> Sorry to peek in. I was going over some code lately together with Eric and
> he suggested I use Bio.File as it was done in plenty of Bio.*IO modules.
> What is this deprecation about then?
> Cheers,

Hi João,

Perhaps you misunderstood Eric, Bio.File is not used widely at all.
See Michiel's email at the start of this thread:
http://lists.open-bio.org/pipermail/biopython-dev/2011-September/009144.html

Peter


From anaryin at gmail.com  Mon Oct 17 16:10:40 2011
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Mon, 17 Oct 2011 18:10:40 +0200
Subject: [Biopython-dev] Bio.File
In-Reply-To: <CAKVJ-_6+3h16+orc3oN47OGoXg2RFNWspxZLpvUivRZy+OKALg@mail.gmail.com>
References: <CAKVJ-_4RxzutuBZd_iC09=RsamAYY1z6K0Ksrsy9ukgPupvhFQ@mail.gmail.com>
	<1315493349.29125.YahooMailClassic@web161216.mail.bf1.yahoo.com>
	<CAKVJ-_6xwM_Sa4OkqzHaaGia5od_Q4ansD3swPFRBp1v1LUguw@mail.gmail.com>
	<CAKVJ-_5PAFmgG+0axCCNnqexTnU_94ay4JTWTT_gU-aEOioFjQ@mail.gmail.com>
	<CAJ9sUYNdnM1DGVCJ1WUCTfLT=Skedg=3H5YVyiSww0CbAEnbJg@mail.gmail.com>
	<CAKVJ-_6+3h16+orc3oN47OGoXg2RFNWspxZLpvUivRZy+OKALg@mail.gmail.com>
Message-ID: <CAJ9sUYOPZy0Lk4PnSBkHb_maAEeBDtN+q-RUynEGxQwq4PK0eA@mail.gmail.com>

Hi Peter,

To be honest, I didn't see much of a point to use the module but for
consistency's sake.

I grep'ed Bio.File in my biopython dir and I got a few more modules with
Bio.File, don't know if you were aware.

Bio/Application/__init__.py:from Bio import File
Bio/Blast/NCBIStandalone.py:from Bio import File
Bio/PDB/parse_pdb_header.py:from Bio import File
Bio/Phylo/_io.py:from Bio import File
Bio/SCOP/__init__.py:    from Bio import File


Just wanting to clear my doubts about this, thanks!

Cheers,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao


2011/10/17 Peter Cock <p.j.a.cock at googlemail.com>

> On Mon, Oct 17, 2011 at 4:13 PM, João Rodrigues <anaryin at gmail.com> wrote:
> > Hey Peter, all,
> > Sorry to peek in. I was going over some code lately together with Eric
> and
> > he suggested I use Bio.File as it was done in plenty of Bio.*IO modules.
> > What is this deprecation about then?
> > Cheers,
>
> Hi João,
>
> Perhaps you misunderstood Eric, Bio.File is not used widely at all.
> See Michiel's email at the start of this thread:
>
> http://lists.open-bio.org/pipermail/biopython-dev/2011-September/009144.html
>
> Peter
>


From p.j.a.cock at googlemail.com  Mon Oct 17 16:26:14 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 17 Oct 2011 17:26:14 +0100
Subject: [Biopython-dev] Bio.File
In-Reply-To: <CAJ9sUYOPZy0Lk4PnSBkHb_maAEeBDtN+q-RUynEGxQwq4PK0eA@mail.gmail.com>
References: <CAKVJ-_4RxzutuBZd_iC09=RsamAYY1z6K0Ksrsy9ukgPupvhFQ@mail.gmail.com>
	<1315493349.29125.YahooMailClassic@web161216.mail.bf1.yahoo.com>
	<CAKVJ-_6xwM_Sa4OkqzHaaGia5od_Q4ansD3swPFRBp1v1LUguw@mail.gmail.com>
	<CAKVJ-_5PAFmgG+0axCCNnqexTnU_94ay4JTWTT_gU-aEOioFjQ@mail.gmail.com>
	<CAJ9sUYNdnM1DGVCJ1WUCTfLT=Skedg=3H5YVyiSww0CbAEnbJg@mail.gmail.com>
	<CAKVJ-_6+3h16+orc3oN47OGoXg2RFNWspxZLpvUivRZy+OKALg@mail.gmail.com>
	<CAJ9sUYOPZy0Lk4PnSBkHb_maAEeBDtN+q-RUynEGxQwq4PK0eA@mail.gmail.com>
Message-ID: <CAKVJ-_7Luo9wz=cGrnm9kSU_3RNPSfdw-0=quiajUMVE9qQBQw@mail.gmail.com>

On Mon, Oct 17, 2011 at 5:10 PM, João Rodrigues <anaryin at gmail.com> wrote:
> Hi Peter,
> To be honest, I didn't see much of a point to use the module but for
> consistency's sake.

Michiel's point was [at that time] there was very little useful
code if any in Bio.File, so could we deprecate it?

> I grep'ed Bio.File in my biopython dir and I got a few more modules
> with Bio.File, don't know if you were aware.
>
> Bio/Application/__init__.py:from Bio import File
> Bio/Blast/NCBIStandalone.py:from Bio import File
> Bio/PDB/parse_pdb_header.py:from Bio import File
> Bio/Phylo/_io.py:from Bio import File
> Bio/SCOP/__init__.py:    from Bio import File
>
> Just wanting to clear my doubts about this, thanks!
> Cheers,

Oh - I remember now. We recently added the as_handle
context manager to Bio.File, and that is a useful bit of
functionality of general interest.

At the time I had forgotten about Michiel's suggestion
we deprecate Bio.File, which is unfortunate, but we
can still change this before our next release.

So, should we keep Bio.File for as_handle (even if
everything else in Bio.File is to be deprecated), or
should we move the new as_handle functionality
somewhere else and deprecate all of Bio.File?

Thanks for double checking João,

Peter


From anaryin at gmail.com  Mon Oct 17 17:21:28 2011
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Mon, 17 Oct 2011 19:21:28 +0200
Subject: [Biopython-dev] Bio.File
In-Reply-To: <CAKVJ-_7Luo9wz=cGrnm9kSU_3RNPSfdw-0=quiajUMVE9qQBQw@mail.gmail.com>
References: <CAKVJ-_4RxzutuBZd_iC09=RsamAYY1z6K0Ksrsy9ukgPupvhFQ@mail.gmail.com>
	<1315493349.29125.YahooMailClassic@web161216.mail.bf1.yahoo.com>
	<CAKVJ-_6xwM_Sa4OkqzHaaGia5od_Q4ansD3swPFRBp1v1LUguw@mail.gmail.com>
	<CAKVJ-_5PAFmgG+0axCCNnqexTnU_94ay4JTWTT_gU-aEOioFjQ@mail.gmail.com>
	<CAJ9sUYNdnM1DGVCJ1WUCTfLT=Skedg=3H5YVyiSww0CbAEnbJg@mail.gmail.com>
	<CAKVJ-_6+3h16+orc3oN47OGoXg2RFNWspxZLpvUivRZy+OKALg@mail.gmail.com>
	<CAJ9sUYOPZy0Lk4PnSBkHb_maAEeBDtN+q-RUynEGxQwq4PK0eA@mail.gmail.com>
	<CAKVJ-_7Luo9wz=cGrnm9kSU_3RNPSfdw-0=quiajUMVE9qQBQw@mail.gmail.com>
Message-ID: <CAJ9sUYOBqHDXKyRLbqTARFADp2xmYrrhhx+h9tWKP+opMDgiDA@mail.gmail.com>

> At the time I had forgotten about Michiel's suggestion
> we deprecate Bio.File, which is unfortunate, but we
> can still change this before our next release.
>
> So, should we keep Bio.File for as_handle (even if
> everything else in Bio.File is to be deprecated), or
> should we move the new as_handle functionality
> somewhere else and deprecate all of Bio.File.
>

I think it doesn't make sense to keep the module for 5 lines of code.

    if isinstance(handleish, basestring):
        with open(handleish, mode) as fp:
            yield fp
    else:
        yield handleish

I'd either place them in __init__.py or just insert them in all Bio.*IO
modules wherever needed. If we had more snippets in common with all *IOs, it
would be valuable and understandable to have a separate module, but as is
it's a bit unnecessary IMHO.
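
For reference, a self-contained version of those five lines might look like the sketch below. It is written for Python 3, so str replaces basestring, and the signature is inferred from the variable names in the snippet above rather than copied from Bio.File.

```python
import contextlib


@contextlib.contextmanager
def as_handle(handleish, mode="r"):
    """Yield an open handle for either a filename or an existing handle.

    A filename is opened here and closed again on exit; a handle that
    was passed in is left open for the caller to manage.
    """
    if isinstance(handleish, str):
        with open(handleish, mode) as fp:
            yield fp
    else:
        yield handleish
```

A parser can then write `with as_handle(source) as handle:` and accept both kinds of argument.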


>
> Thanks for double checking João,
>

No problem.

Cheers,

João


From hahj87 at gmail.com  Mon Oct 17 17:57:53 2011
From: hahj87 at gmail.com (=?ISO-8859-1?Q?Joshua_Ismael_Haase_Hern=E1ndez?=)
Date: Mon, 17 Oct 2011 12:57:53 -0500
Subject: [Biopython-dev] [biopython-dev] SeqFeature comparison for
	equality
In-Reply-To: <CAKVJ-_7dRPfSbwPBW0t4rOjWhy1HzVJJG-Jw47ML1hR8o+GUHw@mail.gmail.com>
References: <CAKVJ-_5qZBLbDaFMqUQ_KegWLK2Pt_BsHU=pPmt_bGLzJGyUiQ@mail.gmail.com>
	<CA+ypG2Z=ZjZkMhuaC9A61dF=oZ2s2hhuLSCi38R2CpkWFGNogQ@mail.gmail.com>
	<CAKVJ-_7dRPfSbwPBW0t4rOjWhy1HzVJJG-Jw47ML1hR8o+GUHw@mail.gmail.com>
Message-ID: <CA+ypG2Y9Qo2iXr0HEwZion-6zffSQjB1qAkqGTgG2_JLP+G0jQ@mail.gmail.com>

On 17 October 2011 12:15, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Hi Joshua,
>
> Could you CC the biopython-dev mailing list, unless you
> specifically want to discuss something in private?
>

Sorry about that, I thought I was replying to the mailing list.

>
> 2011/10/17 Joshua Ismael Haase Hernández <hahj87 at gmail.com>:
> > I'm on it.
> >
> > Will add __eq__ to FeatureLocation on trunk.
>
> Great.
>
> In the short term, you can just work on it directly with a copy of the
> official repository and send me a patch (use git diff > file.patch)
>
> The "best" way is to fork biopython on github, and create your
> own branch with these changes.
>
> > I think BeforeLocation should check if the second is before,
> > After check if it is after, etc, and this can be done in locations.
> >
> > Before I implement those: do you agree?
> >
> > In that case, AbstractLocation instances
> > should check if ExactLocation instances are
> > inside their range, and AbstractLocation
> > instances to be exactly the same.
>
>
These positions would be the same:

OneOfPosition(5, 11, 15),
ExactPosition(11),
AfterPosition(4),
BeforePosition(16),
WithinPosition(5, 16),


> No. Having tried this myself, it is very complicated.
>

I think I'm missing something: why is it hard?
I see it as a list of cases.


> Also, there are constraints with the Python language
> about equality, hashing and comparisons (e.g. for
> membership in lists, or use as dictionary keys).
>

I don't think anyone should use Features as dictionary keys;
they will use the Feature ID for that. But maybe someone wants a
set of features (which right now is like a list of all sequences)...

In which cases would that be a problem? (I'm a biotechnology
engineer, so I don't see all the caveats, and I don't really have
a deep understanding of how Python works.)

> The current behaviour of simple comparison of
> the positions as an integer is at least simple.
>
> > About SeqFeature, I think they should be
> > the same if they share all locations.
>
> You don't care about feature type and ID?  ;)
>

Maybe not. A comparison could skip iterating over
the locations if we have the same type and ID.
I'm still not sure that's a good method (thus the comment
"# Can we trust this?" in my patch), but a 'CDS' feature
is sometimes equivalent to an 'mRNA' feature,
in which case the ID and type would both be different
in the SeqFeatures.

>
> Peter
>


From p.j.a.cock at googlemail.com  Mon Oct 17 18:07:27 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 17 Oct 2011 19:07:27 +0100
Subject: [Biopython-dev] [biopython-dev] SeqFeature comparison for
	equality
In-Reply-To: <CA+ypG2Y9Qo2iXr0HEwZion-6zffSQjB1qAkqGTgG2_JLP+G0jQ@mail.gmail.com>
References: <CAKVJ-_5qZBLbDaFMqUQ_KegWLK2Pt_BsHU=pPmt_bGLzJGyUiQ@mail.gmail.com>
	<CA+ypG2Z=ZjZkMhuaC9A61dF=oZ2s2hhuLSCi38R2CpkWFGNogQ@mail.gmail.com>
	<CAKVJ-_7dRPfSbwPBW0t4rOjWhy1HzVJJG-Jw47ML1hR8o+GUHw@mail.gmail.com>
	<CA+ypG2Y9Qo2iXr0HEwZion-6zffSQjB1qAkqGTgG2_JLP+G0jQ@mail.gmail.com>
Message-ID: <CAKVJ-_4LY9NBQY1TerBprjVWFHS9FYpD50CZRG0XXOoZUNtdpQ@mail.gmail.com>

2011/10/17 Joshua Ismael Haase Hernández <hahj87 at gmail.com>:
> ...
>
> These positions would be the same:
>
> OneOfPosition(5, 11, 15),
> ExactPosition(11),
> AfterPosition(4),
> BeforePosition(16),
> WithinPosition(5, 16),

I don't understand what you are asking here. Those
positions do not look the same to me.

>>
>> No. Having tried this myself, it is very complicated.
>
> I think I'm missing something: why is it hard?
> I see it as a list of cases.

Well, try it and write lots of unit tests, and I'll review it.

>>
>> Also, there are constraints with the Python language
>> about equality, hashing and comparisons (e.g. for
>> membership in lists, or use as dictionary keys).
>
> I don't think anyone should use Features as dictionary keys;
> they will use the Feature ID for that. But maybe someone wants a
> set of features (which right now is like a list of all sequences)...
>
> In which cases would that be a problem? (I'm a biotechnology
> engineer, so I don't see all the caveats, and I don't really have
> a deep understanding of how Python works.)

Using positions as dictionary keys seems reasonable.

Using a SeqFeature as a key is not possible as they
are mutable objects.
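
The constraint behind this can be sketched in plain Python: objects that
compare equal are expected to hash equal, and in Python 3 a class that
defines __eq__ without __hash__ becomes unhashable. MutableFeature below is
a hypothetical stand-in for illustration, not Biopython's SeqFeature.

```python
class MutableFeature:
    """Hypothetical stand-in for a mutable feature with value equality."""

    def __init__(self, type, start, end):
        self.type = type
        self.start = start
        self.end = end

    def __eq__(self, other):
        if not isinstance(other, MutableFeature):
            return NotImplemented
        return (self.type, self.start, self.end) == (
            other.type, other.start, other.end)


f1 = MutableFeature("CDS", 10, 50)
f2 = MutableFeature("CDS", 10, 50)
same = f1 == f2          # value equality
in_list = f2 in [f1]     # list membership relies on __eq__

try:
    lookup = {f1: "annotation"}
    hashable = True
except TypeError:
    # Defining __eq__ without __hash__ sets __hash__ to None in Python 3.
    hashable = False
```

Adding a __hash__ based on the same fields would make the object usable as
a key, but the key would silently go stale if the feature were edited
afterwards, which is the usual argument against hashing mutable objects.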

>> The current behaviour of simple comparison of
>> the positions as an integer is at least simple.
>>
>> > About SeqFeature, I think they should be
>> > the same if they share all locations.
>>
>> You don't care about feature type and ID?  ;)
>
> Maybe not. A comparison could skip iterating over
> the locations if we have the same type and ID.
> I'm still not sure that's a good method (thus the comment
> "# Can we trust this?" in my patch), but a 'CDS' feature
> is sometimes equivalent to an 'mRNA' feature,
> in which case the ID and type would both be different
> in the SeqFeatures.

A gene, mRNA and CDS might all have the same
position, but they are different features.

Peter


From hahj87 at gmail.com  Mon Oct 17 18:27:19 2011
From: hahj87 at gmail.com (=?ISO-8859-1?Q?Joshua_Ismael_Haase_Hern=E1ndez?=)
Date: Mon, 17 Oct 2011 13:27:19 -0500
Subject: [Biopython-dev] [biopython-dev] SeqFeature comparison for
	equality
In-Reply-To: <CAKVJ-_4LY9NBQY1TerBprjVWFHS9FYpD50CZRG0XXOoZUNtdpQ@mail.gmail.com>
References: <CAKVJ-_5qZBLbDaFMqUQ_KegWLK2Pt_BsHU=pPmt_bGLzJGyUiQ@mail.gmail.com>
	<CA+ypG2Z=ZjZkMhuaC9A61dF=oZ2s2hhuLSCi38R2CpkWFGNogQ@mail.gmail.com>
	<CAKVJ-_7dRPfSbwPBW0t4rOjWhy1HzVJJG-Jw47ML1hR8o+GUHw@mail.gmail.com>
	<CA+ypG2Y9Qo2iXr0HEwZion-6zffSQjB1qAkqGTgG2_JLP+G0jQ@mail.gmail.com>
	<CAKVJ-_4LY9NBQY1TerBprjVWFHS9FYpD50CZRG0XXOoZUNtdpQ@mail.gmail.com>
Message-ID: <CA+ypG2a8G2+fn3HxNZ62SkKGjAJeZjEpOoVHhxufhwMYd1dQ6g@mail.gmail.com>

On 17 October 2011 13:07, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> 2011/10/17 Joshua Ismael Haase Hernández <hahj87 at gmail.com>:
> > ...
> >
> > These positions would be the same:
> >
> > OneOfPosition(5, 11, 15),
> > ExactPosition(11),
> > AfterPosition(4),
> > BeforePosition(16),
> > WithinPosition(5, 16),
>
> I don't understand what you are asking here. Those
> positions do not look the same to me.
>
>
They are not *exactly* the same, but, apart from
AfterPosition and BeforePosition:
ExactPosition(11) is included in OneOfPosition(5, 11, 15),
ExactPosition(11) is after AfterPosition(4),
ExactPosition(11) is before BeforePosition(16),
ExactPosition(11) is included in WithinPosition(5, 16).
All positions in OneOfPosition are before BeforePosition,
after AfterPosition and within WithinPosition, and they include
ExactPosition.
All positions in WithinPosition are after AfterPosition and
before BeforePosition.

BeforePosition and AfterPosition can't be equal.
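
One reason that treating "compatible" fuzzy positions as equal gets
complicated is that compatibility is not transitive. A rough sketch,
modelling each position as a hypothetical set of candidate integers (the
variable names and the set representation are assumptions for illustration,
none of this is Biopython API):

```python
# Each fuzzy position is modelled as the set of integers it could stand for.
exact_11 = {11}
exact_12 = {12}
one_of_5_11_15 = {5, 11, 15}
within_5_16 = set(range(5, 17))  # 5 to 16 inclusive


def compatible(a, b):
    """True if the two fuzzy positions share at least one candidate."""
    return bool(a & b)


checks = (
    compatible(exact_11, within_5_16),     # True
    compatible(within_5_16, exact_12),     # True
    compatible(exact_11, exact_12),        # False
    compatible(exact_11, one_of_5_11_15),  # True
)
```

An __eq__ built on this relation would make 11 equal to the range and the
range equal to 12 while 11 != 12, which breaks what list membership, sets
and dictionary lookups expect from equality.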

How should I name the TestCases?


From p.j.a.cock at googlemail.com  Mon Oct 17 19:03:15 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 17 Oct 2011 20:03:15 +0100
Subject: [Biopython-dev] [biopython-dev] SeqFeature comparison for
	equality
In-Reply-To: <CA+ypG2a8G2+fn3HxNZ62SkKGjAJeZjEpOoVHhxufhwMYd1dQ6g@mail.gmail.com>
References: <CAKVJ-_5qZBLbDaFMqUQ_KegWLK2Pt_BsHU=pPmt_bGLzJGyUiQ@mail.gmail.com>
	<CA+ypG2Z=ZjZkMhuaC9A61dF=oZ2s2hhuLSCi38R2CpkWFGNogQ@mail.gmail.com>
	<CAKVJ-_7dRPfSbwPBW0t4rOjWhy1HzVJJG-Jw47ML1hR8o+GUHw@mail.gmail.com>
	<CA+ypG2Y9Qo2iXr0HEwZion-6zffSQjB1qAkqGTgG2_JLP+G0jQ@mail.gmail.com>
	<CAKVJ-_4LY9NBQY1TerBprjVWFHS9FYpD50CZRG0XXOoZUNtdpQ@mail.gmail.com>
	<CA+ypG2a8G2+fn3HxNZ62SkKGjAJeZjEpOoVHhxufhwMYd1dQ6g@mail.gmail.com>
Message-ID: <CAKVJ-_6DRB5aN0WzuVoNF7iufFdamkOdEhZxJot=yYWR5dMZwg@mail.gmail.com>

2011/10/17 Joshua Ismael Haase Hernández <hahj87 at gmail.com>:
>
>
> On 17 October 2011 13:07, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>>
>> 2011/10/17 Joshua Ismael Haase Hernández <hahj87 at gmail.com>:
>> > ...
>> >
>> > These positions would be the same:
>> >
>> > OneOfPosition(5, 11, 15),
>> > ExactPosition(11),
>> > AfterPosition(4),
>> > BeforePosition(16),
>> > WithinPosition(5, 16),
>>
>> I don't understand what you are asking here. Those
>> positions do not look the same to me.
>>
>
> They are not *exactly* the same, but, apart from
> AfterPosition and BeforePosition:
> ExactPosition(11) is included in OneOfPosition(5, 11, 15),
> ExactPosition(11) is after AfterPosition(4),
> ExactPosition(11) is before BeforePosition(16),
> ExactPosition(11) is included in WithinPosition(5, 16).
> All positions in OneOfPosition are before BeforePosition,
> after AfterPosition and within WithinPosition, and they include
> ExactPosition.
> All positions in WithinPosition are after AfterPosition and
> before BeforePosition.
> BeforePosition and AfterPosition can't be equal.
>

It might help if you wrote these out explicitly,
e.g. currently:

    >>> from Bio.SeqFeature import *
    >>> a = BeforePosition(10)
    >>> b = AfterPosition(10)
    >>> a == b == 10
    True

Currently BeforePosition and AfterPosition act like
the integer position for comparison etc. I find this
reasonable given we have to treat them as the
integer for things like extracting the sequence.
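
That behaviour follows naturally if a fuzzy position is an integer that
merely remembers its fuzziness. A sketch with the hypothetical classes
Before and After (stand-ins written for illustration, not the
Bio.SeqFeature implementation):

```python
class Before(int):
    """Hypothetical fuzzy position: somewhere before the given integer."""

    def __repr__(self):
        return "Before(%i)" % int(self)


class After(int):
    """Hypothetical fuzzy position: somewhere after the given integer."""

    def __repr__(self):
        return "After(%i)" % int(self)


a = Before(10)
b = After(10)
all_equal = a == b == 10  # integer comparison ignores the fuzziness

# Because they are ints, they can be used directly to slice a sequence.
seq = "ACGTACGTACGTACGT"
fragment = seq[Before(2):After(10)]
```

The fuzziness survives in the type and the repr, so code that cares can
still inspect it, while comparison and slicing stay as simple as with
plain integers.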

> How should I name the TestCases?
>

Something like test_SeqFeature.py, using
unittest. Most existing tests in this area are in
doctests and test_SeqIO_feature.py.

Peter


From andrea at biocomp.unibo.it  Tue Oct 18 12:59:05 2011
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Tue, 18 Oct 2011 14:59:05 +0200 (CEST)
Subject: [Biopython-dev] SeqFeature comparison for equality
In-Reply-To: <mailman.4400.1318878198.3053.biopython-dev@lists.open-bio.org>
References: <mailman.4400.1318878198.3053.biopython-dev@lists.open-bio.org>
Message-ID: <bd8739699efa70a9c038b92ac5a9c3c0.squirrel@lipid.biocomp.unibo.it>

Hi,
I don't know if this can help,
but I've been subclassing SeqFeature and SeqRecord objects to assert
equalities.
I've attached the very simple code for the SeqFeature equality.
Handling complex location equalities with a given set of rules could be
misleading.
A feature starting at position 11 is different, for me, from one located
at position 12.
Andrea


-------------- next part --------------
A non-text attachment was scrubbed...
Name: seqfeature_eq.py
Type: text/x-python-script
Size: 1505 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20111018/d7133fd9/attachment-0002.bin>

From p.j.a.cock at googlemail.com  Tue Oct 18 13:20:34 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 18 Oct 2011 14:20:34 +0100
Subject: [Biopython-dev] SeqFeature comparison for equality
In-Reply-To: <bd8739699efa70a9c038b92ac5a9c3c0.squirrel@lipid.biocomp.unibo.it>
References: <mailman.4400.1318878198.3053.biopython-dev@lists.open-bio.org>
	<bd8739699efa70a9c038b92ac5a9c3c0.squirrel@lipid.biocomp.unibo.it>
Message-ID: <CAKVJ-_6o6b1L4fYB25JV-CJWXO+MiMJn-FoNGwjLJ5tP4SfAzw@mail.gmail.com>

On Tue, Oct 18, 2011 at 1:59 PM, Andrea Pierleoni
<andrea at biocomp.unibo.it> wrote:
> Hi,
> I don't know if this can help,
> but I've been subclassing SeqFeature and SeqRecord objects to assert
> equalities.
> I've attached the very simple code for the SeqFeature equality.
> Handling complex location equalities with a given set of rules could be
> misleading.
> A feature starting at position 11 is different, for me, from one located
> at position 12.
>
> Andrea

That looks reasonable for basic SeqFeature comparison, although
comparing the annotations in the qualifiers dict is debatable
(as with SeqRecord object's annotation).

Given the way join locations (etc) are currently handled, it
would be important to also compare the sub-features.

I think it would be more practical to first (and perhaps only)
implement equality testing for FeatureLocation (checking
start, end, strand, ref and db_ref), then you can compare
the location of a SeqFeature easily with:

f1.location == f2.location.
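
A minimal sketch of such an equality test, using a hypothetical Location
stand-in with the five attributes listed above (the attribute names follow
this email; the real FeatureLocation may name or store them differently):

```python
class Location:
    """Hypothetical stand-in for a feature location."""

    def __init__(self, start, end, strand=None, ref=None, db_ref=None):
        self.start = start
        self.end = end
        self.strand = strand
        self.ref = ref
        self.db_ref = db_ref

    def _key(self):
        # Everything that must match for two locations to be equal.
        return (self.start, self.end, self.strand, self.ref, self.db_ref)

    def __eq__(self, other):
        if not isinstance(other, Location):
            return NotImplemented
        return self._key() == other._key()

    def __ne__(self, other):
        # Python 2 does not derive != from ==, so spell it out.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result


same = Location(10, 50, strand=1) == Location(10, 50, strand=1)
different = Location(10, 50, strand=1) != Location(10, 50, strand=-1)
```

Keeping the comparison to an exact match on a fixed tuple of fields avoids
the fuzzy-matching questions raised earlier in the thread.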

Peter


From carlcrott at gmail.com  Tue Oct 18 16:18:39 2011
From: carlcrott at gmail.com (carl crott)
Date: Tue, 18 Oct 2011 12:18:39 -0400
Subject: [Biopython-dev] fixes on the tutorials
In-Reply-To: <CAKVJ-_5y8Fn6xFJn0uU_X=9Vn+oCA6t3TFNs+GDNHVXwd-jkZQ@mail.gmail.com>
References: <CAE2MQmAgUJHy8-fkoYZQ=Q=ryNa5Gt2ie-qPw4BudpxLBDJRaQ@mail.gmail.com>
	<CAKVJ-_4w+R-=PeUdCSXOkG7G4XxPNyP2G+rnW8bcgsumKW9PXA@mail.gmail.com>
	<CAE2MQmB3OuuVHkFcoQmujccK0aKqBfFy308BkjJumsP0vVCrzg@mail.gmail.com>
	<CAKVJ-_6yUFwQ0U6XYEDvGS5OxXN7TQqD1Vwd3qB7QTWTiYoKxw@mail.gmail.com>
	<CAKVJ-_5y8Fn6xFJn0uU_X=9Vn+oCA6t3TFNs+GDNHVXwd-jkZQ@mail.gmail.com>
Message-ID: <CAE2MQmDTSCnoSEKGNLBb79fL87U+rZT0rOi3c_q9BLEJqmYHNg@mail.gmail.com>

Peter and other devs,

I'm deeply interested in any kind of HMM application. As I'm not quite a
biologist, if you guys want to 'sic me' on any particular bug related to
these, let me know. However, as far as the Git stuff goes, that would be more
about the control of updates and merging of all the code that you guys work on
separately.

toodles!

-Carl


On Tue, Oct 18, 2011 at 5:36 AM, Peter Cock <p.j.a.cock at googlemail.com>wrote:

> On Mon, Oct 17, 2011 at 2:34 PM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
> > ...
> >
> > P.S. Don't forget to CC the mailing list ;)
>
> Apologies for posting that to the wrong development mailing list
> (samtools rather than biopython), I need to be more careful with
> autocomplete.
>
> Peter
>


-- 
Carl Crott
Web Applications Engineer
www.black-glass.com
412-610-0600


From mjldehoon at yahoo.com  Wed Oct 19 02:39:53 2011
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 18 Oct 2011 19:39:53 -0700 (PDT)
Subject: [Biopython-dev] Bio.File
In-Reply-To: <CAKVJ-_5PAFmgG+0axCCNnqexTnU_94ay4JTWTT_gU-aEOioFjQ@mail.gmail.com>
Message-ID: <1318991993.91732.YahooMailClassic@web161201.mail.bf1.yahoo.com>

Hi Peter,

> That leaves Bio/SCOP/__init__.py as the only existing or
> imminent code using Bio.File, so if we can sort that out,
> we can deprecate Bio.File as you suggested.

In Bio/SCOP/__init__.py, Bio.File.UndoHandle is used in the _open function, which is an internal function used in the "search" function in Bio.SCOP. The UndoHandle is used to wrap a handle returned by urllib.urlopen.

This search function returns a handle to data in HTML format. I don't think we have a parser for it. This suggests that there is no specific purpose for UndoHandle in Bio.SCOP._open.

So I would suggest to just remove the UndoHandle from Bio.SCOP._open and return the urllib.urlopen handle directly.
Any objections?
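
For context, an undo handle is a thin wrapper that lets a parser look at a
line and then push it back. A rough sketch of the idea, using the
hypothetical class PeekableHandle; its method names echo the
saveline/peekline idea, but this is written for illustration and is not the
Bio.File code:

```python
import io


class PeekableHandle:
    """Hypothetical wrapper allowing lines to be pushed back onto a handle."""

    def __init__(self, handle):
        self._handle = handle
        self._saved = []  # lines pushed back, next-to-read first

    def readline(self):
        # Serve pushed-back lines before touching the underlying handle.
        if self._saved:
            return self._saved.pop(0)
        return self._handle.readline()

    def saveline(self, line):
        """Push a line back so the next readline() returns it."""
        if line:
            self._saved.insert(0, line)

    def peekline(self):
        """Look at the next line without consuming it."""
        line = self.readline()
        self.saveline(line)
        return line


handle = PeekableHandle(io.StringIO("<html>\n<body>\n"))
peeked = handle.peekline()
first = handle.readline()
second = handle.readline()
```

If no parser ever needs to peek at the search results, the wrapper adds
nothing over the plain handle.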

--Michiel.

--- On Mon, 10/17/11, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> From: Peter Cock <p.j.a.cock at googlemail.com>
> Subject: Re: [Biopython-dev] Bio.File
> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> Cc: biopython-dev at biopython.org
> Date: Monday, October 17, 2011, 11:03 AM
> Hi Michiel,
> 
> Regarding code using Bio.File, which you asked about
> deprecating last month:
> http://lists.open-bio.org/pipermail/biopython-dev/2011-September/009144.html
> 
> I objected at the time because I was using it for the
> TogoWS code I was working on,
> 
> On Thu, Sep 8, 2011 at 4:25 PM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
> On Wed, Sep 7, 2011 at 3:36 PM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
> >>> If the server could be relied on to always give an
> >>> HTTP error code this wouldn't be needed:
> >>>
> >>> https://github.com/peterjc/biopython/blob/togows/Bio/TogoWS/__init__.py
> >>>
> >
> > ...
> >
> > [Some of those TogoWS checks are probably superfluous
> > right now, I'm still polishing the error handling - some of
> > which will rely on TogoWS itself catching more conditions]
> 
> I've updated my TogoWS to rely on the HTTP error codes,
> and removed the heuristic error detection which required
> Bio.File for the UndoHandle. That seems to be working fine
> now.
> 
> That leaves Bio/SCOP/__init__.py as the only existing or
> imminent code using Bio.File, so if we can sort that out,
> we can deprecate Bio.File as you suggested.
> 
> Regards,
> 
> Peter
> 


From mjldehoon at yahoo.com  Wed Oct 19 02:46:33 2011
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 18 Oct 2011 19:46:33 -0700 (PDT)
Subject: [Biopython-dev] Bio.File
In-Reply-To: <CAJ9sUYOBqHDXKyRLbqTARFADp2xmYrrhhx+h9tWKP+opMDgiDA@mail.gmail.com>
Message-ID: <1318992393.44240.YahooMailClassic@web161204.mail.bf1.yahoo.com>

I agree that it doesn't make sense to have a separate module for this. Even if we put it in Bio/__init__.py, people are likely to forget about it, and we will end up with some modules that use this code in Bio/__init__.py and other modules that copy this code in their source code. As this code is very short, I would just copy it into the modules that use it.

Best,
--Michiel.


--- On Mon, 10/17/11, João Rodrigues <anaryin at gmail.com> wrote:
I think it doesn't make sense to keep the module for 5 lines of code.

    if isinstance(handleish, basestring):
        with open(handleish, mode) as fp:
            yield fp
    else:
        yield handleish


I'd either place them in __init__.py or just insert them in all Bio.*IO modules wherever needed. If we had more snippets in common with all *IOs, it would be valuable and understandable to have a separate module, but as is it's a bit unnecessary IMHO.


From p.j.a.cock at googlemail.com  Wed Oct 19 08:49:27 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 19 Oct 2011 09:49:27 +0100
Subject: [Biopython-dev] Bio.File
In-Reply-To: <1318991993.91732.YahooMailClassic@web161201.mail.bf1.yahoo.com>
References: <CAKVJ-_5PAFmgG+0axCCNnqexTnU_94ay4JTWTT_gU-aEOioFjQ@mail.gmail.com>
	<1318991993.91732.YahooMailClassic@web161201.mail.bf1.yahoo.com>
Message-ID: <CAKVJ-_7ufiA5ibrqCP1WqO+3Tt_b1JeajXJCcx9V0g+xRepLFA@mail.gmail.com>

On Wed, Oct 19, 2011 at 3:39 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi Peter,
>
>> That leaves Bio/SCOP/__init__.py as the only existing or
>> imminent code using Bio.File, so if we can sort that out,
>> we can deprecate Bio.File as you suggested.
>
> In Bio/SCOP/__init__.py, Bio.File.UndoHandle is used in the _open
> function, which is an internal function used in the "search" function
> in Bio.SCOP. The UndoHandle is used to wrap a handle returned
> by urllib.urlopen.

Should we change that to use urllib2 for better error handling,
as in Bio.Entrez's _open?

> This search function returns a handle to data in HTML format.
> I don't think we have a parser for it. This suggests that there is
> no specific purpose for UndoHandle in Bio.SCOP._open.

I wonder if that is a sign of URL rot, it would make more sense
to get plain text back. Sadly there were no unit tests for this at
all until now, and I don't yet do anything with the handle other
than confirm we get one!

https://github.com/biopython/biopython/commit/10b94a7b5611edde5fe05f95406d927e5a6a02d9

> So I would suggest to just remove the UndoHandle from
> Bio.SCOP._open and return the urllib.urlopen handle directly.
>
> Any objections?

Sounds fine.

Peter


From p.j.a.cock at googlemail.com  Wed Oct 19 08:53:25 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 19 Oct 2011 09:53:25 +0100
Subject: [Biopython-dev] Bio.File
In-Reply-To: <1318992393.44240.YahooMailClassic@web161204.mail.bf1.yahoo.com>
References: <CAJ9sUYOBqHDXKyRLbqTARFADp2xmYrrhhx+h9tWKP+opMDgiDA@mail.gmail.com>
	<1318992393.44240.YahooMailClassic@web161204.mail.bf1.yahoo.com>
Message-ID: <CAKVJ-_58X67yqFgy_PhcHgxwzvi4Nm_YjmgJN=C9LOnZOf9VTA@mail.gmail.com>

2011/10/19 Michiel de Hoon <mjldehoon at yahoo.com>
>
> I agree that it doesn't make sense to have a separate module for this.

For just the one little function, maybe not. I suspect we may want
more "File related" things like this for Python 3, what with text vs
binary handles and so on, in which case keeping Bio/File.py is
sensible.

> Even if we put it in Bio/__init__.py, people are likely to forget about
> it, and we will end up with some modules that use this code in
> Bio/__init__.py and other modules that copy this code in their
> source code. As this code is very short, I would just copy it into
> the modules that use it.

It may be short, but duplicating this function all over the place
seems like a very bad idea. I think we should just be vigilant in
making sure it is used uniformly wherever we want to accept
either a handle or a filename. Perhaps some of the historically
handle-only parsers should start using it now?

Peter


From anaryin at gmail.com  Wed Oct 19 11:46:26 2011
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Wed, 19 Oct 2011 13:46:26 +0200
Subject: [Biopython-dev] Bio.File
In-Reply-To: <CAKVJ-_58X67yqFgy_PhcHgxwzvi4Nm_YjmgJN=C9LOnZOf9VTA@mail.gmail.com>
References: <CAJ9sUYOBqHDXKyRLbqTARFADp2xmYrrhhx+h9tWKP+opMDgiDA@mail.gmail.com>
	<1318992393.44240.YahooMailClassic@web161204.mail.bf1.yahoo.com>
	<CAKVJ-_58X67yqFgy_PhcHgxwzvi4Nm_YjmgJN=C9LOnZOf9VTA@mail.gmail.com>
Message-ID: <CAJ9sUYNUuUwTWNrHQogii9pwqjy3oMoAHuxXVE4ewhFFePEAEA@mail.gmail.com>

Hey Peter,


> For just the one little function, maybe not. I suspect we may want
> more "File related" things like this for Python 3, what with text vs
> binary handles and so on, in which case keeping Bio/File.py is
> sensible.
>

What kind of "things" are we talking about here? Could they be anticipated?


>
> It may be short, but duplicating this function all over the place
> seems like a very bad idea. I think we should just be vigilant in
> making sure it is used uniformly wherever we want to accept
> either a handle or a filename. Perhaps some of the historically
> handle-only parsers should start using it now?
>

Duplicating is not a beautiful solution, I must agree, but keeping a module
and adding an import statement in every parser for only 5 lines isn't
either.

I suggest we keep Bio.File, deprecating all the other functions, and
meanwhile look at which changes we could include due to Py3.


From p.j.a.cock at googlemail.com  Wed Oct 19 12:28:03 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 19 Oct 2011 13:28:03 +0100
Subject: [Biopython-dev] Bio.File
In-Reply-To: <CAJ9sUYNUuUwTWNrHQogii9pwqjy3oMoAHuxXVE4ewhFFePEAEA@mail.gmail.com>
References: <CAJ9sUYOBqHDXKyRLbqTARFADp2xmYrrhhx+h9tWKP+opMDgiDA@mail.gmail.com>
	<1318992393.44240.YahooMailClassic@web161204.mail.bf1.yahoo.com>
	<CAKVJ-_58X67yqFgy_PhcHgxwzvi4Nm_YjmgJN=C9LOnZOf9VTA@mail.gmail.com>
	<CAJ9sUYNUuUwTWNrHQogii9pwqjy3oMoAHuxXVE4ewhFFePEAEA@mail.gmail.com>
Message-ID: <CAKVJ-_4iSZAFH8=W71R8obECEsLpNFOm9KcCnwMH5e2q5jedcA@mail.gmail.com>

On Wed, Oct 19, 2011 at 12:46 PM, João Rodrigues <anaryin at gmail.com> wrote:
> Hey Peter,
>
>>
>> For just the one little function, maybe not. I suspect we may want
>> more "File related" things like this for Python 3, what with text vs
>> binary handles and so on, in which case keeping Bio/File.py is
>> sensible.
>
> What kind of "things" are we talking about here? Could they be
> anticipated?
>

For instance, in Python 3 it might be useful for parsing text
files efficiently to use binary mode (i.e. byte strings not unicode)
but also have universal newlines (which I think happens for
you automatically in Python 3 for text mode, i.e. unicode).

Surprisingly open(filename, "rbU") is accepted in Python 3,
but it acts like "rb", typical binary read mode.
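
As a rough illustration of the binary-mode idea (a sketch only; the helper
name iter_lines_binary is made up for this example and is not part of
Biopython): bytes.splitlines() treats \n, \r\n and \r as line boundaries,
so a byte-oriented parser can get universal-newline behaviour without
decoding to unicode. The text-mode read is included for comparison.

```python
import os
import tempfile


def iter_lines_binary(path):
    """Yield lines as bytes, without line endings (hypothetical helper).

    Reads the whole file for simplicity; a real parser would stream it.
    """
    with open(path, "rb") as handle:
        for line in handle.read().splitlines():
            yield line


# Write a small file with mixed Windows, old Mac and Unix line endings.
data = b">seq1\r\nACGT\r>seq2\nTTGA\n"
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(data)

# Binary mode: byte strings, newlines handled by splitlines().
binary_lines = list(iter_lines_binary(path))

# Text mode: Python 3 translates all three newline styles to "\n".
with open(path, "r", encoding="ascii") as handle:
    text_lines = [line.rstrip("\n") for line in handle]

os.remove(path)
```

Both reads give the same four lines, as bytes in the first case and as
unicode strings in the second.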

>> It may be short, but duplicating this function all over the place
>> seems like a very bad idea. I think we should just be vigilant in
>> making sure it is used uniformly wherever we want to accept
>> either a handle or a filename. Perhaps some of the historically
>> handle-only parsers should start using it now?
>
> Duplicating is not a beautiful solution, I must agree, but neither
> is keeping a module and adding an import statement in every parser
> for only 5 lines.
> I suggest we keep Bio.File, deprecating all the other functions, and
> meanwhile look at which changes we could include due to Py3.

Yes, that's what I am suggesting.
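
For readers following along, the kind of "handle or filename" helper being
discussed could look roughly like the sketch below. The name
handle_or_filename and the toy record counter are assumptions made for this
example, not the actual Bio.File code.

```python
import contextlib
import io
import os
import tempfile


@contextlib.contextmanager
def handle_or_filename(source, mode="r"):
    """Yield an open handle for either a filename or an existing handle.

    Hypothetical helper for illustration. Files opened here are closed on
    exit; handles passed in by the caller are left open.
    """
    if isinstance(source, str):
        with open(source, mode) as handle:
            yield handle
    else:
        yield source


def count_fasta_records(source):
    """Count '>' header lines, accepting a filename or a handle."""
    with handle_or_filename(source) as handle:
        return sum(1 for line in handle if line.startswith(">"))


# The same function works with an in-memory handle...
memory_handle = io.StringIO(">a\nACGT\n>b\nTTGA\n")
count_from_handle = count_fasta_records(memory_handle)

# ...and with a filename on disk.
fd, path = tempfile.mkstemp(suffix=".fasta")
with os.fdopen(fd, "w") as out:
    out.write(">a\nACGT\n>b\nTTGA\n>c\nGGCC\n")
count_from_file = count_fasta_records(path)
os.remove(path)
```

Keeping this in one place means each parser only needs a single "with"
statement rather than its own copy of the isinstance check.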

Peter


From mjldehoon at yahoo.com  Sat Oct 22 12:17:58 2011
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 22 Oct 2011 05:17:58 -0700 (PDT)
Subject: [Biopython-dev] Bio.File
In-Reply-To: <CAKVJ-_7ufiA5ibrqCP1WqO+3Tt_b1JeajXJCcx9V0g+xRepLFA@mail.gmail.com>
Message-ID: <1319285878.88223.YahooMailClassic@web161206.mail.bf1.yahoo.com>

OK, done.

Best,
--Michiel

--- On Wed, 10/19/11, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> From: Peter Cock <p.j.a.cock at googlemail.com>
> Subject: Re: [Biopython-dev] Bio.File
> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> Cc: biopython-dev at biopython.org
> Date: Wednesday, October 19, 2011, 4:49 AM
> On Wed, Oct 19, 2011 at 3:39 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> > Hi Peter,
> >
> >> That leaves Bio/SCOP/__init__.py as the only existing or
> >> imminent code using Bio.File, so if we can sort that out,
> >> we can deprecate Bio.File as you suggested.
> >
> > In Bio/SCOP/__init__.py, Bio.File.UndoHandle is used in the _open
> > function, which is an internal function used in the "search" function
> > in Bio.SCOP. The UndoHandle is used to wrap a handle returned
> > by urllib.urlopen.
>
> Should we change that to use urllib2 for better error handling,
> as in Bio.Entrez's _open?
>
> > This search function returns a handle to data in HTML format.
> > I don't think we have a parser for it. This suggests that there is
> > no specific purpose for UndoHandle in Bio.SCOP._open.
>
> I wonder if that is a sign of URL rot, it would make more sense
> to get plain text back. Sadly there were no unit tests for this at
> all until now, and I don't yet do anything with the handle other
> than confirm we get one!
>
> https://github.com/biopython/biopython/commit/10b94a7b5611edde5fe05f95406d927e5a6a02d9
>
> > So I would suggest to just remove the UndoHandle from
> > Bio.SCOP._open and return the urllib.urlopen handle directly.
> >
> > Any objections?
>
> Sounds fine.
>
> Peter
>


From p.j.a.cock at googlemail.com  Wed Oct 26 11:11:57 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 26 Oct 2011 12:11:57 +0100
Subject: [Biopython-dev] [Biopython] Pairwise alignment - is it a
	generic function?
In-Reply-To: <CAJ9sUYMfQOPbJJDb-mC3=g_WmsoS9z5JWFWtFMHsidyfVexNtw@mail.gmail.com>
References: <CAJ9sUYNTF5JJx3MPNMinqyD-zqWpHruxdH_9a+GDymkUSL0C+A@mail.gmail.com>
	<CAKVJ-_7KGjsF_MaQ-ngVSMN43T2_R2kkyYh6Cmh9a3hkk8NhuQ@mail.gmail.com>
	<CAJ9sUYMfQOPbJJDb-mC3=g_WmsoS9z5JWFWtFMHsidyfVexNtw@mail.gmail.com>
Message-ID: <CAKVJ-_5Ak8prNGztJDUu-14USQ7qAYCQ+XJ6Oo0_FaBKbX3hTA@mail.gmail.com>

On Wed, Oct 26, 2011 at 12:02 PM, João Rodrigues <anaryin at gmail.com> wrote:
> Hey Peter,
> Thanks for the answer. How do I pass the matrix and which format should it
> be in? Is there an example I could read?
> João [...] Rodrigues
> http://nmr.chem.uu.nl/~joao

Not that I know of, but adding one to the docstrings and test_pairwise2.py
would be great. I think you use it with a score matrix as a dictionary from
Bio.SubsMat.MatrixInfo.
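
To make the dictionary format concrete, here is a small self-contained
sketch that does not need Biopython. It assumes (as I recall, but please
check) that the MatrixInfo dictionaries are keyed by tuples of two residue
letters and store only one ordering of each pair. The toy matrix and the
helper names pair_score and ungapped_score are invented for this example.

```python
# Tiny hand-written matrix in the style of a MatrixInfo dictionary.
# Only one ordering of each residue pair is stored.
toy_matrix = {
    ("A", "A"): 4,
    ("R", "A"): -1,
    ("R", "R"): 5,
    ("N", "A"): -2,
    ("N", "R"): 0,
    ("N", "N"): 6,
}


def pair_score(matrix, a, b):
    """Look up a score, trying both orderings of the residue pair."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]


def ungapped_score(matrix, seq_a, seq_b):
    """Sum the pair scores of two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Sequences must have the same length")
    return sum(pair_score(matrix, a, b) for a, b in zip(seq_a, seq_b))


# A/R and R/A both score -1, N/N scores 6.
score = ungapped_score(toy_matrix, "ARN", "RAN")
```

A dictionary of this shape is what the "d" variants of the pairwise2
alignment functions take as their match-score argument.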

Peter


From eric.talevich at gmail.com  Wed Oct 26 13:27:17 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Wed, 26 Oct 2011 09:27:17 -0400
Subject: [Biopython-dev] [Biopython] Pairwise alignment - is it a
	generic function?
In-Reply-To: <CAKVJ-_5Ak8prNGztJDUu-14USQ7qAYCQ+XJ6Oo0_FaBKbX3hTA@mail.gmail.com>
References: <CAJ9sUYNTF5JJx3MPNMinqyD-zqWpHruxdH_9a+GDymkUSL0C+A@mail.gmail.com>
	<CAKVJ-_7KGjsF_MaQ-ngVSMN43T2_R2kkyYh6Cmh9a3hkk8NhuQ@mail.gmail.com>
	<CAJ9sUYMfQOPbJJDb-mC3=g_WmsoS9z5JWFWtFMHsidyfVexNtw@mail.gmail.com>
	<CAKVJ-_5Ak8prNGztJDUu-14USQ7qAYCQ+XJ6Oo0_FaBKbX3hTA@mail.gmail.com>
Message-ID: <CAMC681nYk8NNTw19F7qGOiEHPL3CSorcG+-Ugi7tL3WOZGup2Q@mail.gmail.com>

On Wed, Oct 26, 2011 at 7:11 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Wed, Oct 26, 2011 at 12:02 PM, João Rodrigues <anaryin at gmail.com> wrote:
> > Hey Peter,
> > Thanks for the answer. How do I pass the matrix and which format should it
> > be in? Is there an example I could read?
> > João [...] Rodrigues
> > http://nmr.chem.uu.nl/~joao
>
> Not that I know of, but adding one to the docstrings and test_pairwise2.py
> would be great. I think you use it with a score matrix as a dictionary from
> Bio.SubsMat.MatrixInfo.
>
> Peter
>

Here's an example:

from Bio import pairwise2, SeqIO
from Bio.SubsMat.MatrixInfo import blosum62

# pairwise2 works with raw strings, not SeqRecords
seq1 = str(SeqIO.read("seq1.fa", "fasta").seq)
seq2 = str(SeqIO.read("seq2.fa", "fasta").seq)

# Gap open penalty -10, gap extension penalty -0.5
results = pairwise2.align.globalds(seq1, seq2, blosum62, -10, -0.5)
# Returns a list of tuples: (seqA, seqB, score, begin, end)
best_score = results[0][2]


From anaryin at gmail.com  Wed Oct 26 13:31:29 2011
From: anaryin at gmail.com (João Rodrigues)
Date: Wed, 26 Oct 2011 15:31:29 +0200
Subject: [Biopython-dev] [Biopython] Pairwise alignment - is it a
	generic function?
In-Reply-To: <CAMC681nYk8NNTw19F7qGOiEHPL3CSorcG+-Ugi7tL3WOZGup2Q@mail.gmail.com>
References: <CAJ9sUYNTF5JJx3MPNMinqyD-zqWpHruxdH_9a+GDymkUSL0C+A@mail.gmail.com>
	<CAKVJ-_7KGjsF_MaQ-ngVSMN43T2_R2kkyYh6Cmh9a3hkk8NhuQ@mail.gmail.com>
	<CAJ9sUYMfQOPbJJDb-mC3=g_WmsoS9z5JWFWtFMHsidyfVexNtw@mail.gmail.com>
	<CAKVJ-_5Ak8prNGztJDUu-14USQ7qAYCQ+XJ6Oo0_FaBKbX3hTA@mail.gmail.com>
	<CAMC681nYk8NNTw19F7qGOiEHPL3CSorcG+-Ugi7tL3WOZGup2Q@mail.gmail.com>
Message-ID: <CAJ9sUYMM=TGyMCyPeCt1A1_0DSbtQv5heobkM1L46G5etNSetQ@mail.gmail.com>

Hello all,

Coming back after lunch...

I managed to load a matrix using this:

from Bio import pairwise2
from Bio.SubsMat import MatrixInfo as m
#print dir(m)
matrix = m.blosum60
pairwise2.align.localdx(seqA, seqB, matrix)


Thanks a lot for the help, it was simple after all, just a bit hard to start
with.


From redmine at redmine.open-bio.org  Thu Oct 27 04:55:53 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 27 Oct 2011 04:55:53 +0000
Subject: [Biopython-dev] [Biopython - Bug #3308] (New) SeqIO FastaIO: Blank
	Descriptor causes Index Out of Range
Message-ID: <redmine.issue-3308.20111027045553@redmine.open-bio.org>


Issue #3308 has been reported by Darren Cullerne.

----------------------------------------
Bug #3308: SeqIO FastaIO: Blank Descriptor causes Index Out of Range
https://redmine.open-bio.org/issues/3308

Author: Darren Cullerne
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


Entering a FASTA sequence with a blank descriptor:

">"
"ACTAGTACTAGATCAGACTACAGTACAGAGAGGACATCTATACTACGAGAGACATACTACTCAGCATACGATAC"

Causes the following error:
  File "C:\Python27\lib\site-packages\Bio\SeqIO\__init__.py", line 532, in parse
    for r in i:
  File "C:\Python27\lib\site-packages\Bio\SeqIO\FastaIO.py", line 49, in FastaIterator
    id   = descr.split()[0]
IndexError: list index out of range
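
The failing pattern can be reproduced without Biopython: calling split()
on an empty description returns an empty list, so indexing [0] raises
IndexError. The sketch below shows that, plus one possible guard. The
function name split_title is hypothetical and this is not the actual
FastaIO code or its eventual fix.

```python
def split_title(title_line):
    """Return (id, description) from a FASTA '>' line.

    Hypothetical helper: a blank descriptor gives an empty id instead of
    raising IndexError.
    """
    descr = title_line[1:].rstrip()
    words = descr.split(None, 1)
    record_id = words[0] if words else ""
    return record_id, descr


# The unguarded pattern from the traceback fails on an empty description.
try:
    "".split()[0]
    unguarded_failed = False
except IndexError:
    unguarded_failed = True

blank = split_title(">\n")
normal = split_title(">seq1 example sequence\n")
```

Whether a blank descriptor should produce an empty id or a clearer error
message is a design choice this sketch does not settle.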

Please let me know if there is any further information you require.

Thanks,


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Thu Oct 27 14:03:42 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 27 Oct 2011 14:03:42 +0000
Subject: [Biopython-dev] [Biopython - Bug #3309] (New) GenBank Scanner
	expects sequence lines to start at position 9
Message-ID: <redmine.issue-3309.20111027140342@redmine.open-bio.org>


Issue #3309 has been reported by Liam Childs.

----------------------------------------
Bug #3309: GenBank Scanner expects sequence lines to start at position 9
https://redmine.open-bio.org/issues/3309

Author: Liam Childs
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 1.57
URL: 


Some programs (e.g. Vector NTI and Lasergene) produce GenBank files where the sequences start at an index on the line other than index 9. I don't know how tightly defined the GenBank file format is, but if the indent for the start of the sequence can be variable, it seems to me there is a simple fix.

Current version (Bio/GenBank/Scanner.py:904):
line = self.line
... 15 lines
if len(line) > 9 and line[9:10]!=' ':
    raise ValueError("Sequence line mal-formed, '%s'"% line)
seq_lines.append(line[10:]) #remove spaces later

Simple fix 1 (variable per file):
line = self.line
idx = line.find('1') + 1
... 15 lines
if len(line) > idx and line[idx:idx + 1]!=' ':
    raise ValueError("Sequence line mal-formed, '%s'"% line)
seq_lines.append(line[idx + 1:]) #remove spaces later

The index can be obtained in any number of ways; this was the simplest I could think of off the top of my head. If sequences are allowed to start at a position other than '1', then maybe a regular expression should be used instead.
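
A regular-expression version might look something like the sketch below.
The pattern and the function name parse_sequence_line are assumptions made
for illustration, not a patch against Bio/GenBank/Scanner.py. It accepts
any amount of leading whitespace before the position number.

```python
import re

# Optional indent, a position number, whitespace, then blocks of residues.
_SEQ_LINE = re.compile(r"^\s*(\d+)\s+([A-Za-z ]+?)\s*$")


def parse_sequence_line(line):
    """Return (position, residues) for one GenBank sequence line.

    Hypothetical helper: tolerates a variable indent before the number.
    """
    match = _SEQ_LINE.match(line)
    if match is None:
        raise ValueError("Sequence line mal-formed, '%s'" % line)
    position = int(match.group(1))
    residues = match.group(2).replace(" ", "")
    return position, residues


# Standard layout: the number is right-aligned so the sequence starts at index 10.
standard = parse_sequence_line("        1 gatcctccat atacaacggt\n")

# Shifted layout, as some third-party tools appear to write.
shifted = parse_sequence_line("     61 ccgacatgag acagttaggt\n")
```

This also copes with lines that do not start at position 1, at the cost of
being less strict about the official column layout.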




From p.j.a.cock at googlemail.com  Thu Oct 27 14:46:08 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 27 Oct 2011 15:46:08 +0100
Subject: [Biopython-dev] [Biopython] Pairwise alignment - is it a
	generic function?
In-Reply-To: <CAJ9sUYMM=TGyMCyPeCt1A1_0DSbtQv5heobkM1L46G5etNSetQ@mail.gmail.com>
References: <CAJ9sUYNTF5JJx3MPNMinqyD-zqWpHruxdH_9a+GDymkUSL0C+A@mail.gmail.com>
	<CAKVJ-_7KGjsF_MaQ-ngVSMN43T2_R2kkyYh6Cmh9a3hkk8NhuQ@mail.gmail.com>
	<CAJ9sUYMfQOPbJJDb-mC3=g_WmsoS9z5JWFWtFMHsidyfVexNtw@mail.gmail.com>
	<CAKVJ-_5Ak8prNGztJDUu-14USQ7qAYCQ+XJ6Oo0_FaBKbX3hTA@mail.gmail.com>
	<CAMC681nYk8NNTw19F7qGOiEHPL3CSorcG+-Ugi7tL3WOZGup2Q@mail.gmail.com>
	<CAJ9sUYMM=TGyMCyPeCt1A1_0DSbtQv5heobkM1L46G5etNSetQ@mail.gmail.com>
Message-ID: <CAKVJ-_4yt6Hi5W_fvmj4yFT2Z=a26kQ6FLkOg44bgnYJ7NE3mw@mail.gmail.com>

On Wed, Oct 26, 2011 at 2:31 PM, João Rodrigues <anaryin at gmail.com> wrote:
> Hello all,
> Coming back after lunch...
> I managed to load a matrix using this:
>
> from Bio import pairwise2
> from Bio.SubsMat import MatrixInfo as m
> #print dir(m)
> matrix = m.blosum60
> pairwise2.align.localdx(seqA, seqB, matrix)
>
> Thanks a lot for the help, it was simple after all, just a bit hard to start
> with.

Hi João,

Could you write a little documentation for the pairwise2 docstring?
Just something short based on the above example would be great
(ideally as a doctest).

Thanks,

Peter


From anaryin at gmail.com  Thu Oct 27 14:52:25 2011
From: anaryin at gmail.com (João Rodrigues)
Date: Thu, 27 Oct 2011 16:52:25 +0200
Subject: [Biopython-dev] [Biopython] Pairwise alignment - is it a
	generic function?
In-Reply-To: <CAKVJ-_4yt6Hi5W_fvmj4yFT2Z=a26kQ6FLkOg44bgnYJ7NE3mw@mail.gmail.com>
References: <CAJ9sUYNTF5JJx3MPNMinqyD-zqWpHruxdH_9a+GDymkUSL0C+A@mail.gmail.com>
	<CAKVJ-_7KGjsF_MaQ-ngVSMN43T2_R2kkyYh6Cmh9a3hkk8NhuQ@mail.gmail.com>
	<CAJ9sUYMfQOPbJJDb-mC3=g_WmsoS9z5JWFWtFMHsidyfVexNtw@mail.gmail.com>
	<CAKVJ-_5Ak8prNGztJDUu-14USQ7qAYCQ+XJ6Oo0_FaBKbX3hTA@mail.gmail.com>
	<CAMC681nYk8NNTw19F7qGOiEHPL3CSorcG+-Ugi7tL3WOZGup2Q@mail.gmail.com>
	<CAJ9sUYMM=TGyMCyPeCt1A1_0DSbtQv5heobkM1L46G5etNSetQ@mail.gmail.com>
	<CAKVJ-_4yt6Hi5W_fvmj4yFT2Z=a26kQ6FLkOg44bgnYJ7NE3mw@mail.gmail.com>
Message-ID: <CAJ9sUYMei67Wm9R37FpCuis6Tb16OMJ+Rp+ATTN5FPbv9GcCiw@mail.gmail.com>

Sure thing. The docstring is actually pretty explicit, it's just missing the
part about getting the matrices from SubsMat, or at least that part is not
very clear. I'll go over it this weekend, maybe earlier.

Best,

João


From p.j.a.cock at googlemail.com  Fri Oct 28 16:15:36 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 28 Oct 2011 17:15:36 +0100
Subject: [Biopython-dev] Fwd: [Utilities-announce] Upcoming Release of NCBI
	EFetch version 2.0
In-Reply-To: <A9D8BF3D8A74DF4A925FB541C0F39D2A164585570F@NIHMLBX15.nih.gov>
References: <A9D8BF3D8A74DF4A925FB541C0F39D2A164585570F@NIHMLBX15.nih.gov>
Message-ID: <CAKVJ-_5yMwCAm1=NjxbK+4YBKNiunXE==M18jiC8a4Yc-X=wyg@mail.gmail.com>

Hi all,

We may need to update Bio.Entrez for EFetch v2.0 soon, although at first
glance there is nothing that will obviously cause trouble...

Peter

---------- Forwarded message ----------
From:  <utilities-announce at ncbi.nlm.nih.gov>
Date: Fri, Oct 28, 2011 at 4:15 PM
Subject: [Utilities-announce] Upcoming Release of NCBI EFetch version 2.0
To: NLM/NCBI List utilities-announce <utilities-announce at ncbi.nlm.nih.gov>


Upcoming Release of EFetch version 2.0


In November 2011 NCBI plans to release version 2.0 of EFetch. The
major changes and updates are as follows:

* EFetch now supports the following databases: biosample,
biosystems and sra

* EFetch now has defined default values for &retmode and
&rettype for all supported databases (please see Table 1 for all
supported values of these parameters)

* EFetch no longer supports &retmode=html; requests containing
&retmode=html will return data using the default &retmode value for
the specified database (&db)

* EFetch requests including &rettype=docsum will return XML
data equivalent to ESummary output
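
Since the defaults are now defined per database, a script that wants a
particular format is probably safest passing &rettype and &retmode
explicitly. The sketch below only builds the request URL with the standard
library and makes no network call. The helper name build_efetch_url, the
https base URL and the example accession are assumptions for illustration;
check Table 1 for the values valid for a given database.

```python
from urllib.parse import urlencode

# Assumed base URL for the EFetch utility.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(db, ids, rettype, retmode):
    """Build an EFetch URL with explicit rettype and retmode (hypothetical helper)."""
    params = {
        "db": db,
        "id": ",".join(ids),
        "rettype": rettype,
        "retmode": retmode,
    }
    return EFETCH_URL + "?" + urlencode(params)


# Illustrative request: a GenBank flat file as plain text.
url = build_efetch_url("nucleotide", ["NM_000546"], "gb", "text")
```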

Details about EFetch can be found at
http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch


An updated, complete listing of supported &rettype and &retmode values
can be found at
http://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.chapter4_table1/?report=objectonly


Release notes about this and future releases can be found at
http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.Release_Notes


Please write to info at ncbi.nlm.nih.gov if you have any questions about
these changes.


_______________________________________________
Utilities-announce mailing list
http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce

From redmine at redmine.open-bio.org  Fri Oct 28 23:45:53 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Fri, 28 Oct 2011 23:45:53 +0000
Subject: [Biopython-dev] [Biopython - Feature #3310] (New) HMMER parser
Message-ID: <redmine.issue-3310.20111028234553@redmine.open-bio.org>


Issue #3310 has been reported by J M.

----------------------------------------
Feature #3310: HMMER parser
https://redmine.open-bio.org/issues/3310

Author: J M
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


This is a parser for the output of hmmsearch from the HMMER package.  Given the output of hmmsearch, this program can retrieve information for each of the alignments including the expected values, the starting and ending positions of each alignment, as well as insert, deletion and mismatch information for each alignment.




From redmine at redmine.open-bio.org  Sat Oct 29 02:00:07 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Sat, 29 Oct 2011 02:00:07 +0000
Subject: [Biopython-dev] [Biopython - Bug #3311] (New) GFF parser fails to
	intelligently break lines
Message-ID: <redmine.issue-3311.20111029020007@redmine.open-bio.org>


Issue #3311 has been reported by gahoo lee.

----------------------------------------
Bug #3311: GFF parser fails to intelligently break lines
https://redmine.open-bio.org/issues/3311

Author: gahoo lee
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


Moved from "BioStar":http://biostar.stackexchange.com/questions/13651/gff-parsing-in-python-is-not-so-perfect

I use BCBio.GFF to parse "chr01.gff3":ftp://ftp.plantbiology.msu.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_6.1/chr01.dir/chr01.gff3 and "all.gff3":ftp://ftp.plantbiology.msu.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_6.1/all.dir/all.gff3 . But things didn't work out as I expected. Here's the code:

@from BCBio import GFF
limits = dict(gff_type = ["gene","mRNA","CDS"])
gff_handle = open('chr01.gff3')
for rec in GFF.parse(gff_handle,target_lines=1000,limit_info=limits):
    #Chromosome seq level
    for gene_feature in rec.features:
        #gene level
        for mRNA_feature in gene_feature.sub_features:
            #mRNA level
            print mRNA_feature.type
            print mRNA_feature.qualifiers['Alias']@

And I got:

@Traceback (most recent call last):
  File "R:\Untitled 1.py", line 14, in <module>
    print mRNA_feature.qualifiers['Alias']
KeyError: 'Alias'@

And the 'type' is "CDS" which is not correct. When parsing without

@target_lines=1000@

everything is OK. But parsing all.gff3 runs into the same problem. Maybe all.gff3 is too large to parse.

The problem might be that the parser does not recognise the entry boundary correctly.

