From p.j.a.cock at googlemail.com  Tue Nov  1 17:21:31 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 1 Nov 2011 21:21:31 +0000
Subject: [Biopython-dev] TogoWS in Biopython?
Message-ID: <CAKVJ-_73tdOmZynMGkxZ+KppKw7md4eQkrD9SHiKe-4iU6MHsw@mail.gmail.com>

Dear all,

Would someone like to review the TogoWS code I have written
to access the Togo Web Service's REST API please?

http://togows.dbcls.jp/
http://togows.dbcls.jp/site/en/rest.html
http://dx.doi.org/doi:10.1093/nar/gkq386

This provides a nice simple URL based API for fetching database
entries in various formats (XML, JSON, GenBank etc - even some
individual fields from some database records, e.g. the accession
of a GenBank record), searching, and even some file format
conversion (which uses a range of tools on their server, some
in BioRuby and others in BioPerl I believe).

The code is on this branch,
https://github.com/peterjc/biopython/tree/togows

See module Bio.TogoWS and its docstrings,
https://github.com/peterjc/biopython/blob/togows/Bio/TogoWS/__init__.py

Unit tests in Tests/test_TogoWS.py
https://github.com/peterjc/biopython/blob/togows/Tests/test_TogoWS.py

I have been guided by the naming we've used in Bio.Entrez
for accessing the NCBI Entrez API.

Note that in addition to major Japanese databases, TogoWS
also proxies and caches data from Europe (e.g. UniProt) and
America (e.g. GenBank and PubMed). It was very fast when
testing from Japan this summer - not quite so speedy from
the UK though ;)

Personally I found TogoWS much easier to use for searching
and retrieving batches of records than the NCBI Entrez API
with its complicated history requirement. I expect it to be
particularly popular with Biopython users in Japan.

Thanks in advance,

Peter

From p.j.a.cock at googlemail.com  Tue Nov  1 17:27:15 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 1 Nov 2011 21:27:15 +0000
Subject: [Biopython-dev] TogoWS in Biopython?
In-Reply-To: <CAKVJ-_73tdOmZynMGkxZ+KppKw7md4eQkrD9SHiKe-4iU6MHsw@mail.gmail.com>
References: <CAKVJ-_73tdOmZynMGkxZ+KppKw7md4eQkrD9SHiKe-4iU6MHsw@mail.gmail.com>
Message-ID: <CAKVJ-_71S6GmvfDXM6krCgBMFnt7A7gxXrfS8_07tGOwMDYj1A@mail.gmail.com>

On Tue, Nov 1, 2011 at 9:21 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Dear all,
>
> Would someone like to review the TogoWS code I have written
> to access the Togo Web Service's REST API please?
>
> ...
>
> Unit tests in Tests/test_TogoWS.py
> https://github.com/peterjc/biopython/blob/togows/Tests/test_TogoWS.py

P.S. Some of the tests are a little bit slow right now, so
we can comment some out as part of merging this to the
trunk.

Peter

From chapmanb at 50mail.com  Wed Nov  2 08:19:58 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 02 Nov 2011 08:19:58 -0400
Subject: [Biopython-dev] TogoWS in Biopython?
In-Reply-To: <CAKVJ-_71S6GmvfDXM6krCgBMFnt7A7gxXrfS8_07tGOwMDYj1A@mail.gmail.com>
References: <CAKVJ-_73tdOmZynMGkxZ+KppKw7md4eQkrD9SHiKe-4iU6MHsw@mail.gmail.com>
	<CAKVJ-_71S6GmvfDXM6krCgBMFnt7A7gxXrfS8_07tGOwMDYj1A@mail.gmail.com>
Message-ID: <8762j2iump.fsf@fastmail.fm>


Peter;

> > Would someone like to review the TogoWS code I have written
> > to access the Togo Web Service's REST API please?

This looks great and the tests are all passing for me. My only small
suggestion would be to avoid hardcoding 'http://togows.dbcls.jp'
everywhere. I'd stick this as a top level variable along with the global
caches and reference it in the code. This way if they ever get any
mirrors we could adjust on the fly.

Thanks for getting this in,
Brad

From p.j.a.cock at googlemail.com  Wed Nov  2 09:27:25 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 2 Nov 2011 13:27:25 +0000
Subject: [Biopython-dev] TogoWS in Biopython?
In-Reply-To: <8762j2iump.fsf@fastmail.fm>
References: <CAKVJ-_73tdOmZynMGkxZ+KppKw7md4eQkrD9SHiKe-4iU6MHsw@mail.gmail.com>
	<CAKVJ-_71S6GmvfDXM6krCgBMFnt7A7gxXrfS8_07tGOwMDYj1A@mail.gmail.com>
	<8762j2iump.fsf@fastmail.fm>
Message-ID: <CAKVJ-_6kjBtOELqRxGHd-+Yb2gSeGp97mXDRJ1gJFagWALGL2Q@mail.gmail.com>

On Wed, Nov 2, 2011 at 12:19 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Peter;
>
>> > Would someone like to review the TogoWS code I have written
>> > to access the Togo Web Service's REST API please?
>
> This looks great and the tests are all passing for me. My only small
> suggestion would be to avoid hardcoding 'http://togows.dbcls.jp'
> everywhere. I'd stick this as a top level variable along with the global
> caches and reference it in the code. This way if they ever get any
> mirrors we could adjust on the fly.
>
> Thanks for getting this in,
> Brad

Good point regarding the URL.

I've also realised it will need some tweaks for Python 3 (bytes
versus unicode), or at least to skip the unit tests in the short
term to avoid hiding real errors on the buildbot.

Peter

From redmine at redmine.open-bio.org  Tue Nov  8 05:17:00 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 8 Nov 2011 10:17:00 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] (New) Failing to parse
	fasta-m10 format generated by lalign36
Message-ID: <redmine.issue-3312.20111108101700@redmine.open-bio.org>


Issue #3312 has been reported by gahoo lee.

----------------------------------------
Bug #3312: Failing to parse fasta-m10 format generated by lalign36
https://redmine.open-bio.org/issues/3312

Author: gahoo lee
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


When I parse an alignment created by lalign (included in FASTA36), I get errors. Each FASTA file currently contains two sequences; with one sequence per file there is no error. Here are the code and the error.

@lalign36 -m 10 at.fasta os.fasta >test.aln@

@from Bio import AlignIO
handle = open('test.aln')
for a in AlignIO.parse(handle, "fasta-m10"):
    assert len(a) == 2, "Should be pairwise!"
    print "Alignment length %i" % a.get_alignment_length()
    for record in a:
        print record.seq, record.name, record.id
@

@Traceback (most recent call last):
  File "R:\Untitled 4.py", line 5, in <module>
    for a in AlignIO.parse(handle, "fasta-m10"):
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\__init__.py", line 371, in parse
    for a in i:
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\FastaIO.py", line 242, in FastaM10Iterator
    yield build_hsp()
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\FastaIO.py", line 106, in build_hsp
    assert query_tags, query_tags
AssertionError: {}@


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From p.j.a.cock at googlemail.com  Tue Nov  8 10:38:32 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 8 Nov 2011 15:38:32 +0000
Subject: [Biopython-dev] Indexing sequences compressed with BGZF (Blocked
	GNU Zip Format)
Message-ID: <CAKVJ-_7_MLKthBfv7f4t=5Kzp-Nx6SR95zuBxgj-_GQB9e6ERQ@mail.gmail.com>

Dear all,

We've talked in the past about indexing sequences in gzipped files, e.g.
http://lists.open-bio.org/pipermail/biopython/2010-June/006546.html

That discussion concluded that random access into simple GZIP files
was not practical, but BGZF (used in BAM) was worth looking into.
I wrote some proof of principle code back then:
http://lists.open-bio.org/pipermail/biopython/2010-June/006555.html

I have recently polished that old code up, and done some
benchmarking (using some reasonably large FASTA, Swiss,
and UniProt-XML files). Please read this blog post:
http://blastedbio.blogspot.com/

I think random access to sequences compressed with BGZF is fast
enough to be useful practically (while confirming this is not true for
large gzipped files). I've also put this idea forward on SEQanswers,
http://seqanswers.com/forums/showthread.php?t=15347

The cleaned up BGZF code is on the following branch:
https://github.com/peterjc/biopython/tree/bgzf

This adds a new module Bio.bgzf (position in namespace open to
debate) which provides read/write handles to BGZF files - trying to
follow the API used in the Python gzip library.

I then use the new BGZF reader (with its special seek/tell offsets)
from within Bio.SeqIO's index functionality. I've been doing testing
with Bio.SeqIO.index(...) only so far, but it should work fine with
Bio.SeqIO.index_db(...) as well but here the SQLite schema will
need a small update to record the compression type for each file.
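
For anyone unfamiliar with the special offsets: in the BGZF convention from
the SAM/BAM specification, a 64-bit virtual offset stores the file offset of
the start of the compressed block in the upper 48 bits, and the position
within the decompressed block in the lower 16 bits. A minimal sketch of that
arithmetic follows; the function names are chosen for illustration.

```python
def make_virtual_offset(block_start, within_block):
    """Pack a block start offset and a within-block offset into one integer."""
    if not 0 <= within_block < 2 ** 16:
        raise ValueError("within-block offset must fit in 16 bits")
    if not 0 <= block_start < 2 ** 48:
        raise ValueError("block start offset must fit in 48 bits")
    return (block_start << 16) | within_block


def split_virtual_offset(virtual_offset):
    """Unpack a virtual offset into (block_start, within_block)."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF
```

A seek then means: jump to block_start in the raw file, decompress that one
block, and skip within_block bytes into the decompressed data.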

Is anyone interested in testing this out?

Note that to produce a BGZF file, you can use the tool bgzip in
samtools, or Bio/bgzf.py if run directly at the command line will
compress stdin to stdout. Both approaches call zlib internally,
and the run time is practically identical.

Regards,

Peter

From p.j.a.cock at googlemail.com  Tue Nov  8 10:41:15 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 8 Nov 2011 15:41:15 +0000
Subject: [Biopython-dev] Indexing sequences compressed with BGZF
	(Blocked GNU Zip Format)
In-Reply-To: <CAKVJ-_7_MLKthBfv7f4t=5Kzp-Nx6SR95zuBxgj-_GQB9e6ERQ@mail.gmail.com>
References: <CAKVJ-_7_MLKthBfv7f4t=5Kzp-Nx6SR95zuBxgj-_GQB9e6ERQ@mail.gmail.com>
Message-ID: <CAKVJ-_6n-RUJ6Qq5Axn2A7YgF9oDjKdrAEs2URYC4dnTDTBFyQ@mail.gmail.com>

On Tue, Nov 8, 2011 at 3:38 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> That discussion concluded that random access into simple GZIP files
> was not practical, but BGZF (used in BAM) was worth looking into.
> I wrote some proof of principle code back then:
> http://lists.open-bio.org/pipermail/biopython/2010-June/006555.html
>
> I have recently polished that old code up, and done some
> benchmarking (using some reasonably large FASTA, Swiss,
> and UniProt-XML files). Please read this blog post:
> http://blastedbio.blogspot.com/

More precise link to my BGZF post:
http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html

Peter

From bioinformed at gmail.com  Tue Nov  8 12:40:36 2011
From: bioinformed at gmail.com (Kevin Jacobs <jacobs@bioinformed.com>)
Date: Tue, 8 Nov 2011 12:40:36 -0500
Subject: [Biopython-dev] Indexing sequences compressed with BGZF
 (Blocked GNU Zip Format)
In-Reply-To: <CAKVJ-_6n-RUJ6Qq5Axn2A7YgF9oDjKdrAEs2URYC4dnTDTBFyQ@mail.gmail.com>
References: <CAKVJ-_7_MLKthBfv7f4t=5Kzp-Nx6SR95zuBxgj-_GQB9e6ERQ@mail.gmail.com>
	<CAKVJ-_6n-RUJ6Qq5Axn2A7YgF9oDjKdrAEs2URYC4dnTDTBFyQ@mail.gmail.com>
Message-ID: <CAD=vDiqxziZ=Td=7dUcCxyXMQ2i_eRwvWxQb1nAf04TqjiSt-Q@mail.gmail.com>

I've added a proper LRU uncompressed block cache to the samtools tabix
code, if that would be of any help.  It greatly improves performance for
many access patterns.  (I didn't look to see if you'd already done that in
your code.)

-Kevin

From p.j.a.cock at googlemail.com  Tue Nov  8 12:52:59 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 8 Nov 2011 17:52:59 +0000
Subject: [Biopython-dev] Indexing sequences compressed with BGZF
 (Blocked GNU Zip Format)
In-Reply-To: <CAD=vDiqxziZ=Td=7dUcCxyXMQ2i_eRwvWxQb1nAf04TqjiSt-Q@mail.gmail.com>
References: <CAKVJ-_7_MLKthBfv7f4t=5Kzp-Nx6SR95zuBxgj-_GQB9e6ERQ@mail.gmail.com>
	<CAKVJ-_6n-RUJ6Qq5Axn2A7YgF9oDjKdrAEs2URYC4dnTDTBFyQ@mail.gmail.com>
	<CAD=vDiqxziZ=Td=7dUcCxyXMQ2i_eRwvWxQb1nAf04TqjiSt-Q@mail.gmail.com>
Message-ID: <CAKVJ-_7RqEWCLpmFgP3np14rc_nDL3wxaXbKNVy5DGeeam+3Aw@mail.gmail.com>

On Tue, Nov 8, 2011 at 5:40 PM, Kevin Jacobs wrote:
> I've added a proper LRU uncompressed block cache to the samtools tabix code,
> if that would be of any help. It greatly improves performance for many
> access patterns. (I didn't look to see if you'd already done that in your
> code.)
> -Kevin

Hi Kevin,

Is this already in the mainline samtools tabix repository?

The current implementation in my Python code just caches the
current block - but a simple pool had occurred to me. How many
blocks (given each is 64kb) and how best to pick that number
isn't obvious to me. Perhaps you can suggest some sensible
defaults?

In fact, a proper LRU cache would make sense for the handle
pool in Bio.SeqIO.index_db(...) as well.

Regards,

Peter


From bioinformed at gmail.com  Tue Nov  8 13:11:56 2011
From: bioinformed at gmail.com (Kevin Jacobs <jacobs@bioinformed.com>)
Date: Tue, 8 Nov 2011 13:11:56 -0500
Subject: [Biopython-dev] Indexing sequences compressed with BGZF
 (Blocked GNU Zip Format)
In-Reply-To: <CAKVJ-_7RqEWCLpmFgP3np14rc_nDL3wxaXbKNVy5DGeeam+3Aw@mail.gmail.com>
References: <CAKVJ-_7_MLKthBfv7f4t=5Kzp-Nx6SR95zuBxgj-_GQB9e6ERQ@mail.gmail.com>
	<CAKVJ-_6n-RUJ6Qq5Axn2A7YgF9oDjKdrAEs2URYC4dnTDTBFyQ@mail.gmail.com>
	<CAD=vDiqxziZ=Td=7dUcCxyXMQ2i_eRwvWxQb1nAf04TqjiSt-Q@mail.gmail.com>
	<CAKVJ-_7RqEWCLpmFgP3np14rc_nDL3wxaXbKNVy5DGeeam+3Aw@mail.gmail.com>
Message-ID: <CAD=vDiqFUStRr-64Nqqot195AKrDVNQaYpTA_-7JUtb76PdQEA@mail.gmail.com>

On Tue, Nov 8, 2011 at 12:52 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Tue, Nov 8, 2011 at 5:40 PM, Kevin Jacobs wrote:
> > I've added a proper LRU uncompressed block cache to the samtools tabix
> > code, if that would be of any help.  It greatly improves performance for
> > many access patterns.  (I didn't look to see if you'd already done that
> > in your code.)
> > -Kevin
>
> Hi Kevin,
>
> Is this already in the mainline samtools tabix repository?
>
> The current implementation in my Python code just caches the
> current block - but a simple pool had occurred to me. How many
> blocks (given each is 64kb) and how best to pick that number
> isn't obvious to me. Perhaps you can suggest some sensible
> defaults?
>
> In fact, a proper LRU cache would make sense for the handle
> pool in Bio.SeqIO.index_db(...) as well.
>
>
Hi Peter,

There is a random-eviction cache implemented in the mainline that is okay,
but it is turned off by default and, if enabled, can be very inefficient if
it keeps evicting your most active blocks.  Converting the cache to LRU
was very simple and I've been using it locally for some time now, but I
haven't had time to send the changes on to Heng Li.

I choose the size of the cache based on the application and access
patterns.  For roughly sequential sequence queries (a la samtools faidx or
Pysam Fastafile), all one needs is a handful of active blocks (say 16).
When repeatedly querying tabix files via pysam, I typically use 128 blocks
for the best trade-off between memory and performance.  Choosing a cache
size for BAM files is much more complicated and I have a wide range of
settings depending on how many parallel BAM streams and access patterns are
employed.

The cache size numbers needed to be quite a bit larger before switching to
LRU (which was a bit surprising).  However, using even a small cache is
vastly beneficial for many access patterns.   The cost of re-reading a
block from disk can be mitigated by the OS filesystem cache, but the
decompression step takes non-trivial CPU time and can be triggered dozens
or hundreds of times per block for some sensible-seeming access patterns.
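
As a rough illustration of the LRU idea in Python (this is neither the C
change to tabix nor the code on the Biopython branch), a block cache can be
built on collections.OrderedDict. The class name, the load_block callback and
the default of 16 blocks are assumptions made for this sketch.

```python
from collections import OrderedDict


class BlockCache:
    """Keep the most recently used decompressed blocks in memory."""

    def __init__(self, max_blocks=16):
        self.max_blocks = max_blocks
        self._blocks = OrderedDict()  # oldest entry first, newest last
        self.misses = 0

    def get(self, start_offset, load_block):
        """Return the block at start_offset, loading it only on a miss."""
        try:
            data = self._blocks[start_offset]
        except KeyError:
            self.misses += 1
            data = load_block(start_offset)  # the costly decompression step
            self._blocks[start_offset] = data
            if len(self._blocks) > self.max_blocks:
                # Evict the least recently used block.
                self._blocks.popitem(last=False)
        else:
            # Cache hit: mark this block as the most recently used.
            self._blocks.move_to_end(start_offset)
        return data
```

With this scheme a block that is read repeatedly stays in memory while blocks
touched only once age out, which is the behaviour random eviction cannot
guarantee.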

-Kevin

From p.j.a.cock at googlemail.com  Tue Nov  8 13:28:04 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 8 Nov 2011 18:28:04 +0000
Subject: [Biopython-dev] Indexing sequences compressed with BGZF
 (Blocked GNU Zip Format)
In-Reply-To: <CAD=vDiqFUStRr-64Nqqot195AKrDVNQaYpTA_-7JUtb76PdQEA@mail.gmail.com>
References: <CAKVJ-_7_MLKthBfv7f4t=5Kzp-Nx6SR95zuBxgj-_GQB9e6ERQ@mail.gmail.com>
	<CAKVJ-_6n-RUJ6Qq5Axn2A7YgF9oDjKdrAEs2URYC4dnTDTBFyQ@mail.gmail.com>
	<CAD=vDiqxziZ=Td=7dUcCxyXMQ2i_eRwvWxQb1nAf04TqjiSt-Q@mail.gmail.com>
	<CAKVJ-_7RqEWCLpmFgP3np14rc_nDL3wxaXbKNVy5DGeeam+3Aw@mail.gmail.com>
	<CAD=vDiqFUStRr-64Nqqot195AKrDVNQaYpTA_-7JUtb76PdQEA@mail.gmail.com>
Message-ID: <CAKVJ-_7YnYfiXbdRQkH8CFoo0iAgNBozZrfP6fTr_oEnLWdhfg@mail.gmail.com>

On Tue, Nov 8, 2011 at 6:11 PM, Kevin Jacobs wrote:
> On Tue, Nov 8, 2011 at 12:52 PM, Peter Cock wrote:
>> On Tue, Nov 8, 2011 at 5:40 PM, Kevin Jacobs wrote:
>> > I've added a proper LRU uncompressed block cache to the
>> > samtools tabix code, if that would be of any help. It greatly
>> > improves performance for many access patterns.
>> > (I didn't look to see if you'd already done that in your
>> > code.)
>> > -Kevin
>>
>> Hi Kevin,
>>
>> Is this already in the mainline samtools tabix repository?
>>
>> The current implementation in my Python code just caches the
>> current block - but a simple pool had occurred to me. How many
>> blocks (given each is 64kb) and how best to pick that number
>> isn't obvious to me. Perhaps you can suggest some sensible
>> defaults?
>>
>> In fact, a proper LRU cache would make sense for the handle
>> pool in Bio.SeqIO.index_db(...) as well.
>>
>
> Hi Peter,
>
> There is a random-eviction cache implemented in the mainline that is okay,
> but it is turned off by default and, if enabled, can be very inefficient if
> it keeps evicting your most active blocks. Converting the cache to LRU
> was very simple and I've been using it locally for some time now, but I
> haven't had time to send the changes on to Heng Li.

Are your changes on github or somewhere public? Heng Li has the
core samtools bit of the samtools SVN on github, which he seems
to use for experimental new code: https://github.com/lh3/samtools

> I choose the size of the cache based on the application and access patterns.
> For roughly sequential sequence queries (a la samtools faidx or Pysam
> Fastafile), all one needs is a handful of active blocks (say 16). When
> repeatedly querying tabix files via pysam, I typically use 128 blocks for the
> best trade-off between memory and performance. Choosing a cache size for
> BAM files is much more complicated and I have a wide range of settings
> depending on how many parallel BAM streams and access patterns are employed.
> The cache size numbers needed to be quite a bit larger before switching to
> LRU (which was a bit surprising). However, using even a small cache is
> vastly beneficial for many access patterns. The cost of re-reading a block
> from disk can be mitigated by the OS filesystem cache, but the decompression
> step takes non-trivial CPU time and can be triggered dozens or hundreds of
> times per block for some sensible-seeming access patterns.
> -Kevin

Certainly useful food for thought - thank you. I agree that the OS
will probably cache commonly used BGZF blocks in the filesystem
cache, but it doesn't solve the CPU overhead of decompression.

In the case of Bio.SeqIO.index(...) which accesses one file, and
Bio.SeqIO.index_db(...) which may access several files, we currently
don't offer any end user options like this. However, there is an internal
option for the max number of handles, and a similar option could
control the number of BGZF blocks to cache. I could try 100
blocks (100 times 64kb is about 6MB) as the default, and redo
the UniProt timings (random access to sequences).
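
The memory estimate can be checked with a couple of lines. This sketch assumes
64 KiB of decompressed data per block, as stated above, and counts only the
raw data, ignoring Python object overhead. The function name is illustrative.

```python
BLOCK_SIZE = 64 * 1024  # bytes of decompressed data per block (assumed)


def cache_memory_mib(n_blocks, block_size=BLOCK_SIZE):
    """Raw data held by a cache of n_blocks blocks, in MiB."""
    return n_blocks * block_size / (1024.0 * 1024.0)


# The cache sizes mentioned in this thread: 16, 100 and 128 blocks.
for n in (16, 100, 128):
    print("%3d blocks -> %.2f MiB" % (n, cache_memory_mib(n)))
```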

That might be a good compromise, given the SeqIO indexing code
has no easy way to know the calling code's usage patterns.

As I said on the blog post, we should be able to improve the
speed of the BGZF random access - this idea alone could
make a big difference, although probably a naive block cache
(rather than LRU) would be a worthwhile step in itself.

Regards,

Peter


From p.j.a.cock at googlemail.com  Wed Nov  9 14:53:52 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 9 Nov 2011 19:53:52 +0000
Subject: [Biopython-dev] Fwd: Bug in DSSP.py
In-Reply-To: <CAD2GCOi6t7TTVj=XNojivendA-AWNYCXgTZmJCh45BocMtjA8A@mail.gmail.com>
References: <CAD2GCOi6t7TTVj=XNojivendA-AWNYCXgTZmJCh45BocMtjA8A@mail.gmail.com>
Message-ID: <CAKVJ-_7u-YLA3QQcobBmMTXRg0tdKazcTay6wiVAuVN2AxiHwA@mail.gmail.com>

FYI, hopefully someone uses DSSP.

---------- Forwarded message ----------
From: Austin Meyer
Date: Tuesday, November 8, 2011
Subject: Bug in DSSP.py
To: biopython-owner at lists.open-bio.org


Ahoy,

I have no idea how to contribute code so I thought I would pass this along.

The newest DSSP adds a citation section for the first two lines, and a
blank third line in its output file.  The parser reads each line one at a
time, splits it, then looks at the second element of the resulting list.
As the blank line has only one element, there is an index out of range
failure that occurs. This error does not happen with the older DSSP
version.  A quick fix checks the length of the list prior to looking at
its elements.  Thus at line 121 in the DSSP.py file, just after the sl =
l.split(), this will fix the problem:

    if len(sl) < 2:
        continue

The whole function will look like so:

def make_dssp_dict(filename):
    """
    Return a DSSP dictionary that maps (chainid, resid) to
    aa, ss and accessibility, from a DSSP file.

    @param filename: the DSSP output file
    @type filename: string
    """
    dssp = {}
    handle = open(filename, "r")
    try:
        start = 0
        keys = []
        for l in handle.readlines():
            sl = l.split()
            if len(sl) < 2:
                continue
            if sl[1] == "RESIDUE":
                # Start parsing from here
                start = 1
                continue
            if not start:
                continue
            if l[9] == " ":
                # Skip -- missing residue
                continue
            resseq = int(l[5:10])
            icode = l[10]
            chainid = l[11]
            aa = l[13]
            ss = l[16]
            if ss == " ":
                ss = "-"
            try:
                acc = int(l[34:38])
                phi = float(l[103:109])
                psi = float(l[109:115])
            except ValueError, exc:
                # DSSP output breaks its own format when there are >9999
                # residues, since only 4 digits are allocated to the seq num
                # field.  See 3kic chain T res 321, 1vsy chain T res 6077.
                # Here, look for whitespace to figure out the number of extra
                # digits, and shift parsing the rest of the line by that amount.
                if l[34] != ' ':
                    shift = l[34:].find(' ')
                    acc = int((l[34+shift:38+shift]))
                    phi = float(l[103+shift:109+shift])
                    psi = float(l[109+shift:115+shift])
                else:
                    raise ValueError, exc
            res_id = (" ", resseq, icode)
            dssp[(chainid, res_id)] = (aa, ss, acc, phi, psi)
            keys.append((chainid, res_id))
    finally:
        handle.close()
    return dssp, keys
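
The effect of the length check can be seen in isolation with a small
self-contained sketch. The helper name and the sample lines below are
synthetic stand-ins invented for illustration, not real DSSP output.

```python
def find_residue_header(lines):
    """Return the index of the line whose second token is RESIDUE, or None."""
    for index, line in enumerate(lines):
        tokens = line.split()
        if len(tokens) < 2:
            # Blank or one-word lines have no second token; skip them
            # instead of letting tokens[1] raise IndexError.
            continue
        if tokens[1] == "RESIDUE":
            return index
    return None


# Synthetic input: two citation-style lines, a blank line, then the header.
sample_lines = [
    "==== synthetic citation line ====",
    "REFERENCE synthetic",
    "",
    "  #  RESIDUE AA STRUCTURE",
]
header_index = find_residue_header(sample_lines)
```

Without the length check, the blank third line would raise IndexError before
the header line is ever reached.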


Thanks,

--
Austin Meyer

From p.j.a.cock at googlemail.com  Wed Nov  9 19:01:19 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 10 Nov 2011 00:01:19 +0000
Subject: [Biopython-dev] Indexing sequences compressed with BGZF
 (Blocked GNU Zip Format)
In-Reply-To: <CAKVJ-_7YnYfiXbdRQkH8CFoo0iAgNBozZrfP6fTr_oEnLWdhfg@mail.gmail.com>
References: <CAKVJ-_7_MLKthBfv7f4t=5Kzp-Nx6SR95zuBxgj-_GQB9e6ERQ@mail.gmail.com>
	<CAKVJ-_6n-RUJ6Qq5Axn2A7YgF9oDjKdrAEs2URYC4dnTDTBFyQ@mail.gmail.com>
	<CAD=vDiqxziZ=Td=7dUcCxyXMQ2i_eRwvWxQb1nAf04TqjiSt-Q@mail.gmail.com>
	<CAKVJ-_7RqEWCLpmFgP3np14rc_nDL3wxaXbKNVy5DGeeam+3Aw@mail.gmail.com>
	<CAD=vDiqFUStRr-64Nqqot195AKrDVNQaYpTA_-7JUtb76PdQEA@mail.gmail.com>
	<CAKVJ-_7YnYfiXbdRQkH8CFoo0iAgNBozZrfP6fTr_oEnLWdhfg@mail.gmail.com>
Message-ID: <CAKVJ-_4P6Ta-DANrZmBTh0aFQUhq=erQEn05FLwVV47tXj1==A@mail.gmail.com>

On Tue, Nov 8, 2011 at 6:28 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> I choose the size of the cache based on the application and access patterns.
>> For roughly sequential sequence queries (a la samtools faidx or Pysam
>> Fastafile), all one needs is a handful of active blocks (say 16). When
>> repeatedly querying tabix files via pysam, I typically use 128 blocks for the
>> best trade-off between memory and performance. Choosing a cache size for
>> BAM files is much more complicated and I have a wide range of settings
>> depending on how many parallel BAM streams and access patterns are employed.
>> The cache size numbers needed to be quite a bit larger before switching to
>> LRU (which was a bit surprising). However, using even a small cache is
>> vastly beneficial for many access patterns. The cost of re-reading a block
>> from disk can be mitigated by the OS filesystem cache, but the decompression
>> step takes non-trivial CPU time and can be triggered dozens or hundreds of
>> times per block for some sensible-seeming access patterns.
>> -Kevin
>
> Certainly useful food for thought - thank you. I agree that the OS
> will probably cache commonly used BGZF blocks in the filesystem
> cache, but it doesn't solve the CPU overhead of decompression.
>
> In the case of Bio.SeqIO.index(...) which accesses one file, and
> Bio.SeqIO.index_db(...) which may access several files, we currently
> don't offer any end user options like this. However, there is an internal
> option for the max number of handles, and a similar option could
> control the number of BGZF blocks to cache. I could try 100
> blocks (100 times 64kb is about 6MB) as the default, and redo
> the UniProt timings (random access to sequences).
>
> That might be a good compromise, given the SeqIO indexing code
> has no easy way to know the calling code's usage patterns.

I've tried a cache of up to 100 BGZF blocks which are cleared
"randomly" and it doesn't make a noticeable difference to my
UniProt benchmark, which is a shame but not actually very
surprising. After all, that is deliberately accessing the records
(and thus the blocks) in a random order, and the files contain
far far more than 100 blocks.

I'll need a more realistic test case to properly evaluate the cache.

One example that comes to mind is iterating over BAM reads
(which would look at blocks sequentially) but also jumping to
look at the partner reads (paired end etc) and then back again.

Peter

P.S. When I said "random", what I'm actually using is a Python
dictionary keyed on the start offset, and the dictionary's popitem
method to remove a cached block "at random" once I have got
100 blocks in memory and need to free one. Of course, this isn't
really random, it is arbitrary and likely Python implementation
dependent.
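
The eviction described in this P.S. can be sketched as follows. The function
name and the max_blocks parameter are illustrative and not taken from the
branch. Note that dict.popitem() removed an arbitrary item in the Python
versions current in 2011; since Python 3.7 it removes the most recently
inserted item, so the same code behaves as last-in, first-out there.

```python
def cache_block(cache, start_offset, data, max_blocks=100):
    """Store a decompressed block, freeing one slot first if the cache is full."""
    if len(cache) >= max_blocks:
        # Which block goes is up to the dict implementation, not usage history.
        cache.popitem()
    cache[start_offset] = data


blocks = {}
for offset in range(4):
    cache_block(blocks, offset, b"data", max_blocks=3)
```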


From redmine at redmine.open-bio.org  Thu Nov 10 05:10:06 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 10 Nov 2011 10:10:06 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10
	format generated by lalign36
References: <redmine.issue-3312.20111108101700@redmine.open-bio.org>
Message-ID: <redmine.journal-14731.20111110101006@redmine.open-bio.org>


Issue #3312 has been updated by Peter Cock.

Assignee set to Biopython Dev Mailing List

Thank you - I can reproduce this on the latest Biopython in our repository.

May we include your sample file in Biopython as a unit test please?


From redmine at redmine.open-bio.org  Thu Nov 10 06:10:23 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 10 Nov 2011 11:10:23 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10
	format generated by lalign36
References: <redmine.issue-3312.20111108101700@redmine.open-bio.org>
Message-ID: <redmine.journal-14732.20111110111023@redmine.open-bio.org>


Issue #3312 has been updated by gahoo lee.


Sure. My pleasure.


From redmine at redmine.open-bio.org  Thu Nov 10 06:34:39 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 10 Nov 2011 11:34:39 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10
	format generated by lalign36
References: <redmine.issue-3312.20111108101700@redmine.open-bio.org>
Message-ID: <redmine.journal-14733.20111110113439@redmine.open-bio.org>


Issue #3312 has been updated by Peter Cock.


Looking at this, I believe there is a problem in lalign36 itself rather than Biopython: At the end of the first batch of alignments (for query one, AT1G01040.1) we have the odd line:

<pre>
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
</pre>

At the end of the second (and final) batch of alignments (for query two, AT5G04140.2) we have these odd lines:

<pre>
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os03g02970.1 1500 bp_Up Chr 03:1205337..1203835 (reverse complemented)
>>LOC_Os03g02970.1 1500 bp_Up Chr 03:1205337..1203835 (reverse complemented)
</pre>

Curious. It seems LALIGN is starting to write out another alignment, but then doesn't.
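
One way to spot such orphaned hit headers before parsing is sketched below. This is only a rough sketch and not part of Biopython: the name find_orphan_hits is made up, the sample lines are synthetic, and it assumes that in -m 10 output a genuine hit header (a line starting with ">>" but not ">>>") is followed by "; key: value" annotation lines.

```python
def find_orphan_hits(lines):
    """Return (line_number, hit_id) pairs for hit headers that have no data.

    Assumption: a genuine ">>" hit header is followed by "; key: value" lines.
    """
    lines = [line.rstrip("\n") for line in lines]
    orphans = []
    for index, line in enumerate(lines):
        # ">>>" lines start a new query or are the ">>><<<" end marker, so skip them.
        if not line.startswith(">>") or line.startswith(">>>"):
            continue
        # Look at the next non-blank line, if there is one.
        following = next((l for l in lines[index + 1:] if l.strip()), "")
        if not following.startswith(";"):
            hit_id = (line[2:].split() or [""])[0]
            orphans.append((index + 1, hit_id))
    return orphans


# Synthetic sample, loosely modelled on the odd lines quoted above.
example = [
    ">>>AT1G01040.1, 1500 nt vs os.fasta library",
    ">>LOC_Os03g02970.1 1500 bp_Up",
    "; fa_frame: f",
    ">>LOC_Os07g46460.1 1500 bp_Up",
    ">>><<<",
]
print(find_orphan_hits(example))
```

Running this over the real test.aln (for example find_orphan_hits(open("test.aln"))) should list the headers that have no alignment after them, if the assumption about the format holds.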

It was very helpful that you included the input files as well, so I could run this with the version of lalign36 I have installed (version 36.3.4 Apr, 2011) and here the output is a bit different but shows similar odd lines.

I have updated Biopython to give a more helpful error message in this case:
https://github.com/biopython/biopython/commit/1a99454d358fab41771551e8f3a475a90f240b25

<pre>
>>> from Bio import AlignIO
>>> for a in AlignIO.parse("test.aln", "fasta-m10"):
...     print a
...
SingleLetterAlphabet() alignment with 2 rows and 130 columns
AAAAAAAGAGAGAAATATTACTACAAAACAGAAGCAAGCAAGTG...ATC AT1G01040.1
AGAGAGAGAGAGAGGGAAGCGGAGGAGGGAGAAGAGATCA-GAG...ATC LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 81 columns
AAACAGAAGCAAGC--AAGTGGAA-AACAGACCAGAAGAGAGAG...CGA AT1G01040.1
AGAGAGAGGGAAGCGGAGGAGGGAGAAGAGATCAGAGGAAAGAG...TGA LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 264 columns
AAGATTTCGATTTCG-ATATAAATACTTAAT---CTTT-ATAAA...TTA AT1G01040.1
AATATATCTATTTCTTAAACAAATCATTATTTTCCTTTCATAAA...CTA LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 428 columns
ATTTTTATTTTTATTTT-TATGGGAAAGAAGTTGCACGAGTCGG...TTT AT1G01040.1
ATCATTATTTTCCTTTCATAAAAAAATGAATT---ATGAGGCGG...TTT LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 145 columns
AACTCACTCAAGAAAACCAAATCCCCAGAGA-AGAAA-ACAGAA...AAC AT1G01040.1
ATCTCAATCGAGAGAGCGAGCACACGAGAGAGAGAGAGAGGGAA...ATC LOC_Os03g02970.1
Traceback (most recent call last):
...
ValueError: No data for query 'AT1G01040.1', match 'LOC_Os07g46460.1'
</pre>

Are you on Bill Pearson's FASTA mailing list? We should report this.
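
If the alignments parsed before the failure are still useful, the loop can be wrapped so that they are kept. A minimal sketch using only the standard library: collect_until_error is a made-up helper name, and the fake_parse generator merely stands in for AlignIO.parse so the example runs without an input file.

```python
def collect_until_error(iterator, exc_type=ValueError):
    """Return (items, error): everything yielded before exc_type was raised."""
    items = []
    try:
        for item in iterator:
            items.append(item)
    except exc_type as err:
        return items, err
    return items, None


def fake_parse():
    """Stand-in for a parser that fails part way through its input."""
    yield "alignment 1"
    yield "alignment 2"
    raise ValueError("No data for query 'AT1G01040.1', match 'LOC_Os07g46460.1'")


alignments, error = collect_until_error(fake_parse())
print(len(alignments), error)
```

With the updated Biopython, collect_until_error(AlignIO.parse("test.aln", "fasta-m10")) should keep the alignments shown above and hand back the ValueError instead of stopping the script.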

Peter

----------------------------------------
Bug #3312: Failing to parse fasta-m10 format generated by lalign36
https://redmine.open-bio.org/issues/3312

Author: gahoo lee
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: 
Target version: 
URL: 


When I parse an alignment created by lalign (which is included in FASTA36), I get errors. Each FASTA file currently contains two sequences; with only one sequence in each file there is no error. Here are the code and the error.

@lalign36 -m 10 at.fasta os.fasta >test.aln@

@from Bio import AlignIO
handle = open('test.aln')
for a in AlignIO.parse(handle, "fasta-m10"):
    assert len(a) == 2, "Should be pairwise!"
    print "Alignment length %i" % a.get_alignment_length()
    for record in a:
        print record.seq, record.name, record.id
@

@Traceback (most recent call last):
  File "R:\Untitled 4.py", line 5, in <module>
    for a in AlignIO.parse(handle, "fasta-m10"):
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\__init__.py", line 371, in parse
    for a in i:
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\FastaIO.py", line 242, in FastaM10Iterator
    yield build_hsp()
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\FastaIO.py", line 106, in build_hsp
    assert query_tags, query_tags
AssertionError: {}@


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Thu Nov 10 08:13:55 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 10 Nov 2011 13:13:55 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10
	format generated by lalign36
References: <redmine.issue-3312.20111108101700@redmine.open-bio.org>
Message-ID: <redmine.journal-14734.20111110131355@redmine.open-bio.org>


Issue #3312 has been updated by gahoo lee.

File 3Seqs.zip added

Well, I'm not on the FASTA mailing list. In fact I found a small bug in mshowalign2.c, where a colon is missing on line 616; I just don't know how to join the mailing list.
Here's the FASTA output for an alignment with 3 sequences; I hope these files help. The odd lines are different in this output.


From redmine at redmine.open-bio.org  Thu Nov 10 09:33:28 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 10 Nov 2011 14:33:28 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10
	format generated by lalign36
References: <redmine.issue-3312.20111108101700@redmine.open-bio.org>
Message-ID: <redmine.journal-14735.20111110143328@redmine.open-bio.org>


Issue #3312 has been updated by Peter Cock.


The link has changed slightly, but the mailing list is here:

https://lists.virginia.edu/sympa/info/fasta_list


From redmine at redmine.open-bio.org  Thu Nov 10 20:42:59 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Fri, 11 Nov 2011 01:42:59 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10
	format generated by lalign36
References: <redmine.issue-3312.20111108101700@redmine.open-bio.org>
Message-ID: <redmine.journal-14736.20111111014259@redmine.open-bio.org>


Issue #3312 has been updated by gahoo lee.


Oh, I got it.
Did you report this problem to the FASTA mailing list?


From p.j.a.cock at googlemail.com  Wed Nov 16 11:27:46 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 16 Nov 2011 16:27:46 +0000
Subject: [Biopython-dev] Cross-links between tracks in GenomeDiagram
Message-ID: <CAKVJ-_5BHJxz_MNmHPbD12kFOmm4eYO9zF2JAcmN5t31Xjv1ig@mail.gmail.com>

Hi all,

Something I've been working on this month in discussion with Leighton
is some enhancements to GenomeDiagram, driven partly by a figure
I wanted to draw for a paper. The code is here,
https://github.com/peterjc/biopython/tree/gd-links

First, we can now show links between tracks joining any two features
or regions. One use of this is to mimic the output from the Artemis
Comparison Tool, ACT, http://www.sanger.ac.uk/resources/software/act/
ACT is great as an exploratory tool, but doesn't let you output a high
quality vector image.

Related to this, it is useful to be able to "crop" different tracks, since
for ACT style comparisons the different sequences are unlikely to
be the same length. Therefore each GenomeDiagram track can now
have its own start/end positions outside which it doesn't get drawn.

This includes some extra unit tests, run test_GenomeDiagram.py
and have a look at Graphics/GD_by_obj_*.pdf

Also try the file Doc/examples/ACT_example.py which mimics
a simple two-reference ACT diagram:
https://github.com/peterjc/biopython/blob/gd-links/Doc/examples/ACT_example.py

Simple linear output (split into three fragments) shown here:
http://twitter.com/#!/pjacock/status/136509137826754560

Circular version here (in this case deliberately not using a
closed circle, but that works too), note the curving links are
intentional so as to display very large cross-links nicely:
http://twitter.com/#!/pjacock/status/136840628502933505

This demo script should use blue flipped links where the matches
are to the reverse strand. I haven't put together a nice example
for a proper demonstration of that yet. Perhaps a set of several
E. coli genomes would work nicely...

I plan to merge this to the trunk, and write some end-user
documentation, but would be happy to have someone else
look over the code first.

Note that the API is intended to be quite low level but very
flexible in terms of creating the cross links. You can use
transparency (as in the current version of ACT_example.py)
or explicitly colour links according to say BLAST bit score.
The user also has full control of the z-order, which again
allows you to do things like ACT does and put longer
matches at the back with short matches at the front, etc.
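
The z-order idea can be sketched independently of the drawing code. This is only an illustration: the (a_start, a_end, b_start, b_end) tuples and the function name are hypothetical, and how a z value is actually passed to GenomeDiagram is not shown here.

```python
def order_links_by_length(matches):
    """Return (z, match) pairs with the longest match first, i.e. at the back.

    Each match is a hypothetical (a_start, a_end, b_start, b_end) tuple,
    and a lower z value is taken to mean "drawn earlier, further back".
    """
    def length(match):
        a_start, a_end, b_start, b_end = match
        return max(abs(a_end - a_start), abs(b_end - b_start))

    ordered = sorted(matches, key=length, reverse=True)
    return list(enumerate(ordered))


matches = [(0, 100, 10, 110), (0, 1000, 0, 1000), (50, 60, 70, 80)]
for z, match in order_links_by_length(matches):
    print(z, match)
```

Drawing in this order means a short match is never hidden behind a long one that happens to cover the same region.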

Peter

From chapmanb at 50mail.com  Thu Nov 17 06:51:11 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 17 Nov 2011 06:51:11 -0500
Subject: [Biopython-dev] Cross-links between tracks in GenomeDiagram
In-Reply-To: <CAKVJ-_5BHJxz_MNmHPbD12kFOmm4eYO9zF2JAcmN5t31Xjv1ig@mail.gmail.com>
References: <CAKVJ-_5BHJxz_MNmHPbD12kFOmm4eYO9zF2JAcmN5t31Xjv1ig@mail.gmail.com>
Message-ID: <87hb23ezm8.fsf@fastmail.fm>


Peter;

> Something I've been working on this month in discussion with Leighton
> is some enhancements to GenomeDiagram, driven partly by a figure
> I wanted to draw for a paper. The code is here,
> https://github.com/peterjc/biopython/tree/gd-links

Awesome. The direction you are pushing this is great. I'd definitely
love to see this in the next release.

> Also try the file Doc/examples/ACT_example.py which mimics
> a simple two-reference ACT diagram:
> https://github.com/peterjc/biopython/blob/gd-links/Doc/examples/ACT_example.py
> 
> Simple linear output (split into three fragments) shown here:
> http://twitter.com/#!/pjacock/status/136509137826754560

Really nice. My only suggestion would be to combine the examples and
outputs together in the Cookbook. One of the best ways to learn plotting
and drawing packages is by looking through examples, finding one that
most closely matches what you want, and then iterating until you get at
what you need.

Brad

From chapmanb at 50mail.com  Thu Nov 17 07:00:01 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 17 Nov 2011 07:00:01 -0500
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
	automated programs
In-Reply-To: <CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
	<CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
	<CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
Message-ID: <87d3crez7i.fsf@fastmail.fm>


Peter and Eric;
I wanted to follow up about the patch to automate Biopython installs
from easy_install and pip when NumPy is not present:

https://github.com/chapmanb/biopython/commit/be53d850d721fc82af81bedcd9fb9034b0a2099b

You'd both reviewed it, and the only holdup was a warning message when
setuptools is not installed:

> $ jython setup.py install
> /Users/pjcock/jython2.5.2/Lib/distutils/dist.py:263: UserWarning:
> Unknown distribution option: 'install_requires'
>   warnings.warn(msg)

We'd discussed some options: bundling setuptools and installing it,
explicitly suppressing the warning, or simply ignoring it since it is
not problematic.

My lazy side says ignoring it is fine, but if you want to explicitly
turn it off we can use this around the setup call:

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
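
Spelled out a little more, as a sketch only: noisy_setup below is a made-up stand-in for the real setup() call, used so the example runs on its own, and quiet_call is a hypothetical helper name.

```python
import warnings


def noisy_setup(**kwargs):
    """Hypothetical stand-in for a setup() that warns about unknown options."""
    if "install_requires" in kwargs:
        warnings.warn("Unknown distribution option: 'install_requires'",
                      UserWarning)
    return sorted(kwargs)


def quiet_call(func, **kwargs):
    """Call func with all warnings suppressed, restoring the filters afterwards."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return func(**kwargs)


print(quiet_call(noisy_setup, name="biopython", install_requires=["numpy"]))
```

The catch is that simplefilter("ignore") hides every warning raised inside the block, not just this one, which is an argument for keeping the block as small as possible.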

Happy to handle it however you prefer but I'd love to get this in,
Brad

From p.j.a.cock at googlemail.com  Thu Nov 17 07:24:42 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 17 Nov 2011 12:24:42 +0000
Subject: [Biopython-dev] Cross-links between tracks in GenomeDiagram
In-Reply-To: <87hb23ezm8.fsf@fastmail.fm>
References: <CAKVJ-_5BHJxz_MNmHPbD12kFOmm4eYO9zF2JAcmN5t31Xjv1ig@mail.gmail.com>
	<87hb23ezm8.fsf@fastmail.fm>
Message-ID: <CAKVJ-_5wv9d7AyuDpAJp++eHa-CjQTv5Sz3ROqifC1L78CtOEQ@mail.gmail.com>

On Thu, Nov 17, 2011 at 11:51 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Peter;
>
>> Something I've been working on this month in discussion with Leighton
>> is some enhancements to GenomeDiagram, driven partly by a figure
>> I wanted to draw for a paper. The code is here,
>> https://github.com/peterjc/biopython/tree/gd-links
>
> Awesome. The direction you are pushing this is great. I'd definitely
> love to see this in the next release.

Cool. It will end up being a graphics heavy release at this rate :)

>> Also try the file Doc/examples/ACT_example.py which mimics
>> a simple two-reference ACT diagram:
>> https://github.com/peterjc/biopython/blob/gd-links/Doc/examples/ACT_example.py
>>
>> Simple linear output (split into three fragments) shown here:
>> http://twitter.com/#!/pjacock/status/136509137826754560
>
> Really nice. My only suggestion would be to combine the examples and
> outputs together in the Cookbook. One of the best ways to learn plotting
> and drawing packages is by looking through examples, finding one that
> most closely matches what you want, and then iterating until you get at
> what you need.

Unless I can find a nicer small sample dataset (or make one) which
includes an inversion, I plan to use that ACT sample data in the
tutorial - basically taking the user through the ACT_example.py
script.

Peter

From p.j.a.cock at googlemail.com  Thu Nov 17 07:45:54 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 17 Nov 2011 12:45:54 +0000
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
In-Reply-To: <87d3crez7i.fsf@fastmail.fm>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
	<CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
	<CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
	<87d3crez7i.fsf@fastmail.fm>
Message-ID: <CAKVJ-_5GLQDtru-MCwmhF83qfFjetMYGN5EA+4EJEpY+NDBbXA@mail.gmail.com>

On Thu, Nov 17, 2011 at 12:00 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Peter and Eric;
> I wanted to follow up about the patch to automate Biopython installs
> from easy_install and pip when NumPy is not present:
>
> https://github.com/chapmanb/biopython/commit/be53d850d721fc82af81bedcd9fb9034b0a2099b
>
> You'd both reviewed it, and the only holdup was a warning message when
> setuptools is not installed:
>
>> $ jython setup.py install
>> /Users/pjcock/jython2.5.2/Lib/distutils/dist.py:263: UserWarning:
>> Unknown distribution option: 'install_requires'
>>   warnings.warn(msg)
>
> We'd discussed some other options like including setuptools and
> installing it, ignoring the warning, or ignoring it since it is not
> problematic.
>
> My lazy side says ignoring it is fine, but if you want to explicitly
> turn it off we can use this around the setup call:
>
> with warnings.catch_warnings():
>     warnings.simplefilter("ignore")
>
> Happy to handle it however you prefer but I'd love to get this in,
> Brad

How about this to avoid the warning by not passing the argument?
https://github.com/peterjc/biopython/commit/78b965f48939c7395aab6e0919b86686443f640e

Note I rebased to the current master.

If you and Eric are happy with that, I guess we can check it in
and see how the build slaves like it...

Peter


From chapmanb at 50mail.com  Thu Nov 17 08:56:41 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 17 Nov 2011 08:56:41 -0500
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
	automated programs
In-Reply-To: <CAKVJ-_5GLQDtru-MCwmhF83qfFjetMYGN5EA+4EJEpY+NDBbXA@mail.gmail.com>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
	<CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
	<CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
	<87d3crez7i.fsf@fastmail.fm>
	<CAKVJ-_5GLQDtru-MCwmhF83qfFjetMYGN5EA+4EJEpY+NDBbXA@mail.gmail.com>
Message-ID: <87aa7ug8di.fsf@fastmail.fm>


Peter;

> > I wanted to follow up about the patch to automate Biopython installs
> > from easy_install and pip when NumPy is not present:
[...]
> How about this to avoid the warning by not passing the argument?
> https://github.com/peterjc/biopython/commit/78b965f48939c7395aab6e0919b86686443f640e

That works great, thanks for looking at this. Having this in the next
release will be a big help for scripts using install_requires.

Brad

From redmine at redmine.open-bio.org  Thu Nov 17 09:10:30 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 17 Nov 2011 14:10:30 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10
	format generated by lalign36
References: <redmine.issue-3312.20111108101700@redmine.open-bio.org>
Message-ID: <redmine.journal-14741.20111117141030@redmine.open-bio.org>


Issue #3312 has been updated by Peter Cock.


Missing alignments reported here:
https://lists.virginia.edu/sympa/arc/fasta_list/2011-11/msg00001.html

Missing colon reported here:
https://lists.virginia.edu/sympa/arc/fasta_list/2011-11/msg00004.html


From p.j.a.cock at googlemail.com  Thu Nov 17 09:13:11 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 17 Nov 2011 14:13:11 +0000
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
In-Reply-To: <87aa7ug8di.fsf@fastmail.fm>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
	<CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
	<CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
	<87d3crez7i.fsf@fastmail.fm>
	<CAKVJ-_5GLQDtru-MCwmhF83qfFjetMYGN5EA+4EJEpY+NDBbXA@mail.gmail.com>
	<87aa7ug8di.fsf@fastmail.fm>
Message-ID: <CAKVJ-_6CbZgBQ_3krTAzD5CK7rgnv4w1ckTVbuHG__VPrWC1EQ@mail.gmail.com>

On Thu, Nov 17, 2011 at 1:56 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Peter;
>
>> > I wanted to follow up about the patch to automate Biopython installs
>> > from easy_install and pip when NumPy is not present:
> [...]
>> How about this to avoid the warning by not passing the argument?
>> https://github.com/peterjc/biopython/commit/78b965f48939c7395aab6e0919b86686443f640e
>
> That works great, thanks for looking at this. Having this in the next
> release will be a big help for scripts using install_requires.
>
> Brad

OK, I'll put that on the trunk then - thanks Brad.

Peter

From p.j.a.cock at googlemail.com  Thu Nov 17 10:10:34 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 17 Nov 2011 15:10:34 +0000
Subject: [Biopython-dev] Cross-links between tracks in GenomeDiagram
In-Reply-To: <CAKVJ-_5wv9d7AyuDpAJp++eHa-CjQTv5Sz3ROqifC1L78CtOEQ@mail.gmail.com>
References: <CAKVJ-_5BHJxz_MNmHPbD12kFOmm4eYO9zF2JAcmN5t31Xjv1ig@mail.gmail.com>
	<87hb23ezm8.fsf@fastmail.fm>
	<CAKVJ-_5wv9d7AyuDpAJp++eHa-CjQTv5Sz3ROqifC1L78CtOEQ@mail.gmail.com>
Message-ID: <CAKVJ-_7TrOP0FsOtRDWTA1r5v6TKkGZCznOfSFcxs4jYcECHLA@mail.gmail.com>

On Thu, Nov 17, 2011 at 12:24 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Nov 17, 2011 at 11:51 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
>>
>> Peter;
>>
>>> Something I've been working on this month in discussion with Leighton
>>> is some enhancements to GenomeDiagram, driven partly by a figure
>>> I wanted to draw for a paper. The code is here,
>>> https://github.com/peterjc/biopython/tree/gd-links
>>
>> Awesome. The direction you are pushing this is great. I'd definitely
>> love to see this in the next release.
>
> Cool. It will end up being a graphics heavy release at this rate :)
>

Committed to trunk,
https://github.com/biopython/biopython/commit/980791237330923706e4dc4901bb6794d3222d0e

>>> Also try the file Doc/examples/ACT_example.py which mimics
>>> a simple two-reference ACT diagram:
>>> https://github.com/peterjc/biopython/blob/gd-links/Doc/examples/ACT_example.py
>>>
>>> Simple linear output (split into three fragments) shown here:
>>> http://twitter.com/#!/pjacock/status/136509137826754560
>>
>> Really nice. My only suggestion would be to combine the examples and
>> outputs together in the Cookbook. One of the best ways to learn plotting
>> and drawing packages is by looking through examples, finding one that
>> most closely matches what you want, and then iterating until you get at
>> what you need.
>
> Unless I can find a nicer small sample dataset (or make one) which
> includes an inversion, I plan to use that ACT sample data in the
> tutorial - basically taking the user through the ACT_example.py
> script.

I plan to do another OBF blog entry on this as well, probably
with the same example.

Peter

From p.j.a.cock at googlemail.com  Thu Nov 17 10:12:55 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 17 Nov 2011 15:12:55 +0000
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
In-Reply-To: <CAKVJ-_6CbZgBQ_3krTAzD5CK7rgnv4w1ckTVbuHG__VPrWC1EQ@mail.gmail.com>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
	<CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
	<CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
	<87d3crez7i.fsf@fastmail.fm>
	<CAKVJ-_5GLQDtru-MCwmhF83qfFjetMYGN5EA+4EJEpY+NDBbXA@mail.gmail.com>
	<87aa7ug8di.fsf@fastmail.fm>
	<CAKVJ-_6CbZgBQ_3krTAzD5CK7rgnv4w1ckTVbuHG__VPrWC1EQ@mail.gmail.com>
Message-ID: <CAKVJ-_5-AphG+y4xV1xO64rzd3gLKo6prtWtujXNDmpXkSQxrw@mail.gmail.com>

On Thu, Nov 17, 2011 at 2:13 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Nov 17, 2011 at 1:56 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>>
>> Peter;
>>
>>> > I wanted to follow up about the patch to automate Biopython installs
>>> > from easy_install and pip when NumPy is not present:
>> [...]
>>> How about this to avoid the warning by not passing the argument?
>>> https://github.com/peterjc/biopython/commit/78b965f48939c7395aab6e0919b86686443f640e
>>
>> That works great, thanks for looking at this. Having this in the next
>> release will be a big help for scripts using install_requires.
>>
>> Brad
>
> OK, I'll put that on the trunk then - thanks Brad.
>
> Peter

That all looks fine with the buildslaves, but the real
testing will be with random end user machines.

Brad, could you write a snippet for the NEWS file about
this? Basically when using setuptools to install Biopython
it will list NumPy as a dependency (except on Jython
and PyPy) and thus install it if not present already?
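
The behaviour described here could look roughly like the following. This is a sketch of the idea only, not the code in the commit: the function names are made up, and detecting setuptools via importlib is an assumption chosen so the example runs on a current Python.

```python
import importlib.util
import platform


def numpy_requirement(have_setuptools, implementation):
    """Return the extra setup() keyword arguments, if any.

    The argument is only passed when setuptools is there to understand it,
    and never on Jython or PyPy.
    """
    if not have_setuptools:
        return {}
    if implementation in ("Jython", "PyPy"):
        return {}
    return {"install_requires": ["numpy"]}


def detect_numpy_requirement():
    """Work out the extra keyword arguments for the running interpreter."""
    have_setuptools = importlib.util.find_spec("setuptools") is not None
    return numpy_requirement(have_setuptools, platform.python_implementation())


print(detect_numpy_requirement())
```

The resulting dictionary would then be unpacked into the setup call, e.g. setup(name=..., **extra), so that plain distutils never sees the unknown option and so never warns.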

Peter

From chapmanb at 50mail.com  Thu Nov 17 10:51:01 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 17 Nov 2011 10:51:01 -0500
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
	automated programs
In-Reply-To: <CAKVJ-_5-AphG+y4xV1xO64rzd3gLKo6prtWtujXNDmpXkSQxrw@mail.gmail.com>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
	<CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
	<CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
	<87d3crez7i.fsf@fastmail.fm>
	<CAKVJ-_5GLQDtru-MCwmhF83qfFjetMYGN5EA+4EJEpY+NDBbXA@mail.gmail.com>
	<87aa7ug8di.fsf@fastmail.fm>
	<CAKVJ-_6CbZgBQ_3krTAzD5CK7rgnv4w1ckTVbuHG__VPrWC1EQ@mail.gmail.com>
	<CAKVJ-_5-AphG+y4xV1xO64rzd3gLKo6prtWtujXNDmpXkSQxrw@mail.gmail.com>
Message-ID: <877h2yg32y.fsf@fastmail.fm>


Peter;

> That all looks fine with the buildslaves, but the real
> testing will be with random end user machines.
> 
> Brad, could you write a snippet for the NEWS file about
> this? Basically when using setuptools to install Biopython
> it will list NumPy as a dependency (except on Jython
> and PyPy) and thus install it if not present already?

Great, glad that is working without any problems. I added a bit to the
news about the functionality and usage. Thanks again for the help,
Brad

From anaryin at gmail.com  Thu Nov 17 18:16:56 2011
From: anaryin at gmail.com (João Rodrigues)
Date: Fri, 18 Nov 2011 00:16:56 +0100
Subject: [Biopython-dev] [Biopython] Pairwise alignment - is it a
	generic function?
In-Reply-To: <CAJ9sUYMei67Wm9R37FpCuis6Tb16OMJ+Rp+ATTN5FPbv9GcCiw@mail.gmail.com>
References: <CAJ9sUYNTF5JJx3MPNMinqyD-zqWpHruxdH_9a+GDymkUSL0C+A@mail.gmail.com>
	<CAKVJ-_7KGjsF_MaQ-ngVSMN43T2_R2kkyYh6Cmh9a3hkk8NhuQ@mail.gmail.com>
	<CAJ9sUYMfQOPbJJDb-mC3=g_WmsoS9z5JWFWtFMHsidyfVexNtw@mail.gmail.com>
	<CAKVJ-_5Ak8prNGztJDUu-14USQ7qAYCQ+XJ6Oo0_FaBKbX3hTA@mail.gmail.com>
	<CAMC681nYk8NNTw19F7qGOiEHPL3CSorcG+-Ugi7tL3WOZGup2Q@mail.gmail.com>
	<CAJ9sUYMM=TGyMCyPeCt1A1_0DSbtQv5heobkM1L46G5etNSetQ@mail.gmail.com>
	<CAKVJ-_4yt6Hi5W_fvmj4yFT2Z=a26kQ6FLkOg44bgnYJ7NE3mw@mail.gmail.com>
	<CAJ9sUYMei67Wm9R37FpCuis6Tb16OMJ+Rp+ATTN5FPbv9GcCiw@mail.gmail.com>
Message-ID: <CAJ9sUYP2O=Kh+jc70H6rfsV8d+uwcYFZqtJ4x_9qvo3mQer_jQ@mail.gmail.com>

Hey all,

My laptop decided to die on me the last week...

I added a very simple and small example to the docstring, in line with all
the others. I'm pushing it to my pdb_enhancements branch, maybe Peter can
cherry-pick it?

Best,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao


2011/10/27 João Rodrigues <anaryin at gmail.com>

> Sure thing. The docstring is actually pretty explicit, it's just missing
> the part that you can get the matrices from SubsMat. Or at least, not that
> clear. I'll go over it this weekend, maybe earlier.
>
> Best,
>
> João
>


From p.j.a.cock at googlemail.com  Fri Nov 18 05:37:23 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 18 Nov 2011 10:37:23 +0000
Subject: [Biopython-dev] [Biopython] Pairwise alignment - is it a
	generic function?
In-Reply-To: <CAJ9sUYP2O=Kh+jc70H6rfsV8d+uwcYFZqtJ4x_9qvo3mQer_jQ@mail.gmail.com>
References: <CAJ9sUYNTF5JJx3MPNMinqyD-zqWpHruxdH_9a+GDymkUSL0C+A@mail.gmail.com>
	<CAKVJ-_7KGjsF_MaQ-ngVSMN43T2_R2kkyYh6Cmh9a3hkk8NhuQ@mail.gmail.com>
	<CAJ9sUYMfQOPbJJDb-mC3=g_WmsoS9z5JWFWtFMHsidyfVexNtw@mail.gmail.com>
	<CAKVJ-_5Ak8prNGztJDUu-14USQ7qAYCQ+XJ6Oo0_FaBKbX3hTA@mail.gmail.com>
	<CAMC681nYk8NNTw19F7qGOiEHPL3CSorcG+-Ugi7tL3WOZGup2Q@mail.gmail.com>
	<CAJ9sUYMM=TGyMCyPeCt1A1_0DSbtQv5heobkM1L46G5etNSetQ@mail.gmail.com>
	<CAKVJ-_4yt6Hi5W_fvmj4yFT2Z=a26kQ6FLkOg44bgnYJ7NE3mw@mail.gmail.com>
	<CAJ9sUYMei67Wm9R37FpCuis6Tb16OMJ+Rp+ATTN5FPbv9GcCiw@mail.gmail.com>
	<CAJ9sUYP2O=Kh+jc70H6rfsV8d+uwcYFZqtJ4x_9qvo3mQer_jQ@mail.gmail.com>
Message-ID: <CAKVJ-_66iKoBqOKucrrzrTwkBwsNvoeQUdzEguEM9UH2smSc_A@mail.gmail.com>

On Thu, Nov 17, 2011 at 11:16 PM, João Rodrigues <anaryin at gmail.com> wrote:
> Hey all,
> My laptop decided to die on me the last week...
> I added a very simple and small example to the docstring, in line with all
> the others. I'm pushing it to my pdb_enhancements branch, maybe Peter can
> cherry-pick it?
> Best,
> João [...] Rodrigues
> http://nmr.chem.uu.nl/~joao

Cherry-picked, and updated the existing examples to make them into
functional doctests, and call them from the test suite.
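
As a rough illustration of what "functional doctests called from the test
suite" can look like, here is a minimal sketch. The names
reverse_complement_sketch and run_docstring_examples are made up for this
example; this is not the Bio.pairwise2 docstring or Biopython's actual test
harness.

```python
import doctest


def reverse_complement_sketch(seq):
    """Return the reverse complement of a DNA string (hypothetical example).

    >>> reverse_complement_sketch("ATGC")
    'GCAT'
    """
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def run_docstring_examples(func):
    """Run the examples in func's docstring and return the doctest results."""
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


# A test suite could call this and fail if any example does not match.
results = run_docstring_examples(reverse_complement_sketch)
```

The point of the design is that the docstring example is checked on every
test run, so it cannot silently drift out of date.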

Thanks.

Peter


From redmine at redmine.open-bio.org  Mon Nov 21 09:35:37 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 21 Nov 2011 14:35:37 +0000
Subject: [Biopython-dev] [Biopython - Feature #3236] Make Biopython work in
	PyPy 1.5
References: <redmine.issue-3236.20110524161311@redmine.open-bio.org>
Message-ID: <redmine.journal-14742.20111121143537@redmine.open-bio.org>


Issue #3236 has been updated by Peter Cock.

Assignee set to Biopython Dev Mailing List

We'd got most things working or skipped gracefully under PyPy 1.6, and we're in almost the same situation for PyPy 1.7.

I just fixed a break under PyPy 1.7 where we assumed set order,
https://github.com/biopython/biopython/commit/d6a3fce2d03d6e613600abec4d837c8c7b929f6f
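
The general pattern behind this kind of fix, sketched with made-up data rather than the code from the commit above: sets have no guaranteed iteration order, so anything that compares or prints their members should sort them first. The helper name stable_listing is hypothetical.

```python
def stable_listing(items):
    """Return the members of a set in a deterministic (sorted) order."""
    return sorted(items)


# Iterating over the set directly may give a different order on
# CPython, PyPy and Jython; sorting makes the result predictable.
enzymes = set(["EcoRI", "BamHI", "HindIII"])
listing = stable_listing(enzymes)
```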

From test_Entrez.py under PyPy 1.6 we hit https://bugs.pypy.org/issue914, which is fixed in PyPy 1.7, but I'm now hitting https://bugs.pypy.org/issue933 instead.

Note that "import numpy" has been replaced with "import numpypy" in PyPy 1.7, so if we decide not to support PyPy 1.6 that hassle goes away.
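
A minimal sketch of how code could cope with either module name, assuming only what is stated above (that PyPy 1.7 offers "numpypy" in place of "numpy"). The HAVE_NUMPY flag is a made-up name, and whether numpypy covers the functions Biopython needs is not checked here.

```python
try:
    import numpy
except ImportError:
    try:
        # Assumption: PyPy 1.7's partial replacement module.
        import numpypy as numpy
    except ImportError:
        numpy = None

# Callers can test this flag and skip NumPy-dependent code paths.
HAVE_NUMPY = numpy is not None
```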

There are still issues with test_Pathway.py, test_Restriction.py (and also test_CAPS.py), plus a whole load of "Too many open files" errors, probably due to leaking handles and different garbage collection.
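
If leaking handles are the cause, the usual remedy is to close each handle explicitly rather than rely on the interpreter to do it. CPython closes a file as soon as its reference count drops to zero, whereas PyPy's garbage collector may leave it open much longer. A minimal sketch with a hypothetical helper, count_records, and a throwaway demo file:

```python
import os
import tempfile


def count_records(path):
    """Count FASTA-style header lines, closing the handle deterministically."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


# Demo data written to a temporary file, then cleaned up.
fd, demo_path = tempfile.mkstemp(suffix=".fasta")
with os.fdopen(fd, "w") as handle:
    handle.write(">seq1\nACGT\n>seq2\nTTGA\n")
record_count = count_records(demo_path)
os.remove(demo_path)
```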
----------------------------------------
Feature #3236: Make Biopython work in PyPy 1.5
https://redmine.open-bio.org/issues/3236

Author: Eric Talevich
Status: In Progress
Priority: Low
Assignee: Biopython Dev Mailing List
Category: 
Target version: 
URL: 


PyPy is now roughly as production-ready as Jython:
http://morepypy.blogspot.com/2011/04/pypy-15-released-catching-up.html

Let's make Biopython work on PyPy 1.5.

To make the pure-Python core of Biopython work, I did this:

* Download and unpack the pre-compiled Linux tarball from pypy.org
* Copy the header file @marshal.h@ from the CPython 2.X installation into the @pypy-c-.../include/@ directory
* pypy setup.py build; pypy setup.py install
* Delete pypy-c-.../site-packages/Bio/cpairwise2*.so

Benchmarking a script that leans heavily on Bio.pairwise2, I see about a 2x speedup with PyPy 1.5 over CPython 2.6 -- yes, that's with the compiled C extension @cpairwise2@ in the CPython 2.6 installation.

Numpy isn't available on PyPy yet, and it may be some time before it is.

Observations from @pypy setup.py test@:

* test_BioSQL triggers tons of RuntimeWarnings related to sqlite3 functions
* test_BioSQL_SeqIO fails -- attempts to retrieve P01892 instead of Q29899 (?)
* test_Restriction triggers a TypeError, somehow (also causing test_CAPS to err)
* test_Entrez fails with many noisy errors -- looks related to expat, may be just my installation
* importing @Bio.trie@ fails, probably due to a @marshal.h@ issue with compilation


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Mon Nov 21 09:37:54 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 21 Nov 2011 14:37:54 +0000
Subject: [Biopython-dev] [Biopython - Feature #3236] Make Biopython work in
	PyPy 1.5
References: <redmine.issue-3236.20110524161311@redmine.open-bio.org>
Message-ID: <redmine.journal-14743.20111121143754@redmine.open-bio.org>


Issue #3236 has been updated by Peter Cock.


Point of clarification - the code on the Biopython trunk is deliberately skipping all our C extensions under PyPy (and Jython). We may want to start gradually enabling those if possible - but getting the pure Python code all working first seems like a sensible strategy.
----------------------------------------
Feature #3236: Make Biopython work in PyPy 1.5
https://redmine.open-bio.org/issues/3236

Author: Eric Talevich
Status: In Progress
Priority: Low
Assignee: Biopython Dev Mailing List
Category: 
Target version: 
URL: 




From redmine at redmine.open-bio.org  Tue Nov 22 06:30:58 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 22 Nov 2011 11:30:58 +0000
Subject: [Biopython-dev] [Biopython - Feature #3236] Make Biopython work in
	PyPy 1.5
References: <redmine.issue-3236.20110524161311@redmine.open-bio.org>
Message-ID: <redmine.journal-14744.20111122113058@redmine.open-bio.org>


Issue #3236 has been updated by Peter Cock.


I have deprecated Bio.Pathway.Rep.HashSet and switched Bio.Pathway.Rep.Graph to use Python's built-in set instead. This means test_Pathway.py now passes under PyPy 1.6 and 1.7,

https://github.com/biopython/biopython/commit/cbc7c875448a9a57a4cdcbecbc01bcf6b115da69
----------------------------------------
Feature #3236: Make Biopython work in PyPy 1.5
https://redmine.open-bio.org/issues/3236

Author: Eric Talevich
Status: In Progress
Priority: Low
Assignee: Biopython Dev Mailing List
Category: 
Target version: 
URL: 




From p.j.a.cock at googlemail.com  Tue Nov 22 07:22:21 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 22 Nov 2011 12:22:21 +0000
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
In-Reply-To: <877h2yg32y.fsf@fastmail.fm>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
	<CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
	<CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
	<87d3crez7i.fsf@fastmail.fm>
	<CAKVJ-_5GLQDtru-MCwmhF83qfFjetMYGN5EA+4EJEpY+NDBbXA@mail.gmail.com>
	<87aa7ug8di.fsf@fastmail.fm>
	<CAKVJ-_6CbZgBQ_3krTAzD5CK7rgnv4w1ckTVbuHG__VPrWC1EQ@mail.gmail.com>
	<CAKVJ-_5-AphG+y4xV1xO64rzd3gLKo6prtWtujXNDmpXkSQxrw@mail.gmail.com>
	<877h2yg32y.fsf@fastmail.fm>
Message-ID: <CAKVJ-_6_9W5vOd-N+0ev2rMtKTbHBHDJEeVA4On7CKgSjCZ=ZA@mail.gmail.com>

On Thu, Nov 17, 2011 at 3:51 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Great, glad that is working without any problems. I added a bit to the
> news about the functionality and usage. Thanks again for the help,
> Brad
>

I've noticed a probable regression on my Mac,

$ python setup.py install
running install
running build
running build_py
running build_ext
running install_lib
running install_egg_info
running egg_info
writing biopython.egg-info/PKG-INFO
writing top-level names to biopython.egg-info/top_level.txt
writing dependency_links to biopython.egg-info/dependency_links.txt
reading manifest file 'biopython.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files found matching 'Tests/Graphics/*.png'
warning: no previously-included files matching '*' found under
directory 'Tests/UnitTests'
warning: no previously-included files matching '.gitignore' found
under directory '*'
writing manifest file 'biopython.egg-info/SOURCES.txt'
removing '/Library/Python/2.6/site-packages/biopython-1.58_-py2.6.egg-info'
(and everything under it)
Copying biopython.egg-info to
/Library/Python/2.6/site-packages/biopython-1.58_-py2.6.egg-info
running install_scripts

I never used to get these manifest warnings during a simple "python
setup.py install" (but I recall seeing them during the official build
process under Linux when we do the manifest step).

We could tweak the manifest file I guess...

Peter

From chapmanb at 50mail.com  Tue Nov 22 20:15:08 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 22 Nov 2011 20:15:08 -0500
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
	automated programs
In-Reply-To: <CAKVJ-_6_9W5vOd-N+0ev2rMtKTbHBHDJEeVA4On7CKgSjCZ=ZA@mail.gmail.com>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
	<CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
	<CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
	<87d3crez7i.fsf@fastmail.fm>
	<CAKVJ-_5GLQDtru-MCwmhF83qfFjetMYGN5EA+4EJEpY+NDBbXA@mail.gmail.com>
	<87aa7ug8di.fsf@fastmail.fm>
	<CAKVJ-_6CbZgBQ_3krTAzD5CK7rgnv4w1ckTVbuHG__VPrWC1EQ@mail.gmail.com>
	<CAKVJ-_5-AphG+y4xV1xO64rzd3gLKo6prtWtujXNDmpXkSQxrw@mail.gmail.com>
	<877h2yg32y.fsf@fastmail.fm>
	<CAKVJ-_6_9W5vOd-N+0ev2rMtKTbHBHDJEeVA4On7CKgSjCZ=ZA@mail.gmail.com>
Message-ID: <87sjlfhc6b.fsf@fastmail.fm>


Peter;

> I've noticed a probable regression on my Mac,
> 
> $ python setup.py install
[...]
> warning: no previously-included files found matching 'Tests/Graphics/*.png'
> warning: no previously-included files matching '*' found under

These look like warnings from setuptools about excluding some files that
aren't actually present or included. Apparently distutils silently
ignores them. I cleaned up the MANIFEST.in to reduce these. Thanks for
spotting this,
Brad

From p.j.a.cock at googlemail.com  Wed Nov 23 04:14:21 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 23 Nov 2011 09:14:21 +0000
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
In-Reply-To: <87sjlfhc6b.fsf@fastmail.fm>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
	<CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
	<CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
	<87d3crez7i.fsf@fastmail.fm>
	<CAKVJ-_5GLQDtru-MCwmhF83qfFjetMYGN5EA+4EJEpY+NDBbXA@mail.gmail.com>
	<87aa7ug8di.fsf@fastmail.fm>
	<CAKVJ-_6CbZgBQ_3krTAzD5CK7rgnv4w1ckTVbuHG__VPrWC1EQ@mail.gmail.com>
	<CAKVJ-_5-AphG+y4xV1xO64rzd3gLKo6prtWtujXNDmpXkSQxrw@mail.gmail.com>
	<877h2yg32y.fsf@fastmail.fm>
	<CAKVJ-_6_9W5vOd-N+0ev2rMtKTbHBHDJEeVA4On7CKgSjCZ=ZA@mail.gmail.com>
	<87sjlfhc6b.fsf@fastmail.fm>
Message-ID: <CAKVJ-_7h3zjQu+H960SZNYFLZwpOC00eobHHiX+QdS4ged8LgQ@mail.gmail.com>

On Wed, Nov 23, 2011 at 1:15 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Peter;
>
>> I've noticed a probable regression on my Mac,
>>
>> $ python setup.py install
> [...]
>> warning: no previously-included files found matching 'Tests/Graphics/*.png'
>> warning: no previously-included files matching '*' found under
>
> These look like warnings from setuptools about excluding some files that
> aren't actually present or included. Apparently distutils silently
> ignores them.

That was my guess.

> I cleaned up the MANIFEST.in to reduce these. Thanks for
> spotting this,
> Brad

Thanks,

Peter

From p.j.a.cock at googlemail.com  Thu Nov 24 06:54:35 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 24 Nov 2011 11:54:35 +0000
Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?)
Message-ID: <CAKVJ-_7uBB0fjQ-iJ3cMG+iYDM+hujGpLgU8vhnEHQXy8wBGeA@mail.gmail.com>

Dear all,

Aside from a problem with leaking handles, the remaining problem
with Biopython's test suite under PyPy is in Bio.Restriction,
specifically this line in the RestrictionType class __init__ method,

super(RestrictionType, cls).__init__(cls, name, bases, dct)

Here is the error under PyPy 1.7 (same with PyPy 1.6),

$ pypy
Python 2.7.1 (7773f8fc4223, Nov 18 2011, 22:15:49)
[PyPy 1.7.0 with GCC 4.0.1] on darwin
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``<arigato> no, normal work is so
much less tiring than vacations''
>>>> from Bio import Restriction
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "Bio/Restriction/__init__.py", line 61, in <module>
    from Bio.Restriction.Restriction import *
  File "Bio/Restriction/Restriction.py", line 2404, in <module>
    newenz = T(k, bases, enzymedict[k])
  File "Bio/Restriction/Restriction.py", line 241, in __init__
    super(RestrictionType, cls).__init__(cls, name, bases, dct)
TypeError: unbound method __init__() must be called with BssMI
instance as first argument (got RestrictionType instance instead)
>>>> quit()

Note that we had to tweak the super call to get this to work under Python 2.6,
http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004369.html
https://github.com/biopython/biopython/commit/11332d6d4951406f3cc001cea41ea75fce177f89

It used to be:
super(RestrictionType, cls).__init__(name, bases, dct)

PyPy doesn't like that either,

$ pypy
Python 2.7.1 (7773f8fc4223, Nov 18 2011, 22:15:49)
[PyPy 1.7.0 with GCC 4.0.1] on darwin
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``"3 + 3 = 8" - Anto in the JIT
talk''
>>>> from Bio import Restriction
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "Bio/Restriction/__init__.py", line 61, in <module>
    from Bio.Restriction.Restriction import *
  File "Bio/Restriction/Restriction.py", line 2405, in <module>
    newenz = T(k, bases, enzymedict[k])
  File "Bio/Restriction/Restriction.py", line 242, in __init__
    super(RestrictionType, cls).__init__(name, bases, dct)
TypeError: unbound method __init__() must be called with BssMI
instance as first argument (got str instance instead)
>>>>


What I find interesting is if we comment out the super call, everything
seems to work  - test_Restriction.py and test_CAPS.py pass under PyPy,
Jython, Python 2, and Python 3. I'm tempted to just do that - but I don't
fully understand what is going on and why.

Can anyone throw some light on this?

Thanks,

Peter

From chapmanb at 50mail.com  Thu Nov 24 10:33:57 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 24 Nov 2011 10:33:57 -0500
Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?)
In-Reply-To: <CAKVJ-_7uBB0fjQ-iJ3cMG+iYDM+hujGpLgU8vhnEHQXy8wBGeA@mail.gmail.com>
References: <CAKVJ-_7uBB0fjQ-iJ3cMG+iYDM+hujGpLgU8vhnEHQXy8wBGeA@mail.gmail.com>
Message-ID: <87lir5wn4q.fsf@fastmail.fm>


Peter;

> Aside from a problem with leaking handles, 

Is this from tempfile.mkstemp? This has tricked me an annoying number of
times, so I eventually wrote a wrapper. The trick is doing an os.close
on the file descriptor:

https://github.com/chapmanb/bcbb/blob/master/nextgen/bcbio/utils.py#L118
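
A minimal sketch of that kind of wrapper (the helper name make_temp_path is made up here; this is not the code at the link above). tempfile.mkstemp returns an already-open OS-level file descriptor together with the path, and that descriptor stays open until it is closed explicitly:

```python
import os
import tempfile


def make_temp_path(suffix=""):
    """Create a temporary file and return its path without leaking the fd."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # without this, each call leaves one descriptor open
    return path


temp_path = make_temp_path(".txt")
```

The caller is still responsible for removing the file when done with it.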

> the remaining problem
> with Biopython's test suite under PyPy is in Bio.Restriction,
> specifically this line in the RestrictionType class __init__ method,
> 
> super(RestrictionType, cls).__init__(cls, name, bases, dct)

That seems strange: the __init__ is calling super on itself. You'd
normally expect this from a derived class. I'm not sure why this doesn't
trigger an infinite recursion initializing the object. I'm +1 on
commenting it out.
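
One possible reading, sketched under the assumption that RestrictionType is
a metaclass deriving from type (the T(k, bases, enzymedict[k]) call in the
traceback suggests so): inside a metaclass __init__, super() resolves to the
parent metaclass, so there is no recursion, and the bound call already
supplies cls. RestrictionTypeSketch and DemoEnzyme are made-up names.

```python
class RestrictionTypeSketch(type):
    """Hypothetical stand-in metaclass, not the real RestrictionType."""

    def __init__(cls, name, bases, dct):
        # super() returns a proxy already bound to cls, so cls is not
        # passed again; the call ends up in type.__init__.
        super(RestrictionTypeSketch, cls).__init__(name, bases, dct)
        cls.registered_name = name


# Build a class by calling the metaclass directly, as in the traceback.
DemoEnzyme = RestrictionTypeSketch("DemoEnzyme", (object,), {"site": "GAATTC"})
```

This does not explain the PyPy error itself; it only shows the conventional
form of the call for a metaclass.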

Brad

From p.j.a.cock at googlemail.com  Fri Nov 25 06:40:49 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 25 Nov 2011 11:40:49 +0000
Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?)
In-Reply-To: <87lir5wn4q.fsf@fastmail.fm>
References: <CAKVJ-_7uBB0fjQ-iJ3cMG+iYDM+hujGpLgU8vhnEHQXy8wBGeA@mail.gmail.com>
	<87lir5wn4q.fsf@fastmail.fm>
Message-ID: <CAKVJ-_4t2YYMzEtEMDN-R2VaVAiAVguF+cV-4RK64VLgD0EGhA@mail.gmail.com>

On Thu, Nov 24, 2011 at 3:33 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Peter;
>
>> Aside from a problem with leaking handles,
>
> Is this from tempfile.mkstemp? This has tricked me an annoying number of
> times, so I eventually wrote a wrapper. The trick is doing an os.close
> on the file descriptor:
>
> https://github.com/chapmanb/bcbb/blob/master/nextgen/bcbio/utils.py#L118

Possibly in test_PDB.py but there are other handle leaks.

>> the remaining problem
>> with Biopython's test suite under PyPy is in Bio.Restriction,
>> specifically this line in the RestrictionType class __init__ method,
>>
>> super(RestrictionType, cls).__init__(cls, name, bases, dct)
>
> That seems strange: the __init__ is calling super on itself. You'd
> normally expect this from a derived class. I'm not sure why this
> doesn't trigger an infinite recursion initializing the object. I'm +1
> on commenting it out.
>
> Brad

I suppose we could be cautious and skip that line under PyPy
only. How about that as a compromise - that way if it really
is important for something not covered in the unit test, we only
break it under PyPy, but C Python and Jython would be fine?
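
For illustration only, a minimal sketch of how such a PyPy-only switch could
be written with the standard library. The helper name running_under_pypy is
made up, and nothing here is taken from Restriction.py.

```python
import platform


def running_under_pypy():
    """Return True when the interpreter identifies itself as PyPy."""
    return platform.python_implementation() == "PyPy"


# A caller would guard the interpreter-specific line with this flag.
skip_super_call = running_under_pypy()
```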

Peter

From chapmanb at 50mail.com  Fri Nov 25 20:24:25 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Fri, 25 Nov 2011 20:24:25 -0500
Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?)
In-Reply-To: <CAKVJ-_4t2YYMzEtEMDN-R2VaVAiAVguF+cV-4RK64VLgD0EGhA@mail.gmail.com>
References: <CAKVJ-_7uBB0fjQ-iJ3cMG+iYDM+hujGpLgU8vhnEHQXy8wBGeA@mail.gmail.com>
	<87lir5wn4q.fsf@fastmail.fm>
	<CAKVJ-_4t2YYMzEtEMDN-R2VaVAiAVguF+cV-4RK64VLgD0EGhA@mail.gmail.com>
Message-ID: <8762i7he0m.fsf@fastmail.fm>


Peter;

> >> super(RestrictionType, cls).__init__(cls, name, bases, dct)
> >
> > That seems strange: the __init__ is calling super on itself. You'd
> > normally expect this from a derived class. I'm not sure why this
> > doesn't trigger an infinite recursion initializing the object. I'm +1
> > on commenting it out.

> I suppose we could be cautious and skip that line under PyPy
> only. How about that as a compromise - that way if it really
> is important for something not covered in the unit test, we only
> break it under PyPy, but C Python and Jython would be fine?

My vote would be to comment it out generally instead of if_pypy 
flags. I don't want to break anything, but if we do I'd rather find out
straight away instead of chasing down platform specific bugs later. I'd
be happy to hear others' opinions, especially if they understand the
super magic going on.

Brad

From eric.talevich at gmail.com  Fri Nov 25 22:00:04 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Fri, 25 Nov 2011 22:00:04 -0500
Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?)
In-Reply-To: <8762i7he0m.fsf@fastmail.fm>
References: <CAKVJ-_7uBB0fjQ-iJ3cMG+iYDM+hujGpLgU8vhnEHQXy8wBGeA@mail.gmail.com>
	<87lir5wn4q.fsf@fastmail.fm>
	<CAKVJ-_4t2YYMzEtEMDN-R2VaVAiAVguF+cV-4RK64VLgD0EGhA@mail.gmail.com>
	<8762i7he0m.fsf@fastmail.fm>
Message-ID: <CAMC681nNbTmZJPdVvs9zyDPSG0ooiXTXPY4Vq4nnozhvpUmNQg@mail.gmail.com>

On Fri, Nov 25, 2011 at 8:24 PM, Brad Chapman <chapmanb at 50mail.com> wrote:

>
> Peter;
>
> > >> super(RestrictionType, cls).__init__(cls, name, bases, dct)
> > >
> > > That seems strange: the __init__ is calling super on itself. You'd
> > > normally expect this from a derived class. I'm not sure why this
> > > doesn't trigger an infinite recursion initializing the object. I'm +1
> > > on commenting it out.
>
> > I suppose we could be cautious and skip that line under PyPy
> > only. How about that as a compromise - that way if it really
> > is important for something not covered in the unit test, we only
> > break it under PyPy, but C Python and Jython would be fine?
>
> My vote would be to comment it out generally instead of if_pypy
> flags. I don't want to break anything, but if we do I'd rather find out
> straight away instead of chasing down platform specific bugs later. I'd
> be happy to hear others' opinions, especially if they understand the
> super magic going on.
>
>
I support that, and maybe we can add some more unit tests to see if we can
find out what breaks, if anything.

Looking at the Bio/Restriction/Restriction.py, I can suggest these
candidates:

1. In the implementation of the class RestrictionType, a few of the magic
methods use the test "if isinstance(other, RestrictionType)" -- can you see
any way these might break without the super().__init__ call?


2. Other classes in the same file derive from RestrictionType, but don't
define their own __init__ methods (e.g. AbstractCut, and indirectly NoCut,
OneCut, etc.). All the methods seem to be class methods, also. (NB: maybe
use the @classmethod decorator everywhere for clarity.) As far as I can
tell, the unit test only uses class methods on EcoRI, not any instance
methods -- if I'm reading that right, then maybe there should be a unit
test that hits that. This and #1 can be done at the same time with the
magic methods __add__, __ne__ and __gt__, for example.


3. In Bio/Restriction/__init__.py, I see this comment:

When testing for the presence of a Restriction enzyme in a
RestrictionBatch, the user can use:
1) a class of type 'RestrictionType'
2) a string of the name of the enzyme (its repr)
i.e.:
>>> from Bio.Restriction import RestrictionBatch, EcoRI
>>> MyBatch = RestrictionBatch(EcoRI)
>>> #!/usr/bin/env python
>>> EcoRI in MyBatch        # the class EcoRI.
True
>>>
>>> 'EcoRI' in MyBatch      # a string representation
True

I don't see this included in the unit test, test_Restriction.py. I don't
think the super().__init__ combo has anything to do with this feature, but
maybe it should be tested anyway, since it relies on some substantial magic.
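
A minimal sketch of what such a test might look like, based only on the
comment quoted above. The class name BatchMembershipSketch is made up, the
batch is built from a list here (an assumption about the constructor), and
the test is skipped when Biopython is not importable.

```python
import unittest

try:
    from Bio.Restriction import RestrictionBatch, EcoRI
except ImportError:  # Biopython not installed; the test below is skipped
    RestrictionBatch = EcoRI = None


@unittest.skipIf(RestrictionBatch is None, "Biopython is not installed")
class BatchMembershipSketch(unittest.TestCase):
    """Check membership by enzyme class and by enzyme name."""

    def test_membership_by_class_and_name(self):
        batch = RestrictionBatch([EcoRI])
        self.assertTrue(EcoRI in batch)    # the class EcoRI
        self.assertTrue("EcoRI" in batch)  # its string name
```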


-Eric

From p.j.a.cock at googlemail.com  Sat Nov 26 08:38:26 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 26 Nov 2011 13:38:26 +0000
Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?)
In-Reply-To: <CAMC681nNbTmZJPdVvs9zyDPSG0ooiXTXPY4Vq4nnozhvpUmNQg@mail.gmail.com>
References: <CAKVJ-_7uBB0fjQ-iJ3cMG+iYDM+hujGpLgU8vhnEHQXy8wBGeA@mail.gmail.com>
	<87lir5wn4q.fsf@fastmail.fm>
	<CAKVJ-_4t2YYMzEtEMDN-R2VaVAiAVguF+cV-4RK64VLgD0EGhA@mail.gmail.com>
	<8762i7he0m.fsf@fastmail.fm>
	<CAMC681nNbTmZJPdVvs9zyDPSG0ooiXTXPY4Vq4nnozhvpUmNQg@mail.gmail.com>
Message-ID: <CAKVJ-_5Ercnpvqh0ypX3O0jkzK9L4qEGyDkM_cZT5PfUs9Zsrg@mail.gmail.com>

On Saturday, November 26, 2011, Eric Talevich <eric.talevich at gmail.com>
wrote:
> On Fri, Nov 25, 2011 at 8:24 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>>
>> Peter;
>>
>> > >> super(RestrictionType, cls).__init__(cls, name, bases, dct)
>> > >
>> > > That seems strange: the __init__ is calling super on itself. You'd
>> > > normally expect this from a derived class. I'm not sure why this
>> > > doesn't trigger an infinite recursion initializing the object. I'm +1
>> > > on commenting it out.
>>
>> > I suppose we could be cautious and skip that line under PyPy
>> > only. How about that as a compromise - that way if it really
>> > is important for something not covered in the unit test, we only
>> > break it under PyPy, but C Python and Jython would be fine?
>>
>> My vote would be to comment it out generally instead of if_pypy
>> flags. I don't want to break anything, but if we do I'd rather find out
>> straight away instead of chasing down platform specific bugs later. I'd
>> be happy to hear others' opinions, especially if they understand the
>> super magic going on.
>>
>
> I support that, and maybe we can add some more unit tests to
> see if we can find out what breaks, if anything.

OK

> Looking at the Bio/Restriction/Restriction.py, I can suggest these
> candidates:

Great - do you want to try to turn those into unit tests?

Thanks,

Peter

From eric.talevich at gmail.com  Sat Nov 26 14:49:35 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sat, 26 Nov 2011 14:49:35 -0500
Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?)
In-Reply-To: <CAKVJ-_5Ercnpvqh0ypX3O0jkzK9L4qEGyDkM_cZT5PfUs9Zsrg@mail.gmail.com>
References: <CAKVJ-_7uBB0fjQ-iJ3cMG+iYDM+hujGpLgU8vhnEHQXy8wBGeA@mail.gmail.com>
	<87lir5wn4q.fsf@fastmail.fm>
	<CAKVJ-_4t2YYMzEtEMDN-R2VaVAiAVguF+cV-4RK64VLgD0EGhA@mail.gmail.com>
	<8762i7he0m.fsf@fastmail.fm>
	<CAMC681nNbTmZJPdVvs9zyDPSG0ooiXTXPY4Vq4nnozhvpUmNQg@mail.gmail.com>
	<CAKVJ-_5Ercnpvqh0ypX3O0jkzK9L4qEGyDkM_cZT5PfUs9Zsrg@mail.gmail.com>
Message-ID: <CAMC681mQoCo38PHJBFxqFCLUBawSKOBZj=apj-jrTqk+ipc_Xw@mail.gmail.com>

On Sat, Nov 26, 2011 at 8:38 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Saturday, November 26, 2011, Eric Talevich <eric.talevich at gmail.com>
> wrote:
> > On Fri, Nov 25, 2011 at 8:24 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> >>
> >> Peter;
> >>
> >> > >> super(RestrictionType, cls).__init__(cls, name, bases, dct)
> >> > >
> >> > > That seems strange: the __init__ is calling super on itself. You'd
> >> > > normally expect this from a derived class. I'm not sure why this
> >> > > doesn't trigger an infinite recursion initializing the object. I'm +1
> >> > > on commenting it out.
> >>
> >> > I suppose we could be cautious and skip that line under PyPy
> >> > only. How about that as a compromise - that way if it really
> >> > is important for something not covered in the unit test, we only
> >> > break it under PyPy, but C Python and Jython would be fine?
> >>
> >> My vote would be to comment it out generally instead of if_pypy
> >> flags. I don't want to break anything, but if we do I'd rather find out
> >> straight away instead of chasing down platform specific bugs later. I'd
> >> be happy to hear others' opinions, especially if they understand the
> >> super magic going on.
> >>
> >
> > I support that, and maybe we can add some more unit tests to
> > see if we can find out what breaks, if anything.
>
> OK
>
>
> > Looking at the Bio/Restriction/Restriction.py, I can suggest these
> > candidates:
>
> Great - do you want to try to turn those into unit tests?
>
>
Sure thing. Here's the relevant commit:
https://github.com/biopython/biopython/commit/eb1c163909801731dc0a3d7fbcb2ee514f212da3

Unit tests for most of the magic methods were already there, I just didn't
notice them earlier.

I also commented out the offending line in Restriction.py and stirred the
code a bit in that file and in the test suite. I tested with Python 2.7 and
PyPy 1.7 on Ubuntu; we'll see what the build bots say now.

-Eric

From p.j.a.cock at googlemail.com  Tue Nov  1 21:21:31 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 1 Nov 2011 21:21:31 +0000
Subject: [Biopython-dev] TogoWS in Biopython?
Message-ID: <CAKVJ-_73tdOmZynMGkxZ+KppKw7md4eQkrD9SHiKe-4iU6MHsw@mail.gmail.com>

Dear all,

Would someone like to review the TogoWS code I have written
to access the Togo Web Service's REST API please?

http://togows.dbcls.jp/
http://togows.dbcls.jp/site/en/rest.html
http://dx.doi.org/doi:10.1093/nar/gkq386

This provides a nice simple URL based API for fetching database
entries in various formats (XML, JSON, GenBank etc - even some
individual fields from some database records, e.g. the accession
of a GenBank record), searching, and even some file format
conversion (which uses a range of tools on their server, some
in BioRuby and others in BioPerl I believe).

The code is on this branch,
https://github.com/peterjc/biopython/tree/togows

See module Bio.TogoWS and its docstrings,
https://github.com/peterjc/biopython/blob/togows/Bio/TogoWS/__init__.py

Unit tests in Tests/test_TogoWS.py
https://github.com/peterjc/biopython/blob/togows/Tests/test_TogoWS.py

I have been guided by the naming we've used in Bio.Entrez
for accessing the NCBI Entrez API.

Note that in addition to major Japanese databases, TogoWS
also proxies and caches data from Europe (e.g. UniProt) and
America (e.g. GenBank and PubMed). It was very fast when
testing from Japan this summer - not quite so speedy from
the UK though ;)

Personally I found TogoWS much easier to use for searching
and retrieving batches of records than the NCBI Entrez API
with its complicated history requirement. I expect it to be
particularly popular with Biopython users in Japan.

Thanks in advance,

Peter


From p.j.a.cock at googlemail.com  Tue Nov  1 21:27:15 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 1 Nov 2011 21:27:15 +0000
Subject: [Biopython-dev] TogoWS in Biopython?
In-Reply-To: <CAKVJ-_73tdOmZynMGkxZ+KppKw7md4eQkrD9SHiKe-4iU6MHsw@mail.gmail.com>
References: <CAKVJ-_73tdOmZynMGkxZ+KppKw7md4eQkrD9SHiKe-4iU6MHsw@mail.gmail.com>
Message-ID: <CAKVJ-_71S6GmvfDXM6krCgBMFnt7A7gxXrfS8_07tGOwMDYj1A@mail.gmail.com>

On Tue, Nov 1, 2011 at 9:21 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Dear all,
>
> Would someone like to review the TogoWS code I have written
> to access the Togo Web Service's REST API please?
>
> ...
>
> Unit tests in Tests/test_TogoWS.py
> https://github.com/peterjc/biopython/blob/togows/Tests/test_TogoWS.py

P.S. Some of the tests are a little bit slow right now, so
we can comment some out as part of merging this to the
trunk.

Peter


From chapmanb at 50mail.com  Wed Nov  2 12:19:58 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 02 Nov 2011 08:19:58 -0400
Subject: [Biopython-dev] TogoWS in Biopython?
In-Reply-To: <CAKVJ-_71S6GmvfDXM6krCgBMFnt7A7gxXrfS8_07tGOwMDYj1A@mail.gmail.com>
References: <CAKVJ-_73tdOmZynMGkxZ+KppKw7md4eQkrD9SHiKe-4iU6MHsw@mail.gmail.com>
	<CAKVJ-_71S6GmvfDXM6krCgBMFnt7A7gxXrfS8_07tGOwMDYj1A@mail.gmail.com>
Message-ID: <8762j2iump.fsf@fastmail.fm>


Peter;

> > Would someone like to review the TogoWS code I have written
> > to access the Togo Web Service's REST API please?

This looks great and the tests are all passing for me. My only small
suggestion would be to avoid hardcoding 'http://togows.dbcls.jp'
everywhere. I'd stick this as a top level variable along with the global
caches and reference it in the code. This way if they ever get any
mirrors we could adjust on the fly.
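
A minimal sketch of this suggestion follows. The names BASE_URL and
entry_url are hypothetical (they are not taken from the Bio.TogoWS branch),
and the entry URL layout is assumed from the TogoWS REST documentation:

```python
from urllib.parse import quote

# Single module-level constant; switching to a mirror means changing one line.
# (BASE_URL and entry_url are illustrative names, not the branch's real API.)
BASE_URL = "http://togows.dbcls.jp"


def entry_url(db, identifier, fmt=None, base_url=None):
    """Build a TogoWS-style entry URL without hardcoding the host."""
    # Look up BASE_URL at call time so it can be reassigned on the fly.
    base = (base_url or BASE_URL).rstrip("/")
    url = "%s/entry/%s/%s" % (base, quote(db), quote(str(identifier)))
    if fmt:
        url += "." + quote(fmt)
    return url


default_url = entry_url("pubmed", "16381885")
mirror_url = entry_url("nucleotide", "X52960", fmt="fasta",
                       base_url="http://mirror.example.org/")
```

Every function that talks to the server would then build its URL from the
one constant, so pointing at a mirror needs no other code changes.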

Thanks for getting this in,
Brad


From p.j.a.cock at googlemail.com  Wed Nov  2 13:27:25 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 2 Nov 2011 13:27:25 +0000
Subject: [Biopython-dev] TogoWS in Biopython?
In-Reply-To: <8762j2iump.fsf@fastmail.fm>
References: <CAKVJ-_73tdOmZynMGkxZ+KppKw7md4eQkrD9SHiKe-4iU6MHsw@mail.gmail.com>
	<CAKVJ-_71S6GmvfDXM6krCgBMFnt7A7gxXrfS8_07tGOwMDYj1A@mail.gmail.com>
	<8762j2iump.fsf@fastmail.fm>
Message-ID: <CAKVJ-_6kjBtOELqRxGHd-+Yb2gSeGp97mXDRJ1gJFagWALGL2Q@mail.gmail.com>

On Wed, Nov 2, 2011 at 12:19 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Peter;
>
>> > Would someone like to review the TogoWS code I have written
>> > to access the Togo Web Service's REST API please?
>
> This looks great and the tests are all passing for me. My only small
> suggestion would be to avoid hardcoding 'http://togows.dbcls.jp'
> everywhere. I'd stick this as a top level variable along with the global
> caches and reference it in the code. This way if they ever get any
> mirrors we could adjust on the fly.
>
> Thanks for getting this in,
> Brad

Good point regarding the URL.

I've also realised it will need some tweaks for Python 3 (bytes
versus unicode), or at least to skip the unit tests in the short
term to avoid hiding real errors on the buildbot.

Peter


From redmine at redmine.open-bio.org  Tue Nov  8 10:17:00 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 8 Nov 2011 10:17:00 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] (New) Failing to parse
	fasta-m10 format generated by lalign36
Message-ID: <redmine.issue-3312.20111108101700@redmine.open-bio.org>


Issue #3312 has been reported by gahoo lee.

----------------------------------------
Bug #3312: Failing to parse fasta-m10 format generated by lalign36
https://redmine.open-bio.org/issues/3312

Author: gahoo lee
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


When I parse an alignment created by lalign (which is included in FASTA36), I get errors. Each FASTA file currently contains two sequences; with one sequence in each file there is no error. Here are the code and the error.

@lalign36 -m 10 at.fasta os.fasta >test.aln@

@from Bio import AlignIO
handle = open('test.aln')
for a in AlignIO.parse(handle, "fasta-m10"):
    assert len(a) == 2, "Should be pairwise!"
    print "Alignment length %i" % a.get_alignment_length()
    for record in a:
        print record.seq, record.name, record.id
@

@Traceback (most recent call last):
  File "R:\Untitled 4.py", line 5, in <module>
    for a in AlignIO.parse(handle, "fasta-m10"):
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\__init__.py", line 371, in parse
    for a in i:
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\FastaIO.py", line 242, in FastaM10Iterator
    yield build_hsp()
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\FastaIO.py", line 106, in build_hsp
    assert query_tags, query_tags
AssertionError: {}@


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From p.j.a.cock at googlemail.com  Tue Nov  8 15:38:32 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 8 Nov 2011 15:38:32 +0000
Subject: [Biopython-dev] Indexing sequences compressed with BGZF (Blocked
	GNU Zip Format)
Message-ID: <CAKVJ-_7_MLKthBfv7f4t=5Kzp-Nx6SR95zuBxgj-_GQB9e6ERQ@mail.gmail.com>

Dear all,

We've talked in the past about indexing sequences in gzipped files, e.g.
http://lists.open-bio.org/pipermail/biopython/2010-June/006546.html

That discussion concluded that random access into simple GZIP files
was not practical, but BGZF (used in BAM) was worth looking into.
I wrote some proof of principle code back then:
http://lists.open-bio.org/pipermail/biopython/2010-June/006555.html

I have recently polished that old code up, and done some
benchmarking (using some reasonably large FASTA, Swiss,
and UniProt-XML files). Please read this blog post:
http://blastedbio.blogspot.com/

I think random access to sequences compressed with BGZF is fast
enough to be useful practically (while confirming this is not true for
large gzipped files). I've also put this idea forward on SEQanswers,
http://seqanswers.com/forums/showthread.php?t=15347

The cleaned up BGZF code is on the following branch:
https://github.com/peterjc/biopython/tree/bgzf

This adds a new module Bio.bgzf (position in namespace open to
debate) which provides read/write handles to BGZF files - trying to
follow the API used in the Python gzip library.

I then use the new BGZF reader (with its special seek/tell offsets)
from within Bio.SeqIO's index functionality. I've only been testing
with Bio.SeqIO.index(...) so far. It should work fine with
Bio.SeqIO.index_db(...) as well, although there the SQLite schema
will need a small update to record the compression type for each file.
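
For readers unfamiliar with the "special seek/tell offsets": in BGZF as
described in the SAM/BAM specification, a position is a 64-bit virtual
offset that packs the file offset of the compressed block start (upper 48
bits) together with the offset inside the decompressed block (lower 16
bits). A minimal sketch, with illustrative function names that are not
necessarily those used on the branch:

```python
def make_virtual_offset(block_start, within_block):
    """Pack a block start and an in-block offset into one 64-bit integer."""
    if not 0 <= within_block < 2 ** 16:
        raise ValueError("within-block offset must fit in 16 bits")
    if not 0 <= block_start < 2 ** 48:
        raise ValueError("block start must fit in 48 bits")
    return (block_start << 16) | within_block


def split_virtual_offset(virtual_offset):
    """Recover (block_start, within_block) from a virtual offset."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF


voffset = make_virtual_offset(100000, 10)
parts = split_virtual_offset(voffset)
```

An index only needs to store one such integer per record: seeking means
jumping to the block start, decompressing that block, and skipping the
within-block offset.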

Is anyone interested in testing this out?

Note that to produce a BGZF file, you can use the tool bgzip in
samtools, or Bio/bgzf.py if run directly at the command line will
compress stdin to stdout. Both approaches call zlib internally,
and the run time is practically identical.
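
As an illustration of what "calling zlib internally" involves, here is a
sketch that wraps raw zlib deflate output in a single BGZF block. The
byte layout follows the BGZF description in the SAM/BAM specification;
make_bgzf_block is a hypothetical name, and this is not the code from
the branch or from bgzip:

```python
import gzip
import struct
import zlib


def make_bgzf_block(data):
    """Compress up to 64 KB of bytes into one BGZF block (a gzip member)."""
    if len(data) > 65536:
        raise ValueError("at most 65536 bytes per block")
    # Raw deflate stream (negative wbits means no zlib header or trailer).
    compressor = zlib.compressobj(6, zlib.DEFLATED, -15)
    compressed = compressor.compress(data) + compressor.flush()
    block_size = len(compressed) + 26  # 12 header + 6 extra + 8 trailer
    if block_size > 65536:
        raise ValueError("compressed block too large; use less input")
    # gzip header with the FEXTRA flag set (FLG=4) and a 6 byte extra field.
    header = struct.pack("<BBBBIBBH", 31, 139, 8, 4, 0, 0, 255, 6)
    # 'B', 'C' subfield holding the total block size minus one.
    extra = struct.pack("<BBHH", 66, 67, 2, block_size - 1)
    trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF, len(data))
    return header + extra + compressed + trailer


payload = b"ACGT" * 100
block = make_bgzf_block(payload)
# Each block is a valid gzip member, so the standard library can read it.
round_trip = gzip.decompress(block + make_bgzf_block(b""))
```

Because each block is an ordinary gzip member, a BGZF file remains
readable by plain gzip tools while still allowing block-level seeking.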

Regards,

Peter


From p.j.a.cock at googlemail.com  Tue Nov  8 15:41:15 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 8 Nov 2011 15:41:15 +0000
Subject: [Biopython-dev] Indexing sequences compressed with BGZF
	(Blocked GNU Zip Format)
In-Reply-To: <CAKVJ-_7_MLKthBfv7f4t=5Kzp-Nx6SR95zuBxgj-_GQB9e6ERQ@mail.gmail.com>
References: <CAKVJ-_7_MLKthBfv7f4t=5Kzp-Nx6SR95zuBxgj-_GQB9e6ERQ@mail.gmail.com>
Message-ID: <CAKVJ-_6n-RUJ6Qq5Axn2A7YgF9oDjKdrAEs2URYC4dnTDTBFyQ@mail.gmail.com>

On Tue, Nov 8, 2011 at 3:38 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> That discussion concluded that random access into simple GZIP files
> was not practical, but BGZF (used in BAM) was worth looking into.
> I wrote some proof of principle code back then:
> http://lists.open-bio.org/pipermail/biopython/2010-June/006555.html
>
> I have recently polished that old code up, and done some
> benchmarking (using some reasonably large FASTA, Swiss,
> and UniProt-XML files). Please read this blog post:
> http://blastedbio.blogspot.com/

More precise link to my BGZF post:
http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html

Peter


From bioinformed at gmail.com  Tue Nov  8 17:40:36 2011
From: bioinformed at gmail.com (Kevin Jacobs <jacobs@bioinformed.com>)
Date: Tue, 8 Nov 2011 12:40:36 -0500
Subject: [Biopython-dev] Indexing sequences compressed with BGZF
 (Blocked GNU Zip Format)
In-Reply-To: <CAKVJ-_6n-RUJ6Qq5Axn2A7YgF9oDjKdrAEs2URYC4dnTDTBFyQ@mail.gmail.com>
References: <CAKVJ-_7_MLKthBfv7f4t=5Kzp-Nx6SR95zuBxgj-_GQB9e6ERQ@mail.gmail.com>
	<CAKVJ-_6n-RUJ6Qq5Axn2A7YgF9oDjKdrAEs2URYC4dnTDTBFyQ@mail.gmail.com>
Message-ID: <CAD=vDiqxziZ=Td=7dUcCxyXMQ2i_eRwvWxQb1nAf04TqjiSt-Q@mail.gmail.com>

I've added a proper LRU uncompressed block cache to the samtools tabix
code, if that would be of any help.  It greatly improves performance for
many access patterns.  (I didn't look to see if you'd already done that in
your code.)

-Kevin


From p.j.a.cock at googlemail.com  Tue Nov  8 17:52:59 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 8 Nov 2011 17:52:59 +0000
Subject: [Biopython-dev] Indexing sequences compressed with BGZF
 (Blocked GNU Zip Format)
In-Reply-To: <CAD=vDiqxziZ=Td=7dUcCxyXMQ2i_eRwvWxQb1nAf04TqjiSt-Q@mail.gmail.com>
References: <CAKVJ-_7_MLKthBfv7f4t=5Kzp-Nx6SR95zuBxgj-_GQB9e6ERQ@mail.gmail.com>
	<CAKVJ-_6n-RUJ6Qq5Axn2A7YgF9oDjKdrAEs2URYC4dnTDTBFyQ@mail.gmail.com>
	<CAD=vDiqxziZ=Td=7dUcCxyXMQ2i_eRwvWxQb1nAf04TqjiSt-Q@mail.gmail.com>
Message-ID: <CAKVJ-_7RqEWCLpmFgP3np14rc_nDL3wxaXbKNVy5DGeeam+3Aw@mail.gmail.com>

On Tue, Nov 8, 2011 at 5:40 PM, Kevin Jacobs wrote:
> I've added a proper LRU uncompressed block cache to the samtools tabix code,
> if that would be of any help. It greatly improves performance for many
> access patterns. (I didn't look to see if you'd already done that in your
> code.)
> -Kevin

Hi Kevin,

Is this already in the mainline samtools tabix repository?

The current implementation in my Python code just caches the
current block - but a simple pool had occurred to me. How many
blocks (given each is 64kb) and how best to pick that number
isn't obvious to me. Perhaps you can suggest some sensible
defaults?

In fact, a proper LRU cache would make sense for the handle
pool in Bio.SeqIO.index_db(...) as well.

Regards,

Peter


From bioinformed at gmail.com  Tue Nov  8 18:11:56 2011
From: bioinformed at gmail.com (Kevin Jacobs <jacobs@bioinformed.com>)
Date: Tue, 8 Nov 2011 13:11:56 -0500
Subject: [Biopython-dev] Indexing sequences compressed with BGZF
 (Blocked GNU Zip Format)
In-Reply-To: <CAKVJ-_7RqEWCLpmFgP3np14rc_nDL3wxaXbKNVy5DGeeam+3Aw@mail.gmail.com>
References: <CAKVJ-_7_MLKthBfv7f4t=5Kzp-Nx6SR95zuBxgj-_GQB9e6ERQ@mail.gmail.com>
	<CAKVJ-_6n-RUJ6Qq5Axn2A7YgF9oDjKdrAEs2URYC4dnTDTBFyQ@mail.gmail.com>
	<CAD=vDiqxziZ=Td=7dUcCxyXMQ2i_eRwvWxQb1nAf04TqjiSt-Q@mail.gmail.com>
	<CAKVJ-_7RqEWCLpmFgP3np14rc_nDL3wxaXbKNVy5DGeeam+3Aw@mail.gmail.com>
Message-ID: <CAD=vDiqFUStRr-64Nqqot195AKrDVNQaYpTA_-7JUtb76PdQEA@mail.gmail.com>

On Tue, Nov 8, 2011 at 12:52 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Tue, Nov 8, 2011 at 5:40 PM, Kevin Jacobs wrote:
> > I've added a proper LRU uncompressed block cache to the samtools tabix
> > code, if that would be of any help.  It greatly improves performance for
> > many access patterns.  (I didn't look to see if you'd already done that
> > in your code.)
> > -Kevin
>
> Hi Kevin,
>
> Is this already in the mainline samtools tabix repository?
>
> The current implementation in my Python code just caches the
> current block - but a simple pool had occurred to me. How many
> blocks (given each is 64kb) and how best to pick that number
> isn't obvious to me. Perhaps you can suggest some sensible
> defaults?
>
> In fact, a proper LRU cache would make sense for the handle
> pool in Bio.SeqIO.index_db(...) as well.
>
>
Hi Peter,

There is a random-eviction cache implemented in the mainline that is okay,
but it is turned off by default and, if enabled, can be very inefficient if
it keeps evicting your most active blocks.  Converting the cache to LRU
was very simple and I've been using it locally for some time now, but I
haven't had time to send the changes on to Heng Li.

I choose the size of the cache based on the application and access
patterns.  For roughly sequential sequence queries (a la samtools faidx or
Pysam Fastafile), all one needs is a handful of active blocks (say 16).
When repeatedly querying tabix files via pysam, I typically use 128 blocks
for the best trade-off between memory and performance.  Choosing a cache
size for BAM files is much more complicated and I have a wide range of
settings depending on how many parallel BAM streams and access patterns are
employed.

The cache size numbers needed to be quite a bit larger before switching to
LRU (which was a bit surprising).  However, using even a small cache is
vastly beneficial for many access patterns.   The cost of re-reading a
block from disk can be mitigated by the OS filesystem cache, but the
decompression step takes non-trivial CPU time and can be triggered dozens
or hundreds of times per block for some sensible-seeming access patterns.
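
To make the LRU idea concrete, here is a minimal Python sketch of a
cache of decompressed blocks keyed on block start offset. It is not
Kevin's samtools/tabix code; the class name, the load_block callback
and the default of 16 blocks are assumptions for illustration:

```python
from collections import OrderedDict


class LRUBlockCache:
    """Keep the most recently used decompressed blocks in memory."""

    def __init__(self, max_blocks=16):
        if max_blocks < 1:
            raise ValueError("max_blocks must be at least 1")
        self.max_blocks = max_blocks
        self._blocks = OrderedDict()  # block start offset -> bytes

    def get(self, block_start, load_block):
        """Return the block, calling load_block(block_start) on a miss."""
        if block_start in self._blocks:
            # Cache hit: mark this block as the most recently used.
            self._blocks.move_to_end(block_start)
            return self._blocks[block_start]
        data = load_block(block_start)  # read and decompress (expensive)
        self._blocks[block_start] = data
        if len(self._blocks) > self.max_blocks:
            # Evict the least recently used block (oldest entry).
            self._blocks.popitem(last=False)
        return data


loads = []


def fake_loader(start):
    """Stand-in for reading and decompressing a block from disk."""
    loads.append(start)
    return "block at %d" % start


cache = LRUBlockCache(max_blocks=2)
for start in (0, 1, 0, 2, 0):
    cache.get(start, fake_loader)
```

In this run block 0 is decompressed only once even though it is
requested three times, because each hit moves it away from the eviction
end of the cache; block 1 is the one dropped when block 2 arrives.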

-Kevin


From p.j.a.cock at googlemail.com  Tue Nov  8 18:28:04 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 8 Nov 2011 18:28:04 +0000
Subject: [Biopython-dev] Indexing sequences compressed with BGZF
 (Blocked GNU Zip Format)
In-Reply-To: <CAD=vDiqFUStRr-64Nqqot195AKrDVNQaYpTA_-7JUtb76PdQEA@mail.gmail.com>
References: <CAKVJ-_7_MLKthBfv7f4t=5Kzp-Nx6SR95zuBxgj-_GQB9e6ERQ@mail.gmail.com>
	<CAKVJ-_6n-RUJ6Qq5Axn2A7YgF9oDjKdrAEs2URYC4dnTDTBFyQ@mail.gmail.com>
	<CAD=vDiqxziZ=Td=7dUcCxyXMQ2i_eRwvWxQb1nAf04TqjiSt-Q@mail.gmail.com>
	<CAKVJ-_7RqEWCLpmFgP3np14rc_nDL3wxaXbKNVy5DGeeam+3Aw@mail.gmail.com>
	<CAD=vDiqFUStRr-64Nqqot195AKrDVNQaYpTA_-7JUtb76PdQEA@mail.gmail.com>
Message-ID: <CAKVJ-_7YnYfiXbdRQkH8CFoo0iAgNBozZrfP6fTr_oEnLWdhfg@mail.gmail.com>

On Tue, Nov 8, 2011 at 6:11 PM, Kevin Jacobs wrote:
> On Tue, Nov 8, 2011 at 12:52 PM, Peter Cock wrote:
>> On Tue, Nov 8, 2011 at 5:40 PM, Kevin Jacobs wrote:
>> > I've added a proper LRU uncompressed block cache to the
>> > samtools tabix code, if that would be of any help. It greatly
>> > improves performance for many access patterns.
>> > (I didn't look to see if you'd already done that in your
>> > code.)
>> > -Kevin
>>
>> Hi Kevin,
>>
>> Is this already in the mainline samtools tabix repository?
>>
>> The current implementation in my Python code just caches the
>> current block - but a simple pool had occurred to me. How many
>> blocks (given each is 64kb) and how best to pick that number
>> isn't obvious to me. Perhaps you can suggest some sensible
>> defaults?
>>
>> In fact, a proper LRU cache would make sense for the handle
>> pool in Bio.SeqIO.index_db(...) as well.
>>
>
> Hi Peter,
>
> There is a random-eviction cache implemented in the mainline that is okay,
> but it is turned off by default and, if enabled, can be very inefficient if
> it keeps evicting your most active blocks. Converting the cache to LRU
> was very simple and I've been using it locally for some time now, but I
> haven't had time to send the changes on to Heng Li.

Are your changes on github or somewhere public? Heng Li has the
core samtools bit of the samtools SVN on github, which he seems
to use for experimental new code: https://github.com/lh3/samtools

> I choose the size of the cache based on the application and access patterns.
> For roughly sequential sequence queries (a la samtools faidx or Pysam
> Fastafile), all one needs is a handful of active blocks (say 16). When
> repeatedly querying tabix files via pysam, I typically use 128 blocks for the
> best trade-off between memory and performance. Choosing a cache size for
> BAM files is much more complicated and I have a wide range of settings
> depending on how many parallel BAM streams and access patterns are employed.
> The cache size numbers needed to be quite a bit larger before switching to
> LRU (which was a bit surprising). However, using even a small cache is
> vastly beneficial for many access patterns. The cost of re-reading a block
> from disk can be mitigated by the OS filesystem cache, but the decompression
> step takes non-trivial CPU time and can be triggered dozens or hundreds of
> times per block for some sensible-seeming access patterns.
> -Kevin

Certainly useful food for thought - thank you. I agree that the OS
will probably cache commonly used BGZF blocks in the filesystem
cache, but it doesn't solve the CPU overhead of decompression.

In the case of Bio.SeqIO.index(...) which accesses one file, and
Bio.SeqIO.index_db(...) which may access several files, we currently
don't offer any end user options like this. However, there is an internal
option for the max number of handles, and a similar option could
control the number of BGZF blocks to cache. I could try 100
blocks (100 times 64kb is about 6MB) as the default, and redo
the UniProt timings (random access to sequences).
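
The memory estimate can be checked with a few lines of Python. This
assumes "64kb" means 64 KiB (65536 bytes) of decompressed data per
block, and cache_memory_mib is a made-up helper name:

```python
BLOCK_SIZE = 64 * 1024  # bytes; assumed maximum decompressed block size


def cache_memory_mib(n_blocks, block_size=BLOCK_SIZE):
    """Upper bound on cache memory in MiB, ignoring Python object overhead."""
    return n_blocks * block_size / (1024.0 * 1024.0)


sizes = dict((n, cache_memory_mib(n)) for n in (16, 100, 128))
```

Under that assumption 100 blocks come to 6.25 MiB, in line with the
"about 6MB" figure, while the 16 and 128 block settings mentioned
earlier in the thread correspond to 1 MiB and 8 MiB.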

That might be a good compromise, given the SeqIO indexing code
has no easy way to know the calling code's usage patterns.

As I said on the blog post, we should be able to improve the
speed of the BGZF random access - this idea alone could
make a big difference, although probably a naive block cache
(rather than LRU) would be a worthwhile step in itself.

Regards,

Peter


From p.j.a.cock at googlemail.com  Wed Nov  9 19:53:52 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 9 Nov 2011 19:53:52 +0000
Subject: [Biopython-dev] Fwd: Bug in DSSP.py
In-Reply-To: <CAD2GCOi6t7TTVj=XNojivendA-AWNYCXgTZmJCh45BocMtjA8A@mail.gmail.com>
References: <CAD2GCOi6t7TTVj=XNojivendA-AWNYCXgTZmJCh45BocMtjA8A@mail.gmail.com>
Message-ID: <CAKVJ-_7u-YLA3QQcobBmMTXRg0tdKazcTay6wiVAuVN2AxiHwA@mail.gmail.com>

FYI, hopefully someone uses DSSP.

---------- Forwarded message ----------
From: Austin Meyer
Date: Tuesday, November 8, 2011
Subject: Bug in DSSP.py
To: biopython-owner at lists.open-bio.org


Ahoy,

I have no idea how to contribute code so I thought I would pass this along.

The newest DSSP adds a citation section for the first two lines, and a
blank third line in its output file.  The parser reads each line one at a
time, splits it, then looks at the second element of the resulting list.
As the blank line yields fewer than two elements, an index out of range
error occurs. This error does not happen with the older DSSP version.  A
quick fix checks the length of the list prior to looking at its elements.
Thus at line 121 in the DSSP.py file, just after the sl = l.split(), this
will fix the problem:

if len(sl) < 2:
    continue

The whole function will look like so:

def make_dssp_dict(filename):
    """
    Return a DSSP dictionary that maps (chainid, resid) to
    aa, ss and accessibility, from a DSSP file.

    @param filename: the DSSP output file
    @type filename: string
    """
    dssp = {}
    handle = open(filename, "r")
    try:
        start = 0
        keys = []
        for l in handle.readlines():
            sl = l.split()
            if len(sl) < 2:
                continue
            if sl[1] == "RESIDUE":
                # Start parsing from here
                start = 1
                continue
            if not start:
                continue
            if l[9] == " ":
                # Skip -- missing residue
                continue
            resseq = int(l[5:10])
            icode = l[10]
            chainid = l[11]
            aa = l[13]
            ss = l[16]
            if ss == " ":
                ss = "-"
            try:
                acc = int(l[34:38])
                phi = float(l[103:109])
                psi = float(l[109:115])
            except ValueError, exc:
                # DSSP output breaks its own format when there are >9999
                # residues, since only 4 digits are allocated to the seq num
                # field.  See 3kic chain T res 321, 1vsy chain T res 6077.
                # Here, look for whitespace to figure out the number of extra
                # digits, and shift parsing the rest of the line by that amount.
                if l[34] != ' ':
                    shift = l[34:].find(' ')
                    acc = int((l[34+shift:38+shift]))
                    phi = float(l[103+shift:109+shift])
                    psi = float(l[109+shift:115+shift])
                else:
                    raise ValueError, exc
            res_id = (" ", resseq, icode)
            dssp[(chainid, res_id)] = (aa, ss, acc, phi, psi)
            keys.append((chainid, res_id))
    finally:
        handle.close()
    return dssp, keys
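
The effect of the length check can be seen in isolation with the small
sketch below. The sample lines are a made-up stand-in for the top of a
DSSP file (not real DSSP output), and find_residue_header is a
hypothetical helper written only for this illustration:

```python
def find_residue_header(lines):
    """Return the index of the 'RESIDUE' column header line, or None."""
    for index, line in enumerate(lines):
        sl = line.split()
        if len(sl) < 2:
            # Blank or single-token line: sl[1] would raise IndexError.
            continue
        if sl[1] == "RESIDUE":
            return index
    return None


# Made-up stand-in for the start of a DSSP file, including a blank line.
sample_lines = [
    "citation line one\n",
    "citation line two\n",
    "\n",
    "  #  RESIDUE AA STRUCTURE\n",
]
header_index = find_residue_header(sample_lines)
blank_tokens = "\n".split()
```

Calling split() with no arguments on a blank line returns an empty
list, which is why indexing element 1 fails without the guard.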


Thanks,

--
Austin Meyer


From p.j.a.cock at googlemail.com  Thu Nov 10 00:01:19 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 10 Nov 2011 00:01:19 +0000
Subject: [Biopython-dev] Indexing sequences compressed with BGZF
 (Blocked GNU Zip Format)
In-Reply-To: <CAKVJ-_7YnYfiXbdRQkH8CFoo0iAgNBozZrfP6fTr_oEnLWdhfg@mail.gmail.com>
References: <CAKVJ-_7_MLKthBfv7f4t=5Kzp-Nx6SR95zuBxgj-_GQB9e6ERQ@mail.gmail.com>
	<CAKVJ-_6n-RUJ6Qq5Axn2A7YgF9oDjKdrAEs2URYC4dnTDTBFyQ@mail.gmail.com>
	<CAD=vDiqxziZ=Td=7dUcCxyXMQ2i_eRwvWxQb1nAf04TqjiSt-Q@mail.gmail.com>
	<CAKVJ-_7RqEWCLpmFgP3np14rc_nDL3wxaXbKNVy5DGeeam+3Aw@mail.gmail.com>
	<CAD=vDiqFUStRr-64Nqqot195AKrDVNQaYpTA_-7JUtb76PdQEA@mail.gmail.com>
	<CAKVJ-_7YnYfiXbdRQkH8CFoo0iAgNBozZrfP6fTr_oEnLWdhfg@mail.gmail.com>
Message-ID: <CAKVJ-_4P6Ta-DANrZmBTh0aFQUhq=erQEn05FLwVV47tXj1==A@mail.gmail.com>

On Tue, Nov 8, 2011 at 6:28 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> I choose the size of the cache based on the application and access patterns.
>> For roughly sequential sequence queries (a la samtools faidx or Pysam
>> Fastafile), all one needs is a handful of active blocks (say 16). When
>> repeatedly querying tabix files via pysam, I typically use 128 blocks for the
>> best trade-off between memory and performance. Choosing a cache size for
>> BAM files is much more complicated and I have a wide range of settings
>> depending on how many parallel BAM streams and access patterns are employed.
>> The cache size numbers needed to be quite a bit larger before switching to
>> LRU (which was a bit surprising). However, using even a small cache is
>> vastly beneficial for many access patterns. The cost of re-reading a block
>> from disk can be mitigated by the OS filesystem cache, but the decompression
>> step takes non-trivial CPU time and can be triggered dozens or hundreds of
>> times per block for some sensible-seeming access patterns.
>> -Kevin
>
> Certainly useful food for thought - thank you. I agree that the OS
> will probably cache commonly used BGZF blocks in the filesystem
> cache, but it doesn't solve the CPU overhead of decompression.
>
> In the case of Bio.SeqIO.index(...) which accesses one file, and
> Bio.SeqIO.index_db(...) which may access several files, we currently
> don't offer any end user options like this. However, there is an internal
> option for the max number of handles, and a similar option could
> control the number of BGZF blocks to cache. I could try 100
> blocks (100 times 64kb is about 6MB) as the default, and redo
> the UniProt timings (random access to sequences).
>
> That might be a good compromise, given the SeqIO indexing code
> has no easy way to know the calling code's usage patterns.

I've tried a cache of up to 100 BGZF blocks which are cleared
"randomly" and it doesn't make a noticeable difference to my
UniProt benchmark, which is a shame but not actually very
surprising. After all, that is deliberately accessing the records
(and thus the blocks) in a random order, and the files contain
far far more than 100 blocks.

I'll need a more realistic test case to properly evaluate the cache.

One example that comes to mind is iterating over BAM reads
(which would look at blocks sequentially) but also jumping to
look at the partner reads (paired end etc) and then back again.

Peter

P.S. When I said "random", what I'm actually using is a Python
dictionary keyed on the start offset, and the dictionary's popitem
method to remove a cached block "at random" once I have got
100 blocks in memory and need to free one. Of course, this isn't
really random, it is arbitrary and likely Python implementation
dependent.
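
A sketch of the kind of cache described in this P.S. follows. It is a
reconstruction for illustration only, not the code on the branch; the
class name and the load_block callback are assumptions:

```python
class ArbitraryEvictionCache:
    """Cache blocks in a plain dict, evicting with dict.popitem() when full."""

    def __init__(self, max_blocks=100):
        self.max_blocks = max_blocks
        self._blocks = {}  # block start offset -> decompressed bytes

    def get(self, block_start, load_block):
        """Return the block, calling load_block(block_start) on a miss."""
        try:
            return self._blocks[block_start]
        except KeyError:
            pass
        if len(self._blocks) >= self.max_blocks:
            # Removes an arbitrary (key, value) pair, not a random one.
            self._blocks.popitem()
        data = load_block(block_start)
        self._blocks[block_start] = data
        return data


cache = ArbitraryEvictionCache(max_blocks=2)
for start in (0, 1, 2):
    cache.get(start, lambda s: "block at %d" % s)
cached_count = len(cache._blocks)
```

Which block is dropped depends on the Python implementation and
version. On Python 3.7 and later popitem() removes the most recently
inserted entry, so a cache like this would keep discarding the newest
block, which is one more reason to prefer an explicit policy such as LRU.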


From redmine at redmine.open-bio.org  Thu Nov 10 10:10:06 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 10 Nov 2011 10:10:06 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10
	format generated by lalign36
References: <redmine.issue-3312.20111108101700@redmine.open-bio.org>
Message-ID: <redmine.journal-14731.20111110101006@redmine.open-bio.org>


Issue #3312 has been updated by Peter Cock.

Assignee set to Biopython Dev Mailing List

Thank you - I can reproduce this on the latest Biopython in our repository.

May we include your sample file in Biopython as a unit test please?


From redmine at redmine.open-bio.org  Thu Nov 10 11:10:23 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 10 Nov 2011 11:10:23 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10
	format generated by lalign36
References: <redmine.issue-3312.20111108101700@redmine.open-bio.org>
Message-ID: <redmine.journal-14732.20111110111023@redmine.open-bio.org>


Issue #3312 has been updated by gahoo lee.


Sure. My pleasure.


From redmine at redmine.open-bio.org  Thu Nov 10 11:34:39 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 10 Nov 2011 11:34:39 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10
	format generated by lalign36
References: <redmine.issue-3312.20111108101700@redmine.open-bio.org>
Message-ID: <redmine.journal-14733.20111110113439@redmine.open-bio.org>


Issue #3312 has been updated by Peter Cock.


Looking at this, I believe there is a problem in lalign36 itself rather than Biopython: At the end of the first batch of alignments (for query one, AT1G01040.1) we have the odd line:

<pre>
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
</pre>

At the end of the second (and final) batch of alignments (for query two, AT5G04140.2) we have these odd lines:

<pre>
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os03g02970.1 1500 bp_Up Chr 03:1205337..1203835 (reverse complemented)
>>LOC_Os03g02970.1 1500 bp_Up Chr 03:1205337..1203835 (reverse complemented)
</pre>

Curious. It seems LALIGN is starting to write out another alignment, but then doesn't.

It was very helpful that you included the input files as well, so I could run this with the version of lalign36 I have installed (version 36.3.4 Apr, 2011) and here the output is a bit different but shows similar odd lines.

I have updated Biopython to give a more helpful error message in this case:
https://github.com/biopython/biopython/commit/1a99454d358fab41771551e8f3a475a90f240b25

<pre>
>>> from Bio import AlignIO
>>> for a in AlignIO.parse("test.aln", "fasta-m10"):
...     print a
...
SingleLetterAlphabet() alignment with 2 rows and 130 columns
AAAAAAAGAGAGAAATATTACTACAAAACAGAAGCAAGCAAGTG...ATC AT1G01040.1
AGAGAGAGAGAGAGGGAAGCGGAGGAGGGAGAAGAGATCA-GAG...ATC LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 81 columns
AAACAGAAGCAAGC--AAGTGGAA-AACAGACCAGAAGAGAGAG...CGA AT1G01040.1
AGAGAGAGGGAAGCGGAGGAGGGAGAAGAGATCAGAGGAAAGAG...TGA LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 264 columns
AAGATTTCGATTTCG-ATATAAATACTTAAT---CTTT-ATAAA...TTA AT1G01040.1
AATATATCTATTTCTTAAACAAATCATTATTTTCCTTTCATAAA...CTA LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 428 columns
ATTTTTATTTTTATTTT-TATGGGAAAGAAGTTGCACGAGTCGG...TTT AT1G01040.1
ATCATTATTTTCCTTTCATAAAAAAATGAATT---ATGAGGCGG...TTT LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 145 columns
AACTCACTCAAGAAAACCAAATCCCCAGAGA-AGAAA-ACAGAA...AAC AT1G01040.1
ATCTCAATCGAGAGAGCGAGCACACGAGAGAGAGAGAGAGGGAA...ATC LOC_Os03g02970.1
Traceback (most recent call last):
...
ValueError: No data for query 'AT1G01040.1', match 'LOC_Os07g46460.1'
</pre>

Are you on Bill Pearson's FASTA mailing list? We should report this.
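
Until lalign36 is fixed, calling code that wants the alignments parsed
before the ValueError shown in the session above could collect them
like this. It is only a sketch: alignments_before_error is a
hypothetical helper, and whether partial results from a malformed file
should be trusted is a judgment call for the caller:

```python
def alignments_before_error(alignments):
    """Collect items from an iterator until it finishes or raises ValueError."""
    collected = []
    error = None
    iterator = iter(alignments)
    while True:
        try:
            collected.append(next(iterator))
        except StopIteration:
            break
        except ValueError as exc:
            # Keep what was parsed so far and report the problem.
            error = exc
            break
    return collected, error


def fake_parser():
    """Stand-in for AlignIO.parse(...) on a file with a truncated record."""
    yield "alignment 1"
    yield "alignment 2"
    raise ValueError("No data for query 'Q', match 'M'")


parsed, problem = alignments_before_error(fake_parser())
```

With Biopython installed the same helper could be given
AlignIO.parse("test.aln", "fasta-m10") in place of fake_parser().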

Peter



From redmine at redmine.open-bio.org  Thu Nov 10 13:13:55 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 10 Nov 2011 13:13:55 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10
	format generated by lalign36
References: <redmine.issue-3312.20111108101700@redmine.open-bio.org>
Message-ID: <redmine.journal-14734.20111110131355@redmine.open-bio.org>


Issue #3312 has been updated by gahoo lee.

File 3Seqs.zip added

Well, I'm not on the FASTA mailing list. In fact I found a small bug in mshowalign2.c, where a colon is missing on line 616; I just don't know how to join the mailing list.
Here's the FASTA output for an alignment with 3 sequences; I hope these files help. The odd lines changed in the output.


From redmine at redmine.open-bio.org  Thu Nov 10 14:33:28 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 10 Nov 2011 14:33:28 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10
	format generated by lalign36
References: <redmine.issue-3312.20111108101700@redmine.open-bio.org>
Message-ID: <redmine.journal-14735.20111110143328@redmine.open-bio.org>


Issue #3312 has been updated by Peter Cock.


The link has changed slightly, but the mailing list is here:

https://lists.virginia.edu/sympa/info/fasta_list


From redmine at redmine.open-bio.org  Fri Nov 11 01:42:59 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Fri, 11 Nov 2011 01:42:59 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10
	format generated by lalign36
References: <redmine.issue-3312.20111108101700@redmine.open-bio.org>
Message-ID: <redmine.journal-14736.20111111014259@redmine.open-bio.org>


Issue #3312 has been updated by gahoo lee.


Oh, I got it.
Did you report this problem to the FASTA mailing list?


From p.j.a.cock at googlemail.com  Wed Nov 16 16:27:46 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 16 Nov 2011 16:27:46 +0000
Subject: [Biopython-dev] Cross-links between tracks in GenomeDiagram
Message-ID: <CAKVJ-_5BHJxz_MNmHPbD12kFOmm4eYO9zF2JAcmN5t31Xjv1ig@mail.gmail.com>

Hi all,

Something I've been working on this month in discussion with Leighton
is some enhancements to GenomeDiagram, driven partly by a figure
I wanted to draw for a paper. The code is here,
https://github.com/peterjc/biopython/tree/gd-links

First, we can now show links between tracks joining any two features
or regions. One use of this is to mimic the output from the Artemis
Comparison Tool, ACT, http://www.sanger.ac.uk/resources/software/act/
ACT is great as an exploratory tool, but doesn't let you output a high
quality vector image.

Related to this, it is useful to be able to "crop" different tracks, since
for ACT style comparisons the different sequences are unlikely to
be the same length. Therefore each GenomeDiagram track can now
have its own start/end positions outside which it doesn't get drawn.

This includes some extra unit tests, run test_GenomeDiagram.py
and have a look at Graphics/GD_by_obj_*.pdf

Also try the file Doc/examples/ACT_example.py which mimics
a simple two-reference ACT diagram:
https://github.com/peterjc/biopython/blob/gd-links/Doc/examples/ACT_example.py

Simple linear output (split into three fragments) shown here:
http://twitter.com/#!/pjacock/status/136509137826754560

Circular version here (in this case deliberately not using a
closed circle, but that works too), note the curving links are
intentional so as to display very large cross-links nicely:
http://twitter.com/#!/pjacock/status/136840628502933505

This demo script should use blue flipped links where the matches
are to the reverse strand. I haven't put together a nice example
for a proper demonstration of that yet. Perhaps a set of several
E. coli genomes would work nicely...

I plan to merge this to the trunk, and write some end-user
documentation, but would be happy to have someone else
look over the code first.

Note that the API is intended to be quite low level but very
flexible in terms of creating the cross links. You can use
transparency (as in the current version of ACT_example.py)
or explicitly colour links according to say BLAST bit score.
The user also has full control of the z-order, which again
allows you to do things like ACT does and put longer
matches at the back with short matches at the front, etc.
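
As a rough illustration of the two ideas above (colouring by score and ordering by match length), here is a small sketch using plain tuples. It deliberately does not use the GenomeDiagram API; the hit values, the function names and the grey-scale mapping are all made up for the example.

```python
# Each hit is (start, end, bit_score); the numbers are invented.
HITS = [(100, 250, 80.0), (0, 5000, 950.5), (300, 1200, 410.0)]


def draw_order(hits):
    """Longest matches first, so they are drawn first and end up at the back."""
    return sorted(hits, key=lambda hit: hit[1] - hit[0], reverse=True)


def score_to_grey(score, max_score):
    """Map a bit score to an (r, g, b) grey level; higher scores are darker."""
    fraction = min(max(score / max_score, 0.0), 1.0)
    level = 1.0 - fraction
    return (level, level, level)


if __name__ == "__main__":
    best = max(hit[2] for hit in HITS)
    for z_index, hit in enumerate(draw_order(HITS)):
        print(z_index, hit, score_to_grey(hit[2], best))
```

The position in the sorted list could then serve as a z-order value, with short matches getting the highest numbers and so landing at the front.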

Peter


From chapmanb at 50mail.com  Thu Nov 17 11:51:11 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 17 Nov 2011 06:51:11 -0500
Subject: [Biopython-dev] Cross-links between tracks in GenomeDiagram
In-Reply-To: <CAKVJ-_5BHJxz_MNmHPbD12kFOmm4eYO9zF2JAcmN5t31Xjv1ig@mail.gmail.com>
References: <CAKVJ-_5BHJxz_MNmHPbD12kFOmm4eYO9zF2JAcmN5t31Xjv1ig@mail.gmail.com>
Message-ID: <87hb23ezm8.fsf@fastmail.fm>


Peter;

> Something I've been working on this month in discussion with Leighton
> is some enhancements to GenomeDiagram, driven partly by a figure
> I wanted to draw for a paper. The code is here,
> https://github.com/peterjc/biopython/tree/gd-links

Awesome. The direction you are pushing this is great. I'd definitely
love to see this in the next release.

> Also try the file Doc/examples/ACT_example.py which mimics
> a simple two-reference ACT diagram:
> https://github.com/peterjc/biopython/blob/gd-links/Doc/examples/ACT_example.py
> 
> Simple linear output (split into three fragments) shown here:
> http://twitter.com/#!/pjacock/status/136509137826754560

Really nice. My only suggestion would be to combine the examples and
outputs together in the Cookbook. One of the best ways to learn plotting
and drawing packages is by looking through examples, finding one that
most closely matches what you want, and then iterating until you get at
what you need.

Brad


From chapmanb at 50mail.com  Thu Nov 17 12:00:01 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 17 Nov 2011 07:00:01 -0500
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
	automated programs
In-Reply-To: <CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
	<CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
	<CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
Message-ID: <87d3crez7i.fsf@fastmail.fm>


Peter and Eric;
I wanted to follow up about the patch to automate Biopython installs
from easy_install and pip when NumPy is not present:

https://github.com/chapmanb/biopython/commit/be53d850d721fc82af81bedcd9fb9034b0a2099b

You'd both reviewed it, and the only holdup was a warning message when
setuptools is not installed:

> $ jython setup.py install
> /Users/pjcock/jython2.5.2/Lib/distutils/dist.py:263: UserWarning:
> Unknown distribution option: 'install_requires'
>   warnings.warn(msg)

We'd discussed some other options like including setuptools and
installing it, ignoring the warning, or ignoring it since it is not
problematic.

My lazy side says ignoring it is fine, but if you want to explicitly
turn it off we can use this around the setup call:

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
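
To make the effect of that snippet concrete, here is a self-contained sketch. The noisy_setup function is a hypothetical stand-in for the real setup() call; it only emits the same kind of UserWarning so the suppression can be seen working.

```python
import warnings


def noisy_setup(**kwargs):
    """Hypothetical stand-in for setup(); warns about an unknown option."""
    warnings.warn("Unknown distribution option: 'install_requires'", UserWarning)
    return kwargs


def quiet_call(func, **kwargs):
    """Call func with all warnings suppressed for the duration of the call."""
    with warnings.catch_warnings():
        # The filter change is undone automatically when the block exits.
        warnings.simplefilter("ignore")
        return func(**kwargs)


if __name__ == "__main__":
    print(quiet_call(noisy_setup, name="demo"))
```

Note that a blanket "ignore" hides every warning raised inside the block; warnings.filterwarnings("ignore", category=UserWarning) would be a narrower alternative.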

Happy to handle it however you prefer but I'd love to get this in,
Brad


From p.j.a.cock at googlemail.com  Thu Nov 17 12:24:42 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 17 Nov 2011 12:24:42 +0000
Subject: [Biopython-dev] Cross-links between tracks in GenomeDiagram
In-Reply-To: <87hb23ezm8.fsf@fastmail.fm>
References: <CAKVJ-_5BHJxz_MNmHPbD12kFOmm4eYO9zF2JAcmN5t31Xjv1ig@mail.gmail.com>
	<87hb23ezm8.fsf@fastmail.fm>
Message-ID: <CAKVJ-_5wv9d7AyuDpAJp++eHa-CjQTv5Sz3ROqifC1L78CtOEQ@mail.gmail.com>

On Thu, Nov 17, 2011 at 11:51 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Peter;
>
>> Something I've been working on this month in discussion with Leighton
>> is some enhancements to GenomeDiagram, driven partly by a figure
>> I wanted to draw for a paper. The code is here,
>> https://github.com/peterjc/biopython/tree/gd-links
>
> Awesome. The direction you are pushing this is great. I'd definitely
> love to see this in the next release.

Cool. It will end up being a graphics heavy release at this rate :)

>> Also try the file Doc/examples/ACT_example.py which mimics
>> a simple two-reference ACT diagram:
>> https://github.com/peterjc/biopython/blob/gd-links/Doc/examples/ACT_example.py
>>
>> Simple linear output (split into three fragments) shown here:
>> http://twitter.com/#!/pjacock/status/136509137826754560
>
> Really nice. My only suggestion would be to combine the examples and
> outputs together in the Cookbook. One of the best ways to learn plotting
> and drawing packages is by looking through examples, finding one that
> most closely matches what you want, and then iterating until you get at
> what you need.

Unless I can find a nicer small sample dataset (or make one) which
includes an inversion, I plan to use that ACT sample data in the
tutorial - basically taking the user through the ACT_example.py
script.

Peter


From p.j.a.cock at googlemail.com  Thu Nov 17 12:45:54 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 17 Nov 2011 12:45:54 +0000
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
In-Reply-To: <87d3crez7i.fsf@fastmail.fm>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
	<CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
	<CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
	<87d3crez7i.fsf@fastmail.fm>
Message-ID: <CAKVJ-_5GLQDtru-MCwmhF83qfFjetMYGN5EA+4EJEpY+NDBbXA@mail.gmail.com>

On Thu, Nov 17, 2011 at 12:00 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Peter and Eric;
> I wanted to follow up about the patch to automate Biopython installs
> from easy_install and pip when NumPy is not present:
>
> https://github.com/chapmanb/biopython/commit/be53d850d721fc82af81bedcd9fb9034b0a2099b
>
> You'd both reviewed it, and the only holdup was a warning message when
> setuptools is not installed:
>
>> $ jython setup.py install
>> /Users/pjcock/jython2.5.2/Lib/distutils/dist.py:263: UserWarning:
>> Unknown distribution option: 'install_requires'
>>   warnings.warn(msg)
>
> We'd discussed some other options like including setuptools and
> installing it, ignoring the warning, or ignoring it since it is not
> problematic.
>
> My lazy side says ignoring it is fine, but if you want to explicitly
> turn it off we can use this around the setup call:
>
> with warnings.catch_warnings():
>     warnings.simplefilter("ignore")
>
> Happy to handle it however you prefer but I'd love to get this in,
> Brad

How about this to avoid the warning by not passing the argument?
https://github.com/peterjc/biopython/commit/78b965f48939c7395aab6e0919b86686443f640e
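
The general idea of "not passing the argument" can be sketched as below. This is not the content of the linked commit, just one way it might look; the function names and the example keyword arguments are assumptions made for illustration.

```python
def setuptools_available():
    """Report whether setuptools can be imported on this interpreter."""
    try:
        import setuptools  # noqa: F401
    except ImportError:
        return False
    return True


def build_setup_kwargs(base_kwargs, have_setuptools, requirements):
    """Return setup() keyword arguments, adding install_requires only
    when setuptools is there to understand it."""
    kwargs = dict(base_kwargs)
    if have_setuptools and requirements:
        kwargs["install_requires"] = list(requirements)
    return kwargs


if __name__ == "__main__":
    print(build_setup_kwargs({"name": "demo"}, setuptools_available(), ["numpy"]))
```

Without setuptools the option is simply never handed to setup(), so there is nothing for it to warn about.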

Note I rebased to the current master.

If you and Eric are happy with that, I guess we can check it in
and see how the build slaves like it...

Peter


From chapmanb at 50mail.com  Thu Nov 17 13:56:41 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 17 Nov 2011 08:56:41 -0500
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
	automated programs
In-Reply-To: <CAKVJ-_5GLQDtru-MCwmhF83qfFjetMYGN5EA+4EJEpY+NDBbXA@mail.gmail.com>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
	<CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
	<CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
	<87d3crez7i.fsf@fastmail.fm>
	<CAKVJ-_5GLQDtru-MCwmhF83qfFjetMYGN5EA+4EJEpY+NDBbXA@mail.gmail.com>
Message-ID: <87aa7ug8di.fsf@fastmail.fm>


Peter;

> > I wanted to follow up about the patch to automate Biopython installs
> > from easy_install and pip when NumPy is not present:
[...]
> How about this to avoid the warning by not passing the argument?
> https://github.com/peterjc/biopython/commit/78b965f48939c7395aab6e0919b86686443f640e

That works great, thanks for looking at this. Having this in the next
release will be a big help for scripts using install_requires.

Brad


From redmine at redmine.open-bio.org  Thu Nov 17 14:10:30 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 17 Nov 2011 14:10:30 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10
	format generated by lalign36
References: <redmine.issue-3312.20111108101700@redmine.open-bio.org>
Message-ID: <redmine.journal-14741.20111117141030@redmine.open-bio.org>


Issue #3312 has been updated by Peter Cock.


Missing alignments reported here:
https://lists.virginia.edu/sympa/arc/fasta_list/2011-11/msg00001.html

Missing colon reported here:
https://lists.virginia.edu/sympa/arc/fasta_list/2011-11/msg00004.html


From p.j.a.cock at googlemail.com  Thu Nov 17 14:13:11 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 17 Nov 2011 14:13:11 +0000
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
In-Reply-To: <87aa7ug8di.fsf@fastmail.fm>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
	<CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
	<CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
	<87d3crez7i.fsf@fastmail.fm>
	<CAKVJ-_5GLQDtru-MCwmhF83qfFjetMYGN5EA+4EJEpY+NDBbXA@mail.gmail.com>
	<87aa7ug8di.fsf@fastmail.fm>
Message-ID: <CAKVJ-_6CbZgBQ_3krTAzD5CK7rgnv4w1ckTVbuHG__VPrWC1EQ@mail.gmail.com>

On Thu, Nov 17, 2011 at 1:56 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Peter;
>
>> > I wanted to follow up about the patch to automate Biopython installs
>> > from easy_install and pip when NumPy is not present:
> [...]
>> How about this to avoid the warning by not passing the argument?
>> https://github.com/peterjc/biopython/commit/78b965f48939c7395aab6e0919b86686443f640e
>
> That works great, thanks for looking at this. Having this in the next
> release will be a big help for scripts using install_requires.
>
> Brad

OK, I'll put that on the trunk then - thanks Brad.

Peter


From p.j.a.cock at googlemail.com  Thu Nov 17 15:10:34 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 17 Nov 2011 15:10:34 +0000
Subject: [Biopython-dev] Cross-links between tracks in GenomeDiagram
In-Reply-To: <CAKVJ-_5wv9d7AyuDpAJp++eHa-CjQTv5Sz3ROqifC1L78CtOEQ@mail.gmail.com>
References: <CAKVJ-_5BHJxz_MNmHPbD12kFOmm4eYO9zF2JAcmN5t31Xjv1ig@mail.gmail.com>
	<87hb23ezm8.fsf@fastmail.fm>
	<CAKVJ-_5wv9d7AyuDpAJp++eHa-CjQTv5Sz3ROqifC1L78CtOEQ@mail.gmail.com>
Message-ID: <CAKVJ-_7TrOP0FsOtRDWTA1r5v6TKkGZCznOfSFcxs4jYcECHLA@mail.gmail.com>

On Thu, Nov 17, 2011 at 12:24 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Nov 17, 2011 at 11:51 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
>>
>> Peter;
>>
>>> Something I've been working on this month in discussion with Leighton
>>> is some enhancements to GenomeDiagram, driven partly by a figure
>>> I wanted to draw for a paper. The code is here,
>>> https://github.com/peterjc/biopython/tree/gd-links
>>
>> Awesome. The direction you are pushing this is great. I'd definitely
>> love to see this in the next release.
>
> Cool. It will end up being a graphics heavy release at this rate :)
>

Committed to trunk,
https://github.com/biopython/biopython/commit/980791237330923706e4dc4901bb6794d3222d0e

>>> Also try the file Doc/examples/ACT_example.py which mimics
>>> a simple two-reference ACT diagram:
>>> https://github.com/peterjc/biopython/blob/gd-links/Doc/examples/ACT_example.py
>>>
>>> Simple linear output (split into three fragments) shown here:
>>> http://twitter.com/#!/pjacock/status/136509137826754560
>>
>> Really nice. My only suggestion would be to combine the examples and
>> outputs together in the Cookbook. One of the best ways to learn plotting
>> and drawing packages is by looking through examples, finding one that
>> most closely matches what you want, and then iterating until you get at
>> what you need.
>
> Unless I can find a nicer small sample dataset (or make one) which
> includes an inversion, I plan to use that ACT sample data in the
> tutorial - basically taking the user through the ACT_example.py
> script.

I plan to do another OBF blog entry on this as well, probably
with the same example.

Peter


From p.j.a.cock at googlemail.com  Thu Nov 17 15:12:55 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 17 Nov 2011 15:12:55 +0000
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
In-Reply-To: <CAKVJ-_6CbZgBQ_3krTAzD5CK7rgnv4w1ckTVbuHG__VPrWC1EQ@mail.gmail.com>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
	<CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
	<CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
	<87d3crez7i.fsf@fastmail.fm>
	<CAKVJ-_5GLQDtru-MCwmhF83qfFjetMYGN5EA+4EJEpY+NDBbXA@mail.gmail.com>
	<87aa7ug8di.fsf@fastmail.fm>
	<CAKVJ-_6CbZgBQ_3krTAzD5CK7rgnv4w1ckTVbuHG__VPrWC1EQ@mail.gmail.com>
Message-ID: <CAKVJ-_5-AphG+y4xV1xO64rzd3gLKo6prtWtujXNDmpXkSQxrw@mail.gmail.com>

On Thu, Nov 17, 2011 at 2:13 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Nov 17, 2011 at 1:56 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>>
>> Peter;
>>
>>> > I wanted to follow up about the patch to automate Biopython installs
>>> > from easy_install and pip when NumPy is not present:
>> [...]
>>> How about this to avoid the warning by not passing the argument?
>>> https://github.com/peterjc/biopython/commit/78b965f48939c7395aab6e0919b86686443f640e
>>
>> That works great, thanks for looking at this. Having this in the next
>> release will be a big help for scripts using install_requires.
>>
>> Brad
>
> OK, I'll put that on the trunk then - thanks Brad.
>
> Peter

That all looks fine with the buildslaves, but the real
testing will be with random end user machines.

Brad, could you write a snippet for the NEWS file about
this? Basically when using setuptools to install Biopython
it will list NumPy as a dependency (except on Jython
and PyPy) and thus install it if not present already?
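
For the "except on Jython and PyPy" part, one possible check is sketched here using the standard library's platform module. The function name wants_numpy is hypothetical, and this is not necessarily how Biopython's setup.py decides.

```python
import platform


def wants_numpy(implementation=None):
    """Return True if NumPy should be listed as a dependency.

    The implementation name defaults to the running interpreter's.
    """
    if implementation is None:
        implementation = platform.python_implementation()
    # Per the discussion above, NumPy is not requested on Jython or PyPy.
    return implementation not in ("Jython", "PyPy")


if __name__ == "__main__":
    print(platform.python_implementation(), wants_numpy())
```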

Peter


From chapmanb at 50mail.com  Thu Nov 17 15:51:01 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 17 Nov 2011 10:51:01 -0500
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
	automated programs
In-Reply-To: <CAKVJ-_5-AphG+y4xV1xO64rzd3gLKo6prtWtujXNDmpXkSQxrw@mail.gmail.com>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
	<CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
	<CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
	<87d3crez7i.fsf@fastmail.fm>
	<CAKVJ-_5GLQDtru-MCwmhF83qfFjetMYGN5EA+4EJEpY+NDBbXA@mail.gmail.com>
	<87aa7ug8di.fsf@fastmail.fm>
	<CAKVJ-_6CbZgBQ_3krTAzD5CK7rgnv4w1ckTVbuHG__VPrWC1EQ@mail.gmail.com>
	<CAKVJ-_5-AphG+y4xV1xO64rzd3gLKo6prtWtujXNDmpXkSQxrw@mail.gmail.com>
Message-ID: <877h2yg32y.fsf@fastmail.fm>


Peter;

> That all looks fine with the buildslaves, but the real
> testing will be with random end user machines.
> 
> Brad, could you write a snippet for the NEWS file about
> this? Basically when using setuptools to install Biopython
> it will list NumPy as a dependency (except on Jython
> and PyPy) and thus install it if not present already?

Great, glad that is working without any problems. I added a bit to the
news about the functionality and usage. Thanks again for the help,
Brad


From anaryin at gmail.com  Thu Nov 17 23:16:56 2011
From: anaryin at gmail.com (João Rodrigues)
Date: Fri, 18 Nov 2011 00:16:56 +0100
Subject: [Biopython-dev] [Biopython] Pairwise alignment - is it a
	generic function?
In-Reply-To: <CAJ9sUYMei67Wm9R37FpCuis6Tb16OMJ+Rp+ATTN5FPbv9GcCiw@mail.gmail.com>
References: <CAJ9sUYNTF5JJx3MPNMinqyD-zqWpHruxdH_9a+GDymkUSL0C+A@mail.gmail.com>
	<CAKVJ-_7KGjsF_MaQ-ngVSMN43T2_R2kkyYh6Cmh9a3hkk8NhuQ@mail.gmail.com>
	<CAJ9sUYMfQOPbJJDb-mC3=g_WmsoS9z5JWFWtFMHsidyfVexNtw@mail.gmail.com>
	<CAKVJ-_5Ak8prNGztJDUu-14USQ7qAYCQ+XJ6Oo0_FaBKbX3hTA@mail.gmail.com>
	<CAMC681nYk8NNTw19F7qGOiEHPL3CSorcG+-Ugi7tL3WOZGup2Q@mail.gmail.com>
	<CAJ9sUYMM=TGyMCyPeCt1A1_0DSbtQv5heobkM1L46G5etNSetQ@mail.gmail.com>
	<CAKVJ-_4yt6Hi5W_fvmj4yFT2Z=a26kQ6FLkOg44bgnYJ7NE3mw@mail.gmail.com>
	<CAJ9sUYMei67Wm9R37FpCuis6Tb16OMJ+Rp+ATTN5FPbv9GcCiw@mail.gmail.com>
Message-ID: <CAJ9sUYP2O=Kh+jc70H6rfsV8d+uwcYFZqtJ4x_9qvo3mQer_jQ@mail.gmail.com>

Hey all,

My laptop decided to die on me last week...

I added a very simple and small example to the docstring, in line with all
the others. I'm pushing it to my pdb_enhancements branch, maybe Peter can
cherry-pick it?

Best,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao


2011/10/27 João Rodrigues <anaryin at gmail.com>

> Sure thing. The docstring is actually pretty explicit, it's just missing
> the part that you can get the matrices from SubsMat. Or at least, not that
> clear. I'll go over it this weekend, maybe earlier.
>
> Best,
>
> João
>


From p.j.a.cock at googlemail.com  Fri Nov 18 10:37:23 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 18 Nov 2011 10:37:23 +0000
Subject: [Biopython-dev] [Biopython] Pairwise alignment - is it a
	generic function?
In-Reply-To: <CAJ9sUYP2O=Kh+jc70H6rfsV8d+uwcYFZqtJ4x_9qvo3mQer_jQ@mail.gmail.com>
References: <CAJ9sUYNTF5JJx3MPNMinqyD-zqWpHruxdH_9a+GDymkUSL0C+A@mail.gmail.com>
	<CAKVJ-_7KGjsF_MaQ-ngVSMN43T2_R2kkyYh6Cmh9a3hkk8NhuQ@mail.gmail.com>
	<CAJ9sUYMfQOPbJJDb-mC3=g_WmsoS9z5JWFWtFMHsidyfVexNtw@mail.gmail.com>
	<CAKVJ-_5Ak8prNGztJDUu-14USQ7qAYCQ+XJ6Oo0_FaBKbX3hTA@mail.gmail.com>
	<CAMC681nYk8NNTw19F7qGOiEHPL3CSorcG+-Ugi7tL3WOZGup2Q@mail.gmail.com>
	<CAJ9sUYMM=TGyMCyPeCt1A1_0DSbtQv5heobkM1L46G5etNSetQ@mail.gmail.com>
	<CAKVJ-_4yt6Hi5W_fvmj4yFT2Z=a26kQ6FLkOg44bgnYJ7NE3mw@mail.gmail.com>
	<CAJ9sUYMei67Wm9R37FpCuis6Tb16OMJ+Rp+ATTN5FPbv9GcCiw@mail.gmail.com>
	<CAJ9sUYP2O=Kh+jc70H6rfsV8d+uwcYFZqtJ4x_9qvo3mQer_jQ@mail.gmail.com>
Message-ID: <CAKVJ-_66iKoBqOKucrrzrTwkBwsNvoeQUdzEguEM9UH2smSc_A@mail.gmail.com>

On Thu, Nov 17, 2011 at 11:16 PM, João Rodrigues <anaryin at gmail.com> wrote:
> Hey all,
> My laptop decided to die on me last week...
> I added a very simple and small example to the docstring, in line with all
> the others. I'm pushing it to my pdb_enhancements branch, maybe Peter can
> cherry-pick it?
> Best,
> João [...] Rodrigues
> http://nmr.chem.uu.nl/~joao

Cherry-picked, and updated the existing examples to make them into
functional doctests, and call them from the test suite.
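
For anyone unfamiliar with the approach, a docstring example becomes a doctest when it is written as an interpreter session and then run with the standard doctest module. The sketch below uses a made-up function, gap_count, rather than anything from Bio.pairwise2, and is not how Biopython's test suite is actually wired up.

```python
import doctest


def gap_count(seq):
    """Count gap characters in an aligned sequence.

    >>> gap_count("AC-GT--A")
    3
    >>> gap_count("ACGT")
    0
    """
    return seq.count("-")


def run_docstring_examples():
    """Run the examples in gap_count's docstring; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(gap_count, "gap_count", globs={"gap_count": gap_count}):
        runner.run(test)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted


if __name__ == "__main__":
    print(run_docstring_examples())
```

A test suite can then call something like run_docstring_examples() and fail if the first number is not zero.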

Thanks.

Peter


From redmine at redmine.open-bio.org  Mon Nov 21 14:35:37 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 21 Nov 2011 14:35:37 +0000
Subject: [Biopython-dev] [Biopython - Feature #3236] Make Biopython work in
	PyPy 1.5
References: <redmine.issue-3236.20110524161311@redmine.open-bio.org>
Message-ID: <redmine.journal-14742.20111121143537@redmine.open-bio.org>


Issue #3236 has been updated by Peter Cock.

Assignee set to Biopython Dev Mailing List

We'd got most things working or skipped gracefully under PyPy 1.6 and we're in almost the same situation for PyPy 1.7.

I just fixed a break under PyPy 1.7 where we assumed set order,
https://github.com/biopython/biopython/commit/d6a3fce2d03d6e613600abec4d837c8c7b929f6f
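
The iteration order of a set is an implementation detail and can differ between interpreters. The usual remedy is to sort before relying on the order; the sketch below shows the pattern with a made-up helper and is not the change made in the commit above.

```python
def names_in_stable_order(names):
    """Return the unique names in sorted order, independent of set ordering."""
    return sorted(set(names))


if __name__ == "__main__":
    # Iterating over set(...) directly could give a different order on
    # another interpreter; the sorted list is the same everywhere.
    print(names_in_stable_order(["b", "a", "b", "c"]))
```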

From test_Entrez.py under PyPy 1.6 we hit https://bugs.pypy.org/issue914 which is fixed in PyPy 1.7 but I'm now hitting https://bugs.pypy.org/issue933 instead.

Note that "import numpy" has been replaced with "import numpypy" in PyPy 1.7, so if we decide not to support PyPy 1.6 that hassle goes away.
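
If both module names did need supporting, a fallback import could look roughly like this. The helper name first_importable is hypothetical; the name "numpypy" is taken from the note above, and whether it offers the same API as numpy is not something this sketch assumes.

```python
import importlib


def first_importable(names):
    """Return the first module from names that imports cleanly, else None."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


if __name__ == "__main__":
    module = first_importable(["numpy", "numpypy"])
    print(module.__name__ if module is not None else "no array module found")
```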

Still issues with test_Pathway.py, test_Restriction.py (and also test_CAPS.py) and a whole load of "Too many open files" - probably due to leaking handles and different garbage collection.
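
On the leaking handles: CPython closes a file as soon as its last reference disappears, whereas an interpreter with a different garbage collector may keep it open much longer. Closing explicitly avoids depending on either behaviour. A minimal sketch follows, with helper names invented for the example.

```python
import os
import tempfile


def count_lines(path):
    """Count lines; the with-block closes the handle even if an error occurs."""
    with open(path) as handle:
        return sum(1 for _ in handle)


def demo():
    """Write a small temporary file, count its lines, then clean up."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as out:
            out.write("one\ntwo\nthree\n")
        return count_lines(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

By contrast, a pattern such as open(path).read() leaves closing to the garbage collector.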
----------------------------------------
Feature #3236: Make Biopython work in PyPy 1.5
https://redmine.open-bio.org/issues/3236

Author: Eric Talevich
Status: In Progress
Priority: Low
Assignee: Biopython Dev Mailing List
Category: 
Target version: 
URL: 


PyPy is now roughly as production-ready as Jython:
http://morepypy.blogspot.com/2011/04/pypy-15-released-catching-up.html

Let's make Biopython work on PyPy 1.5.

To make the pure-Python core of Biopython work, I did this:

* Download and unpack the pre-compiled Linux tarball from pypy.org
* Copy the header file @marshal.h@ from the CPython 2.X installation into the @pypy-c-.../include/@ directory
* pypy setup.py build; pypy setup.py install
* Delete pypy-c-.../site-packages/Bio/cpairwise2*.so

Benchmarking a script that leans heavily on Bio.pairwise2, I see about a 2x speedup between Pypy 1.5 and CPython 2.6 -- yes, that's with the compiled C extension @cpairwise2@ in the CPython 2.6 installation.

Numpy isn't available on PyPy yet, and it may be some time before it is.

Observations from @pypy setup.py test@:

* test_BioSQL triggers tons of RuntimeWarnings related to sqlite3 functions
* test_BioSQL_SeqIO fails -- attempts to retrieve P01892 instead of Q29899 (?)
* test_Restriction triggers a TypeError, somehow (also causing test_CAPS to err)
* test_Entrez fails with many noisy errors -- looks related to expat, may be just my installation
* importing @Bio.trie@ fails, probably due to a @marshal.h@ issue with compilation


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Mon Nov 21 14:37:54 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 21 Nov 2011 14:37:54 +0000
Subject: [Biopython-dev] [Biopython - Feature #3236] Make Biopython work in
	PyPy 1.5
References: <redmine.issue-3236.20110524161311@redmine.open-bio.org>
Message-ID: <redmine.journal-14743.20111121143754@redmine.open-bio.org>


Issue #3236 has been updated by Peter Cock.


Point of clarification - the code on the Biopython trunk is deliberately skipping all our C extensions under PyPy (and Jython). We may want to start gradually enabling those if possible - but getting the pure Python code all working first seems like a sensible strategy.
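
As an illustration of what skipping the C extensions can look like, here is a minimal sketch of interpreter detection that a setup script could use. The helper names and the decision rule are assumptions made for this example; this is not a copy of Biopython's setup.py.

```python
import platform
import sys


def is_pypy():
    # PyPy exposes a built-in module called __pypy__.
    return "__pypy__" in sys.builtin_module_names


def is_jython():
    # Jython reports a platform string starting with "java".
    return sys.platform.startswith("java")


def can_build_c_extensions():
    # Assumption for this sketch: skip C extensions only on PyPy and Jython.
    return not (is_pypy() or is_jython())


if __name__ == "__main__":
    print(platform.python_implementation(), can_build_c_extensions())
```

A setup script could then pass an empty extension list when can_build_c_extensions() returns False, so that only the pure Python modules are installed.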


From redmine at redmine.open-bio.org  Tue Nov 22 11:30:58 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 22 Nov 2011 11:30:58 +0000
Subject: [Biopython-dev] [Biopython - Feature #3236] Make Biopython work in
	PyPy 1.5
References: <redmine.issue-3236.20110524161311@redmine.open-bio.org>
Message-ID: <redmine.journal-14744.20111122113058@redmine.open-bio.org>


Issue #3236 has been updated by Peter Cock.


I have deprecated Bio.Pathway.Rep.HashSet and switched Bio.Pathway.Rep.Graph to use Python's built-in set instead. This means test_Pathway.py now passes under PyPy 1.6 and 1.7:

https://github.com/biopython/biopython/commit/cbc7c875448a9a57a4cdcbecbc01bcf6b115da69
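
For readers unfamiliar with the change, here is a toy sketch of a graph that stores adjacency in Python's built-in set rather than a hand-written set class. It is a simplified stand-in written for this example, not the Bio.Pathway.Rep.Graph implementation, and the method names are assumptions.

```python
class Graph(object):
    """Toy directed graph keeping each node's children in a built-in set."""

    def __init__(self):
        self._children = {}  # node -> set of child nodes

    def add_node(self, node):
        # setdefault leaves an existing entry untouched.
        self._children.setdefault(node, set())

    def add_edge(self, source, target):
        self.add_node(source)
        self.add_node(target)
        self._children[source].add(target)  # duplicates are ignored

    def children(self, node):
        # Sorted so the result is stable regardless of set ordering.
        return sorted(self._children[node])
```

The built-in set already provides membership tests, union and intersection, so a custom class offering the same operations becomes redundant.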


From p.j.a.cock at googlemail.com  Tue Nov 22 12:22:21 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 22 Nov 2011 12:22:21 +0000
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
In-Reply-To: <877h2yg32y.fsf@fastmail.fm>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
	<CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
	<CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
	<87d3crez7i.fsf@fastmail.fm>
	<CAKVJ-_5GLQDtru-MCwmhF83qfFjetMYGN5EA+4EJEpY+NDBbXA@mail.gmail.com>
	<87aa7ug8di.fsf@fastmail.fm>
	<CAKVJ-_6CbZgBQ_3krTAzD5CK7rgnv4w1ckTVbuHG__VPrWC1EQ@mail.gmail.com>
	<CAKVJ-_5-AphG+y4xV1xO64rzd3gLKo6prtWtujXNDmpXkSQxrw@mail.gmail.com>
	<877h2yg32y.fsf@fastmail.fm>
Message-ID: <CAKVJ-_6_9W5vOd-N+0ev2rMtKTbHBHDJEeVA4On7CKgSjCZ=ZA@mail.gmail.com>

On Thu, Nov 17, 2011 at 3:51 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Great, glad that is working without any problems. I added a bit to the
> news about the functionality and usage. Thanks again for the help,
> Brad
>

I've noticed a probable regression on my Mac,

$ python setup.py install
running install
running build
running build_py
running build_ext
running install_lib
running install_egg_info
running egg_info
writing biopython.egg-info/PKG-INFO
writing top-level names to biopython.egg-info/top_level.txt
writing dependency_links to biopython.egg-info/dependency_links.txt
reading manifest file 'biopython.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files found matching 'Tests/Graphics/*.png'
warning: no previously-included files matching '*' found under
directory 'Tests/UnitTests'
warning: no previously-included files matching '.gitignore' found
under directory '*'
writing manifest file 'biopython.egg-info/SOURCES.txt'
removing '/Library/Python/2.6/site-packages/biopython-1.58_-py2.6.egg-info'
(and everything under it)
Copying biopython.egg-info to
/Library/Python/2.6/site-packages/biopython-1.58_-py2.6.egg-info
running install_scripts

I never used to get these manifest warnings during a simple "python
setup.py install" (but I recall seeing them during the official build
process under Linux when we do the manifest step).

We could tweak the manifest file I guess...

Peter


From chapmanb at 50mail.com  Wed Nov 23 01:15:08 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 22 Nov 2011 20:15:08 -0500
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
	automated programs
In-Reply-To: <CAKVJ-_6_9W5vOd-N+0ev2rMtKTbHBHDJEeVA4On7CKgSjCZ=ZA@mail.gmail.com>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
	<CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
	<CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
	<87d3crez7i.fsf@fastmail.fm>
	<CAKVJ-_5GLQDtru-MCwmhF83qfFjetMYGN5EA+4EJEpY+NDBbXA@mail.gmail.com>
	<87aa7ug8di.fsf@fastmail.fm>
	<CAKVJ-_6CbZgBQ_3krTAzD5CK7rgnv4w1ckTVbuHG__VPrWC1EQ@mail.gmail.com>
	<CAKVJ-_5-AphG+y4xV1xO64rzd3gLKo6prtWtujXNDmpXkSQxrw@mail.gmail.com>
	<877h2yg32y.fsf@fastmail.fm>
	<CAKVJ-_6_9W5vOd-N+0ev2rMtKTbHBHDJEeVA4On7CKgSjCZ=ZA@mail.gmail.com>
Message-ID: <87sjlfhc6b.fsf@fastmail.fm>


Peter;

> I've noticed a probable regression on my Mac,
> 
> $ python setup.py install
[...]
> warning: no previously-included files found matching 'Tests/Graphics/*.png'
> warning: no previously-included files matching '*' found under

These look like warnings from setuptools about excluding some files that
aren't actually present or included. Apparently distutils silently
ignores them. I cleaned up the MANIFEST.in to reduce these. Thanks for
spotting this,
Brad


From p.j.a.cock at googlemail.com  Wed Nov 23 09:14:21 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 23 Nov 2011 09:14:21 +0000
Subject: [Biopython-dev] NumPy dialog when Biopython installed from
 automated programs
In-Reply-To: <87sjlfhc6b.fsf@fastmail.fm>
References: <871uuhm1fe.fsf@fastmail.fm>
	<CAMC681=h322OjGESjwj3n7n9CzvDWu=K2aY0mZinONO+PYk9Xg@mail.gmail.com>
	<87hb3b51ve.fsf@fastmail.fm>
	<CAKVJ-_6Spa7ynW+_DEq0HWj2zYtoy_pU4SMwtv3t6YaMM=E8yQ@mail.gmail.com>
	<CAKVJ-_5XwM1QQ=+ZVvHwW=UyWHVFdNP0cz-LQ0UZU1JSsaAPMg@mail.gmail.com>
	<87d3crez7i.fsf@fastmail.fm>
	<CAKVJ-_5GLQDtru-MCwmhF83qfFjetMYGN5EA+4EJEpY+NDBbXA@mail.gmail.com>
	<87aa7ug8di.fsf@fastmail.fm>
	<CAKVJ-_6CbZgBQ_3krTAzD5CK7rgnv4w1ckTVbuHG__VPrWC1EQ@mail.gmail.com>
	<CAKVJ-_5-AphG+y4xV1xO64rzd3gLKo6prtWtujXNDmpXkSQxrw@mail.gmail.com>
	<877h2yg32y.fsf@fastmail.fm>
	<CAKVJ-_6_9W5vOd-N+0ev2rMtKTbHBHDJEeVA4On7CKgSjCZ=ZA@mail.gmail.com>
	<87sjlfhc6b.fsf@fastmail.fm>
Message-ID: <CAKVJ-_7h3zjQu+H960SZNYFLZwpOC00eobHHiX+QdS4ged8LgQ@mail.gmail.com>

On Wed, Nov 23, 2011 at 1:15 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Peter;
>
>> I've noticed a probable regression on my Mac,
>>
>> $ python setup.py install
> [...]
>> warning: no previously-included files found matching 'Tests/Graphics/*.png'
>> warning: no previously-included files matching '*' found under
>
> These look like warnings from setuptools about excluding some files that
> aren't actually present or included. Apparently distutils silently
> ignores them.

That was my guess.

> I cleaned up the MANIFEST.in to reduce these. Thanks for
> spotting this,
> Brad

Thanks,

Peter


From p.j.a.cock at googlemail.com  Thu Nov 24 11:54:35 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 24 Nov 2011 11:54:35 +0000
Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?)
Message-ID: <CAKVJ-_7uBB0fjQ-iJ3cMG+iYDM+hujGpLgU8vhnEHQXy8wBGeA@mail.gmail.com>

Dear all,

Aside from a problem with leaking handles, the remaining problem
with Biopython's test suite under PyPy is in Bio.Restriction,
specifically this line in the RestrictionType class __init__ method,

super(RestrictionType, cls).__init__(cls, name, bases, dct)

Here is the error under PyPy 1.7 (same with PyPy 1.6),

$ pypy
Python 2.7.1 (7773f8fc4223, Nov 18 2011, 22:15:49)
[PyPy 1.7.0 with GCC 4.0.1] on darwin
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``<arigato> no, normal work is so
much less tiring than vacations''
>>>> from Bio import Restriction
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "Bio/Restriction/__init__.py", line 61, in <module>
    from Bio.Restriction.Restriction import *
  File "Bio/Restriction/Restriction.py", line 2404, in <module>
    newenz = T(k, bases, enzymedict[k])
  File "Bio/Restriction/Restriction.py", line 241, in __init__
    super(RestrictionType, cls).__init__(cls, name, bases, dct)
TypeError: unbound method __init__() must be called with BssMI
instance as first argument (got RestrictionType instance instead)
>>>> quit()

Note that we had to tweak the super call to get this to work under Python 2.6,
http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004369.html
https://github.com/biopython/biopython/commit/11332d6d4951406f3cc001cea41ea75fce177f89

It used to be:
super(RestrictionType, cls).__init__(name, bases, dct)

PyPy doesn't like that either,

$ pypy
Python 2.7.1 (7773f8fc4223, Nov 18 2011, 22:15:49)
[PyPy 1.7.0 with GCC 4.0.1] on darwin
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``"3 + 3 = 8" - Anto in the JIT
talk''
>>>> from Bio import Restriction
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "Bio/Restriction/__init__.py", line 61, in <module>
    from Bio.Restriction.Restriction import *
  File "Bio/Restriction/Restriction.py", line 2405, in <module>
    newenz = T(k, bases, enzymedict[k])
  File "Bio/Restriction/Restriction.py", line 242, in __init__
    super(RestrictionType, cls).__init__(name, bases, dct)
TypeError: unbound method __init__() must be called with BssMI
instance as first argument (got str instance instead)
>>>>


What I find interesting is if we comment out the super call, everything
seems to work - test_Restriction.py and test_CAPS.py pass under PyPy,
Jython, Python 2, and Python 3. I'm tempted to just do that - but I don't
fully understand what is going on and why.

Can anyone throw some light on this?

Thanks,

Peter


From chapmanb at 50mail.com  Thu Nov 24 15:33:57 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 24 Nov 2011 10:33:57 -0500
Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?)
In-Reply-To: <CAKVJ-_7uBB0fjQ-iJ3cMG+iYDM+hujGpLgU8vhnEHQXy8wBGeA@mail.gmail.com>
References: <CAKVJ-_7uBB0fjQ-iJ3cMG+iYDM+hujGpLgU8vhnEHQXy8wBGeA@mail.gmail.com>
Message-ID: <87lir5wn4q.fsf@fastmail.fm>


Peter;

> Aside from a problem with leaking handles, 

Is this from tempfile.mkstemp? This has tricked me an annoying number of
times, so I eventually wrote a wrapper. The trick is doing an os.close
on the file descriptor:

https://github.com/chapmanb/bcbb/blob/master/nextgen/bcbio/utils.py#L118
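
A minimal sketch of that kind of wrapper is shown below. It is written for this example rather than copied from the linked file, and the name temporary_path is hypothetical.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temporary_path(suffix=""):
    """Yield the path of a closed temporary file, removing it afterwards."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    # mkstemp returns an OS-level descriptor that is already open;
    # close it here so it does not leak.
    os.close(fd)
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)


if __name__ == "__main__":
    with temporary_path(".fasta") as path:
        with open(path, "w") as handle:
            handle.write(">seq\nACGT\n")
```

The caller reopens the file by name, so the only descriptor left open by mkstemp is the one closed inside the wrapper.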

> the remaining problem
> with Biopython's test suite under PyPy is in Bio.Restriction,
> specifically this line in the RestrictionType class __init__ method,
> 
> super(RestrictionType, cls).__init__(cls, name, bases, dct)

That seems strange: the __init__ is calling super on itself. You'd
normally expect this from a derived class. I'm not sure why this doesn't
trigger an infinite recursion initializing the object. I'm +1 on
commenting it out.
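
On the recursion question, here is a small sketch, assuming RestrictionType is an ordinary metaclass deriving from type. RecordingType below is a hypothetical stand-in, and the example does not attempt to reproduce or explain the PyPy traceback.

```python
class RecordingType(type):
    """Hypothetical metaclass standing in for RestrictionType."""

    init_calls = 0

    def __init__(cls, name, bases, dct):
        RecordingType.init_calls += 1
        # super() starts the lookup *after* RecordingType in the MRO,
        # so this reaches type.__init__ rather than re-entering this method.
        super(RecordingType, cls).__init__(name, bases, dct)


# Calling the metaclass directly builds a class on both Python 2 and 3.
Enzyme = RecordingType("Enzyme", (object,), {"site": "GAATTC"})
```

In this toy case __init__ runs exactly once per class created, and the super object is already bound to cls, so only name, bases and dct are passed on.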

Brad


From p.j.a.cock at googlemail.com  Fri Nov 25 11:40:49 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 25 Nov 2011 11:40:49 +0000
Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?)
In-Reply-To: <87lir5wn4q.fsf@fastmail.fm>
References: <CAKVJ-_7uBB0fjQ-iJ3cMG+iYDM+hujGpLgU8vhnEHQXy8wBGeA@mail.gmail.com>
	<87lir5wn4q.fsf@fastmail.fm>
Message-ID: <CAKVJ-_4t2YYMzEtEMDN-R2VaVAiAVguF+cV-4RK64VLgD0EGhA@mail.gmail.com>

On Thu, Nov 24, 2011 at 3:33 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Peter;
>
>> Aside from a problem with leaking handles,
>
> Is this from tempfile.mkstemp? This has tricked me an annoying number of
> times, so I eventually wrote a wrapper. The trick is doing an os.close
> on the file descriptor:
>
> https://github.com/chapmanb/bcbb/blob/master/nextgen/bcbio/utils.py#L118

Possibly in test_PDB.py but there are other handle leaks.

>> the remaining problem
>> with Biopython's test suite under PyPy is in Bio.Restriction,
>> specifically this line in the RestrictionType class __init__ method,
>>
>> super(RestrictionType, cls).__init__(cls, name, bases, dct)
>
> That seems strange: the __init__ is calling super on itself. You'd
> normally expect this from a derived class. I'm not sure why this
> doesn't trigger an infinite recursion initializing the object. I'm +1
> on commenting it out.
>
> Brad

I suppose we could be cautious and skip that line under PyPy
only. How about that as a compromise - that way if it really
is important for something not covered in the unit test, we only
break it under PyPy, but C Python and Jython would be fine?

Peter


From chapmanb at 50mail.com  Sat Nov 26 01:24:25 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Fri, 25 Nov 2011 20:24:25 -0500
Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?)
In-Reply-To: <CAKVJ-_4t2YYMzEtEMDN-R2VaVAiAVguF+cV-4RK64VLgD0EGhA@mail.gmail.com>
References: <CAKVJ-_7uBB0fjQ-iJ3cMG+iYDM+hujGpLgU8vhnEHQXy8wBGeA@mail.gmail.com>
	<87lir5wn4q.fsf@fastmail.fm>
	<CAKVJ-_4t2YYMzEtEMDN-R2VaVAiAVguF+cV-4RK64VLgD0EGhA@mail.gmail.com>
Message-ID: <8762i7he0m.fsf@fastmail.fm>


Peter;

> >> super(RestrictionType, cls).__init__(cls, name, bases, dct)
> >
> > That seems strange: the __init__ is calling super on itself. You'd
> > normally expect this from a derived class. I'm not sure why this
> > doesn't trigger an infinite recursion initializing the object. I'm +1
> > on commenting it out.

> I suppose we could be cautious and skip that line under PyPy
> only. How about that as a compromise - that way if it really
> is important for something not covered in the unit test, we only
> break it under PyPy, but C Python and Jython would be fine?

My vote would be to comment it out generally instead of if_pypy 
flags. I don't want to break anything, but if we do I'd rather find out
straight away instead of chasing down platform specific bugs later. I'd
be happy to hear others' opinions, especially if they understand the
super magic going on.

Brad


From eric.talevich at gmail.com  Sat Nov 26 03:00:04 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Fri, 25 Nov 2011 22:00:04 -0500
Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?)
In-Reply-To: <8762i7he0m.fsf@fastmail.fm>
References: <CAKVJ-_7uBB0fjQ-iJ3cMG+iYDM+hujGpLgU8vhnEHQXy8wBGeA@mail.gmail.com>
	<87lir5wn4q.fsf@fastmail.fm>
	<CAKVJ-_4t2YYMzEtEMDN-R2VaVAiAVguF+cV-4RK64VLgD0EGhA@mail.gmail.com>
	<8762i7he0m.fsf@fastmail.fm>
Message-ID: <CAMC681nNbTmZJPdVvs9zyDPSG0ooiXTXPY4Vq4nnozhvpUmNQg@mail.gmail.com>

On Fri, Nov 25, 2011 at 8:24 PM, Brad Chapman <chapmanb at 50mail.com> wrote:

>
> Peter;
>
> > >> super(RestrictionType, cls).__init__(cls, name, bases, dct)
> > >
> > > That seems strange: the __init__ is calling super on itself. You'd
> > > normally expect this from a derived class. I'm not sure why this
> > > doesn't trigger an infinite recursion initializing the object. I'm +1
> > > on commenting it out.
>
> > I suppose we could be cautious and skip that line under PyPy
> > only. How about that as a compromise - that way if it really
> > is important for something not covered in the unit test, we only
> > break it under PyPy, but C Python and Jython would be fine?
>
> My vote would be to comment it out generally instead of if_pypy
> flags. I don't want to break anything, but if we do I'd rather find out
> straight away instead of chasing down platform specific bugs later. I'd
> be happy to hear others' opinions, especially if they understand the
> super magic going on.
>
>
I support that, and maybe we can add some more unit tests to see if we can
find out what breaks, if anything.

Looking at the Bio/Restriction/Restriction.py, I can suggest these
candidates:

1. In the implementation of the class RestrictionType, a few of the magic
methods use the test "if isinstance(other, RestrictionType)" -- can you see
any way these might break without the super().__init__ call?


2. Other classes in the same file derive from RestrictionType, but don't
define their own __init__ methods (e.g. AbstractCut, and indirectly NoCut,
OneCut, etc.). All the methods seem to be class methods, also. (NB: maybe
use the @classmethod decorator everywhere for clarity.) As far as I can
tell, the unit test only uses class methods on EcoRI, not any instance
methods -- if I'm reading that right, then maybe there should be a unit
test that hits that. This and #1 can be done at the same time with the
magic methods __add__, __ne__ and __gt__, for example.


3. In Bio/Restriction/__init__.py, I see this comment:

When testing for the presence of a Restriction enzyme in a
RestrictionBatch, the user can use:
1) a class of type 'RestrictionType'
2) a string of the name of the enzyme (its repr)
i.e:
>>> from Bio.Restriction import RestrictionBatch, EcoRI
>>> MyBatch = RestrictionBatch(EcoRI)
>>> #!/usr/bin/env python
>>> EcoRI in MyBatch        # the class EcoRI.
True
>>>
>>> 'EcoRI' in MyBatch      # a string representation
True

I don't see this included in the unit test, test_Restriction.py. I don't
think the super().__init__ combo has anything to do with this feature, but
maybe it should be tested anyway, since it relies on some substantial magic.
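
To make the behaviour being tested concrete, here is a toy sketch of a container that accepts either a class or its name in an "in" test. NamedType and Batch are hypothetical stand-ins written for this example; they are not how Bio.Restriction implements RestrictionBatch.

```python
class NamedType(type):
    """Hypothetical metaclass: classes print as their bare name."""

    def __repr__(cls):
        return cls.__name__


# Build a toy "enzyme" class; this form works on Python 2 and 3.
EcoRI = NamedType("EcoRI", (object,), {"site": "GAATTC"})


class Batch(set):
    """Toy container accepting either a class or its name in 'in' tests."""

    def __contains__(self, item):
        if isinstance(item, str):
            # Compare the string against each member's repr.
            return any(repr(member) == item for member in self)
        return set.__contains__(self, item)


if __name__ == "__main__":
    batch = Batch([EcoRI])
    print(EcoRI in batch, "EcoRI" in batch, "BamHI" in batch)
```

A unit test for the real classes would assert the same two membership checks quoted above, once with the class and once with its name.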


-Eric


From p.j.a.cock at googlemail.com  Sat Nov 26 13:38:26 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 26 Nov 2011 13:38:26 +0000
Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?)
In-Reply-To: <CAMC681nNbTmZJPdVvs9zyDPSG0ooiXTXPY4Vq4nnozhvpUmNQg@mail.gmail.com>
References: <CAKVJ-_7uBB0fjQ-iJ3cMG+iYDM+hujGpLgU8vhnEHQXy8wBGeA@mail.gmail.com>
	<87lir5wn4q.fsf@fastmail.fm>
	<CAKVJ-_4t2YYMzEtEMDN-R2VaVAiAVguF+cV-4RK64VLgD0EGhA@mail.gmail.com>
	<8762i7he0m.fsf@fastmail.fm>
	<CAMC681nNbTmZJPdVvs9zyDPSG0ooiXTXPY4Vq4nnozhvpUmNQg@mail.gmail.com>
Message-ID: <CAKVJ-_5Ercnpvqh0ypX3O0jkzK9L4qEGyDkM_cZT5PfUs9Zsrg@mail.gmail.com>

On Saturday, November 26, 2011, Eric Talevich <eric.talevich at gmail.com>
wrote:
> On Fri, Nov 25, 2011 at 8:24 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>>
>> Peter;
>>
>> > >> super(RestrictionType, cls).__init__(cls, name, bases, dct)
>> > >
>> > > That seems strange: the __init__ is calling super on itself. You'd
>> > > normally expect this from a derived class. I'm not sure why this
>> > > doesn't trigger an infinite recursion initializing the object. I'm +1
>> > > on commenting it out.
>>
>> > I suppose we could be cautious and skip that line under PyPy
>> > only. How about that as a compromise - that way if it really
>> > is important for something not covered in the unit test, we only
>> > break it under PyPy, but C Python and Jython would be fine?
>>
>> My vote would be to comment it out generally instead of if_pypy
>> flags. I don't want to break anything, but if we do I'd rather find out
>> straight away instead of chasing down platform specific bugs later. I'd
>> be happy to hear others' opinions, especially if they understand the
>> super magic going on.
>>
>
> I support that, and maybe we can add some more unit tests to
> see if we can find out what breaks, if anything.

OK

> Looking at the Bio/Restriction/Restriction.py, I can suggest these
> candidates:

Great - do you want to try to turn those into unit tests?

Thanks,

Peter


From eric.talevich at gmail.com  Sat Nov 26 19:49:35 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sat, 26 Nov 2011 14:49:35 -0500
Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?)
In-Reply-To: <CAKVJ-_5Ercnpvqh0ypX3O0jkzK9L4qEGyDkM_cZT5PfUs9Zsrg@mail.gmail.com>
References: <CAKVJ-_7uBB0fjQ-iJ3cMG+iYDM+hujGpLgU8vhnEHQXy8wBGeA@mail.gmail.com>
	<87lir5wn4q.fsf@fastmail.fm>
	<CAKVJ-_4t2YYMzEtEMDN-R2VaVAiAVguF+cV-4RK64VLgD0EGhA@mail.gmail.com>
	<8762i7he0m.fsf@fastmail.fm>
	<CAMC681nNbTmZJPdVvs9zyDPSG0ooiXTXPY4Vq4nnozhvpUmNQg@mail.gmail.com>
	<CAKVJ-_5Ercnpvqh0ypX3O0jkzK9L4qEGyDkM_cZT5PfUs9Zsrg@mail.gmail.com>
Message-ID: <CAMC681mQoCo38PHJBFxqFCLUBawSKOBZj=apj-jrTqk+ipc_Xw@mail.gmail.com>

On Sat, Nov 26, 2011 at 8:38 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Saturday, November 26, 2011, Eric Talevich <eric.talevich at gmail.com>
> wrote:
> > On Fri, Nov 25, 2011 at 8:24 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> >>
> >> Peter;
> >>
> >> > >> super(RestrictionType, cls).__init__(cls, name, bases, dct)
> >> > >
> >> > > That seems strange: the __init__ is calling super on itself. You'd
> >> > > normally expect this from a derived class. I'm not sure why this
> >> > > doesn't trigger an infinite recursion initializing the object. I'm +1
> >> > > on commenting it out.
> >>
> >> > I suppose we could be cautious and skip that line under PyPy
> >> > only. How about that as a compromise - that way if it really
> >> > is important for something not covered in the unit test, we only
> >> > break it under PyPy, but C Python and Jython would be fine?
> >>
> >> My vote would be to comment it out generally instead of if_pypy
> >> flags. I don't want to break anything, but if we do I'd rather find out
> >> straight away instead of chasing down platform specific bugs later. I'd
> >> be happy to hear others' opinions, especially if they understand the
> >> super magic going on.
> >>
> >
> > I support that, and maybe we can add some more unit tests to
> > see if we can find out what breaks, if anything.
>
> OK
>
>
> > Looking at the Bio/Restriction/Restriction.py, I can suggest these
> > candidates:
>
> Great - do you want to try to turn those into unit tests?
>
>
Sure thing. Here's the relevant commit:
https://github.com/biopython/biopython/commit/eb1c163909801731dc0a3d7fbcb2ee514f212da3

Unit tests for most of the magic methods were already there, I just didn't
notice them earlier.

I also commented out the offending line in Restriction.py and stirred the
code a bit in that file and in the test suite. I tested with Python 2.7 and
PyPy 1.7 on Ubuntu; we'll see what the build bots say now.

-Eric