From p.j.a.cock at googlemail.com Tue Nov 1 17:21:31 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 1 Nov 2011 21:21:31 +0000
Subject: [Biopython-dev] TogoWS in Biopython?
Message-ID:

Dear all,

Would someone like to review the TogoWS code I have written to access the Togo Web Service's REST API please?

http://togows.dbcls.jp/
http://togows.dbcls.jp/site/en/rest.html
http://dx.doi.org/doi:10.1093/nar/gkq386

This provides a nice simple URL based API for fetching database entries in various formats (XML, JSON, GenBank etc - even some individual fields from some database records, e.g. the accession of a GenBank record), searching, and even some file format conversion (which uses a range of tools on their server, some in BioRuby and others in BioPerl I believe).

The code is on this branch,
https://github.com/peterjc/biopython/tree/togows

See module Bio.TogoWS and its docstrings,
https://github.com/peterjc/biopython/blob/togows/Bio/TogoWS/__init__.py

Unit tests in Tests/test_TogoWS.py
https://github.com/peterjc/biopython/blob/togows/Tests/test_TogoWS.py

I have been guided by the naming we've used in Bio.Entrez for accessing the NCBI Entrez API.

Note that in addition to major Japanese databases, TogoWS also proxies and caches data from Europe (e.g. UniProt) and America (e.g. GenBank and PubMed). It was very fast when testing from Japan this summer - not quite so speedy from the UK though ;)

Personally I found TogoWS much easier to use for searching and retrieving batches of records than the NCBI Entrez API with its complicated history requirement. I expect it to be particularly popular with Biopython users in Japan.

Thanks in advance,

Peter

From p.j.a.cock at googlemail.com Tue Nov 1 17:27:15 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 1 Nov 2011 21:27:15 +0000
Subject: [Biopython-dev] TogoWS in Biopython?
In-Reply-To:
References:
Message-ID:

On Tue, Nov 1, 2011 at 9:21 PM, Peter Cock wrote:
> Dear all,
>
> Would someone like to review the TogoWS code I have written
> to access the Togo Web Service's REST API please?
>
> ...
>
> Unit tests in Tests/test_TogoWS.py
> https://github.com/peterjc/biopython/blob/togows/Tests/test_TogoWS.py

P.S. Some of the tests are a little bit slow right now, so we can comment some out as part of merging this to the trunk.

Peter

From chapmanb at 50mail.com Wed Nov 2 08:19:58 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 02 Nov 2011 08:19:58 -0400
Subject: [Biopython-dev] TogoWS in Biopython?
In-Reply-To:
References:
Message-ID: <8762j2iump.fsf@fastmail.fm>

Peter;

> > Would someone like to review the TogoWS code I have written
> > to access the Togo Web Service's REST API please?

This looks great and the tests are all passing for me. My only small suggestion would be to avoid hardcoding 'http://togows.dbcls.jp' everywhere. I'd stick this as a top level variable along with the global caches and reference it in the code. This way if they ever get any mirrors we could adjust on the fly.

Thanks for getting this in,
Brad

From p.j.a.cock at googlemail.com Wed Nov 2 09:27:25 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 2 Nov 2011 13:27:25 +0000
Subject: [Biopython-dev] TogoWS in Biopython?
In-Reply-To: <8762j2iump.fsf@fastmail.fm>
References: <8762j2iump.fsf@fastmail.fm>
Message-ID:

On Wed, Nov 2, 2011 at 12:19 PM, Brad Chapman wrote:
>
> Peter;
>
>> > Would someone like to review the TogoWS code I have written
>> > to access the Togo Web Service's REST API please?
>
> This looks great and the tests are all passing for me. My only small
> suggestion would be to avoid hardcoding 'http://togows.dbcls.jp'
> everywhere. I'd stick this as a top level variable along with the global
> caches and reference it in the code. This way if they ever get any
> mirrors we could adjust on the fly.
>
> Thanks for getting this in,
> Brad

Good point regarding the URL. I've also realised it will need some tweaks for Python 3 (bytes versus unicode), or at least to skip the unit tests in the short term to avoid hiding real errors on the buildbot.

Peter

From redmine at redmine.open-bio.org Tue Nov 8 05:17:00 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 8 Nov 2011 10:17:00 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] (New) Failing to parse fasta-m10 format generated by lalign36
Message-ID:

Issue #3312 has been reported by gahoo lee.

----------------------------------------
Bug #3312: Failing to parse fasta-m10 format generated by lalign36
https://redmine.open-bio.org/issues/3312

Author: gahoo lee
Status: New
Priority: Normal
Assignee:
Category:
Target version:
URL:

When I parse an alignment created by lalign, which is included in FASTA36, I get errors. Each FASTA file currently contains two sequences; with one sequence in each file there is no error. Here are the code and the error.

@lalign36 -m 10 at.fasta os.fasta >test.aln@

@from Bio import AlignIO
handle = open('test.aln')
for a in AlignIO.parse(handle, "fasta-m10"):
    assert len(a) == 2, "Should be pairwise!"
    print "Alignment length %i" % a.get_alignment_length()
    for record in a:
        print record.seq, record.name, record.id
@

@Traceback (most recent call last):
  File "R:\Untitled 4.py", line 5, in <module>
    for a in AlignIO.parse(handle, "fasta-m10"):
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\__init__.py", line 371, in parse
    for a in i:
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\FastaIO.py", line 242, in FastaM10Iterator
    yield build_hsp()
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\FastaIO.py", line 106, in build_hsp
    assert query_tags, query_tags
AssertionError: {}@

----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin

--
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org

From p.j.a.cock at googlemail.com Tue Nov 8 10:38:32 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 8 Nov 2011 15:38:32 +0000
Subject: [Biopython-dev] Indexing sequences compressed with BGZF (Blocked GNU Zip Format)
Message-ID:

Dear all,

We've talked in the past about indexing sequences in gzipped files, e.g.
http://lists.open-bio.org/pipermail/biopython/2010-June/006546.html

That discussion concluded that random access into simple GZIP files was not practical, but BGZF (used in BAM) was worth looking into. I wrote some proof of principle code back then:
http://lists.open-bio.org/pipermail/biopython/2010-June/006555.html

I have recently polished that old code up, and done some benchmarking (using some reasonably large FASTA, Swiss, and UniProt-XML files). Please read this blog post:
http://blastedbio.blogspot.com/

I think random access to sequences compressed with BGZF is fast enough to be useful practically (while confirming this is not true for large gzipped files).
I've also put this idea forward on SEQanswers,
http://seqanswers.com/forums/showthread.php?t=15347

The cleaned up BGZF code is on the following branch:
https://github.com/peterjc/biopython/tree/bgzf

This adds a new module Bio.bgzf (position in namespace open to debate) which provides read/write handles to BGZF files - trying to follow the API used in the Python gzip library. I then use the new BGZF reader (with its special seek/tell offsets) from within Bio.SeqIO's index functionality.

So far I have only tested with Bio.SeqIO.index(...). It should work fine with Bio.SeqIO.index_db(...) as well, but there the SQLite schema will need a small update to record the compression type for each file.

Is anyone interested in testing this out?

Note that to produce a BGZF file, you can use the bgzip tool from samtools, or run Bio/bgzf.py directly at the command line, which will compress stdin to stdout. Both approaches call zlib internally, and the run time is practically identical.

Regards,

Peter

From p.j.a.cock at googlemail.com Tue Nov 8 10:41:15 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 8 Nov 2011 15:41:15 +0000
Subject: [Biopython-dev] Indexing sequences compressed with BGZF (Blocked GNU Zip Format)
In-Reply-To:
References:
Message-ID:

On Tue, Nov 8, 2011 at 3:38 PM, Peter Cock wrote:
> That discussion concluded that random access into simple GZIP files
> was not practical, but BGZF (used in BAM) was worth looking into.
> I wrote some proof of principle code back then:
> http://lists.open-bio.org/pipermail/biopython/2010-June/006555.html
>
> I have recently polished that old code up, and done some
> benchmarking (using some reasonably large FASTA, Swiss,
> and UniProt-XML files). Please read this blog post:
> http://blastedbio.blogspot.com/

More precise link to my BGZF post:
http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html

Peter

From bioinformed at gmail.com Tue Nov 8 12:40:36 2011
From: bioinformed at gmail.com (Kevin Jacobs)
Date: Tue, 8 Nov 2011 12:40:36 -0500
Subject: [Biopython-dev] Indexing sequences compressed with BGZF (Blocked GNU Zip Format)
In-Reply-To:
References:
Message-ID:

I've added a proper LRU uncompressed block cache to the samtools tabix code, if that would be of any help. It greatly improves performance for many access patterns. (I didn't look to see if you'd already done that in your code.)

-Kevin

From p.j.a.cock at googlemail.com Tue Nov 8 12:52:59 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 8 Nov 2011 17:52:59 +0000
Subject: [Biopython-dev] Indexing sequences compressed with BGZF (Blocked GNU Zip Format)
In-Reply-To:
References:
Message-ID:

On Tue, Nov 8, 2011 at 5:40 PM, Kevin Jacobs wrote:
> I've added a proper LRU uncompressed block cache to the samtools tabix code,
> if that would be of any help. It greatly improves performance for many
> access patterns. (I didn't look to see if you'd already done that in your
> code.)
> -Kevin

Hi Kevin,

Is this already in the mainline samtools tabix repository?

The current implementation in my Python code just caches the current block - but a simple pool had occurred to me. How many blocks (given each is 64kb) and how best to pick that number isn't obvious to me. Perhaps you can suggest some sensible defaults?

In fact, a proper LRU cache would make sense for the handle pool in Bio.SeqIO.index_db(...) as well.
Regards,

Peter

From bioinformed at gmail.com Tue Nov 8 13:11:56 2011
From: bioinformed at gmail.com (Kevin Jacobs)
Date: Tue, 8 Nov 2011 13:11:56 -0500
Subject: [Biopython-dev] Indexing sequences compressed with BGZF (Blocked GNU Zip Format)
In-Reply-To:
References:
Message-ID:

On Tue, Nov 8, 2011 at 12:52 PM, Peter Cock wrote:
> On Tue, Nov 8, 2011 at 5:40 PM, Kevin Jacobs wrote:
> > I've added a proper LRU uncompressed block cache to the samtools tabix
> > code, if that would be of any help. It greatly improves performance for
> > many access patterns. (I didn't look to see if you'd already done that
> > in your code.)
> > -Kevin
>
> Hi Kevin,
>
> Is this already in the mainline samtools tabix repository?
>
> The current implementation in my Python code just caches the
> current block - but a simple pool had occurred to me. How many
> blocks (given each is 64kb) and how best to pick that number
> isn't obvious to me. Perhaps you can suggest some sensible
> defaults?
>
> In fact, a proper LRU cache would make sense for the handle
> pool in Bio.SeqIO.index_db(...) as well.
>

Hi Peter,

There is a random-eviction cache implemented in the mainline that is okay, but it is turned off by default and, if enabled, can be very inefficient if it keeps evicting your most active blocks. Converting the cache to LRU was very simple and I've been using it locally for some time now, but I haven't had time to send the changes on to Heng Li.

I choose the size of the cache based on the application and access patterns. For roughly sequential sequence queries (a la samtools faidx or Pysam Fastafile), all one needs is a handful of active blocks (say 16). When repeatedly querying tabix files via pysam, I typically use 128 blocks for the best trade-off between memory and performance. Choosing a cache size for BAM files is much more complicated and I have a wide range of settings depending on how many parallel BAM streams and access patterns are employed.
The cache size numbers needed to be quite a bit larger before switching to LRU (which was a bit surprising). However, using even a small cache is vastly beneficial for many access patterns. The cost of re-reading a block from disk can be mitigated by the OS filesystem cache, but the decompression step takes non-trivial CPU time and can be triggered dozens or hundreds of times per block for some sensible-seeming access patterns.

-Kevin

From p.j.a.cock at googlemail.com Tue Nov 8 13:28:04 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 8 Nov 2011 18:28:04 +0000
Subject: [Biopython-dev] Indexing sequences compressed with BGZF (Blocked GNU Zip Format)
In-Reply-To:
References:
Message-ID:

On Tue, Nov 8, 2011 at 6:11 PM, Kevin Jacobs wrote:
> On Tue, Nov 8, 2011 at 12:52 PM, Peter Cock wrote:
>> On Tue, Nov 8, 2011 at 5:40 PM, Kevin Jacobs wrote:
>> > I've added a proper LRU uncompressed block cache to the
>> > samtools tabix code, if that would be of any help. It greatly
>> > improves performance for many access patterns.
>> > (I didn't look to see if you'd already done that in your
>> > code.)
>> > -Kevin
>>
>> Hi Kevin,
>>
>> Is this already in the mainline samtools tabix repository?
>>
>> The current implementation in my Python code just caches the
>> current block - but a simple pool had occurred to me. How many
>> blocks (given each is 64kb) and how best to pick that number
>> isn't obvious to me. Perhaps you can suggest some sensible
>> defaults?
>>
>> In fact, a proper LRU cache would make sense for the handle
>> pool in Bio.SeqIO.index_db(...) as well.
>>
>
> Hi Peter,
>
> There is a random-eviction cache implemented in the mainline that is okay,
> but it is turned off by default and, if enabled, can be very inefficient if
> it keeps evicting your most active blocks. Converting the cache to LRU
> was very simple and I've been using it locally for some time now, but I
> haven't had time to send the changes on to Heng Li.
Are your changes on github or somewhere public? Heng Li has the core samtools bit of the samtools SVN on github, which he seems to use for experimental new code:
https://github.com/lh3/samtools

> I choose the size of the cache based on the application and access patterns.
> For roughly sequential sequence queries (a la samtools faidx or Pysam
> Fastafile), all one needs is a handful of active blocks (say 16). When
> repeatedly querying tabix files via pysam, I typically use 128 blocks for the
> best trade-off between memory and performance. Choosing a cache size for
> BAM files is much more complicated and I have a wide range of settings
> depending on how many parallel BAM streams and access patterns are employed.
> The cache size numbers needed to be quite a bit larger before switching to
> LRU (which was a bit surprising). However, using even a small cache is
> vastly beneficial for many access patterns. The cost of re-reading a block
> from disk can be mitigated by the OS filesystem cache, but the decompression
> step takes non-trivial CPU time and can be triggered dozens or hundreds of
> times per block for some sensible-seeming access patterns.
> -Kevin

Certainly useful food for thought - thank you. I agree that the OS will probably cache commonly used BGZF blocks in the filesystem cache, but it doesn't solve the CPU overhead of decompression.

In the case of Bio.SeqIO.index(...) which accesses one file, and Bio.SeqIO.index_db(...) which may access several files, we currently don't offer any end user options like this. However, there is an internal option for the max number of handles, and a similar option could control the number of BGZF blocks to cache. I could try 100 blocks (100 times 64kb is about 6MB) as the default, and redo the UniProt timings (random access to sequences).

That might be a good compromise, given the SeqIO indexing code has no easy way to know the calling code's usage patterns.
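[Editorial sketch, not part of the original thread.] The LRU block cache discussed above can be illustrated in a few lines of Python 3. This is a minimal sketch under stated assumptions, not the code on the bgzf branch or in samtools: the class name `LRUBlockCache`, the `load_block` callable, and `fake_load` are all hypothetical, and the default of 100 blocks simply mirrors the figure floated in the thread (100 x 64 KiB is 6.25 MiB, i.e. "about 6MB").

```python
from collections import OrderedDict


class LRUBlockCache:
    """Hold up to max_blocks decompressed blocks, evicting the least recently used."""

    def __init__(self, load_block, max_blocks=100):
        # load_block is a hypothetical callable: block start offset -> bytes.
        self._load_block = load_block
        self._max_blocks = max_blocks
        self._blocks = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, start_offset):
        if start_offset in self._blocks:
            self.hits += 1
            # Mark this block as the most recently used one.
            self._blocks.move_to_end(start_offset)
            return self._blocks[start_offset]
        self.misses += 1
        data = self._load_block(start_offset)  # the costly read + decompress step
        self._blocks[start_offset] = data
        if len(self._blocks) > self._max_blocks:
            # Drop the least recently used block (the first entry).
            self._blocks.popitem(last=False)
        return data


def fake_load(offset):
    """Stand-in for reading and decompressing one BGZF block (illustration only)."""
    return b"block-%d" % offset


cache = LRUBlockCache(fake_load, max_blocks=2)
for offset in [0, 65536, 0, 131072, 65536]:
    cache.get(offset)
print("hits:", cache.hits, "misses:", cache.misses)
```

Keying on the block start offset matches the dictionary-based cache described later in the thread; the only change from an arbitrary-eviction dictionary is that the ordering is tracked so the oldest unused block is the one dropped.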
As I said on the blog post, we should be able to improve the speed of the BGZF random access - this idea alone could make a big difference, although probably a naive block cache (rather than LRU) would be a worthwhile step in itself.

Regards,

Peter

From p.j.a.cock at googlemail.com Wed Nov 9 14:53:52 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 9 Nov 2011 19:53:52 +0000
Subject: [Biopython-dev] Fwd: Bug in DSSP.py
In-Reply-To:
References:
Message-ID:

FYI, hopefully someone uses DSSP.

---------- Forwarded message ----------
From: Austin Meyer
Date: Tuesday, November 8, 2011
Subject: Bug in DSSP.py
To: biopython-owner at lists.open-bio.org

Ahoy,

I have no idea how to contribute code so I thought I would pass this along. The newest DSSP adds a citation section for the first two lines, and a blank third line in its output file. The parser reads each line one at a time, splits it, then looks at the second element of the resulting list. As the blank line has only one element, there is an index out of range failure that occurs. This error does not happen with the older DSSP version.

A quick fix checks the length of the list prior to looking at its elements. Thus at line 121 in the DSSP.py file, just after the sl = l.split(), this will fix the problem:

    if len(sl) < 2:
        continue

The whole function will look like so:

    def make_dssp_dict(filename):
        """
        Return a DSSP dictionary that maps (chainid, resid) to
        aa, ss and accessibility, from a DSSP file.

        @param filename: the DSSP output file
        @type filename: string
        """
        dssp = {}
        handle = open(filename, "r")
        try:
            start = 0
            keys = []
            for l in handle.readlines():
                sl = l.split()
                if len(sl) < 2:
                    continue
                if sl[1] == "RESIDUE":
                    # Start parsing from here
                    start = 1
                    continue
                if not start:
                    continue
                if l[9] == " ":
                    # Skip -- missing residue
                    continue
                resseq = int(l[5:10])
                icode = l[10]
                chainid = l[11]
                aa = l[13]
                ss = l[16]
                if ss == " ":
                    ss = "-"
                try:
                    acc = int(l[34:38])
                    phi = float(l[103:109])
                    psi = float(l[109:115])
                except ValueError as exc:
                    # DSSP output breaks its own format when there are >9999
                    # residues, since only 4 digits are allocated to the seq num
                    # field. See 3kic chain T res 321, 1vsy chain T res 6077.
                    # Here, look for whitespace to figure out the number of extra
                    # digits, and shift parsing the rest of the line by that amount.
                    if l[34] != ' ':
                        shift = l[34:].find(' ')
                        acc = int((l[34+shift:38+shift]))
                        phi = float(l[103+shift:109+shift])
                        psi = float(l[109+shift:115+shift])
                    else:
                        raise ValueError(exc)
                res_id = (" ", resseq, icode)
                dssp[(chainid, res_id)] = (aa, ss, acc, phi, psi)
                keys.append((chainid, res_id))
        finally:
            handle.close()
        return dssp, keys

Thanks,

--
Austin Meyer

From p.j.a.cock at googlemail.com Wed Nov 9 19:01:19 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 10 Nov 2011 00:01:19 +0000
Subject: [Biopython-dev] Indexing sequences compressed with BGZF (Blocked GNU Zip Format)
In-Reply-To:
References:
Message-ID:

On Tue, Nov 8, 2011 at 6:28 PM, Peter Cock wrote:
>> I choose the size of the cache based on the application and access patterns.
>> For roughly sequential sequence queries (a la samtools faidx or Pysam
>> Fastafile), all one needs is a handful of active blocks (say 16). When
>> repeatedly querying tabix files via pysam, I typically use 128 blocks for the
>> best trade-off between memory and performance.
>> Choosing a cache size for BAM files is much more complicated and I have
>> a wide range of settings depending on how many parallel BAM streams and
>> access patterns are employed.
>> The cache size numbers needed to be quite a bit larger before switching to
>> LRU (which was a bit surprising). However, using even a small cache is
>> vastly beneficial for many access patterns. The cost of re-reading a block
>> from disk can be mitigated by the OS filesystem cache, but the decompression
>> step takes non-trivial CPU time and can be triggered dozens or hundreds of
>> times per block for some sensible-seeming access patterns.
>> -Kevin
>
> Certainly useful food for thought - thank you. I agree that the OS
> will probably cache commonly used BGZF blocks in the filesystem
> cache, but it doesn't solve the CPU overhead of decompression.
>
> In the case of Bio.SeqIO.index(...) which accesses one file, and
> Bio.SeqIO.index_db(...) which may access several files, we currently
> don't offer any end user options like this. However, there is an internal
> option for the max number of handles, and a similar option could
> control the number of BGZF blocks to cache. I could try 100
> blocks (100 times 64kb is about 6MB) as the default, and redo
> the UniProt timings (random access to sequences).
>
> That might be a good compromise, given the SeqIO indexing code
> has no easy way to know the calling code's usage patterns.

I've tried a cache of up to 100 BGZF blocks which are cleared "randomly" and it doesn't make a noticeable difference to my UniProt benchmark, which is a shame but not actually very surprising. After all, that is deliberately accessing the records (and thus the blocks) in a random order, and the files contain far far more than 100 blocks.

I'll need a more realistic test case to properly evaluate the cache.
One example that comes to mind is iterating over BAM reads (which would look at blocks sequentially) but also jumping to look at the partner reads (paired end etc) and then back again.

Peter

P.S. When I said "random", what I'm actually using is a Python dictionary keyed on the start offset, and the dictionary's popitem method to remove a cached block "at random" once I have got 100 blocks in memory and need to free one. Of course, this isn't really random, it is arbitrary and likely Python implementation dependent.

From redmine at redmine.open-bio.org Thu Nov 10 05:10:06 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 10 Nov 2011 10:10:06 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10 format generated by lalign36
References:
Message-ID:

Issue #3312 has been updated by Peter Cock.

Assignee set to Biopython Dev Mailing List

Thank you - I can reproduce this on the latest Biopython in our repository. May we include your sample file in Biopython as a unit test please?

From redmine at redmine.open-bio.org Thu Nov 10 06:10:23 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 10 Nov 2011 11:10:23 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10 format generated by lalign36
References:
Message-ID:

Issue #3312 has been updated by gahoo lee.

Sure. My pleasure.

From redmine at redmine.open-bio.org Thu Nov 10 06:34:39 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 10 Nov 2011 11:34:39 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10 format generated by lalign36
References:
Message-ID:

Issue #3312 has been updated by Peter Cock.

Looking at this, I believe there is a problem in lalign36 itself rather than Biopython:

At the end of the first batch of alignments (for query one, AT1G01040.1) we have the odd line:
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
At the end of the second (and final) batch of alignments (for query two, AT5G04140.2) we have these odd lines:
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os03g02970.1 1500 bp_Up Chr 03:1205337..1203835 (reverse complemented)
>>LOC_Os03g02970.1 1500 bp_Up Chr 03:1205337..1203835 (reverse complemented)
Curious. It seems LALIGN is starting to write out another alignment, but then doesn't. It was very helpful that you included the input files as well, so I could run this with the version of lalign36 I have installed (version 36.3.4 Apr, 2011) and here the output is a bit different but shows similar odd lines. I have updated Biopython to give a more helpful error message in this case: https://github.com/biopython/biopython/commit/1a99454d358fab41771551e8f3a475a90f240b25
>>> from Bio import AlignIO
>>> for a in AlignIO.parse("test.aln", "fasta-m10"):
...     print a
...
SingleLetterAlphabet() alignment with 2 rows and 130 columns
AAAAAAAGAGAGAAATATTACTACAAAACAGAAGCAAGCAAGTG...ATC AT1G01040.1
AGAGAGAGAGAGAGGGAAGCGGAGGAGGGAGAAGAGATCA-GAG...ATC LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 81 columns
AAACAGAAGCAAGC--AAGTGGAA-AACAGACCAGAAGAGAGAG...CGA AT1G01040.1
AGAGAGAGGGAAGCGGAGGAGGGAGAAGAGATCAGAGGAAAGAG...TGA LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 264 columns
AAGATTTCGATTTCG-ATATAAATACTTAAT---CTTT-ATAAA...TTA AT1G01040.1
AATATATCTATTTCTTAAACAAATCATTATTTTCCTTTCATAAA...CTA LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 428 columns
ATTTTTATTTTTATTTT-TATGGGAAAGAAGTTGCACGAGTCGG...TTT AT1G01040.1
ATCATTATTTTCCTTTCATAAAAAAATGAATT---ATGAGGCGG...TTT LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 145 columns
AACTCACTCAAGAAAACCAAATCCCCAGAGA-AGAAA-ACAGAA...AAC AT1G01040.1
ATCTCAATCGAGAGAGCGAGCACACGAGAGAGAGAGAGAGGGAA...ATC LOC_Os03g02970.1
Traceback (most recent call last):
...
ValueError: No data for query 'AT1G01040.1', match 'LOC_Os07g46460.1'
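[Editorial sketch, not part of the original thread.] Because the parser now raises ValueError when it reaches the odd lines, a caller who still wants the alignments read before that point could wrap the iteration. The following is a minimal Python 3 sketch, assuming a helper named `parse_until_error` (hypothetical, not a Biopython function); `fake_alignments` is a stand-in generator used here so the example runs without lalign36 output, and in real use the iterator would be `AlignIO.parse("test.aln", "fasta-m10")`.

```python
def parse_until_error(records):
    """Collect items from an iterator, stopping at the first ValueError.

    Returns (items_collected_so_far, the_exception_or_None).
    """
    collected = []
    error = None
    try:
        for record in records:
            collected.append(record)
    except ValueError as exc:
        error = exc
    return collected, error


def fake_alignments():
    """Stand-in for AlignIO.parse(...): yields two items, then fails (illustration only)."""
    yield "alignment 1"
    yield "alignment 2"
    raise ValueError("No data for query 'AT1G01040.1', match 'LOC_Os07g46460.1'")


alignments, problem = parse_until_error(fake_alignments())
print(len(alignments), "alignments parsed before:", problem)
```

Whether silently keeping the partial results is appropriate depends on the analysis; the sketch returns the exception alongside the data so the caller can decide.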
Are you on Bill Pearson's FASTA mailing list? We should report this. Peter ---------------------------------------- Bug #3312: Failing to parse fasta-m10 format generated by lalign36 https://redmine.open-bio.org/issues/3312 Author: gahoo lee Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Target version: URL: When I parse an alignment created by lalign which is included in FASTA36, I got errors. We got two sequences in each fasta file now, but if one sequence each, there's no error. Here are the codes and error. @lalign36 -m 10 at.fasta os.fasta >test.aln@ @from Bio import AlignIO handle = open('test.aln') for a in AlignIO.parse(handle, "fasta-m10"): assert len(a) == 2, "Should be pairwise!" print "Alignment length %i" % a.get_alignment_length() for record in a: print record.seq, record.name, record.id @ @Traceback (most recent call last): File "R:\Untitled 4.py", line 5, in for a in AlignIO.parse(handle, "fasta-m10"): File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\__init__.py", line 371, in parse for a in i: File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\FastaIO.py", line 242, in FastaM10Iterator yield build_hsp() File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\FastaIO.py", line 106, in build_hsp assert query_tags, query_tags AssertionError: {}@ -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Thu Nov 10 08:13:55 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 10 Nov 2011 13:13:55 +0000 Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10 format generated by lalign36 References: Message-ID: Issue #3312 has been updated by gahoo lee. File 3Seqs.zip added Well, I'm not on the FASTA mailing list. 
In fact I found a small bug in mshowalign2.c which a colon is missing on line 616, just don't know how to join the mailing list. Here's the FASTA output with 3 sequences alignment, I hope these file would help. The odd lines changed in the output. ---------------------------------------- Bug #3312: Failing to parse fasta-m10 format generated by lalign36 https://redmine.open-bio.org/issues/3312 Author: gahoo lee Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Target version: URL: When I parse an alignment created by lalign which is included in FASTA36, I got errors. We got two sequences in each fasta file now, but if one sequence each, there's no error. Here are the codes and error. @lalign36 -m 10 at.fasta os.fasta >test.aln@ @from Bio import AlignIO handle = open('test.aln') for a in AlignIO.parse(handle, "fasta-m10"): assert len(a) == 2, "Should be pairwise!" print "Alignment length %i" % a.get_alignment_length() for record in a: print record.seq, record.name, record.id @ @Traceback (most recent call last): File "R:\Untitled 4.py", line 5, in for a in AlignIO.parse(handle, "fasta-m10"): File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\__init__.py", line 371, in parse for a in i: File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\FastaIO.py", line 242, in FastaM10Iterator yield build_hsp() File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\FastaIO.py", line 106, in build_hsp assert query_tags, query_tags AssertionError: {}@ -- You have received this notification because you have either subscribed to it, or are involved in it. 
From redmine at redmine.open-bio.org Thu Nov 10 09:33:28 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 10 Nov 2011 14:33:28 +0000 Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10 format generated by lalign36 References: Message-ID: Issue #3312 has been updated by Peter Cock. The link has changed slightly, but the mailing list is here: https://lists.virginia.edu/sympa/info/fasta_list
From redmine at redmine.open-bio.org Thu Nov 10 20:42:59 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Fri, 11 Nov 2011 01:42:59 +0000 Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10 format generated by lalign36 References: Message-ID: Issue #3312 has been updated by gahoo lee. Oh, I got it. Did you report this problem to FASTA mailing list?
From p.j.a.cock at googlemail.com Wed Nov 16 11:27:46 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 16 Nov 2011 16:27:46 +0000
Subject: [Biopython-dev] Cross-links between tracks in GenomeDiagram
Message-ID:

Hi all,

Something I've been working on this month in discussion with Leighton is some enhancements to GenomeDiagram, driven partly by a figure I wanted to draw for a paper. The code is here, https://github.com/peterjc/biopython/tree/gd-links

First, we can now show links between tracks joining any two features or regions. One use of this is to mimic the output from the Artemis Comparison Tool, ACT, http://www.sanger.ac.uk/resources/software/act/

ACT is great as an exploratory tool, but doesn't let you output a high quality vector image.

Related to this, it is useful to be able to "crop" different tracks, since for ACT style comparisons the different sequences are unlikely to be the same length. Therefore each GenomeDiagram track can now have its own start/end positions, outside which it doesn't get drawn.

This includes some extra unit tests, run test_GenomeDiagram.py and have a look at Graphics/GD_by_obj_*.pdf

Also try the file Doc/examples/ACT_example.py which mimics a simple two-reference ACT diagram: https://github.com/peterjc/biopython/blob/gd-links/Doc/examples/ACT_example.py

Simple linear output (split into three fragments) shown here: http://twitter.com/#!/pjacock/status/136509137826754560

Circular version here (in this case deliberately not using a closed circle, but that works too), note the curving links are intentional so as to display very large cross-links nicely: http://twitter.com/#!/pjacock/status/136840628502933505

This demo script should use blue flipped links where the matches are to the reverse strand. I haven't put together a nice example for a proper demonstration of that yet. Perhaps a set of several E.
coli genomes would work nicely... I plan to merge this to the trunk, and write some end-use documentation, but would be happy to have someone else look over the code first. Note that the API is intended to be quite low level but very flexible in terms of creating the cross links. You can use transparency (as in the current version of ACT_example.py) or explicitly colour links according to say BLAST bit score. The user also has full control of the z-order, which again allows you to do things like ACT does and put longer matches at the back with short matches at the front, etc. Peter From chapmanb at 50mail.com Thu Nov 17 06:51:11 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Thu, 17 Nov 2011 06:51:11 -0500 Subject: [Biopython-dev] Cross-links between tracks in GenomeDiagram In-Reply-To: References: Message-ID: <87hb23ezm8.fsf@fastmail.fm> Peter; > Something I've been working on this month in discussion with Leighton > is some enhancements to GenomeDiagram, driven partly by a figure > I wanted to draw for a paper. The code is here, > https://github.com/peterjc/biopython/tree/gd-links Awesome. The direction you are pushing this is great. I'd definitely love to see this in the next release. > Also try the file Doc/example/ACT_example.py which mimics > a simple two-reference ACT diagram: > https://github.com/peterjc/biopython/blob/gd-links/Doc/examples/ACT_example.py > > Simple linear output (split into three fragments) shown here: > http://twitter.com/#!/pjacock/status/136509137826754560 Really nice. My only suggestion would be to combine the examples and outputs together in the Cookbook. One of the best ways to learn plotting and drawing packages is by looking through examples, finding one that most closely matches what you want, and then iterating until you get at what you need. 
Brad

From chapmanb at 50mail.com Thu Nov 17 07:00:01 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 17 Nov 2011 07:00:01 -0500
Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs
In-Reply-To:
References: <871uuhm1fe.fsf@fastmail.fm> <87hb3b51ve.fsf@fastmail.fm>
Message-ID: <87d3crez7i.fsf@fastmail.fm>

Peter and Eric;
I wanted to follow up about the patch to automate Biopython installs from easy_install and pip when NumPy is not present:

https://github.com/chapmanb/biopython/commit/be53d850d721fc82af81bedcd9fb9034b0a2099b

You'd both reviewed it, and the only holdup was a warning message when setuptools is not installed:

> $ jython setup.py install
> /Users/pjcock/jython2.5.2/Lib/distutils/dist.py:263: UserWarning:
> Unknown distribution option: 'install_requires'
>   warnings.warn(msg)

We'd discussed some other options like including setuptools and installing it, suppressing the warning, or ignoring it since it is not problematic.

My lazy side says ignoring it is fine, but if you want to explicitly turn it off we can use this around the setup call:

    import warnings
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        # the setup(...) call goes here

Happy to handle it however you prefer but I'd love to get this in,
Brad

From p.j.a.cock at googlemail.com Thu Nov 17 07:24:42 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 17 Nov 2011 12:24:42 +0000
Subject: [Biopython-dev] Cross-links between tracks in GenomeDiagram
In-Reply-To: <87hb23ezm8.fsf@fastmail.fm>
References: <87hb23ezm8.fsf@fastmail.fm>
Message-ID:

On Thu, Nov 17, 2011 at 11:51 AM, Brad Chapman wrote:
>
> Peter;
>
>> Something I've been working on this month in discussion with Leighton
>> is some enhancements to GenomeDiagram, driven partly by a figure
>> I wanted to draw for a paper. The code is here,
>> https://github.com/peterjc/biopython/tree/gd-links
>
> Awesome. The direction you are pushing this is great. I'd definitely
> love to see this in the next release.

Cool.
It will end up being a graphics heavy release at this rate :) >> Also try the file Doc/example/ACT_example.py which mimics >> a simple two-reference ACT diagram: >> https://github.com/peterjc/biopython/blob/gd-links/Doc/examples/ACT_example.py >> >> Simple linear output (split into three fragments) shown here: >> http://twitter.com/#!/pjacock/status/136509137826754560 > > Really nice. My only suggestion would be to combine the examples and > outputs together in the Cookbook. One of the best ways to learn plotting > and drawing packages is by looking through examples, finding one that > most closely matches what you want, and then iterating until you get at > what you need. Unless I can find a nicer small sample dataset (or make one) which includes an inversion, I plan to use that ACT sample data in the tutorial - basically taking the user though the ACT_example.py script. Peter From p.j.a.cock at googlemail.com Thu Nov 17 07:45:54 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 17 Nov 2011 12:45:54 +0000 Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs In-Reply-To: <87d3crez7i.fsf@fastmail.fm> References: <871uuhm1fe.fsf@fastmail.fm> <87hb3b51ve.fsf@fastmail.fm> <87d3crez7i.fsf@fastmail.fm> Message-ID: On Thu, Nov 17, 2011 at 12:00 PM, Brad Chapman wrote: > > Peter and Eric; > I wanted to follow up about the patch to automate Biopython installs > from easy_install and pip when NumPu is not present: > > https://github.com/chapmanb/biopython/commit/be53d850d721fc82af81bedcd9fb9034b0a2099b > > You'd both reviewed it, and the only holdup was a warning message when > setuptools is not installed: > >> $ jython setup.py install >> /Users/pjcock/jython2.5.2/Lib/distutils/dist.py:263: UserWarning: >> Unknown distribution option: 'install_requires' >> ? warnings.warn(msg) > > We'd discussed some other options like including setuptools and > installing it, ignoring the warning, or ignoring it since it is not > problematic. 
> > My lazy side says ignoring it is fine, but if you want to explicitly > turn it off we can use this around the setup call: > > with warnings.catch_warnings(): > ? ?warnings.simplefilter("ignore") > > Happy to handle it however you prefer but I'd love to get this in, > Brad How about this to avoid the warning by not passing the argument? https://github.com/peterjc/biopython/commit/78b965f48939c7395aab6e0919b86686443f640e Note I rebased to the current master. If you and Eric are happy with that, I guess we can check it in and see how the build slaves like it... Peter From chapmanb at 50mail.com Thu Nov 17 08:56:41 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Thu, 17 Nov 2011 08:56:41 -0500 Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs In-Reply-To: References: <871uuhm1fe.fsf@fastmail.fm> <87hb3b51ve.fsf@fastmail.fm> <87d3crez7i.fsf@fastmail.fm> Message-ID: <87aa7ug8di.fsf@fastmail.fm> Peter; > > I wanted to follow up about the patch to automate Biopython installs > > from easy_install and pip when NumPu is not present: [...] > How about this to avoid the warning by not passing the argument? > https://github.com/peterjc/biopython/commit/78b965f48939c7395aab6e0919b86686443f640e That works great, thanks for looking at this. Having this in the next release will be a big help for scripts using install_requires. Brad From redmine at redmine.open-bio.org Thu Nov 17 09:10:30 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 17 Nov 2011 14:10:30 +0000 Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10 format generated by lalign36 References: Message-ID: Issue #3312 has been updated by Peter Cock. 
Missing alignments reported here: https://lists.virginia.edu/sympa/arc/fasta_list/2011-11/msg00001.html Missing colon reported here: https://lists.virginia.edu/sympa/arc/fasta_list/2011-11/msg00004.html
To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Thu Nov 17 09:13:11 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 17 Nov 2011 14:13:11 +0000 Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs In-Reply-To: <87aa7ug8di.fsf@fastmail.fm> References: <871uuhm1fe.fsf@fastmail.fm> <87hb3b51ve.fsf@fastmail.fm> <87d3crez7i.fsf@fastmail.fm> <87aa7ug8di.fsf@fastmail.fm> Message-ID: On Thu, Nov 17, 2011 at 1:56 PM, Brad Chapman wrote: > > Peter; > >> > I wanted to follow up about the patch to automate Biopython installs >> > from easy_install and pip when NumPu is not present: > [...] >> How about this to avoid the warning by not passing the argument? >> https://github.com/peterjc/biopython/commit/78b965f48939c7395aab6e0919b86686443f640e > > That works great, thanks for looking at this. Having this in the next > release will be a big help for scripts using install_requires. > > Brad OK, I'll put that on the trunk then - thanks Brad. Peter From p.j.a.cock at googlemail.com Thu Nov 17 10:10:34 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 17 Nov 2011 15:10:34 +0000 Subject: [Biopython-dev] Cross-links between tracks in GenomeDiagram In-Reply-To: References: <87hb23ezm8.fsf@fastmail.fm> Message-ID: On Thu, Nov 17, 2011 at 12:24 PM, Peter Cock wrote: > On Thu, Nov 17, 2011 at 11:51 AM, Brad Chapman wrote: >> >> Peter; >> >>> Something I've been working on this month in discussion with Leighton >>> is some enhancements to GenomeDiagram, driven partly by a figure >>> I wanted to draw for a paper. The code is here, >>> https://github.com/peterjc/biopython/tree/gd-links >> >> Awesome. The direction you are pushing this is great. I'd definitely >> love to see this in the next release. > > Cool. 
It will end up being a graphics heavy release at this rate :) > Committed to trunk, https://github.com/biopython/biopython/commit/980791237330923706e4dc4901bb6794d3222d0e >>> Also try the file Doc/example/ACT_example.py which mimics >>> a simple two-reference ACT diagram: >>> https://github.com/peterjc/biopython/blob/gd-links/Doc/examples/ACT_example.py >>> >>> Simple linear output (split into three fragments) shown here: >>> http://twitter.com/#!/pjacock/status/136509137826754560 >> >> Really nice. My only suggestion would be to combine the examples and >> outputs together in the Cookbook. One of the best ways to learn plotting >> and drawing packages is by looking through examples, finding one that >> most closely matches what you want, and then iterating until you get at >> what you need. > > Unless I can find a nicer small sample dataset (or make one) which > includes an inversion, I plan to use that ACT sample data in the > tutorial - basically taking the user though the ACT_example.py > script. I plan to do another OBF blog entry on this as well, probably with the same example. Peter From p.j.a.cock at googlemail.com Thu Nov 17 10:12:55 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 17 Nov 2011 15:12:55 +0000 Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs In-Reply-To: References: <871uuhm1fe.fsf@fastmail.fm> <87hb3b51ve.fsf@fastmail.fm> <87d3crez7i.fsf@fastmail.fm> <87aa7ug8di.fsf@fastmail.fm> Message-ID: On Thu, Nov 17, 2011 at 2:13 PM, Peter Cock wrote: > On Thu, Nov 17, 2011 at 1:56 PM, Brad Chapman wrote: >> >> Peter; >> >>> > I wanted to follow up about the patch to automate Biopython installs >>> > from easy_install and pip when NumPu is not present: >> [...] >>> How about this to avoid the warning by not passing the argument? >>> https://github.com/peterjc/biopython/commit/78b965f48939c7395aab6e0919b86686443f640e >> >> That works great, thanks for looking at this. 
Having this in the next >> release will be a big help for scripts using install_requires. >> >> Brad > > OK, I'll put that on the trunk then - thanks Brad. > > Peter That all looks fine with the buildslaves, but the real testing will be with random end user machines. Brad, could you write a snippet for the NEWS file about this? Basically when using setuptools to install Biopython it will list NumPy as a dependency (except on Jython and PyPy) and thus install it if not present already? Peter From chapmanb at 50mail.com Thu Nov 17 10:51:01 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Thu, 17 Nov 2011 10:51:01 -0500 Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs In-Reply-To: References: <871uuhm1fe.fsf@fastmail.fm> <87hb3b51ve.fsf@fastmail.fm> <87d3crez7i.fsf@fastmail.fm> <87aa7ug8di.fsf@fastmail.fm> Message-ID: <877h2yg32y.fsf@fastmail.fm> Peter; > That all looks fine with the buildslaves, but the real > testing will be with random end user machines. > > Brad, could you write a snippet for the NEWS file about > this? Basically when using setuptools to install Biopython > it will list NumPy as a dependency (except on Jython > and PyPy) and thus install it if not present already? Great, glad that is working without any problems. I added a bit to the news about the functionality and usage. Thanks again for the help, Brad From anaryin at gmail.com Thu Nov 17 18:16:56 2011 From: anaryin at gmail.com (João Rodrigues) Date: Fri, 18 Nov 2011 00:16:56 +0100 Subject: [Biopython-dev] [Biopython] Pairwise alignment - is it a generic function? In-Reply-To: References: Message-ID: Hey all, My laptop decided to die on me the last week... I added a very simple and small example to the docstring, in line with all the others. I'm pushing it to my pdb_enhancements branch, maybe Peter can cherry-pick it? Best, João [...] Rodrigues http://nmr.chem.uu.nl/~joao 2011/10/27 João Rodrigues > Sure thing.
The docstring is actually pretty explicit, it's just missing > the part that you can get the matrices from SubsMat. Or at least, not that > clear. I'll go over it this weekend, maybe earlier. > > Best, > > João > From p.j.a.cock at googlemail.com Fri Nov 18 05:37:23 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 18 Nov 2011 10:37:23 +0000 Subject: [Biopython-dev] [Biopython] Pairwise alignment - is it a generic function? In-Reply-To: References: Message-ID: On Thu, Nov 17, 2011 at 11:16 PM, João Rodrigues wrote: > Hey all, > My laptop decided to die on me the last week... > I added a very simple and small example to the docstring, in line with all > the others. I'm pushing it to my pdb_enhancements branch, maybe Peter can > cherry-pick it? > Best, > João [...] Rodrigues > http://nmr.chem.uu.nl/~joao Cherry-picked, and updated the existing examples to make them into functional doctests, and call them from the test suite. Thanks. Peter From redmine at redmine.open-bio.org Mon Nov 21 09:35:37 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Mon, 21 Nov 2011 14:35:37 +0000 Subject: [Biopython-dev] [Biopython - Feature #3236] Make Biopython work in PyPy 1.5 References: Message-ID: Issue #3236 has been updated by Peter Cock. Assignee set to Biopython Dev Mailing List We'd got most things working or skipped gracefully under PyPy 1.6 and we're in almost the same situation for PyPy 1.7. I just fixed a break under PyPy 1.7 where we assumed set order, https://github.com/biopython/biopython/commit/d6a3fce2d03d6e613600abec4d837c8c7b929f6f From test_Entrez.py under PyPy 1.6 we hit https://bugs.pypy.org/issue914 which is fixed in PyPy 1.7 but I'm now hitting https://bugs.pypy.org/issue933 instead. Note that "import numpy" has been replaced with "import numpypy" in PyPy 1.7, so if we decide not to support PyPy 1.6 that hassle goes away.
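As an aside on the set-order break mentioned above: CPython and PyPy are free to iterate over the same set in different orders, so a test that compares against a fixed sequence needs an explicit ordering. A minimal sketch of the general idea follows; it is not the change made in the linked commit, and the name `enzymes` and its contents are purely illustrative.

```python
# Hypothetical data; the names are illustrative only.
enzymes = set(["EcoRI", "BamHI", "HindIII"])

# Fragile: list(enzymes) may come back in a different order on another
# interpreter, so comparing it against a fixed list can fail.
# Robust: impose an order explicitly before comparing or printing.
ordered = sorted(enzymes)

# Alternatively, compare as sets when the order carries no meaning.
same_members = enzymes == set(["HindIII", "EcoRI", "BamHI"])
```

Sorting costs a little extra, but it keeps expected-output files and doctests stable across interpreters.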
Still issues with test_Pathway.py, test_Restriction.py (and also test_CAPS.py) and a whole load of "Too many open files" - probably due to leaking handles and different garbage collection. ---------------------------------------- Feature #3236: Make Biopython work in PyPy 1.5 https://redmine.open-bio.org/issues/3236 Author: Eric Talevich Status: In Progress Priority: Low Assignee: Biopython Dev Mailing List Category: Target version: URL: PyPy is now roughly as production-ready as Jython: http://morepypy.blogspot.com/2011/04/pypy-15-released-catching-up.html Let's make Biopython work on PyPy 1.5. To make the pure-Python core of Biopython work, I did this: * Download and unpack the pre-compiled Linux tarball from pypy.org * Copy the header file @marshal.h@ from the CPython 2.X installation into the @pypy-c-.../include/@ directory * pypy setup.py build; pypy setup.py install * Delete pypy-c-.../site-packages/Bio/cpairwise2*.so Benchmarking a script that leans heavily on Bio.pairwise2, I see about a 2x speedup between Pypy 1.5 and CPython 2.6 -- yes, that's with the compiled C extension @cpairwise2@ in the CPython 2.6 installation. Numpy isn't available on PyPy yet, and it may be some time before it does. Observations from @pypy setup.py test@: * test_BioSQL triggers tons of RuntimeWarnings related to sqlite3 functions * test_BioSQL_SeqIO fails -- attempts to retrieve P01892 instead of Q29899 (?) * test_Restriction triggers a TypeError, somehow (also causing test_CAPS to err) * test_Entrez fails with many noisy errors -- looks related to expat, may be just my installation * importing @Bio.trie@ fails, probably due to a @marshal.h@ issue with compilation -- You have received this notification because you have either subscribed to it, or are involved in it. 
To change your notification preferences, please click here and login: http://redmine.open-bio.org
From redmine at redmine.open-bio.org Mon Nov 21 09:37:54 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Mon, 21 Nov 2011 14:37:54 +0000 Subject: [Biopython-dev] [Biopython - Feature #3236] Make Biopython work in PyPy 1.5 References: Message-ID: Issue #3236 has been updated by Peter Cock. Point of clarification - the code on the Biopython trunk is deliberately skipping all our C extensions under PyPy (and Jython). We may want to start gradually enabling those if possible - but getting the pure Python code all working first seems like a sensible strategy.
From redmine at redmine.open-bio.org Tue Nov 22 06:30:58 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 22 Nov 2011 11:30:58 +0000 Subject: [Biopython-dev] [Biopython - Feature #3236] Make Biopython work in PyPy 1.5 References: Message-ID: Issue #3236 has been updated by Peter Cock. I have deprecated Bio.Pathway.Rep.HashSet and switched Bio.Pathway.Rep.Graph to use Python's built in set instead. This means test_Pathway.py now passes under PyPy 1.6 and 1.7, https://github.com/biopython/biopython/commit/cbc7c875448a9a57a4cdcbecbc01bcf6b115da69
From p.j.a.cock at googlemail.com Tue Nov 22 07:22:21 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 22 Nov 2011 12:22:21 +0000 Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs In-Reply-To: <877h2yg32y.fsf@fastmail.fm> References: <871uuhm1fe.fsf@fastmail.fm> <87hb3b51ve.fsf@fastmail.fm> <87d3crez7i.fsf@fastmail.fm> <87aa7ug8di.fsf@fastmail.fm> <877h2yg32y.fsf@fastmail.fm> Message-ID: On Thu, Nov 17, 2011 at 3:51 PM, Brad Chapman wrote: > > Great, glad that is working without any problems. I added a bit to the > news about the functionality and usage.
Thanks again for the help,
> Brad
>

I've noticed a probable regression on my Mac,

$ python setup.py install
running install
running build
running build_py
running build_ext
running install_lib
running install_egg_info
running egg_info
writing biopython.egg-info/PKG-INFO
writing top-level names to biopython.egg-info/top_level.txt
writing dependency_links to biopython.egg-info/dependency_links.txt
reading manifest file 'biopython.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files found matching 'Tests/Graphics/*.png'
warning: no previously-included files matching '*' found under directory 'Tests/UnitTests'
warning: no previously-included files matching '.gitignore' found under directory '*'
writing manifest file 'biopython.egg-info/SOURCES.txt'
removing '/Library/Python/2.6/site-packages/biopython-1.58_-py2.6.egg-info' (and everything under it)
Copying biopython.egg-info to /Library/Python/2.6/site-packages/biopython-1.58_-py2.6.egg-info
running install_scripts

I never used to get these manifest warnings during a simple "python setup.py install" (but I recall seeing them during the official build process under Linux when we do the manifest step).

We could tweak the manifest file I guess...

Peter

From chapmanb at 50mail.com Tue Nov 22 20:15:08 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 22 Nov 2011 20:15:08 -0500
Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs
In-Reply-To:
References: <871uuhm1fe.fsf@fastmail.fm> <87hb3b51ve.fsf@fastmail.fm> <87d3crez7i.fsf@fastmail.fm> <87aa7ug8di.fsf@fastmail.fm> <877h2yg32y.fsf@fastmail.fm>
Message-ID: <87sjlfhc6b.fsf@fastmail.fm>

Peter;

> I've noticed a probable regression on my Mac,
>
> $ python setup.py install
[...]
> warning: no previously-included files found matching 'Tests/Graphics/*.png' > warning: no previously-included files matching '*' found under These look like warnings from setuptools about excluding some files that aren't actually present or included. Apparently distutils silently ignores them. I cleaned up the MANIFEST.in to reduce these. Thanks for spotting this, Brad From p.j.a.cock at googlemail.com Wed Nov 23 04:14:21 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 23 Nov 2011 09:14:21 +0000 Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs In-Reply-To: <87sjlfhc6b.fsf@fastmail.fm> References: <871uuhm1fe.fsf@fastmail.fm> <87hb3b51ve.fsf@fastmail.fm> <87d3crez7i.fsf@fastmail.fm> <87aa7ug8di.fsf@fastmail.fm> <877h2yg32y.fsf@fastmail.fm> <87sjlfhc6b.fsf@fastmail.fm> Message-ID: On Wed, Nov 23, 2011 at 1:15 AM, Brad Chapman wrote: > > Peter; > >> I've noticed a probably regression on my Mac, >> >> $ python setup.py install > [.,.] >> warning: no previously-included files found matching 'Tests/Graphics/*.png' >> warning: no previously-included files matching '*' found under > > These look like warnings from setuptools about excluding some files that > aren't actually present or included. Apparently distutils silently > ignores them. That was my guess. > I cleaned up the MANIFEST.in to reduce these. Thanks for > spotting this, > Brad Thanks, Peter From p.j.a.cock at googlemail.com Thu Nov 24 06:54:35 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 24 Nov 2011 11:54:35 +0000 Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?) 
Message-ID: Dear all, Aside from a problem with leaking handles, the remaining problem with Biopython's test suite under PyPy is in Bio.Restriction, specifically this line in the RestrictionType class __init__ method, super(RestrictionType, cls).__init__(cls, name, bases, dct) Here is the error under PyPy 1.7 (same with PyPy 1.6), $ pypy Python 2.7.1 (7773f8fc4223, Nov 18 2011, 22:15:49) [PyPy 1.7.0 with GCC 4.0.1] on darwin Type "help", "copyright", "credits" or "license" for more information. And now for something completely different: `` no, normal work is so much less tiring than vacations'' >>>> from Bio import Restriction Traceback (most recent call last): File "", line 1, in File "Bio/Restriction/__init__.py", line 61, in from Bio.Restriction.Restriction import * File "Bio/Restriction/Restriction.py", line 2404, in newenz = T(k, bases, enzymedict[k]) File "Bio/Restriction/Restriction.py", line 241, in __init__ super(RestrictionType, cls).__init__(cls, name, bases, dct) TypeError: unbound method __init__() must be called with BssMI instance as first argument (got RestrictionType instance instead) >>>> quit() Note that we had to tweak the super call to get this to work under Python 2.6, http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004369.html https://github.com/biopython/biopython/commit/11332d6d4951406f3cc001cea41ea75fce177f89 It used to be: super(RestrictionType, cls).__init__(name, bases, dct) PyPy doesn't like that either, $ pypy Python 2.7.1 (7773f8fc4223, Nov 18 2011, 22:15:49) [PyPy 1.7.0 with GCC 4.0.1] on darwin Type "help", "copyright", "credits" or "license" for more information. 
And now for something completely different: ``"3 + 3 = 8" - Anto in the JIT talk'' >>>> from Bio import Restriction Traceback (most recent call last): File "", line 1, in File "Bio/Restriction/__init__.py", line 61, in from Bio.Restriction.Restriction import * File "Bio/Restriction/Restriction.py", line 2405, in newenz = T(k, bases, enzymedict[k]) File "Bio/Restriction/Restriction.py", line 242, in __init__ super(RestrictionType, cls).__init__(name, bases, dct) TypeError: unbound method __init__() must be called with BssMI instance as first argument (got str instance instead) >>>> What I find interesting is if we comment out the super call, everything seems to work - test_Restriction.py and test_CAPS.py pass under PyPy, Jython, Python 2, and Python 3. I'm tempted to just do that - but I don't fully understand what is going on and why. Can anyone throw some light on this? Thanks, Peter From chapmanb at 50mail.com Thu Nov 24 10:33:57 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Thu, 24 Nov 2011 10:33:57 -0500 Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?) In-Reply-To: References: Message-ID: <87lir5wn4q.fsf@fastmail.fm> Peter; > Aside from a problem with leaking handles, Is this from tempfile.mkstemp? This has tricked me an annoying number of times, so I eventually wrote a wrapper. The trick is doing an os.close on the file descriptor: https://github.com/chapmanb/bcbb/blob/master/nextgen/bcbio/utils.py#L118 > the remaining problem > with Biopython's test suite under PyPy is in Bio.Restriction, > specifically this line in the RestrictionType class __init__ method, > > super(RestrictionType, cls).__init__(cls, name, bases, dct) That seems strange: the __init__ is calling super on itself. You'd normally expect this from a derived class. I'm not sure why this doesn't trigger an infinite recursion initializing the object. I'm +1 on commenting it out. 
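On the tempfile.mkstemp point raised in this message: mkstemp returns an already-open OS-level file descriptor together with the path, and that descriptor leaks unless it is closed explicitly. A minimal sketch of the kind of wrapper described follows. It is not the bcbio helper linked in the message; the name temp_path is hypothetical and used here for illustration only.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temp_path(suffix=""):
    """Yield the path of a fresh temporary file (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    # mkstemp hands back an OPEN descriptor; close it straight away so the
    # caller can reopen the path normally without leaking a handle.
    os.close(fd)
    try:
        yield path
    finally:
        # Clean up the file itself once the caller is done with it.
        if os.path.exists(path):
            os.remove(path)


with temp_path(".txt") as example_path:
    with open(example_path, "w") as handle:
        handle.write("scratch data\n")
    existed_inside = os.path.exists(example_path)
```

Closing the descriptor immediately and reopening by path costs one extra open call, but it keeps the handle accounting simple, which matters on interpreters without reference-counting garbage collection.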
Brad From p.j.a.cock at googlemail.com Fri Nov 25 06:40:49 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 25 Nov 2011 11:40:49 +0000 Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?) In-Reply-To: <87lir5wn4q.fsf@fastmail.fm> References: <87lir5wn4q.fsf@fastmail.fm> Message-ID: On Thu, Nov 24, 2011 at 3:33 PM, Brad Chapman wrote: > > Peter; > >> Aside from a problem with leaking handles, > > Is this from tempfile.mkstemp? This has tricked me an annoying number of > times, so I eventually wrote a wrapper. The trick is doing an os.close > on the file descriptor: > > https://github.com/chapmanb/bcbb/blob/master/nextgen/bcbio/utils.py#L118 Possibly in test_PDB.py but there are other handle leaks. >> the remaining problem >> with Biopython's test suite under PyPy is in Bio.Restriction, >> specifically this line in the RestrictionType class __init__ method, >> >> super(RestrictionType, cls).__init__(cls, name, bases, dct) > > That seems strange: the __init__ is calling super on itself. You'd > normally expect this from a derived class. I'm not sure why this > doesn't trigger an infinite recursion initializing the object. I'm +1 > on commenting it out. > > Brad I suppose we could be cautious and skip that line under PyPy only. How about that as a compromise - that way if it really is important for something not covered in the unit test, we only break it under PyPy, but CPython and Jython would be fine? Peter From chapmanb at 50mail.com Fri Nov 25 20:24:25 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Fri, 25 Nov 2011 20:24:25 -0500 Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?) In-Reply-To: References: <87lir5wn4q.fsf@fastmail.fm> Message-ID: <8762i7he0m.fsf@fastmail.fm> Peter; > >> super(RestrictionType, cls).__init__(cls, name, bases, dct) > > > > That seems strange: the __init__ is calling super on itself. You'd > > normally expect this from a derived class.
I'm not sure why this > > doesn't trigger an infinite recursion initializing the object. I'm +1 > > on commenting it out. > I suppose we could be cautious and skip that line under PyPy > only. How about that as a compromise - that way if it really > is important for something not covered in the unit test, we only > break it under PyPy, but CPython and Jython would be fine? My vote would be to comment it out generally instead of if_pypy flags. I don't want to break anything, but if we do I'd rather find out straight away instead of chasing down platform-specific bugs later. I'd be happy to hear others' opinions, especially if they understand the super magic going on. Brad From eric.talevich at gmail.com Fri Nov 25 22:00:04 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 25 Nov 2011 22:00:04 -0500 Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?) In-Reply-To: <8762i7he0m.fsf@fastmail.fm> References: <87lir5wn4q.fsf@fastmail.fm> <8762i7he0m.fsf@fastmail.fm> Message-ID: On Fri, Nov 25, 2011 at 8:24 PM, Brad Chapman wrote: > > Peter; > > > >> super(RestrictionType, cls).__init__(cls, name, bases, dct) > > > > > > That seems strange: the __init__ is calling super on itself. You'd > > > normally expect this from a derived class. I'm not sure why this > > > doesn't trigger an infinite recursion initializing the object. I'm +1 > > > on commenting it out. > > > I suppose we could be cautious and skip that line under PyPy > > only. How about that as a compromise - that way if it really > > is important for something not covered in the unit test, we only > > break it under PyPy, but CPython and Jython would be fine? > > My vote would be to comment it out generally instead of if_pypy > flags. I don't want to break anything, but if we do I'd rather find out > straight away instead of chasing down platform-specific bugs later. I'd > be happy to hear others' opinions, especially if they understand the > super magic going on.
> > I support that, and maybe we can add some more unit tests to see if we can find out what breaks, if anything. Looking at the Bio/Restriction/Restriction.py, I can suggest these candidates: 1. In the implementation of the class RestrictionType, a few of the magic methods use the test "if isinstance(other, RestrictionType)" -- can you see any way these might break without the super().__init__ call? 2. Other classes in the same file derive from RestrictionType, but don't define their own __init__ methods (e.g. AbstractCut, and indirectly NoCut, OneCut, etc.). All the methods seem to be class methods, also. (NB: maybe use the @classmethod decorator everywhere for clarity.) As far as I can tell, the unit test only uses class methods on EciRI, not any instance methods -- if I'm reading that right, then maybe there should be a unit test that hits that. This and #1 can be done at the same time with the magic methods __add__, __ne__ and __gt__, for example. 3. In Bio/Restriction/__init__.py, I see this comment: When testing for the presence of a Restriction enzyme in a RestrictionBatch, the user can use: 1) a class of type 'RestrictionType' 2) a string of the name of the enzyme (it's repr) i.e: >>> from Bio.Restriction import RestrictionBatch, EcoRI >>> MyBatch = RestrictionBatch(EcoRI) >>> #!/usr/bin/env python >>> EcoRI in MyBatch # the class EcoRI. True >>> >>> 'EcoRI' in MyBatch # a string representation True I don't see this included in the unit test, test_Restriction.py. I don't think the super().__init__ combo has anything to do with this feature, but maybe it should be tested anyway, since it relies on some substantial magic. -Eric From p.j.a.cock at googlemail.com Sat Nov 26 08:38:26 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 26 Nov 2011 13:38:26 +0000 Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?) 
In-Reply-To: References: <87lir5wn4q.fsf@fastmail.fm> <8762i7he0m.fsf@fastmail.fm> Message-ID: On Saturday, November 26, 2011, Eric Talevich wrote: > On Fri, Nov 25, 2011 at 8:24 PM, Brad Chapman wrote: >> >> Peter; >> >> > >> super(RestrictionType, cls).__init__(cls, name, bases, dct) >> > > >> > > That seems strange: the __init__ is calling super on itself. You'd >> > > normally expect this from a derived class. I'm not sure why this >> > > doesn't trigger an infinite recursion initializing the object. I'm +1 >> > > on commenting it out. >> >> > I suppose we could be cautious and skip that line under PyPy >> > only. How about that as a compromise - that way if is really >> > is important for something not covered in the unit test, we only >> > break it under PyPy, but C Python and Jython would be fine? >> >> My vote would be to comment it out generally instead of if_pypy >> flags. I don't want to break anything, but if we do I'd rather find out >> straight away instead of chasing down platform specific bugs later. I'd >> be happy to hear other's opinions, especially if they ynderstand the >> super magic going on. >> > > I support that, and maybe we can add some more unit tests to > see if we can find out what breaks, if anything. OK > Looking at the Bio/Restriction/Restriction.py, I can suggest these > candidates: Great - do you want to try to turn those into unit tests? Thanks, Peter From eric.talevich at gmail.com Sat Nov 26 14:49:35 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 26 Nov 2011 14:49:35 -0500 Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?) 
In-Reply-To: References: <87lir5wn4q.fsf@fastmail.fm> <8762i7he0m.fsf@fastmail.fm> Message-ID: On Sat, Nov 26, 2011 at 8:38 AM, Peter Cock wrote: > On Saturday, November 26, 2011, Eric Talevich > wrote: > > On Fri, Nov 25, 2011 at 8:24 PM, Brad Chapman > wrote: > >> > >> Peter; > >> > >> > >> super(RestrictionType, cls).__init__(cls, name, bases, dct) > >> > > > >> > > That seems strange: the __init__ is calling super on itself. You'd > >> > > normally expect this from a derived class. I'm not sure why this > >> > > doesn't trigger an infinite recursion initializing the object. I'm > +1 > >> > > on commenting it out. > >> > >> > I suppose we could be cautious and skip that line under PyPy > >> > only. How about that as a compromise - that way if is really > >> > is important for something not covered in the unit test, we only > >> > break it under PyPy, but C Python and Jython would be fine? > >> > >> My vote would be to comment it out generally instead of if_pypy > >> flags. I don't want to break anything, but if we do I'd rather find out > >> straight away instead of chasing down platform specific bugs later. I'd > >> be happy to hear other's opinions, especially if they ynderstand the > >> super magic going on. > >> > > > > I support that, and maybe we can add some more unit tests to > > see if we can find out what breaks, if anything. > > OK > > > > Looking at the Bio/Restriction/Restriction.py, I can suggest these > > candidates: > > Great - do you want to try to turn those into unit tests? > > Sure thing. Here's the relevant commit: https://github.com/biopython/biopython/commit/eb1c163909801731dc0a3d7fbcb2ee514f212da3 Unit tests for most of the magic methods were already there, I just didn't notice them earlier. I also commented out the offending line in Restriction.py and stirred the code a bit in that file and in the test suite. I tested with Python 2.7 and Pypy 1.7 on Ubuntu; we'll see what the build bots say now. 
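For readers following the super() discussion in this thread: the traceback line T(k, bases, enzymedict[k]) suggests that RestrictionType is used as a metaclass, so its __init__ receives (cls, name, bases, dct). The sketch below is not the Bio.Restriction code, whose class hierarchy is more involved (the thread reports that this three-argument form also failed there under PyPy). It only shows, on a toy metaclass with hypothetical names, what the conventional bound super call looks like and why passing cls a second time is unusual.

```python
class RegistryType(type):
    """Toy metaclass (hypothetical name) that records each class it creates."""

    registry = {}

    def __init__(cls, name, bases, dct):
        # super(RegistryType, cls) is already bound to 'cls', so only the
        # remaining three arguments are passed on to type.__init__.
        super(RegistryType, cls).__init__(name, bases, dct)
        RegistryType.registry[name] = cls


# Calling the metaclass directly builds a new class, in the same style as
# the T(k, bases, enzymedict[k]) call shown in the traceback.
Enzyme = RegistryType("Enzyme", (object,), {"site": "GAATTC"})
```

Here the super call does no visible work of its own; the registration line is what makes the metaclass useful, which may be one reason a redundant super call can be removed without any test noticing.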
-Eric
From p.j.a.cock at googlemail.com Wed Nov 2 13:27:25 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 2 Nov 2011 13:27:25 +0000 Subject: [Biopython-dev] TogoWS in Biopython? In-Reply-To: <8762j2iump.fsf@fastmail.fm> References: <8762j2iump.fsf@fastmail.fm> Message-ID: On Wed, Nov 2, 2011 at 12:19 PM, Brad Chapman wrote: > > Peter; > >> > Would someone like to review the TogoWS code I have written >> > to access the Togo Web Service's REST API please? > > This looks great and the tests are all passing for me. My only small > suggestion would be to avoid hardcoding 'http://togows.dbcls.jp' > everywhere. I'd stick this as a top level variable along with the global > caches and reference it in the code. This way if they ever get any > mirrors we could adjust on the fly.
> > Thanks for getting this in, > Brad Good point regarding the URL. I've also realised it will need some tweaks for Python 3 (bytes versus unicode), or at least to skip the unit tests in the short term to avoid hiding real errors on the buildbot. Peter From redmine at redmine.open-bio.org Tue Nov 8 10:17:00 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 8 Nov 2011 10:17:00 +0000 Subject: [Biopython-dev] [Biopython - Bug #3312] (New) Failing to parse fasta-m10 format generated by lalign36 Message-ID: Issue #3312 has been reported by gahoo lee. ---------------------------------------- Bug #3312: Failing to parse fasta-m10 format generated by lalign36 https://redmine.open-bio.org/issues/3312 Author: gahoo lee Status: New Priority: Normal Assignee: Category: Target version: URL: When I parse an alignment created by lalign (which is included in FASTA36), I get errors. Each FASTA file currently contains two sequences; with one sequence in each file there is no error. Here are the code and the error.

@lalign36 -m 10 at.fasta os.fasta >test.aln@

@from Bio import AlignIO
handle = open('test.aln')
for a in AlignIO.parse(handle, "fasta-m10"):
    assert len(a) == 2, "Should be pairwise!"
    print "Alignment length %i" % a.get_alignment_length()
    for record in a:
        print record.seq, record.name, record.id
@

@Traceback (most recent call last):
  File "R:\Untitled 4.py", line 5, in
    for a in AlignIO.parse(handle, "fasta-m10"):
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\__init__.py", line 371, in parse
    for a in i:
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\FastaIO.py", line 242, in FastaM10Iterator
    yield build_hsp()
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\FastaIO.py", line 106, in build_hsp
    assert query_tags, query_tags
AssertionError: {}@

---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Tue Nov 8 15:38:32 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 8 Nov 2011 15:38:32 +0000 Subject: [Biopython-dev] Indexing sequences compressed with BGZF (Blocked GNU Zip Format) Message-ID: Dear all, We've talked in the past about indexing sequences in gzipped files, e.g. http://lists.open-bio.org/pipermail/biopython/2010-June/006546.html That discussion concluded that random access into simple GZIP files was not practical, but BGZF (used in BAM) was worth looking into. I wrote some proof of principle code back then: http://lists.open-bio.org/pipermail/biopython/2010-June/006555.html I have recently polished that old code up, and done some benchmarking (using some reasonably large FASTA, Swiss, and UniProt-XML files). Please read this blog post: http://blastedbio.blogspot.com/ I think random access to sequences compressed with BGZF is fast enough to be useful practically (while confirming this is not true for large gzipped files).
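To illustrate why a blocked format permits random access where a single gzip stream does not, here is a toy sketch using only zlib. It is not the Bio.bgzf code on the branch and it does not write real BGZF (there are no gzip headers or extra fields); the function names are hypothetical. The one part taken from the format itself is the virtual offset, which packs the compressed start of a block into the high bits and the position within the decompressed block into the low 16 bits, as described in the SAM/BAM specification.

```python
import zlib

BLOCK_SIZE = 65536  # toy value; BGZF blocks hold at most 64 KiB of data


def compress_blocks(data, block_size=BLOCK_SIZE):
    """Compress data as independent zlib blocks; return (blob, block_starts)."""
    blob = b""
    starts = []
    for i in range(0, len(data), block_size):
        starts.append(len(blob))  # compressed offset where this block begins
        blob += zlib.compress(data[i:i + block_size])
    return blob, starts


def make_virtual_offset(block_start, within_block):
    """Pack a compressed block start and an in-block position into one int."""
    return (block_start << 16) | within_block


def split_virtual_offset(virtual_offset):
    """Inverse of make_virtual_offset."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF


def read_at(blob, virtual_offset, size):
    """Read size bytes at a virtual offset (toy: stays within one block)."""
    block_start, within_block = split_virtual_offset(virtual_offset)
    # Only the one block is decompressed; bytes belonging to later blocks
    # are left untouched in the decompressor's unused_data attribute.
    block = zlib.decompressobj().decompress(blob[block_start:])
    return block[within_block:within_block + size]


# 20000 fixed-width records of 13 bytes each, spread over 4 blocks.
records = b"".join(b"record%06d\n" % i for i in range(20000))
blob, starts = compress_blocks(records)
# Record 15000 begins at byte 195000, i.e. 63928 bytes into the third block.
voffset = make_virtual_offset(starts[2], 195000 - 2 * BLOCK_SIZE)
fetched = read_at(blob, voffset, 13)
```

Fetching one record costs one seek plus the decompression of a single block, whereas a plain gzip stream would have to be decompressed from the start up to the record.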
I've also put this idea forward on SEQanswers, http://seqanswers.com/forums/showthread.php?t=15347 The cleaned up BGZF code is on the following branch: https://github.com/peterjc/biopython/tree/bgzf This adds a new module Bio.bgzf (position in namespace open to debate) which provides read/write handles to BGZF files - trying to follow the API used in the Python gzip library. I then use the new BGZF reader (with its special seek/tell offsets) from within Bio.SeqIO's index functionality. I've been doing testing with Bio.SeqIO.index(...) only so far, but it should work fine with Bio.SeqIO.index_db(...) as well but here the SQLite schema will need a small update to record the compression type for each file. Is anyone interested in testing this out? Note that to produce a BGZF file, you can use the tool bgzip in samtools, or Bio/bgzf.py if run directly at the command line will compress stdin to stdout. Both approaches call zlib internally, and the run time is practically identical. Regards, Peter From p.j.a.cock at googlemail.com Tue Nov 8 15:41:15 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 8 Nov 2011 15:41:15 +0000 Subject: [Biopython-dev] Indexing sequences compressed with BGZF (Blocked GNU Zip Format) In-Reply-To: References: Message-ID: On Tue, Nov 8, 2011 at 3:38 PM, Peter Cock wrote: > That discussion concluded that random access into simple GZIP files > was not practical, but BGZF (used in BAM) was worth looking into. > I wrote some proof of principle code back then: > http://lists.open-bio.org/pipermail/biopython/2010-June/006555.html > > I have recently polished that old code up, and done some > benchmarking (using some reasonably large FASTA, Swiss, > and UniProt-XML files). 
Please read this blog post: > http://blastedbio.blogspot.com/ More precise link to my BGZF post: http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html Peter From bioinformed at gmail.com Tue Nov 8 17:40:36 2011 From: bioinformed at gmail.com (Kevin Jacobs) Date: Tue, 8 Nov 2011 12:40:36 -0500 Subject: [Biopython-dev] Indexing sequences compressed with BGZF (Blocked GNU Zip Format) In-Reply-To: References: Message-ID: I've added a proper LRU uncompressed block cache to the samtools tabix code, if that would be of any help. It greatly improves performance for many access patterns. (I didn't look to see if you'd already done that in your code.) -Kevin From p.j.a.cock at googlemail.com Tue Nov 8 17:52:59 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 8 Nov 2011 17:52:59 +0000 Subject: [Biopython-dev] Indexing sequences compressed with BGZF (Blocked GNU Zip Format) In-Reply-To: References: Message-ID: On Tue, Nov 8, 2011 at 5:40 PM, Kevin Jacobs wrote: > I've added a proper LRU uncompressed block cache to the samtools tabix code, > if that would be of any help. It greatly improves performance for many > access patterns. (I didn't look to see if you'd already done that in your > code.) > -Kevin Hi Kevin, Is this already in the mainline samtools tabix repository? The current implementation in my Python code just caches the current block - but a simple pool had occurred to me. How many blocks (given each is 64 KB) and how best to pick that number isn't obvious to me. Perhaps you can suggest some sensible defaults? In fact, a proper LRU cache would make sense for the handle pool in Bio.SeqIO.index_db(...) as well.
Regards, Peter From bioinformed at gmail.com Tue Nov 8 18:11:56 2011 From: bioinformed at gmail.com (Kevin Jacobs) Date: Tue, 8 Nov 2011 13:11:56 -0500 Subject: [Biopython-dev] Indexing sequences compressed with BGZF (Blocked GNU Zip Format) In-Reply-To: References: Message-ID: On Tue, Nov 8, 2011 at 12:52 PM, Peter Cock wrote: > On Tue, Nov 8, 2011 at 5:40 PM, Kevin Jacobs wrote: > > I've added a proper LRU uncompressed block cache to the samtools tabix code, > > if that would be of any help. It greatly improves performance for many > > access patterns. (I didn't look to see if you'd already done that in your > > code.) > > -Kevin > > Hi Kevin, > > Is this already in the mainline samtools tabix repository? > > The current implementation in my Python code just caches the > current block - but a simple pool had occurred to me. How many > blocks (given each is 64 KB) and how best to pick that number > isn't obvious to me. Perhaps you can suggest some sensible > defaults? > > In fact, a proper LRU cache would make sense for the handle > pool in Bio.SeqIO.index_db(...) as well. > > Hi Peter, There is a random-eviction cache implemented in the mainline that is okay, but it is turned off by default and, if enabled, can be very inefficient if it keeps evicting your most active blocks. Converting the cache to LRU was very simple and I've been using it locally for some time now, but I haven't had time to send the changes on to Heng Li. I choose the size of the cache based on the application and access patterns. For roughly sequential sequence queries (a la samtools faidx or Pysam Fastafile), all one needs is a handful of active blocks (say 16). When repeatedly querying tabix files via pysam, I typically use 128 blocks for the best trade-off between memory and performance. Choosing a cache size for BAM files is much more complicated and I have a wide range of settings depending on how many parallel BAM streams and access patterns are employed.
The cache size numbers needed to be quite a bit larger before switching to LRU (which was a bit surprising). However, using even a small cache is vastly beneficial for many access patterns. The cost of re-reading a block from disk can be mitigated by the OS filesystem cache, but the decompression step takes non-trivial CPU time and can be triggered dozens or hundreds of times per block for some sensible-seeming access patterns. -Kevin From p.j.a.cock at googlemail.com Tue Nov 8 18:28:04 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 8 Nov 2011 18:28:04 +0000 Subject: [Biopython-dev] Indexing sequences compressed with BGZF (Blocked GNU Zip Format) In-Reply-To: References: Message-ID: On Tue, Nov 8, 2011 at 6:11 PM, Kevin Jacobs wrote: > On Tue, Nov 8, 2011 at 12:52 PM, Peter Cock wrote: >> On Tue, Nov 8, 2011 at 5:40 PM, Kevin Jacobs wrote: >> > I've added a proper LRU uncompressed block cache to the >> > samtools tabix code, if that would be of any help. It greatly >> > improves performance for many access patterns. >> > (I didn't look to see if you'd already done that in your >> > code.) >> > -Kevin >> >> Hi Kevin, >> >> Is this already in the mainline samtools tabix repository? >> >> The current implementation in my Python code just caches the >> current block - but a simple pool had occurred to me. How many >> blocks (given each is 64 KB) and how best to pick that number >> isn't obvious to me. Perhaps you can suggest some sensible >> defaults? >> >> In fact, a proper LRU cache would make sense for the handle >> pool in Bio.SeqIO.index_db(...) as well. >> > > Hi Peter, > > There is a random-eviction cache implemented in the mainline that is okay, > but it is turned off by default and, if enabled, can be very inefficient if > it keeps evicting your most active blocks. Converting the cache to LRU > was very simple and I've been using it locally for some time now, but I > haven't had time to send the changes on to Heng Li.
Are your changes on GitHub or somewhere public? Heng Li has the core samtools bit of the samtools SVN on GitHub, which he seems to use for experimental new code: https://github.com/lh3/samtools > I choose the size of the cache based on the application and access patterns. > For roughly sequential sequence queries (a la samtools faidx or Pysam > Fastafile), all one needs is a handful of active blocks (say 16). When > repeatedly querying tabix files via pysam, I typically use 128 blocks for the > best trade-off between memory and performance. Choosing a cache size for > BAM files is much more complicated and I have a wide range of settings > depending on how many parallel BAM streams and access patterns are employed. > The cache size numbers needed to be quite a bit larger before switching to > LRU (which was a bit surprising). However, using even a small cache is > vastly beneficial for many access patterns. The cost of re-reading a block > from disk can be mitigated by the OS filesystem cache, but the decompression > step takes non-trivial CPU time and can be triggered dozens or hundreds of > times per block for some sensible-seeming access patterns. > -Kevin Certainly useful food for thought - thank you. I agree that the OS will probably cache commonly used BGZF blocks in the filesystem cache, but it doesn't solve the CPU overhead of decompression. In the case of Bio.SeqIO.index(...) which accesses one file, and Bio.SeqIO.index_db(...) which may access several files, we currently don't offer any end user options like this. However, there is an internal option for the max number of handles, and a similar option could control the number of BGZF blocks to cache. I could try 100 blocks (100 times 64 KB is about 6 MB) as the default, and redo the UniProt timings (random access to sequences). That might be a good compromise, given the SeqIO indexing code has no easy way to know the calling code's usage patterns.
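The block cache being discussed can be sketched in a few lines with collections.OrderedDict. This is only an illustration of the LRU idea under stated assumptions, not the samtools/tabix change or anything in Bio.bgzf: the class name, the load_block callback and the default of 100 blocks are hypothetical, and move_to_end requires Python 3.2 or later. With 64 KiB blocks, 100 cached blocks is 6400 KiB, i.e. about 6.25 MiB of decompressed data held in memory.

```python
from collections import OrderedDict


class LRUBlockCache(object):
    """Keep the most recently used decompressed blocks (hypothetical sketch)."""

    def __init__(self, load_block, max_blocks=100):
        self._load_block = load_block  # callable: start offset -> block bytes
        self._max_blocks = max_blocks
        self._blocks = OrderedDict()   # start offset -> decompressed bytes
        self.hits = 0
        self.misses = 0

    def get(self, start_offset):
        if start_offset in self._blocks:
            self.hits += 1
            # Mark as most recently used by moving it to the end.
            self._blocks.move_to_end(start_offset)
            return self._blocks[start_offset]
        self.misses += 1
        data = self._load_block(start_offset)  # the expensive decompression
        self._blocks[start_offset] = data
        if len(self._blocks) > self._max_blocks:
            # Evict the least recently used block (the oldest entry).
            self._blocks.popitem(last=False)
        return data


loaded = []


def fake_loader(start_offset):
    """Stand-in for 'seek and decompress one block'; records each call."""
    loaded.append(start_offset)
    return b"block@%d" % start_offset


cache = LRUBlockCache(fake_loader, max_blocks=2)
for offset in (0, 10, 0, 20, 0, 10):
    cache.get(offset)
```

In this run block 10 is evicted when block 20 arrives, because block 0 was touched more recently, so the final request for block 10 has to be loaded again.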
As I said on the blog post, we should be able to improve the speed of the BGZF random access - this idea alone could make a big difference, although probably a naive block cache (rather than LRU) would be a worthwhile step in itself. Regards, Peter From p.j.a.cock at googlemail.com Wed Nov 9 19:53:52 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 9 Nov 2011 19:53:52 +0000 Subject: [Biopython-dev] Fwd: Bug in DSSP.py In-Reply-To: References: Message-ID: FYI, hopefully someone uses DSSP. ---------- Forwarded message ---------- From: Austin Meyer Date: Tuesday, November 8, 2011 Subject: Bug in DSSP.py To: biopython-owner at lists.open-bio.org Ahoy, I have no idea how to contribute code so I thought I would pass this along. The newest DSSP adds a citation section for the first two lines, and a blank third line in its output file. The parser reads each line one at a time, splits it, then looks at the second element of the resulting list. As the blank line has only one element, an index-out-of-range failure occurs. This error does not happen with the older DSSP version. A quick fix checks the length of the list prior to looking at its elements. Thus at line 121 in the DSSP.py file, just after the sl = l.split(), this will fix the problem:

    if len(sl) < 2:
        continue

The whole function will look like so:

def make_dssp_dict(filename):
    """
    Return a DSSP dictionary that maps (chainid, resid) to
    aa, ss and accessibility, from a DSSP file.

    @param filename: the DSSP output file
    @type filename: string
    """
    dssp = {}
    handle = open(filename, "r")
    try:
        start = 0
        keys = []
        for l in handle.readlines():
            sl = l.split()
            if len(sl) < 2:
                continue
            if sl[1] == "RESIDUE":
                # Start parsing from here
                start = 1
                continue
            if not start:
                continue
            if l[9] == " ":
                # Skip -- missing residue
                continue
            resseq = int(l[5:10])
            icode = l[10]
            chainid = l[11]
            aa = l[13]
            ss = l[16]
            if ss == " ":
                ss = "-"
            try:
                acc = int(l[34:38])
                phi = float(l[103:109])
                psi = float(l[109:115])
            except ValueError, exc:
                # DSSP output breaks its own format when there are >9999
                # residues, since only 4 digits are allocated to the seq num
                # field. See 3kic chain T res 321, 1vsy chain T res 6077.
                # Here, look for whitespace to figure out the number of extra
                # digits, and shift parsing the rest of the line by that amount.
                if l[34] != ' ':
                    shift = l[34:].find(' ')
                    acc = int((l[34+shift:38+shift]))
                    phi = float(l[103+shift:109+shift])
                    psi = float(l[109+shift:115+shift])
                else:
                    raise ValueError, exc
            res_id = (" ", resseq, icode)
            dssp[(chainid, res_id)] = (aa, ss, acc, phi, psi)
            keys.append((chainid, res_id))
    finally:
        handle.close()
    return dssp, keys

Thanks, -- Austin Meyer From p.j.a.cock at googlemail.com Thu Nov 10 00:01:19 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 10 Nov 2011 00:01:19 +0000 Subject: [Biopython-dev] Indexing sequences compressed with BGZF (Blocked GNU Zip Format) In-Reply-To: References: Message-ID: On Tue, Nov 8, 2011 at 6:28 PM, Peter Cock wrote: >> I choose the size of the cache based on the application and access patterns. >> For roughly sequential sequence queries (a la samtools faidx or Pysam >> Fastafile), all one needs is a handful of active blocks (say 16). When >> repeatedly querying tabix files via pysam, I typically use 128 blocks for the >> best trade-off between memory and performance.
?Choosing a cache size for >> BAM files is much more complicated and I have a wide-range of setting >> depending on how many parallel BAM streams and access patterns are employed. >> The cache size numbers needed to be quite a bit larger before switching to >> LRU (which was a bit surprising). ?However, using even a small cache is >> vastly beneficial for many access patterns. ? The cost of re-reading a block >> from disk can be mitigated by the OS filesystem cache, but the decompression >> step takes non-trivial CPU time and can be triggered dozens of hundreds of >> times per block for some sensible-seeming access patterns. >> -Kevin > > Certainly useful food for thought - thank you. I agree that the OS > will probably cache commonly used BGZF blocks in the filesystem > cache, but it doesn't solve the CPU overhead of decompression. > > In the case of Bio.SeqIO.index(...) which accesses one file, and > Bio.SeqIO.index_db(...) which may access several files, we currently > don't offer any end user options like this. However, there is an internal > option for the max number of handles, and a similar option could > control the number of BGZF blocks to cache. I could try 100 > blocks (100 times 64kb is about 6MB) as the default, and redo > the UniProt timings (random access to sequences). > > That might be a good compromise, given the SeqIO indexing code > has no easy way to know the calling code's usage patterns. I've tried a cache of up to 100 BGZF blocks which are cleared "randomly" and it doesn't make a noticeable difference to my UniProt benchmark, which is a shame but not actually very surprising. After all, that is deliberately accessing the records (and thus the blocks) in a random order, and the files contain far far more than 100 blocks. I'll need a more realistic test case to properly evaluate the cache. 
One example that comes to mind is iterating over BAM reads (which would look at blocks sequentially) but also jumping to look at the partner reads (paired end etc) and then back again.

Peter

P.S. When I said "random", what I'm actually using is a Python dictionary keyed on the start offset, and the dictionary's popitem method to remove a cached block "at random" once I have got 100 blocks in memory and need to free one. Of course, this isn't really random, it is arbitrary and likely Python implementation dependent.

From redmine at redmine.open-bio.org Thu Nov 10 10:10:06 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 10 Nov 2011 10:10:06 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10 format generated by lalign36
References:
Message-ID:

Issue #3312 has been updated by Peter Cock.

Assignee set to Biopython Dev Mailing List

Thank you - I can reproduce this on the latest Biopython in our repository. May we include your sample file in Biopython as a unit test please?

----------------------------------------
Bug #3312: Failing to parse fasta-m10 format generated by lalign36
https://redmine.open-bio.org/issues/3312

Author: gahoo lee
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category:
Target version:
URL:

When I parse an alignment created by lalign, which is included in FASTA36, I get errors. Each FASTA file currently contains two sequences; with only one sequence in each file there is no error. Here are the code and the error.

@lalign36 -m 10 at.fasta os.fasta >test.aln@

@from Bio import AlignIO
handle = open('test.aln')
for a in AlignIO.parse(handle, "fasta-m10"):
    assert len(a) == 2, "Should be pairwise!"
    print "Alignment length %i" % a.get_alignment_length()
    for record in a:
        print record.seq, record.name, record.id
@

@Traceback (most recent call last):
  File "R:\Untitled 4.py", line 5, in <module>
    for a in AlignIO.parse(handle, "fasta-m10"):
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\__init__.py", line 371, in parse
    for a in i:
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\FastaIO.py", line 242, in FastaM10Iterator
    yield build_hsp()
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\FastaIO.py", line 106, in build_hsp
    assert query_tags, query_tags
AssertionError: {}@

--
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org

From redmine at redmine.open-bio.org Thu Nov 10 11:10:23 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 10 Nov 2011 11:10:23 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10 format generated by lalign36
References:
Message-ID:

Issue #3312 has been updated by gahoo lee.

Sure. My pleasure.

From redmine at redmine.open-bio.org Thu Nov 10 11:34:39 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 10 Nov 2011 11:34:39 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10 format generated by lalign36
References:
Message-ID:

Issue #3312 has been updated by Peter Cock.

Looking at this, I believe there is a problem in lalign36 itself rather than Biopython:

At the end of the first batch of alignments (for query one, AT1G01040.1) we have the odd line:
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
At the end of the second (and final) batch of alignments (for query two, AT5G04140.2) we have these odd lines:
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os03g02970.1 1500 bp_Up Chr 03:1205337..1203835 (reverse complemented)
>>LOC_Os03g02970.1 1500 bp_Up Chr 03:1205337..1203835 (reverse complemented)
Curious. It seems LALIGN is starting to write out another alignment, but then doesn't. It was very helpful that you included the input files as well, so I could run this with the version of lalign36 I have installed (version 36.3.4 Apr, 2011) and here the output is a bit different but shows similar odd lines. I have updated Biopython to give a more helpful error message in this case: https://github.com/biopython/biopython/commit/1a99454d358fab41771551e8f3a475a90f240b25
>>> from Bio import AlignIO
>>> for a in AlignIO.parse("test.aln", "fasta-m10"):
...     print a
...
SingleLetterAlphabet() alignment with 2 rows and 130 columns
AAAAAAAGAGAGAAATATTACTACAAAACAGAAGCAAGCAAGTG...ATC AT1G01040.1
AGAGAGAGAGAGAGGGAAGCGGAGGAGGGAGAAGAGATCA-GAG...ATC LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 81 columns
AAACAGAAGCAAGC--AAGTGGAA-AACAGACCAGAAGAGAGAG...CGA AT1G01040.1
AGAGAGAGGGAAGCGGAGGAGGGAGAAGAGATCAGAGGAAAGAG...TGA LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 264 columns
AAGATTTCGATTTCG-ATATAAATACTTAAT---CTTT-ATAAA...TTA AT1G01040.1
AATATATCTATTTCTTAAACAAATCATTATTTTCCTTTCATAAA...CTA LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 428 columns
ATTTTTATTTTTATTTT-TATGGGAAAGAAGTTGCACGAGTCGG...TTT AT1G01040.1
ATCATTATTTTCCTTTCATAAAAAAATGAATT---ATGAGGCGG...TTT LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 145 columns
AACTCACTCAAGAAAACCAAATCCCCAGAGA-AGAAA-ACAGAA...AAC AT1G01040.1
ATCTCAATCGAGAGAGCGAGCACACGAGAGAGAGAGAGAGGGAA...ATC LOC_Os03g02970.1
Traceback (most recent call last):
...
ValueError: No data for query 'AT1G01040.1', match 'LOC_Os07g46460.1'
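In the session above, five alignments are returned before the parser raises the ValueError. A minimal sketch, assuming the caller wants to keep whatever was parsed before the failure, is to drain the iterator inside a try block. The helper name collect_until_error and the stand-in generator fake_parser are hypothetical and not part of Biopython; in real use the iterator would presumably be the one returned by AlignIO.parse("test.aln", "fasta-m10").

```python
def collect_until_error(iterator, error_type=ValueError):
    """Drain an iterator, returning (items, error).

    error is None if the iterator finished normally, otherwise it is
    the exception instance that stopped the iteration.
    """
    items = []
    try:
        for item in iterator:
            items.append(item)
    except error_type as err:
        return items, err
    return items, None


def fake_parser():
    """Hypothetical stand-in for a parser that fails part way through."""
    yield "alignment 1"
    yield "alignment 2"
    raise ValueError("No data for query 'AT1G01040.1', match 'LOC_Os07g46460.1'")


alignments, error = collect_until_error(fake_parser())
print("Parsed %i alignments" % len(alignments))
if error is not None:
    print("Stopped early: %s" % error)
```

Whether partial results are acceptable depends on the analysis; for a truncated or malformed lalign36 output it may be safer to report the error and stop.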
Are you on Bill Pearson's FASTA mailing list? We should report this.

Peter

From redmine at redmine.open-bio.org Thu Nov 10 13:13:55 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 10 Nov 2011 13:13:55 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10 format generated by lalign36
References:
Message-ID:

Issue #3312 has been updated by gahoo lee.

File 3Seqs.zip added

Well, I'm not on the FASTA mailing list. In fact I found a small bug in mshowalign2.c, where a colon is missing on line 616; I just don't know how to join the mailing list.

Here's the FASTA output for a three-sequence alignment; I hope these files help. The odd lines changed in the output.

From redmine at redmine.open-bio.org Thu Nov 10 14:33:28 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 10 Nov 2011 14:33:28 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10 format generated by lalign36
References:
Message-ID:

Issue #3312 has been updated by Peter Cock.

The link has changed slightly, but the mailing list is here:
https://lists.virginia.edu/sympa/info/fasta_list

From redmine at redmine.open-bio.org Fri Nov 11 01:42:59 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Fri, 11 Nov 2011 01:42:59 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10 format generated by lalign36
References:
Message-ID:

Issue #3312 has been updated by gahoo lee.

Oh, I got it. Did you report this problem to the FASTA mailing list?

--
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org

From p.j.a.cock at googlemail.com Wed Nov 16 16:27:46 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 16 Nov 2011 16:27:46 +0000
Subject: [Biopython-dev] Cross-links between tracks in GenomeDiagram
Message-ID:

Hi all,

Something I've been working on this month in discussion with Leighton is some enhancements to GenomeDiagram, driven partly by a figure I wanted to draw for a paper. The code is here,
https://github.com/peterjc/biopython/tree/gd-links

First, we can now show links between tracks joining any two features or regions. One use of this is to mimic the output from the Artemis Comparison Tool, ACT,
http://www.sanger.ac.uk/resources/software/act/

ACT is great as an exploratory tool, but doesn't let you output a high quality vector image.

Related to this, it is useful to be able to "crop" different tracks, since for ACT style comparisons the different sequences are unlikely to be the same length. Therefore each GenomeDiagram track can now have its own start/end positions, outside which it doesn't get drawn.

This includes some extra unit tests, run test_GenomeDiagram.py and have a look at Graphics/GD_by_obj_*.pdf

Also try the file Doc/examples/ACT_example.py which mimics a simple two-reference ACT diagram:
https://github.com/peterjc/biopython/blob/gd-links/Doc/examples/ACT_example.py

Simple linear output (split into three fragments) shown here:
http://twitter.com/#!/pjacock/status/136509137826754560

Circular version here (in this case deliberately not using a closed circle, but that works too), note the curving links are intentional so as to display very large cross-links nicely:
http://twitter.com/#!/pjacock/status/136840628502933505

This demo script should use blue flipped links where the matches are to the reverse strand. I haven't put together a nice example for a proper demonstration of that yet. Perhaps a set of several E. coli genomes would work nicely...

I plan to merge this to the trunk, and write some end-user documentation, but would be happy to have someone else look over the code first.

Note that the API is intended to be quite low level but very flexible in terms of creating the cross links. You can use transparency (as in the current version of ACT_example.py) or explicitly colour links according to, say, BLAST bit score. The user also has full control of the z-order, which again allows you to do things like ACT does and put longer matches at the back with short matches at the front, etc.

Peter

From chapmanb at 50mail.com Thu Nov 17 11:51:11 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 17 Nov 2011 06:51:11 -0500
Subject: [Biopython-dev] Cross-links between tracks in GenomeDiagram
In-Reply-To:
References:
Message-ID: <87hb23ezm8.fsf@fastmail.fm>

Peter;

> Something I've been working on this month in discussion with Leighton
> is some enhancements to GenomeDiagram, driven partly by a figure
> I wanted to draw for a paper. The code is here,
> https://github.com/peterjc/biopython/tree/gd-links

Awesome. The direction you are pushing this is great. I'd definitely love to see this in the next release.

> Also try the file Doc/examples/ACT_example.py which mimics
> a simple two-reference ACT diagram:
> https://github.com/peterjc/biopython/blob/gd-links/Doc/examples/ACT_example.py
>
> Simple linear output (split into three fragments) shown here:
> http://twitter.com/#!/pjacock/status/136509137826754560

Really nice. My only suggestion would be to combine the examples and outputs together in the Cookbook. One of the best ways to learn plotting and drawing packages is by looking through examples, finding one that most closely matches what you want, and then iterating until you get at what you need.
Brad

From chapmanb at 50mail.com Thu Nov 17 12:00:01 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 17 Nov 2011 07:00:01 -0500
Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs
In-Reply-To:
References: <871uuhm1fe.fsf@fastmail.fm> <87hb3b51ve.fsf@fastmail.fm>
Message-ID: <87d3crez7i.fsf@fastmail.fm>

Peter and Eric;
I wanted to follow up about the patch to automate Biopython installs from easy_install and pip when NumPy is not present:

https://github.com/chapmanb/biopython/commit/be53d850d721fc82af81bedcd9fb9034b0a2099b

You'd both reviewed it, and the only holdup was a warning message when setuptools is not installed:

> $ jython setup.py install
> /Users/pjcock/jython2.5.2/Lib/distutils/dist.py:263: UserWarning:
> Unknown distribution option: 'install_requires'
>   warnings.warn(msg)

We'd discussed some other options like including setuptools and installing it, suppressing the warning, or ignoring it since it is not problematic.

My lazy side says ignoring it is fine, but if you want to explicitly turn it off we can use this around the setup call:

    with warnings.catch_warnings():
        warnings.simplefilter("ignore")

Happy to handle it however you prefer but I'd love to get this in,
Brad

From p.j.a.cock at googlemail.com Thu Nov 17 12:24:42 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 17 Nov 2011 12:24:42 +0000
Subject: [Biopython-dev] Cross-links between tracks in GenomeDiagram
In-Reply-To: <87hb23ezm8.fsf@fastmail.fm>
References: <87hb23ezm8.fsf@fastmail.fm>
Message-ID:

On Thu, Nov 17, 2011 at 11:51 AM, Brad Chapman wrote:
>
> Peter;
>
>> Something I've been working on this month in discussion with Leighton
>> is some enhancements to GenomeDiagram, driven partly by a figure
>> I wanted to draw for a paper. The code is here,
>> https://github.com/peterjc/biopython/tree/gd-links
>
> Awesome. The direction you are pushing this is great. I'd definitely
> love to see this in the next release.

Cool. It will end up being a graphics heavy release at this rate :)

>> Also try the file Doc/examples/ACT_example.py which mimics
>> a simple two-reference ACT diagram:
>> https://github.com/peterjc/biopython/blob/gd-links/Doc/examples/ACT_example.py
>>
>> Simple linear output (split into three fragments) shown here:
>> http://twitter.com/#!/pjacock/status/136509137826754560
>
> Really nice. My only suggestion would be to combine the examples and
> outputs together in the Cookbook. One of the best ways to learn plotting
> and drawing packages is by looking through examples, finding one that
> most closely matches what you want, and then iterating until you get at
> what you need.

Unless I can find a nicer small sample dataset (or make one) which includes an inversion, I plan to use that ACT sample data in the tutorial - basically taking the user through the ACT_example.py script.

Peter

From p.j.a.cock at googlemail.com Thu Nov 17 12:45:54 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 17 Nov 2011 12:45:54 +0000
Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs
In-Reply-To: <87d3crez7i.fsf@fastmail.fm>
References: <871uuhm1fe.fsf@fastmail.fm> <87hb3b51ve.fsf@fastmail.fm> <87d3crez7i.fsf@fastmail.fm>
Message-ID:

On Thu, Nov 17, 2011 at 12:00 PM, Brad Chapman wrote:
>
> Peter and Eric;
> I wanted to follow up about the patch to automate Biopython installs
> from easy_install and pip when NumPy is not present:
>
> https://github.com/chapmanb/biopython/commit/be53d850d721fc82af81bedcd9fb9034b0a2099b
>
> You'd both reviewed it, and the only holdup was a warning message when
> setuptools is not installed:
>
>> $ jython setup.py install
>> /Users/pjcock/jython2.5.2/Lib/distutils/dist.py:263: UserWarning:
>> Unknown distribution option: 'install_requires'
>>   warnings.warn(msg)
>
> We'd discussed some other options like including setuptools and
> installing it, suppressing the warning, or ignoring it since it is not
> problematic.
>
> My lazy side says ignoring it is fine, but if you want to explicitly
> turn it off we can use this around the setup call:
>
> with warnings.catch_warnings():
>     warnings.simplefilter("ignore")
>
> Happy to handle it however you prefer but I'd love to get this in,
> Brad

How about this to avoid the warning by not passing the argument?
https://github.com/peterjc/biopython/commit/78b965f48939c7395aab6e0919b86686443f640e

Note I rebased to the current master. If you and Eric are happy with that, I guess we can check it in and see how the build slaves like it...

Peter

From chapmanb at 50mail.com Thu Nov 17 13:56:41 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 17 Nov 2011 08:56:41 -0500
Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs
In-Reply-To:
References: <871uuhm1fe.fsf@fastmail.fm> <87hb3b51ve.fsf@fastmail.fm> <87d3crez7i.fsf@fastmail.fm>
Message-ID: <87aa7ug8di.fsf@fastmail.fm>

Peter;

> > I wanted to follow up about the patch to automate Biopython installs
> > from easy_install and pip when NumPy is not present:
[...]
> How about this to avoid the warning by not passing the argument?
> https://github.com/peterjc/biopython/commit/78b965f48939c7395aab6e0919b86686443f640e

That works great, thanks for looking at this. Having this in the next release will be a big help for scripts using install_requires.

Brad

From redmine at redmine.open-bio.org Thu Nov 17 14:10:30 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 17 Nov 2011 14:10:30 +0000
Subject: [Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10 format generated by lalign36
References:
Message-ID:

Issue #3312 has been updated by Peter Cock.
Missing alignments reported here:
https://lists.virginia.edu/sympa/arc/fasta_list/2011-11/msg00001.html

Missing colon reported here:
https://lists.virginia.edu/sympa/arc/fasta_list/2011-11/msg00004.html

--
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Thu Nov 17 14:13:11 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 17 Nov 2011 14:13:11 +0000 Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs In-Reply-To: <87aa7ug8di.fsf@fastmail.fm> References: <871uuhm1fe.fsf@fastmail.fm> <87hb3b51ve.fsf@fastmail.fm> <87d3crez7i.fsf@fastmail.fm> <87aa7ug8di.fsf@fastmail.fm> Message-ID: On Thu, Nov 17, 2011 at 1:56 PM, Brad Chapman wrote: > > Peter; > >> > I wanted to follow up about the patch to automate Biopython installs >> > from easy_install and pip when NumPu is not present: > [...] >> How about this to avoid the warning by not passing the argument? >> https://github.com/peterjc/biopython/commit/78b965f48939c7395aab6e0919b86686443f640e > > That works great, thanks for looking at this. Having this in the next > release will be a big help for scripts using install_requires. > > Brad OK, I'll put that on the trunk then - thanks Brad. Peter From p.j.a.cock at googlemail.com Thu Nov 17 15:10:34 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 17 Nov 2011 15:10:34 +0000 Subject: [Biopython-dev] Cross-links between tracks in GenomeDiagram In-Reply-To: References: <87hb23ezm8.fsf@fastmail.fm> Message-ID: On Thu, Nov 17, 2011 at 12:24 PM, Peter Cock wrote: > On Thu, Nov 17, 2011 at 11:51 AM, Brad Chapman wrote: >> >> Peter; >> >>> Something I've been working on this month in discussion with Leighton >>> is some enhancements to GenomeDiagram, driven partly by a figure >>> I wanted to draw for a paper. The code is here, >>> https://github.com/peterjc/biopython/tree/gd-links >> >> Awesome. The direction you are pushing this is great. I'd definitely >> love to see this in the next release. > > Cool. 
It will end up being a graphics heavy release at this rate :) > Committed to trunk, https://github.com/biopython/biopython/commit/980791237330923706e4dc4901bb6794d3222d0e >>> Also try the file Doc/example/ACT_example.py which mimics >>> a simple two-reference ACT diagram: >>> https://github.com/peterjc/biopython/blob/gd-links/Doc/examples/ACT_example.py >>> >>> Simple linear output (split into three fragments) shown here: >>> http://twitter.com/#!/pjacock/status/136509137826754560 >> >> Really nice. My only suggestion would be to combine the examples and >> outputs together in the Cookbook. One of the best ways to learn plotting >> and drawing packages is by looking through examples, finding one that >> most closely matches what you want, and then iterating until you get at >> what you need. > > Unless I can find a nicer small sample dataset (or make one) which > includes an inversion, I plan to use that ACT sample data in the > tutorial - basically taking the user though the ACT_example.py > script. I plan to do another OBF blog entry on this as well, probably with the same example. Peter From p.j.a.cock at googlemail.com Thu Nov 17 15:12:55 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 17 Nov 2011 15:12:55 +0000 Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs In-Reply-To: References: <871uuhm1fe.fsf@fastmail.fm> <87hb3b51ve.fsf@fastmail.fm> <87d3crez7i.fsf@fastmail.fm> <87aa7ug8di.fsf@fastmail.fm> Message-ID: On Thu, Nov 17, 2011 at 2:13 PM, Peter Cock wrote: > On Thu, Nov 17, 2011 at 1:56 PM, Brad Chapman wrote: >> >> Peter; >> >>> > I wanted to follow up about the patch to automate Biopython installs >>> > from easy_install and pip when NumPu is not present: >> [...] >>> How about this to avoid the warning by not passing the argument? >>> https://github.com/peterjc/biopython/commit/78b965f48939c7395aab6e0919b86686443f640e >> >> That works great, thanks for looking at this. 
Having this in the next >> release will be a big help for scripts using install_requires. >> >> Brad > > OK, I'll put that on the trunk then - thanks Brad. > > Peter That all looks fine with the buildslaves, but the real testing will be with random end user machines. Brad, could you write a snippet for the NEWS file about this? Basically when using setuptools to install Biopython it will list NumPy as a dependency (except on Jython and PyPy) and thus install it if not present already? Peter From chapmanb at 50mail.com Thu Nov 17 15:51:01 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Thu, 17 Nov 2011 10:51:01 -0500 Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs In-Reply-To: References: <871uuhm1fe.fsf@fastmail.fm> <87hb3b51ve.fsf@fastmail.fm> <87d3crez7i.fsf@fastmail.fm> <87aa7ug8di.fsf@fastmail.fm> Message-ID: <877h2yg32y.fsf@fastmail.fm> Peter; > That all looks fine with the buildslaves, but the real > testing will be with random end user machines. > > Brad, could you write a snippet for the NEWS file about > this? Basically when using setuptools to install Biopython > it will list NumPy as a dependency (except on Jython > and PyPy) and thus install it if not present already? Great, glad that is working without any problems. I added a bit to the news about the functionality and usage. Thanks again for the help, Brad From anaryin at gmail.com Thu Nov 17 23:16:56 2011 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Fri, 18 Nov 2011 00:16:56 +0100 Subject: [Biopython-dev] [Biopython] Pairwise alignment - is it a generic function? In-Reply-To: References: Message-ID: Hey all, My laptop decided to die on me the last week... I added a very simple and small example to the docstring, in line with all the others. I'm pushing it to my pdb_enhancements branch, maybe Peter can cherry-pick it? Best, Jo?o [...] Rodrigues http://nmr.chem.uu.nl/~joao 2011/10/27 Jo?o Rodrigues > Sure thing. 
The docstring is actually pretty explicit, it's just missing > the part that you can get the matrices from SubsMat. Or at least, not that > clear. I'll go over it this weekend, maybe earlier. > > Best, > > João > From p.j.a.cock at googlemail.com Fri Nov 18 10:37:23 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 18 Nov 2011 10:37:23 +0000 Subject: [Biopython-dev] [Biopython] Pairwise alignment - is it a generic function? In-Reply-To: References: Message-ID: On Thu, Nov 17, 2011 at 11:16 PM, João Rodrigues wrote: > Hey all, > My laptop decided to die on me the last week... > I added a very simple and small example to the docstring, in line with all > the others. I'm pushing it to my pdb_enhancements branch, maybe Peter can > cherry-pick it? > Best, > João [...] Rodrigues > http://nmr.chem.uu.nl/~joao Cherry-picked, and updated the existing examples to make them into functional doctests, and call them from the test suite. Thanks. Peter From redmine at redmine.open-bio.org Mon Nov 21 14:35:37 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Mon, 21 Nov 2011 14:35:37 +0000 Subject: [Biopython-dev] [Biopython - Feature #3236] Make Biopython work in PyPy 1.5 References: Message-ID: Issue #3236 has been updated by Peter Cock. Assignee set to Biopython Dev Mailing List We'd got most things working or skipped gracefully under PyPy 1.6 and we're in almost the same situation for PyPy 1.7. I just fixed a break under PyPy 1.7 where we assumed set order, https://github.com/biopython/biopython/commit/d6a3fce2d03d6e613600abec4d837c8c7b929f6f From test_Entrez.py under PyPy 1.6 we hit https://bugs.pypy.org/issue914 which is fixed in PyPy 1.7 but I'm now hitting https://bugs.pypy.org/issue933 instead. Note that "import numpy" has been replaced with "import numpypy" in PyPy 1.7, so if we decide not to support PyPy 1.6 that hassle goes away.
Still issues with test_Pathway.py, test_Restriction.py (and also test_CAPS.py) and a whole load of "Too many open files" - probably due to leaking handles and different garbage collection. ---------------------------------------- Feature #3236: Make Biopython work in PyPy 1.5 https://redmine.open-bio.org/issues/3236 Author: Eric Talevich Status: In Progress Priority: Low Assignee: Biopython Dev Mailing List Category: Target version: URL: PyPy is now roughly as production-ready as Jython: http://morepypy.blogspot.com/2011/04/pypy-15-released-catching-up.html Let's make Biopython work on PyPy 1.5. To make the pure-Python core of Biopython work, I did this: * Download and unpack the pre-compiled Linux tarball from pypy.org * Copy the header file @marshal.h@ from the CPython 2.X installation into the @pypy-c-.../include/@ directory * pypy setup.py build; pypy setup.py install * Delete pypy-c-.../site-packages/Bio/cpairwise2*.so Benchmarking a script that leans heavily on Bio.pairwise2, I see about a 2x speedup between Pypy 1.5 and CPython 2.6 -- yes, that's with the compiled C extension @cpairwise2@ in the CPython 2.6 installation. Numpy isn't available on PyPy yet, and it may be some time before it does. Observations from @pypy setup.py test@: * test_BioSQL triggers tons of RuntimeWarnings related to sqlite3 functions * test_BioSQL_SeqIO fails -- attempts to retrieve P01892 instead of Q29899 (?) * test_Restriction triggers a TypeError, somehow (also causing test_CAPS to err) * test_Entrez fails with many noisy errors -- looks related to expat, may be just my installation * importing @Bio.trie@ fails, probably due to a @marshal.h@ issue with compilation -- You have received this notification because you have either subscribed to it, or are involved in it. 
To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Mon Nov 21 14:37:54 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Mon, 21 Nov 2011 14:37:54 +0000 Subject: [Biopython-dev] [Biopython - Feature #3236] Make Biopython work in PyPy 1.5 References: Message-ID: Issue #3236 has been updated by Peter Cock. Point of clarification - the code on the Biopython trunk is deliberately skipping all our C extensions under PyPy (and Jython). We may want to start gradually enabling those if possible - but getting the pure Python code all working first seems like a sensible strategy.
From redmine at redmine.open-bio.org Tue Nov 22 11:30:58 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 22 Nov 2011 11:30:58 +0000 Subject: [Biopython-dev] [Biopython - Feature #3236] Make Biopython work in PyPy 1.5 References: Message-ID: Issue #3236 has been updated by Peter Cock. I have deprecated Bio.Pathway.Rep.HashSet and switched Bio.Pathway.Rep.Graph to use Python's built in set instead. This means test_Pathway.py now passes under PyPy 1.6 and 1.7, https://github.com/biopython/biopython/commit/cbc7c875448a9a57a4cdcbecbc01bcf6b115da69
From p.j.a.cock at googlemail.com Tue Nov 22 12:22:21 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 22 Nov 2011 12:22:21 +0000 Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs In-Reply-To: <877h2yg32y.fsf@fastmail.fm> References: <871uuhm1fe.fsf@fastmail.fm> <87hb3b51ve.fsf@fastmail.fm> <87d3crez7i.fsf@fastmail.fm> <87aa7ug8di.fsf@fastmail.fm> <877h2yg32y.fsf@fastmail.fm> Message-ID: On Thu, Nov 17, 2011 at 3:51 PM, Brad Chapman wrote: > > Great, glad that is working without any problems. I added a bit to the > news about the functionality and usage.
Thanks again for the help, > Brad > I've noticed a probable regression on my Mac, $ python setup.py install running install running build running build_py running build_ext running install_lib running install_egg_info running egg_info writing biopython.egg-info/PKG-INFO writing top-level names to biopython.egg-info/top_level.txt writing dependency_links to biopython.egg-info/dependency_links.txt reading manifest file 'biopython.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no previously-included files found matching 'Tests/Graphics/*.png' warning: no previously-included files matching '*' found under directory 'Tests/UnitTests' warning: no previously-included files matching '.gitignore' found under directory '*' writing manifest file 'biopython.egg-info/SOURCES.txt' removing '/Library/Python/2.6/site-packages/biopython-1.58_-py2.6.egg-info' (and everything under it) Copying biopython.egg-info to /Library/Python/2.6/site-packages/biopython-1.58_-py2.6.egg-info running install_scripts I never used to get these manifest warnings during a simple "python setup.py install" (but I recall seeing them during the official build process under Linux when we do the manifest step). We could tweak the manifest file I guess... Peter From chapmanb at 50mail.com Wed Nov 23 01:15:08 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 22 Nov 2011 20:15:08 -0500 Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs In-Reply-To: References: <871uuhm1fe.fsf@fastmail.fm> <87hb3b51ve.fsf@fastmail.fm> <87d3crez7i.fsf@fastmail.fm> <87aa7ug8di.fsf@fastmail.fm> <877h2yg32y.fsf@fastmail.fm> Message-ID: <87sjlfhc6b.fsf@fastmail.fm> Peter; > I've noticed a probable regression on my Mac, > > $ python setup.py install [.,.]
> warning: no previously-included files found matching 'Tests/Graphics/*.png' > warning: no previously-included files matching '*' found under These look like warnings from setuptools about excluding some files that aren't actually present or included. Apparently distutils silently ignores them. I cleaned up the MANIFEST.in to reduce these. Thanks for spotting this, Brad From p.j.a.cock at googlemail.com Wed Nov 23 09:14:21 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 23 Nov 2011 09:14:21 +0000 Subject: [Biopython-dev] NumPy dialog when Biopython installed from automated programs In-Reply-To: <87sjlfhc6b.fsf@fastmail.fm> References: <871uuhm1fe.fsf@fastmail.fm> <87hb3b51ve.fsf@fastmail.fm> <87d3crez7i.fsf@fastmail.fm> <87aa7ug8di.fsf@fastmail.fm> <877h2yg32y.fsf@fastmail.fm> <87sjlfhc6b.fsf@fastmail.fm> Message-ID: On Wed, Nov 23, 2011 at 1:15 AM, Brad Chapman wrote: > > Peter; > >> I've noticed a probably regression on my Mac, >> >> $ python setup.py install > [.,.] >> warning: no previously-included files found matching 'Tests/Graphics/*.png' >> warning: no previously-included files matching '*' found under > > These look like warnings from setuptools about excluding some files that > aren't actually present or included. Apparently distutils silently > ignores them. That was my guess. > I cleaned up the MANIFEST.in to reduce these. Thanks for > spotting this, > Brad Thanks, Peter From p.j.a.cock at googlemail.com Thu Nov 24 11:54:35 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 24 Nov 2011 11:54:35 +0000 Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?) 
Message-ID: Dear all, Aside from a problem with leaking handles, the remaining problem with Biopython's test suite under PyPy is in Bio.Restriction, specifically this line in the RestrictionType class __init__ method, super(RestrictionType, cls).__init__(cls, name, bases, dct) Here is the error under PyPy 1.7 (same with PyPy 1.6), $ pypy Python 2.7.1 (7773f8fc4223, Nov 18 2011, 22:15:49) [PyPy 1.7.0 with GCC 4.0.1] on darwin Type "help", "copyright", "credits" or "license" for more information. And now for something completely different: `` no, normal work is so much less tiring than vacations'' >>>> from Bio import Restriction Traceback (most recent call last): File "<stdin>", line 1, in <module> File "Bio/Restriction/__init__.py", line 61, in <module> from Bio.Restriction.Restriction import * File "Bio/Restriction/Restriction.py", line 2404, in <module> newenz = T(k, bases, enzymedict[k]) File "Bio/Restriction/Restriction.py", line 241, in __init__ super(RestrictionType, cls).__init__(cls, name, bases, dct) TypeError: unbound method __init__() must be called with BssMI instance as first argument (got RestrictionType instance instead) >>>> quit() Note that we had to tweak the super call to get this to work under Python 2.6, http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004369.html https://github.com/biopython/biopython/commit/11332d6d4951406f3cc001cea41ea75fce177f89 It used to be: super(RestrictionType, cls).__init__(name, bases, dct) PyPy doesn't like that either, $ pypy Python 2.7.1 (7773f8fc4223, Nov 18 2011, 22:15:49) [PyPy 1.7.0 with GCC 4.0.1] on darwin Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``"3 + 3 = 8" - Anto in the JIT talk'' >>>> from Bio import Restriction Traceback (most recent call last): File "<stdin>", line 1, in <module> File "Bio/Restriction/__init__.py", line 61, in <module> from Bio.Restriction.Restriction import * File "Bio/Restriction/Restriction.py", line 2405, in <module> newenz = T(k, bases, enzymedict[k]) File "Bio/Restriction/Restriction.py", line 242, in __init__ super(RestrictionType, cls).__init__(name, bases, dct) TypeError: unbound method __init__() must be called with BssMI instance as first argument (got str instance instead) >>>> What I find interesting is if we comment out the super call, everything seems to work - test_Restriction.py and test_CAPS.py pass under PyPy, Jython, Python 2, and Python 3. I'm tempted to just do that - but I don't fully understand what is going on and why. Can anyone throw some light on this? Thanks, Peter From chapmanb at 50mail.com Thu Nov 24 15:33:57 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Thu, 24 Nov 2011 10:33:57 -0500 Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?) In-Reply-To: References: Message-ID: <87lir5wn4q.fsf@fastmail.fm> Peter; > Aside from a problem with leaking handles, Is this from tempfile.mkstemp? This has tricked me an annoying number of times, so I eventually wrote a wrapper. The trick is doing an os.close on the file descriptor: https://github.com/chapmanb/bcbb/blob/master/nextgen/bcbio/utils.py#L118 > the remaining problem > with Biopython's test suite under PyPy is in Bio.Restriction, > specifically this line in the RestrictionType class __init__ method, > > super(RestrictionType, cls).__init__(cls, name, bases, dct) That seems strange: the __init__ is calling super on itself. You'd normally expect this from a derived class. I'm not sure why this doesn't trigger an infinite recursion initializing the object. I'm +1 on commenting it out.
Brad From p.j.a.cock at googlemail.com Fri Nov 25 11:40:49 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 25 Nov 2011 11:40:49 +0000 Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?) In-Reply-To: <87lir5wn4q.fsf@fastmail.fm> References: <87lir5wn4q.fsf@fastmail.fm> Message-ID: On Thu, Nov 24, 2011 at 3:33 PM, Brad Chapman wrote: > > Peter; > >> Aside from a problem with leaking handles, > > Is this from tempfile.mkstemp? This has tricked me an annoying number of > times, so I eventually wrote a wrapper. The trick is doing an os.close > on the file descriptor: > > https://github.com/chapmanb/bcbb/blob/master/nextgen/bcbio/utils.py#L118 Possibly in test_PDB.py but there are other handle leaks. >> the remaining problem >> with Biopython's test suite under PyPy is in Bio.Restriction, >> specifically this line in the RestrictionType class __init__ method, >> >> super(RestrictionType, cls).__init__(cls, name, bases, dct) > > That seems strange: the __init__ is calling super on itself. You'd > normally expect this from a derived class. I'm not sure why this > doesn't trigger an infinite recursion initializing the object. I'm +1 > on commenting it out. > > Brad I suppose we could be cautious and skip that line under PyPy only. How about that as a compromise - that way if it really is important for something not covered in the unit test, we only break it under PyPy, but CPython and Jython would be fine? Peter From chapmanb at 50mail.com Sat Nov 26 01:24:25 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Fri, 25 Nov 2011 20:24:25 -0500 Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?) In-Reply-To: References: <87lir5wn4q.fsf@fastmail.fm> Message-ID: <8762i7he0m.fsf@fastmail.fm> Peter; > >> super(RestrictionType, cls).__init__(cls, name, bases, dct) > > > > That seems strange: the __init__ is calling super on itself. You'd > > normally expect this from a derived class.
I'm not sure why this > > doesn't trigger an infinite recursion initializing the object. I'm +1 > > on commenting it out. > I suppose we could be cautious and skip that line under PyPy > only. How about that as a compromise - that way if it really > is important for something not covered in the unit test, we only > break it under PyPy, but CPython and Jython would be fine? My vote would be to comment it out generally instead of if_pypy flags. I don't want to break anything, but if we do I'd rather find out straight away instead of chasing down platform specific bugs later. I'd be happy to hear others' opinions, especially if they understand the super magic going on. Brad From eric.talevich at gmail.com Sat Nov 26 03:00:04 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 25 Nov 2011 22:00:04 -0500 Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?) In-Reply-To: <8762i7he0m.fsf@fastmail.fm> References: <87lir5wn4q.fsf@fastmail.fm> <8762i7he0m.fsf@fastmail.fm> Message-ID: On Fri, Nov 25, 2011 at 8:24 PM, Brad Chapman wrote: > > Peter; > > > >> super(RestrictionType, cls).__init__(cls, name, bases, dct) > > > > > > That seems strange: the __init__ is calling super on itself. You'd > > > normally expect this from a derived class. I'm not sure why this > > > doesn't trigger an infinite recursion initializing the object. I'm +1 > > > on commenting it out. > > > I suppose we could be cautious and skip that line under PyPy > > only. How about that as a compromise - that way if it really > > is important for something not covered in the unit test, we only > > break it under PyPy, but CPython and Jython would be fine? > > My vote would be to comment it out generally instead of if_pypy > flags. I don't want to break anything, but if we do I'd rather find out > straight away instead of chasing down platform specific bugs later. I'd > be happy to hear others' opinions, especially if they understand the > super magic going on.
> > I support that, and maybe we can add some more unit tests to see if we can find out what breaks, if anything. Looking at the Bio/Restriction/Restriction.py, I can suggest these candidates: 1. In the implementation of the class RestrictionType, a few of the magic methods use the test "if isinstance(other, RestrictionType)" -- can you see any way these might break without the super().__init__ call? 2. Other classes in the same file derive from RestrictionType, but don't define their own __init__ methods (e.g. AbstractCut, and indirectly NoCut, OneCut, etc.). All the methods seem to be class methods, also. (NB: maybe use the @classmethod decorator everywhere for clarity.) As far as I can tell, the unit test only uses class methods on EciRI, not any instance methods -- if I'm reading that right, then maybe there should be a unit test that hits that. This and #1 can be done at the same time with the magic methods __add__, __ne__ and __gt__, for example. 3. In Bio/Restriction/__init__.py, I see this comment: When testing for the presence of a Restriction enzyme in a RestrictionBatch, the user can use: 1) a class of type 'RestrictionType' 2) a string of the name of the enzyme (it's repr) i.e: >>> from Bio.Restriction import RestrictionBatch, EcoRI >>> MyBatch = RestrictionBatch(EcoRI) >>> #!/usr/bin/env python >>> EcoRI in MyBatch # the class EcoRI. True >>> >>> 'EcoRI' in MyBatch # a string representation True I don't see this included in the unit test, test_Restriction.py. I don't think the super().__init__ combo has anything to do with this feature, but maybe it should be tested anyway, since it relies on some substantial magic. -Eric From p.j.a.cock at googlemail.com Sat Nov 26 13:38:26 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 26 Nov 2011 13:38:26 +0000 Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?) 
In-Reply-To: References: <87lir5wn4q.fsf@fastmail.fm> <8762i7he0m.fsf@fastmail.fm> Message-ID: On Saturday, November 26, 2011, Eric Talevich wrote: > On Fri, Nov 25, 2011 at 8:24 PM, Brad Chapman wrote: >> >> Peter; >> >> > >> super(RestrictionType, cls).__init__(cls, name, bases, dct) >> > > >> > > That seems strange: the __init__ is calling super on itself. You'd >> > > normally expect this from a derived class. I'm not sure why this >> > > doesn't trigger an infinite recursion initializing the object. I'm +1 >> > > on commenting it out. >> >> > I suppose we could be cautious and skip that line under PyPy >> > only. How about that as a compromise - that way if is really >> > is important for something not covered in the unit test, we only >> > break it under PyPy, but C Python and Jython would be fine? >> >> My vote would be to comment it out generally instead of if_pypy >> flags. I don't want to break anything, but if we do I'd rather find out >> straight away instead of chasing down platform specific bugs later. I'd >> be happy to hear other's opinions, especially if they ynderstand the >> super magic going on. >> > > I support that, and maybe we can add some more unit tests to > see if we can find out what breaks, if anything. OK > Looking at the Bio/Restriction/Restriction.py, I can suggest these > candidates: Great - do you want to try to turn those into unit tests? Thanks, Peter From eric.talevich at gmail.com Sat Nov 26 19:49:35 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 26 Nov 2011 14:49:35 -0500 Subject: [Biopython-dev] Bio.Restriction "super" call (Python vs PyPy?) 
In-Reply-To: References: <87lir5wn4q.fsf@fastmail.fm> <8762i7he0m.fsf@fastmail.fm> Message-ID: On Sat, Nov 26, 2011 at 8:38 AM, Peter Cock wrote: > On Saturday, November 26, 2011, Eric Talevich > wrote: > > On Fri, Nov 25, 2011 at 8:24 PM, Brad Chapman > wrote: > >> > >> Peter; > >> > >> > >> super(RestrictionType, cls).__init__(cls, name, bases, dct) > >> > > > >> > > That seems strange: the __init__ is calling super on itself. You'd > >> > > normally expect this from a derived class. I'm not sure why this > >> > > doesn't trigger an infinite recursion initializing the object. I'm > +1 > >> > > on commenting it out. > >> > >> > I suppose we could be cautious and skip that line under PyPy > >> > only. How about that as a compromise - that way if is really > >> > is important for something not covered in the unit test, we only > >> > break it under PyPy, but C Python and Jython would be fine? > >> > >> My vote would be to comment it out generally instead of if_pypy > >> flags. I don't want to break anything, but if we do I'd rather find out > >> straight away instead of chasing down platform specific bugs later. I'd > >> be happy to hear other's opinions, especially if they ynderstand the > >> super magic going on. > >> > > > > I support that, and maybe we can add some more unit tests to > > see if we can find out what breaks, if anything. > > OK > > > > Looking at the Bio/Restriction/Restriction.py, I can suggest these > > candidates: > > Great - do you want to try to turn those into unit tests? > > Sure thing. Here's the relevant commit: https://github.com/biopython/biopython/commit/eb1c163909801731dc0a3d7fbcb2ee514f212da3 Unit tests for most of the magic methods were already there, I just didn't notice them earlier. I also commented out the offending line in Restriction.py and stirred the code a bit in that file and in the test suite. I tested with Python 2.7 and Pypy 1.7 on Ubuntu; we'll see what the build bots say now. -Eric
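
For readers following the super() discussion in this thread, here is a minimal sketch of the pattern in question. It is not the Bio.Restriction code: DemoType and QuietType are hypothetical stand-ins for RestrictionType, and the real module builds its enzyme classes in a more involved way. Assuming a plain metaclass derived from type, the conventional call does not pass cls a second time, and as far as I know leaving the call out is harmless on CPython because type.__new__ has already built the class by the time __init__ runs.

```python
class DemoType(type):
    """Hypothetical metaclass standing in for RestrictionType."""

    def __init__(cls, name, bases, dct):
        # Conventional form: super() already binds cls, so it is not
        # passed again as an explicit first argument.
        super(DemoType, cls).__init__(name, bases, dct)
        cls.initialised_by = "DemoType"


class QuietType(type):
    """The same idea with the super call left out."""

    def __init__(cls, name, bases, dct):
        # No super call here; the class object already exists because
        # type.__new__ ran before __init__ was invoked.
        cls.initialised_by = "QuietType"


# Build one class with each metaclass, the way T(k, bases, dct) does in the
# traceback above; the class names and the 'site' attribute are illustrative only.
with_super = DemoType("WithSuper", (object,), {"site": "GAATTC"})
without_super = QuietType("WithoutSuper", (object,), {"site": "GAATTC"})
```

Both classes end up with their attributes in place and are instances of their metaclass, which would be consistent with the unit tests passing once the line was commented out. Whether the same holds for every code path in Bio.Restriction is exactly what the extra unit tests are meant to check.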