From biopython at maubp.freeserve.co.uk Wed Oct 1 04:36:38 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 1 Oct 2008 09:36:38 +0100 Subject: [Biopython-dev] Numpy conversion In-Reply-To: <228132.43778.qm@web62402.mail.re1.yahoo.com> References: <37659.57326.qm@web62402.mail.re1.yahoo.com> <228132.43778.qm@web62402.mail.re1.yahoo.com> Message-ID: <320fb6e00810010136h8f48506nd9b81f1f6a827e70@mail.gmail.com> On Wed, Oct 1, 2008 at 1:24 AM, Michiel de Hoon wrote: > Bio.kNN is the only module that imports Bio.distance. Bio.distance is > written in Python, but it also imports a C version of Bio.distance if it > is available. From the comments in the code, I gather that the > purpose of the C-version is to get fast distance calculations without > using Numeric / NumPy. However, Bio.kNN itself uses Numeric / > NumPy, which defeats the purpose of the C-version of Bio.distance. > > I would therefore like to propose to add a NumPy-aware version of > the code in Bio.distance to Bio.kNN, and to deprecate Bio.distance. > > Any objections? If Bio.kNN is the only usage of Bio.distance, then that sounds very sensible. However, there is a small chance that someone out there is using Bio.distance (perhaps because it doesn't use Numeric/NumPy). As a courtesy, we could ask on the main mailing list if anyone is using it before its deprecation, but otherwise I have no objections. 
Peter From bugzilla-daemon at portal.open-bio.org Wed Oct 1 04:42:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 1 Oct 2008 04:42:15 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200810010842.m918gFCp026095@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #28 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-01 04:42 EST ------- In comment #26 Peter wrote: >> Using shell=False [with subprocess.Popen] works while shell=True >> fails on Windows (I tested on Windows XP with Python 2.5 from IDLE). >> However, the opposite is true on Mac OS X with python 2.5 from IDLE. >> This is a pain. In comment #27 Patnaik wrote: > I tried the subprocess routine through a test.py file on a Mac OS X > 10.5.5 with Python 2.5.2, but w/o using Biopython. I had to use > 'shell=True', otherwise with 'shell=False',I get: > > File "/Lab/Laboratory/Libs/Python/lib/python2.5/subprocess.py", > ... > _execute_child > raise child_exception > > With 'shell=True', it works even when there is a space in the > file-path/names of the BLAST executable, the database or the input > sequence file (the escaping of the spaces needs to be properly done). Good - at least that confirms the shell option differences I found between Windows and Mac. We'd need to check on Linux before we can write something using subprocess which should work on all the main platforms. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
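A minimal sketch of how the platform difference reported above could be handled. It assumes that shell=False is wanted on Windows and shell=True elsewhere, as the two reports suggest; this has not been checked on Linux. The function name run_command is hypothetical and is not part of Biopython.

```python
import subprocess
import sys


def run_command(command):
    """Run a command-line string and return (returncode, stdout, stderr).

    Assumption taken from the reports above: on Windows the command
    string works with shell=False, while on Mac OS X it needed
    shell=True. Other Unix-like systems are assumed, not verified,
    to behave like Mac OS X.
    """
    use_shell = (sys.platform != "win32")
    process = subprocess.Popen(command,
                               stdin=subprocess.PIPE,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE,
                               shell=use_shell)
    # communicate() closes stdin and reads both pipes to the end,
    # which avoids deadlocks when the child writes a lot of output.
    stdout_data, stderr_data = process.communicate()
    return process.returncode, stdout_data, stderr_data
```

Passing the whole command as a single string keeps the caller responsible for quoting any path that contains spaces, which is the part still under discussion in this bug.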
From bugzilla-daemon at portal.open-bio.org Wed Oct 1 04:42:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 1 Oct 2008 04:42:39 -0400 Subject: [Biopython-dev] [Bug 2601] Seq find() method: proposal In-Reply-To: Message-ID: <200810010842.m918gdnD026134@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2601 ------- Comment #5 from lpritc at scri.sari.ac.uk 2008-10-01 04:42 EST ------- (In reply to comment #4) > (In reply to comment #3) > Good, as where is the fun otherwise? :-) I think that the discussion has been useful. > > I like the idea of > > making Seq.py more string-like, in part because when I first started using > > Biopython, I missed being able to slice, and other conveniently string-y > > things. > > Okay, so what is still missing with these new changes? I like the new, and proposed, changes to Seq. "When I first started" was nearly eight years ago, now... > > string.find() has the behaviour of only returning a single match - that which > > is closest to the string start. This might be useful to some (in ORF-finding, > > perhaps), but I expect I would use a finditer() method that returned all > > matches (for which there is no equivalent string method) almost exclusively > > It is not correct to compare finditer (a re method) to find (a string method) > or for that matter re.match or re.search. I think that it's perfectly valid to compare pretty much anything to pretty much anything else, such as now when we have an opportunity to get the pattern-finding functionality we want/need into the Seq object. Substitution (e.g. using string.find() in place of re.finditer()) is a different matter. To me, string.find() and re.search() are pretty much equivalent, except for their internal implementation, query argument type and return value. re.match() is like string.startswith(), with the same caveats. 
re.finditer() has no string.method() equivalent, but I would still find such a method useful. I think the abstract distinction between search types here is: 1) Find match at start of sequence (re.match() and string.startswith()) 2) Find first match in sequence (re.search() and string.find()) 3) Find all non-overlapping matches in sequence (re.finditer() only) 4) Find all overlapping matches in sequence (neither re nor string) 1a) 2a) 3a) 4a) The same, but in the reverse complement. Moving down the list, the problem becomes more general. The type of search I need most often in biological sequences is number (4a), or (4) for proteins. Each of search types (1) to (3) (a or not) has a theoretically faster implementation than doing (4) then filtering the results. I don't mind having more than one search method with different names, or having to specify arguments to get a particular kind of search. I do mind not having (4a) as an option... BTW, for reverse complement searches, I'm happy for this to be an optional argument - when I wrote the code above, I didn't need anything but two-strand searches. > I definitely think that the user has to decide whether or not they > want overlapping matches not the developer. There is no option under this > implementation. There is no option in string or re, either - not because the developer has guessed that the user always wants it, but because they have effectively guessed that the user *never* wants it (or that, if they do, they'll generalise the search themselves). This is probably because they were writing more general libraries with different use cases (and, in the case of re, actual implementation restrictions) than the Seq object. We have an opportunity to have the find()/search()/whatever() method be biologically-relevant, and I think we should take it. 
I think that, because overlapping matches are biologically-informative, and I see no reason other than consistency with the re module (which is constrained for reasons that do not apply to biological sequences) not to do so, that we make the default behaviour to find overlapping matches, and provide an option to exclude overlaps (which will probably make internal implementation faster). > I am not for or against having an method that returns overlapping matches > rather I am against only having returning overlapping matches as the only > choice. I'm actually in full agreement with you on this. > > I don't think I understand this point. Would you prefer an re.search() like > > implementation that takes a Seq object as its query argument? I don't think > > I'd find that as useful, myself, as a method that just takes a string. Such a > > method could also maybe parse arguments so as to compile the regex from the > > Seq.data attribute though, fulfilling your requirement. > > What I mean is that a user should be able to either specify the pattern or > specify a regular expression object. In either case the optional flags that are > often useful to have like ignorecase are ignored. Ah, I see. I think that, because we are working with a restricted symbol set, we do not strictly need the full functionality that is present in re. We would need as a minimum for a domain-specific re-a-like syntax: o symbols in the sequence alphabet, including correctly-interpreted ambiguity codes o .*+$^ etc. wildcards o {m,n} - like syntax for repeats o [] and [^] set notation o lookahead and lookbehind All of which, except for correct interpretation of ambiguity codes, is already in re and with a few tweaks we could just use re methods internally for this. The ambiguity codes could perhaps be implemented by substitution of sets of symbols for each ambiguity code, and the conformance of the regular expression to the sequence alphabet ensured by a filter on the query. 
Having a method that intelligently accepts both strings and compiled regexes would suit me. I suggest reversing the query rather than reversing the subject sequence because reverse-complementing larger sequences is likely to take a comparatively long time... > Regardless of what a user actually wants, they must wait for two searches along > the sequence. After that finishes the user must examine each and every entry > (due to the match_locations.sort()) to find the strand regardless of what they > want to do. In my code, yes - because that was the functionality I wanted when searching whole genomes for exact pattern matches. It may not have come across in my first post, but I was proposing the code as a potential starting point (for discussion as much as for an implementation), not as the finished article. > I do not see any advantage in this than someone calling the function > twice to get match_locations and rev_locations, doing 'match_locations += > rev_locations' and match_locations.sort(). Assuming that the return value was the same as in my code above then yes, there is no particular computational advantage (except the negligible ones of making one instead of two function calls, and fewer calls/lines of code implying less opportunity for user error). But, and again I stress this, I wrote the code with a particular purpose in mind and not as an enhancement for all possible uses of the Seq object. Had I needed to perform single-strand searches on nucleotide sequences, I'd probably have hacked the code in the way you've been suggesting, with strandedness as an optional argument. > Okay, then more Zen: > "In the face of ambiguity, refuse the temptation to guess." Damn! I'm out of quotes... ;) Time to ask the question on the Biopython-users/BiP lists?
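A rough sketch of search type (4)/(4a) from the comment above, with overlaps and strandedness as optional arguments. It works on plain strings with exact matching only (no ambiguity codes or regular expressions), reverse-complements the query rather than the subject, and is written for Python 3. The names find_all and reverse_complement are hypothetical; this is not the code attached to the bug and not the Seq API.

```python
# Complement table for unambiguous DNA only (an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def find_all(sequence, pattern, overlapping=True, both_strands=False):
    """Return a sorted list of (start, strand) for exact matches.

    strand is +1 for the given pattern and -1 for a match to its
    reverse complement. With overlapping=False, the search resumes
    after the end of each match, as re.finditer() would.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    queries = [(pattern, 1)]
    if both_strands:
        # Reverse-complement the (short) query, not the (long) subject.
        queries.append((reverse_complement(pattern), -1))
    hits = []
    for query, strand in queries:
        step = 1 if overlapping else len(query)
        position = sequence.find(query)
        while position != -1:
            hits.append((position, strand))
            position = sequence.find(query, position + step)
    hits.sort()
    return hits
```

For example, find_all("AAAA", "AA") reports starts 0, 1 and 2, while overlapping=False reports only 0 and 2. A palindromic pattern searched on both strands is reported once per strand at each position.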
From bugzilla-daemon at portal.open-bio.org Wed Oct 1 06:03:11 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 1 Oct 2008 06:03:11 -0400 Subject: [Biopython-dev] [Bug 2596] Add string like split, strip, rstrip and lstrip methods to the Seq object In-Reply-To: Message-ID: <200810011003.m91A3BB5030638@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2596 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1000 is|0 |1 obsolete| | ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-01 06:03 EST ------- Created an attachment (id=1002) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1002&action=view) Patch to Bio/Seq.py for Seq object split, strip, lstrip and rstrip methods (v2) Revised patch, will now accept Seq or string arguments to the strip and split methods. Still needs proper unit tests, probably added to test_seq.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Wed Oct 1 12:29:02 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 1 Oct 2008 17:29:02 +0100 Subject: [Biopython-dev] determining the version In-Reply-To: <320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com> References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com> <63700.34226.qm@web62405.mail.re1.yahoo.com> <320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com> Message-ID: <320fb6e00810010929y4dab07a5ya25767cc0818654d@mail.gmail.com> Peter wrote: > From a quick look at approach taken in the matplotlib > code, we could add something like this to setup.py > > __version__ = "Undefined" > for line in open('Bio/__init__.py'): > if (line.startswith('__version__')): > exec(line.strip()) > > setup( > name='biopython', > version=__version__, > author='The Biopython Consortium', > ... > > I'm happy to deal with this if we are agreed that we > should add a __version__ to Bio/__init__.py > (variations on the naming are possible, but this seems > to be a de-facto standard in python libraries). Any objections to making this change now? Peter From bugzilla-daemon at portal.open-bio.org Wed Oct 1 16:06:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 1 Oct 2008 16:06:47 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200810012006.m91K6lPQ001470@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #29 from drpatnaik at yahoo.com 2008-10-01 16:06 EST ------- It seems the BLAST executables can accept multiple databases (database pointers) in the 'd' argument, but they need to be space-separated. When there is a space in a single database's path-value, BLAST can interpret the provided argument as two databases and then fail. 
This can be the reason why the path-values for input sequence files and databases need to be quoted/escaped in different ways: blast_db = r'"\"C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin.db\""' input_seq = r'"C:\Documents and Settings\patnaik\My Documents\blast\bin\30a.seq"' If this is correct, it might be helpful if Biopython had functionality to accept multiple databases (for BLAST) by using the list data-type: blast_db = [r'"C:\Documents and Settings\patnaik\My Documents\blast\data\mouse.db"', r'"C:\Documents and Settings\patnaik\My Documents\blast\data\rat.db"'] Biopython can then collapse the list items into a properly quoted/escaped string for BLAST's 'd' argument. If this is not feasible, then a note in the documentation will also be of help.
From bugzilla-daemon at portal.open-bio.org Wed Oct 1 16:30:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 1 Oct 2008 16:30:47 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200810012030.m91KUkTx002912@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #30 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-01 16:30 EST ------- Peter wrote in comment 26 >> I've been having trouble with specifying BLAST databases >> with spaces in the path. Have you been able to demonstrate >> this with more than one database? Patnaik wrote in comment #29 > It seems the BLAST executables can accept multiple databases > (database pointers) in the 'd' argument, but they need to be > space-separated. Yes, that is correct. > When there is a space in a single database's path-value, BLAST > can interpret the provided argument as two databases and then > fail.
This can be the reason why the path-values for input > sequence files and databases need to be quoted/escaped in > different ways: Yes, I agree. This is clear from some of the error messages BLAST gives when it cannot understand a BLAST database with a space in the name. I don't know whether using multiple databases with spaces in their names is even possible. > If this is correct, it might be helpful if Biopython had > a functionality to accept multiple databases (for BLAST) > by using the list data-type: > ... > Biopython can then collapse the list items into a properly > quoted/escaped string for BLAST's 'd' argument. This is a nice idea *IF* we can establish what the rules are for making a properly quoted/escaped string for BLAST's 'd' argument (which may be different for different operating systems). We may want to email the NCBI for clarification here. > If this is not feasible, then a note in the documentation will > also be of help. For any documentation I would want to recommend using the win32api.GetShortPathName() function to avoid the spaces, with an example showing how to do this for the database name(s). To me this seems much simpler than the complex quoting solution.
From biopython at maubp.freeserve.co.uk Wed Oct 1 17:28:59 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 1 Oct 2008 22:28:59 +0100 Subject: [Biopython-dev] Versions of numpy/Numeric In-Reply-To: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com> References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com> Message-ID: <320fb6e00810011428y11473535v88c3f7fdfa52a4bf@mail.gmail.com> On Wed, Sep 24, 2008, Peter wrote: > Using CVS Biopython compiled from source, the unit tests all seem fine > on the following three setups: > > Mac OS X, python 2.5.2, Numeric 24.2 and numpy 1.1.1 > Test suite looks fine > > Linux, python 2.5, Numeric 24.2 and numpy 1.0 > Fine, ignoring the Numeric eigenvalue problem in > test_SVDSuperimposer.py previously discussed > > Linux, python 2.3, numpy 1.1.1 [no Numeric] > Fine, after fixing some broken imports ... > > Note that testing where there is NO version of Numeric is important > (as in this third example), as if both numpy and Numeric are installed > currently most of the pure python modules will use Numeric by choice. Testing on some other machines, with Biopython compiled from source using CVS as of today: Linux (Ubuntu Dapper Drake), python 2.4.3, Numeric 24.2 and numpy 1.0.1 Fine (including BioSQL). I can't remove Numeric on this machine due to other libraries still using it. Windows XP, python 2.3.5, Numeric 23.1 and numpy 1.0 Fine (not testing BioSQL), except for a precision issue on test_ProtParam.py (0.563 versus 0.562) which I have not fixed as this is probably due to Numeric vs numpy. Windows XP, python 2.3.5, numpy 1.0 [no Numeric] Fine (not testing BioSQL). Now that NumPy 1.2.0 has been released (announced on the numpy mailing list on 26 Sept, but their website still needs updating), we should make sure we test Biopython with that too. Bruce tried with 1.2rc2 earlier so we should be fine. Testing on a python 2.6 release candidate might be a good idea too...
Peter
From bugzilla-daemon at portal.open-bio.org Thu Oct 2 01:03:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 2 Oct 2008 01:03:19 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200810020503.m9253JrJ031547@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #31 from drpatnaik at yahoo.com 2008-10-02 01:03 EST ------- > For any documentation I would want to recommend using the > win32api.GetShortPathName() function to avoid the spaces, with an example > showing how to do this for the database name(s). To me this seems much simpler > than the complex quoting solution. To me it seems win32api.GetShortPathName() will not work for database paths because the specified values are not really files (e.g., BLAST uses the /data/mouse.db value to look for /data/mouse.db.nin, etc.), and win32api.GetShortPathName works only on files. For BLAST's 'd' argument, to specify multiple databases, one uses the space separator, and double-quotes the entire argument value ("Db1 Db2"). If a database value has spaces within, one backslash-double-quotes that database value (\"Db 3\") and BLAST is supplied with "Db1 Db2 \"Db 3\"". For example, the following BLAST 2.2.18 console command, which uses multiple databases with spaces in the pointers, works on Windows XP SP2: "C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" -p blastn -d "\"C:\Documents and Settings\patnaik\My Documents\blast\data\db 1\" \"C:\Documents and Settings\patnaik\My Documents\blast\data\db 2\"" -i "C:/Documents and Settings/patnaik/My Documents/blast/data/My 30a.seq" -m 7
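The quoting rule described in the comment above can be sketched as a small helper that collapses a list of database pointers into one string for the 'd' argument. This only encodes the rule as reported for BLAST 2.2.18 at the Windows XP console; whether the same escaping is right on other platforms, or when the command is not passed through a shell, is an open question in this bug. The function name build_blast_db_argument is hypothetical and is not part of Biopython.

```python
def build_blast_db_argument(databases):
    """Collapse a list of database pointers into one -d argument value.

    Rule taken from the comment above (reported on Windows XP only):
    pointers are space-separated, a pointer containing a space is
    wrapped in backslash-escaped double quotes, and the whole value
    is wrapped in plain double quotes.
    """
    parts = []
    for database in databases:
        if " " in database:
            # Escaped quotes keep the spaces inside a single pointer.
            parts.append('\\"%s\\"' % database)
        else:
            parts.append(database)
    return '"%s"' % " ".join(parts)
```

Taking a list sidesteps the question of whether "Db1 Db2" means two databases or one database with a space in its name, because each list item is always exactly one pointer.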
From bugzilla-daemon at portal.open-bio.org Thu Oct 2 01:16:36 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 2 Oct 2008 01:16:36 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200810020516.m925Garn032423@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #32 from drpatnaik at yahoo.com 2008-10-02 01:16 EST ------- (Follow-up to comment #31) Some more working examples of BLAST commands with multiple databases, on Windows XP:
1. No need to use the backslash as a directory-separator: "C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" -p blastn -d "\"C:/Documents and Settings/patnaik/My Documents/blast/data/db 1\" \"C:/Documents and Settings/patnaik/My Documents/blast/data/db 2\"" -i "C:/Documents and Settings/patnaik/My Documents/blast/data/My 30a.seq" -m 7
2. Multiple databases, with no spaces in the database pointers: "C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" -p blastn -d "C:\DOCUME~1\patnaik\MYDOCU~1\blast\data\DB1~1 C:\DOCUME~1\patnaik\MYDOCU~1\blast\data\DB2~1 C:\DOCUME~1\patnaik\MYDOCU~1\blast\data\db3" -i "C:/Documents and Settings/patnaik/My Documents/blast/data/My 30a.seq" -m 7
3. Multiple databases, with database pointers with and without spaces: "C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" -p blastn -d "\"C:/Documents and Settings/patnaik/My Documents/blast/data/db 1\" \"C:/Documents and Settings/patnaik/My Documents/blast/data/db 2\" C:\DOCUME~1\patnaik\MYDOCU~1\blast\data\db3" -i "C:/Documents and Settings/patnaik/My Documents/blast/data/My 30a.seq" -m 7
From bugzilla-daemon at portal.open-bio.org Thu Oct 2 02:12:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 2 Oct 2008 02:12:26 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200810020612.m926CPi4002947@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #33 from drpatnaik at yahoo.com 2008-10-02 02:12 EST ------- Re: using subprocess.Popen with shell=True/False (comment #27), while 'shell=True' works on Mac OS X, and probably other Unix-like systems, one gets a 'C:\Documents is not a recogniz...' type of error in Windows. With 'shell=False', which is also the default 'shell' value, Windows works but Mac OS X fails. So this should be cross-platform (works on Windows XP): my_process = subprocess.Popen(my_blast_cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=(True, False)[sys.platform == "win32"])
From bugzilla-daemon at portal.open-bio.org Thu Oct 2 05:41:45 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 2 Oct 2008 05:41:45 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200810020941.m929fjGp014191@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #34 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-02 05:41 EST ------- (In reply to comment #31) > > For any documentation I would want to recommend using the > > win32api.GetShortPathName() function to avoid the spaces, > > with an example showing how to do this for the database > > name(s). To me this seems much simpler than the complex > > quoting solution.
> > To me it seems win32api.GetShortPathName() will not work for > database paths because the specified values are not really > files (e.g., BLAST uses the /data/mouse.db value to look for > /data/mouse.db.nin, etc.), and win32api.GetShortPathName > works only on files. I believe win32api.GetShortPathName works on paths (directories) and files. But by the nature of the filing system, it can only work on existing files/directories - the short names cannot be calculated in advance. As you have found, this means the function cannot be used on a database name (which is not a full filename). Thus any example in the documentation would have to use win32api.GetShortPathName on the folder and then add on the name. This alternative approach (from comment 24) would have to know about multiple extensions (nucleotide and protein databases differ): my_blast_db = win32api.GetShortPathName('C:/Documents and Settings/patnaik/My Documents/blast/bin/mine.nin')[:-4] > For BLAST's 'd' argument, to specify multiple databases, one uses the space > separator, and double-quotes the entire argument value ("Db1 Db2"). If a > database value has spaces within, one backslash-double-quotes that database > value (\"Db 3\") and BLAST is supplied with "Db1 Db2 \"Db 3\"". If we extend the Biopython BLAST API to require multiple databases as a list of strings this could be possible. Otherwise, how do we know if we are dealing with two databases (e.g. "Db1 Db2") or a single database whose name contains a space (e.g. "expressed genes")? We might also want to cope with the situation where the user has already pre-quoted their database string. (In reply to comment 33) > Re: using subprocess.Popen with shell=True/False (comment #27), > while 'shell=True' works on Mac OS X, and probably other Unix-like > systems, ... Probably, but we need to check this rather than assuming it.
> my_process = subprocess.Popen(my_blast_cmd, stdin=subprocess.PIPE, > stdout=subprocess.PIPE, stderr=subprocess.PIPE, > shell=(True,False)[sys.platform == "win32"]) Using shell=(sys.platform != "win32") would be much simpler ;)
From bugzilla-daemon at portal.open-bio.org Thu Oct 2 08:50:40 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 2 Oct 2008 08:50:40 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200810021250.m92Coeb4023844@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #35 from drpatnaik at yahoo.com 2008-10-02 08:50 EST ------- (In reply to comment #34) If the subprocess routine can be implemented, there hopefully will not be any issues caused by spaces in path values for the BLAST executable or the input file. For the database values, there is no reason to change the API; the documentation can just state that double-quoting is needed if there are spaces in a database pointer, and that when multiple databases are specified and at least one of the pointers has spaces, such pointers need to be additionally put inside escaped double-quotes.
From bsouthey at gmail.com Thu Oct 2 09:51:18 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 02 Oct 2008 08:51:18 -0500 Subject: [Biopython-dev] Versions of numpy/Numeric In-Reply-To: <320fb6e00810011428y11473535v88c3f7fdfa52a4bf@mail.gmail.com> References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com> <320fb6e00810011428y11473535v88c3f7fdfa52a4bf@mail.gmail.com> Message-ID: <48E4D1D6.9030406@gmail.com> Peter wrote: > On Wed, Sep 24, 2008, Peter wrote: > >> Using CVS Biopython compiled from source, the unit tests all seem fine >> on the following three setups: >> >> Mac OS X, python 2.5.2, Numeric 24.2 and numpy 1.1.1 >> Test suite looks fine >> >> Linux, python 2.5, Numeric 24.2 and numpy 1.0 >> Fine, ignoring the Numeric eigenvalue problem in >> test_SVDSuperimposer.py previously discussed >> >> Linux, python 2.3, numpy 1.1.1 [no Numeric] >> Fine, after fixing some broken imports ... >> >> Note that testing where there is NO version of Numeric is important >> (as in this third example), as if both numpy and Numeric are installed >> currently most of the pure python modules will use Numeric by choice. >> > > Testing on some other machines, with Biopython compiled from source > using CVS as of today: > > Linux (Ubuntu Dapper Drake), python 2.4.3, Numeric 24.2 and numpy 1.0.1 > Fine (including BioSQL). I can't remove Numeric on this machine due > to other libraries still using it. > > Windows XP, python 2.3.5, Numeric 23.1 and numpy 1.0 > Fine (not testing BioSQL), except for a precision issue on > test_ProtParam.py (0.563 verus 0.562) which I have not fixed as this > is probably due to Numeric vs numpy. > > Windows XP, python 2.3.5, numpy 1.0 [no Numeric] > Fine (not testing BioSQL). > > Now that NumPy 1.2.0 has been released (announced on the numpy mailing > list on 26 Sept, but their website still needs updating), we should > make sure we test Biopython with that too. Bruce tried with 1.2rc2 > earlier so we should be fine. 
> > Testing on a python 2.6 release candidate might be a good idea too... > > Peter > Hi, Please note that numpy 1.2 requires Python 2.4+, so if Biopython is still to support Python 2.3 then we need someone to test with numpy 1.1. Download numpy 1.2 from the links at: http://sourceforge.net/projects/numpy/ There are two Windows installation files, for Python 2.4 and 2.5; each attempts to install the appropriate binary for the processor type and instruction set (e.g. SSE). This avoids people installing the wrong version and the associated 'bugs' and 'crashes' that may result. Also, it was noted by David on the numpy list "that updated packages for various linux distributions (Fedora, Centos/RHEL, OpenSuse) are available": http://download.opensuse.org/repositories/home:/ashigabou/ Regards Bruce
From bsouthey at gmail.com Thu Oct 2 12:06:06 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 02 Oct 2008 11:06:06 -0500 Subject: [Biopython-dev] Versions of numpy/Numeric In-Reply-To: <320fb6e00810011428y11473535v88c3f7fdfa52a4bf@mail.gmail.com> References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com> <320fb6e00810011428y11473535v88c3f7fdfa52a4bf@mail.gmail.com> Message-ID: <48E4F16E.1030900@gmail.com> Peter wrote: > On Wed, Sep 24, 2008, Peter wrote: > >> Using CVS Biopython compiled from source, the unit tests all seem fine >> on the following three setups: >> >> Mac OS X, python 2.5.2, Numeric 24.2 and numpy 1.1.1 >> Test suite looks fine >> >> Linux, python 2.5, Numeric 24.2 and numpy 1.0 >> Fine, ignoring the Numeric eigenvalue problem in >> test_SVDSuperimposer.py previously discussed >> >> Linux, python 2.3, numpy 1.1.1 [no Numeric] >> Fine, after fixing some broken imports ...
>> >> Note that testing where there is NO version of Numeric is important >> (as in this third example), as if both numpy and Numeric are installed >> currently most of the pure python modules will use Numeric by choice. >> > > Testing on some other machines, with Biopython compiled from source > using CVS as of today: > > Linux (Ubuntu Dapper Drake), python 2.4.3, Numeric 24.2 and numpy 1.0.1 > Fine (including BioSQL). I can't remove Numeric on this machine due > to other libraries still using it. > > Windows XP, python 2.3.5, Numeric 23.1 and numpy 1.0 > Fine (not testing BioSQL), except for a precision issue on > test_ProtParam.py (0.563 verus 0.562) which I have not fixed as this > is probably due to Numeric vs numpy. > > Windows XP, python 2.3.5, numpy 1.0 [no Numeric] > Fine (not testing BioSQL). > > Now that NumPy 1.2.0 has been released (announced on the numpy mailing > list on 26 Sept, but their website still needs updating), we should > make sure we test Biopython with that too. Bruce tried with 1.2rc2 > earlier so we should be fine. > > Testing on a python 2.6 release candidate might be a good idea too... > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > Actually the 'final' Python 2.6 was released yesterday (October 1st, 2008)! 
Bruce

From bugzilla-daemon at portal.open-bio.org Thu Oct 2 14:07:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 2 Oct 2008 14:07:48 -0400
Subject: [Biopython-dev] [Bug 2604] New: test_Restriction failure with Python 2.6 (also cause error in test_CAPS)
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2604

Summary: test_Restriction failure with Python 2.6 (also cause error in test_CAPS)
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: bsouthey at gmail.com

Running 'python setup.py test --no-gui' gives the following error for
test_Restriction. This is the same line that causes test_CAPS to fail. Both
test outputs are below.

This is with Linux x86_64 with Python 2.6 compiled using gcc v4.3.2

======================================================================
ERROR: test_Restriction
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 152, in runTest
    self.runSafeTest()
  File "run_tests.py", line 165, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_Restriction.py", line 8, in <module>
    from Bio.Restriction import *
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.6/Bio/Restriction/__init__.py", line 61, in <module>
    from Bio.Restriction.Restriction import *
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.6/Bio/Restriction/Restriction.py", line 2351, in <module>
    newenz = T(k, bases, enzymedict[k])
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.6/Bio/Restriction/Restriction.py", line 217, in __init__
    super(RestrictionType, cls).__init__(name, bases, dict)
TypeError: descriptor '__init__' requires a 'type' object but received a 'str'

----------------------------------------------------------------------
======================================================================
ERROR: test_CAPS
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 152, in runTest
    self.runSafeTest()
  File "run_tests.py", line 165, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_CAPS.py", line 3, in <module>
    from Bio.Restriction import *
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.6/Bio/Restriction/__init__.py", line 61, in <module>
    from Bio.Restriction.Restriction import *
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.6/Bio/Restriction/Restriction.py", line 2351, in <module>
    newenz = T(k, bases, enzymedict[k])
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.6/Bio/Restriction/Restriction.py", line 217, in __init__
    super(RestrictionType, cls).__init__(name, bases, dict)
TypeError: descriptor '__init__' requires a 'type' object but received a 'str'

======================================================================

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Thu Oct 2 14:11:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 2 Oct 2008 14:11:08 -0400
Subject: [Biopython-dev] [Bug 2605] New: test_PDB failure with Python 2.6
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2605

Summary: test_PDB failure with Python 2.6
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: bsouthey at gmail.com

Running 'python setup.py test --no-gui' results in a failure for test_PDB on
Linux x86_64 running Python 2.6 compiled with gcc 4.3.2

======================================================================
ERROR: test_PDB
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 152, in runTest
    self.runSafeTest()
  File "run_tests.py", line 165, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_PDB.py", line 68, in <module>
    run_test()
  File "test_PDB.py", line 22, in run_test
    structure=p.get_structure("example", "PDB/a_structure.pdb")
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.6/Bio/PDB/PDBParser.py", line 69, in get_structure
    self._parse(file.readlines())
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.6/Bio/PDB/PDBParser.py", line 89, in _parse
    self.trailer=self._parse_coordinates(coords_trailer)
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.6/Bio/PDB/PDBParser.py", line 186, in _parse_coordinates
    structure_builder.init_atom(name, coord, bfactor, occupancy, altloc, fullname, serial_number)
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.6/Bio/PDB/StructureBuilder.py", line 224, in init_atom
    residue.add(atom)
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.6/Bio/PDB/Residue.py", line 81, in add
    raise PDBConstructionException, "Atom %s defined twice in residue %s" % (atom_id, self)
TypeError: exceptions must be classes or instances, not str

======================================================================

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bsouthey at gmail.com Thu Oct 2 14:23:05 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Thu, 02 Oct 2008 13:23:05 -0500
Subject: [Biopython-dev] Versions of numpy/Numeric
In-Reply-To: <320fb6e00810011428y11473535v88c3f7fdfa52a4bf@mail.gmail.com>
References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com> <320fb6e00810011428y11473535v88c3f7fdfa52a4bf@mail.gmail.com>
Message-ID: <48E51189.5060603@gmail.com>

Hi,
I just built and installed Python 2.6 with gcc version 4.3.2. I then
installed numpy 1.2 with it (so no Numeric).

I did a cvs update on biopython and installed with Python 2.5.2 and Python
2.6. In both cases I noticed many gcc 'differ in signedness' warnings
(should I file a bug report?) in Bio/cstringfnsmodule.c and Bio/trie.c, and
Bio/triemodule.c also has a couple of other warnings.

In both cases 'python setup.py test' opened a graphical window - it did not do
that before when I tested. What should the default be?

All expected tests (i.e. I do not have BioSQL set up) passed with Python 2.5.2.
With Python 2.6, running tests gave two warnings (should I file a bug report?): biopython/build/lib.linux-x86_64-2.6/Bio/Data/CodonTable.py:580: DeprecationWarning: the sets module is deprecated from sets import Set biopython/build/lib.linux-x86_64-2.6/Bio/Crystal/__init__.py:42: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6 self.message = message Also I got three errors with Python 2.6 (without the -3 flag as that provides warnings for Python 3) so I filed bug reports: test_CAPS test_PDB test_Restriction The failure for test_CAPS and and test_Restriction is due to the same line "Bio/Restriction/Restriction.py" (line 217). Apart from these everything else passed. Bruce From biopython at maubp.freeserve.co.uk Fri Oct 3 05:06:46 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 3 Oct 2008 10:06:46 +0100 Subject: [Biopython-dev] Versions of numpy/Numeric In-Reply-To: <48E51189.5060603@gmail.com> References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com> <320fb6e00810011428y11473535v88c3f7fdfa52a4bf@mail.gmail.com> <48E51189.5060603@gmail.com> Message-ID: <320fb6e00810030206q6a33cd12q49b80d19f9ce33a0@mail.gmail.com> > Hi, > I just built and installed Python 2.6 with gcc version 4.3.2. I then > installed numpy 1.2 with it (so no Numeric). I thought you might be the first to try Biopython with python 2.6 was I knew it was out. > I did a cvs update on biopython and installed with Python 2.5.2 and Python > 2.6. In both cases I noticed many gcc warnings 'differ in signedness' > (should I file a bug report?) in Bio/cstringfnsmodule.c and Bio/trie.c and > also Bio/triemodule.c has a couple of other warnings. I've noticed differ in signedness warnings from trie before with an older gcc - we propably should fix these so please file a (low priority) bug for that. > In both cases 'python setup.py test' opened graphical window - it did not do > that before when I tested. What should the default be? 
python setup.py test --no-gui If the relevant GUI python framework isn't present, it defaults to no GUI. It has been suggested that we drop the GUI - as a relative new comer to Biopython what do you think? > All expected tests (ie I do not have biosql setup) passed with Python 2.5.2. Excellent. > With Python 2.6, running tests gave two warnings (should I file a bug > report?): > biopython/build/lib.linux-x86_64-2.6/Bio/Data/CodonTable.py:580: > DeprecationWarning: the sets module is deprecated > from sets import Set As of python 2.4, set (note lower case) became a built in function (like list). As we still support Python 2.3, avoiding this deprecation would need something like: try : #This should work on python 2.4+ Set = set except NameError: from sets import Set #The remaining code can use Set as before... Or something similar if we switch all the calls to Set() to the newer set() instead. > biopython/build/lib.linux-x86_64-2.6/Bio/Crystal/__init__.py:42: > DeprecationWarning: BaseException.message has been deprecated as of Python > 2.6 > self.message = message I'm not immediatley sure how to fix that, lets see if anyone on the list has a quick suggestion. > Also I got three errors with Python 2.6 (without the -3 flag as that > provides warnings for Python 3) so I filed bug reports: > test_CAPS > test_PDB > test_Restriction > > The failure for test_CAPS and and test_Restriction is due to the same line > "Bio/Restriction/Restriction.py" (line 217). OK - we'll have to look at those. > Apart from these everything else passed. Thanks. 
Peter From bugzilla-daemon at portal.open-bio.org Fri Oct 3 06:26:48 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 3 Oct 2008 06:26:48 -0400 Subject: [Biopython-dev] [Bug 2605] test_PDB failure with Python 2.6 In-Reply-To: Message-ID: <200810031026.m93AQmRV007453@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2605 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-03 06:26 EST ------- I think this is related to the old style exceptions which were being used in Bio.PDB, see http://www.python.org/dev/peps/pep-0008/ > When raising an exception, use "raise ValueError('message')" instead of > the older form "raise ValueError, 'message'". > > The paren-using form is preferred because when the exception arguments > are long or include string formatting, you don't need to use line > continuation characters thanks to the containing parentheses. The older > form will be removed in Python 3000. It looks like the old form was removed in Python 2.6 (or I have mis-identified the problem). I've switched all the exception raises in Bio.PDB in CVS, and made the exceptions into proper classes, which I hope will address this bug under python 2.6. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
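For reference, a minimal sketch of the kind of change described in comment #1,
not the actual Bio.PDB code. Python 2.6 no longer accepts string exceptions,
which matches the "exceptions must be classes or instances, not str" error in
the report, so the exception has to be a real class. PDBConstructionException
is the name from the traceback; the add_atom helper and the plain set standing
in for a residue are hypothetical and exist only for illustration.

```python
class PDBConstructionException(Exception):
    """Class-based replacement for a string exception (illustrative only)."""


def add_atom(atom_ids, atom_id):
    # Hypothetical helper: 'atom_ids' is a plain set standing in for a residue.
    if atom_id in atom_ids:
        # Parenthesised raise form, as recommended by PEP 8.
        raise PDBConstructionException(
            "Atom %s defined twice in residue" % atom_id)
    atom_ids.add(atom_id)


atoms = set()
add_atom(atoms, "CA")
try:
    # Adding the same atom id again triggers the class-based exception.
    add_atom(atoms, "CA")
except PDBConstructionException as err:
    duplicate_message = str(err)
```

Callers can then catch the exception by class, and the message is still
available via str() on the instance.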
From bugzilla-daemon at portal.open-bio.org Fri Oct 3 08:27:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 3 Oct 2008 08:27:29 -0400 Subject: [Biopython-dev] [Bug 2605] test_PDB failure with Python 2.6 In-Reply-To: Message-ID: <200810031227.m93CRT0o014499@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2605 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-03 08:27 EST ------- I've now tested this on Linux with python 2.6 and numpy 1.2 and this has indeed fixed the test failure. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Fri Oct 3 08:44:25 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 3 Oct 2008 13:44:25 +0100 Subject: [Biopython-dev] Python 2.6 Message-ID: <320fb6e00810030544l510d76f7g93d805ec5840c1d4@mail.gmail.com> I've now got Python 2.6, numpy 1.2 and Biopython CVS installed on a linux machine and can confirm Bruce's observations. I haven't yet installed MySQLdb in order to verify BioSQL is still fine with Python 2.6. I have fixed the exception problem with test_PDB.py on python 2.6. >> With Python 2.6, running tests gave two warnings (should I file a bug >> report?): >> biopython/build/lib.linux-x86_64-2.6/Bio/Data/CodonTable.py:580: >> DeprecationWarning: the sets module is deprecated >> from sets import Set > > As of python 2.4, set (note lower case) became a built in function > (like list). 
As we still support Python 2.3, avoiding this
> deprecation would need something like:
>
> try :
>     #This should work on python 2.4+
>     Set = set
> except NameError:
>     from sets import Set
> #The remaining code can use Set as before...
>
> Or something similar if we switch all the calls to Set() to the newer
> set() instead.

I could test this change, or work on a variant using set() by default.
Does anyone have a preference?

>> biopython/build/lib.linux-x86_64-2.6/Bio/Crystal/__init__.py:42:
>> DeprecationWarning: BaseException.message has been deprecated as of Python
>> 2.6
>> self.message = message
>
> I'm not immediately sure how to fix that, lets see if anyone on the
> list has a quick suggestion.

This is probably also due to an exception class change in python 2.6,
similar to that which broke test_PDB (bug 2605).

>> Also I got three errors with Python 2.6 (without the -3 flag as that
>> provides warnings for Python 3) so I filed bug reports:
>> test_CAPS
>> test_PDB
>> test_Restriction
>>
>> The failure for test_CAPS and and test_Restriction is due to the same line
>> "Bio/Restriction/Restriction.py" (line 217).

Bug 2604, http://bugzilla.open-bio.org/show_bug.cgi?id=2604
This seems to be due to the python 2.6 changes to the python built in
super. I've tried emailing Frédéric Sohm who wrote this code, but I'm
not sure if the email address I used is still valid.
Bug 2605, test_PDB failure is now fixed (it was using old style exceptions)
http://bugzilla.open-bio.org/show_bug.cgi?id=2605

Peter

From bugzilla-daemon at portal.open-bio.org Fri Oct 3 10:00:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 3 Oct 2008 10:00:42 -0400
Subject: [Biopython-dev] [Bug 2607] New: Gcc "differ in signedness" warning with cstringfnsmodule.c
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2607

Summary: Gcc "differ in signedness" warning with cstringfnsmodule.c
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: Linux
Status: NEW
Severity: minor
Priority: P5
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: bsouthey at gmail.com

Gcc version 4.3.2 gives the "differ in signedness" warning below when building
Biopython. While my C is not very good, changing line 34 from 'unsigned char'
to just 'char' removed the warnings.

Bio/cstringfnsmodule.c: In function ‘cstringfns_splitany’:
Bio/cstringfnsmodule.c:34: warning: pointer targets in initialization differ in signedness
Bio/cstringfnsmodule.c:71: warning: pointer targets in passing argument 1 of ‘PyString_FromStringAndSize’ differ in signedness
Bio/cstringfnsmodule.c:85: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/cstringfnsmodule.c:87: warning: pointer targets in passing argument 1 of ‘PyString_FromStringAndSize’ differ in signedness

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Oct 3 10:08:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 3 Oct 2008 10:08:37 -0400
Subject: [Biopython-dev] [Bug 2608] New: Gcc "differ in signedness" warnings with trie.c
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2608

Summary: Gcc "differ in signedness" warnings with trie.c
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: Linux
Status: NEW
Severity: trivial
Priority: P5
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: bsouthey at gmail.com

Gcc 4.3.2 provides multiple "differ in signedness" warning with trie.c as given
below. This may be related to multiple declarations of 'unsigned char' instead
of 'char'.

Bio/trie.c: In function ‘Trie_set’:
Bio/trie.c:103: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:103: warning: pointer targets in passing argument 1 of ‘__strdup’ differ in signedness
Bio/trie.c:156: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:162: warning: pointer targets in passing argument 1 of ‘__builtin_strncpy’ differ in signedness
Bio/trie.c:162: warning: pointer targets in passing argument 2 of ‘__builtin_strncpy’ differ in signedness
Bio/trie.c:164: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:164: warning: pointer targets in passing argument 1 of ‘__strdup’ differ in signedness
Bio/trie.c: In function ‘Trie_get’:
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘strncmp’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 2 of ‘strncmp’ differ in signedness
Bio/trie.c:235: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c: In function ‘_get_approximate_transition’:
Bio/trie.c:268: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:272: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:272: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:284: warning: pointer targets in passing argument 1 of ‘__builtin_strncat’ differ in signedness
Bio/trie.c:284: warning: pointer targets in passing argument 2 of ‘__builtin_strncat’ differ in signedness
Bio/trie.c: In function ‘_get_approximate_trie’:
Bio/trie.c:353: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:355: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:356: warning: pointer targets in passing argument 1 of ‘strcat’ differ in signedness
Bio/trie.c:356: warning: pointer targets in passing argument 2 of ‘strcat’ differ in signedness
Bio/trie.c:367: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:369: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c: In function ‘Trie_has_prefix’:
Bio/trie.c:440: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:441: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of ‘strncmp’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 2 of ‘strncmp’ differ in signedness
Bio/trie.c: In function ‘_iterate_helper’:
Bio/trie.c:468: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:470: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:475: warning: pointer targets in passing argument 1 of ‘strcat’ differ in signedness
Bio/trie.c:475: warning: pointer targets in passing argument 2 of ‘strcat’ differ in signedness
Bio/trie.c: In function ‘_with_prefix_helper’:
Bio/trie.c:521: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:522: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of ‘strncmp’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 2 of ‘strncmp’ differ in signedness
Bio/trie.c:530: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:536: warning: pointer targets in passing argument 1 of ‘__builtin_strncat’ differ in signedness
Bio/trie.c:536: warning: pointer targets in passing argument 2 of ‘__builtin_strncat’ differ in signedness
Bio/trie.c: In function ‘_serialize_transition’:
Bio/trie.c:621: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c: In function ‘_deserialize_transition’:
Bio/trie.c:708: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:708: warning: pointer targets in passing argument 1 of ‘__strdup’ differ in signedness
Bio/trie.c: In function ‘test’:
Bio/trie.c:752: warning: pointer targets in passing argument 2 of ‘Trie_set’ differ in signedness
Bio/trie.c:753: warning: pointer targets in passing argument 2 of ‘Trie_set’ differ in signedness
Bio/trie.c:754: warning: pointer targets in passing argument 2 of ‘Trie_set’ differ in signedness
Bio/trie.c:755: warning: pointer targets in passing argument 2 of ‘Trie_set’ differ in signedness
Bio/trie.c:757: warning: pointer targets in passing argument 2 of ‘Trie_get’ differ in signedness
Bio/trie.c:758: warning: pointer targets in passing argument 2 of ‘Trie_get’ differ in signedness
Bio/trie.c:759: warning: pointer targets in passing argument 2 of ‘Trie_get’ differ in signedness
Bio/trie.c:760: warning: pointer targets in passing argument 2 of ‘Trie_get’ differ in signedness
Bio/trie.c:762: warning: pointer targets in passing argument 2 of ‘Trie_set’ differ in signedness
Bio/trie.c:763: warning: pointer targets in passing argument 2 of ‘Trie_get’ differ in signedness
Bio/trie.c:765: warning: pointer targets in passing argument 2 of ‘Trie_get’ differ in signedness
Bio/trie.c:768: warning: pointer targets in passing argument 2 of ‘Trie_set’ differ in signedness
Bio/trie.c:769: warning: pointer targets in passing argument 2 of ‘Trie_get’ differ in signedness

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Oct 3 10:15:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 3 Oct 2008 10:15:12 -0400
Subject: [Biopython-dev] [Bug 2609] New: Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2609

Summary: Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: Linux
Status: NEW
Severity: trivial
Priority: P5
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: bsouthey at gmail.com

Gcc 4.3.2 gives an 'initialization from incompatible pointer type' warning with
triemodule.c.

Bio/triemodule.c:389: warning: initialization from incompatible pointer type
Bio/triemodule.c: In function ‘_write_value_to_handle’:
Bio/triemodule.c:488: warning: passing argument 3 of ‘PyString_AsStringAndSize’ from incompatible pointer type

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Oct 3 10:35:44 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 3 Oct 2008 10:35:44 -0400 Subject: [Biopython-dev] [Bug 2608] Gcc "differ in signedness" warnings with trie.c In-Reply-To: Message-ID: <200810031435.m93EZi2j022259@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2608 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-03 10:35 EST ------- Interestingly looking at the CVS history, in Bio/triemodule.c revision 1.5 it looks like we used to have lots of "char *" casts/variables which were changed to "unsigned char *" to solve complaints from the SGI cc compiler (the comment doesn't say if these were warnings or errors). http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/triemodule.c?cvsroot=biopython We should probably be using whatever PyString_AS_STRING, PyExc_KeyError, PyString_FromString etc use. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bsouthey at gmail.com Fri Oct 3 11:01:42 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 03 Oct 2008 10:01:42 -0500 Subject: [Biopython-dev] Versions of numpy/Numeric In-Reply-To: <320fb6e00810030206q6a33cd12q49b80d19f9ce33a0@mail.gmail.com> References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com> <320fb6e00810011428y11473535v88c3f7fdfa52a4bf@mail.gmail.com> <48E51189.5060603@gmail.com> <320fb6e00810030206q6a33cd12q49b80d19f9ce33a0@mail.gmail.com> Message-ID: <48E633D6.9090100@gmail.com> Peter wrote: >> Hi, >> I just built and installed Python 2.6 with gcc version 4.3.2. I then >> installed numpy 1.2 with it (so no Numeric). >> > > I thought you might be the first to try Biopython with python 2.6 was > I knew it was out. 
> > >> I did a cvs update on biopython and installed with Python 2.5.2 and Python >> 2.6. In both cases I noticed many gcc warnings 'differ in signedness' >> (should I file a bug report?) in Bio/cstringfnsmodule.c and Bio/trie.c and >> also Bio/triemodule.c has a couple of other warnings. >> > > I've noticed differ in signedness warnings from trie before with an > older gcc - we probably should fix these so please file a (low > priority) bug for that. > Filed bug reports for these and I think that the Bio/cstringfnsmodule.c and Bio/trie.c are related to the declaration of 'unsigned char'. I changed this to 'char' in cstringfnsmodule.c and the warning goes away. However, that is probably not the best thing to do without checking the reason for using 'unsigned char' in the code (may be essential to maintain sign). There are some interesting comments on the usage of strlen() warnings, such as this one from Linus Torvalds: "..and my argument is that a warning which doesn't allow you to call "strlen()" on a "unsigned char" array without triggering is a bogus warning, and must be removed." So perhaps these can be ignored. Regards Bruce From bugzilla-daemon at portal.open-bio.org Fri Oct 3 11:54:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 3 Oct 2008 11:54:19 -0400 Subject: [Biopython-dev] [Bug 2611] New: Message corrections when tests are skipped Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2611 Summary: Message corrections when tests are skipped Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P5 Component: Unit Tests AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com With the latest cvs version and Python 2.6 there are some corrections needed to the message output. I also think these should be consistent. A few messages include a ')' but have no opening '(': test_GraphicsChromosome ... skipping. 
Install reportlab if you want to use Bio.Graphics). test_GraphicsDistribution ... skipping. Install reportlab if you want to use Bio.Graphics). test_GraphicsGeneral ... skipping. Install reportlab if you want to use Bio.Graphics). I think these tests should have similar message to the previous ones: test_PopGen_FDist ... skipping. Fdist not found (not a problem if you do not intend to use it). test_PopGen_SimCoal ... skipping. SimCoal not found (not a problem if you do not intend to use it). Perhaps: test_PopGen_FDist ... skipping. Install FDist if you want to use Bio.PopGen_FDist. test_PopGen.SimCoal ... skipping. Install SimCoal if you want to use Bio.PopGen.SimCoal. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Oct 3 12:35:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 3 Oct 2008 12:35:19 -0400 Subject: [Biopython-dev] [Bug 2611] Message corrections when tests are skipped In-Reply-To: Message-ID: <200810031635.m93GZJ1v031955@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2611 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|normal |minor Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-03 12:35 EST ------- Fixed extra bracket in: test_GraphicsChromosome.py revision 1.4 test_GraphicsDistribution.py revision 1.3 test_GraphicsGeneral.py revision 1.3 Standardised MissingExternalDependencyError wording in: test_PopGen_SimCoal.py revision 1.2 test_PopGen_FDist.py revision 1.6 and also requires_wise.py revision 1.5 (used by test_wise.py) Thanks for your attention to detail here Bruce. 
Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Fri Oct 3 12:52:34 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 3 Oct 2008 17:52:34 +0100 Subject: [Biopython-dev] Python 2.6 In-Reply-To: <320fb6e00810030544l510d76f7g93d805ec5840c1d4@mail.gmail.com> References: <320fb6e00810030544l510d76f7g93d805ec5840c1d4@mail.gmail.com> Message-ID: <320fb6e00810030952q74d595d6l89adf06890d5311@mail.gmail.com> One of the python 2.6 issues Bruce flagged up was the deprecation of the Sets module. Based on a quick grep, this affects several modules: Seq.py - used in the self test only, which could be removed Align/AlignInfo.py AlignIO/__init__.py - used in the self test only, which could be removed AlignIO/PhylipIO.py Data/CodonTable.py Nexus/Nexus.py Nexus/Trees.py Restriction/Restriction.py SeqIO/__init__.py- used in the self test only, which could be removed SeqIO/PhylipIO.py Most of these do either "from sets import Set" or "import sets". On balance I think it would make sense to convert all these to use the new built in "set(...)" instead of "Set(...)", with a fall back for python 2.3 like this: #TODO - Remove this work around once we drop python 2.3 support try: #Check the built in set function is present (python 2.4+) set = set except NameError: #For python 2.3 fall back on the sets module (deprecated in python 2.6) from sets import Set as set and replace all use of Set(...) with set(...) in the main code. See also http://www.python.org/dev/peps/pep-0218/ Of course, dropping support for python 2.3 as part of supporting 2.6 isn't out of the question, and would make dealing with the set/Set issue much simpler. 
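Laid out as a block, the fallback proposed above reads roughly as follows. This is only a sketch of the idea in the message, not the code that went into CVS; the unique_letters helper is a made-up name used for illustration. On any Python that has a built-in set, the except branch never runs.

```python
# TODO - remove this workaround once Python 2.3 support is dropped.
try:
    # Python 2.4 and later provide a built-in set type.
    set = set
except NameError:
    # Python 2.3 only: fall back on the sets module (deprecated in 2.6).
    from sets import Set as set


def unique_letters(sequence):
    """Hypothetical helper: return the distinct letters of a sequence, sorted."""
    letters = list(set(sequence))
    letters.sort()
    return letters


print(unique_letters("GATTACA"))
```

After this block, the rest of a module can call set(...) without caring which implementation is behind the name.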
Personally I still use python 2.3 on Windows XP because I have the build environment all setup using MSVC 6.0, and switching python versions would require me to setup a whole new compiler suite. I'd rather not drop support for python 2.3 in Biopython unless/until I've got a Windows machine setup with a working python compatible compiler. Peter From bugzilla-daemon at portal.open-bio.org Fri Oct 3 13:37:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 3 Oct 2008 13:37:23 -0400 Subject: [Biopython-dev] [Bug 2604] test_Restriction failure with Python 2.6 (also cause error in test_CAPS) In-Reply-To: Message-ID: <200810031737.m93HbNsW003167@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2604 ------- Comment #1 from bsouthey at gmail.com 2008-10-03 13:37 EST ------- I should add that this error is new in Python 2.6, see 'Porting to Python 2.6' section of http://docs.python.org/whatsnew/2.6.html " object.__init__() previously accepted arbitrary arguments and keyword arguments, ignoring them. In Python 2.6, this is no longer allowed and will result in a TypeError. This will affect __init__() methods that end up calling the corresponding method on object (perhaps through using super()). See issue 1683368 for discussion. " -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
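The behaviour quoted above can be reproduced with a few lines. This is a minimal sketch run against a current Python 3 interpreter, where the rule still holds; the class names Forwarding and Consuming are invented for the example and do not come from Biopython.

```python
class Forwarding(object):
    """Passes every argument on to object.__init__, which rejects extras."""

    def __init__(self, *args, **kwargs):
        super(Forwarding, self).__init__(*args, **kwargs)


class Consuming(object):
    """Uses its own arguments and calls object.__init__ with none."""

    def __init__(self, name):
        self.name = name
        super(Consuming, self).__init__()


def rejects_extra_arguments():
    """Return True if forwarding an extra argument raises TypeError."""
    try:
        Forwarding("EcoRI")
    except TypeError:
        return True
    return False


print(rejects_extra_arguments())
print(Consuming("EcoRI").name)
```

The usual remedy is the second pattern: each __init__ consumes the arguments it understands and passes nothing it cannot account for up the chain.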
From dalke at dalkescientific.com Sat Oct 4 18:09:35 2008 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sun, 5 Oct 2008 00:09:35 +0200 Subject: [Biopython-dev] [Bug 2608] Gcc "differ in signedness" warnings with trie.c In-Reply-To: <200810031435.m93EZi2j022259@portal.open-bio.org> References: <200810031435.m93EZi2j022259@portal.open-bio.org> Message-ID: <82FA5FF2-576B-43E8-9628-3D740989113F@dalkescientific.com> > ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk > 2008-10-03 10:35 EST ------- > Interestingly looking at the CVS history, in Bio/triemodule.c > revision 1.5 it > looks like we used to have lots of "char *" casts/variables which > were changed > to "unsigned char *" to solve complaints from the SGI cc compiler > (the comment > doesn't say if these were warnings or errors). Those were almost certainly warnings. The "char" type on IRIX is unsigned. I once tracked down a bug in some code which used a char field to store formal charges. On IRIX the charges were 0, +1, +2, +254 and +255. :) Andrew dalke at dalkescientific.com From mjldehoon at yahoo.com Sat Oct 4 22:07:53 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 4 Oct 2008 19:07:53 -0700 (PDT) Subject: [Biopython-dev] Bio.MarkovModel Message-ID: <217322.81025.qm@web62402.mail.re1.yahoo.com> Hi everybody, When I was looking at the NumPy-dependent modules, I got the impression that Bio.MarkovModel can be simplified now that it's using the new NumPy. As far as I can tell, there is no documentation for Bio.MarkovModel, and the code seems to have some (trivial) bugs that (I think) would be noticed if anybody is actively using Bio.MarkovModel. So I am wondering 1) Has anybody looked at Bio.MarkovModel in detail? 2) If not, should this module be kept? On the one hand, Markov models are a core part of computational biology and as such are an appropriate module for Biopython. On the other hand, the code is useful only if people are actually using it. 
Another option is to have MarkovModel.py as a stand-alone example script of Python in computational biology instead of a full-blown module. --Michiel. From biopython at maubp.freeserve.co.uk Sun Oct 5 07:44:02 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 5 Oct 2008 12:44:02 +0100 Subject: [Biopython-dev] Bio.MarkovModel In-Reply-To: <217322.81025.qm@web62402.mail.re1.yahoo.com> References: <217322.81025.qm@web62402.mail.re1.yahoo.com> Message-ID: <320fb6e00810050444k4a86e91cq602e3dd1c89b864e@mail.gmail.com> On Sun, Oct 5, 2008 at 3:07 AM, Michiel de Hoon wrote: > Hi everybody, > > When I was looking at the NumPy-dependent modules, I got the impression > that Bio.MarkovModel can be simplified now that it's using the new NumPy. That's good, but of limited benefit in itself. > As far as I can tell, there is no documentation for Bio.MarkovModel, and There isn't even a copyright statement - but there are at least docstrings, which is something. Looking at the CVS log, Jeff Chang checked this in originally, so either he wrote it or he should at least know who did. > the code seems to have some (trivial) bugs that (I think) would be noticed > if anybody is actively using Bio.MarkovModel. So I am wondering > > 1) Has anybody looked at Bio.MarkovModel in detail? Not personally. > 2) If not, should this module be kept? I would say yes. > On the one hand, Markov models are a core part of computational > biology and as such are an appropriate module for Biopython. On the > other hand, the code is useful only if people are actually using it. > Another option is to have MarkovModel.py as a stand-alone example > script of Python in computational biology instead of a full-blown module. As you say, Markov models are an important tool in computational biology, so having some useful code to work with them in Biopython is a good thing. 
To me, having this remain as a "top level" module in Biopython would give it higher status and visibility than hiding it away in the example scripts. If you can see a few little things that need fixing, then making those improvements would be worthwhile. If you don't really have the time to deal with them now, even just filing bugs would be worth doing. If anyone is actively using the module, then contributing something for the tutorial would be very welcome. If you don't know LaTeX (used for the typesetting), then just plain text is fine - I'm happy to deal with the formatting. Peter From mjldehoon at yahoo.com Mon Oct 6 06:13:18 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Mon, 6 Oct 2008 03:13:18 -0700 (PDT) Subject: [Biopython-dev] Bio.MarkovModel; Bio.Popgen, Bio.PDB documentation In-Reply-To: <320fb6e00810050444k4a86e91cq602e3dd1c89b864e@mail.gmail.com> Message-ID: <163677.27280.qm@web62408.mail.re1.yahoo.com> > > When I was looking at the NumPy-dependent modules, I > > got the impression that Bio.MarkovModel can be > > simplified now that it's using the new NumPy. > > That's good, but of limited benefit in itself. Well, currently Bio.MarkovModel uses a C extension module Bio.cMarkovModel. If we can achieve the same speed or better by making use of NumPy, then we won't need this C extension module and we can simplify Biopython. > > 2) If not, should this module be kept? > > I would say yes. > ... > To me, having this remain as a "top level" module in > Biopython would give it higher status and visibility than > hiding it away in the example scripts. OK, let's keep it as a module then. We now have several small modules related to supervised learning as separate Bio.s (LogisticRegression, MaxEntropy, kNN, NaiveBayes, and arguably MarkovModel), which to me looks a bit messy. It may be a good idea to collect these in one Bio.Supervised, though this is not urgent. 
I'd be happy to set up a new chapter in the tutorial about these supervised learning modules (I wrote a section a long time ago for the cookbook about logistic regression). While we're on the subject, I think that the Bio.PopGen and Bio.PDB sections of the cookbook chapter in the tutorial should be promoted to separate chapters in the tutorial, since these modules are fairly big and have a good documentation. --Michiel. From bugzilla-daemon at portal.open-bio.org Mon Oct 6 06:24:57 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 6 Oct 2008 06:24:57 -0400 Subject: [Biopython-dev] [Bug 2604] test_Restriction failure with Python 2.6 (also cause error in test_CAPS) In-Reply-To: Message-ID: <200810061024.m96AOuhV000861@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2604 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-06 06:24 EST ------- I've contacted Frédéric Sohm by email, and this is his suggested fix for the super issue: -------------------------------------- I replaced line 221 : super(RestrictionType, cls).__init__(name, bases, dict) #dict was an error for dct by the way By : if sys.version < '2.6' : # sys is imported at the beginning to check # for set anyway. super(RestrictionType,cls).__init__(name, bases, dct) else : super(RestrictionType,cls).__init__(cls, name, bases, dct) # cls is the equivalent of self there. It's different to mark the fact # that the class is a metaclass not a normal python class. This should support both 2.6 and 2.3; The biopython test is now working with 2.6 (I did not try with 2.3 but this should not have changed anything for this version). I have not much time for testing it thoroughly right now, sorry. -------------------------------------- End quote. 
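A side note on the version test in the quoted fix: sys.version is a string, so sys.version < '2.6' compares character by character rather than numerically. A possible alternative, sketched here with a hypothetical helper name version_at_least, is to compare the leading fields of sys.version_info as a tuple.

```python
import sys


def version_at_least(major, minor, version_info=None):
    """Hypothetical helper: True if the interpreter is at least major.minor."""
    if version_info is None:
        version_info = sys.version_info
    # Tuples compare element by element, numerically.
    return tuple(version_info[:2]) >= (major, minor)


# String comparison orders "3.10" before "3.9" because "1" sorts before "9".
print("3.10" < "3.9")

# Tuple comparison gets the same pair the right way round.
print(version_at_least(3, 9, (3, 10, 0)))
```

For the versions under discussion in this thread (2.3 to 2.6) the string comparison happens to give the right answer; the tuple form simply avoids relying on that.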
Interestingly using the following, test_Restriction.py works on python 2.4: super(RestrictionType,cls).__init__(cls, name, bases, dct) This is consistent with the documentation Bruce found about arbitrary arguments and keyword arguments being accepted and ignored before python 2.6. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Mon Oct 6 06:33:09 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 6 Oct 2008 11:33:09 +0100 Subject: [Biopython-dev] Bio.MarkovModel; Bio.Popgen, Bio.PDB documentation In-Reply-To: <163677.27280.qm@web62408.mail.re1.yahoo.com> References: <320fb6e00810050444k4a86e91cq602e3dd1c89b864e@mail.gmail.com> <163677.27280.qm@web62408.mail.re1.yahoo.com> Message-ID: <320fb6e00810060333nc4c8840xab35976e4aaff447@mail.gmail.com> On Mon, Oct 6, 2008 at 11:13 AM, Michiel de Hoon wrote: >> > When I was looking at the NumPy-dependent modules, I >> > got the impression that Bio.MarkovModel can be >> > simplified now that it's using the new NumPy. >> >> That's good, but of limited benefit in itself. > > Well, currently Bio.MarkovModel uses a C extension module > Bio.cMarkovModel. If we can achieve the same speed or better > by making use of NumPy, then we won't need this C extension > module and we can simplify Biopython. I'd missed the C extension module - yes, if we can drop that by making more use of numpy this does sound worth while. >> > 2) If not, should this module be kept? >> >> I would say yes. >> ... >> To me, having this remain as a "top level" module in >> Biopython would give it higher status and visibility than >> hiding it away in the example scripts. > > OK, let's keep it as a module then. 
We now have several small > modules related to supervised learning as separate Bio.s > (LogisticRegression, MaxEntropy, kNN, NaiveBayes, and arguably > MarkovModel), which to me looks a bit messy. It may be a good > idea to collect these in one Bio.Supervised, though this is not urgent. > > I'd be happy to set up a new chapter in the tutorial about these > supervised learning modules (I wrote a section a long time ago > for the cookbook about logistic regression). http://biopython.org/DIST/docs/cookbook/LogisticRegression.html Using that as a basic for a whole chapter sounds excellent. > While we're on the subject, I think that the Bio.PopGen and > Bio.PDB sections of the cookbook chapter in the tutorial should > be promoted to separate chapters in the tutorial, since these > modules are fairly big and have a good documentation. Bio.PDB also has a whole separate document, but I am not sure off hand how much this overlaps. http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf I agree that those two sections could be promoted to chapters. Would we want to stick with a global authorship? If so Tiago should be listed for the PopGen chapter. Alternatively, we could list authors for each chapter (which will take a little leg work up front) and a few "editors" (which may well change over time). 
Peter From biopython at maubp.freeserve.co.uk Mon Oct 6 07:33:44 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 6 Oct 2008 12:33:44 +0100 Subject: [Biopython-dev] Fwd: Biopython - Bio.Restriction problem with super on python 2.6 In-Reply-To: <320fb6e00810060309s449be9er67178dd184b789d5@mail.gmail.com> References: <320fb6e00810030406n67e7254ao1bdcbeebdd0b981@mail.gmail.com> <48E620CB.6090108@inaf.cnrs-gif.fr> <320fb6e00810030708t3dfa51an44410c0faaee0e77@mail.gmail.com> <48E9CA47.1000502@inaf.cnrs-gif.fr> <320fb6e00810060309s449be9er67178dd184b789d5@mail.gmail.com> Message-ID: <320fb6e00810060433td9cab92ie781a9cceaf9e8dd@mail.gmail.com> This is a forwarded email from Frédéric Sohm about the python 2.6 super issue in Bio.Restriction (Bug 2604), with my replies included. See also http://bugzilla.open-bio.org/show_bug.cgi?id=2604 Peter ---------- Forwarded message ---------- From: Peter Date: Mon, Oct 6, 2008 at 11:09 AM Subject: Re: Biopython - Bio.Restriction problem with super on python 2.6 To: Frederic Sohm On Mon, Oct 6, 2008 at 9:20 AM, Frédéric Sohm wrote: > Hi Peter, > > I do not have access to the mailing list (I suspect my e-mail address > changed since my inscription to the biopython and biopython-dev mailing list > (from ... to ...) You can sign up again if you like, http://biopython.org/wiki/Main_Page Do you mind if I forward this to the mailing list (I'll remove your email addresses if you are worried about spam). > Concerning the sets problem : > I prefer the following solution rather than change all Set occurrences : > > Replacing the import : > > from sets import Set > > by : > > import sys > if sys.version < '2.6' : > from sets import Set > else : > Set = set > > This should maintain backward compatibility with python 2.3 as you requested > on the mailing list and avoid having to change too much code, on the other > hand it's not as clean as changing Set occurrences. Either would work - I don't really mind. 
I think we'll have to change all the Set occurances one day, so we might as well do it now. > Concerning the Restriction module, it was easiest than I thought it would be: > > I replaced line 221 : > super(RestrictionType, cls).__init__(name, bases, dict) > > #dict was an error for dct by the way I had wondered about dict/dct, so its good to have that confirmed. > By : > > if sys.version < '2.6' : # sys is imported at the beginning to check > # for set anyway. > super(RestrictionType,cls).__init__(name, bases, dct) > else : > super(RestrictionType,cls).__init__(cls, name, bases, dct) > > # cls is the equivalent of self there. It's different to mark the fact > # that the class is a metaclass not a normal python class. > > This should support both 2.6 and 2.3; The biopython test is now working with > 2.6 (I did not try with 2.3 but this should not have changed anything for > this version). I have not much time for testing it thoroughly right now, > sorry. I'll be able to check the unit test passes of a few different versions of python. > I attached the 2 files I modified : Restriction (set and super) and > CodonTable (set). Could you please take care of the uploading as I have no > access to it. Yes, of course. > ===================================================================== > To explain a bit what happen there : > I had problems of inheritance when I tried to build this module : > Restriction enzymes are defined by a serie of site characteristics and ways > to cut the DNA (blunt/3' overhang/5' overhang, one/two cut(s), > inside/outside the recognised sequence,...). > This implied I could not find a way to write a generic enzyme classes with > standard methods without being confronted to inheritance and mro (method > resolution order) problems when instantiating the final class. > > One way out would have been to check each single class instance for all the > characteristic with series of if/else in every method over and over. 
But > this would have been tedious to write, slow and not very much in the spirit > of an object-oriented programming language. > Moreover I was curious to see how metaclass worked. > So I used metaclass to build the class for the enzyme. > > That way each single enzyme is its own class, which is put together from a > serie of basic class. These classes are combined to build a metaclass for > each enzyme. By putting the class together that way I managed to overcome > the method resolution order problems (diamond rule). > > The main drawback is that Restriction uses directly classes to do the work > instead of "normal" python class instances. > Some magic is then necessary to initiate the classes and create the class > instances (hence the use of super and the magic at the end of the > Restriction.py module - from "for TYPE, (bases, enzymes) ..." onward). > > I am not certain this way is the recommended way to do it, but on the other > hand, it's working, fast enough and it was fun to write so... > ===================================================================== Great - thanks, Peter From biopython at maubp.freeserve.co.uk Mon Oct 6 09:22:43 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 6 Oct 2008 14:22:43 +0100 Subject: [Biopython-dev] Python 2.6 In-Reply-To: <320fb6e00810030952q74d595d6l89adf06890d5311@mail.gmail.com> References: <320fb6e00810030544l510d76f7g93d805ec5840c1d4@mail.gmail.com> <320fb6e00810030952q74d595d6l89adf06890d5311@mail.gmail.com> Message-ID: <320fb6e00810060622g2f0f9107mc3c7528d5ce3cb21@mail.gmail.com> Peter wrote: > One of the python 2.6 issues Bruce flagged up was the deprecation of > the Sets module. Based on a quick grep, this affects several modules: > ... 
> On balance I think it would make sense to convert all these to use the > new built in "set(...)" instead of "Set(...)", with a fall back for python 2.3 Seq.py - fixed in CVS Align/AlignInfo.py - complicated by the lack of a union_update method for the built in set class, but fixed in CVS. AlignIO/__init__.py - removed unused import in CVS AlignIO/PhylipIO.py - fixed in CVS Data/CodonTable.py - fixed in CVS Nexus/Nexus.py and Nexus/Trees.py - with my suggested fix, the unit test output will currently say Set or set depending on the version of python. A further minor change to test_Nexus.py would be needed to cope with this. Restriction/Restriction.py - this subclasses the Set object, so needs a little more checking. SeqIO/__init__.py - fixed in CVS SeqIO/PhylipIO.py - was deprecated, now removed Peter From bugzilla-daemon at portal.open-bio.org Mon Oct 6 11:17:16 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 6 Oct 2008 11:17:16 -0400 Subject: [Biopython-dev] [Bug 2604] test_Restriction failure with Python 2.6 (also cause error in test_CAPS) In-Reply-To: Message-ID: <200810061517.m96FHGBU018020@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2604 ------- Comment #3 from bsouthey at gmail.com 2008-10-06 11:17 EST ------- (In reply to comment #2) I changed the line: super(RestrictionType,cls).__init__(name, bases, dct) to super(RestrictionType,cls).__init__(cls, name, bases, dct) All the tests for Restriction passed for all my Python versions 2.3, 2.4, 2.5 and 2.6. So it appears that there no need to check the Python version - of course this needs at least a verification under Windows. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
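For readers following this on a current interpreter, here is a minimal metaclass sketch in Python 3 syntax. It is an illustration only and not the Bio.Restriction code: the names EnzymeType, registry and the EcoRI class body are assumptions made up for the example. It uses the three-argument call, because super() already binds cls, and that is the form type.__init__ accepts on Python 3.

```python
class EnzymeType(type):
    """Hypothetical metaclass: every enzyme is a class of its own."""

    # Maps class name to class; invented for this sketch.
    registry = {}

    def __init__(cls, name, bases, dct):
        # cls plays the role that self plays in an ordinary class.
        super().__init__(name, bases, dct)
        EnzymeType.registry[name] = cls


class EcoRI(metaclass=EnzymeType):
    """Hypothetical enzyme class built by the metaclass above."""

    site = "GAATTC"


print(type(EcoRI).__name__)
print(EnzymeType.registry["EcoRI"].site)
```

The point of the pattern, as described earlier in the thread, is that the class object itself does the work, so no instance of EcoRI is ever created.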
From bugzilla-daemon at portal.open-bio.org Mon Oct 6 11:40:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 6 Oct 2008 11:40:23 -0400 Subject: [Biopython-dev] [Bug 2613] New: test_Wise and test_psw fail under Python 2.3 Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2613 Summary: test_Wise and test_psw fail under Python 2.3 Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: minor Priority: P5 Component: Unit Tests AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com Under Python 2.3.7 on Linux x86_64 system, gcc 4.3.2 and numpy 1.1.1, both test_Wise and test_psw fail. The output is ====================================================================== FAIL: test_Wise ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 152, in runTest self.runSafeTest() File "run_tests.py", line 189, in runSafeTest expected_handle) File "run_tests.py", line 288, in compare_output assert expected_line == output_line, \ AssertionError: Output : 'doctest of Bio.Wise._build_align_cmdline ... ok\n' Expected: 'Doctest: Bio.Wise._build_align_cmdline ... ok\n' ====================================================================== FAIL: test_psw ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 152, in runTest self.runSafeTest() File "run_tests.py", line 189, in runSafeTest expected_handle) File "run_tests.py", line 288, in compare_output assert expected_line == output_line, \ AssertionError: Output : 'doctest of Bio.Wise.psw.parse_line ... ok\n' Expected: 'Doctest: Bio.Wise.psw.parse_line ... 
ok\n' ---------------------------------------------------------------------- -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Oct 6 12:22:45 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 6 Oct 2008 12:22:45 -0400 Subject: [Biopython-dev] [Bug 2613] test_Wise and test_psw fail under Python 2.3 In-Reply-To: Message-ID: <200810061622.m96GMjgk021883@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2613 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-06 12:22 EST ------- I think that's an annoying variation in doctest itself - we might need to add some magic to the test framework to cope with this. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Oct 6 12:30:20 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 6 Oct 2008 12:30:20 -0400 Subject: [Biopython-dev] [Bug 2604] test_Restriction failure with Python 2.6 (also cause error in test_CAPS) In-Reply-To: Message-ID: <200810061630.m96GUKxx022407@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2604 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-06 12:30 EST ------- Since it seems to work for older versions of python too, I've checked in the one line "super" change. See Bio/Restriction/Restriction.py revision 1.7 in CVS. Frédéric said to me by email that he will try and look into this further, so I'm leaving this bug open for now. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From zac at zacbrown.org Mon Oct 6 12:35:30 2008 From: zac at zacbrown.org (Zac Brown) Date: Mon, 06 Oct 2008 12:35:30 -0400 Subject: [Biopython-dev] taxonomic labels Message-ID: <48EA3E52.3070704@zacbrown.org> Hi all, Just a quick question with regard to using the Entrez module. I am looking for a way to get a dictionary for an organism's taxonomy, that is something like: blah = {'domain':'xyz','family':'xyz','class':'xyz'...} and so on. Is there some uniform way to generate this type of information? Thanks, Zac From biopython at maubp.freeserve.co.uk Mon Oct 6 13:10:52 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 6 Oct 2008 18:10:52 +0100 Subject: [Biopython-dev] taxonomic labels In-Reply-To: <48EA3E52.3070704@zacbrown.org> References: <48EA3E52.3070704@zacbrown.org> Message-ID: <320fb6e00810061010u490b9257x4f9a908917504d90@mail.gmail.com> On Mon, Oct 6, 2008 at 5:35 PM, Zac Brown wrote: > Hi all, > > Just a quick question with regard to using the Entrez module. I am looking > for a way to get a dictionary for an organism's taxonomy, that is something > like: > > blah = {'domain':'xyz','family':'xyz','class':'xyz'...} and so on. Is there > some uniform way to generate this type of information? > > Thanks, > > Zac This isn't really a question for the dev-mailing list, the general discussion list would be better. Anyway, have you looked at the taxonomy lineage entries? from Bio import Entrez ncbi_taxon_id = "9606" handle = Entrez.efetch(db="taxonomy",id=ncbi_taxon_id,retmode="XML") records = Entrez.read(handle) assert len(records)==1 lineage = records[0]["LineageEx"] print lineage This should contain the information you want, but there are a number of "no rank" entries. 
To turn it into a dictionary as requested, try something like the following (on python 2.4 or later): answer =dict((x["Rank"],x["ScientificName"]) for x in lineage if x["Rank"] <> "no rank") print answer Peter From fkauff at biologie.uni-kl.de Mon Oct 6 13:15:56 2008 From: fkauff at biologie.uni-kl.de (Frank Kauff) Date: Mon, 06 Oct 2008 19:15:56 +0200 Subject: [Biopython-dev] Python 2.6 In-Reply-To: <320fb6e00810060622g2f0f9107mc3c7528d5ce3cb21@mail.gmail.com> References: <320fb6e00810030544l510d76f7g93d805ec5840c1d4@mail.gmail.com> <320fb6e00810030952q74d595d6l89adf06890d5311@mail.gmail.com> <320fb6e00810060622g2f0f9107mc3c7528d5ce3cb21@mail.gmail.com> Message-ID: <48EA47CC.5070202@biologie.uni-kl.de> > Nexus/Nexus.py and Nexus/Trees.py - with my suggested fix, the unit > test output will currently say Set or set depending on the version of > python. A further minor change to test_Nexus.py would be needed to > cope with this. > > Nexus.py and Trees.py fixed in cvs (together with some other changes). test_Nexus.py has been changed by removing the troublesome output. I assume when printing the elements of a set, their order is undefined, and so such an output should not be part of a test because it could potentially fail. Frank From biopython at maubp.freeserve.co.uk Mon Oct 6 13:36:53 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 6 Oct 2008 18:36:53 +0100 Subject: [Biopython-dev] Python 2.6 In-Reply-To: <48EA47CC.5070202@biologie.uni-kl.de> References: <320fb6e00810030544l510d76f7g93d805ec5840c1d4@mail.gmail.com> <320fb6e00810030952q74d595d6l89adf06890d5311@mail.gmail.com> <320fb6e00810060622g2f0f9107mc3c7528d5ce3cb21@mail.gmail.com> <48EA47CC.5070202@biologie.uni-kl.de> Message-ID: <320fb6e00810061036w4f161de3o5ccefd8d0a8bcee1@mail.gmail.com> Frank wrote: > >Peter wrote: >> Nexus/Nexus.py and Nexus/Trees.py - with my suggested fix, the unit >> test output will currently say Set or set depending on the version of >> python. 
A further minor change to test_Nexus.py would be needed to >> cope with this. > > Nexus.py and Trees.py fixed in cvs (together with some other changes). Great. > test_Nexus.py has been changed by removing the troublesome output. I assume > when printing the elements of a set, their order is undefined, and so such > an output should not be part of a test because it could potentially fail. Yes, in theory we cannot expect the order of the elements in a set to be consistent - so this looks like a simple solution :) I'll rerun the test suite tomorrow on Python 2.6, but apart from Bio.Restriction I think we are OK on the set/Set issue. There's a complex __init__ / super issue in Bio.Restriction on Bug 2604 which may be solved (Eric is hoping to investigate further time permitting). Any additional eyes on this couldn't hurt. See http://bugzilla.open-bio.org/show_bug.cgi?id=2604 Are there any other python 2.6 issues? Peter From bugzilla-daemon at portal.open-bio.org Mon Oct 6 18:32:57 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 6 Oct 2008 18:32:57 -0400 Subject: [Biopython-dev] [Bug 2601] Seq find() method: proposal In-Reply-To: Message-ID: <200810062232.m96MWv4T017893@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2601 ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-06 18:32 EST ------- Could you try out the Bio/Seq.py revision 1.35 from CVS in which the Seq object now has a find() method which acts like that of a python string (plus strip and split - see Bug 2596)? Comments/revisions/improvements/objections here or on the mailing list please. We can also discuss additional behaviour, either as additional Seq methods (e.g. search? finditer?) or perhaps via additional arguments to find().
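For anyone trying this out, here is a small sketch of the plain Python string semantics that the new find() method is described as following. It uses only the built-in str type rather than the Seq object itself, and the example sequence is made up for illustration.

```python
# Plain str.find() semantics, which the new Seq.find() is described as following.
# The sequence below is an arbitrary example, not taken from any test file.
dna = "ATGGTCATGGA"

first = dna.find("ATG")        # index of the first match
second = dna.find("ATG", 1)    # search again, starting from index 1
missing = dna.find("TTT")      # str.find returns -1 when there is no match

print(first, second, missing)
```

Whether a Seq object should return -1 for a missing match, as str does, is the kind of detail the comment above invites feedback on.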
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Oct 6 18:36:08 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 6 Oct 2008 18:36:08 -0400 Subject: [Biopython-dev] [Bug 2596] Add string like split, strip, rstrip and lstrip methods to the Seq object In-Reply-To: Message-ID: <200810062236.m96Ma8Ax018176@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2596 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-06 18:36 EST ------- Checked in a variant of this code but with alphabet checking and additions to test_seq.py as well (plus a provisional .find() method - see Bug 2601). CVS changes: Bio/Seq.py revision 1.35 Tests/test_seq.py revision 1.18 Tests/output/test_seq revision 1.15 I'm marking this bug as fixed, but feel free to add any comments/revisions/improvements/objections here or on the mailing list please. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Oct 6 18:36:10 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 6 Oct 2008 18:36:10 -0400 Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string, even subclass string? In-Reply-To: Message-ID: <200810062236.m96MaAnb018189@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2351 Bug 2351 depends on bug 2596, which changed state.
Bug 2596 Summary: Add string like split, strip, rstrip and lstrip methods to the Seq object http://bugzilla.open-bio.org/show_bug.cgi?id=2596 What |Old Value |New Value ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Oct 6 19:45:05 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 6 Oct 2008 19:45:05 -0400 Subject: [Biopython-dev] [Bug 2613] test_Wise and test_psw fail under Python 2.3 In-Reply-To: Message-ID: <200810062345.m96Nj5Ui021964@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2613 ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2008-10-06 19:45 EST ------- If you look at test_psw and test_wise, you'll see that these make use of Python's generic test framework, with asserts in the test code. Instead, Biopython's testing framework expects each test code to print out stuff, which then gets matched to an output file. Sometimes it makes more sense to use Python's testing framework directly; there are several more tests for which the output file required by Biopython does not contain useful information (output/test_Cluster is another example). In such cases, I suggest we stop requiring the output file and simply rely on Python's testing framework directly. This will solve the issue with test_Wise and test_psw, and will let us get rid of unnecessary output files. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
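To make the contrast in comment #2 above concrete, here is a minimal sketch of a self-checking test written with Python's unittest module, where the asserts live in the test code and no expected-output file is needed. The function under test, the class name, and the run_suite helper are hypothetical placeholders, not taken from test_psw or test_Wise.

```python
import unittest


def reverse_string(text):
    """Hypothetical function under test, standing in for real Biopython code."""
    return text[::-1]


class TestReverse(unittest.TestCase):
    """Checks are asserted here rather than compared against an output file."""

    def test_reverse(self):
        self.assertEqual(reverse_string("ACGT"), "TGCA")

    def test_empty(self):
        self.assertEqual(reverse_string(""), "")


def run_suite():
    """Run the tests programmatically and return the unittest result object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(TestReverse)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    result = run_suite()
```

With verbosity=2 the runner prints one "... ok" line per test, which is output that carries no information a separate expected-output file would need to record.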
From bugzilla-daemon at portal.open-bio.org Tue Oct 7 04:03:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 7 Oct 2008 04:03:53 -0400 Subject: [Biopython-dev] [Bug 2543] Bio.Nexus.Trees can't handle named ancestors In-Reply-To: Message-ID: <200810070803.m9783rxd015866@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2543 fkauff at biologie.uni-kl.de changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #6 from fkauff at biologie.uni-kl.de 2008-10-07 04:03 EST ------- Nexus.Trees has been extended to deal with internal node names, or "special comments" in the format [& blablalba]. Such comments can appear directly after the taxon label, after the closing parentheses, or between branchlength / support values attached to a node or a taxon label, such as

    (a,(b,(c,d)[&hi there]))
    (a,(b[&hi there],c))
    (a,(b:0.123[&hi there],c[&heyho]:0.3))
    (a,(b,c)0.4[&comment]:0.95)

The comments are stored without change in the corresponding node object and can be accessed like

    >>> t=Trees.Tree('(a,(b:0.123[&hi there],c[&heyho]:0.3))')
    >>> print t.node(3).data.comment
    [&hi there]
    >>> print t.node(4).data.comment
    [&heyho]
    >>>

The comments are not parsed in any way - internal labels vary greatly in syntax, and are used to store all kinds of information. But at least they are now read and stored, and users can deal with them the way they like. Frank -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
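For readers unfamiliar with the "special comment" syntax described above, the following is a small standard-library sketch that pulls [&...] comments out of a tree string with a regular expression. It is not how Nexus.Trees does its parsing; the function name is hypothetical, and it assumes a comment never contains a closing square bracket.

```python
import re

# Matches a "special comment": an opening "[&", anything except "]", then "]".
# Assumption: comments never contain a nested "]".
SPECIAL_COMMENT = re.compile(r"\[&[^\]]*\]")


def find_special_comments(newick):
    """Return all [&...] comments in the order they appear in the string."""
    return SPECIAL_COMMENT.findall(newick)


comments = find_special_comments("(a,(b:0.123[&hi there],c[&heyho]:0.3))")
print(comments)
```

Unlike the Tree object, this only lists the comment strings and does not record which node each one is attached to.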
From bugzilla-daemon at portal.open-bio.org Tue Oct 7 13:07:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 7 Oct 2008 13:07:33 -0400 Subject: [Biopython-dev] [Bug 2613] test_Wise and test_psw fail under Python 2.3 In-Reply-To: Message-ID: <200810071707.m97H7XhN015588@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2613 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-07 13:07 EST ------- (In reply to comment #2) > If you look at test_psw and test_wise, you'll see that these make use of > Python's generic test framework, with asserts in the test code. Instead, > Biopython's testing framework expects each test code to print out stuff, > which then gets matched to an output file. Sometimes it makes more sense > to use Python's testing framework directly; there are several more tests > for which the output file required by Biopython does not contain useful > information (output/test_Cluster is another example). In such cases, I > suggest we stop requiring the output file and simply rely on Python's > testing framework directly. This will solve the issue with test_Wise and > test_psw, and will let us get rid of unnecessary output files. So if there is an expected output file, then run_tests.py will continue to do the comparison as now. However, if there is no output file it will instead just run the code - which presumably will throw an exception if something is wrong (even just an assert statement)? I haven't looked at run_tests.py to see how easy such a change would be, but in principle it sounds fine. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
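The behaviour discussed in comment #3 above could be sketched roughly as follows. This is only an illustration of the idea, not code from run_tests.py: check_test, run_test and expected_path are hypothetical names, and run_test() is assumed to return the text a test printed and to raise an exception if the test itself fails.

```python
import os


def check_test(run_test, expected_path):
    """Run a test callable; compare its output only if an expected file exists.

    run_test and expected_path are hypothetical names for this sketch;
    run_test() is assumed to return the text the test printed and to raise
    an exception (for example AssertionError) if the test itself fails.
    """
    output = run_test()
    if not os.path.isfile(expected_path):
        # No expected output file: getting here without an exception is a pass.
        return True
    with open(expected_path) as handle:
        expected = handle.read()
    # Compare line by line so a trailing newline difference does not matter.
    return output.splitlines() == expected.splitlines()
```

The open question in the thread, how to tell the two kinds of test apart reliably, is not settled by this sketch; it simply keys off whether the expected file exists.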
From bugzilla-daemon at portal.open-bio.org Tue Oct 7 19:17:50 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 7 Oct 2008 19:17:50 -0400 Subject: [Biopython-dev] [Bug 2613] test_Wise and test_psw fail under Python 2.3 In-Reply-To: Message-ID: <200810072317.m97NHo5v024624@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2613 ------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp 2008-10-07 19:17 EST ------- (In reply to comment #3) > So if there is an expected output file, then run_tests.py will continue to do > the comparison as now. However, if there is no output file it will instead > just run the code - which presumably will throw an exception if something is > wrong (even just an assert statement)? > A safer approach might be to check if the test generates any output, since tests that use an output file now print the name of the test first. Another approach is to do the output comparison inside of each test script that produces output instead of in run_tests.py. Basically, this means that the compare_output function in run_tests.py should be moved to a separate script, which gets imported by each test script that wants to use compare_output. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Oct 8 11:23:21 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 11:23:21 -0400 Subject: [Biopython-dev] [Bug 2547] Translation of ambiguous codons like NNN and TAN In-Reply-To: Message-ID: <200810081523.m98FNLYW026623@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2547 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 11:23 EST ------- Bug 2530 and Bug 2547 are fixed in CVS as of Bio/Seq.py revision 1.37 (with the unit test updated in test_seq.py revision 1.20).

Old behaviour (e.g. Biopython 1.48):
translate("TAT") -> "Y"
translate("TAG") -> "*"
translate("TAR") -> "*"
translate("TAN") -> TranslationError (Bug 2547)
translate("NNN") -> TranslationError (Bug 2547)
translate("TA?") -> "*" (Bug 2530)

New behaviour (CVS as things stand):
translate("TAT") -> "Y"
translate("TAG") -> "*"
translate("TAR") -> "*"
translate("TAN") -> "X"
translate("NNN") -> "X"
translate("TA?") -> TranslationError

Note that this new behaviour (translation of ambiguous possible stop codons) could be made optional for backwards compatibility, but I would be surprised if anyone would want the old behaviour. Also, we could make the possible stop character an optional argument, but that then brings up questions about how to represent this in the Alphabet objects. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
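The new behaviour listed in the comment above can be reproduced with a simplified standard-library sketch: expand each ambiguous base, translate every possible codon with the standard genetic code, and return "X" when the results disagree. This is not the Bio.Seq implementation. The names here are illustrative, a plain ValueError stands in for TranslationError, and cases the comment does not list (for example codons where Biopython might return an ambiguous amino acid letter) are simply mapped to "X".

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the usual TCAG ordering of the 64 codons.
CODON_TABLE = dict(
    ("".join(codon), aa) for codon, aa in zip(product(BASES, repeat=3), AMINO)
)

# IUPAC ambiguity codes (DNA only) expanded to the bases they stand for.
AMBIGUOUS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon):
    """Translate one codon, returning "X" when the possibilities disagree."""
    try:
        choices = [AMBIGUOUS[base] for base in codon.upper()]
    except KeyError:
        # Stand-in for TranslationError: the codon has a non-IUPAC letter.
        raise ValueError("Invalid codon: %r" % codon)
    results = set(CODON_TABLE["".join(c)] for c in product(*choices))
    if len(results) == 1:
        return results.pop()
    return "X"


for example in ["TAT", "TAG", "TAR", "TAN", "NNN"]:
    print(example, "->", translate_codon(example))
```

TAR stays "*" because both TAA and TAG are stops, while TAN covers two stops and two tyrosine codons and so becomes "X".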
From bugzilla-daemon at portal.open-bio.org Wed Oct 8 11:25:07 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 11:25:07 -0400 Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid codons as stops In-Reply-To: Message-ID: <200810081525.m98FP7cV026776@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2530 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 11:25 EST ------- (In reply to comment #13) > If there is agreement that changing the behaviour of Bio.Seq.translate() as > described in Bug 2547 is desirable, then we end up fixing both issues at the > same time. I think an agreement was reached. Bug 2530 and Bug 2547 are fixed in CVS as of Bio/Seq.py revision 1.37 (with the unit test updated in test_seq.py revision 1.20).

Old behaviour (e.g. Biopython 1.48):
translate("TAT") -> "Y"
translate("TAG") -> "*"
translate("TAR") -> "*"
translate("TAN") -> TranslationError (Bug 2547)
translate("NNN") -> TranslationError (Bug 2547)
translate("TA?") -> "*" (Bug 2530)

New behaviour (CVS as things stand):
translate("TAT") -> "Y"
translate("TAG") -> "*"
translate("TAR") -> "*"
translate("TAN") -> "X"
translate("NNN") -> "X"
translate("TA?") -> TranslationError

I dare say the implementation might be improved or optimised, but I think this is a good improvement for the functionality. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Wed Oct 8 11:35:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 11:35:39 -0400 Subject: [Biopython-dev] [Bug 2583] small bug in NCBIXML.py In-Reply-To: Message-ID: <200810081535.m98FZdsd027605@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2583 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 11:35 EST ------- As per comment 2, I'm assuming this is a duplicate of the previously reported issue (for which no bug was filed). Marking this bug as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 8 11:37:11 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 11:37:11 -0400 Subject: [Biopython-dev] [Bug 2528] NCBIStandalone.blastall(): Replace os.popen3 with subprocess.Popen In-Reply-To: Message-ID: <200810081537.m98FbBPr027708@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2528 ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 11:37 EST ------- See also Bug 2480 which suggests using the subprocess module to deal with Windows only issues with spaces in filenames. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
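As background to the subprocess suggestion above, here is a minimal sketch of calling a child process with the command given as a list, so that an argument containing spaces is passed through without manual escaping. It runs the current Python interpreter rather than BLAST, and the path shown is invented for illustration. Bug 2480 reports that shell=True and shell=False behaved differently on Windows and Mac OS X in testing, so this should be read as an illustration of the list form only, not as a verified cross-platform fix.

```python
import subprocess
import sys

# Passing the command as a list (with the default shell=False) hands each
# argument to the child process as-is, so spaces need no manual escaping.
# The argument below is a made-up path used only for illustration.
path_with_spaces = "C:\\Program Files\\blast\\my query.fasta"
child = subprocess.Popen(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", path_with_spaces],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)
stdout_text, stderr_text = child.communicate()
print(stdout_text.strip())
```

The child simply echoes back the one argument it received, which shows the path arriving intact as a single argument.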
From bugzilla-daemon at portal.open-bio.org Wed Oct 8 11:48:18 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 11:48:18 -0400 Subject: [Biopython-dev] [Bug 2589] Errors in running tests in 1.48 In-Reply-To: Message-ID: <200810081548.m98FmIDe028487@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2589 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 11:48 EST ------- I believe the only remaining issue on this bug is improving the failure/skip message from the BioSQL tests. On a CVS checkout, this can vary depending on if MySQLdb is installed or not. If MySQLdb is not installed, the message is: > Install MySQLdb or correct Tests/setup_BioSQL.py (not important if > you do not plan to use BioSQL). If MySQLdb is installed, currently setup_BioSQL.py includes the default settings used on http://www.biopython.org/wiki/BioSQL which if not setup gives: > Connection failed, check settings in Tests/setup_BioSQL.py > if you plan to use BioSQL: ... (The actual database driver error is included as I found this very helpful in actually getting BioSQL setup and working.) Alternatively, we can leave setup_BioSQL.py with missing settings, which would currently show the following message: > Enter your settings in Tests/setup_BioSQL.py > (not important if you do not plan to use BioSQL). My intention with setup_BioSQL.py was that it would all be "ready to go" for people trying out BioSQL following the wiki. People without mySQLdb installed wouldn't see a nasty message. The only downside (the message you saw) is for people who have mySQLdb installed, but have not setup BioSQL yet. I suggest we either leave this as it is, or change Tests/setup_BioSQL.py to have no default settings (making setting up and testing BioSQL just a little bit harder). 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 8 11:50:20 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 11:50:20 -0400 Subject: [Biopython-dev] [Bug 2600] enhance Seq and SeqRecord to new style classes In-Reply-To: Message-ID: <200810081550.m98FoKpU028611@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2600 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 11:50 EST ------- Does anyone have any objections to this three line change? It's "just" doing this to the Seq, MutableSeq and SeqRecord classes:

old: class Seq : ...
new: class Seq(object) : ...

Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 8 11:51:06 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 11:51:06 -0400 Subject: [Biopython-dev] [Bug 2251] [PATCH] NumPy support for BioPython In-Reply-To: Message-ID: <200810081551.m98Fp6pK028664@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2251 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #17 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 11:51 EST ------- I think this is all done now :) Marking as fixed -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Wed Oct 8 11:56:46 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 11:56:46 -0400 Subject: [Biopython-dev] [Bug 2475] BioSQL.Loader should reuse existing taxon entries in lineage In-Reply-To: Message-ID: <200810081556.m98Fuk0s029044@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2475 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #36 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 11:56 EST ------- as the main use cases are now covered, I'm marking this bug as fixed. For SeqRecord objects with no taxonomy information, nothing has changed. For SeqRecord objects with taxonomy information AND an NCBI taxon ID, we now record either the full taxonomy via Bio.Entrez if requested, or a stub entry which can be completed by running load_ncbi_taxonomy.pl later. For the atypical case of sequences with taxonomy information but NO NCBI taxon ID, the old behaviour continues - new entries will be created in the taxon tables for the given lineage, without attempting to match existing entries. To do this properly would require some clever heuristics. If this final situation is a real issue, we can re-visit this. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Oct 8 12:03:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 12:03:26 -0400 Subject: [Biopython-dev] [Bug 2592] numpy migration for Bio.PDB.Vector In-Reply-To: Message-ID: <200810081603.m98G3QUA029538@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2592 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 12:03 EST ------- Now that we have decided to drop Numeric support, it would be possible to press ahead with a gradual move from the numpy.oldnumeric.* to the new numpy.* API. Note that the suggested code would need to be tweaked slightly not to use scipy for the determinant. See: http://lists.open-bio.org/pipermail/biopython/2008-September/004509.html -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Wed Oct 8 13:12:07 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 8 Oct 2008 18:12:07 +0100 Subject: [Biopython-dev] Time to deprecate Bio.Transcribe? Message-ID: <320fb6e00810081012x54d82b44ga0f7bc0dcb0cf4b9@mail.gmail.com> In Biopython 1.48 the module Bio.Transcribe was described as obsolete, both in the docstring and the tutorial which also warned it was likely to be deprecated: > 3.9 Transcription and Translation Continued > > In the previous sections we talked about the transcription and > translation functions in the Bio.Seq module, which are intended > to be very simple and easy to use. > > There is also an older Bio.Translate module which has a few > more advanced options, but is more complicated to use. > Additionally there is also an older Bio.Transcribe module, but > as this is now obsolete and likely to be deprecated, we will not > discuss it here. 
So, I'd like to now deprecate Bio.Transcribe for the next release. Any objections/comments? Peter P.S. I'm also hoping that for the next release we can finish Bug 2381 as well, and then mark Bio.Translate as obsolete. From bugzilla-daemon at portal.open-bio.org Wed Oct 8 13:20:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 13:20:26 -0400 Subject: [Biopython-dev] [Bug 2509] Deprecating the .data property of the Seq and MutableSeq objects In-Reply-To: Message-ID: <200810081720.m98HKQjU002529@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2509 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- BugsThisDependsOn| |2600 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 13:20 EST ------- Just to note that issuing a deprecation warning requires using new style properties, which requires making the Seq and MutableSeq objects into new style classes - this was filed as a separate issue, Bug 2600. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
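To illustrate the dependency noted in comment #3 above, here is a minimal sketch of a deprecated attribute implemented as a property on a new style class. The class is a cut-down stand-in rather than the real Seq, and the private attribute name and the warning text are assumptions made for the example.

```python
import warnings


class Seq(object):
    """Cut-down stand-in for the real Seq class, for illustration only."""

    def __init__(self, data):
        self._data = data

    def __str__(self):
        return self._data

    def _get_data(self):
        # Warn on every read of the old attribute, then return the value.
        # The message wording here is an assumption, not Biopython's text.
        warnings.warn("Seq.data is deprecated; use str(my_seq) instead.",
                      DeprecationWarning, stacklevel=2)
        return self._data

    # property() only works fully on new style classes, hence Seq(object).
    data = property(_get_data)


my_seq = Seq("ACGT")
print(str(my_seq))
```

Reading my_seq.data still returns the string, but now triggers a DeprecationWarning pointing at the caller's line (via stacklevel=2).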
From bugzilla-daemon at portal.open-bio.org Wed Oct 8 13:20:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 13:20:29 -0400 Subject: [Biopython-dev] [Bug 2600] enhance Seq and SeqRecord to new style classes In-Reply-To: Message-ID: <200810081720.m98HKT9W002546@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2600 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- OtherBugsDependingO| |2509 nThis| | -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 8 13:25:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 13:25:23 -0400 Subject: [Biopython-dev] [Bug 2613] test_Wise and test_psw fail under Python 2.3 In-Reply-To: Message-ID: <200810081725.m98HPNCt002893@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2613 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 13:25 EST ------- (In reply to comment #4) > A safer approach might be to check if the test generates any output, since > tests that use an output file now print the name of the test first. That sounds fine - but currently test_Wise, test_psw and test_Cluster DO have some output, e.g. test_Cluster test_clusterdistance (test_Cluster.TestCluster) ... ok test_distancematrix_kmedoids (test_Cluster.TestCluster) ... ok test_kcluster (test_Cluster.TestCluster) ... ok test_matrix_parse (test_Cluster.TestCluster) ... ok test_median_mean (test_Cluster.TestCluster) ... ok test_somcluster (test_Cluster.TestCluster) ... ok test_treecluster (test_Cluster.TestCluster) ... 
ok ---------------------------------------------------------------------- Ran 7 tests in 0.015s OK > Another approach is to do the output comparison inside of each test script > that produces output instead of in run_tests.py. Basically, this means that > the compare_output function in run_tests.py should be moved to a separate > script, which gets imported by each test script that wants to use > compare_output. I can see what you have in mind here, but if we can avoid a separate "helper script" it would be nicer (and reduce end user confusion). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 8 15:44:05 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 15:44:05 -0400 Subject: [Biopython-dev] [Bug 2589] Errors in running tests in 1.48 In-Reply-To: Message-ID: <200810081944.m98Ji56C013548@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2589 ------- Comment #6 from bsouthey at gmail.com 2008-10-08 15:44 EST ------- (In reply to comment #5) > I believe the only remaining issue on this bug is improving the failure/skip > message from the BioSQL tests. On a CVS checkout, this can vary depending on > if MySQLdb is installed or not. > > If MySQLdb is not installed, the message is: > > > Install MySQLdb or correct Tests/setup_BioSQL.py (not important if > > you do not plan to use BioSQL). > > If MySQLdb is installed, currently setup_BioSQL.py includes the default > settings used on http://www.biopython.org/wiki/BioSQL which if not setup gives: > > > Connection failed, check settings in Tests/setup_BioSQL.py > > if you plan to use BioSQL: ... > > (The actual database driver error is included as I found this very helpful in > actually getting BioSQL setup and working.)
> > Alternatively, we can leave setup_BioSQL.py with missing settings, which would > currently show the following message: > > > Enter your settings in Tests/setup_BioSQL.py > > (not important if you do not plan to use BioSQL). > > My intention with setup_BioSQL.py was that it would all be "ready to go" for > people trying out BioSQL following the wiki. People without mySQLdb installed > wouldn't see a nasty message. The only downside (the message you saw) is for > people who have mySQLdb installed, but have not setup BioSQL yet. > > I suggest we either leave this as it is, or change Tests/setup_BioSQL.py to > have no default settings (making setting up and testing BioSQL just a little > bit harder). > I think that a user must be forced to change Tests/setup_BioSQL.py or similar because these settings may not be correct. Especially if dbuser is not root, dbuser lacks permissions and necessary privileges or dbuser has a password (security). So the current message you get if DBDRIVER is not defined is okay: "Enter your settings in Tests/setup_BioSQL.py (not important if you do not plan to use BioSQL)." A stray thought is to create a BioSQL configuration file (which is what Tests/setup_BioSQL.py is) when a user installs BioSQL. This would permit removing the need to enter that information when using BioSQL. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Oct 8 16:30:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 16:30:59 -0400 Subject: [Biopython-dev] [Bug 2589] Errors in running tests in 1.48 In-Reply-To: Message-ID: <200810082030.m98KUxO6016609@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2589 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 16:30 EST ------- (In reply to comment #6) > > My intention with setup_BioSQL.py was that it would all be "ready to go" for > > people trying out BioSQL following the wiki. People without mySQLdb > > installed wouldn't see a nasty message. The only downside (the message > > you saw) is for people who have mySQLdb installed, but have not setup > > BioSQL yet. > > > > I suggest we either leave this as it is, or change Tests/setup_BioSQL.py to > > have no default settings (making setting up and testing BioSQL just a little > > bit harder). > > > > I think that a user must be forced to change Tests/setup_BioSQL.py or similar > because these settings may not be correct. Especially if dbuser is not root, > dbuser lacks permissions and necessary privileges or dbuser has a password > (security). So the current message you get if DBDRIVER is not defined is okay: > > "Enter your settings in Tests/setup_BioSQL.py (not important if you do > not plan to use BioSQL)." Done in Tests/setup_BioSQL.py CVS revision 1.4, and I've also reworded http://www.biopython.org/wiki/BioSQL slightly as a result. Marking this bug as fixed. Thanks Bruce, Peter > A stray thought is to create a BioSQL configuration file (which is what > Tests/setup_BioSQL.py is) when a user installs BioSQL. 
This would permit > removing the need to enter that information when using BioSQL. Sadly I don't think that would be easy. Currently installing BioSQL is a largely manual process, with lots and lots of options (database program, name, username, password etc). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Oct 9 11:08:00 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 9 Oct 2008 11:08:00 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200810091508.m99F80WA030837@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #836 is|0 |1 obsolete| | ------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-09 11:07 EST ------- (From update of attachment 836) I've just added transcribe and back_transcribe methods to the Seq object in CVS. Bio/Seq.py revision 1.40 Tests/test_seq.py revision 1.24 Tests/output/test_seq revision 1.18 This bug is still open to cover the translation method(s). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Thu Oct 9 11:31:04 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 9 Oct 2008 16:31:04 +0100
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To:
References: <20080923120809.GG13074@localdomain>
Message-ID: <320fb6e00810090831x28015a2bg43931849acfecf34@mail.gmail.com>

Leighton Pritchard wrote:
> Hi all,
>
> It looks like Bio.DBXRef provides a dictionary of dictionaries that
> associate database identifiers from a number of file formats with the
> appropriate databases. This sort of thing might be useful to keep around
> (i.e. not to have to rebuild from scratch) if there is an intention to
> populate the dbxref table with consistent Dbnames for divergent identifiers.
> However, Peter appears to have noted in the code for Loader.py that this
> behaviour would be inconsistent with the other Bio* projects, and mentions
> bug 2405 in that context.
>
> L.

As things stand, we don't use this kind of mapping in BioSQL, so I see
no reason not to deprecate Bio.DBXRef now. Of course, I can be talked
out of this if anyone has a good use case example.

Brad wrote:
>> DBXref is associated with all the Martel parsing, so it can be
>> removed/deprecated as well. It was used in building SeqRecords from
>> Martel descriptions (Bio.builders.SeqRecord.sequence).

I've just marked Bio.DBXRef as deprecated for 1.49.

Returning to an earlier point on this thread, I have also removed
Bio.SGMLExtractor (which was deprecated in 1.46).

I think that wraps up the Martel/Mindy deprecations for now - in a few
releases' time we'll have the much simpler task of removing these
modules.

Peter

From biopython at maubp.freeserve.co.uk Thu Oct 9 12:22:05 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 9 Oct 2008 17:22:05 +0100
Subject: [Biopython-dev] Bio.Ndb obsolete?
Message-ID: <320fb6e00810090922s61bd6679we9377924d3b7fa5d@mail.gmail.com>

Hi all,

I just had a very superficial look at the Bio.Ndb module.
This is an HTML parser written six years ago, with its last real update five years ago. The given URL doesn't work, but the server is still up - however from first glance the whole page layout has changed. For example, compare the old HTML example under Bio/Ndb/PR0004.html to what seems to be the current equivalent: http://ndbserver.rutgers.edu/servlet/IDSearch.NDBSearch1?id=PR0004 I think it is safe to say Bio.Ndb stopped working some time ago due to the website's HTML changing. Does anyone here use this database? Maybe we should ask on the mailing list, and assuming no one is interested, just deprecate this code. For future, should we have a statement http://www.biopython.org/wiki/Contributing and in the tutorial that we don't want to add any HTML parsers to Biopython? Peter From biopython at maubp.freeserve.co.uk Thu Oct 9 13:19:11 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 9 Oct 2008 18:19:11 +0100 Subject: [Biopython-dev] Python 2.6 In-Reply-To: <320fb6e00810061036w4f161de3o5ccefd8d0a8bcee1@mail.gmail.com> References: <320fb6e00810030544l510d76f7g93d805ec5840c1d4@mail.gmail.com> <320fb6e00810030952q74d595d6l89adf06890d5311@mail.gmail.com> <320fb6e00810060622g2f0f9107mc3c7528d5ce3cb21@mail.gmail.com> <48EA47CC.5070202@biologie.uni-kl.de> <320fb6e00810061036w4f161de3o5ccefd8d0a8bcee1@mail.gmail.com> Message-ID: <320fb6e00810091019q64214738nbc8a55f19c1e5eaa@mail.gmail.com> Peter wrote, > I'll rerun the test suite tomorrow on Python 2.6, but apart from > Bio.Restriction I think we are OK on the the set/Set issue. > > There's a complex __init__ / super issue in Bio.Restriction on Bug > 2604 which may be solved (Eric is hoping to investigate further time > permitting). Any additional eyes on this couldn't hurt. See > http://bugzilla.open-bio.org/show_bug.cgi?id=2604 Using CVS, Bio.Restriction seems happy now - in addition to the "super" change for Bug 2604, I have also made the sets/set change. > Are there any other python 2.6 issues? 
I'd forgotten about the Bio.Crystal exception problem (we didn't file a bug on this): .../Bio/Crystal/__init__.py:42: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6 self.message = message Otherwise all core the tests pass on my Linux python 2.6 machine (skipping those needing reportlab, MySQLdb or other optional modules). Peter From biopython at maubp.freeserve.co.uk Thu Oct 9 16:21:15 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 9 Oct 2008 21:21:15 +0100 Subject: [Biopython-dev] Bio.Ndb obsolete? In-Reply-To: <320fb6e00810090922s61bd6679we9377924d3b7fa5d@mail.gmail.com> References: <320fb6e00810090922s61bd6679we9377924d3b7fa5d@mail.gmail.com> Message-ID: <320fb6e00810091321lb3ec34eua44aeeac462ced1c@mail.gmail.com> On Thu, Oct 9, 2008 at 5:22 PM, Peter wrote: > Hi all, > > I just had a very superficial look at the Bio.Ndb module. This is an > HTML parser written six years ago, with its last real update five > years ago. The given URL doesn't work, but the server is still up - > however from first glance the whole page layout has changed. > > For example, compare the old HTML example under Bio/Ndb/PR0004.html to > what seems to be the current equivalent: > http://ndbserver.rutgers.edu/servlet/IDSearch.NDBSearch1?id=PR0004 > > I think it is safe to say Bio.Ndb stopped working some time ago due to > the website's HTML changing. Does anyone here use this database? > Maybe we should ask on the mailing list, and assuming no one is > interested, just deprecate this code. If we do drop Bio.Ndb, then I wonder if the related Bio.Crystal module is still relevant? > For future, should we have a statement > http://www.biopython.org/wiki/Contributing and in the tutorial that we > don't want to add any HTML parsers to Biopython? I've made some fairly small changes to the wiki "Contributing" page, which includes this and also mentioning unit tests and documentation for code contributions. 
Peter

From biopython at maubp.freeserve.co.uk Thu Oct 9 16:56:43 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 9 Oct 2008 21:56:43 +0100
Subject: [Biopython-dev] Bio.mathfns obsolete? And Bio.clistfns too?
Message-ID: <320fb6e00810091356k36f1fca5ib431504eaeb83818@mail.gmail.com>

I'm still in clean up mode!

Until recently Bio.mathfns was used in Bio/NaiveBayes.py but that now
uses numpy more heavily instead. I think that Bio.mathfns (and its C
implementation) are no longer used anywhere in Biopython (and I would
be surprised if anyone else is using this module). I'm suggesting
deprecating Bio.mathfns and Bio.cmathfns for the next release.

Similarly, Bio.listfns and its C implementation Bio.clistfns might
also be deprecated with a little effort. Some of this code seems to
predate things like the python sets module (and its replacement, the
built in set). Based on a quick grep, only three modules currently
use Bio.listfns:

Bio.MarkovModel - uses only listfns.itemindex
Bio.NaiveBayes - uses listfns.itemindex, listfns.items and listfns.contents
Bio.MaxEntropy - uses listfns.itemindex and listfns.items

At first glance, listfns.items(...) might be replaced with
list(set(...)) leaving just two trivial functions, listfns.itemindex
and listfns.contents, which don't really justify an entire module (plus
C code). On the other hand, these may be performance bottlenecks for
Bio.NaiveBayes and Bio.MaxEntropy which could justify keeping the C
code.

Peter

From biopython at maubp.freeserve.co.uk Thu Oct 9 17:03:42 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 9 Oct 2008 22:03:42 +0100
Subject: [Biopython-dev] Bio.mathfns obsolete? And Bio.clistfns too? And Bio.stringfns?
Message-ID: <320fb6e00810091403k2c0d09bbk4a6962bd9e614ab3@mail.gmail.com>

On Thu, Oct 9, 2008 at 9:56 PM, Peter wrote:
> I'm still in clean up mode! ... I think that Bio.mathfns (and its C
> implementation) are no longer used anywhere in Biopython ...
> I'm suggesting deprecating Bio.mathfns and Bio.cmathfns for > the next release. > > Similarly, Bio.listfns and its C implementation Bio.clistfns might > also be deprecated with a little effort. ... And on a related note, I think Bio.stringfns and its C implementation Bio.cstringfns are also now unused in Biopython, and like Bio.mathfns and Bio.cmathfns should be deprecated for the next release. Peter From biopython at maubp.freeserve.co.uk Fri Oct 10 05:42:46 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 10 Oct 2008 10:42:46 +0100 Subject: [Biopython-dev] Python 2.6 In-Reply-To: <320fb6e00810091019q64214738nbc8a55f19c1e5eaa@mail.gmail.com> References: <320fb6e00810030544l510d76f7g93d805ec5840c1d4@mail.gmail.com> <320fb6e00810030952q74d595d6l89adf06890d5311@mail.gmail.com> <320fb6e00810060622g2f0f9107mc3c7528d5ce3cb21@mail.gmail.com> <48EA47CC.5070202@biologie.uni-kl.de> <320fb6e00810061036w4f161de3o5ccefd8d0a8bcee1@mail.gmail.com> <320fb6e00810091019q64214738nbc8a55f19c1e5eaa@mail.gmail.com> Message-ID: <320fb6e00810100242y30faa5f5od9ff344605344e27@mail.gmail.com> > I'd forgotten about the Bio.Crystal exception problem (we didn't file > a bug on this): Fixed in CVS. > Otherwise all core the tests pass on my Linux python 2.6 machine > (skipping those needing reportlab, MySQLdb or other optional modules). All core tests, plus the graphics ones using reportlab, and the BioSQL ones using MySQLdb, now pass on my Linux python 2.6 machine. The only things I have not covered are: test_GFF, test_PopGen_FDist, test_PopGen_SimCoal, test_Wise, test_psw which require additional command line tools etc. Note that with reportlab 2.2 under python 2.6 there is a deprecation warning from reportlab/pdfgen/canvas.py about md5, this has been fixed to use hashlib in the reportlab SVN. 
Note that with MySQLdb 1.2.2 under python 2.6 there is deprecation warning from MySQLdb/__init__.py about the sets module, which does not seem to have been fixed on the 1.2 branch or the trunk in their SVN. I have reported this issue as a bug on the MySQLdb sourceforge page. So, as far as I can see, we are OK with python 2.6 on Linux. We should probably try and get this tested on Windows and on the Mac too for completeness. Peter From biopython at maubp.freeserve.co.uk Fri Oct 10 10:39:50 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 10 Oct 2008 15:39:50 +0100 Subject: [Biopython-dev] Python 2.6 In-Reply-To: <320fb6e00810100242y30faa5f5od9ff344605344e27@mail.gmail.com> References: <320fb6e00810030544l510d76f7g93d805ec5840c1d4@mail.gmail.com> <320fb6e00810030952q74d595d6l89adf06890d5311@mail.gmail.com> <320fb6e00810060622g2f0f9107mc3c7528d5ce3cb21@mail.gmail.com> <48EA47CC.5070202@biologie.uni-kl.de> <320fb6e00810061036w4f161de3o5ccefd8d0a8bcee1@mail.gmail.com> <320fb6e00810091019q64214738nbc8a55f19c1e5eaa@mail.gmail.com> <320fb6e00810100242y30faa5f5od9ff344605344e27@mail.gmail.com> Message-ID: <320fb6e00810100739y7363c0efl8fdfa86455770666@mail.gmail.com> > > So, as far as I can see, we are OK with python 2.6 on Linux. We > should probably try and get this tested on Windows and on the Mac too > for completeness. > Something the unit tests didn't flag up is the deprecation of popen2, os.popen2, os.popen3, and os.popen4 in python 2.6 - see http://www.python.org/dev/peps/pep-0361/ Ignoring deprecated code, this affects the following modules: Bio.Application Bio.Blast.NCBIStandalone - see also Bug 2528 Bio.Clustalw Bio.Emboss.Applications Bio.PDB.NACCESS It might make sense to ensure all these used Bio.Application rather than re-inventing the wheel? We would then have a single point for calling command line tools, which could use the subprocess module on Python 2.4+, falling back on os.popen* for python 2.3. 
As a bonus this might cope with filenames with spaces better on
Windows.

While we are discussing this, does anyone know why
Bio.Blast.NCBIStandalone doesn't use Bio.Blast.Applications (which
subclasses Bio.Application)? Looking over the CVS, eight years ago in
revision 1.5 of Bio/Blast/NCBIStandalone.py Jeff added the code for
calling standalone BLAST. Then Brad added Bio/Blast/Applications.py
later (about six years ago).

Note that we also have plenty of modules using os.system too (where
there is no need to capture the command's output):

Bio.PDB.DSSP
Bio.PDB.NACCESS
Bio.PDB.PDBList
Bio.PDB.PSEA
Bio.PDB.ResidueDepth
Bio.Wise
Bio.PopGen.FDist.Controller
Bio.PopGen.SimCoal.Controller

Peter

From bugzilla-daemon at portal.open-bio.org Fri Oct 10 17:07:54 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 10 Oct 2008 17:07:54 -0400
Subject: [Biopython-dev] [Bug 2528] NCBIStandalone.blastall(): Replace os.popen3 with subprocess.Popen
In-Reply-To:
Message-ID: <200810102107.m9AL7sSq013518@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2528

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-10 17:07 EST -------
Note that os.popen3 is deprecated in python 2.6, which gives another reason
for moving to subprocess.

This issue is fixed in Bio/Blast/NCBIStandalone.py revision 1.82, based on
changes discussed on Bug 2480. We use subprocess where present (i.e. python
2.4+) and fall back to os.popen3 (for python 2.3).

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Oct 10 17:12:16 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 10 Oct 2008 17:12:16 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200810102112.m9ALCGi8013788@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #36 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-10 17:12 EST ------- Note that os.popen3 is deprecated in python 2.6 which gives another reason for moving to subprocess. After testing on Linux as well, I have updated Bio/Blast/NCBIStandalone.py in CVS revision 1.82, based on changes discussed here. See: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIStandalone.py?cvsroot=biopython We now use subprocess where present (i.e. python 2.4+) and fall back to os.popen3 (for python 2.3). This fixes Bug 2528, and should fix this as well (Bug 2480) - assuming we leave things as they are for spaces in the database argument. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
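
[Editorial sketch, not part of the original thread.] The shell=True versus
shell=False differences reported on this bug largely disappear when the
command is passed as a list of arguments rather than a single string. Below
is a minimal sketch of that idea for current Python; the helper name
run_tool is hypothetical and this is not the code in
Bio/Blast/NCBIStandalone.py. The Python interpreter itself stands in for
the BLAST executable.

```python
import subprocess
import sys


def run_tool(args):
    """Run a command given as a list of arguments.

    Hypothetical helper: returns (return code, stdout text, stderr text).
    With a list and the default shell=False, each argument reaches the
    child process as-is, so spaces in paths need no quoting or escaping.
    """
    process = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode the output to text
    )
    stdout, stderr = process.communicate()
    return process.returncode, stdout, stderr


if __name__ == "__main__":
    # The child simply echoes its first argument, which contains a space.
    echo = "import sys; print(sys.argv[1])"
    code, out, err = run_tool([sys.executable, "-c", echo, "my file.fasta"])
    print(code, out.strip())
```

The trade-off is that shell features (redirection, wildcards) are not
available this way, which is usually acceptable when calling a single
command line tool.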
From bugzilla-daemon at portal.open-bio.org Fri Oct 10 17:26:34 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 10 Oct 2008 17:26:34 -0400 Subject: [Biopython-dev] [Bug 2600] enhance Seq and SeqRecord to new style classes In-Reply-To: Message-ID: <200810102126.m9ALQYMH014735@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2600 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-10 17:26 EST ------- Change made in CVS, marking as fixed. Bio/Seq.py revision 1.42 Bio/SeqRecord.py revision 1.21 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Oct 10 17:26:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 10 Oct 2008 17:26:37 -0400 Subject: [Biopython-dev] [Bug 2509] Deprecating the .data property of the Seq and MutableSeq objects In-Reply-To: Message-ID: <200810102126.m9ALQb3Q014747@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2509 Bug 2509 depends on bug 2600, which changed state. Bug 2600 Summary: enhance Seq and SeqRecord to new style classes http://bugzilla.open-bio.org/show_bug.cgi?id=2600 What |Old Value |New Value ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Oct 10 18:03:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 10 Oct 2008 18:03:21 -0400
Subject: [Biopython-dev] [Bug 2525] The unit tests GUI run_tests.py does not track skipped tests
In-Reply-To:
Message-ID: <200810102203.m9AM3LgK017028@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2525

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-10 18:03 EST -------
Unit test GUI removed in CVS, marking as fixed.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Sat Oct 11 01:08:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 11 Oct 2008 01:08:34 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values
In-Reply-To:
Message-ID: <200810110508.m9B58Y2K013621@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480

------- Comment #37 from drpatnaik at yahoo.com 2008-10-11 01:08 EST -------
Thank you. Confirming that CVS version 1.82 of the file works fine on Windows
XP SP2 with Python 2.5.2.

A note: A custom script using Bio/Blast can appear to hang, and the results
file can be truncated, if the 'error handle' is read before the 'result
handle':

    res_hdl, err_hdl = NCBIStandalone.blastall(my_blast, 'blastn', my_db, my_seq)

    # OK
    my_result = res_hdl.read()
    my_error = err_hdl.read()

    # Not OK
    my_error = err_hdl.read()
    my_result = res_hdl.read()

Some recapitulated notes:

1. File-names, file-paths, or database values can contain spaces.
2. There is no special, Windows-specific requirement to use backslash (\) as
   the directory separator.
3. There is no special, Windows-specific requirement to enclose a value inside
   double-quotes (") instead of single-quotes ('), or to use Python's 'r'.
4. Except for database values, DOS 8.3 file-names (short file-names) can be
   used.
5. If the database value contains a space, it should be enclosed in
   double-quotes (").
6. If the database value refers to multiple databases, and at least one of
   them has a space in it, then the pointer for that database should be
   additionally enclosed in backslash-escaped double-quotes (\").

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Sat Oct 11 01:44:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 11 Oct 2008 01:44:50 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values
In-Reply-To:
Message-ID: <200810110544.m9B5iouP016206@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480

------- Comment #38 from drpatnaik at yahoo.com 2008-10-11 01:44 EST -------
(In reply to comment #37)
> 4. Except for database values, DOS 8.3 file-names (short file-names) can be
> used.

Sorry, short file-names _can_ be used for database values [but they cannot be
generated by win32api.GetShortPathName, etc.].

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
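
[Editorial sketch, not part of the original thread.] The hang noted in
comment #37 looks like the usual pipe deadlock: reading the error pipe to
its end waits for the child to exit, while the child is itself blocked
writing results into a full output pipe. A hedged sketch for current
Python, assuming direct use of the subprocess module rather than the
handles returned by NCBIStandalone.blastall; the child process below is a
stand-in that writes about 1 MB to stdout and then a short message to
stderr.

```python
import subprocess
import sys

# Stand-in child: large output on stdout first, then a note on stderr.
CHILD = "import sys; sys.stdout.write('x' * 1000000); sys.stderr.write('done')"

process = subprocess.Popen(
    [sys.executable, "-c", CHILD],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

# Calling process.stderr.read() first would block here: the child cannot
# finish (and close stderr) until someone drains the 1 MB waiting on stdout,
# which exceeds the OS pipe buffer. communicate() drains both pipes together,
# so the order in which the two results are used afterwards does not matter.
result, error = process.communicate()

print(len(result), error.decode())
```

When the two streams do have to be read by hand, reading the result handle
first (as in the "OK" ordering above) only helps while the error output
stays small; communicate() avoids relying on that.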
From bugzilla-daemon at portal.open-bio.org Sat Oct 11 07:52:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 11 Oct 2008 07:52:37 -0400 Subject: [Biopython-dev] [Bug 2524] Handle missing libraries like numpy or reportlab in run_tests.py In-Reply-To: Message-ID: <200810111152.m9BBqb6x006207@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2524 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|normal |minor Summary|Handle missing libraries |Handle missing libraries |like TextTools in |like numpy or reportlab in |run_tests.py |run_tests.py ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-11 07:52 EST ------- After the switch from Numeric to numpy, and the deprecation of Martel/Mindy, this only applies to two libraries: import numpy import reportlab Retitling bug, and downgrading to minor. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Oct 11 08:37:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 11 Oct 2008 08:37:49 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200810111237.m9BCbndK009847@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #17 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-11 08:37 EST ------- For the sake of discussion, here is a simple (i.e. minimal) translate method for the Seq object (any checked in code should also simplify the current Seq module's translate function to call this for Seq objects). 
def translate(self, table="Standard", stop_symbol="*"):
    """Turns a nucleotide sequence into a protein sequence (amino acids).

    This method will translate DNA or RNA sequences, but for a protein
    sequence an exception is raised.

    table - Which codon table to use? This can be either a name
            (string) or an NCBI identifier (integer).

    NOTE - Ambiguous codons like "TAN" or "NNN" could be an amino acid
    or a stop codon. These are translated as "X". Any invalid codon
    (e.g. "TA?" or "T-A") will throw a TranslationError.

    NOTE - Does NOT support gapped sequences.

    NOTE - This does NOT behave like the python string's translate
    method. For that use str(my_seq).translate(...) instead.
    """
    # Decide whether the table was given as an NCBI integer id or as a name.
    try:
        table_id = int(table)
    except ValueError:
        table_id = None
    if isinstance(self.alphabet, Alphabet.ProteinAlphabet):
        raise ValueError("Proteins cannot be translated!")
    # Pick the codon table matching the sequence's alphabet.
    if self.alphabet == IUPAC.unambiguous_dna:
        if table_id is None:
            codon_table = CodonTable.unambiguous_dna_by_name[table]
        else:
            codon_table = CodonTable.unambiguous_dna_by_id[table_id]
    elif self.alphabet == IUPAC.ambiguous_dna:
        if table_id is None:
            codon_table = CodonTable.ambiguous_dna_by_name[table]
        else:
            codon_table = CodonTable.ambiguous_dna_by_id[table_id]
    elif self.alphabet == IUPAC.unambiguous_rna:
        if table_id is None:
            codon_table = CodonTable.unambiguous_rna_by_name[table]
        else:
            codon_table = CodonTable.unambiguous_rna_by_id[table_id]
    elif self.alphabet == IUPAC.ambiguous_rna:
        if table_id is None:
            codon_table = CodonTable.ambiguous_rna_by_name[table]
        else:
            codon_table = CodonTable.ambiguous_rna_by_id[table_id]
    else:
        if table_id is None:
            codon_table = CodonTable.ambiguous_generic_by_name[table]
        else:
            codon_table = CodonTable.ambiguous_generic_by_id[table_id]
    protein = _translate_str(str(self), codon_table, stop_symbol)
    # Only flag the alphabet as containing stops if one was actually produced.
    if stop_symbol in protein:
        alphabet = Alphabet.HasStopCodon(codon_table.protein_alphabet,
                                         stop_symbol=stop_symbol)
    else:
        alphabet = codon_table.protein_alphabet
    return Seq(protein, alphabet)

Unlike my earlier
comment 11, I'm now leaning to a single translation method (perhaps with
extra arguments).

You'll notice here I am suggesting using the method name "translate" even
though this clashes with the python string method of the same name. This
could cause confusion if the Seq object is passed to non-Biopython code which
expects a string, but overall seems much simpler for end users.

Other method names could be:
* translate_ (trailing underscore, see PEP8) which I think is ugly.
* translation (noun rather than verb), differs from established style.
* bio_translate which I think is too long.

I'm thinking we could also support "start" and "end" optional arguments (named
after those used in the python string methods, and behaving in the same way)
for specifying a sub-sequence to be translated. Using start=0, 1 or 2 would
give the three forward reading frames.

An optional boolean argument could enable treating the sequence as a CDS -
verifying it starts with a start codon (which would always be translated as M)
and verifying it ends with a stop codon (with no other stop codons in frame),
which would not be translated. Following BioPerl, this argument could be
called "complete".

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
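
[Editorial sketch, not part of the original thread.] To make the proposed
"start"/"end" arguments and a stop-at-first-stop-codon option concrete,
here is a small self-contained sketch that does not use Biopython. The
function name translate_str, the to_stop parameter and the inlined
standard table are illustrative assumptions, not what was checked into
Bio/Seq.py; ambiguous codons are rejected here rather than translated as
"X", and the "complete" CDS check is not attempted.

```python
from itertools import product

# Standard genetic code (NCBI table 1), with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = dict(
    zip(("".join(codon) for codon in product(BASES, repeat=3)), AMINO_ACIDS)
)


def translate_str(sequence, start=0, end=None, stop_symbol="*", to_stop=False):
    """Translate unambiguous DNA or RNA, given as a plain string.

    start, end - slice of the sequence to translate (string-style indices).
    to_stop - if True, stop at the first in-frame stop codon and leave
              the stop itself out of the result.
    """
    region = sequence[start:end].upper().replace("U", "T")
    protein = []
    # Walk over complete codons only; one or two trailing bases are ignored.
    for i in range(0, len(region) - len(region) % 3, 3):
        codon = region[i:i + 3]
        if codon not in STANDARD_TABLE:
            raise ValueError("Cannot translate codon %r" % codon)
        amino_acid = STANDARD_TABLE[codon]
        if amino_acid == "*":
            if to_stop:
                break
            amino_acid = stop_symbol
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    seq = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
    print(translate_str(seq))                # MAIVMGR*KGAR*
    print(translate_str(seq, to_stop=True))  # MAIVMGR
    # The three forward reading frames, as suggested for start=0, 1 or 2:
    for frame in range(3):
        print(frame, translate_str(seq, start=frame))
```

In this sketch start/end are simply applied as a slice before translating,
so translate_str(seq, start=i, end=j) and translate_str(seq[i:j]) give the
same answer; the open question in the comment is only whether offering both
spellings on the Seq object is worthwhile.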
From biopython at maubp.freeserve.co.uk Mon Oct 13 08:00:08 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 13 Oct 2008 13:00:08 +0100 Subject: [Biopython-dev] Python 2.6 In-Reply-To: <320fb6e00810100739y7363c0efl8fdfa86455770666@mail.gmail.com> References: <320fb6e00810030544l510d76f7g93d805ec5840c1d4@mail.gmail.com> <320fb6e00810030952q74d595d6l89adf06890d5311@mail.gmail.com> <320fb6e00810060622g2f0f9107mc3c7528d5ce3cb21@mail.gmail.com> <48EA47CC.5070202@biologie.uni-kl.de> <320fb6e00810061036w4f161de3o5ccefd8d0a8bcee1@mail.gmail.com> <320fb6e00810091019q64214738nbc8a55f19c1e5eaa@mail.gmail.com> <320fb6e00810100242y30faa5f5od9ff344605344e27@mail.gmail.com> <320fb6e00810100739y7363c0efl8fdfa86455770666@mail.gmail.com> Message-ID: <320fb6e00810130500mf1f20c1gc04f1aa782d5e1f@mail.gmail.com> > Something the unit tests didn't flag up is the deprecation of popen2, > os.popen2, os.popen3, and os.popen4 in python 2.6 - see > http://www.python.org/dev/peps/pep-0361/ Some progress: Bio.Blast.NCBIStandalone - fixed in CVS, uses subprocess where available Bio.Application - fixed in CVS, uses subprocess where available These two changes passed my own hand testing, but as we don't have any unit tests covering these having a 3rd party double check would be a good idea. Bio.Clustalw - actually only uses os.popen which is still OK Bio.Emboss.Applications - only via Bio.Application, so OK Leaving just: Bio.PDB.NACCESS - uses os.popen3, looks simple to update but I don't have naccess installed yet. I suppose this would make a nice unit test too. 
See http://www.bioinf.manchester.ac.uk/naccess/ Peter From bugzilla-daemon at portal.open-bio.org Tue Oct 14 06:16:17 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 14 Oct 2008 06:16:17 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200810141016.m9EAGHma005952@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-14 06:16 EST ------- We seem to have reached a consensus on the mailing list to use "translate" for the Seq object method (even though this clashes with the python string method of the same name). See: http://lists.open-bio.org/pipermail/biopython/2008-October/004575.html I've checked some code based on that in comment 17 into CVS, and updated the test_seq.py unit test to cover this: Bio/Seq.py revision 1.44 Tests/test_seq.py revision 1.26 I'm leaving this bug open to discuss possible further optional arguments for the translate method (and perhaps for the Bio.Seq.translate function too). e.g. As I wrote in comment 17, > I'm thinking we could also support "start" and "end" optional arguments (named > after those used in the python string methods, and behaving in the same way) > for specifying a sub-sequence to be translated. Using start=0, 1 or 2 would > give the three forward reading frames. This would give an alternative to: my_seq[i:j].translate(table) as: my_seq.translate(table, start=i, end=j) As with the python string methods, potentially the implementation could be slightly faster as a new Seq object doesn't need to be created for the slice. On the other hand, it does then offer two ways of doing the same thing. 
> An optional boolean argument could enable treating the sequence as a CDS - > verifying it starts with a start codon (which would always be translated as M) > and verifying it ends with a stop codon (with no other stop codons in frame), > which would not be translated. Following BioPerl, this argument could be > called "complete". Related to this, it would be useful to have a boolean option to stop translation at the first in frame stop codon (possible argument names for this include "stop" if not used as above, "to_stop", "auto_stop", "terminate" etc). For comparison, see the translate_to_stop method in the semi-obsolete Bio.Translate.Translator object. We will also need to support back_translate before we can deprecate the old Bio.Translate module (see comment 6 and comment 7). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 14 12:42:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 14 Oct 2008 12:42:41 -0400 Subject: [Biopython-dev] [Bug 2616] New: BioSQL support for Psycopg2 Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2616 Summary: BioSQL support for Psycopg2 Product: Biopython Version: 1.48 Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P2 Component: BioSQL AssignedTo: biopython-dev at biopython.org ReportedBy: cymon.cox at gmail.com Biopython 1.48 BioSQL does not support the psycopg2 PostgreSQL driver (http://www.initd.org/pub/software/psycopg/). Current support is for the psycopg1 driver only - the latest of which is 3 yrs old and no longer developed. As far as I can tell the only change is to how autocommit is flagged. 
PATCH: ========================================================================= diff -ruN BioSQL/BioSeqDatabase.py /usr/local/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py --- BioSQL/BioSeqDatabase.py 2008-08-27 17:34:16.000000000 +0100 +++ /usr/local/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py 2008-10-14 15:57:07.000000000 +0100 @@ -53,7 +53,7 @@ if kw.has_key("passwd"): kw["password"] = kw["passwd"] del kw["passwd"] - if driver == "psycopg" and not kw.get("database"): + if driver in ["psycopg", "psycopg2"] and not kw.get("database"): kw["database"] = "template1" try: conn = connect(**kw) @@ -134,7 +134,7 @@ # 1. PostgreSQL can load it all at once and actually needs to # due to FUNCTION defines at the end of the SQL which mess up # the splitting by semicolons - if self.module_name in ["psycopg"]: + if self.module_name in ["psycopg", "psycopg2"]: self.adaptor.cursor.execute(sql) # 2. MySQL needs the database loading split up into single lines of # SQL executed one at a time diff -ruN BioSQL/DBUtils.py /usr/local/lib/python2.5/site-packages/BioSQL/DBUtils.py --- BioSQL/DBUtils.py 2008-03-21 10:48:32.000000000 +0000 +++ /usr/local/lib/python2.5/site-packages/BioSQL/DBUtils.py 2008-10-14 15:57:28.000000000 +0100 @@ -68,7 +68,17 @@ def autocommit(self, conn, y = True): conn.autocommit(y) + _dbutils["psycopg"] = Psycopg_dbutils + +class Psycopg2_dbutils(Psycopg_dbutils): + def autocommit(self, conn, y = True): + if y: + conn.set_isolation_level(0) + else: + conn.set_isolation_level(1) + +_dbutils["psycopg2"] = Psycopg2_dbutils class Pgdb_dbutils(Generic_dbutils): """Add support for pgdb in the PyGreSQL database connectivity package. ======================================================================== Tests/test_BioSQL.py : [cymon at chara Tests]$ python test_BioSQL.py Load SeqRecord objects into a BioSQL database. ... ok Get a list of all items in the database. ... ok Test retrieval of items using various ids. ... 
ok Make sure Seqs from BioSQL implement the right interface. ... ok Check SeqFeatures of a sequence. ... ok Make sure SeqRecords from BioSQL implement the right interface. ... ok Check that slices of sequences are retrieved properly. ... ok Make sure all records are correctly loaded. ... ok Indepth check that SeqFeatures are transmitted through the db. ... ok ---------------------------------------------------------------------- Ran 9 tests in 19.749s OK With a tweak to test_BioSQL_SeqIO.py : 154 else : 155 #Should both be lists of strings... 156 old_f.qualifiers[key].sort() 157 new_f.qualifiers[key].sort() 158 assert old_f.qualifiers[key] == new_f.qualifiers[key] One record in the tests has two \allele features "T" and "C" so they need to be sorted before comparison. $ python test_BioSQL_SeqIO.py > out $ diff out output/test_BioSQL_SeqIO 0a1 > test_BioSQL_SeqIO $ BUT both _FAIL_ when run with the run_tests.py. The short exercises in the BioSQL wiki (after the unit tests) also run fine. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 14 13:25:24 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 14 Oct 2008 13:25:24 -0400 Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2 In-Reply-To: Message-ID: <200810141725.m9EHPOgt003394@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2616 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-14 13:25 EST ------- Supporting psycopg2 sounds good :) What version of Biopython do you have? 1.48 or CVS as there have been some BioSQL changes recently (mostly to do with the taxonomy tables). I'm surprised the order of the qualifiers isn't being preserved - I think we should fix that rather than tweaking test_BioSQL_SeqIO.py to ignore this. 
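For reference, the test tweak quoted above amounts to an order-insensitive comparison of two qualifier lists. A minimal sketch using plain lists rather than SeqFeature objects (the `same_qualifiers` helper is hypothetical), which also avoids sorting the lists in place:

```python
def same_qualifiers(old_values, new_values):
    """Compare two lists of qualifier strings, ignoring their order."""
    # sorted() returns new lists, so the features under test are not modified
    # (unlike calling .sort() on the qualifier lists themselves).
    return sorted(old_values) == sorted(new_values)


order_ignored = same_qualifiers(["T", "C"], ["C", "T"])
content_checked = same_qualifiers(["T", "C"], ["T", "G"])
```

This only hides the ordering difference; it does not explain why the order changed on the way through the database.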
Which version of the BioSQL schema do you have? It is possible that this is a BioSQL issue/difference in the PostgreSQL schema compared to the BioSQL schema which I have been using when running the tests. Also your problem about the two tests failing when run via run_tests.py is concerning. What happens if you do this: python run_tests.py -g test_BioSQL python run_tests.py -g test_BioSQL_SeqIO python run_tests.py test_BioSQL test_BioSQL_SeqIO cvs diff output/test_BioSQL output/test_BioSQL_SeqIO Thanks Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 14 14:09:08 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 14 Oct 2008 14:09:08 -0400 Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2 In-Reply-To: Message-ID: <200810141809.m9EI98Bs007989@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2616 ------- Comment #2 from cymon.cox at gmail.com 2008-10-14 14:09 EST ------- (In reply to comment #1) > Supporting psycopg2 sounds good :) > > What version of Biopython do you have? 1.48 or CVS 1.48 > as there have been some > BioSQL changes recently (mostly to do with the taxonomy tables). I loaded taxonomy in with the Pg driver and load_ncbi_taxonomy.pl with no problem. > I'm surprised the order of the qualifiers isn't being preserved - I think we > should fix that rather than tweaking test_BioSQL_SeqIO.py to ignore this. Sure, that's probably a better approach :) > Which version of the BioSQL schema do you have? biosql-1.0.1 > It is possible that this is a > BioSQL issue/difference in the PostgreSQL schema compared to the BioSQL schema > which I have been using when running the tests. > > Also your problem about the two tests failing when run via run_tests.py is > concerning. 
What happens if you do this: > > python run_tests.py -g test_BioSQL > python run_tests.py -g test_BioSQL_SeqIO > python run_tests.py test_BioSQL test_BioSQL_SeqIO > cvs diff output/test_BioSQL output/test_BioSQL_SeqIO It really is broken when using run_tests.py, after running with the -g flag: $ cat output/test_BioSQL test_BioSQL Load SeqRecord objects into a BioSQL database. ... ERROR Get a list of all items in the database. ... ERROR Test retrieval of items using various ids. ... ERROR Make sure Seqs from BioSQL implement the right interface. ... ERROR Check SeqFeatures of a sequence. ... ERROR Make sure SeqRecords from BioSQL implement the right interface. ... ERROR Check that slices of sequences are retrieved properly. ... ERROR Make sure all records are correctly loaded. ... ERROR Indepth check that SeqFeatures are transmitted through the db. ... ERROR etc... Probably not the solution we're looking for... The problem is that run_test.py is not picking up the psycopg2 adapter and is deferring to the generic adapter, consequently it throws on "InternalError: DROP DATABASE cannot run inside a transaction block". Why that's the case when individually the test work OK is something I tried to track this down but just couldn't figure it... Cheers, C. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
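The failure mode described in this comment (falling back to the generic adaptor, so that DROP DATABASE ends up inside a transaction block) can be illustrated with a small standalone sketch of a name-keyed registry shaped like the `_dbutils` dictionary in the patch. Everything here is a simplified assumption: `FakeConnection`, `get_dbutils` and the class bodies are hypothetical stand-ins that only record what would be called, and no real database driver is imported.

```python
# Hypothetical sketch; mirrors the shape of the patch, not BioSQL itself.
class GenericDBUtils:
    def autocommit(self, conn, y=True):
        # The generic helper does nothing here, so statements would stay
        # inside a transaction block.
        pass


class Psycopg2DBUtils(GenericDBUtils):
    def autocommit(self, conn, y=True):
        # As in the patch: isolation level 0 for autocommit, 1 otherwise.
        conn.set_isolation_level(0 if y else 1)


_dbutils = {"psycopg2": Psycopg2DBUtils}


def get_dbutils(module_name):
    """Return the helper registered for a driver name, or the generic one."""
    return _dbutils.get(module_name, GenericDBUtils)()


class FakeConnection:
    """Stand-in for a driver connection; just remembers the isolation level."""

    def __init__(self):
        self.isolation_level = 1

    def set_isolation_level(self, level):
        self.isolation_level = level


conn = FakeConnection()
get_dbutils("some_unknown_driver").autocommit(conn)  # falls back, no change
level_after_fallback = conn.isolation_level
get_dbutils("psycopg2").autocommit(conn)  # registered helper is used
level_after_psycopg2 = conn.isolation_level
```

If the driver name is not found in the registry, the connection silently keeps its transactional isolation level, which matches the symptom reported above.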
From bugzilla-daemon at portal.open-bio.org Wed Oct 15 05:14:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 15 Oct 2008 05:14:47 -0400 Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2 In-Reply-To: Message-ID: <200810150914.m9F9Elra032490@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2616 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-15 05:14 EST ------- (In reply to comment #2) > (In reply to comment #1) > > Supporting psycopg2 sounds good :) > > > > What version of Biopython do you have? 1.48 or CVS as there have > > been some BioSQL changes recently (mostly to do with the taxonomy > > tables). > > 1.48 > Could you update to Biopython CVS please? This now populates the taxon/taxon_name tables differently when there is an NCBI taxon ID (with the option to fetch lineages from Entrez). Once you're running CVS, could you attach a patch to this bug. That should make it easier for me to look at this. > > Which version of the BioSQL schema do you have? > > biosql-1.0.1 Good. > > Also your problem about the two tests failing when run via run_tests.py is > > concerning. What happens if you do this: > > > > python run_tests.py -g test_BioSQL > > python run_tests.py -g test_BioSQL_SeqIO > > python run_tests.py test_BioSQL test_BioSQL_SeqIO > > cvs diff output/test_BioSQL output/test_BioSQL_SeqIO > > It really is broken when using run_tests.py, after running with the -g flag: > $ cat output/test_BioSQL > test_BioSQL > Load SeqRecord objects into a BioSQL database. ... ERROR > Get a list of all items in the database. ... ERROR > Test retrieval of items using various ids. ... ERROR > Make sure Seqs from BioSQL implement the right interface. ... ERROR > Check SeqFeatures of a sequence. ... ERROR > Make sure SeqRecords from BioSQL implement the right interface. ... ERROR > Check that slices of sequences are retrieved properly. ... 
ERROR > Make sure all records are correctly loaded. ... ERROR > Indepth check that SeqFeatures are transmitted through the db. ... ERROR > etc... > > Probably not the solution we're looking for... This was really a diagnostic step, rather than a solution. > The problem is that run_test.py is not picking up the psycopg2 adapter and is > deferring to the generic adapter, consequently it throws on "InternalError: > DROP DATABASE cannot run inside a transaction block". Why that's the case when > individually the test work OK is something I tried to track this down but just > couldn't figure it... You must have edited test_setup_BioSQL.py correctly, so that's probably not the problem. Where did you install psycopg2? Using run_tests.py does some magic with the python path to make sure the local copy of Biopython you've just built is used, rather than any existing system installation of Biopython. Perhaps this is preventing python from finding psycopg2 somehow. You don't have any test files present called psycopg2.py do you? Alternatively, maybe there is something wrong with your adaptor code - but presumably this works outside the test suite? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From lpritc at scri.ac.uk Wed Oct 15 06:00:31 2008 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Wed, 15 Oct 2008 11:00:31 +0100 Subject: [Biopython-dev] Bio.Graphics and GenomeDiagram Message-ID: Hi, A while ago I wrote the GenomeDiagram library for drawing images of genomes and other large biological sequences, and collections of sequences (http://bioinf.scri.ac.uk/lp/programs.php#genomediagram). This library already uses Biopython objects (Seq, SeqFeature, etc.) and, like other modules in Bio.Graphics, has a dependency on Reportlab only. 
It's been published, and has found use in other groups, who seem to be using it without any issues - there's been a trickle of maintenance requests, but nothing of late other than questions from people new to Python. Now that I have managed to free up a little bit of time I'd like to revisit GenomeDiagram, tidy up the internals some more (there's some clunky stuff in there...), and contribute it to Bio.Graphics - which hasn't seen much traffic for a while. Looking at the current Bio.Graphics structure, I think that incorporating the (revised) library as Bio.Graphics.GenomeDiagram in a directory under Bio.Graphics would be a suitable approach. I'm happy to maintain this code for the foreseeable future, also - though help is, of course, welcome. There is written documentation, which I would happily move over to the wiki, and some testing in __name__ == '__main__', which could be expanded upon and moved over to a unit test format for consistency. One of the things I would like to do to expand on current functionality is to provide some library methods that produce commonly-desired output, similar to that in GenomeAtlas (http://www.cbs.dtu.dk/services/GenomeAtlas/), so that users don't have to know about the internals of GenomeDiagram, and something like a Bio.Graphics.GenomeDiagram.draw_seqrecord_cds(style='circular', gc_content=True, outfile='cds1.pdf') call would produce a simple circular diagram of CDS features with accompanying graph of GC content. I suggested something doing similar a while ago and got no feedback - does anyone object to this contribution, in principle or in practice? Or are there any other comments? I'm all (well, mostly) ears... L. 
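As a rough illustration of the data behind the GC content graph mentioned above, here is a minimal sliding-window sketch in plain Python. It is not GenomeDiagram code; the `gc_windows` name and the window and step sizes are assumptions for illustration only.

```python
def gc_windows(seq, window=4, step=2):
    """Return (position, GC fraction) pairs for windows along seq."""
    seq = seq.upper()
    points = []
    # The +1 makes sure the final full window is included.
    for i in range(0, len(seq) - window + 1, step):
        chunk = seq[i:i + window]
        gc = chunk.count("G") + chunk.count("C")
        points.append((i, gc / window))
    return points


profile = gc_windows("GGCCATATGCGC")
```

A convenience function along the lines proposed would presumably compute something like `profile` internally and hand it to the drawing code.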
-- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system. Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). 
______________________________________________________________________ From biopython at maubp.freeserve.co.uk Wed Oct 15 06:24:42 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 15 Oct 2008 11:24:42 +0100 Subject: [Biopython-dev] Bio.Graphics and GenomeDiagram In-Reply-To: References: Message-ID: <320fb6e00810150324j4fd20253i86c0001cfb143bfd@mail.gmail.com> On Wed, Oct 15, 2008 at 11:00 AM, Leighton Pritchard wrote: > Hi, > > A while ago I wrote the GenomeDiagram library for drawing images of genomes > and other large biological sequences, and collections of sequences > (http://bioinf.scri.ac.uk/lp/programs.php#genomediagram). ... > Now that I have managed to free up a little bit of time I'd like to revisit > GenomeDiagram, tidy up the internals some more ..., and contribute it > to Bio.Graphics .. > I suggested something doing similar a while ago and got no feedback - does > anyone object to this contribution, in principle or in practice? Or are > there any other comments? I'm all (well, mostly) ears... I'm in favour of this (and have actually chatted to Leighton about this off list). One small thing I would change is switching colour to color for the argument/properties (the American spelling of color is the norm in all programming usage). Anyone using the existing stand alone GenomeDiagram library would have to make some small changes anyway (new import statements), so if there are going to be any other API changes it would be best to do them at the same time. 
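One hypothetical way to ease such a rename for existing users (an assumption for illustration, not something GenomeDiagram is known to do) would be to keep accepting the old spelling for a while and issue a warning. A minimal sketch with a made-up `Feature` class:

```python
import warnings


class Feature:
    """Hypothetical drawable feature; only the naming pattern matters here."""

    def __init__(self, color="black", colour=None):
        if colour is not None:
            # The old spelling still works but nudges callers to update.
            warnings.warn("'colour' is deprecated, use 'color'",
                          DeprecationWarning, stacklevel=2)
            color = colour
        self.color = color


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    feature = Feature(colour="red")
```

Internally only `color` is stored, so the rest of the code has a single spelling to deal with.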
Peter From lpritc at scri.ac.uk Wed Oct 15 06:39:11 2008 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Wed, 15 Oct 2008 11:39:11 +0100 Subject: [Biopython-dev] Bio.Graphics and GenomeDiagram In-Reply-To: <320fb6e00810150324j4fd20253i86c0001cfb143bfd@mail.gmail.com> Message-ID: On 15/10/2008 11:24, "Peter" wrote: > On Wed, Oct 15, 2008 at 11:00 AM, Leighton Pritchard > wrote: >> Hi, >> >> A while ago I wrote the GenomeDiagram library for drawing images of genomes >> and other large biological sequences, and collections of sequences >> (http://bioinf.scri.ac.uk/lp/programs.php#genomediagram). ... >> Now that I have managed to free up a little bit of time I'd like to revisit >> GenomeDiagram, tidy up the internals some more ..., and contribute it >> to Bio.Graphics .. >> I suggested something doing similar a while ago and got no feedback - does >> anyone object to this contribution, in principle or in practice? Or are >> there any other comments? I'm all (well, mostly) ears... > One small thing I would change is switching colour to color for the > argument/properties (the American spelling of color is the norm in all > programming usage). Fair point - I'll do that. Though, like those pesky Canadians (see Maple), I'm inclined to permit either spelling out of sheer bloody-mindedness ;). Historically (rather than etymologically), it's a holdover from working with interim EMBL-ish .tab files from Sanger, which use the British English spelling: """ FT /class="3.1.03" FT /colour=7 FT /gene="asnA" """ Would people see permitting either form of colour/color as potentially confusing? If so, I'm happy to go with the majority spelling. > Anyone using the existing stand alone > GenomeDiagram library would have to make some small changes anyway > (new import statements), so if there are going to be any other API > changes it would be best to do them at the same time. I agree. 
I see this as a break from the standalone library, and would be branching this version of GenomeDiagram from what had gone before. While I'd like to make API changes as low-impact as possible, I'm in favour of such changes where they support functional improvement. Cheers, L. -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405
From biopython at maubp.freeserve.co.uk Wed Oct 15 07:41:25 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 15 Oct 2008 12:41:25 +0100 Subject: [Biopython-dev] Sequences and simple plots In-Reply-To: <320fb6e00809261429i464e0ee8qe81f7090c2141292@mail.gmail.com> References: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com> <5321F5EB-F2C1-4D1A-9A67-878C695C0945@northwestern.edu> <320fb6e00809251239i6308d6b9i6a334701ce1cd5f1@mail.gmail.com> <320fb6e00809260315x62634eadw6b0dd17e074bdeb2@mail.gmail.com> <320fb6e00809260911rf91432cp8f89904330550d6b@mail.gmail.com> <0ACA5A64-645F-4D1F-AC93-EB23D983C987@northwestern.edu> <320fb6e00809260928u4182ee34la768e7fe9f1f7842@mail.gmail.com> <52356F04-48AA-454D-A0F6-83E24BBD03EE@northwestern.edu> <320fb6e00809261429i464e0ee8qe81f7090c2141292@mail.gmail.com> Message-ID: <320fb6e00810150441t23250eeeqe44bb07cc6480595@mail.gmail.com> On Fri, Sep 26 Peter wrote: > On Fri, Sep 26 Jared wrote: >> On Sep 26 Peter wrote: >> >>> Did you try the dot-plot example? >> >> I didn't, but it looked good. > > Hopefully I've pitched it right - I've tried to make it as simple as > possible, but the nested list comprehension is perhaps non-obvious. Old output: http://biopython.org/DIST/docs/tutorial/images/dot_plot.png I recently wanted to draw a dot plot for a larger pair of sequences, and found that the example code didn't scale well. There were two issues, the naive calculation and the fact that pylab.imshow has an upper limit for the size of matrix (due to memory).
I've added a second more complicated version to the Tutorial in CVS using pylab.scatter for the plotting:

http://biopython.org/DIST/docs/tutorial/images/dot_plot_scatter.png

#Load two SeqRecord objects
from Bio import SeqIO
handle = open("ls_orchid.fasta")
record_iterator = SeqIO.parse(handle, "fasta")
rec_one = record_iterator.next()
rec_two = record_iterator.next()
handle.close()

window = 7
step = 1

#Map every window sized sub-sequence's location in a dict
dict_one = {}
dict_two = {}
for (seq, section_dict) in [(rec_one.seq.tostring().upper(), dict_one),
                            (rec_two.seq.tostring().upper(), dict_two)] :
    for i in range(0, len(seq)-window, step) :
        section = seq[i:i+window]
        try :
            section_dict[section].append(i)
        except KeyError :
            section_dict[section] = [i]
#Now find any sub-sequences found in both sequences
matches = set(dict_one).intersection(dict_two)
print "%i unique matches" % len(matches)

#Create lists of x and y co-ordinates for scatter plot
x = []
y = []
for section in matches :
    for i in dict_one[section] :
        for j in dict_two[section] :
            x.append(i)
            y.append(j)

#Now draw it
import pylab
pylab.gray()
pylab.scatter(x,y)
pylab.xlim(0, len(rec_one)-window)
pylab.ylim(0, len(rec_two)-window)
pylab.xlabel("%s (length %i bp)" % (rec_one.id, len(rec_one)))
pylab.ylabel("%s (length %i bp)" % (rec_two.id, len(rec_two)))
pylab.title("Dot plot using window size %i\n(allowing no mis-matches)" % window)
pylab.show()

Using pylab.scatter is still a bit slow, but it does actually work. I was wondering if this dot-plot code were to use reportlab instead, would it make a sensible addition to the Bio.Graphics module?
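The window-indexing step above does not depend on pylab or on the orchid file, so it can be tried on its own. A minimal sketch in current Python with two short made-up sequences (the `index_windows` and `dot_plot_points` names are hypothetical, and unlike the loop above the range here includes the final window):

```python
from collections import defaultdict


def index_windows(seq, window):
    """Map each window-sized sub-sequence to the list of its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)
    return positions


def dot_plot_points(seq_one, seq_two, window):
    """Return sorted (i, j) pairs where the two sequences share a window."""
    dict_one = index_windows(seq_one.upper(), window)
    dict_two = index_windows(seq_two.upper(), window)
    points = []
    # Only sub-sequences present in both dictionaries can give a dot.
    for section in set(dict_one).intersection(dict_two):
        for i in dict_one[section]:
            for j in dict_two[section]:
                points.append((i, j))
    return sorted(points)


points = dot_plot_points("ACGTACGT", "TTACGTAA", window=4)
```

The resulting list of pairs is what would be unpacked into the x and y lists for a scatter plot, whichever plotting library ends up being used.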
Peter From bugzilla-daemon at portal.open-bio.org Wed Oct 15 08:29:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 15 Oct 2008 08:29:33 -0400 Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2 In-Reply-To: Message-ID: <200810151229.m9FCTXmq017109@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2616 ------- Comment #4 from cymon.cox at gmail.com 2008-10-15 08:29 EST ------- Created an attachment (id=1006) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1006&action=view) Psycopg2 support for BioSQL -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 15 08:31:34 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 15 Oct 2008 08:31:34 -0400 Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2 In-Reply-To: Message-ID: <200810151231.m9FCVYMn017238@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2616 ------- Comment #5 from cymon.cox at gmail.com 2008-10-15 08:31 EST ------- (In reply to comment #3) > (In reply to comment #2) > > (In reply to comment #1) > > > Supporting psycopg2 sounds good :) > > > > > > What version of Biopython do you have? 1.48 or CVS as there have > > > been some BioSQL changes recently (mostly to do with the taxonomy > > > tables). > > > > 1.48 > > > > Could you update to Biopython CVS please? This now populates the > taxon/taxon_name tables differently when there is an NCBI taxon ID (with the > option to fetch lineages from Entrez). > > Once you're running CVS, could you attach a patch to this bug. That should > make it easier for me to look at this. 
OK, so a clean install from CVS seemed to do the trick and now both tests pass from within the test suite after applying the patch (attached) [cymon at chara Tests]$ cat setup_BioSQL.py |grep "DBDRIVER \=" 16:#DBDRIVER = 'MySQLdb' 19:DBDRIVER = 'psycopg2' [cymon at chara Tests]$ python run_tests.py test_BioSQL.py test_BioSQL_SeqIO.py test_BioSQL ... ok test_BioSQL_SeqIO ... ok ---------------------------------------------------------------------- Ran 2 tests in 32.882s OK (NB: the buglet in preserving qualifier order - sorted the lists in test_BioSQL_SeqIO.py in order to pass test). Cheers, C. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From lpritc at scri.ac.uk Wed Oct 15 08:57:37 2008 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Wed, 15 Oct 2008 13:57:37 +0100 Subject: [Biopython-dev] Sequences and simple plots In-Reply-To: <320fb6e00810150441t23250eeeqe44bb07cc6480595@mail.gmail.com> Message-ID: On 15/10/2008 12:41, "Peter" wrote: > Using pylab.scatter is still a bit slow, but it does actually work. I > was wondering if this dot-plot code were to use reportlab instead, > would it make a sensible addition to the Bio.Graphics module? I'd welcome it as an addition there. Maybe there are other small functions of convenience that might find a home there? A graphical rendering of a BLAST record, for example... L. -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405
From bugzilla-daemon at portal.open-bio.org Wed Oct 15 09:42:55 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 15 Oct 2008 09:42:55 -0400 Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2 In-Reply-To: Message-ID: <200810151342.m9FDgtc3022442@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2616 ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-15 09:42 EST ------- (In reply to comment #5) > OK, so a clean install from CVS seemed to do the trick and now both tests pass > from within the test suite after applying the patch (attached) OK, good. I guess you had CVS tests against old Biopython or something like that happening. I've had a quick look at the patch - it looks fine to me, but I have not actually tested it.
In BioSQL/BioSeqDatabase.py there is a comment about building a DSN being required for older releases of psycopg - do we still need this for psycopg2? I guess it doesn't hurt. > (NB: the buglet in preserving qualifier order - sorted the lists in > test_BioSQL_SeqIO.py in order to pass test). I suggest you check in this patch, plus the tweak to test_BioSQL_SeqIO.py (conditional on the database driver) with a "TODO" comment next to it about checking why the order wasn't preserved. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Wed Oct 15 10:09:30 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 15 Oct 2008 15:09:30 +0100 Subject: [Biopython-dev] Sequences and simple plots In-Reply-To: References: <320fb6e00810150441t23250eeeqe44bb07cc6480595@mail.gmail.com> Message-ID: <320fb6e00810150709u2aed9855kb8cf91318f287765@mail.gmail.com> On Wed, Oct 15, 2008 at 1:57 PM, Leighton Pritchard wrote: > > On 15/10/2008 12:41, Peter wrote: > >> Using pylab.scatter is still a bit slow, but it does actually work. I >> was wondering if this dot-plot code were to use reportlab instead, >> would it make a sensible addition to the Bio.Graphics module? > > I'd welcome it as an addition there. Maybe there are other small functions > of convenience that might find a home there? A graphical rendering of a > BLAST record, for example... Sequence logos are another obvious little addition. 
Peter
From lpritc at scri.ac.uk Wed Oct 15 10:16:23 2008 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Wed, 15 Oct 2008 15:16:23 +0100 Subject: [Biopython-dev] Sequences and simple plots In-Reply-To: <320fb6e00810150709u2aed9855kb8cf91318f287765@mail.gmail.com> Message-ID: On 15/10/2008 15:09, "Peter" wrote: > On Wed, Oct 15, 2008 at 1:57 PM, Leighton Pritchard wrote: >> >> On 15/10/2008 12:41, Peter wrote: >> >>> Using pylab.scatter is still a bit slow, but it does actually work. I >>> was wondering if this dot-plot code were to use reportlab instead, >>> would it make a sensible addition to the Bio.Graphics module? >> >> I'd welcome it as an addition there. Maybe there are other small functions >> of convenience that might find a home there? A graphical rendering of a >> BLAST record, for example... > > Sequence logos are another obvious little addition. Also eyecharts (like sequence logos, but in a grid), and graphical rendering of HMMs as state diagrams could be useful. There's some Python code for rendering logos at http://code.google.com/p/weblogo/ - maybe they'd like to contribute, or the code could be adapted? L. -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405
From biopython at maubp.freeserve.co.uk Wed Oct 15 10:42:52 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 15 Oct 2008 15:42:52 +0100 Subject: [Biopython-dev] Biopython with python 2.6 on Windows Message-ID: <320fb6e00810150742s44a0eacdm4e50cbbcc7d560c0@mail.gmail.com> Has anyone been able to try out Biopython CVS with python 2.6 on Windows? I don't think ANY version of numpy is available pre-compiled for python 2.6 on Windows yet, so we can't easily try the numpy dependent parts of Biopython. However, checking everything without a numpy and/or C dependency should be fairly straightforward... Peter
From bugzilla-daemon at portal.open-bio.org Wed Oct 15 12:39:18 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 15 Oct 2008 12:39:18 -0400 Subject: [Biopython-dev] [Bug 2551] Adding advanced __getitem__ to generic alignment, e.g.
align[1:2, 5:-5] In-Reply-To: Message-ID: <200810151639.m9FGdILQ008553@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2551 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-15 12:39 EST ------- Until recently we supported only: align[r] gives a row as a SeqRecord Updated in CVS to support row-slicing: align[start:end:step] gives a new (sub)alignment e.g. align[1:5] - new four row sub-alignment align[::2] - sub alignment using every second row align[:] - makes a copy align[::-1] - makes a copy with the row order reversed The current implementation could be improved after fixing enhancement Bug 2554. This leaves the door open for double indexes as previously outlined (blocking on Bug 2507). -- In reply to Jose's comment 1 and comment 2, this really is a complete replacement for the current alignment object, and would be better off on a separate bug. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 15 12:43:30 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 15 Oct 2008 12:43:30 -0400 Subject: [Biopython-dev] [Bug 2591] GenBank files misparsed for long organism names In-Reply-To: Message-ID: <200810151643.m9FGhUZM009070@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2591 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-15 12:43 EST ------- Hi Joel, Did you get any reply from the NCBI on this issue? Thanks, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
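As an illustration of the row-slicing behaviour listed in the Bug 2551 comment earlier in this digest, here is a minimal sketch. The class name RowAlignment is hypothetical, rows are plain strings rather than SeqRecord objects, and this is not the code that went into CVS; it only shows how an integer index can return a row while a slice returns a new (sub)alignment.

```python
class RowAlignment(object):
    """Hypothetical stand-in for an alignment: an ordered list of rows."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, index):
        if isinstance(index, int):
            # align[r] gives a single row
            return self._rows[index]
        if isinstance(index, slice):
            # align[start:end:step] gives a new (sub)alignment
            return RowAlignment(self._rows[index])
        raise TypeError("Invalid index type.")


align = RowAlignment(["ACGT-A", "ACGTTA", "AC-TTA",
                      "ACGATA", "ACG-TA", "TCGTTA"])
sub = align[1:5]          # new four row sub-alignment
every_other = align[::2]  # sub-alignment using every second row
copied = align[:]         # a copy
reverse = align[::-1]     # a copy with the row order reversed
```

Double indexes such as align[1:2, 5:-5] would arrive in __getitem__ as a tuple, which this sketch deliberately rejects.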
From bugzilla-daemon at portal.open-bio.org Wed Oct 15 13:21:03 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 15 Oct 2008 13:21:03 -0400 Subject: [Biopython-dev] [Bug 2509] Deprecating the .data property of the Seq and MutableSeq objects In-Reply-To: Message-ID: <200810151721.m9FHL3Xc012897@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2509 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-15 13:21 EST ------- Checking in Seq.py; /home/repository/biopython/biopython/Bio/Seq.py,v <-- Seq.py new revision: 1.46; previous revision: 1.45 done Checking in ../DEPRECATED; /home/repository/biopython/biopython/DEPRECATED,v <-- DEPRECATED new revision: 1.31; previous revision: 1.30 done The Seq object's .data is now a new style property and will issue a warning if written to. We can then easily make this into a read only property for the next release (and perhaps make even reading the property trigger a warning). If we do keep the MutableSeq's data property as read/write, it should check the alphabet if Bug 2597 is fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
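The Bug 2509 comment above describes a new style property that warns when written to. A minimal sketch of that pattern follows; the class name SeqSketch and the warning text are assumptions for illustration, not the contents of Bio/Seq.py revision 1.46.

```python
import warnings


class SeqSketch(object):
    """Hypothetical stand-in for a Seq-like class; not the Bio.Seq code."""

    def __init__(self, data):
        self._data = data

    def _get_data(self):
        return self._data

    def _set_data(self, value):
        # Writing still works for now, but the caller is warned.
        warnings.warn("Writing to the .data property is deprecated.",
                      DeprecationWarning, stacklevel=2)
        self._data = value

    data = property(fget=_get_data, fset=_set_data,
                    doc="Sequence as a string (writing is deprecated).")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    s = SeqSketch("ACGT")
    s.data = "TTTT"  # triggers the DeprecationWarning
```

Making the property read only later would amount to dropping the fset argument, after which assignment raises AttributeError.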
From robert.cadena at gmail.com Wed Oct 15 13:32:59 2008 From: robert.cadena at gmail.com (Robert Cadena) Date: Wed, 15 Oct 2008 10:32:59 -0700 Subject: [Biopython-dev] Bio.Graphics and GenomeDiagram In-Reply-To: References: <320fb6e00810150324j4fd20253i86c0001cfb143bfd@mail.gmail.com> Message-ID: I'd been working on and off on writing clones of bioperl's and bioruby's graphics libraries: http://www.bioperl.org/wiki/HOWTO:Graphics http://bio-graphics.rubyforge.org/ I have very rudimentary drawing of tracks and vertically shifting subfeatures as can be seen here: http://machine501.com/images/bio_graphics_test_1.jpg The prep code to draw that is very similar to the bioruby example: -- p = Panel(100, start=10, width=480, pad_left=10, pad_right=10) generic_track = p.add_track('generic', glyph=GenericGlyph, label="Constant") directed_track = p.add_track('directed', glyph=DirectedBoxGlyph, label="Variable Test") generic_track.add_feature(SeqFeature(FeatureLocation(250, 375), 'clone1')) generic_track.add_feature(SeqFeature(FeatureLocation(54, 124), 'clone2')) --- I'd be happy to volunteer some time to help with GenomeDiagram and maybe there's the possibility of incorporating the bit of code I have to create a bioperl::graphics library clone. thanks. /r On Wed, Oct 15, 2008 at 3:39 AM, Leighton Pritchard wrote: > On 15/10/2008 11:24, "Peter" wrote: > >> On Wed, Oct 15, 2008 at 11:00 AM, Leighton Pritchard >> wrote: >>> Hi, >>> >>> A while ago I wrote the GenomeDiagram library for drawing images of genomes >>> and other large biological sequences, and collections of sequences >>> (http://bioinf.scri.ac.uk/lp/programs.php#genomediagram). ... >>> Now that I have managed to free up a little bit of time I'd like to revisit >>> GenomeDiagram, tidy up the internals some more ..., and contribute it >>> to Bio.Graphics .. >>> I suggested something doing similar a while ago and got no feedback - does >>> anyone object to this contribution, in principle or in practice? 
Or are >>> there any other comments? I'm all (well, mostly) ears... > >> One small thing I would change is switching colour to color for the >> argument/properties (the American spelling of color is the norm in all >> programming usage). > > Fair point - I'll do that. Though, like those pesky Canadians (see > Maple), I'm inclined to permit either spelling out of sheer > bloody-mindedness ;). Historically (rather than etymologically), it's a > holdover from working with interim EMBL-ish .tab files from Sanger, which > use the British English spelling: > > """ > FT /class="3.1.03" > FT /colour=7 > FT /gene="asnA" > """ > > Would people see permitting either form of colour/color as potentially > confusing? If so, I'm happy to go with the majority spelling. > >> Anyone using the existing stand alone >> GenomeDiagram library would have to make some small changes anyway >> (new import statements), so if there are going to be any other API >> changes it would be best to do them at the same time. > > I agree. I see this as a break from the standalone library, and would be > branching this version of GenomeDiagram from what had gone before. While > I'd like to make API changes as low-impact as possible, I'm in favour of > such changes where they support functional improvement. > > Cheers, > > L. > > -- > Dr Leighton Pritchard MRSC > D131, Plant Pathology Programme, SCRI > Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA > e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard > gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 > >
> _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From lpritc at scri.ac.uk Thu Oct 16 05:17:57 2008 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Thu, 16 Oct 2008 10:17:57 +0100 Subject: [Biopython-dev] Bio.Graphics and GenomeDiagram In-Reply-To: Message-ID: On 15/10/2008 18:32, "Robert Cadena" wrote: > I'd been working on and off on writing clones of bioperl's and > bioruby's graphics libraries: > http://www.bioperl.org/wiki/HOWTO:Graphics > http://bio-graphics.rubyforge.org/ [...] > maybe there's the possibility of incorporating the bit of code I have > to create a bioperl::graphics library clone. I would see GenomeDiagram as existing alongside a Bioperl::Graphics clone, providing extra functionality that (for now) is not present in Bioperl/Bioruby, so I don't see our approaches clashing on that level. 
> I'd be happy to volunteer some time to help with GenomeDiagram Thanks Robert, that's very welcome. The way I would like to move forward is to branch code off from the current version of GenomeDiagram, to make it work as though it's part of Biopython (sitting under Bio.Graphics), then neaten up the internals, add unit tests and documentation, before adding enhancement features/fixing a couple of outstanding issues. You can get a copy of the current GenomeDiagram code and documentation at http://bioinf.scri.ac.uk/lp/programs.php, and I'm happy to field design questions/comments either here or off-list. Initially I had thought to handle the first stages of this (up to and including neatening internals) myself before seeking code inclusion in Biopython, as I have an informal plan for what needs to be done, already. I'm open to advice and suggestions - including "I can do all that, if you like" - though ;) Cheers, L. -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 
From bugzilla-daemon at portal.open-bio.org Thu Oct 16 05:53:50 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 16 Oct 2008 05:53:50 -0400 Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2 In-Reply-To: Message-ID: <200810160953.m9G9roWp029842@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2616 ------- Comment #7 from cymon.cox at gmail.com 2008-10-16 05:53 EST ------- (In reply to comment #6) > (In reply to comment #5) > > OK, so a clean install from CVS seemed to do the trick and now both tests pass > > from within the test suite after applying the patch (attached) > > OK, good. I guess you had CVS tests against old Biopython or something like > that happening. > > I've had a quick look at the patch - it looks fine to me, but I have not > actually tested it. > > In BioSQL/BioSeqDatabase.py there is a comment about building a DSN being > required for older releases of psycopg - do we still need this for psycopg2? Apparently not (tests run after editing out code...). > I > guess it doesn't hurt. I assume we want to maintain support for the older psycopg driver. Unfortunately, I cant get the old drivers to compile on my box - they configure, but gcc chokes on the make... I cant see why the patch should effect the operation of the old driver, but it would be worth checking. 
> > (NB: the buglet in preserving qualifier order - sorted the lists in > > test_BioSQL_SeqIO.py in order to pass test). > > I suggest you check in this patch, plus the tweak to test_BioSQL_SeqIO.py > (conditional on the database driver) with a "TODO" comment next to it about > checking why the order wasn't preserved. Unfortunately, I dont have direct CVS access - proxy hassles - I downloaded the cvs tarball previously. Besides I'm not a developer. Perhaps, someone else would check it in. Cheers, C. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Oct 16 06:18:28 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 16 Oct 2008 06:18:28 -0400 Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2 In-Reply-To: Message-ID: <200810161018.m9GAISK8031655@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2616 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-16 06:18 EST ------- (In reply to comment #7) > (In reply to comment #6) > > > > In BioSQL/BioSeqDatabase.py there is a comment about building a DSN being > > required for older releases of psycopg - do we still need this for psycopg2? > > Apparently not (tests run after editing out code...). > I assume we want to maintain support for the older psycopg driver. > Unfortunately, I cant get the old drivers to compile on my box - they > configure, but gcc chokes on the make... I cant see why the patch should > effect the operation of the old driver, but it would be worth checking. OK - but I'll leave it in rather than risk breaking the old psycopg driver. > > > (NB: the buglet in preserving qualifier order - sorted the lists in > > > test_BioSQL_SeqIO.py in order to pass test). 
> > > > I suggest you check in this patch, plus the tweak to test_BioSQL_SeqIO.py > > (conditional on the database driver) with a "TODO" comment next to it > > about checking why the order wasn't preserved. > > Unfortunately, I dont have direct CVS access - proxy hassles - I > downloaded the cvs tarball previously. Besides I'm not a developer. > Perhaps, someone else would check it in. Sorry - I knew Frank had CVS access and had assumed you did too. If you think you'll need CVS access, send an email on the dev-list. In the meantime, I'm happy to check this in on your behalf. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Oct 16 06:57:52 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 16 Oct 2008 06:57:52 -0400 Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2 In-Reply-To: Message-ID: <200810161057.m9GAvqRp001465@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2616 ------- Comment #9 from cymon.cox at gmail.com 2008-10-16 06:57 EST ------- (In reply to comment #8) > (In reply to comment #7) > > (In reply to comment #6)In the meantime, I'm > happy to check this in on your behalf. Yes, please do - thanks Peter. Cheers, C. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Oct 16 11:57:00 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 16 Oct 2008 11:57:00 -0400 Subject: [Biopython-dev] [Bug 2618] New: back_translate method for the Seq object (in Bio.Seq)? 
Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2618 Summary: back_translate method for the Seq object (in Bio.Seq)? Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Should we add a back_translate method to the Seq object (mirroring the translate method added on Bug 2381)? Mailing list discussion: http://lists.open-bio.org/pipermail/biopython/2008-October/004588.html Issues include how to cope with the ambiguous nature of the genetic code, e.g. "P" -> "CCT" or "CCN"? What about "L" -> "CTN" versus "TTR" or other options? Possible implementation to follow as a patch. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Oct 16 12:09:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 16 Oct 2008 12:09:23 -0400 Subject: [Biopython-dev] [Bug 2618] back_translate method for the Seq object (in Bio.Seq)? In-Reply-To: Message-ID: <200810161609.m9GG9NSB032767@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2618 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-16 12:09 EST ------- Created an attachment (id=1009) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1009&action=view) Patch to Bio/Seq.py for back translation This follows Bio.Translate and simply uses whatever arbitrary unambiguous codon Bio.Data.CodonTable object supplies via its back_table. e.g. 
>>> from Bio.Seq import Seq >>> Seq("ACBDEF*").back_translate() Seq('GCUUGUNNNGAUGAGUUUUAA', IUPACAmbiguousRNA()) >>> Seq("ACBDEF*").back_translate().translate() Seq('ACXDEF*', HasStopCodon(ExtendedIUPACProtein(), '*')) If instead we want to return ambiguous codons (e.g. "P" -> "CCN"), then handling of back-translation of special cases B (D/N) and J (I/L) or Z (E/Q) could also be improved (here just "NNN" is used). e.g. For the standard table, "SAR" codes for "Z". I haven't checked if something like this is possible for all B, J and Z for all NCBI codon tables. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Fri Oct 17 07:54:13 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 17 Oct 2008 12:54:13 +0100 Subject: [Biopython-dev] What would we gain by dropping python 2.3? Message-ID: <320fb6e00810170454v6ed86d88se8252abb2c2ca57@mail.gmail.com> I was wondering what benefits we would see by dropping support for Python 2.3 after the next release (or next couple of releases?). Note that Mac OS X 10.4 Tiger uses Python 2.3.5, so there could still be a fair number of people out there still interested in using Biopython on Python 2.3 (in addition to my own current Windows development machine). Before making any plans to drop Python 2.3 support we should canvas the main mailing list. See http://docs.python.org/dev/whatsnew/2.4.html There are two additions in python 2.4 which are interesting in regards to supporting 2.6, PEP 324: New subprocess Module http://www.python.org/dev/peps/pep-0324/ PEP 218: Built-In Set Objects http://www.python.org/dev/peps/pep-0218/ In python 2.6, popen2 and os.popen3 etc are deprecated (so we need subprocess instead) and the sets module is deprecated (so we need the builtin set and frozenset). 
Most of Biopython now handles this gracefully with an import try/except handler. Once we drop python 2.3, these become slightly cleaner, but this in itself isn't a compelling reason. There are a couple more things I thought would be useful - but nothing pressing, e.g. PEP 289: Generator Expressions http://www.python.org/dev/peps/pep-0289/ There are a couple of places in the code where I have wanted to use a generator expression, but have fallen back on a list comprehension or a generator function for Python 2.3 compatibility. PEP 318: Decorators for Functions and Methods http://www.python.org/dev/peps/pep-0318/ Again, decorators could be useful but I am not aware of any pressing need for their functionality in Biopython. Peter From biopython at maubp.freeserve.co.uk Fri Oct 17 10:15:57 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 17 Oct 2008 15:15:57 +0100 Subject: [Biopython-dev] What would we gain by dropping python 2.3? In-Reply-To: <320fb6e00810170454v6ed86d88se8252abb2c2ca57@mail.gmail.com> References: <320fb6e00810170454v6ed86d88se8252abb2c2ca57@mail.gmail.com> Message-ID: <320fb6e00810170715i62b38308p2fdae9465be8bc05@mail.gmail.com> On Fri, Oct 17, 2008 at 12:54 PM, Peter wrote: > I was wondering what benefits we would see by dropping support for > Python 2.3 after the next release (or next couple of releases?). > ... > See http://docs.python.org/dev/whatsnew/2.4.html One other pretty trivial thing is that the string object gained the rsplit method in python 2.4 (while partition and rpartition are in Python 2.5+). I've updated the Seq object's new rsplit method accordingly. Peter From bsouthey at gmail.com Fri Oct 17 10:19:06 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 17 Oct 2008 09:19:06 -0500 Subject: [Biopython-dev] What would we gain by dropping python 2.3? 
In-Reply-To: <320fb6e00810170454v6ed86d88se8252abb2c2ca57@mail.gmail.com> References: <320fb6e00810170454v6ed86d88se8252abb2c2ca57@mail.gmail.com> Message-ID: <48F89EDA.70307@gmail.com> Peter wrote: > I was wondering what benefits we would see by dropping support for > Python 2.3 after the next release (or next couple of releases?). > Support for Numpy 1.2 as I suspect that most people would have (or should have) upgraded to 2.4 for bug and performance gains. I have not looked at the major Linux distros like Fedora and Ubuntu to know when these dropped Python 2.3 for the standard Python install. But I also must add that there is no numpy Windows binary installation for Python 2.6 and does not seem likely to be an official one in the near future (technical issues with regards to the official Windows binary for Python 2.6). > Note that Mac OS X 10.4 Tiger uses Python 2.3.5, so there could still > be a fair number of people out there still interested in using > Biopython on Python 2.3 (in addition to my own current Windows > development machine). Before making any plans to drop Python 2.3 > support we should canvas the main mailing list. > Also some of the older Red Hat / Centos systems still run it - joys of these long-term releases. How many bug reports are with Python 2.3 from people with an interest in Python 2.3 not just testing it? To me the issue is about supporting different versions in the medium term (5 years) given that NumPy and Biopython will have been rewritten for Python 3.0 and most people will be using Python 3.0. I think that if the burden is too great to support a Python version it should be officially dropped. Of course any criteria bug or feature can be backported to earlier versions if requested. I would recommend that this starts a new minor version i.e 1.5 so it is clear that Biopython 1.5+ is Python 2.4+ only. (I also note the recent changes in the cvs that would justify this anyhow.) 
Bruce From biopython at maubp.freeserve.co.uk Fri Oct 17 10:58:40 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 17 Oct 2008 15:58:40 +0100 Subject: [Biopython-dev] What would we gain by dropping python 2.3? In-Reply-To: <48F89EDA.70307@gmail.com> References: <320fb6e00810170454v6ed86d88se8252abb2c2ca57@mail.gmail.com> <48F89EDA.70307@gmail.com> Message-ID: <320fb6e00810170758n62815862i970498087d8dfacc@mail.gmail.com> >> I was wondering what benefits we would see by dropping support for >> Python 2.3 after the next release (or next couple of releases?). > > Support for Numpy 1.2 ... We've tested Biopython CVS works on python 2.3, 2.4, 2.5, and are almost ready for 2.6. We've also tested Biopython CVS works on Numpy 1.0, 1.1 and 1.2. The fact that Numpy 1.2 requires Python 2.4+ isn't really linked to whether or not Biopython continues to work on Python 2.3. > I have not looked at the major Linux distros like Fedora and Ubuntu to know > when these dropped Python 2.3 for the standard Python install. According to http://packages.ubuntu.com/intrepid/python and linked pages, Ubuntu hardy comes with Python 2.3 (very old) Ubuntu dapper comes with Python 2.4 (pretty old) Ubuntu gutsy, feisty and intrepid come with Python 2.5 > But I also must add that there is no numpy Windows binary installation for > Python 2.6 and does not seem likely to be an official one in the near future > (technical issues with regards to the official Windows binary for Python > 2.6). I've been keeping an eye on the numpy list and that is rather disappointing news - hopefully they can resolve this shortly and maybe there will be a numpy 1.2.x release for Windows. >> Note that Mac OS X 10.4 Tiger uses Python 2.3.5, so there could still >> be a fair number of people out there still interested in using >> Biopython on Python 2.3 (in addition to my own current Windows >> development machine). Before making any plans to drop Python 2.3 >> support we should canvas the main mailing list. 
> > Also some of the older Red Hat / Centos systems still run it - joys of these > long-term releases. Yes - this is why I am loath to just drop python 2.3 support without some benefits. Some of the linux machines I have access to at work still run python 2.3 for example. > How many bug reports are with Python 2.3 from people with an interest in > Python 2.3 not just testing it? Our Bugzilla doesn't track the python version, so we can't easily work that out. > To me the issue is about supporting different versions in the medium term (5 > years) given that NumPy and Biopython will have been rewritten for Python > 3.0 and most people will be using Python 3.0. I think that if the burden is > too great to support a Python version it should be officially dropped. Of > course any criteria bug or feature can be backported to earlier versions if > requested. > > I would recommend that this starts a new minor version i.e 1.5 so it is > clear that Biopython 1.5+ is Python 2.4+ only. Biopython doesn't currently have minor version numbers. On a related note, perhaps doing the first numpy supporting release as Biopython 1.50 rather than 1.49 would be more memorable / eye pleasing. > (I also note the recent changes in the cvs that would justify this anyhow.) Did you mean justify a version number bump, or justify dropping python 2.3? With hindsight, trying to support both Python 2.3 and Python 2.6 was more work than I expected - but I think it's done now (apart from Bio.PDB.NACCESS). If Python 2.7 makes a similar volume of deprecations needing similar workarounds for Python 2.3, then we may have more of an incentive to drop Python 2.3. We've seen some of the drawbacks to continuing to support old python 2.3 while avoiding deprecation warnings in Python 2.6, but what I wanted to hear was ideas on how any of the newer language features added in python 2.4 could be useful (in the short to medium term). 
Peter From bugzilla-daemon at portal.open-bio.org Fri Oct 17 11:24:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 17 Oct 2008 11:24:47 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200810171524.m9HFOlh9004587@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #39 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-17 11:24 EST ------- (In reply to comment #37) > Thank you. Confirming that CVS version 1.82 of the file works fine on Windows > XP SP2 with Python 2.5.2. Great - marking this bug as fixed. > A note: > > A custom script using Bio/Blast can appear to hang, and the results file > truncated, if the 'error handle' is used before the 'result handle': > > res_hdl, err_hdl = NCBIStandalone.blastall(my_blast, 'blastn', my_db, my_seq) > > # OK > my_result = res_hdl.read() > my_error = err_hdl.read() > > # Not OK > my_error = err_hdl.read() > my_result = res_hdl.read() This is known and mentioned in the tutorial: >> The error info can be hard to deal with, because if you try >> to do a error_handle.read() and there was no error info >> returned, then the read() call will block and not return, >> locking your script. In my opinion, the best way to deal >> with the error is only to print it out if you are not >> getting result_handle results to be parsed, but otherwise >> to leave it alone. Thanks, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
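The blocking read described in the Bug 2480 comment above is a general property of reading two pipes one after the other. A hedged sketch of one way around it with the subprocess module follows; it is not what NCBIStandalone.blastall does, and the child command here is a placeholder Python one-liner standing in for a real blastall call.

```python
import subprocess
import sys

# Placeholder child process (an assumption for illustration): it writes to
# stdout only, the case where reading the error handle first is risky.
cmd = [sys.executable, "-c", "print('fake BLAST output')"]

child = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                         universal_newlines=True)

# communicate() drains both pipes together and waits for the child to exit,
# so the order in which the two streams are examined no longer matters.
my_result, my_error = child.communicate()

# Following the tutorial's advice: only report the error text when there
# are no results to parse (or the child signalled failure).
if child.returncode != 0 or not my_result:
    sys.stderr.write(my_error)
```

The trade-off is that communicate() holds the whole output in memory, which may matter for very large BLAST result files.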
From bugzilla-daemon at portal.open-bio.org Fri Oct 17 11:59:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 17 Oct 2008 11:59:12 -0400 Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2 In-Reply-To: Message-ID: <200810171559.m9HFxC9r007553@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2616 ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-17 11:59 EST ------- (In reply to comment #9) > Yes, please do - thanks Peter. > > Cheers, C. OK, your patch is now in CVS: Checking in BioSeqDatabase.py; /home/repository/biopython/biopython/BioSQL/BioSeqDatabase.py,v <-- BioSeqDatabase.py new revision: 1.20; previous revision: 1.19 done Checking in DBUtils.py; /home/repository/biopython/biopython/BioSQL/DBUtils.py,v <-- DBUtils.py new revision: 1.8; previous revision: 1.7 done We still need to sort out the feature qualifiers loss of ordering... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Oct 17 13:19:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 17 Oct 2008 13:19:23 -0400 Subject: [Biopython-dev] [Bug 2443] Specifying the alphabet in Bio.SeqIO and Bio.AlignIO In-Reply-To: Message-ID: <200810171719.m9HHJNVZ014015@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2443 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #853 is|0 |1 obsolete| | ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-17 13:19 EST ------- (From update of attachment 853) Something similar is now in CVS (covering both the Bio.SeqIO and Bio.AlignIO modules). 
I still need to extend the unit tests and update the documentation accordingly. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Oct 18 15:31:40 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 18 Oct 2008 15:31:40 -0400 Subject: [Biopython-dev] [Bug 2619] New: Bio.PDB.MMCIFParser component MMCIFlex commented out in setup.py Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2619 Summary: Bio.PDB.MMCIFParser component MMCIFlex commented out in setup.py Product: Biopython Version: 1.48 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: cjoldfield at gmail.com MMCIFParser is a documented feature of Bio.PDB, but it is broken by default because the MMCIFlex build is commented out in the distribution setup.py. According to http://osdir.com/ml/python.bio.devel/2006-02/msg00038.html this is because it doesn't compile on Windows. Though the function is documented, the changes needed to enable it are not, so this seems like an installation bug to me. The fix on linux is to uncomment setup.py lines 486 on. A general work around might be to condition the compile on the os.sys.platform variable. I'd offer a diff, but I'm new to biopython and python in general, so please forgive my ignorance. Source install of version 1.48, gentoo linux 2008, x86_64. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
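The workaround suggested in the Bug 2619 report above, conditioning the compile on the platform, could be sketched roughly as below. The helper name can_build_mmciflex is hypothetical, the check for a flex executable is an added assumption rather than something in the report, and nothing here reflects the actual contents of Biopython's setup.py.

```python
import shutil
import sys


def can_build_mmciflex(platform=sys.platform, which=shutil.which):
    """Guess whether compiling a flex-based extension looks feasible.

    Both arguments exist so the decision can be exercised without
    depending on the machine running the check.
    """
    if platform.startswith("win"):
        # The report says the extension does not compile on Windows.
        return False
    # Elsewhere, only attempt the build when a flex executable is on PATH.
    return which("flex") is not None
```

In a setup script, the extension entry would then be added to the list of extensions only when this function returns True, so that installs on other machines keep working without it.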
From bugzilla-daemon at portal.open-bio.org Sun Oct 19 08:46:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 19 Oct 2008 08:46:43 -0400
Subject: [Biopython-dev] [Bug 2619] Bio.PDB.MMCIFParser component MMCIFlex commented out in setup.py
In-Reply-To:
Message-ID: <200810191246.m9JCkhm6030332@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2619

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-19 08:46 EST -------
http://lists.open-bio.org/pipermail/biopython/2006-February/002923.html
Michiel wrote:
> This is a recurring problem and is not limited to
> Windows, but to any machine without flex installed.

Certainly, as things stood back in Feb 2006, getting Bio.PDB.mmCIF.MMCIFlex to
compile on Windows was tricky (or impossible). However, even on Linux/Mac we
really need to be able to check if flex is installed without blindly trying to
compile it.

A non-flex version would be another option - something Thomas didn't have the
time or inclination to tackle.

In the short term, a note in the documentation would help... were you referring
to "The Biopython Structural Bioinformatics FAQ"?
http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf

From bugzilla-daemon at portal.open-bio.org Sun Oct 19 12:02:55 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 19 Oct 2008 12:02:55 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200810191602.m9JG2tGJ010540@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381

------- Comment #19 from mmokrejs at ribosome.natur.cuni.cz 2008-10-19 12:02 EST -------
(In reply to comment #18)
> e.g. As I wrote in comment 17,
> > I'm thinking we could also support "start" and "end" optional arguments
> > (named after those used in the python string methods, and behaving in the
> > same way) for specifying a sub-sequence to be translated. Using start=0,
> > 1 or 2 would give the three forward reading frames.
>
> This would give an alternative to:
>
> my_seq[i:j].translate(table)
>
> as:
>
> my_seq.translate(table, start=i, end=j)
>
> As with the python string methods, potentially the implementation could be
> slightly faster as a new Seq object doesn't need to be created for the slice.
> On the other hand, it does then offer two ways of doing the same thing.

I think the second approach would often be handy.

> > An optional boolean argument could enable treating the sequence as a CDS -
> > verifying it starts with a start codon (which would always be translated as M)
> > and verifying it ends with a stop codon (with no other stop codons in frame),
> > which would not be translated. Following BioPerl, this argument could be
> > called "complete".

The name "complete" is cryptic; I wouldn't be fond of it. I think everybody
would rather check a.startswith('M') and a.endswith('*') themselves instead.
But what would be useful is a.find_orf(offset=0).

>
> Related to this, it would be useful to have a boolean option to stop
> translation at the first in frame stop codon (possible argument names for this
> include "stop" if not used as above, "to_stop", "auto_stop", "terminate" etc).

Yes, find_orf(offset) with default offset=0. I hope there will always be a way
to translate the whole nucleic acid sequence into protein residues in a desired
frame, so one can inspect the positions of the various STOP codons, etc.
From bugzilla-daemon at portal.open-bio.org Sun Oct 19 12:06:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 19 Oct 2008 12:06:24 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200810191606.m9JG6OmD010729@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381

------- Comment #20 from mmokrejs at ribosome.natur.cuni.cz 2008-10-19 12:06 EST -------
(In reply to comment #17)
> def translate(self, table = "Standard", stop_symbol = "*"):
>     """Turns a nucleotide sequence into a protein sequence (amino acids).
>
>     This method will translate DNA or RNA sequences, but for a protein
>     sequence an exception is raised.
>
>     table - Which codon table to use? This can be either a name
>     (string) or an NCBI identifier (integer).

It would be nice to include in the docstring a URL to a page documenting the
translation tables.

From bugzilla-daemon at portal.open-bio.org Mon Oct 20 04:24:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 20 Oct 2008 04:24:25 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200810200824.m9K8OPf0029113@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381

------- Comment #21 from lpritc at scri.sari.ac.uk 2008-10-20 04:24 EST -------
(In reply to comment #19)
> (In reply to comment #18)
> The name "complete" is cryptic; I wouldn't be fond of it. I think everybody
> would rather check a.startswith('M') and a.endswith('*') themselves
> instead. But what would be useful is a.find_orf(offset=0).

Ditto the 'complete' naming - it's not clear at all.
> > Related to this, it would be useful to have a boolean option to stop
> > translation at the first in frame stop codon (possible argument names for this
> > include "stop" if not used as above, "to_stop", "auto_stop", "terminate" etc).
>
> Yes, find_orf(offset) with default offset=0.

I would like to raise the issue that 'ORF' has taken on (at least) two meanings
over the years, and it's not yet clear which is being discussed here. The
correct definition of 'Open Reading Frame' is an uninterrupted sequence of
nucleotides that does not contain an in-frame stop codon. However, more
restrictive definitions have found a way in erroneously over the years,
asserting that the sequence must have an in-frame start codon, or additionally
that the ORF begins at that start codon. This latter case in particular would
be a putative coding sequence (CDS), rather than an ORF. See a Google
define: orf search for details... (http://www.google.com/search?q=define:+orf).

As an implementation example, Sanger's Artemis
(http://www.sanger.ac.uk/Software/Artemis/) correctly identifies ORFs. See also
Doolittle's 'Of URFS and ORFS', available on Google Books:
http://books.google.com/books?id=jIlMMx6Ji-sC - it's 22 years old now, and a
good candidate for the first manual on bioinformatics. The Wikipedia page for
ORF is typically egregious, and also incorrect.

Also, by 'offset' in the proposed syntax above, is 'reading_frame' intended? If
so I think it would be clearer to indicate that the reading frame is what is
desired, as specifying a reading frame of -1 implies something different to an
offset of -1. I propose that the default behaviour is to find all ORFs in all
reading frames, leaving it to the user to decide whether that behaviour is
appropriate for their sequence and optionally specify a reading frame.

For discussion purposes, I'm attaching code for an ORF search I implemented
locally in a subclass of the Seq object.
As ever, I don't claim that it's perfect, but it did what I needed at the time.
In particular the returned index for ORFs is 1-based, as that is what I wanted
then.

    def find_ORFs(self, codon_table=1, min_length=100):
        """ find_ORFs(self, codon_table=1, min_length=100)

            codon_table     Integer, must be one of the integers in
                            Bio.Data.CodonTable.generic_by_id; these are the
                            standard codon table numbers used by sequence
                            databases.

            min_length      Integer, the shortest length of consecutive
                            nucleotides to consider as an ORF

            Finds ORFs within the SeqRecord sequence, and returns them as a
            list of tuples in the format:
            (frame, start, end, sequence)
            where start and end are the start and end points on the sequence
            (i.e. the first and last base positions, NOT the values you should
            use when indexing sequences in Python), and sequence is a Seq
            object.
        """
        assert self.alphabet.__class__ in dna_alphabets, \
               "Alphabet is not a known DNA alphabet"
        # Get the codon table; raises a KeyError if an invalid table number
        codon_table = CodonTable.generic_by_id[codon_table]
        # Loop over the record's sequence in all six forward and reverse
        # frames, returning a list of (frame, start, end, sequence) tuples
        # List of tuples
        orflist = []
        # Forward frames first
        forward_orfs = self.__find_orfs_in_sequence(self.data, codon_table)
        for frame, start, end, sequence in forward_orfs:
            if len(sequence) >= min_length:
                orflist.append(('+%d' % frame, start, end,
                                Seq(sequence, self.alphabet)))
        # Then reverse frames
        seq = reverse_complement(self.data)
        reverse_orfs = self.__find_orfs_in_sequence(seq, codon_table)
        for frame, start, end, sequence in reverse_orfs:
            if len(sequence) >= min_length:
                start = len(self.data) - start + 1
                end = len(self.data) - end + 1
                start, end = end, start
                orflist.append(('-%d' % frame, start, end,
                                Seq(sequence, self.alphabet)))
        return orflist

    def __find_orfs_in_sequence(self, sequence, codon_table):
        """ Returns a list of ORFs for a passed sequence, in three
            forward frames, as tuples (frame, start, end, sequence)
        """
        orflist = []
        for frame, offset in [(1, 0), (2, 1), (3, 2)]:
            tmporf = []
            orfstart = offset
            i = offset
            while i < len(sequence):
                codon = sequence[i:i+3]
                if len(codon) == 3 and codon not in codon_table.stop_codons:
                    tmporf.append(codon)
                else:
                    if codon in codon_table.stop_codons:
                        tmporf.append(codon)
                    tmporf = ''.join(tmporf)
                    orflist.append((frame, orfstart+1,
                                    orfstart+len(tmporf), tmporf))
                    orfstart += len(tmporf)
                    tmporf = []
                i += 3
            # Catch ORFs that run up to the end of the sequence, by checking
            # for an empty tmporf list
            if tmporf != []:
                tmporf = ''.join(tmporf)
                orflist.append((frame, orfstart+1,
                                orfstart+len(tmporf), tmporf))
        return orflist

In order to obtain a potential coding sequence that begins with a methionine, I
would translate, and then use this method in a subclass of Seq for the
translated sequence:

    def trim_to_first_met(self):
        """ Assuming that the sequence is a protein sequence, trims to the
            first methionine in the sequence and returns a Seq object

            If the sequence has no methionine, then the full sequence is
            returned
        """
        # Crop the sequence to the first Methionine. If there is no methionine
        # the full sequence is returned
        # We assert that we have a protein sequence
        assert self.alphabet.__class__ in protein_alphabets, \
               "Sequence alphabet is not a known ProteinAlphabet"
        if self.data.count('M'):
            seq = self.data[self.data.index('M'):]
        else:
            seq = self.data
        return Seq(seq, self.alphabet)
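
For readers comparing against the ORF definition argued for in comment #21 (a
run of in-frame codons containing no stop codon), the idea can be sketched in
present-day Python 3 as below. This is an illustration only and not Leighton's
implementation: the name stop_free_stretches, the hard-coded standard-table
stop codons, the forward-frames-only scope, and the 0-based half-open
coordinates (with the stop codon excluded) are all assumptions made here.

```python
# Stop codons of the standard genetic code (NCBI table 1). Hard-coded as an
# assumption so that the sketch does not depend on any Biopython version.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_free_stretches(seq, min_length=0):
    """Return (frame, start, end, subsequence) for forward-frame ORFs.

    An ORF here is a maximal run of complete in-frame codons with no stop
    codon. start and end are 0-based, half-open Python slice indices, and
    the terminating stop codon is not part of the returned subsequence.
    """
    seq = seq.upper()
    threshold = max(min_length, 1)  # never report empty stretches
    found = []
    for frame in range(3):
        orf_start = frame
        # Walk complete codons only; a trailing partial codon is ignored.
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if i - orf_start >= threshold:
                    found.append((frame + 1, orf_start, i, seq[orf_start:i]))
                orf_start = i + 3
        # A stretch that runs up to the last complete codon of the frame.
        end = frame + 3 * ((len(seq) - frame) // 3)
        if end - orf_start >= threshold:
            found.append((frame + 1, orf_start, end, seq[orf_start:end]))
    return found
```

Reverse frames could be handled in the same way by passing in the reverse
complement and mapping the coordinates back, as the code in comment #21 does.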
From bugzilla-daemon at portal.open-bio.org Mon Oct 20 05:14:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 20 Oct 2008 05:14:42 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200810200914.m9K9EgDv031522@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381

------- Comment #22 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-20 05:14 EST -------
Martin wrote in comment #19:
> Peter wrote in comment #18:
> > > e.g. As I wrote in comment 17,
> > > I'm thinking we could also support "start" and "end" optional arguments
> > > (named after those used in the python string methods, and behaving in
> > > the same way) for specifying a sub-sequence to be translated. Using
> > > start=0, 1 or 2 would give the three forward reading frames.
> >
> > This would give an alternative to:
> >
> > my_seq[i:j].translate(table)
> >
> > as:
> >
> > my_seq.translate(table, start=i, end=j)
> >
> > As with the python string methods, potentially the implementation could
> > be slightly faster as a new Seq object doesn't need to be created for
> > the slice. On the other hand, it does then offer two ways of doing the
> > same thing.
>
> I think the second approach would often be handy.

If we did add this, then arguably we should do this for all the other methods
too (transcribe, reverse_complement, etc). I'm not convinced this adds any
value. Martin, why do you like the second approach (using start & end
arguments) over the first (slicing the sequence before translation)?

------------------------------------------------------

Using BioPerl's idea of a "complete" argument (boolean) isn't popular:

Martin wrote in comment #19
>>
>> The name "complete" is cryptic; I wouldn't be fond of it...
>>

Leighton wrote in comment #21
>
> Ditto the 'complete' naming - it's not clear at all.
>

This was to control two related features:

(a) Validate the first codon is a valid start codon, and translate it as M
(even if going on the genetic code it would normally be say L). This should be
a boolean argument defaulting to False, possible names "start", "check_start",
"from_start", ...

Variations on this like "find the first in frame start codon" are getting into
gene/ORF finding and I don't see this as part of the remit for a translate
method.

(b) Stop translating at the first in frame stop codon (see my comment 18).
Again, a boolean argument, and for compatibility with previous Biopython
conventions, defaulting to False (i.e. read through). Possible names "stop",
"to_stop", "auto_stop", "terminate", ...

In this case, how should the method behave if there is no final stop codon -
raise an error or not? Also should the stop codon be included in the returned
sequence (note that the Bio.Translate module did not include the stop symbol).

You might want to control these two options independently, so having them as
two arguments is more flexible.

------------------------------------------------------

This bug has started discussing ORF/gene finding - I see this as separate to
the translate method. Could we do this on the mailing list or a separate bug
please?

------------------------------------------------------

Peter
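
To make options (a) and (b) from comment #22 concrete, here is a rough,
standalone Python 3 sketch of how two independent boolean arguments might
behave. It is not the Bio.Seq implementation: the argument names check_start
and to_stop are just two of the candidate names listed above, the table is the
standard code only, and the open questions are settled arbitrarily here (no
error when no stop codon is found, and the stop codon is left out of the result
when to_stop is used). Ambiguous codons are not handled.

```python
from itertools import product

# Standard genetic code (NCBI table 1), laid out in the usual TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): amino
               for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)}
START_CODONS = {"TTG", "CTG", "ATG"}  # start codons listed for table 1


def translate(seq, check_start=False, to_stop=False, stop_symbol="*"):
    """Translate unambiguous DNA/RNA given as a plain string.

    check_start - hypothetical option (a): require a valid start codon
                  and translate it as M.
    to_stop     - hypothetical option (b): stop at the first in-frame
                  stop codon, which is not included in the result.
    """
    seq = seq.upper().replace("U", "T")
    # Complete codons only; a trailing partial codon is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    if check_start and (not codons or codons[0] not in START_CODONS):
        raise ValueError("sequence does not begin with a start codon")
    protein = []
    for index, codon in enumerate(codons):
        amino = CODON_TABLE[codon]
        if amino == "*":
            if to_stop:
                break
            amino = stop_symbol
        elif index == 0 and check_start:
            amino = "M"
        protein.append(amino)
    return "".join(protein)
```

With the defaults, translate("TTGAAATAGGGG") reads through the stop codon,
while passing check_start=True and to_stop=True treats the same string as a
CDS. Keeping the two switches separate is what lets the EST use-case (neither
check wanted) and the CDS use-case (both wanted) share one function.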
From bugzilla-daemon at portal.open-bio.org Mon Oct 20 05:48:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 20 Oct 2008 05:48:16 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200810200948.m9K9mGVo001679@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381

------- Comment #23 from lpritc at scri.sari.ac.uk 2008-10-20 05:48 EST -------
(In reply to comment #22)
> (a) Validate the first codon is a valid start codon, and translate it as M
> (even if going on the genetic code it would normally be say L). This should be
> a boolean argument defaulting to False, possible names "start", "check_start",
> "from_start", ...
> (b) Stop translating at the first in frame stop codon (see my comment 18).
> Again, a boolean argument, and for compatibility with previous Biopython
> conventions, defaulting to False (i.e. read through). Possible names "stop",
> "to_stop", "auto_stop", "terminate", ...
[...]
> In this case, how should the method behave if there is no final stop codon -
> raise an error or not? Also should the stop codon be included in the returned
> sequence (note that the Bio.Translate module did not include the stop symbol).
>
> You might want to control these two options independently, so having them as
> two arguments is more flexible.

Further to the above (and keeping away from ORF-finding) another use-case would
be translation of ESTs, which may come with or without either a start or a stop
codon.
Often I am handed compilations of EST sets that have been obtained using
different experimental methods, and are not consistently 3' or 5' sequenced
(nor, to be fair, are they uniformly in the correct orientation...), and in
those cases I would wish to translate the entire sequence without regard to the
presence of a start or stop codon (really I'd like to find ORFs, but I promised
I'd keep away from that, for now ;) ).

I would prefer that default behaviour did not enforce either a start or stop
codon check, but that each of these could be optional arguments.

From bugzilla-daemon at portal.open-bio.org Mon Oct 20 09:36:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 20 Oct 2008 09:36:56 -0400
Subject: [Biopython-dev] [Bug 2509] Deprecating the .data property of the Seq and MutableSeq objects
In-Reply-To:
Message-ID: <200810201336.m9KDauP6014867@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2509

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #929 is|0                           |1
           obsolete|                            |

------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-20 09:36 EST -------
(From update of attachment 929)
This patch is now obsolete.
From bugzilla-daemon at portal.open-bio.org Tue Oct 21 07:28:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 21 Oct 2008 07:28:57 -0400
Subject: [Biopython-dev] [Bug 2622] New: Parsing between position locations like 5933^5934 in GenBank/EMBL files
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2622

           Summary: Parsing between position locations like 5933^5934 in
                    GenBank/EMBL files
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

GenBank and EMBL files can contain features with locations like 123^456,
handled in Biopython as BetweenPosition objects.

Quoting ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt
> A site between two residues, such as an endonuclease cleavage site, is
> indicated by listing the two bases separated by a carat (e.g., 23^24).

A small GenBank example containing examples of this is NC_005816.gbk available
here:
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Yersinia_pestis_biovar_Microtus_91001/NC_005816.gbk

e.g.
     variation       5933^5934
                     /note="compared to AL109969"
                     /replace="a"
     variation       5933^5934
                     /note="compared to AF053945"
                     /replace="aa"

For a larger example, see NC_005027.gbk
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Pirellula_sp/NC_005027.gbk

e.g.
     misc_feature    41855^41856
                     /note="cosmid pircos-a3a12/ cosmid pircos-a1d04 joining
                     point"

See also one of the Biopython unit test examples, SC10H5.embl, a pre-2006 style
EMBL file from BioPerl.

As the following example script and its output will show, Biopython CVS (and I
presume several releases) does not parse these locations sensibly.
There are at least two issues, firstly there is a numerical error from treating
5933^5934 as 5932^11866 (position versus extension) and secondly the
representation of these locations might be better not using separate start/end
objects.

From bugzilla-daemon at portal.open-bio.org Tue Oct 21 07:30:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 21 Oct 2008 07:30:50 -0400
Subject: [Biopython-dev] [Bug 2622] Parsing between position locations like 5933^5934 in GenBank/EMBL files
In-Reply-To:
Message-ID: <200810211130.m9LBUoE3032234@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2622

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-21 07:30 EST -------
Sample script showing the problem,

    from Bio import SeqIO
    #filename = "NC_005027.gbk"
    filename = "NC_005816.gbk"
    print "=" * 50
    for line in open(filename) :
        if "^" in line :
            print line.rstrip()
    print "=" * 50
    record = SeqIO.read(open(filename), "genbank")
    print record.id
    for feature in record.features :
        if "^" in str(feature.location) :
            print feature

And its output:

    ==================================================
         variation       5933^5934
         variation       5933^5934
         variation       8529^8530
    ==================================================
    NC_005816.1
    type: variation
    location: [(5932^11866):(5932^11866)]
    ref: None:None
    strand: 1
    qualifiers:
            Key: note, Value: ['compared to AL109969']
            Key: replace, Value: ['a']

    type: variation
    location: [(5932^11866):(5932^11866)]
    ref: None:None
    strand: 1
    qualifiers:
            Key: note, Value: ['compared to AF053945']
            Key: replace, Value: ['aa']

    type: variation
    location: [(8528^17058):(8528^17058)]
    ref: None:None
    strand: 1
    qualifiers:
            Key: note, Value: ['compared to AL109969']
            Key: replace, Value: ['tt']

From bugzilla-daemon at portal.open-bio.org Tue Oct 21 08:07:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 21 Oct 2008 08:07:01 -0400
Subject: [Biopython-dev] [Bug 2622] Parsing between position locations like 5933^5934 in GenBank/EMBL files
In-Reply-To:
Message-ID: <200810211207.m9LC71c1002617@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2622

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-21 08:07 EST -------
Part of the problem is in Bio/GenBank/__init__.py around line 793,

        # case 4 -- we've got 100^101
        elif isinstance(position, LocationParser.Between):
            final_pos = SeqFeature.BetweenPosition(position.low.val,
                                                   position.high.val)
        # case 5 -- we've got (100.101)
        elif isinstance(position, LocationParser.TwoBound):
            final_pos = SeqFeature.WithinPosition(position.low.val,
                                                  position.high.val)

The BetweenPosition and WithinPosition objects expect the (low) position and
the extension, not the low position and the high position.
Thus instead:

        # case 4 -- we've got 100^101 => position 100, extension 1
        elif isinstance(position, LocationParser.Between):
            final_pos = SeqFeature.BetweenPosition(position.low.val,
                                   position.high.val-position.low.val)
        # case 5 -- we've got (100.101) => position 100, extension 1
        elif isinstance(position, LocationParser.TwoBound):
            final_pos = SeqFeature.WithinPosition(position.low.val,
                                   position.high.val-position.low.val)

However, things still don't seem quite right with the SeqFeature.location
object (even with this change) as the same object is used for both the start
and end, which means both have zero-based locations:

    ==================================================
         variation       5933^5934
         variation       5933^5934
         variation       8529^8530
    ==================================================
    NC_005816.1
    type: variation
    location: [(5932^5933):(5932^5933)]
    ref: None:None
    strand: 1
    qualifiers:
            Key: note, Value: ['compared to AL109969']
            Key: replace, Value: ['a']

    type: variation
    location: [(5932^5933):(5932^5933)]
    ref: None:None
    strand: 1
    qualifiers:
            Key: note, Value: ['compared to AF053945']
            Key: replace, Value: ['aa']

    type: variation
    location: [(8528^8529):(8528^8529)]
    ref: None:None
    strand: 1
    qualifiers:
            Key: note, Value: ['compared to AL109969']
            Key: replace, Value: ['tt']

Note that a location string "5933..5934" (2bp) becomes in Biopython a typical
range between two exact positions, representing the slice [5932:5934] (2bp).
Perhaps locations like 5933^5934 (0bp) should be held similarly, akin to a
slice [5933:5933] (0bp).

e.g. for a sequence "ACTG...", a location string "2^3" means between "AC" and
"TG...", or in python speak the empty slice [2:2]

The GenBank release notes do say:
> 3. A site between two bases;
> ...
> A site between two residues, such as an endonuclease cleavage site, is
> indicated by listing the two bases separated by a carat (e.g., 23^24).
I think they mean implicitly two neighbouring bases - after all "23^25" can
just be written as "24" or "23^26" as "24..25". The need for the caret "23^24"
is a result of the one-based counting system - avoided in python slice
notation.

Finally, it is not clear to me from the GenBank release notes if locations like
"23^24" can be joined as part of a more complex location, or not.

From bugzilla-daemon at portal.open-bio.org Tue Oct 21 13:42:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 21 Oct 2008 13:42:47 -0400
Subject: [Biopython-dev] [Bug 2619] Bio.PDB.MMCIFParser component MMCIFlex commented out in setup.py
In-Reply-To:
Message-ID: <200810211742.m9LHglaY020907@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2619

------- Comment #2 from cjoldfield at gmail.com 2008-10-21 13:42 EST -------
> In the short term, a note in the documentation would help... were you referring
> to "The Biopython Structural Bioinformatics FAQ"?
> http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf

The FAQ in part, but there is also a link from RCSB that claims Biopython can
parse mmCIF:

http://sw-tools.rcsb.org/

I've run the Bio.PDB mmCIF parser over all of PDB, and it plain fails on >10%
of files (>40,000 files, >5,000 failures, mostly spurious missing key
exceptions). From what I've seen, it seems that an inconsistency in one table
of a mmCIF file throws a wrench in the whole parse.

I tried the C++ mmCIF parser from ncbi (only on a few files so far) and it
doesn't suffer these parse problems (though it reports the faulty entries). If
Bio.PDB were to be updated, this seems like a good candidate for a back end
(assuming it's portable).
I have the inclination, maybe not the time ;), to do this, unless this should
fall to Thomas or others.

Chris

From bugzilla-daemon at portal.open-bio.org Wed Oct 22 04:51:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 22 Oct 2008 04:51:33 -0400
Subject: [Biopython-dev] [Bug 2619] Bio.PDB.MMCIFParser component MMCIFlex commented out in setup.py
In-Reply-To:
Message-ID: <200810220851.m9M8pXb3002091@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2619

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-22 04:51 EST -------
(In reply to comment #2)
> > In the short term, a note in the documentation would help... were you
> > referring to "The Biopython Structural Bioinformatics FAQ"?
> > http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf
>
> The FAQ in part, but there is also a link from RCSB that claims Biopython can
> parse mmCIF:
>
> http://sw-tools.rcsb.org/

I'll make some documentation updates along the lines of "The Bio.PDB mmCIF
parser isn't installed by default due to cross platform compilation problems",
and see if anyone on the dev mailing list has any bright ideas for detecting
flex at install time.

> I've run the Bio.PDB mmCIF parser over all of PDB, and it plain fails on >10%
> of files (>40,000 files, >5,000 failures, mostly spurious missing key
> exceptions). From what I've seen, it seems that an inconsistency in one table
> of a mmCIF file throws a wrench in the whole parse.

Would you mind reporting a separate bug on this (tiny sample script, the
exception error(s), and URLs for a couple of the 5000+ failures)?

> I tried the C++ mmCIF parser from ncbi (only on a few files so far) and it
> doesn't suffer these parse problems (though it reports the faulty entries).
Given the number of PDB problems Bio.PDB has to deal with, it's sadly not
surprising that mmCIF files also suffer from this kind of thing.

> If Bio.PDB were to be updated, this seems like a good candidate for a back
> end (assuming it's portable). I have the inclination, maybe not the time ;),
> to do this, unless this should fall to Thomas or others.

I would worry about the cross platform support (in particular Windows), but
also the additional complication to building Biopython. It's certainly worth
discussing if you or Thomas are keen.

Peter

From biopython at maubp.freeserve.co.uk Wed Oct 22 05:32:44 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 22 Oct 2008 10:32:44 +0100
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
Message-ID: <320fb6e00810220232if63772ejda05f6d5e692b24e@mail.gmail.com>

Dear all,

Back in Feb 2006 (shortly before Biopython 1.42), in CVS revision 1.109
setup.py was modified to comment out building & installation of the
Bio.PDB.mmCIF module which requires flex to be installed. For background see:

http://lists.open-bio.org/pipermail/biopython/2006-February/002923.html
http://lists.open-bio.org/pipermail/biopython-dev/2006-February/002280.html

This issue was recently re-opened with Bug 2619:
http://bugzilla.open-bio.org/show_bug.cgi?id=2619

It looks like Bio.PDB.mmCIF didn't (and probably still doesn't) compile on
Windows, but should compile on Unix provided flex is installed. Ideally
setup.py would check the platform and if flex is installed, and if so install
Bio.PDB.mmCIF - rather than the current situation never installing it (unless
the user edits setup.py by hand).
Alternatively, we could have a simple prompt (on Unix) asking if we should try
and build/install Bio.PDB.mmCIF (like the ugly KDTree prompt when that was
written in C++)?

Does anyone have any code handy for checking if flex is installed from within
python?

Perhaps ideally we could replace the flex version of Bio.PDB.mmCIF with pure
python - but this is a big job.

Peter

From bugzilla-daemon at portal.open-bio.org Wed Oct 22 05:49:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 22 Oct 2008 05:49:42 -0400
Subject: [Biopython-dev] [Bug 2619] Bio.PDB.MMCIFParser component MMCIFlex commented out in setup.py
In-Reply-To:
Message-ID: <200810220949.m9M9ngE0005805@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2619

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-22 05:49 EST -------
I've been looking at modifying setup.py, but need to be able to tell if flex is
installed AND if its headers are installed (required to compile the mmCIF
code). The following is only a partial solution:

    def is_flex_installed():
        """try and work out if flex (and its headers) are installed."""
        if sys.platform.startswith("win") :
            return False
        import commands
        #TODO - This only checks the command line tool, not the headers
        return "not found" not in commands.getoutput("flex --version")
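
For anyone reading this thread with a current interpreter: the commands module
used above exists only in Python 2. A roughly equivalent check in Python 3
might look like the sketch below. It is an assumption-laden illustration rather
than what setup.py does: it relies on shutil.which (Python 3.3 and later), and
like the original it only detects the flex executable on the PATH, not the
headers or libraries needed to compile the scanner.

```python
import shutil
import sys


def is_flex_installed():
    """Best-effort check for a flex executable on the PATH.

    Mirrors the partial solution quoted above: Windows is ruled out up
    front, and the presence of headers/libraries is NOT verified.
    """
    if sys.platform.startswith("win"):
        return False
    # shutil.which returns the full path of the executable, or None.
    return shutil.which("flex") is not None


if __name__ == "__main__":
    print("flex found" if is_flex_installed() else "flex not found")
```

Looking the executable up on the PATH avoids parsing shell output for the
string "not found", which depends on the shell and locale in use.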
From bugzilla-daemon at portal.open-bio.org Wed Oct 22 07:42:08 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 22 Oct 2008 07:42:08 -0400 Subject: [Biopython-dev] [Bug 2176] XML Blast parser: miscellaneous bug fixes and cleanup In-Reply-To: Message-ID: <200810221142.m9MBg8Hf012645@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2176 ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-22 07:42 EST ------- Created an attachment (id=1011) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1011&action=view) Changes to Bio/Blast/NCBIStandalone.py and Bio/Blast/Record.py I'd like to make the XML and text parser agree on representing the HSP identities, positives and gaps as integers. Currently the text parser (and the default values in the HSP object) use a tuple of the value and the alignment length. The upside is it brings the objects returned by the XML and plain text parsers into better agreement. In this case I find storing these properties as simple integers makes much more sense than as a tuple (a choice probably based on the layout of the BLAST plain text output itself). The downside of applying this patch is it could break some existing scripts parsing the plain text output. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 22 11:43:06 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 22 Oct 2008 11:43:06 -0400 Subject: [Biopython-dev] [Bug 2618] back_translate method for the Seq object (in Bio.Seq)? 
In-Reply-To:
Message-ID: <200810221543.m9MFh6Yu009327@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2618

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-22 11:43 EST -------
After some lively debate on the mailing list, we failed to come up with any real world examples where a simple back_translate method (or a Bio.Seq back_translate function) giving a string or Seq object would be useful.

A simple string or the current Seq object simply cannot represent all the possible codons in a back translation. Consider the standard table for leucine, Leu/L = {TTA, TTG, CTT, CTC, CTA, CTG} = {TTR, CTN} which covers 6 unambiguous codons. This is a subset of YTN = {TTC, TTA, TTG, TTT, CTC, CTA, CTG, CTT} which covers 8 unambiguous codons.

Having back_translate("L") == "CTN" means translate(back_translate("L")) == "L", but doesn't cover the two codons TTR (i.e. TTA or TTG). At least this is better than back_translate("L") == "TTR" which still has translate(back_translate("L")) == "L", but doesn't cover the four codons CTN. Picking any one of the six codons also ensures translate(back_translate("L")) == "L" but of course doesn't cover the other five codons. In all three cases, the utility of the back translation is limited (e.g. no help for searches).

Having back_translate("L") == "YTN" means translate(back_translate("L")) == "X", which would surprise many. Using "YTN" covers all the codons plus some extra ones. This might be useful for searching purposes, but otherwise it's very misleading.

However, while I am marking this bug as WONTFIX, returning a more complex ambiguous sequence representation (e.g. using regular expressions) may have merit.
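The leucine argument can be checked mechanically. The sketch below is plain Python 3 and does not use Biopython; the `IUPAC` table (restricted to the letters needed here), the `expand` helper and the `LEU_REGEX` pattern are illustrative names, not an existing API.

```python
import re
from itertools import product

# Only the IUPAC nucleotide codes needed for this example.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "N": "ACGT"}

# The six leucine codons in the standard genetic code.
LEU = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}


def expand(codon):
    """Return the set of unambiguous codons matched by an ambiguous codon."""
    return {"".join(bases) for bases in product(*(IUPAC[b] for b in codon))}


# One way a regular expression could represent the full leucine set.
LEU_REGEX = re.compile(r"TT[AG]|CT[ACGT]")

if __name__ == "__main__":
    print("CTN misses:", sorted(LEU - expand("CTN")))
    print("TTR misses:", sorted(LEU - expand("TTR")))
    print("YTN extras:", sorted(expand("YTN") - LEU))
```

Neither single ambiguous codon covers all six leucine codons, while YTN also matches TTC and TTT (phenylalanine), which is why translating it gives "X". A regular expression such as the one sketched here matches exactly the six codons.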
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 22 12:08:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 22 Oct 2008 12:08:29 -0400 Subject: [Biopython-dev] [Bug 2176] XML Blast parser: miscellaneous bug fixes and cleanup In-Reply-To: Message-ID: <200810221608.m9MG8TcN011450@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2176 ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-22 12:08 EST ------- Query Length ============ XML output includes this information once, currently recorded as .query_letters only. Plain text output includes this twice, recorded as .query_letters (associated with the query header) and .query_length (associated with the pairwise alignments). e.g. ... Query= gi|120291|sp|P21297|FLBT_CAUCR FLBT PROTEIN (141 letters) ... >gi|120291|sp|P21297|FLBT_CAUCR FLBT PROTEIN Length = 141 ... As far as I know, these are always the same. An assertion could be added to the plain text parser to verify this... For consistency, the XML parser could just populate both .query_length and .query_letters - a simple change that won't break any old code and makes migrating from the text parser to the XML parser a little easier. This does perpetuate the confusion of two names. We could go further and make one of these properties officially deprecated (e.g. using a property method to issue a warning). But which one should we keep? Currently the XML parser only supports .query_letters but .query_length is more natural. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
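The "property method to issue a warning" idea could look roughly like this. It is a hedged Python 3 sketch: the class name `BlastRecordSketch` is hypothetical, and keeping `.query_length` while deprecating `.query_letters` is assumed purely for illustration, since the comment above leaves that choice open.

```python
import warnings


class BlastRecordSketch:
    """Hypothetical stand-in for a BLAST record; not the Bio.Blast class."""

    def __init__(self, query_length=None):
        # Assumption: query_length is the name that survives.
        self.query_length = query_length

    @property
    def query_letters(self):
        """Deprecated alias that still returns the query length."""
        warnings.warn("query_letters is deprecated; use query_length",
                      DeprecationWarning, stacklevel=2)
        return self.query_length

    @query_letters.setter
    def query_letters(self, value):
        warnings.warn("query_letters is deprecated; use query_length",
                      DeprecationWarning, stacklevel=2)
        self.query_length = value
```

Old scripts reading `.query_letters` keep working but see a warning, and both names always agree because only one value is stored.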
From bugzilla-daemon at portal.open-bio.org Wed Oct 22 12:28:48 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 22 Oct 2008 12:28:48 -0400 Subject: [Biopython-dev] [Bug 2176] XML Blast parser: miscellaneous bug fixes and cleanup In-Reply-To: Message-ID: <200810221628.m9MGSmiV014465@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2176 ------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-22 12:28 EST ------- Database Length =============== I wanted to record my notes on this based on findings reported on the mailing list. See this thread: http://lists.open-bio.org/pipermail/biopython-dev/2008-August/004101.html The plain text BLAST format contains the database length information three times (!), once in the header (for each query) and then again at the end of the file in the database report and the parameters "total letters" and again as "length of database", e.g. http://bugzilla.open-bio.org/attachment.cgi?id=676 ... Database: Leigo 4,535,438 sequences; 1,573,298,872 total letters ... Database: Leigo Posted date: Jan 22, 2007 11:26 AM Number of letters in database: 1,573,298,872 Number of sequences in database: 4,535,438 ... Length of database: 1,573,298,872 ... The Bio.Record.Header class defines "database_letters" (this is repeated every query), Bio.Record.DatabaseReport defines "num_letters_in_database", and Bio.Record.Parameters class defines "database_length" (where the names reflect the NCBI strings). The Bio.Record.Record inherits from all three, so ends up with "database_letters", "database_length" and "num_letters_in_database" (all coming from different bits of a plain text BLAST file). If the -z option is used, only the last of these three databases in the plain text output is changed (tested using standalone BLAST 2.2.18, which Biopython can parse for single queries). 
Using the Biopython plain text parser, "database_letters" and "num_letters_in_database" reflect the real database size, while "database_length" reflects the -z argument (which is used in the statistics).

If the -z option is used with XML output, then is updated. As far as I can tell, the "real" database size is not reported. The XML parser stores this as "num_letters_in_database".

So from plain text BLAST we have two pieces of information,
actual database size - "database_letters" and "num_letters_in_database"
specified database size - "database_length"

While for XML BLAST we only get one piece of information,
specified database size - "num_letters_in_database"
while "database_letters" and "database_length" default to None.

This is a horrid mess. In the short term I propose the XML parser also record the specified database size as "database_length", and perhaps also as "database_letters", which would help anyone trying to migrate a script from the plain text parser to the XML parser.

From biopython at maubp.freeserve.co.uk Wed Oct 22 13:04:20 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 22 Oct 2008 18:04:20 +0100
Subject: [Biopython-dev] Deprecating Bio.mathfns, Bio.stringfns and their C code?
Message-ID: <320fb6e00810221004u2d02f518p267f7aa539bd8b80@mail.gmail.com>

This is about three Biopython "support" modules: Bio.mathfns, Bio.listfns, Bio.stringfns, each of which has its own C implementation for speed. These haven't been touched for 6 years (which suggests they are stable and well tested), but they are now hardly used in Biopython.
By removing these we not only reduce the amount of C code in Biopython (although here it is optional) which is a good thing for portability and supporting other python variants, but we also can reduce the "clutter" under the Bio.* namespace, e.g. >>> import Bio >>> help(Bio) On 9th Oct I wrote: > Until recently Bio.mathfns was used in Bio/NaiveBayes.py but that now > uses numpy more heavily instead. I think that Bio.mathfns (and its C > implementation) are no longer used anywhere in Biopython (and I would > be surprised if anyone else is using this module). I'm suggesting > deprecating Bio.mathfns and Bio.cmathfns for the next release. Any objections to deprecating Bio.mathfns and Bio.cmathfns? On 9th Oct I wrote: > I think Bio.stringfns and its C implementation Bio.cstringfns are also > now unused in Biopython, and like Bio.mathfns and Bio.cmathfns > should be deprecated for the next release. Any objections to deprecating Bio.stringfns and Bio.cstringfns? On 9th Oct I wrote: > Similarly, Bio.listfns and its C implementation Bio.clistfns might > also be deprecated with a little effort ... only three modules > currently use Bio.listfns We could just label Bio.listfns (and Bio.clistfns) as obsolete for the next release, or just add a note in the docstring that this might be deprecated shortly. Peter From bsouthey at gmail.com Thu Oct 23 12:28:48 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 23 Oct 2008 11:28:48 -0500 Subject: [Biopython-dev] Deprecating Bio.mathfns, Bio.stringfns and their C code? In-Reply-To: <320fb6e00810221004u2d02f518p267f7aa539bd8b80@mail.gmail.com> References: <320fb6e00810221004u2d02f518p267f7aa539bd8b80@mail.gmail.com> Message-ID: <4900A640.8050102@gmail.com> Peter wrote: > This is about three Biopython "support" modules: Bio.mathfns, > Bio.listfns, Bio.stringfns, each of which has its own C implementation > for speed. 
These haven't been touched for 6 years (which suggests > they are stable and well tested), but they are now hardly used in > Biopython. > > By removing these we not only reduce the amount of C code in Biopython > (although here it is optional) which is a good thing for portability > and supporting other python variants, but we also can reduce the > "clutter" under the Bio.* namespace, e.g. > >>>> import Bio >>>> help(Bio) >>>> > > On 9th Oct I wrote: > >> Until recently Bio.mathfns was used in Bio/NaiveBayes.py but that now >> uses numpy more heavily instead. I think that Bio.mathfns (and its C >> implementation) are no longer used anywhere in Biopython (and I would >> be surprised if anyone else is using this module). I'm suggesting >> deprecating Bio.mathfns and Bio.cmathfns for the next release. >> > > Any objections to deprecating Bio.mathfns and Bio.cmathfns? > Nope, the functions used by Bio/NaiveBayes.py are: mathfns.safe_log (also defines safe_log2) but is not very good because it sets a hard constant (1E-100) as a limit. mathfns.safe_exp The other functions included are: fcmp Compare two floating point numbers, up to a specified precision. intd Represent a floating point number as an integer. I presume that you mean adding mathfns.safe_log and mathfns.safe_exp to Bio/NaiveBayes.py first because these are needed by Bio/NaiveBayes.py. Note that the safe_log in Bio/MarkovModel.py is not the same as mathfns.safe_log. > On 9th Oct I wrote: > >> I think Bio.stringfns and its C implementation Bio.cstringfns are also >> now unused in Biopython, and like Bio.mathfns and Bio.cmathfns >> should be deprecated for the next release. >> > > Any objections to deprecating Bio.stringfns and Bio.cstringfns? > Nope, as you say these are not used. But just to be clear, the functions, lost are splitany Split a string using many delimiters. find_anychar Find one of a list of characters in a string. rfind_anychar Find one of a list of characters in a string, from end to start. 
starts_with Check whether a string starts with another string [DEPRECATED]. > On 9th Oct I wrote: > >> Similarly, Bio.listfns and its C implementation Bio.clistfns might >> also be deprecated with a little effort ... only three modules >> currently use Bio.listfns >> > > We could just label Bio.listfns (and Bio.clistfns) as obsolete for the > next release, or just add a note in the docstring that this might be > deprecated shortly. > Used by: Bio/MaxEntropy.py Bio/NaiveBayes.py Bio/MarkovModel.py Bio/pairwise2.py Functions directly used: itemindex Make an index of the items in the list. items Get one of each item in a list. contents Calculate percentage each item appears in a list. Functions indirectly or not used: asdict Make the list into a dictionary (for fast testing of membership). count Count the number of times each item appears. intersection Get the items in common between 2 lists. difference Get the items in 1 list, but not the other. indexesof Get a list of the indexes of some items in a list. take Take some items from a list. Also Bio.listfns used by pairwise2.py which also has a c implementation (cpairwise2) that I would also suggest is a candidate for removal. At present I do not know enough about Bio/MaxEntropy.py, Bio/NaiveBayes.py, and Bio/MarkovModel.py to indicate if Bio.listfns functions are really required or to port them to numpy. (I may try look at trying to port them but not soon.) In summary I have no objection to removing the c code associated with this code. Bruce From biopython at maubp.freeserve.co.uk Thu Oct 23 12:48:23 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 23 Oct 2008 17:48:23 +0100 Subject: [Biopython-dev] Deprecating Bio.mathfns, Bio.stringfns and their C code? 
In-Reply-To: <4900A640.8050102@gmail.com> References: <320fb6e00810221004u2d02f518p267f7aa539bd8b80@mail.gmail.com> <4900A640.8050102@gmail.com> Message-ID: <320fb6e00810230948yada623fg647f75b8d752eef1@mail.gmail.com> Bruce: >Peter: >> Any objections to deprecating Bio.mathfns and Bio.cmathfns? > > Nope, the functions used by Bio/NaiveBayes.py are ... You must be looking at Bio/NaiveBayes.py an older CVS checkout - it doesn't use Bio.mathfns at all now, but rather makes more use of numpy. >> We could just label Bio.listfns (and Bio.clistfns) as obsolete for the >> next release, or just add a note in the docstring that this might be >> deprecated shortly. > > Used by: > Bio/MaxEntropy.py > Bio/NaiveBayes.py > Bio/MarkovModel.py > Bio/pairwise2.py > > Functions directly used: > ... > At present I do not know enough about Bio/MaxEntropy.py, Bio/NaiveBayes.py, > and Bio/MarkovModel.py to indicate if Bio.listfns functions are really > required or to port them to numpy. (I may try look at trying to port them > but not soon.) I haven't dug too deeply either - which is why I wasn't going to push to deprecate Bio.listfns yet. I did mention some of this in the earlier email, but you have gone into more detail. http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004406.html As you will have noticed, many of the things in Bio.listfns could nowadays be done in pure python with a set. Bruce wrote: > Also Bio.listfns used by pairwise2.py which also has a c implementation > (cpairwise2) that I would also suggest is a candidate for removal. I think Bio.pairwise2 is actually potentially quite useful. It could do with a little documentation love - even a short "cookbook" entry for the Tutorial would help. Peter From bsouthey at gmail.com Thu Oct 23 14:50:06 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 23 Oct 2008 13:50:06 -0500 Subject: [Biopython-dev] Deprecating Bio.mathfns, Bio.stringfns and their C code? 
In-Reply-To: <320fb6e00810230948yada623fg647f75b8d752eef1@mail.gmail.com>
References: <320fb6e00810221004u2d02f518p267f7aa539bd8b80@mail.gmail.com> <4900A640.8050102@gmail.com> <320fb6e00810230948yada623fg647f75b8d752eef1@mail.gmail.com>
Message-ID: <4900C75E.9020907@gmail.com>

Peter wrote:
> Bruce:
>
>> Peter:
>>
>>> Any objections to deprecating Bio.mathfns and Bio.cmathfns?
>>>
>> Nope, the functions used by Bio/NaiveBayes.py are ...
>>
>
> You must be looking at Bio/NaiveBayes.py from an older CVS checkout - it
> doesn't use Bio.mathfns at all now, but rather makes more use of
> numpy.
>

Sorry, yes, I just pulled a new cvs version and I now see that it has been removed.

>>> We could just label Bio.listfns (and Bio.clistfns) as obsolete for the
>>> next release, or just add a note in the docstring that this might be
>>> deprecated shortly.
>>>
>> Used by:
>> Bio/MaxEntropy.py
>> Bio/NaiveBayes.py
>> Bio/MarkovModel.py
>> Bio/pairwise2.py
>>
>> Functions directly used:
>> ...
>> At present I do not know enough about Bio/MaxEntropy.py, Bio/NaiveBayes.py,
>> and Bio/MarkovModel.py to indicate if Bio.listfns functions are really
>> required or to port them to numpy. (I may try to port them, but not
>> soon.)
>>
>
> I haven't dug too deeply either - which is why I wasn't going to push
> to deprecate Bio.listfns yet.
>
> I did mention some of this in the earlier email, but you have gone
> into more detail.
> http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004406.html
>
> As you will have noticed, many of the things in Bio.listfns could
> nowadays be done in pure python with a set.
>

My brief look at the functions defined by Bio.listfns indicated that these were only useful if a list could not be converted into a dictionary. I am not sure that this is required, at least for Bio/MaxEntropy.py, Bio/NaiveBayes.py, and Bio/MarkovModel.py. Personally I would be inclined to incorporate Bio.listfns into each of these and deprecate Bio.listfns.
I know this is duplication but hopefully this would be addressed if someone updates the code.

Bruce

From biopython at maubp.freeserve.co.uk Thu Oct 23 17:25:47 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 23 Oct 2008 22:25:47 +0100
Subject: [Biopython-dev] Deprecating Bio.mathfns, Bio.stringfns and their C code?
In-Reply-To: <4900C75E.9020907@gmail.com>
References: <320fb6e00810221004u2d02f518p267f7aa539bd8b80@mail.gmail.com> <4900A640.8050102@gmail.com> <320fb6e00810230948yada623fg647f75b8d752eef1@mail.gmail.com> <4900C75E.9020907@gmail.com>
Message-ID: <320fb6e00810231425y1bafec09ufbd773a479ac8e6d@mail.gmail.com>

Bruce wrote:
> Personally I would be inclined
> to incorporate Bio.listfns into each of these and deprecate Bio.listfns. I
> know this is duplication but hopefully this would be addressed if someone
> updates the code.

If it were only pure python, I might agree with you. But given there is C code which might well be worthwhile for speed, I'm not in any hurry to remove Bio.listfns until Bio.MaxEntropy, Bio.NaiveBayes and Bio.MarkovModel don't need it. Labelling Bio.listfns as likely to be deprecated and removed in the future should suffice for the time being.

Peter

From bugzilla-daemon at portal.open-bio.org Thu Oct 23 23:03:09 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 23 Oct 2008 23:03:09 -0400
Subject: [Biopython-dev] [Bug 2626] New: Bio.PDB mmCIFParser parse exceptions
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2626

           Summary: Bio.PDB mmCIFParser parse exceptions
           Product: Biopython
           Version: 1.48
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Other
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: cjoldfield at gmail.com

I recently ran the mmCIFParser object over all of PDB's mmCIF files and found that a large number of files failed to parse correctly (a short script at the end demonstrates this).
Of ~50k mmCIF files, 3891 files failed to parse and another 1980 were missing fields in the mmCIF dictionary.

A few examples of files that failed to parse:
http://www.rcsb.org/pdb/files/1alw.cif.gz
http://www.rcsb.org/pdb/files/1det.cif.gz
http://www.rcsb.org/pdb/files/1tmy.cif.gz

A few with missing fields:
http://www.rcsb.org/pdb/files/1mfl.cif.gz
http://www.rcsb.org/pdb/files/1tfj.cif.gz
http://www.rcsb.org/pdb/files/1zn8.cif.gz

The problem seems to be that an error in one mmCIF table, like an extra field, propagates through the rest of the parse.

x86_64 gentoo linux 2008, src BioPython install

__CODE__

    import sys
    from Bio.PDB import *

    if len(sys.argv) != 2:
        print "usage: mmCifParseCheck.py "
        sys.exit(0)

    structFile = sys.argv[1]
    resultString = ""

    #parse to structure object
    numRes = 0
    parser=MMCIFParser()
    try:
        structure=parser.get_structure('test',structFile)
        for model in structure:
            for chain in model:
                for residue in chain:
                    if(residue.id[0][:2] != "H_"):
                        numRes += 1
    except:
        resultString += "parse to structure object failed\n"
    else:
        resultString += "parse to structure object succeeded\n"

    #parse whole mmCIF file to dict
    try:
        mmcif_dict=MMCIF2Dict.MMCIF2Dict(structFile)
    except:
        resultString += "parse to dict failed\n"
    else:
        resultString += "parse to dict succeeded\n"

    #get a required entry
    try:
        id = mmcif_dict['_entry.id']
    except:
        resultString += "key lookup failed\n"
    else:
        resultString += "key lookup succeeded\n"

    print resultString
    print "number of non-het residues " + str(numRes)
From bugzilla-daemon at portal.open-bio.org Fri Oct 24 09:30:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 24 Oct 2008 09:30:15 -0400 Subject: [Biopython-dev] [Bug 2627] New: Updated Bio.MarkovModel to remove oldnumeric and listfns imports Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2627 Summary: Updated Bio.MarkovModel to remove oldnumeric and listfns imports Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P4 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com I have updated Bio.MarkovModel to remove using numpy.oldnumeric and Bio.listfns. Hopefully I found the correct places because of the usage of 'from import *'. The test_MarkovModel.py does pass and the commented section using Baum-Welch does run without errors. However, this is not my area so it may not be completely correct. So I would appreciate any other testing. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Oct 24 09:32:03 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 24 Oct 2008 09:32:03 -0400 Subject: [Biopython-dev] [Bug 2627] Updated Bio.MarkovModel to remove oldnumeric and listfns imports In-Reply-To: Message-ID: <200810241332.m9ODW3wJ004124@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2627 ------- Comment #1 from bsouthey at gmail.com 2008-10-24 09:32 EST ------- Created an attachment (id=1012) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1012&action=view) Updated MarkovModel.py to remove numpy.oldnumeric and listfns -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Oct 24 09:35:20 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 24 Oct 2008 09:35:20 -0400 Subject: [Biopython-dev] [Bug 2627] Updated Bio.MarkovModel to remove oldnumeric and listfns imports In-Reply-To: Message-ID: <200810241335.m9ODZKP4004615@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2627 ------- Comment #2 from bsouthey at gmail.com 2008-10-24 09:35 EST ------- Created an attachment (id=1013) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1013&action=view) Modified test_MarkovModel to remove numpy.oldnumeric import This is a modified version of the Bio test for Bio.MarkovModel. Remove the triple quotes to use Baum-Welch because this was commented out in the original test. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bsouthey at gmail.com Fri Oct 24 09:50:01 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Fri, 24 Oct 2008 08:50:01 -0500
Subject: [Biopython-dev] Deprecating Bio.mathfns, Bio.stringfns and their C code?
In-Reply-To: <320fb6e00810231425y1bafec09ufbd773a479ac8e6d@mail.gmail.com>
References: <320fb6e00810221004u2d02f518p267f7aa539bd8b80@mail.gmail.com> <4900A640.8050102@gmail.com> <320fb6e00810230948yada623fg647f75b8d752eef1@mail.gmail.com> <4900C75E.9020907@gmail.com> <320fb6e00810231425y1bafec09ufbd773a479ac8e6d@mail.gmail.com>
Message-ID: <4901D289.6070503@gmail.com>

Peter wrote:
> Bruce wrote:
>
>> Personally I would be inclined
>> to incorporate Bio.listfns into each of these and deprecate Bio.listfns. I
>> know this is duplication but hopefully this would be addressed if someone
>> updates the code.
>>
>
> If it were only pure python, I might agree with you. But given there
> is C code which might well be worthwhile for speed, I'm not in any
> hurry to remove Bio.listfns until Bio.MaxEntropy, Bio.NaiveBayes
> and Bio.MarkovModel don't need it. Labelling Bio.listfns as likely to
> be deprecated and removed in the future should suffice for the time
> being.
>
> Peter
>

Hi,

Bug 2627 should include a modified version of Bio.MarkovModel and the associated test that removes Bio.listfns as well as the usage of numpy.oldnumeric (I hate the use of 'from x import *'). I really don't know enough about MarkovModel to know how correct it really is. I only worked on getting the test code to work, including the Baum-Welch part that is commented out in the test.

I'll try to get to Bio.MaxEntropy and Bio.NaiveBayes over the next week as time permits. Also, Bio.pairwise2 appears to require the functionality of Bio.listfns.

Since speed is relative, I think you need some type of benchmarking on this, which in turn needs a good example.
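As a starting point for the benchmarking mentioned above, the sketch below times a list-scanning helper against a set-based one using the standard `timeit` module. It is Python 3 and deliberately does not import Bio.listfns: the function names echo the descriptions given earlier in the thread, but their bodies, the sample data, and the repeat count are assumptions, and the real module may differ (for example in whether order is preserved).

```python
import timeit
from collections import Counter


def items_loop(values):
    """One of each item, found by scanning a list (quadratic in the worst case)."""
    seen = []
    for value in values:
        if value not in seen:
            seen.append(value)
    return seen


def items_set(values):
    """One of each item, using a built-in set (order is not preserved)."""
    return set(values)


def count(values):
    """Number of times each item appears."""
    return Counter(values)


def intersection(first, second):
    """Items in common between two lists."""
    return set(first) & set(second)


def difference(first, second):
    """Items in the first list but not the second."""
    return set(first) - set(second)


if __name__ == "__main__":
    # Illustrative data: 20000 values drawn from 500 distinct items.
    data = [i % 500 for i in range(20000)]
    for func in (items_loop, items_set):
        seconds = timeit.timeit(lambda: func(data), number=5)
        print("%-12s %.4f s" % (func.__name__, seconds))
```

The same harness could be pointed at the C and pure Python versions of a helper to see whether the C code is still worth keeping for realistic inputs.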
Bruce From bugzilla-daemon at portal.open-bio.org Fri Oct 24 10:51:00 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 24 Oct 2008 10:51:00 -0400 Subject: [Biopython-dev] [Bug 2628] New: Have Bio.SeqIO.write(...) and Bio.AlignIO.write(...) return number of records Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2628 Summary: Have Bio.SeqIO.write(...) and Bio.AlignIO.write(...) return number of records Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Motivation: When creating a sequence (or alignment) file, It is sometimes useful to know how many records (or alignments) were written out. This is easy if your records are in a list: records = list(...) SeqIO.write(records, handle, format) print "Wrote %i records" % len(records) If however your records are from a generator/iterator (e.g. a generator expression, or some other iterator) you cannot use len(records). You could turn this into a list just to count them, but this wastes memory. It would therefore be useful to have the count returned: records = some_generator count = SeqIO.write(records, handle, format) print "Wrote %i records" % count For a precedent, the BioSQL loader returns the number of records loaded into the database. Currently Bio.SeqIO.write(...) and Bio.AlignIO.write(...) have no return value, so adding a return value is a backwards compatible enhancement. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
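The proposed behaviour can be illustrated without Biopython. In this minimal Python 3 sketch, `write_fasta` is a hypothetical stand-in for a writer function, and records are assumed to be simple `(name, sequence)` tuples rather than SeqRecord objects.

```python
import io


def write_fasta(records, handle):
    """Write (name, sequence) pairs in FASTA format and return how many were written."""
    count = 0
    for name, sequence in records:
        handle.write(">%s\n%s\n" % (name, sequence))
        # Counting while writing works for lists and one-shot iterators alike.
        count += 1
    return count


if __name__ == "__main__":
    handle = io.StringIO()
    # A generator expression has no len(), so the return value is the only count.
    records = (("seq%i" % i, "ACGT") for i in range(3))
    written = write_fasta(records, handle)
    print("Wrote %i records" % written)
```

Because the count is accumulated during the single pass over the records, nothing has to be held in memory just to report the total.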
From bugzilla-daemon at portal.open-bio.org Fri Oct 24 16:38:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 24 Oct 2008 16:38:39 -0400 Subject: [Biopython-dev] [Bug 2629] New: Updated Bio.NaiveBayes to listfns import Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2629 Summary: Updated Bio.NaiveBayes to listfns import Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P4 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com I have attempted to modify Bio/NaiveBayes.py to remove the dependency on Bio.listfns functions. Also, made use of the numpy namespace rather than using 'from numpy import *'. Also, I made a testing file with two examples, the car data is not mine but has a worked example and the other is Fisher's iris data. Fisher's data is not really appropriate because it has continuous data and Bio.NaiveBayes only handles discrete data. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Oct 24 16:40:38 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 24 Oct 2008 16:40:38 -0400 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200810242040.m9OKecUX004894@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 ------- Comment #1 from bsouthey at gmail.com 2008-10-24 16:40 EST ------- Created an attachment (id=1014) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1014&action=view) Modified NaiveBayes code This is the modified code for Bio/NaiveBayes.py that removes Bio.listfns requirements and a minor change to how numpy is imported. 
From bugzilla-daemon at portal.open-bio.org Fri Oct 24 16:42:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 24 Oct 2008 16:42:03 -0400
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To:
Message-ID: <200810242042.m9OKg3pT005077@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629

------- Comment #2 from bsouthey at gmail.com 2008-10-24 16:42 EST -------
Created an attachment (id=1015)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1015&action=view)
Example code on using the new NaiveBayes code

This has two example data sets that use the new NaiveBayes code.

From idoerg at gmail.com Sun Oct 26 17:34:15 2008
From: idoerg at gmail.com (Iddo Friedberg)
Date: Sun, 26 Oct 2008 14:34:15 -0700
Subject: [Biopython-dev] CE implementation in Python
Message-ID:

Interesting donation from Jason Vertrees. CE is a well-known structural alignment program from Phil Bourne's lab at UCSD.

-------- Original Message --------
Subject: BiopPython & Structure Alignments
Date: Fri, 17 Oct 2008 23:45:35 -0400
From: Jason Vertrees
To: idoerg at burnham.org

Iddo,

I'm not sure if this might be of assistance to the BioPython project, but I implemented a version of CE Align (structure alignment algorithm) as an extension to the PyMOL program. The code is open-source (BSDL) and I have two versions:

(1) Pure Python -- slow; essentially unusable, but all Python. A typical alignment might take about 30 seconds.

(2) Mixed Python/C implementation: the math is done in C/C++ (implemented both) so it's very fast.
The good news for BioPython is that I used distutils and the typical Python Extending procedures to write the code: BioPython might very easily be able to plug in the code so you don't have to access a web server to do structure alignments. A typical alignment takes about 0.2 seconds. If you're at all interested, please look at http://www.pymolwiki.org/index.php/Cealign as that's where the code lives. I don't have time to port the code for the project, but I don't think it would be all that hard given some effort. I'm happy to answer any questions. HTH, -- Jason Vertrees -- Jason Vertrees, PhD Boston U. -- jasonv at bu.edu Dartmouth -- jv at cs.dartmouth.edu -- Iddo Friedberg, Ph.D. Atkinson Hall, mail code 0446 University of California, San Diego 9500 Gilman Drive La Jolla, CA 92093-0446, USA T: +1 (858) 534-0570 http://iddo-friedberg.org

From biopython at maubp.freeserve.co.uk Mon Oct 27 05:57:06 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 27 Oct 2008 09:57:06 +0000 Subject: [Biopython-dev] CE implementation in Python In-Reply-To: References: Message-ID: <320fb6e00810270257x43f8de09scd544b3a3f43ef73@mail.gmail.com> On Sun, Oct 26, 2008 at 9:34 PM, Iddo Friedberg wrote: > Interesting donation from Jason Vertrees. CE is a well-known > structural alignment program from Phil Bourne's lab at UCSD. This sounds interesting - presumably to integrate it into Biopython we would need to make it work on the Bio.PDB chain/structure objects. For example, I would expect the CE code would need to be able to get the secondary structure of each residue (i.e. where did the PDB file say the alpha helices and beta sheets were). However, I will have neither the time nor the motivation to do this myself. It's nice that in addition to pure python he has both C and C++ back ends. My gut instinct is to avoid the C++ code and stick with the C code (given previous cross platform fun with C++).
It's a little surprising his python code is so much slower - but maybe it doesn't take full advantage of numpy at the moment? Peter

From bugzilla-daemon at portal.open-bio.org Mon Oct 27 09:37:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 27 Oct 2008 09:37:53 -0400 Subject: [Biopython-dev] [Bug 2631] New: Updated Bio.MaxEntropy to remove listfns import Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2631
Summary: Updated Bio.MaxEntropy to remove listfns import
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: Linux
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: bsouthey at gmail.com
I have updated Bio/MaxEntropy.py to remove the dependency on Bio/listfns.py and replaced the "from numpy.oldnumeric" import. Also, I created a small example of the usage. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Mon Oct 27 09:38:32 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 27 Oct 2008 09:38:32 -0400 Subject: [Biopython-dev] [Bug 2631] Updated Bio.MaxEntropy to remove listfns import In-Reply-To: Message-ID: <200810271338.m9RDcWv5018326@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2631 ------- Comment #1 from bsouthey at gmail.com 2008-10-27 09:38 EST ------- Created an attachment (id=1016) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1016&action=view) Modified MaxEntropy code -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Oct 27 09:46:43 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 27 Oct 2008 09:46:43 -0400 Subject: [Biopython-dev] [Bug 2631] Updated Bio.MaxEntropy to remove listfns import In-Reply-To: Message-ID: <200810271346.m9RDkhGb018836@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2631 ------- Comment #2 from bsouthey at gmail.com 2008-10-27 09:46 EST ------- Created an attachment (id=1017) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1017&action=view) Example code on using the new MaxEntropy code This is an example of using the MaxEntropy code. Perhaps the most important aspect of using MaxEntropy is that it requires an iterable type that contains the functions for classification. MaxEntropy will then iterate through each of these functions to predict the status based on that function. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.

From bsouthey at gmail.com Mon Oct 27 10:02:07 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 27 Oct 2008 09:02:07 -0500 Subject: [Biopython-dev] Deprecating Bio.mathfns, Bio.stringfns and their C code? In-Reply-To: <320fb6e00810231425y1bafec09ufbd773a479ac8e6d@mail.gmail.com> References: <320fb6e00810221004u2d02f518p267f7aa539bd8b80@mail.gmail.com> <4900A640.8050102@gmail.com> <320fb6e00810230948yada623fg647f75b8d752eef1@mail.gmail.com> <4900C75E.9020907@gmail.com> <320fb6e00810231425y1bafec09ufbd773a479ac8e6d@mail.gmail.com> Message-ID: <4905C9DF.7000602@gmail.com> Peter wrote: > Bruce wrote: > >> Personally I would be inclined >> to incorporate Bio.listfns into each of these and deprecate Bio.listfns. I >> know this is duplication but hopefully this would be addressed if someone >> updates the code.
>> > > If it were only pure python, I might agree with you. But given there > is C code which might well be worthwhile for speed, I'm not in any > hurry to remove Bio.listfns until Bio.MaxEntropy.py, Bio.NaiveBayes > and Bio.MarkovModel don't need it. Labelling Bio.listfns as likely to > be deprecated and removed in the future should suffice for the time > being. > > Peter > > Hi, Just an update, I think that I have removed the listfns dependency from Bio.MaxEntropy.py (Bug 2631), Bio.NaiveBayes (Bug 2629) and Bio.MarkovModel (Bug 2627) - the order is also in terms of low to high number of changes that were required. I also removed any 'from x import *' as well especially where x=numpy.oldnumeric. I also added examples that use MaxEntropy and NaiveBayes but these should have bioinformatics examples. Note that NaiveBayes is the discrete version. Bruce

From bugzilla-daemon at portal.open-bio.org Mon Oct 27 12:16:52 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 27 Oct 2008 12:16:52 -0400 Subject: [Biopython-dev] [Bug 2631] Updated Bio.MaxEntropy to remove listfns import In-Reply-To: Message-ID: <200810271616.m9RGGqBY031121@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2631 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-27 12:16 EST ------- I don't know if it matters for MaxEntropy, but your re-implementation of Bio.listfns.itemindex does not preserve the current behaviour with duplicate entries:

>>> x = [1,2,3,3,2,5]
>>> from Bio.listfns import itemindex
>>> itemindex(x)
{1: 0, 2: 1, 3: 2, 5: 5}
>>> class2index ={}
>>> for i, j in enumerate(x):
        class2index.update({j:i})

>>> class2index
{1: 0, 2: 4, 3: 3, 5: 5}

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Oct 27 12:55:04 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 27 Oct 2008 12:55:04 -0400 Subject: [Biopython-dev] [Bug 2631] Updated Bio.MaxEntropy to remove listfns import In-Reply-To: Message-ID: <200810271655.m9RGt43L001261@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2631 ------- Comment #4 from bsouthey at gmail.com 2008-10-27 12:55 EST ------- (In reply to comment #3)
> I don't know if it matters for MaxEntropy, but your re-implementation of > Bio.listfns.itemindex does not preserve the current behaviour with duplicate > entries:
>
> >>> x = [1,2,3,3,2,5]
> >>> from Bio.listfns import itemindex
> >>> itemindex(x)
> {1: 0, 2: 1, 3: 2, 5: 5}
> >>> class2index ={}
> >>> for i, j in enumerate(x):
>         class2index.update({j:i})
>
> >>> class2index
> {1: 0, 2: 4, 3: 3, 5: 5}
>
In this case, x is the return value of the listfns.items() function, whose docstring in listfns reads:

"""items(l) -> list of items

Generate a list of one of each item in l. The items are returned in arbitrary order.
"""

Therefore duplicates are not allowed to occur (and duplicates would not make sense anyhow). But in order to be similar to the original code, just avoid updating if the key already exists:

class2index ={}
for i, j in enumerate(x):
    if not class2index.has_key(j):
        class2index.update({j:i})

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
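A minimal sketch of the difference discussed in comments #3 and #4, written in plain Python so that it runs without Bio.listfns. The function names first_index and last_index are made up for illustration and are not part of Biopython; the "not in" test is used because dict.has_key() no longer exists in Python 3.

```python
def first_index(items):
    """Map each item to the index of its FIRST occurrence in items."""
    index = {}
    for position, item in enumerate(items):
        # Only record an item the first time it is seen, which is the
        # behaviour shown for itemindex() in comment #3.
        if item not in index:
            index[item] = position
    return index


def last_index(items):
    """Map each item to the index of its LAST occurrence in items."""
    index = {}
    for position, item in enumerate(items):
        # Unconditional assignment: later duplicates overwrite earlier ones.
        index[item] = position
    return index


x = [1, 2, 3, 3, 2, 5]
print(first_index(x))  # {1: 0, 2: 1, 3: 2, 5: 5}
print(last_index(x))   # {1: 0, 2: 4, 3: 3, 5: 5}
```

If the input really never contains duplicates, as argued in comment #4, the two functions return the same mapping.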
From bugzilla-daemon at portal.open-bio.org Mon Oct 27 15:08:48 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 27 Oct 2008 15:08:48 -0400 Subject: [Biopython-dev] [Bug 2634] New: PAM30 Matrix doesn't work with Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2634
Summary: PAM30 Matrix doesn't work with
Product: Biopython
Version: 1.48
Platform: PC
OS/Version: Mac OS
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: ngcrawfo at bu.edu
I send it this code:

result_handle = NCBIWWW.qblast("blastp", "nr", seq_record.seq.tostring(),
                               matrix_name = 'PAM30', word_size='2', expect='30000',
                               composition_based_statistics='no adjustment')

And I get this:

ValueError: invalid literal for int() with base 10:
function qblast in NCBIWWW.py at line 769
rid, rtoe = _parse_qblast_ref_page(handle)
function _parse_qblast_ref_page in NCBIWWW.py at line 828
return rid, int(rtoe)

Note: if I change the matrix name to 'PAM70' or 'BLOSUM62' there is no error. I'm trying to emulate the short sequence parameters in BLAST so I need to use PAM30 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
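The traceback above ends in int(rtoe) with nothing after "base 10:", which suggests that rtoe was an empty string. A minimal sketch of a more defensive parse follows. It is not the code in NCBIWWW.py: the helper name parse_rid_rtoe is hypothetical, and the assumption that the reply page carries lines of the form "RID = ..." and "RTOE = ..." is just that, an assumption. The only point is that a missing value can be reported with a readable message instead of a bare int() failure.

```python
import re


def parse_rid_rtoe(page_text):
    """Return (rid, rtoe) from a reply page, or raise a descriptive ValueError.

    Hypothetical helper; assumes "RID = <token>" and "RTOE = <digits>"
    each appear on a line of their own in the page.
    """
    # [ \t]* rather than \s* so an empty value cannot swallow the next line.
    rid_match = re.search(r"\bRID[ \t]*=[ \t]*(\S+)", page_text)
    rtoe_match = re.search(r"\bRTOE[ \t]*=[ \t]*(\d+)", page_text)
    if rid_match is None or rtoe_match is None:
        # Show the start of the page so the user can see any server message.
        snippet = " ".join(page_text.split())[:200]
        raise ValueError(
            "No RID/RTOE found; the server may have rejected the request. "
            "Page starts: %r" % snippet
        )
    return rid_match.group(1), int(rtoe_match.group(1))


good_page = "QBlastInfoBegin\n    RID = ABC123\n    RTOE = 15\nQBlastInfoEnd\n"
print(parse_rid_rtoe(good_page))
```

Echoing the start of the page sidesteps the question of how to extract the server's own error text reliably; it only makes the failure easier to diagnose.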
From bugzilla-daemon at portal.open-bio.org Mon Oct 27 16:14:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 27 Oct 2008 16:14:53 -0400 Subject: [Biopython-dev] [Bug 2634] PAM30 Matrix doesn't work with qblast In-Reply-To: Message-ID: <200810272014.m9RKErND028218@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2634 biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|PAM30 Matrix doesn't work   |PAM30 Matrix doesn't work
                   |with                        |with qblast

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-27 16:14 EST ------- The error from Biopython is because it can't find the RID and RTOE references in what is normally a "please wait" HTML page. The reason being you have triggered an error. From dumping the HTML requested:

-------------------------------
Message ID#33 Error: Cannot validate the Blast options: Gap existence and extension values of 11 and 1 not supported for PAM30
supported values are:
32767, 32767
7, 2
6, 2
5, 2
10, 1
9, 1
8, 1

This error message indicates that the combination of options for this Blast search is inconsistent or invalid. This can happen when the selected Blast program does not support one of the options provided, when two or more options have conflicting values, etc. If you are using URL API, please check the options mentioned in the error message string and re-submit your search. Please note that the current version of the Blast CGI application is stricter at validating Blast options than it has been historically. If this error persists, please, contact Blast-help at ncbi.nlm.nih.gov for more help.
-------------------------------

Short of printing out the whole HTML dump, I'm not sure how best to tell the user about this kind of error - automatically extracting the error message looks unreliable.
In any case, I think you need to investigate the gap options and see if you can match what you are trying to mimic. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Mon Oct 27 16:24:44 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 27 Oct 2008 16:24:44 -0400 Subject: [Biopython-dev] [Bug 2634] PAM30 Matrix doesn't work with qblast In-Reply-To: Message-ID: <200810272024.m9RKOir7028873@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2634 ------- Comment #2 from ngcrawfo at bu.edu 2008-10-27 16:24 EST ------- Peter, Thanks for the insight. I'll mess about with the gap existence and extension parameters. Thanks again! - Nick (In reply to comment #1) > The error from Biopython is because it can't find the RID and RTOE references > in what is normally a "please wait" HTML page. The reason being you have > triggered an error. From dumping the HTML requested: > > ------------------------------- > Message ID#33 Error: Cannot validate the Blast options: Gap existence and > extension values of 11 and 1 not supported for PAM30 > supported values are: > 32767, 32767 > 7, 2 > 6, 2 > 5, 2 > 10, 1 > 9, 1 > 8, 1 > > This error message indicates that the combination of options for this Blast > search is inconsistent or invalid. This can happen when the selected Blast > program does not support one of the options provided, when two or more options > have conflicting values, etc. If you are using URL API, please check the > options mentioned in the error message string and re-submit your search. Please > note that the current version of the Blast CGI application is stricter at > validating Blast options than it has been historically.
If this error persists, > please, contact Blast-help at ncbi.nlm.nih.gov for > more help. > ------------------------------- > > Short of printing out the whole HTML dump, I'm not sure how best to tell the > user about this kind of error - automatically extracting the error message > looks unreliable. > > In any case, I think you need to investigate the gap options and see if you can > match what you are trying to mimic. > > Peter > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Oct 28 11:25:57 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 28 Oct 2008 11:25:57 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200810281525.m9SFPvCV031306@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #24 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-28 11:25 EST ------- (In reply to comment #22) > ... Stop translating at the first in frame stop codon (see my comment 18). > Again, a boolean argument, and for compatibility with previous Biopython > conventions, defaulting to False (i.e. read through). Possible names "stop", > "to_stop", "auto_stop", "terminate", ... > > In this case, how should the method behave if there is no final stop codon - > raise an error or not? Also should the stop codon be included in the returned > sequence (note that the Bio.Translate module did not include the stop symbol). Added in CVS with the optional argument named "to_stop" (boolean), defaulting to False (continue translating through any stops).
See Bio/Seq.py revision 1.51 and Tests/test_seq.py revision 1.28 I'm happy to discuss alternative names for this argument (up until the next release of Biopython). This choice was influenced by the existing method name translate_to_stop in Bio.Translate (which can now be declared obsolete and awaiting deprecation). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 28 12:33:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 28 Oct 2008 12:33:49 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200810281633.m9SGXnBO017256@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #25 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-28 12:33 EST ------- I have updated the Tutorial, and also: The undocumented Bio.utils.translate() and Bio.utils.transcribe() etc have been deprecated. The undocumented Bio.SeqUtils.translate() has been deprecated. Bio.Translate has been labelled as obsolete. This was the documented way to do a translation, so I'd like to give people some advance warning before we add a deprecation message. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 28 14:13:42 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 28 Oct 2008 14:13:42 -0400 Subject: [Biopython-dev] [Bug 2628] Have Bio.SeqIO.write(...) and Bio.AlignIO.write(...) 
return number of records In-Reply-To: Message-ID: <200810281813.m9SIDgt8027734@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2628 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-28 14:13 EST ------- Created an attachment (id=1023) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1023&action=view) Patch for Bio.SeqIO and Bio.AlignIO and their unit tests Adds an integer return value to Bio.SeqIO.write() and Bio.AlignIO.write() giving the number of records/alignments written to the handle. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Oct 28 18:12:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 28 Oct 2008 18:12:19 -0400 Subject: [Biopython-dev] [Bug 2622] Parsing between position locations like 5933^5934 in GenBank/EMBL files In-Reply-To: Message-ID: <200810282212.m9SMCJrr018731@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2622 chapmanb at 50mail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

------- Comment #3 from chapmanb at 50mail.com 2008-10-28 18:12 EST ------- Peter -- this is a great summary of the problem. The fix you propose in the _FeatureParser should definitely go in. As for the higher-level question of how we should treat these locations, it wasn't really thought about too heavily in the initial implementation since it's a fringe case. Your proposal for handling it sounds fine.
In this case BetweenPosition would change to something like:

def __init__(self, position, extension = 0):
    AbstractPosition.__init__(self, position + 1, 0)
    self._between_position = position
    self._between_extension = extension

def __str__(self):
    return "(%s^%s)" % (self._between_position, self._between_position + self._between_extension)

We would just hide the "between" junk from the standard Position stuff and represent it as your proposal. How does that sound generally? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Wed Oct 29 12:57:05 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 29 Oct 2008 12:57:05 -0400 Subject: [Biopython-dev] [Bug 1492] Martel Parser fails on Bio.db["protein-genbank-cgi"] entry In-Reply-To: Message-ID: <200810291657.m9TGv5Y0006396@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1492 biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|Martel/Mindy                |Main Distribution

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-29 12:57 EST ------- Refiling this bug under "Main Distribution" in order to delete the "Martel/Mindy" entry in Bugzilla. Sorry about the pointless emails this will trigger. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Wed Oct 29 12:57:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 29 Oct 2008 12:57:19 -0400 Subject: [Biopython-dev] [Bug 1589] Parsing fails at "operon" tag with RecordParser or FeatureParser In-Reply-To: Message-ID: <200810291657.m9TGvJ0H006452@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1589 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Martel/Mindy |Main Distribution ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-29 12:57 EST ------- Refiling this bug under "Main Distribution" in order to delete the "Martel/Mindy" entry in Bugzilla. Sorry about the pointless emails this will trigger. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 29 12:57:36 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 29 Oct 2008 12:57:36 -0400 Subject: [Biopython-dev] [Bug 1758] genbank parser chokes on /transl_except In-Reply-To: Message-ID: <200810291657.m9TGvaxs006520@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1758 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Martel/Mindy |Main Distribution ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-29 12:57 EST ------- Refiling this bug under "Main Distribution" in order to delete the "Martel/Mindy" entry in Bugzilla. Sorry about the pointless emails this will trigger. 
Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 29 12:58:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 29 Oct 2008 12:58:23 -0400 Subject: [Biopython-dev] [Bug 2072] GenBank parser breaks: LOCUS line does not contain valid sequence type (DNA, RNA, ...) In-Reply-To: Message-ID: <200810291658.m9TGwNrH006661@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2072 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Martel/Mindy |Main Distribution ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-29 12:58 EST ------- Refiling this bug under "Main Distribution" in order to delete the "Martel/Mindy" entry in Bugzilla. Sorry about the pointless emails this will trigger. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Oct 29 12:58:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 29 Oct 2008 12:58:47 -0400 Subject: [Biopython-dev] [Bug 2076] EMBL to GenBank converter should fix unterminated lines In-Reply-To: Message-ID: <200810291658.m9TGwld5006730@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2076 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Martel/Mindy |Main Distribution ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-29 12:58 EST ------- Refiling this bug under "Main Distribution" in order to delete the "Martel/Mindy" entry in Bugzilla. Sorry about the pointless emails this will trigger. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 29 12:59:03 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 29 Oct 2008 12:59:03 -0400 Subject: [Biopython-dev] [Bug 1920] Bio.Geo does not support recent GEO files In-Reply-To: Message-ID: <200810291659.m9TGx3Fx006785@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1920 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Martel/Mindy |Main Distribution ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-29 12:59 EST ------- Refiling this bug under "Main Distribution" in order to delete the "Martel/Mindy" entry in Bugzilla. Sorry about the pointless emails this will trigger. 
Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 29 12:59:30 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 29 Oct 2008 12:59:30 -0400 Subject: [Biopython-dev] [Bug 1773] Martel.Parser.ParserPositionException In-Reply-To: Message-ID: <200810291659.m9TGxUhd006827@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1773 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Martel/Mindy |Main Distribution ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-29 12:59 EST ------- Refiling this bug under "Main Distribution" in order to delete the "Martel/Mindy" entry in Bugzilla. Sorry about the pointless emails this will trigger. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Oct 29 12:59:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 29 Oct 2008 12:59:41 -0400 Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with egenix mxTextTools 3.0 In-Reply-To: Message-ID: <200810291659.m9TGxf0Z006863@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2361 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Martel/Mindy |Main Distribution ------- Comment #39 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-29 12:59 EST ------- Refiling this bug under "Main Distribution" in order to delete the "Martel/Mindy" entry in Bugzilla. Sorry about the pointless emails this will trigger. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From tiagoantao at gmail.com Thu Oct 30 17:35:31 2008 From: tiagoantao at gmail.com (Tiago Antão) Date: Thu, 30 Oct 2008 21:35:31 +0000 Subject: [Biopython-dev] [BioPython] calculate F-Statistics from SNP data In-Reply-To: <6d941f120810251834q87495d5re558cf179356a8b0@mail.gmail.com> References: <5aa3b3570810160302q48df31d8h777cb760b763b77d@mail.gmail.com> <5aa3b3570810200657i4ff7ded1p5198a801ff9eccd7@mail.gmail.com> <5aa3b3570810220325g563f6a22x3f30185ae3a01b4e@mail.gmail.com> <320fb6e00810220334n6aedc5a2m7a560c25ff703917@mail.gmail.com> <6d941f120810220903s6cdc034fhec369677ac5896c9@mail.gmail.com> <5aa3b3570810221010h787c74c7h65084e05964de71d@mail.gmail.com> <6d941f120810230810k4e48c48cp5c55722a851005cf@mail.gmail.com> <5aa3b3570810230925k1eccff39kd47f022842576a46@mail.gmail.com> <6d941f120810251804o31ed44cat49b407db36a6891e@mail.gmail.com> <6d941f120810251834q87495d5re558cf179356a8b0@mail.gmail.com> Message-ID: <6d941f120810301435m7c151ad5u77def486eb24a70c@mail.gmail.com> Hi, FYI, I am going to continue this discussion on biopython-dev, as I think it makes more sense there, especially the parts about implementation suggestions. On Sun, Oct 26, 2008 at 1:34 AM, Tiago Antão wrote: > I just want to add an extra comment explaining why I oppose doing an > individual object: > > I have the following questions (and others) in my mind, to which I don't > know the answer. I am not looking for answers to them, I am just > trying to illustrate the difficulty of the problem. >
> 1. For a certain marker, do we store the genomic position of the > marker? Some (most) statistics don't use this information. For many > species this information is not even available. But for some > statistics this information is mandatory...
> 2. For a microsatellite do we store the motif and number of repeats or > the whole sequence? (see 4)
> 3. If one is interested in SNPs and one has the full sequences does > one store the full sequences or just the SNPs?
If you store just the > SNPs then you cannot do sequence based analysis in the future (say > Tajima D). If you store everything then you are consuming memory and > cpu.
> 4. If one just wants to do frequency statistics (Fst), do you store > the marker or just assign each one an ID and store the ID? It is > much cheaper to store an ID than a full sequence.
>
> Populations
> 1. Support for landscape genetics? I mean geo-referencing
> 2. Support for hierarchical population structure?
> 3. Do we cache statistics results on Population objects?
>
>
> Let me take your class marker:
> class Marker:
>     total_heterozygotes_count = 0
>     total_population_count = 0
>     total_Purines_count = 0 # this could be renamed, of course
>     total_Pyrimidines_count = 0
>
> How would this be useful for microsatellites? Why purines, and what if my > marker is a protein? If it is a SNP I want to know the nucleotide? And > if I am studying proteins and I want to have the amino acid? > > Don't get me wrong, I have been down this path. To solve my particular > problems is not very hard. To have a framework that is usable by > everybody, it is a damn hard problem. And we don't really need to solve > it (ok, it would be nice to do things to populations in general, that > I agree). But the fundamental is: read file, calculate statistics. > That doesn't need population and individual objects. > > If we end up having too many formats a consolidation step might be > needed in the future (to avoid having 10 split_in_pops). That I agree. > -- "Data always beats theories. 'Look at data three times and then come to a conclusion,' versus 'coming to a conclusion and searching for some data.' The former will win every time."
- Matthew Simmons, http://www.tiago.org

From tiagoantao at gmail.com Thu Oct 30 19:58:57 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 30 Oct 2008 23:58:57 +0000
Subject: [Biopython-dev] Statistics in population genetics module - Part I
Message-ID: <6d941f120810301658wec8678ald332abb8ddbdf80d@mail.gmail.com>

Hi,

Statistics is the most important part of population genetics modules. In fact one could say that statistics were invented FOR population genetics (check http://en.wikipedia.org/wiki/Ronald_Fisher ).

When I started to work on the population genetics module I decided to delay the statistics module a bit, in order to get experience with the whole biopython project before committing to do the most important thing.

Irrespective of whether or not it is possible to link scipy, now seems to be the time to advance, especially considering that Giovanni is interested in participating.

A few points need to be made before suggesting how to put statistics in Bio.PopGen:

1. Whatever design is put in, it should be reasonably future proof: breaking older code a few releases from now would not be a good idea, and should be avoided as much as possible.
2. It goes without saying that the code should be useful to everybody doing population genetics and not only to the authors of Bio.PopGen: it should be possible to accommodate all kinds of markers and population structures in the future.
3. For reasons that I've partially explained on the biopython list, I don't think an OO model explicitly based on individuals or populations is good (or even necessary).
4. Any framework should be more pragmatic than anything else. I would envision a typical use case like this:
   a) Read data (from a certain data source)
   b) Do some basic processing (changing individuals or populations, converting markers)
   c) Calculate statistics

A few comments regarding each of these points:

a) Data sources, file formats: file formats in population genetics exist in large quantities and are essentially completely ad hoc, most designed in a very naive way. Good or BAD, that is what there is. The most used format (a kind of de facto standard, GenePop) can only be used for frequency-based statistics; for all the rest things are fragmented (although, if there is no population structure and the data are sequences, then standard sequence-based formats can be used, but in my experience this is a small minority of cases).
b) Basic processing: this is the point where an OO model of individuals and populations would pay off, but I think it is not the "meat of the issue".
c) Statistics: they come in every type and for every taste. If you want an idea of what is out there, an interesting place to look is the arlequin3 manual:
http://cmpg.unibe.ch/software/arlequin3/arlequin31.pdf
(part of the manual is UI description, but there are descriptions of the overall panorama, especially starting at page 89, where the table is a good overview).

With time, and after at least 3 failed attempts to think in terms of individuals/populations, I started to crystallize around a model centered on types of statistics. This model actually ends up having implicit models of populations and individuals, and those are, in fact, there. They are just implicit and not unified: different kinds of statistics have different implicit models.

The model that I would like to propose, centered around statistics, will be the subject of my next email (which I will send in the next couple of days; it is still under design and costing me sleep). I might split it in 2 parts (concepts and suggestions for implementation).
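To make the read / process / calculate use case in point 4 a little more concrete, here is a minimal sketch of one frequency-based statistic (expected heterozygosity, 1 minus the sum of squared allele frequencies) computed from a plain list of allele IDs, with no individual or population objects. The function names and the list-of-IDs input are assumptions made purely for illustration; they are not the Bio.PopGen API and not the design announced for the next email.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Return {allele_id: frequency} for an iterable of allele IDs.

    Hypothetical helper: alleles are opaque IDs, so the same code works
    whatever the marker type (SNP, microsatellite, ...).
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alleles given")
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(alleles):
    """Expected heterozygosity: 1 - sum(p_i ** 2) over allele frequencies."""
    freqs = allele_frequencies(alleles)
    return 1.0 - sum(p * p for p in freqs.values())


# Hypothetical data for one locus in one population (step a: "read data"
# is replaced by a literal list here).
locus_alleles = [1, 1, 2, 2]
print(expected_heterozygosity(locus_alleles))
```

The point of the sketch is only that a frequency-based statistic needs nothing more than allele IDs grouped per locus and per population; marker-dependent statistics would need a richer input.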
From bsouthey at gmail.com Thu Oct 30 22:18:56 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Thu, 30 Oct 2008 21:18:56 -0500
Subject: [Biopython-dev] Statistics in population genetics module - Part I
In-Reply-To: <6d941f120810301658wec8678ald332abb8ddbdf80d@mail.gmail.com>
References: <6d941f120810301658wec8678ald332abb8ddbdf80d@mail.gmail.com>
Message-ID:

Hi,

Can you please be more specific, especially in terms of:
What statistics do you want to compute?
What type of data?

Obviously these are rather interdependent.

In my experience, the statistic and the data type really dictate how to proceed. Typically you start with pedigree and data files, then add more files for genetic markers (often chromosome specific) etc. Each requires a specific format and appropriate links between them. Again, this really depends on what you want to calculate and how you do it.

You will probably find that an object-oriented approach with individuals, families, populations, models and data types etc. may actually be helpful and necessary, depending on what you want to do. This really helped me with QTL mapping code, especially the overall design, because it makes you think about exactly where things should go, and that was far more important than the actual coding. While some of it is implicit, separating out some components will be necessary, especially for getting population-based statistics for data values recorded on individuals.

Bruce

On Thu, Oct 30, 2008 at 6:58 PM, Tiago Antão wrote:
> Hi,
>
> Statistics is the most important part of population genetics modules.
> In fact one could say that statistics where invented FOR population
> genetics (check http://en.wikipedia.org/wiki/Ronald_Fisher ).
> When I started to work on the population genetics module I decided to
> delay the statistics module a bit, in order to get experience with the
> whole biopython project before committing to do the most important
> thing.
> Irrespective of it is possible or not to link scipy or not, now seems > to be the time to advance, especially considering that Giovanni is > interested in participating. > A few of points need to be said before suggesting on how to put > statistics in Bio.PopGen > > 1. Whatever design is put in, it should be reasonably future proof: in > a few releases it should not be a good idea to break older code. That > should be avoided in as much as possible. > 2. It goes without saying that the code should be useful to everybody > doing population genetics and not only the authors of Bio.PopGen: all > kinds of markers and population structures should be accommodatable in > the future . > 3. For reasons that I've partially explained on the biopython list, I > don't think a OO model explicitly based on individuals or populations > e good (or even necessary) > 4. Any framework should be more pragmatic than anything else. I would > envision a typical use case like this > a) read data (from a certain data source) > b) Do some basic processing (changing individuals or populations, > converting markers) > c) calculate statistics > A few comments regarding each of these points: > a) data sources, file formats: file formats in population > genetics exist in large quantities and are essencialy completely > ad-hoc, most made in a very naive way. Good or BAD, that is what there > is. The most used format (some kind of de facto standard, GenePop) can > only be used for frequency-based statistics, for all the rest things > are fragmented (although, if there are no population structure and the > data is sequences than standard sequence based formats can be used - > but from my experience this is a small minority) > b) basic processing: This is the point where a OO model of > individuals and populations would pay, but I think it is not the "meat > of the issue" > c) statistics: there are of every type and for every taste. 
If > you want to have an idea of what is out there an interesting place to > look at is the arlequin3 manual: > http://cmpg.unibe.ch/software/arlequin3/arlequin31.pdf > (part of the manual is UI description, but especially starting at page > 89 - the table there is a good overview - there are descriptions of > the overall panorama). > > With time, and after at least 3 failed attempts to think in terms of > individuals/populations I started to cristalize around a model > centered on types of statistics. This model ends up actually having > implicit models of populations and individuals, and that is, in fact, > there. It is just implicit and not unified: different kinds of > statistics have different implicit models. > The model that I would like to propose, centered around statistics, > will be the subject of my next email (which I will send in the next > couple of days - still under design and lost sleep). I might split it > in 2 parts (concepts and suggestions for implementation). > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From tiagoantao at gmail.com Fri Oct 31 06:03:28 2008 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Fri, 31 Oct 2008 10:03:28 +0000 Subject: [Biopython-dev] Statistics in population genetics module - Part I In-Reply-To: References: <6d941f120810301658wec8678ald332abb8ddbdf80d@mail.gmail.com> Message-ID: <6d941f120810310303q439c1225r26511944066ab49@mail.gmail.com> On Fri, Oct 31, 2008 at 2:18 AM, Bruce Southey wrote: > Can you please be more specific especially in terms of: > What statistics do you want to compute? > What type of data ? > > Obviously these are rather interdependent. I want a framework that can accommodate all statistics and all types of data (this will be subject of my next email). 
I personally am concerned for now with F statistics, allelic diversity, expected heterozygosity and such, i.e., frequency-based statistics. To put it another way: marker-independent. A great deal of studies in population genetics are actually frequency based. But I don't want a particular view of the world (mine or anyone else's) to dictate the end result. My expectation is that, in a few weeks, the statistics above will be in biopython (they are already implemented in functioning code), but that this won't impair the ability to continue in other directions (marker-dependent statistics, genome-wide statistics).

> In my experience, the statistic and the data type really dictate how
> to proceed. Typically you start with pedigree and data files then add
> more files for genetic markers (often chromosome specific) etc. Each
> requires a specific format and appropriate links between them. Again
> this really depends on what you want to calculate and how you do it.

I think the key point is precisely that diversity of statistics and data types, and how they drive the whole thing. I have also found that different people do completely different things: from people working with humans with lots of data and money, to people with model species, to people working in conservation of endangered species. Some people have thousands of markers and lots of individuals; others have 10 individuals and 20 markers ("poor-man" markers like microsatellites). Not to mention population and landscape genetics statistics, or hierarchical population structure. Not to mention new sequencing methods and the creative uses that we are starting to see with them.

> You will probably find that object orientated approach with
> individuals, families, populations, models and data type etc. may
> actually be helpful and necessary depending on what you want to do.

I've tried to implement several OO frameworks with these kinds of relations and they all failed.
They fail precisely because of the immense diversity of statistics, data formats and use cases. I always ended up trashing everything because of a use case or statistic that would render the model awkward or useless. It is bad over-engineering. Correcting things is not bad, but in biopython we don't want to break interfaces in every release. Even if there is a good, future-proof model, it will always either be a poor fit in some situations or have performance problems (performance is becoming a more serious issue every day). I think the first approach is to think: let's do OO with populations, individuals, and so on. But experience in trying to do that will lower the expectations of what can be delivered.

> This it really help me with QTL mapping code especially the overall
> design because you makes think exactly where things should go and that
> was far more important than the actual coding. While some of it is
> implicit, separating out some components will be necessary especially
> getting population-based statistics for data values recorded on
> individuals.

Getting a correct, future-proof design using concepts like individuals and populations is above my pay grade. And I believe it is above the pay grade of 100% of the people that I know in this area. I think there is no need for it anyway. I will try to write about this in the next part of my emails.

From mjldehoon at yahoo.com Wed Oct 1 00:24:18 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 30 Sep 2008 17:24:18 -0700 (PDT)
Subject: [Biopython-dev] Numpy conversion
In-Reply-To: <37659.57326.qm@web62402.mail.re1.yahoo.com>
Message-ID: <228132.43778.qm@web62402.mail.re1.yahoo.com>

Bio.kNN is the only module that imports Bio.distance. Bio.distance is written in Python, but it also imports a C version of Bio.distance if it is available. From the comments in the code, I gather that the purpose of the C-version is to get fast distance calculations without using Numeric / NumPy.
However, Bio.kNN itself uses Numeric / NumPy, which defeats the purpose of the C-version of Bio.distance. I would therefore like to propose to add a NumPy-aware version of the code in Bio.distance to Bio.kNN, and to deprecate Bio.distance. Any objections? --Michiel. --- On Thu, 9/18/08, Michiel de Hoon wrote: > From: Michiel de Hoon > Subject: Re: [Biopython-dev] Numpy conversion > To: "Peter" > Cc: biopython-dev at biopython.org > Date: Thursday, September 18, 2008, 10:10 AM > > I've not used it myself, but it sounds handy. > Michiel, > > does this overlap at all with your clustering module? > > No, it doesn't. Bio.Cluster contains unsupervised > clustering methods only. The k-nearest neighbors in Bio.kNN > is a supervised learning method. > > --Michiel. > > --- On Wed, 9/17/08, Peter > wrote: > > > From: Peter > > Subject: Re: [Biopython-dev] Numpy conversion > > To: mjldehoon at yahoo.com > > Cc: biopython-dev at biopython.org > > Date: Wednesday, September 17, 2008, 10:29 AM > > On Wed, Sep 17, 2008 at 3:13 PM, Michiel de Hoon > > wrote: > > > Hi everybody, > > > > > > I am now looking at the pure-python modules that > make > > use of Numerical Python / NumPy. > > > Bio.kNN is one of them; this also happens to be > the > > only module that imports Bio.distance, > > > which also depends on NumPy. > > > > > > What I am not sure about is the usage of Bio.kNN. > A > > quick google search didn't reveal much, > > > suggesting that it is not widely used. Bio.kNN > > currently is not documented in the tutorial, but > > > the code itself is reasonably well documented. > > > > > > How do you guys feel about this module? Should we > keep > > it? > > > > > > > I've not used it myself, but it sounds handy. > Michiel, > > does this > > overlap at all with your clustering module? 
> > > > Peter > > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From biopython at maubp.freeserve.co.uk Wed Oct 1 08:36:38 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 1 Oct 2008 09:36:38 +0100 Subject: [Biopython-dev] Numpy conversion In-Reply-To: <228132.43778.qm@web62402.mail.re1.yahoo.com> References: <37659.57326.qm@web62402.mail.re1.yahoo.com> <228132.43778.qm@web62402.mail.re1.yahoo.com> Message-ID: <320fb6e00810010136h8f48506nd9b81f1f6a827e70@mail.gmail.com> On Wed, Oct 1, 2008 at 1:24 AM, Michiel de Hoon wrote: > Bio.kNN is the only module that imports Bio.distance. Bio.distance is > written in Python, but it also imports a C version of Bio.distance if it > is available. From the comments in the code, I gather that the > purpose of the C-version is to get fast distance calculations without > using Numeric / NumPy. However, Bio.kNN itself uses Numeric / > NumPy, which defeats the purpose of the C-version of Bio.distance. > > I would therefore like to propose to add a NumPy-aware version of > the code in Bio.distance to Bio.kNN, and to deprecate Bio.distance. > > Any objections? If Bio.kNN is the only usage of Bio.distance, then that sounds very sensible. However, there is a small chance that someone out there is using Bio.distance (perhaps because it doesn't use Numeric/NumPy). As a courtesy, we could ask on the main mailing list if anyone is using it before its deprecation, but otherwise I have no objections. 
Peter From bugzilla-daemon at portal.open-bio.org Wed Oct 1 08:42:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 1 Oct 2008 04:42:15 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200810010842.m918gFCp026095@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #28 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-01 04:42 EST ------- In comment #26 Peter wrote: >> Using shell=False [with subprocess.Popen] works while shell=True >> fails on Windows (I tested on Windows XP with Python 2.5 from IDLE). >> However, the opposite is true on Mac OS X with python 2.5 from IDLE. >> This is a pain. In comment #27 Patnaik wrote: > I tried the subprocess routine through a test.py file on a Mac OS X > 10.5.5 with Python 2.5.2, but w/o using Biopython. I had to use > 'shell=True', otherwise with 'shell=False',I get: > > File "/Lab/Laboratory/Libs/Python/lib/python2.5/subprocess.py", > ... > _execute_child > raise child_exception > > With 'shell=True', it works even when there is a space in the > file-path/names of the BLAST executable, the database or the input > sequence file (the escaping of the spaces needs to be properly done). Good - at least that confirms the shell option differences I found between Windows and Mac. We'd need to check on Linux before we can write something using subprocess which should work on all the main platforms. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
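For reference while checking the remaining platforms, here is a minimal sketch of one way to sidestep shell quoting: pass the command as an argument list with shell=False, so that a path containing spaces travels as a single argument. It is written against present-day Python 3 (subprocess.run), not the Python 2.5 discussed above, it uses the Python interpreter as a stand-in child process rather than BLAST, and the helper name run_with_args is hypothetical. It does not by itself explain the Windows / Mac differences reported in comments #26 and #27.

```python
import subprocess
import sys


def run_with_args(executable, args):
    """Run a child process from an argument list, without going via a shell.

    Hypothetical helper: each list item is handed to the child as one
    argument, so no extra quoting is applied by our own code.
    """
    cmd = [executable] + list(args)
    completed = subprocess.run(
        cmd, shell=False, capture_output=True, text=True, check=True
    )
    return completed.stdout


# Stand-in for a real BLAST call: the child just echoes its first argument.
path_with_space = "My Documents/30a.seq"
echoed = run_with_args(
    sys.executable, ["-c", "import sys; print(sys.argv[1])", path_with_space]
)
print(echoed.strip())
```

Whether the BLAST executables themselves then parse such an argument correctly (in particular the database argument, which is space-separated) is a separate question that this sketch does not answer.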
From bugzilla-daemon at portal.open-bio.org Wed Oct 1 08:42:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 1 Oct 2008 04:42:39 -0400 Subject: [Biopython-dev] [Bug 2601] Seq find() method: proposal In-Reply-To: Message-ID: <200810010842.m918gdnD026134@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2601 ------- Comment #5 from lpritc at scri.sari.ac.uk 2008-10-01 04:42 EST ------- (In reply to comment #4) > (In reply to comment #3) > Good, as where is the fun otherwise? :-) I think that the discussion has been useful. > > I like the idea of > > making Seq.py more string-like, in part because when I first started using > > Biopython, I missed being able to slice, and other conveniently string-y > > things. > > Okay, so what is still missing with these new changes? I like the new, and proposed, changes to Seq. "When I first started" was nearly eight years ago, now... > > string.find() has the behaviour of only returning a single match - that which > > is closest to the string start. This might be useful to some (in ORF-finding, > > perhaps), but I expect I would use a finditer() method that returned all > > matches (for which there is no equivalent string method) almost exclusively > > It is not correct to compare finditer (a re method) to find (a string method) > or for that matter re.match or re.search. I think that it's perfectly valid to compare pretty much anything to pretty much anything else, such as now when we have an opportunity to get the pattern-finding functionality we want/need into the Seq object. Substitution (e.g. using string.find() in place of re.finditer()) is a different matter. To me, string.find() and re.search() are pretty much equivalent, except for their internal implementation, query argument type and return value. re.match() is like string.startswith(), with the same caveats. 
re.finditer() has no string.method() equivalent, but I would still find such a method useful.

I think the abstract distinction between search types here is:

1) Find match at start of sequence (re.match() and string.startswith())
2) Find first match in sequence (re.search() and string.find())
3) Find all non-overlapping matches in sequence (re.finditer() only)
4) Find all overlapping matches in sequence (neither re nor string)
1a) 2a) 3a) 4a) The same, but in the reverse complement.

Moving down the list, the problem becomes more general. The type of search I need most often in biological sequences is number (4a), or (4) for proteins. Each of search types (1) to (3) (a or not) has a theoretically faster implementation than doing (4) then filtering the results. I don't mind having more than one search method with different names, or having to specify arguments to get a particular kind of search. I do mind not having (4a) as an option...

BTW, for reverse complement searches, I'm happy for this to be an optional argument - when I wrote the code above, I didn't need anything but two-strand searches.

> I definitely think that the user has to decide whether or not they
> want overlapping matches not the developer. There is no option under this
> implementation.

There is no option in string or re, either - not because the developer has guessed that the user always wants it, but because they have effectively guessed that the user *never* wants it (or that, if they do, they'll generalise the search themselves). This is probably because they were writing more general libraries with different use cases (and, in the case of re, actual implementation restrictions) than the Seq object. We have an opportunity to have the find()/search()/whatever() method be biologically-relevant, and I think we should take it.
Because overlapping matches are biologically informative, and I see no reason not to do so other than consistency with the re module (which is constrained for reasons that do not apply to biological sequences), I think that we should make the default behaviour to find overlapping matches, and provide an option to exclude overlaps (which will probably make the internal implementation faster).

> I am not for or against having an method that returns overlapping matches
> rather I am against only having returning overlapping matches as the only
> choice.

I'm actually in full agreement with you on this.

> > I don't think I understand this point. Would you prefer an re.search() like
> > implementation that takes a Seq object as its query argument? I don't think
> > I'd find that as useful, myself, as a method that just takes a string. Such a
> > method could also maybe parse arguments so as to compile the regex from the
> > Seq.data attribute though, fulfilling your requirement.
>
> What I mean is that a user should be able to either specify the pattern or
> specify a regular expression object. In either case the optional flags that are
> often useful to have like ignorecase are ignored.

Ah, I see. I think that, because we are working with a restricted symbol set, we do not strictly need the full functionality that is present in re. As a minimum, a domain-specific re-a-like syntax would need:

o symbols in the sequence alphabet, including correctly-interpreted ambiguity codes
o .*+$^ etc. wildcards
o {m,n} - like syntax for repeats
o [] and [^] set notation
o lookahead and lookbehind

All of this, except for correct interpretation of ambiguity codes, is already in re, and with a few tweaks we could just use re methods internally. The ambiguity codes could perhaps be implemented by substituting a set of symbols for each ambiguity code, and the conformance of the regular expression to the sequence alphabet ensured by a filter on the query.
Having a method that intelligently accepts both strings and compiled regexes would suit me. I suggest reversing the query rather than reversing the subject sequence because reverse-complementing larger sequences is likely to take a comparatively long time...

> Regardless of what a user actually wants, they must wait for two searches along
> the sequence. After that finishes the user must examine each and every entry
> (due to the match_locations.sort()) to find the strand regardless of what they
> want to do.

In my code, yes - because that was the functionality I wanted when searching whole genomes for exact pattern matches. It may not have come across in my first post, but I was proposing the code as a potential starting point (for discussion as much as for an implementation), not as the finished article.

> I do not any advantage in this than someone calling the function
> twice to get match_locations and rev_locations, doing 'match_locations +=
> rev_locations' and match_locations.sort().

Assuming that the return value was the same as in my code above then yes, there is no particular computational advantage (except the negligible ones of making one instead of two function calls, and fewer calls/lines of code implying less opportunity for user error). But, and again I stress this, I wrote the code with a particular purpose in mind and not as an enhancement for all possible uses of the Seq object. Had I needed to perform single-strand searches on nucleotide sequences, I'd probably have hacked the code in the way you've been suggesting, with strandedness as an optional argument.

> Okay, then more Zen:
> "In the face of ambiguity, refuse the temptation to guess."

Damn! I'm out of quotes... ;)

Time to ask the question on the Biopython-users/BiP lists?

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
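As a concrete reference for search types (3), (4) and (4a) discussed in this comment, here is a minimal sketch on plain Python strings. It is not the code from the earlier comments and not a proposed Seq method: the names find_all and reverse_complement, the (start, strand) return value and the restriction to exact, unambiguous ACGT queries are all assumptions made for illustration. Overlapping matches are obtained with a zero-width lookahead in re, and the second strand is handled by reverse-complementing the query rather than the subject, as suggested above.

```python
import re

# Complement table for unambiguous DNA only (an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(subject, query, overlapping=True, both_strands=False):
    """Return sorted (start, strand) tuples for exact matches of query.

    start is always a forward-strand coordinate in subject; strand is +1
    for the query itself and -1 for its reverse complement.
    """
    if not query:
        raise ValueError("query must not be empty")

    def starts(pattern):
        escaped = re.escape(pattern)
        # A zero-width lookahead lets finditer report overlapping hits.
        regex = re.compile(("(?=%s)" % escaped) if overlapping else escaped)
        return [match.start() for match in regex.finditer(subject)]

    hits = [(pos, 1) for pos in starts(query)]
    if both_strands:
        # Reverse-complement the (short) query, not the (long) subject.
        hits += [(pos, -1) for pos in starts(reverse_complement(query))]
    return sorted(hits)


print(find_all("AAAA", "AA"))
print(find_all("AAAA", "AA", overlapping=False))
print(find_all("ACGTTT", "AAA", both_strands=True))
```

A palindromic query is reported once per strand at the same position in this sketch; whether that is the desired behaviour is one of the open design questions.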
From bugzilla-daemon at portal.open-bio.org Wed Oct 1 10:03:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 1 Oct 2008 06:03:11 -0400
Subject: [Biopython-dev] [Bug 2596] Add string like split, strip, rstrip and lstrip methods to the Seq object
In-Reply-To:
Message-ID: <200810011003.m91A3BB5030638@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2596

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1000 is|0                           |1
           obsolete|                            |

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-01 06:03 EST -------
Created an attachment (id=1002)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1002&action=view)
Patch to Bio/Seq.py for Seq object split, strip, lstrip and rstrip methods (v2)

Revised patch, will now accept Seq or string arguments to the strip and split methods. Still needs proper unit tests, probably added to test_seq.py

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From biopython at maubp.freeserve.co.uk Wed Oct 1 16:29:02 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 1 Oct 2008 17:29:02 +0100
Subject: [Biopython-dev] determining the version
In-Reply-To: <320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com>
References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com> <63700.34226.qm@web62405.mail.re1.yahoo.com> <320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com>
Message-ID: <320fb6e00810010929y4dab07a5ya25767cc0818654d@mail.gmail.com>

Peter wrote:
> From a quick look at approach taken in the matplotlib
> code, we could add something like this to setup.py
>
> __version__ = "Undefined"
> for line in open('Bio/__init__.py'):
>     if (line.startswith('__version__')):
>         exec(line.strip())
>
> setup(
>     name='biopython',
>     version=__version__,
>     author='The Biopython Consortium',
>     ...
>
> I'm happy to deal with this if we are agreed that we
> should add a __version__ to Bio/__init__.py
> (variations on the naming are possible, but this seems
> to be a de-facto standard in python libraries).

Any objections to making this change now?

Peter

From bugzilla-daemon at portal.open-bio.org Wed Oct 1 20:06:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 1 Oct 2008 16:06:47 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values
In-Reply-To:
Message-ID: <200810012006.m91K6lPQ001470@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480

------- Comment #29 from drpatnaik at yahoo.com 2008-10-01 16:06 EST -------
It seems the BLAST executables can accept multiple databases (database pointers) in the 'd' argument, but they need to be space-separated. When there is a space in a single database's path-value, BLAST can interpret the provided argument as two databases and then fail. This may be the reason why the path-values for input sequence files and databases need to be quoted/escaped in different ways:

blast_db = r'"\"C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin.db\""'
input_seq = r'"C:\Documents and Settings\patnaik\My Documents\blast\bin\30a.seq"'

If this is correct, it might be helpful if Biopython could accept multiple databases (for BLAST) as a list:

blast_db = [r'"C:\Documents and Settings\patnaik\My Documents\blast\data\mouse.db"',
            r'"C:\Documents and Settings\patnaik\My Documents\blast\data\rat.db"']

Biopython can then collapse the list items into a properly quoted/escaped string for BLAST's 'd' argument.

If this is not feasible, then a note in the documentation will also be of help.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Wed Oct 1 20:30:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 1 Oct 2008 16:30:47 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values
In-Reply-To:
Message-ID: <200810012030.m91KUkTx002912@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480

------- Comment #30 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-01 16:30 EST -------
Peter wrote in comment #26
>> I've been having trouble with specifying BLAST databases
>> with spaces in the path. Have you been able to demonstrate
>> this with more than one database?

Patnaik wrote in comment #29
> It seems the BLAST executables can accept multiple databases
> (database pointers) in the 'd' argument, but they need to be
> space-separated.

Yes, that is correct.

> When there is a space in a single database's path-value, BLAST
> can interpret the provided argument as two databases and then
> fail. This can be the reason why the path-values for input
> sequence files and databases need to be quoted/escaped in
> different ways:

Yes, I agree. This is clear from some of the error messages BLAST gives when it cannot understand a BLAST database with a space in the name. I don't know if using multiple databases with spaces is even possible.

> If this is correct, it might be helpful if Biopython had
> a functionality to accept multiple databases (for BLAST)
> by using the list data-type:
> ...
> Biopython can then collapse the list items into a properly
> quoted/escaped string for BLAST's 'd' argument.

This is a nice idea *IF* we can establish what the rules are for making a properly quoted/escaped string for BLAST's 'd' argument (which may be different for different operating systems). We may want to email the NCBI for clarification here.

> If this is not feasible, then a note in the documentation will
> also be of help.

For any documentation I would want to recommend using the win32api.GetShortPathName() function to avoid the spaces, with an example showing how to do this for the database name(s). To me this seems much simpler than the complex quoting solution.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From biopython at maubp.freeserve.co.uk Wed Oct 1 21:28:59 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 1 Oct 2008 22:28:59 +0100
Subject: [Biopython-dev] Versions of numpy/Numeric
In-Reply-To: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com>
References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com>
Message-ID: <320fb6e00810011428y11473535v88c3f7fdfa52a4bf@mail.gmail.com>

On Wed, Sep 24, 2008, Peter wrote:
> Using CVS Biopython compiled from source, the unit tests all seem fine
> on the following three setups:
>
> Mac OS X, python 2.5.2, Numeric 24.2 and numpy 1.1.1
> Test suite looks fine
>
> Linux, python 2.5, Numeric 24.2 and numpy 1.0
> Fine, ignoring the Numeric eigenvalue problem in
> test_SVDSuperimposer.py previously discussed
>
> Linux, python 2.3, numpy 1.1.1 [no Numeric]
> Fine, after fixing some broken imports ...
>
> Note that testing where there is NO version of Numeric is important
> (as in this third example), as if both numpy and Numeric are installed
> currently most of the pure python modules will use Numeric by choice.

Testing on some other machines, with Biopython compiled from source
using CVS as of today:

Linux (Ubuntu Dapper Drake), python 2.4.3, Numeric 24.2 and numpy 1.0.1
Fine (including BioSQL). I can't remove Numeric on this machine due
to other libraries still using it.

Windows XP, python 2.3.5, Numeric 23.1 and numpy 1.0
Fine (not testing BioSQL), except for a precision issue on
test_ProtParam.py (0.563 versus 0.562) which I have not fixed as this
is probably due to Numeric vs numpy.

Windows XP, python 2.3.5, numpy 1.0 [no Numeric]
Fine (not testing BioSQL).

Now that NumPy 1.2.0 has been released (announced on the numpy mailing
list on 26 Sept, but their website still needs updating), we should
make sure we test Biopython with that too. Bruce tried with 1.2rc2
earlier so we should be fine.

Testing on a python 2.6 release candidate might be a good idea too...
Peter

From bugzilla-daemon at portal.open-bio.org Thu Oct 2 05:03:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 2 Oct 2008 01:03:19 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values
In-Reply-To:
Message-ID: <200810020503.m9253JrJ031547@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480

------- Comment #31 from drpatnaik at yahoo.com 2008-10-02 01:03 EST -------
> For any documentation I would want to recommend using the
> win32api.GetShortPathName() function to avoid the spaces, with an example
> showing how to do this for the database name(s). To me this seems much simpler
> than the complex quoting solution.

To me it seems win32api.GetShortPathName() will not work for database paths
because the specified values are not really files (e.g., BLAST uses the
/data/mouse.db value to look for /data/mouse.db.nin, etc.), and
win32api.GetShortPathName works only on files.

For BLAST's 'd' argument, to specify multiple databases, one uses the space
separator, and double-quotes the entire argument value ("Db1 Db2"). If a
database value has spaces within, one backslash-double-quotes that database
value (\"Db 3\") and BLAST is supplied with "Db1 Db2 \"Db 3\"".

For example, the following BLAST 2.2.18 console command, which uses multiple
databases with spaces in the pointers, works on Windows XP SP2:

"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" -p blastn -d "\"C:\Documents and Settings\patnaik\My Documents\blast\data\db 1\" \"C:\Documents and Settings\patnaik\My Documents\blast\data\db 2\"" -i "C:/Documents and Settings/patnaik/My Documents/blast/data/My 30a.seq" -m 7
From bugzilla-daemon at portal.open-bio.org Thu Oct 2 05:16:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 2 Oct 2008 01:16:36 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values
In-Reply-To:
Message-ID: <200810020516.m925Garn032423@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480

------- Comment #32 from drpatnaik at yahoo.com 2008-10-02 01:16 EST -------
(Follow-up to comment #31)

Some more working BLAST command (multiple databases) examples, on Windows XP:

1. No need to use the backslash as a directory-separator:

"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" -p blastn -d "\"C:/Documents and Settings/patnaik/My Documents/blast/data/db 1\" \"C:/Documents and Settings/patnaik/My Documents/blast/data/db 2\"" -i "C:/Documents and Settings/patnaik/My Documents/blast/data/My 30a.seq" -m 7

2. Multiple databases, with no spaces in the database pointers:

"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" -p blastn -d "C:\DOCUME~1\patnaik\MYDOCU~1\blast\data\DB1~1 C:\DOCUME~1\patnaik\MYDOCU~1\blast\data\DB2~1 C:\DOCUME~1\patnaik\MYDOCU~1\blast\data\db3" -i "C:/Documents and Settings/patnaik/My Documents/blast/data/My 30a.seq" -m 7

3. Multiple databases, with database pointers with and without spaces:

"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" -p blastn -d "\"C:/Documents and Settings/patnaik/My Documents/blast/data/db 1\" \"C:/Documents and Settings/patnaik/My Documents/blast/data/db 2\" C:\DOCUME~1\patnaik\MYDOCU~1\blast\data\db3" -i "C:/Documents and Settings/patnaik/My Documents/blast/data/My 30a.seq" -m 7
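The quoting rule described in comments #31 and #32 could be sketched in Python roughly as follows. This is only an illustration of the rule as stated in this thread (space-separated databases, backslash-double-quotes around any database value containing a space, and double-quotes around the whole value). The function name quote_blast_databases is hypothetical and is not part of Biopython, and the rule itself has only been reported to work on Windows XP.

```python
def quote_blast_databases(db_paths):
    """Collapse a list of database names into one value for BLAST's -d option.

    Hypothetical helper, following the rule described in comment #31:
    names are space-separated, a name containing a space is wrapped in
    backslash-escaped double-quotes, and the whole value is double-quoted.
    """
    parts = []
    for path in db_paths:
        if " " in path:
            # e.g. Db 3  becomes  \"Db 3\"
            parts.append('\\"%s\\"' % path)
        else:
            parts.append(path)
    return '"%s"' % " ".join(parts)


# Reproduces the example value from comment #31
print(quote_blast_databases(["Db1", "Db2", "Db 3"]))
```

Taking a list as input avoids having to guess whether "Db1 Db2" means two databases or one database whose name contains a space.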
From bugzilla-daemon at portal.open-bio.org Thu Oct 2 06:12:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 2 Oct 2008 02:12:26 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values
In-Reply-To:
Message-ID: <200810020612.m926CPi4002947@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480

------- Comment #33 from drpatnaik at yahoo.com 2008-10-02 02:12 EST -------
Re: using subprocess.Popen with shell=True/False (comment #27), while
'shell=True' works on Mac OS X, and probably other Unix-like systems, one gets
a 'C:\Documents is not a recogniz...' type of error on Windows. With
'shell=False', also the default 'shell' value on Unix, Windows works but
Mac OS X fails. So this should be cross-platform (works on Windows XP):

my_process = subprocess.Popen(my_blast_cmd, stdin=subprocess.PIPE,
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    shell=(True, False)[sys.platform == "win32"])

From bugzilla-daemon at portal.open-bio.org Thu Oct 2 09:41:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 2 Oct 2008 05:41:45 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values
In-Reply-To:
Message-ID: <200810020941.m929fjGp014191@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480

------- Comment #34 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-02 05:41 EST -------
(In reply to comment #31)
> > For any documentation I would want to recommend using the
> > win32api.GetShortPathName() function to avoid the spaces,
> > with an example showing how to do this for the database
> > name(s). To me this seems much simpler than the complex
> > quoting solution.
>
> To me it seems win32api.GetShortPathName() will not work for
> database paths because the specified values are not really
> files (e.g., BLAST uses the /data/mouse.db value to look for
> /data/mouse.db.nin, etc.), and win32api.GetShortPathName
> works only on files.

I believe win32api.GetShortPathName works on paths (directories) and files.
But by the nature of the filing system, it can only work on existing
files/directories - the short names cannot be calculated in advance. As you
have found, this means the function cannot be used on a database name (which
is not a full filename).

Thus any example in the documentation would have to use
win32api.GetShortPathName on the folder and then add on the name.

This alternative approach (from comment 24) would have to know about multiple
extensions (nucleotide and protein databases differ):

my_blast_db = win32api.GetShortPathName('C:/Documents and Settings/patnaik/My Documents/blast/bin/mine.nin')[:-4]

> For BLAST's 'd' argument, to specify multiple databases, one uses the space
> separator, and double-quotes the entire argument value ("Db1 Db2"). If a
> database value has spaces within, one backslash-double-quotes that database
> value (\"Db 3\") and BLAST is supplied with "Db1 Db2 \"Db 3\"".

If we extend the Biopython BLAST API to require multiple databases as a list
of strings this could be possible. Otherwise, how do we know if we are dealing
with two databases (e.g. "Db1 Db2") or a single database whose name contains a
space (e.g. "expressed genes")? We might also want to cope with the situation
where the user has already pre-quoted their database string.

(In reply to comment 33)
> Re: using subprocess.Popen with shell=True/False (comment #27),
> while 'shell=True' works on Mac OS X, and probably other Unix/like
> systems, ...

Probably, but we need to check this rather than assuming it.

> my_process = subprocess.Popen(my_blast_cmd, stdin=subprocess.PIPE,
> stdout=subprocess.PIPE, stderr=subprocess.PIPE,
> shell=(True,False)[sys.platform == "win32"])

Using shell=(sys.platform<>"win32") would be much simpler ;)

From bugzilla-daemon at portal.open-bio.org Thu Oct 2 12:50:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 2 Oct 2008 08:50:40 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values
In-Reply-To:
Message-ID: <200810021250.m92Coeb4023844@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480

------- Comment #35 from drpatnaik at yahoo.com 2008-10-02 08:50 EST -------
(In reply to comment #34)

If the subprocess routine can be implemented, there hopefully will not be any
issue caused by spaces in path values for the BLAST executable or the input
file. For the database values, there is no reason to change the API; the
documentation can just state that double-quoting is needed if there are spaces
in a database pointer, and that when multiple databases are specified and at
least one of the pointers has spaces, such pointers need to be additionally
put inside escaped double-quotes.
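A minimal sketch of the platform-dependent 'shell' setting discussed in comments #33 and #34 is given below. It assumes the behaviour reported in this thread (shell=False needed on Windows, shell=True needed on Mac OS X when the command is a single string), which had not yet been checked on Linux at the time. The function name run_command is hypothetical and is not Biopython API.

```python
import subprocess
import sys


def run_command(cmd):
    """Run a command-line string and return (returncode, stdout, stderr).

    Hypothetical helper: the shell is used everywhere except on Windows,
    as suggested in comment #34.
    """
    use_shell = sys.platform != "win32"
    process = subprocess.Popen(cmd,
                               stdin=subprocess.PIPE,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE,
                               shell=use_shell)
    # communicate() reads both pipes to the end, which avoids a deadlock
    # when the child writes a lot of output
    out, err = process.communicate()
    return process.returncode, out, err


# Example: run the current Python interpreter, with its path double-quoted
# in case it contains spaces
code, out, err = run_command('"%s" -c "print(42)"' % sys.executable)
```

The command string itself still has to be quoted correctly for the platform; this sketch only chooses the 'shell' value.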
From bsouthey at gmail.com Thu Oct 2 13:51:18 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 02 Oct 2008 08:51:18 -0500 Subject: [Biopython-dev] Versions of numpy/Numeric In-Reply-To: <320fb6e00810011428y11473535v88c3f7fdfa52a4bf@mail.gmail.com> References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com> <320fb6e00810011428y11473535v88c3f7fdfa52a4bf@mail.gmail.com> Message-ID: <48E4D1D6.9030406@gmail.com> Peter wrote: > On Wed, Sep 24, 2008, Peter wrote: > >> Using CVS Biopython compiled from source, the unit tests all seem fine >> on the following three setups: >> >> Mac OS X, python 2.5.2, Numeric 24.2 and numpy 1.1.1 >> Test suite looks fine >> >> Linux, python 2.5, Numeric 24.2 and numpy 1.0 >> Fine, ignoring the Numeric eigenvalue problem in >> test_SVDSuperimposer.py previously discussed >> >> Linux, python 2.3, numpy 1.1.1 [no Numeric] >> Fine, after fixing some broken imports ... >> >> Note that testing where there is NO version of Numeric is important >> (as in this third example), as if both numpy and Numeric are installed >> currently most of the pure python modules will use Numeric by choice. >> > > Testing on some other machines, with Biopython compiled from source > using CVS as of today: > > Linux (Ubuntu Dapper Drake), python 2.4.3, Numeric 24.2 and numpy 1.0.1 > Fine (including BioSQL). I can't remove Numeric on this machine due > to other libraries still using it. > > Windows XP, python 2.3.5, Numeric 23.1 and numpy 1.0 > Fine (not testing BioSQL), except for a precision issue on > test_ProtParam.py (0.563 verus 0.562) which I have not fixed as this > is probably due to Numeric vs numpy. > > Windows XP, python 2.3.5, numpy 1.0 [no Numeric] > Fine (not testing BioSQL). > > Now that NumPy 1.2.0 has been released (announced on the numpy mailing > list on 26 Sept, but their website still needs updating), we should > make sure we test Biopython with that too. Bruce tried with 1.2rc2 > earlier so we should be fine. 
>
> Testing on a python 2.6 release candidate might be a good idea too...
>
> Peter

Hi,
Please note numpy 1.2 does require Python 2.4+ so if BioPython will still
support Python 2.3+ then we need someone to test with numpy 1.1.

Download numpy 1.2 from the links at:
http://sourceforge.net/projects/numpy/

There are two Windows installation files, for Python 2.4 and 2.5; each
attempts to install the appropriate binary for the processor type and
instruction set (such as SSE). This avoids people installing the wrong version
and the associated 'bugs' and 'crashes' that may result.

Also, it was noted by David on the numpy list "that updated packages for
various linux distributions (Fedora, Centos/RHEL, OpenSuse) are available":
http://download.opensuse.org/repositories/home:/ashigabou/

Regards
Bruce

From bsouthey at gmail.com Thu Oct 2 16:06:06 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Thu, 02 Oct 2008 11:06:06 -0500
Subject: [Biopython-dev] Versions of numpy/Numeric
In-Reply-To: <320fb6e00810011428y11473535v88c3f7fdfa52a4bf@mail.gmail.com>
References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com> <320fb6e00810011428y11473535v88c3f7fdfa52a4bf@mail.gmail.com>
Message-ID: <48E4F16E.1030900@gmail.com>

Peter wrote:
> On Wed, Sep 24, 2008, Peter wrote:
>
>> Using CVS Biopython compiled from source, the unit tests all seem fine
>> on the following three setups:
>>
>> Mac OS X, python 2.5.2, Numeric 24.2 and numpy 1.1.1
>> Test suite looks fine
>>
>> Linux, python 2.5, Numeric 24.2 and numpy 1.0
>> Fine, ignoring the Numeric eigenvalue problem in
>> test_SVDSuperimposer.py previously discussed
>>
>> Linux, python 2.3, numpy 1.1.1 [no Numeric]
>> Fine, after fixing some broken imports ...
>> >> Note that testing where there is NO version of Numeric is important >> (as in this third example), as if both numpy and Numeric are installed >> currently most of the pure python modules will use Numeric by choice. >> > > Testing on some other machines, with Biopython compiled from source > using CVS as of today: > > Linux (Ubuntu Dapper Drake), python 2.4.3, Numeric 24.2 and numpy 1.0.1 > Fine (including BioSQL). I can't remove Numeric on this machine due > to other libraries still using it. > > Windows XP, python 2.3.5, Numeric 23.1 and numpy 1.0 > Fine (not testing BioSQL), except for a precision issue on > test_ProtParam.py (0.563 verus 0.562) which I have not fixed as this > is probably due to Numeric vs numpy. > > Windows XP, python 2.3.5, numpy 1.0 [no Numeric] > Fine (not testing BioSQL). > > Now that NumPy 1.2.0 has been released (announced on the numpy mailing > list on 26 Sept, but their website still needs updating), we should > make sure we test Biopython with that too. Bruce tried with 1.2rc2 > earlier so we should be fine. > > Testing on a python 2.6 release candidate might be a good idea too... > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > Actually the 'final' Python 2.6 was released yesterday (October 1st, 2008)! 
Bruce

From bugzilla-daemon at portal.open-bio.org Thu Oct 2 18:07:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 2 Oct 2008 14:07:48 -0400
Subject: [Biopython-dev] [Bug 2604] New: test_Restriction failure with Python 2.6 (also cause error in test_CAPS)
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2604

Summary: test_Restriction failure with Python 2.6 (also cause error in test_CAPS)
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: bsouthey at gmail.com

Running 'python setup.py test --no-gui' gives the following error for
test_Restriction. This is the same line that causes test_CAPS to fail. Both
test outputs are below.

This is with Linux x86_64 with Python 2.6 compiled using gcc v4.3.2

======================================================================
ERROR: test_Restriction
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 152, in runTest
    self.runSafeTest()
  File "run_tests.py", line 165, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_Restriction.py", line 8, in <module>
    from Bio.Restriction import *
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.6/Bio/Restriction/__init__.py", line 61, in <module>
    from Bio.Restriction.Restriction import *
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.6/Bio/Restriction/Restriction.py", line 2351, in <module>
    newenz = T(k, bases, enzymedict[k])
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.6/Bio/Restriction/Restriction.py", line 217, in __init__
    super(RestrictionType, cls).__init__(name, bases, dict)
TypeError: descriptor '__init__' requires a 'type' object but received a 'str'
----------------------------------------------------------------------

======================================================================
ERROR: test_CAPS
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 152, in runTest
    self.runSafeTest()
  File "run_tests.py", line 165, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_CAPS.py", line 3, in <module>
    from Bio.Restriction import *
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.6/Bio/Restriction/__init__.py", line 61, in <module>
    from Bio.Restriction.Restriction import *
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.6/Bio/Restriction/Restriction.py", line 2351, in <module>
    newenz = T(k, bases, enzymedict[k])
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.6/Bio/Restriction/Restriction.py", line 217, in __init__
    super(RestrictionType, cls).__init__(name, bases, dict)
TypeError: descriptor '__init__' requires a 'type' object but received a 'str'
======================================================================
From bugzilla-daemon at portal.open-bio.org Thu Oct 2 18:11:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 2 Oct 2008 14:11:08 -0400
Subject: [Biopython-dev] [Bug 2605] New: test_PDB failure with Python 2.6
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2605

Summary: test_PDB failure with Python 2.6
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: bsouthey at gmail.com

Running 'python setup.py test --no-gui' results in a failure for test_PDB on
linux x86_64 running Python 2.6 compiled with gcc 4.3.2

======================================================================
ERROR: test_PDB
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 152, in runTest
    self.runSafeTest()
  File "run_tests.py", line 165, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_PDB.py", line 68, in <module>
    run_test()
  File "test_PDB.py", line 22, in run_test
    structure=p.get_structure("example", "PDB/a_structure.pdb")
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.6/Bio/PDB/PDBParser.py", line 69, in get_structure
    self._parse(file.readlines())
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.6/Bio/PDB/PDBParser.py", line 89, in _parse
    self.trailer=self._parse_coordinates(coords_trailer)
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.6/Bio/PDB/PDBParser.py", line 186, in _parse_coordinates
    structure_builder.init_atom(name, coord, bfactor, occupancy, altloc, fullname, serial_number)
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.6/Bio/PDB/StructureBuilder.py", line 224, in init_atom
    residue.add(atom)
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.6/Bio/PDB/Residue.py", line 81, in add
    raise PDBConstructionException, "Atom %s defined twice in residue %s" % (atom_id, self)
TypeError: exceptions must be classes or instances, not str
======================================================================

From bsouthey at gmail.com Thu Oct 2 18:23:05 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Thu, 02 Oct 2008 13:23:05 -0500
Subject: [Biopython-dev] Versions of numpy/Numeric
In-Reply-To: <320fb6e00810011428y11473535v88c3f7fdfa52a4bf@mail.gmail.com>
References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com> <320fb6e00810011428y11473535v88c3f7fdfa52a4bf@mail.gmail.com>
Message-ID: <48E51189.5060603@gmail.com>

Hi,
I just built and installed Python 2.6 with gcc version 4.3.2. I then installed
numpy 1.2 with it (so no Numeric).

I did a cvs update on biopython and installed with Python 2.5.2 and Python
2.6. In both cases I noticed many gcc warnings 'differ in signedness' (should
I file a bug report?) in Bio/cstringfnsmodule.c and Bio/trie.c and also
Bio/triemodule.c has a couple of other warnings.

In both cases 'python setup.py test' opened a graphical window - it did not do
that before when I tested. What should the default be?

All expected tests (ie I do not have biosql setup) passed with Python 2.5.2.
With Python 2.6, running tests gave two warnings (should I file a bug report?):

biopython/build/lib.linux-x86_64-2.6/Bio/Data/CodonTable.py:580: DeprecationWarning: the sets module is deprecated
  from sets import Set
biopython/build/lib.linux-x86_64-2.6/Bio/Crystal/__init__.py:42: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  self.message = message

Also I got three errors with Python 2.6 (without the -3 flag as that provides
warnings for Python 3) so I filed bug reports:
test_CAPS
test_PDB
test_Restriction

The failure for test_CAPS and test_Restriction is due to the same line
"Bio/Restriction/Restriction.py" (line 217).

Apart from these everything else passed.

Bruce

From biopython at maubp.freeserve.co.uk Fri Oct 3 09:06:46 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 3 Oct 2008 10:06:46 +0100
Subject: [Biopython-dev] Versions of numpy/Numeric
In-Reply-To: <48E51189.5060603@gmail.com>
References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com> <320fb6e00810011428y11473535v88c3f7fdfa52a4bf@mail.gmail.com> <48E51189.5060603@gmail.com>
Message-ID: <320fb6e00810030206q6a33cd12q49b80d19f9ce33a0@mail.gmail.com>

> Hi,
> I just built and installed Python 2.6 with gcc version 4.3.2. I then
> installed numpy 1.2 with it (so no Numeric).

I thought you might be the first to try Biopython with python 2.6 once I knew
it was out.

> I did a cvs update on biopython and installed with Python 2.5.2 and Python
> 2.6. In both cases I noticed many gcc warnings 'differ in signedness'
> (should I file a bug report?) in Bio/cstringfnsmodule.c and Bio/trie.c and
> also Bio/triemodule.c has a couple of other warnings.

I've noticed differ in signedness warnings from trie before with an older
gcc - we probably should fix these so please file a (low priority) bug for
that.

> In both cases 'python setup.py test' opened graphical window - it did not do
> that before when I tested. What should the default be?

python setup.py test --no-gui

If the relevant GUI python framework isn't present, it defaults to no GUI. It
has been suggested that we drop the GUI - as a relative newcomer to Biopython
what do you think?

> All expected tests (ie I do not have biosql setup) passed with Python 2.5.2.

Excellent.

> With Python 2.6, running tests gave two warnings (should I file a bug
> report?):
> biopython/build/lib.linux-x86_64-2.6/Bio/Data/CodonTable.py:580:
> DeprecationWarning: the sets module is deprecated
> from sets import Set

As of python 2.4, set (note lower case) became a built in function (like
list). As we still support Python 2.3, avoiding this deprecation would need
something like:

try :
    #This should work on python 2.4+
    Set = set
except NameError:
    from sets import Set
#The remaining code can use Set as before...

Or something similar if we switch all the calls to Set() to the newer set()
instead.

> biopython/build/lib.linux-x86_64-2.6/Bio/Crystal/__init__.py:42:
> DeprecationWarning: BaseException.message has been deprecated as of Python
> 2.6
> self.message = message

I'm not immediately sure how to fix that, let's see if anyone on the list has
a quick suggestion.

> Also I got three errors with Python 2.6 (without the -3 flag as that
> provides warnings for Python 3) so I filed bug reports:
> test_CAPS
> test_PDB
> test_Restriction
>
> The failure for test_CAPS and test_Restriction is due to the same line
> "Bio/Restriction/Restriction.py" (line 217).

OK - we'll have to look at those.

> Apart from these everything else passed.

Thanks.
Peter From bugzilla-daemon at portal.open-bio.org Fri Oct 3 10:26:48 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 3 Oct 2008 06:26:48 -0400 Subject: [Biopython-dev] [Bug 2605] test_PDB failure with Python 2.6 In-Reply-To: Message-ID: <200810031026.m93AQmRV007453@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2605 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-03 06:26 EST ------- I think this is related to the old style exceptions which were being used in Bio.PDB, see http://www.python.org/dev/peps/pep-0008/ > When raising an exception, use "raise ValueError('message')" instead of > the older form "raise ValueError, 'message'". > > The paren-using form is preferred because when the exception arguments > are long or include string formatting, you don't need to use line > continuation characters thanks to the containing parentheses. The older > form will be removed in Python 3000. It looks like the old form was removed in Python 2.6 (or I have mis-identified the problem). I've switched all the exception raises in Bio.PDB in CVS, and made the exceptions into proper classes, which I hope will address this bug under python 2.6. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
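As an illustration of the change described in comment #1, the sketch below shows an exception defined as a proper class and raised with the call form recommended by PEP 8. The TypeError in the original report ("exceptions must be classes or instances, not str") suggests the exception was previously a plain string, which Python 2.6 no longer accepts. The names below are stand-ins written for this example and are not the actual Bio.PDB code.

```python
class PDBConstructionException(Exception):
    """Stand-in exception class for this example (not the Bio.PDB code)."""


def add_atom(atom_ids, atom_id):
    """Add atom_id to the set atom_ids, refusing duplicates.

    Hypothetical helper that mimics the duplicate-atom check seen in the
    traceback for bug 2605.
    """
    if atom_id in atom_ids:
        # Call form: raise Class("message"), not  raise Class, "message"
        raise PDBConstructionException("Atom %s defined twice" % atom_id)
    atom_ids.add(atom_id)


atoms = set()
add_atom(atoms, "CA")
try:
    add_atom(atoms, "CA")
except PDBConstructionException as err:
    print("Caught: %s" % err)
```

Because the exception is a class derived from Exception, callers can catch it by type, and the call form also works unchanged on later Python versions.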
From bugzilla-daemon at portal.open-bio.org Fri Oct 3 12:27:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 3 Oct 2008 08:27:29 -0400 Subject: [Biopython-dev] [Bug 2605] test_PDB failure with Python 2.6 In-Reply-To: Message-ID: <200810031227.m93CRT0o014499@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2605 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-03 08:27 EST ------- I've now tested this on Linux with python 2.6 and numpy 1.2 and this has indeed fixed the test failure. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Fri Oct 3 12:44:25 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 3 Oct 2008 13:44:25 +0100 Subject: [Biopython-dev] Python 2.6 Message-ID: <320fb6e00810030544l510d76f7g93d805ec5840c1d4@mail.gmail.com> I've now got Python 2.6, numpy 1.2 and Biopython CVS installed on a linux machine and can confirm Bruce's observations. I haven't yet installed MySQLdb in order to verify BioSQL is still fine with Python 2.6. I have fixed the exception problem with test_PDB.py on python 2.6. >> With Python 2.6, running tests gave two warnings (should I file a bug >> report?): >> biopython/build/lib.linux-x86_64-2.6/Bio/Data/CodonTable.py:580: >> DeprecationWarning: the sets module is deprecated >> from sets import Set > > As of python 2.4, set (note lower case) became a built in function > (like list). 
> As we still support Python 2.3, avoiding this
> deprecation would need something like:
>
> try :
>     #This should work on python 2.4+
>     Set = set
> except NameError:
>     from sets import Set
> #The remaining code can use Set as before...
>
> Or something similar if we switch all the calls to Set() to the newer
> set() instead.

I could test this change, or work on a variant using set() by default. Does
anyone have a preference?

>> biopython/build/lib.linux-x86_64-2.6/Bio/Crystal/__init__.py:42:
>> DeprecationWarning: BaseException.message has been deprecated as of Python
>> 2.6
>> self.message = message
>
> I'm not immediately sure how to fix that, let's see if anyone on the
> list has a quick suggestion.

This is probably also due to an exception class change in python 2.6, similar
to that which broke test_PDB (bug 2605).

>> Also I got three errors with Python 2.6 (without the -3 flag as that
>> provides warnings for Python 3) so I filed bug reports:
>> test_CAPS
>> test_PDB
>> test_Restriction
>>
>> The failure for test_CAPS and test_Restriction is due to the same line
>> "Bio/Restriction/Restriction.py" (line 217).

Bug 2604, http://bugzilla.open-bio.org/show_bug.cgi?id=2604
This seems to be due to the python 2.6 changes to the python built in super.
I've tried emailing Frédéric Sohm who wrote this code, but I'm not sure if the
email address I used is still valid.

Bug 2605, test_PDB failure is now fixed (it was using old style exceptions)
http://bugzilla.open-bio.org/show_bug.cgi?id=2605

Peter

From bugzilla-daemon at portal.open-bio.org Fri Oct 3 14:00:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 3 Oct 2008 10:00:42 -0400
Subject: [Biopython-dev] [Bug 2607] New: Gcc "differ in signedness" warning with cstringfnsmodule.c
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2607

Summary: Gcc "differ in signedness" warning with cstringfnsmodule.c
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: Linux
Status: NEW
Severity: minor
Priority: P5
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: bsouthey at gmail.com

Gcc version 4.3.2 gives the "differ in signedness" warning below when building
Biopython. While my C is not very good, changing line 34 from 'unsigned char'
to just 'char' removed the warnings.

Bio/cstringfnsmodule.c: In function ‘cstringfns_splitany’:
Bio/cstringfnsmodule.c:34: warning: pointer targets in initialization differ in signedness
Bio/cstringfnsmodule.c:71: warning: pointer targets in passing argument 1 of ‘PyString_FromStringAndSize’ differ in signedness
Bio/cstringfnsmodule.c:85: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/cstringfnsmodule.c:87: warning: pointer targets in passing argument 1 of ‘PyString_FromStringAndSize’ differ in signedness
From bugzilla-daemon at portal.open-bio.org  Fri Oct  3 14:08:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 3 Oct 2008 10:08:37 -0400
Subject: [Biopython-dev] [Bug 2608] New: Gcc "differ in signedness" warnings with trie.c
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2608

           Summary: Gcc "differ in signedness" warnings with trie.c
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: trivial
          Priority: P5
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com

Gcc 4.3.2 provides multiple "differ in signedness" warning with trie.c as given
below. This may be related to multiple declarations of 'unsigned char' instead
of 'char'.

Bio/trie.c: In function ‘Trie_set’:
Bio/trie.c:103: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:103: warning: pointer targets in passing argument 1 of ‘__strdup’ differ in signedness
Bio/trie.c:156: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:162: warning: pointer targets in passing argument 1 of ‘__builtin_strncpy’ differ in signedness
Bio/trie.c:162: warning: pointer targets in passing argument 2 of ‘__builtin_strncpy’ differ in signedness
Bio/trie.c:164: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:164: warning: pointer targets in passing argument 1 of ‘__strdup’ differ in signedness
Bio/trie.c: In function ‘Trie_get’:
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of ‘strncmp’ differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 2 of ‘strncmp’ differ in signedness
Bio/trie.c:235: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c: In function ‘_get_approximate_transition’:
Bio/trie.c:268: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:272: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:272: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:284: warning: pointer targets in passing argument 1 of ‘__builtin_strncat’ differ in signedness
Bio/trie.c:284: warning: pointer targets in passing argument 2 of ‘__builtin_strncat’ differ in signedness
Bio/trie.c: In function ‘_get_approximate_trie’:
Bio/trie.c:353: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:355: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:356: warning: pointer targets in passing argument 1 of ‘strcat’ differ in signedness
Bio/trie.c:356: warning: pointer targets in passing argument 2 of ‘strcat’ differ in signedness
Bio/trie.c:367: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:369: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c: In function ‘Trie_has_prefix’:
Bio/trie.c:440: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:441: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of ‘strncmp’ differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 2 of ‘strncmp’ differ in signedness
Bio/trie.c: In function ‘_iterate_helper’:
Bio/trie.c:468: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:470: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:475: warning: pointer targets in passing argument 1 of ‘strcat’ differ in signedness
Bio/trie.c:475: warning: pointer targets in passing argument 2 of ‘strcat’ differ in signedness
Bio/trie.c: In function ‘_with_prefix_helper’:
Bio/trie.c:521: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:522: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 2 of ‘__builtin_strcmp’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of ‘strncmp’ differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 2 of ‘strncmp’ differ in signedness
Bio/trie.c:530: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:536: warning: pointer targets in passing argument 1 of ‘__builtin_strncat’ differ in signedness
Bio/trie.c:536: warning: pointer targets in passing argument 2 of ‘__builtin_strncat’ differ in signedness
Bio/trie.c: In function ‘_serialize_transition’:
Bio/trie.c:621: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c: In function ‘_deserialize_transition’:
Bio/trie.c:708: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
Bio/trie.c:708: warning: pointer targets in passing argument 1 of ‘__strdup’ differ in signedness
Bio/trie.c: In function ‘test’:
Bio/trie.c:752: warning: pointer targets in passing argument 2 of ‘Trie_set’ differ in signedness
Bio/trie.c:753: warning: pointer targets in passing argument 2 of ‘Trie_set’ differ in signedness
Bio/trie.c:754: warning: pointer targets in passing argument 2 of ‘Trie_set’ differ in signedness
Bio/trie.c:755: warning: pointer targets in passing argument 2 of ‘Trie_set’ differ in signedness
Bio/trie.c:757: warning: pointer targets in passing argument 2 of ‘Trie_get’ differ in signedness
Bio/trie.c:758: warning: pointer targets in passing argument 2 of ‘Trie_get’ differ in signedness
Bio/trie.c:759: warning: pointer targets in passing argument 2 of ‘Trie_get’ differ in signedness
Bio/trie.c:760: warning: pointer targets in passing argument 2 of ‘Trie_get’ differ in signedness
Bio/trie.c:762: warning: pointer targets in passing argument 2 of ‘Trie_set’ differ in signedness
Bio/trie.c:763: warning: pointer targets in passing argument 2 of ‘Trie_get’ differ in signedness
Bio/trie.c:765: warning: pointer targets in passing argument 2 of ‘Trie_get’ differ in signedness
Bio/trie.c:768: warning: pointer targets in passing argument 2 of ‘Trie_set’ differ in signedness
Bio/trie.c:769: warning: pointer targets in passing argument 2 of ‘Trie_get’ differ in signedness
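As a rough illustration of what gcc is flagging in the two reports above, here
is a small Python sketch using the standard struct module. It is not the C code
from Bio/trie.c or Bio/cstringfnsmodule.c, and the helper names are made up for
this example. It only shows that the same byte reads as a different number
depending on whether it is treated as a signed or an unsigned char.

```python
import struct


def as_signed_char(byte):
    """Interpret a single byte as a C 'signed char'."""
    return struct.unpack("b", byte)[0]


def as_unsigned_char(byte):
    """Interpret a single byte as a C 'unsigned char'."""
    return struct.unpack("B", byte)[0]


# Bytes below 0x80 read the same either way; bytes from 0x80 up do not.
for raw in (b"\x41", b"\x7f", b"\x80", b"\xff"):
    print(raw, as_signed_char(raw), as_unsigned_char(raw))
```

For 7-bit ASCII data the two interpretations agree. Whether the C code relies
on byte values above 0x7f would need checking in the source itself.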
From bugzilla-daemon at portal.open-bio.org  Fri Oct  3 14:15:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 3 Oct 2008 10:15:12 -0400
Subject: [Biopython-dev] [Bug 2609] New: Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2609

           Summary: Gcc 4.3.2 'initialization from incompatible pointer type'
                    warning with triemodule.c
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: trivial
          Priority: P5
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com

Gcc 4.3.2 gives an 'initialization from incompatible pointer type' warning with
triemodule.c.

Bio/triemodule.c:389: warning: initialization from incompatible pointer type
Bio/triemodule.c: In function ‘_write_value_to_handle’:
Bio/triemodule.c:488: warning: passing argument 3 of ‘PyString_AsStringAndSize’ from incompatible pointer type
From bugzilla-daemon at portal.open-bio.org  Fri Oct  3 14:35:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 3 Oct 2008 10:35:44 -0400
Subject: [Biopython-dev] [Bug 2608] Gcc "differ in signedness" warnings with trie.c
In-Reply-To:
Message-ID: <200810031435.m93EZi2j022259@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2608

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-03 10:35 EST -------
Interestingly looking at the CVS history, in Bio/triemodule.c revision 1.5 it
looks like we used to have lots of "char *" casts/variables which were changed
to "unsigned char *" to solve complaints from the SGI cc compiler (the comment
doesn't say if these were warnings or errors).

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/triemodule.c?cvsroot=biopython

We should probably be using whatever PyString_AS_STRING, PyExc_KeyError,
PyString_FromString etc use.

From bsouthey at gmail.com  Fri Oct  3 15:01:42 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Fri, 03 Oct 2008 10:01:42 -0500
Subject: [Biopython-dev] Versions of numpy/Numeric
In-Reply-To: <320fb6e00810030206q6a33cd12q49b80d19f9ce33a0@mail.gmail.com>
References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com>
  <320fb6e00810011428y11473535v88c3f7fdfa52a4bf@mail.gmail.com>
  <48E51189.5060603@gmail.com>
  <320fb6e00810030206q6a33cd12q49b80d19f9ce33a0@mail.gmail.com>
Message-ID: <48E633D6.9090100@gmail.com>

Peter wrote:
>> Hi,
>> I just built and installed Python 2.6 with gcc version 4.3.2. I then
>> installed numpy 1.2 with it (so no Numeric).
>>
>
> I thought you might be the first to try Biopython with python 2.6 was
> I knew it was out.
>
>
>> I did a cvs update on biopython and installed with Python 2.5.2 and Python
>> 2.6. In both cases I noticed many gcc warnings 'differ in signedness'
>> (should I file a bug report?) in Bio/cstringfnsmodule.c and Bio/trie.c and
>> also Bio/triemodule.c has a couple of other warnings.
>>
>
> I've noticed differ in signedness warnings from trie before with an
> older gcc - we probably should fix these so please file a (low
> priority) bug for that.
>
Filed bug reports for these and I think that the Bio/cstringfnsmodule.c
and Bio/trie.c warnings are related to the declaration of 'unsigned char'. I
changed this to 'char' in cstringfnsmodule.c and the warning goes away.
However, that is probably not the best thing to do without checking the
reason for using 'unsigned char' in the code (it may be essential to
maintain sign).

There are some interesting comments on the usage of strlen() warnings,
such as Linus Torvalds:
"..and my argument is that a warning which doesn't allow you to call
"strlen()" on a "unsigned char" array without triggering is a bogus
warning, and must be removed."

So perhaps these can be ignored.

Regards
Bruce

From bugzilla-daemon at portal.open-bio.org  Fri Oct  3 15:54:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 3 Oct 2008 11:54:19 -0400
Subject: [Biopython-dev] [Bug 2611] New: Message corrections when tests are skipped
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2611

           Summary: Message corrections when tests are skipped
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Unit Tests
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com

With the latest cvs version and Python 2.6 there are some corrections needed to
the message output. I also think these should be consistent.

A few messages include a ')' but have no opening '(':
test_GraphicsChromosome ... skipping. Install reportlab if you want to use Bio.Graphics).
test_GraphicsDistribution ... skipping. Install reportlab if you want to use Bio.Graphics).
test_GraphicsGeneral ... skipping. Install reportlab if you want to use Bio.Graphics).

I think these tests should have similar message to the previous ones:
test_PopGen_FDist ... skipping. Fdist not found (not a problem if you do not intend to use it).
test_PopGen_SimCoal ... skipping. SimCoal not found (not a problem if you do not intend to use it).

Perhaps:
test_PopGen_FDist ... skipping. Install FDist if you want to use Bio.PopGen_FDist.
test_PopGen.SimCoal ... skipping. Install SimCoal if you want to use Bio.PopGen.SimCoal.

From bugzilla-daemon at portal.open-bio.org  Fri Oct  3 16:35:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 3 Oct 2008 12:35:19 -0400
Subject: [Biopython-dev] [Bug 2611] Message corrections when tests are skipped
In-Reply-To:
Message-ID: <200810031635.m93GZJ1v031955@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2611

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |minor
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-03 12:35 EST -------
Fixed extra bracket in:
test_GraphicsChromosome.py revision 1.4
test_GraphicsDistribution.py revision 1.3
test_GraphicsGeneral.py revision 1.3

Standardised MissingExternalDependencyError wording in:
test_PopGen_SimCoal.py revision 1.2
test_PopGen_FDist.py revision 1.6
and also requires_wise.py revision 1.5 (used by test_wise.py)

Thanks for your attention to detail here Bruce.

Peter

From biopython at maubp.freeserve.co.uk  Fri Oct  3 16:52:34 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 3 Oct 2008 17:52:34 +0100
Subject: [Biopython-dev] Python 2.6
In-Reply-To: <320fb6e00810030544l510d76f7g93d805ec5840c1d4@mail.gmail.com>
References: <320fb6e00810030544l510d76f7g93d805ec5840c1d4@mail.gmail.com>
Message-ID: <320fb6e00810030952q74d595d6l89adf06890d5311@mail.gmail.com>

One of the python 2.6 issues Bruce flagged up was the deprecation of
the sets module. Based on a quick grep, this affects several modules:

Seq.py - used in the self test only, which could be removed
Align/AlignInfo.py
AlignIO/__init__.py - used in the self test only, which could be removed
AlignIO/PhylipIO.py
Data/CodonTable.py
Nexus/Nexus.py
Nexus/Trees.py
Restriction/Restriction.py
SeqIO/__init__.py - used in the self test only, which could be removed
SeqIO/PhylipIO.py

Most of these do either "from sets import Set" or "import sets".

On balance I think it would make sense to convert all these to use the
new built in "set(...)" instead of "Set(...)", with a fall back for
python 2.3 like this:

#TODO - Remove this work around once we drop python 2.3 support
try:
    #Check the built in set function is present (python 2.4+)
    set = set
except NameError:
    #For python 2.3 fall back on the sets module (deprecated in python 2.6)
    from sets import Set as set

and replace all use of Set(...) with set(...) in the main code.

See also http://www.python.org/dev/peps/pep-0218/

Of course, dropping support for python 2.3 as part of supporting 2.6
isn't out of the question, and would make dealing with the set/Set
issue much simpler.
Personally I still use python 2.3 on Windows XP
because I have the build environment all setup using MSVC 6.0, and
switching python versions would require me to setup a whole new
compiler suite. I'd rather not drop support for python 2.3 in
Biopython unless/until I've got a Windows machine setup with a working
python compatible compiler.

Peter

From bugzilla-daemon at portal.open-bio.org  Fri Oct  3 17:37:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 3 Oct 2008 13:37:23 -0400
Subject: [Biopython-dev] [Bug 2604] test_Restriction failure with Python 2.6 (also cause error in test_CAPS)
In-Reply-To:
Message-ID: <200810031737.m93HbNsW003167@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2604

------- Comment #1 from bsouthey at gmail.com 2008-10-03 13:37 EST -------
I should add that this error is new in Python 2.6, see 'Porting to Python 2.6'
section of http://docs.python.org/whatsnew/2.6.html
"
object.__init__() previously accepted arbitrary arguments and keyword
arguments, ignoring them. In Python 2.6, this is no longer allowed and will
result in a TypeError. This will affect __init__() methods that end up calling
the corresponding method on object (perhaps through using super()). See issue
1683368 for discussion.
"
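The rule quoted above can be sketched in a few lines. This is only a general
illustration, assuming Python 2.6 or later (it was checked against the Python 3
behaviour); the class names are hypothetical and are not taken from
Bio.Restriction, where the failing call is in a metaclass and the details
differ.

```python
class PassesArgsUp(object):
    """Forwards whatever it receives to object.__init__."""

    def __init__(self, *args, **kwargs):
        super(PassesArgsUp, self).__init__(*args, **kwargs)


class ConsumesArgs(object):
    """Keeps its own argument and calls object.__init__ with none."""

    def __init__(self, value):
        super(ConsumesArgs, self).__init__()
        self.value = value


def raises_type_error(func, *args):
    """Return True if calling func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


# An extra argument reaches object.__init__ and is rejected.
print(raises_type_error(PassesArgsUp, 1))
# With nothing extra to pass up, the same class is fine.
print(raises_type_error(PassesArgsUp))
# Consuming the argument before calling super() avoids the problem.
print(ConsumesArgs(42).value)
```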
From dalke at dalkescientific.com  Sat Oct  4 22:09:35 2008
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sun, 5 Oct 2008 00:09:35 +0200
Subject: [Biopython-dev] [Bug 2608] Gcc "differ in signedness" warnings with trie.c
In-Reply-To: <200810031435.m93EZi2j022259@portal.open-bio.org>
References: <200810031435.m93EZi2j022259@portal.open-bio.org>
Message-ID: <82FA5FF2-576B-43E8-9628-3D740989113F@dalkescientific.com>

> ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk
> 2008-10-03 10:35 EST -------
> Interestingly looking at the CVS history, in Bio/triemodule.c
> revision 1.5 it
> looks like we used to have lots of "char *" casts/variables which
> were changed
> to "unsigned char *" to solve complaints from the SGI cc compiler
> (the comment
> doesn't say if these were warnings or errors).

Those were almost certainly warnings. The "char" type on IRIX is
unsigned.

I once tracked down a bug in some code which used a char field to
store formal charges. On IRIX the charges were 0, +1, +2, +254 and
+255. :)

Andrew
dalke at dalkescientific.com

From mjldehoon at yahoo.com  Sun Oct  5 02:07:53 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 4 Oct 2008 19:07:53 -0700 (PDT)
Subject: [Biopython-dev] Bio.MarkovModel
Message-ID: <217322.81025.qm@web62402.mail.re1.yahoo.com>

Hi everybody,

When I was looking at the NumPy-dependent modules, I got the impression that
Bio.MarkovModel can be simplified now that it's using the new NumPy. As far as
I can tell, there is no documentation for Bio.MarkovModel, and the code seems
to have some (trivial) bugs that (I think) would be noticed if anybody is
actively using Bio.MarkovModel. So I am wondering

1) Has anybody looked at Bio.MarkovModel in detail?
2) If not, should this module be kept?

On the one hand, Markov models are a core part of computational biology and as
such are an appropriate module for Biopython. On the other hand, the code is
useful only if people are actually using it.
Another option is to have MarkovModel.py as a stand-alone example script of
Python in computational biology instead of a full-blown module.

--Michiel.

From biopython at maubp.freeserve.co.uk  Sun Oct  5 11:44:02 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 5 Oct 2008 12:44:02 +0100
Subject: [Biopython-dev] Bio.MarkovModel
In-Reply-To: <217322.81025.qm@web62402.mail.re1.yahoo.com>
References: <217322.81025.qm@web62402.mail.re1.yahoo.com>
Message-ID: <320fb6e00810050444k4a86e91cq602e3dd1c89b864e@mail.gmail.com>

On Sun, Oct 5, 2008 at 3:07 AM, Michiel de Hoon wrote:
> Hi everybody,
>
> When I was looking at the NumPy-dependent modules, I got the impression
> that Bio.MarkovModel can be simplified now that it's using the new NumPy.

That's good, but of limited benefit in itself.

> As far as I can tell, there is no documentation for Bio.MarkovModel, and

There isn't even a copyright statement - but there are at least
docstrings, which is something. Looking at the CVS log, Jeff Chang
checked this in originally, so either he wrote it or he should at
least know who did.

> the code seems to have some (trivial) bugs that (I think) would be noticed
> if anybody is actively using Bio.MarkovModel. So I am wondering
>
> 1) Has anybody looked at Bio.MarkovModel in detail?

Not personally.

> 2) If not, should this module be kept?

I would say yes.

> On the one hand, Markov models are a core part of computational
> biology and as such are an appropriate module for Biopython. On the
> other hand, the code is useful only if people are actually using it.
> Another option is to have MarkovModel.py as a stand-alone example
> script of Python in computational biology instead of a full-blown module.

As you say, Markov models are an important tool in computational
biology, so having some useful code to work with them in Biopython is
a good thing.
To me, having this remain as a "top level" module in
Biopython would give it higher status and visibility than hiding it
away in the example scripts.

If you can see a few little things that need fixing, then making those
improvements would be worthwhile. If you don't really have the time
to deal with them now, even just filing bugs would be worth doing.

If anyone is actively using the module, then contributing something
for the tutorial would be very welcome. If you don't know LaTeX (used
for the typesetting), then just plain text is fine - I'm happy to deal
with the formatting.

Peter

From mjldehoon at yahoo.com  Mon Oct  6 10:13:18 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 6 Oct 2008 03:13:18 -0700 (PDT)
Subject: [Biopython-dev] Bio.MarkovModel; Bio.Popgen, Bio.PDB documentation
In-Reply-To: <320fb6e00810050444k4a86e91cq602e3dd1c89b864e@mail.gmail.com>
Message-ID: <163677.27280.qm@web62408.mail.re1.yahoo.com>

> > When I was looking at the NumPy-dependent modules, I
> > got the impression that Bio.MarkovModel can be
> > simplified now that it's using the new NumPy.
>
> That's good, but of limited benefit in itself.

Well, currently Bio.MarkovModel uses a C extension module Bio.cMarkovModel. If
we can achieve the same speed or better by making use of NumPy, then we won't
need this C extension module and we can simplify Biopython.

> > 2) If not, should this module be kept?
>
> I would say yes.
> ...
> To me, having this remain as a "top level" module in
> Biopython would give it higher status and visibility than
> hiding it away in the example scripts.

OK, let's keep it as a module then. We now have several small modules related
to supervised learning as separate Bio.s (LogisticRegression, MaxEntropy, kNN,
NaiveBayes, and arguably MarkovModel), which to me looks a bit messy. It may be
a good idea to collect these in one Bio.Supervised, though this is not urgent.
I'd be happy to set up a new chapter in the tutorial about these supervised
learning modules (I wrote a section a long time ago for the cookbook about
logistic regression).

While we're on the subject, I think that the Bio.PopGen and Bio.PDB sections of
the cookbook chapter in the tutorial should be promoted to separate chapters in
the tutorial, since these modules are fairly big and have good documentation.

--Michiel.

From bugzilla-daemon at portal.open-bio.org  Mon Oct  6 10:24:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 6 Oct 2008 06:24:57 -0400
Subject: [Biopython-dev] [Bug 2604] test_Restriction failure with Python 2.6 (also cause error in test_CAPS)
In-Reply-To:
Message-ID: <200810061024.m96AOuhV000861@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2604

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-06 06:24 EST -------
I've contacted Frédéric Sohm by email, and this is his suggested fix for the
super issue:

--------------------------------------
I replaced line 221 :

    super(RestrictionType, cls).__init__(name, bases, dict)

    #dict was an error for dct by the way

By :

    if sys.version < '2.6' : # sys is imported at the beginning to check
                             # for set anyway.
        super(RestrictionType,cls).__init__(name, bases, dct)
    else :
        super(RestrictionType,cls).__init__(cls, name, bases, dct)

    # cls is the equivalent of self there. It's different to mark the fact
    # that the class is a metaclass not a normal python class.

This should support both 2.6 and 2.3; The biopython test is now working with
2.6 (I did not try with 2.3 but this should not have changed anything for
this version). I have not much time for testing it thoroughly right now,
sorry.
--------------------------------------

End quote.
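The quoted fix compares sys.version as a string. For the interpreters discussed
in this thread that gives the intended answer. A possible alternative, sketched
here with a hypothetical helper called is_at_least (not part of Biopython), is
to compare sys.version_info numerically, which would also cope with a two-digit
minor version.

```python
import sys


def is_at_least(version_info, minimum):
    """Compare the leading components of a version tuple numerically."""
    return tuple(version_info[:len(minimum)]) >= tuple(minimum)


# String comparison goes character by character, so a hypothetical
# "2.10" would sort before "2.6".
print("2.10" < "2.6")

# Tuples compare number by number instead.
print(is_at_least((2, 10, 0), (2, 6)))
print(is_at_least((2, 5, 2), (2, 6)))

# The running interpreter can be checked the same way.
print(is_at_least(sys.version_info, (2, 6)))
```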

Interestingly using the following, test_Restriction.py works on python 2.4:

    super(RestrictionType,cls).__init__(cls, name, bases, dct)

This is consistent with the documentation Bruce found about arbitrary arguments
and keyword arguments being accepted and ignored before python 2.6.

From biopython at maubp.freeserve.co.uk  Mon Oct  6 10:33:09 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 6 Oct 2008 11:33:09 +0100
Subject: [Biopython-dev] Bio.MarkovModel; Bio.Popgen, Bio.PDB documentation
In-Reply-To: <163677.27280.qm@web62408.mail.re1.yahoo.com>
References: <320fb6e00810050444k4a86e91cq602e3dd1c89b864e@mail.gmail.com>
  <163677.27280.qm@web62408.mail.re1.yahoo.com>
Message-ID: <320fb6e00810060333nc4c8840xab35976e4aaff447@mail.gmail.com>

On Mon, Oct 6, 2008 at 11:13 AM, Michiel de Hoon wrote:
>> > When I was looking at the NumPy-dependent modules, I
>> > got the impression that Bio.MarkovModel can be
>> > simplified now that it's using the new NumPy.
>>
>> That's good, but of limited benefit in itself.
>
> Well, currently Bio.MarkovModel uses a C extension module
> Bio.cMarkovModel. If we can achieve the same speed or better
> by making use of NumPy, then we won't need this C extension
> module and we can simplify Biopython.

I'd missed the C extension module - yes, if we can drop that by making
more use of numpy this does sound worth while.

>> > 2) If not, should this module be kept?
>>
>> I would say yes.
>> ...
>> To me, having this remain as a "top level" module in
>> Biopython would give it higher status and visibility than
>> hiding it away in the example scripts.
>
> OK, let's keep it as a module then.
> We now have several small
> modules related to supervised learning as separate Bio.s
> (LogisticRegression, MaxEntropy, kNN, NaiveBayes, and arguably
> MarkovModel), which to me looks a bit messy. It may be a good
> idea to collect these in one Bio.Supervised, though this is not urgent.
>
> I'd be happy to set up a new chapter in the tutorial about these
> supervised learning modules (I wrote a section a long time ago
> for the cookbook about logistic regression).

http://biopython.org/DIST/docs/cookbook/LogisticRegression.html
Using that as a basis for a whole chapter sounds excellent.

> While we're on the subject, I think that the Bio.PopGen and
> Bio.PDB sections of the cookbook chapter in the tutorial should
> be promoted to separate chapters in the tutorial, since these
> modules are fairly big and have good documentation.

Bio.PDB also has a whole separate document, but I am not sure off hand
how much this overlaps.
http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf

I agree that those two sections could be promoted to chapters. Would
we want to stick with a global authorship? If so Tiago should be
listed for the PopGen chapter. Alternatively, we could list authors
for each chapter (which will take a little leg work up front) and a
few "editors" (which may well change over time).
Peter

From biopython at maubp.freeserve.co.uk Mon Oct 6 11:33:44 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 6 Oct 2008 12:33:44 +0100 Subject: [Biopython-dev] Fwd: Biopython - Bio.Restriction problem with super on python 2.6 In-Reply-To: <320fb6e00810060309s449be9er67178dd184b789d5@mail.gmail.com> References: <320fb6e00810030406n67e7254ao1bdcbeebdd0b981@mail.gmail.com> <48E620CB.6090108@inaf.cnrs-gif.fr> <320fb6e00810030708t3dfa51an44410c0faaee0e77@mail.gmail.com> <48E9CA47.1000502@inaf.cnrs-gif.fr> <320fb6e00810060309s449be9er67178dd184b789d5@mail.gmail.com> Message-ID: <320fb6e00810060433td9cab92ie781a9cceaf9e8dd@mail.gmail.com>

This is a forwarded email from Frédéric Sohm about the python 2.6 super
issue in Bio.Restriction (Bug 2604), with my replies included. See also
http://bugzilla.open-bio.org/show_bug.cgi?id=2604

Peter

---------- Forwarded message ----------
From: Peter
Date: Mon, Oct 6, 2008 at 11:09 AM
Subject: Re: Biopython - Bio.Restriction problem with super on python 2.6
To: Frederic Sohm

On Mon, Oct 6, 2008 at 9:20 AM, Frédéric Sohm wrote:
> Hi Peter,
>
> I do not have access to the mailing list (I suspect my e-mail address
> changed since my subscription to the biopython and biopython-dev mailing list
> (from ... to ...)

You can sign up again if you like, http://biopython.org/wiki/Main_Page

Do you mind if I forward this to the mailing list (I'll remove your
email addresses if you are worried about spam).

> Concerning the sets problem :
> I prefer the following solution rather than change all Set occurrences :
>
> Replacing the import :
>
> from sets import Set
>
> by :
>
> import sys
> if sys.version < '2.6' :
>     from sets import Set
> else :
>     Set = set
>
> This should maintain backward compatibility with python 2.3 as you requested
> on the mailing list and avoid having to change too much code; on the other
> hand it's not as clean as changing Set occurrences.

Either would work - I don't really mind.
I think we'll have to change all the Set occurrences one day, so we
might as well do it now.

> Concerning the Restriction module, it was easier than I thought it would be:
>
> I replaced line 221 :
> super(RestrictionType, cls).__init__(name, bases, dict)
>
> #dict was an error for dct by the way

I had wondered about dict/dct, so it's good to have that confirmed.

> By :
>
> if sys.version < '2.6' : # sys is imported at the beginning to check
>                          # for set anyway.
>     super(RestrictionType,cls).__init__(name, bases, dct)
> else :
>     super(RestrictionType,cls).__init__(cls, name, bases, dct)
>
> # cls is the equivalent of self there. It's different to mark the fact
> # that the class is a metaclass not a normal python class.
>
> This should support both 2.6 and 2.3; The biopython test is now working with
> 2.6 (I did not try with 2.3 but this should not have changed anything for
> this version). I have not much time for testing it thoroughly right now,
> sorry.

I'll be able to check the unit test passes on a few different versions
of python.

> I attached the 2 files I modified : Restriction (set and super) and
> CodonTable (set). Could you please take care of the uploading as I have no
> access to it.

Yes, of course.

> =====================================================================
> To explain a bit what happens there :
> I had problems of inheritance when I tried to build this module :
> Restriction enzymes are defined by a series of site characteristics and ways
> to cut the DNA (blunt/3' overhang/5' overhang, one/two cut(s),
> inside/outside the recognised sequence,...).
> This implied I could not find a way to write generic enzyme classes with
> standard methods without being confronted with inheritance and mro (method
> resolution order) problems when instantiating the final class.
>
> One way out would have been to check each single class instance for all the
> characteristics with series of if/else in every method over and over.
But > this would have been tedious to write, slow and not very much in the spirit > of an object-oriented programming language. > Moreover I was curious to see how metaclass worked. > So I used metaclass to build the class for the enzyme. > > That way each single enzyme is its own class, which is put together from a > serie of basic class. These classes are combined to build a metaclass for > each enzyme. By putting the class together that way I managed to overcome > the method resolution order problems (diamond rule). > > The main drawback is that Restriction uses directly classes to do the work > instead of "normal" python class instances. > Some magic is then necessary to initiate the classes and create the class > instances (hence the use of super and the magic at the end of the > Restriction.py module - from "for TYPE, (bases, enzymes) ..." onward). > > I am not certain this way is the recommended way to do it, but on the other > hand, it's working, fast enough and it was fun to write so... > ===================================================================== Great - thanks, Peter From biopython at maubp.freeserve.co.uk Mon Oct 6 13:22:43 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 6 Oct 2008 14:22:43 +0100 Subject: [Biopython-dev] Python 2.6 In-Reply-To: <320fb6e00810030952q74d595d6l89adf06890d5311@mail.gmail.com> References: <320fb6e00810030544l510d76f7g93d805ec5840c1d4@mail.gmail.com> <320fb6e00810030952q74d595d6l89adf06890d5311@mail.gmail.com> Message-ID: <320fb6e00810060622g2f0f9107mc3c7528d5ce3cb21@mail.gmail.com> Peter wrote: > One of the python 2.6 issues Bruce flagged up was the deprecation of > the Sets module. Based on a quick grep, this affects several modules: > ... 
> On balance I think it would make sense to convert all these to use the
> new built in "set(...)" instead of "Set(...)", with a fall back for python 2.3

Seq.py - fixed in CVS

Align/AlignInfo.py - complicated by the lack of a union_update method
for the built in set class, but fixed in CVS.

AlignIO/__init__.py - removed unused import in CVS

AlignIO/PhylipIO.py - fixed in CVS

Data/CodonTable.py - fixed in CVS

Nexus/Nexus.py and Nexus/Trees.py - with my suggested fix, the unit
test output will currently say Set or set depending on the version of
python. A further minor change to test_Nexus.py would be needed to
cope with this.

Restriction/Restriction.py - this subclasses the Set object, so needs
a little more checking.

SeqIO/__init__.py - fixed in CVS

SeqIO/PhylipIO.py - was deprecated, now removed

Peter

From bugzilla-daemon at portal.open-bio.org Mon Oct 6 15:17:16 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 6 Oct 2008 11:17:16 -0400 Subject: [Biopython-dev] [Bug 2604] test_Restriction failure with Python 2.6 (also cause error in test_CAPS) In-Reply-To: Message-ID: <200810061517.m96FHGBU018020@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2604

------- Comment #3 from bsouthey at gmail.com 2008-10-06 11:17 EST -------
(In reply to comment #2)

I changed the line:
    super(RestrictionType,cls).__init__(name, bases, dct)
to
    super(RestrictionType,cls).__init__(cls, name, bases, dct)

All the tests for Restriction passed for all my Python versions 2.3, 2.4, 2.5
and 2.6. So it appears that there is no need to check the Python version - of
course this needs at least a verification under Windows.

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
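A minimal sketch of the Set to set conversion discussed in this thread, assuming the code only needs the in-place union. The built-in set has no union_update(); update() is its equivalent. The helper name add_all is hypothetical and is not taken from any Biopython module.

```python
# Pick up a set type on any interpreter: the built-in where it exists,
# the old sets.Set otherwise (Python 2.3).
try:
    set
except NameError:
    from sets import Set as set


def add_all(target, extra):
    """Add every element of extra to target in place and return target."""
    # update() is the in-place union on the built-in set; union_update()
    # is only available on the old sets.Set class.
    target.update(extra)
    return target


letters = add_all(set("ACGT"), "N-")
print(sorted(letters))
```

Compared with checking sys.version, probing for the name itself avoids comparing version strings.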
From bugzilla-daemon at portal.open-bio.org Mon Oct 6 15:40:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 6 Oct 2008 11:40:23 -0400 Subject: [Biopython-dev] [Bug 2613] New: test_Wise and test_psw fail under Python 2.3 Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2613 Summary: test_Wise and test_psw fail under Python 2.3 Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: minor Priority: P5 Component: Unit Tests AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com Under Python 2.3.7 on Linux x86_64 system, gcc 4.3.2 and numpy 1.1.1, both test_Wise and test_psw fail. The output is ====================================================================== FAIL: test_Wise ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 152, in runTest self.runSafeTest() File "run_tests.py", line 189, in runSafeTest expected_handle) File "run_tests.py", line 288, in compare_output assert expected_line == output_line, \ AssertionError: Output : 'doctest of Bio.Wise._build_align_cmdline ... ok\n' Expected: 'Doctest: Bio.Wise._build_align_cmdline ... ok\n' ====================================================================== FAIL: test_psw ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 152, in runTest self.runSafeTest() File "run_tests.py", line 189, in runSafeTest expected_handle) File "run_tests.py", line 288, in compare_output assert expected_line == output_line, \ AssertionError: Output : 'doctest of Bio.Wise.psw.parse_line ... ok\n' Expected: 'Doctest: Bio.Wise.psw.parse_line ... 
ok\n'
----------------------------------------------------------------------

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Oct 6 16:22:45 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 6 Oct 2008 12:22:45 -0400 Subject: [Biopython-dev] [Bug 2613] test_Wise and test_psw fail under Python 2.3 In-Reply-To: Message-ID: <200810061622.m96GMjgk021883@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2613

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-06 12:22 EST -------
I think that's an annoying variation in doctest itself - we might need to add
some magic to the test framework to cope with this.

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Oct 6 16:30:20 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 6 Oct 2008 12:30:20 -0400 Subject: [Biopython-dev] [Bug 2604] test_Restriction failure with Python 2.6 (also cause error in test_CAPS) In-Reply-To: Message-ID: <200810061630.m96GUKxx022407@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2604

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-06 12:30 EST -------
Since it seems to work for older versions of python too, I've checked in the
one line "super" change. See Bio/Restriction/Restriction.py revision 1.7 in
CVS.

Frédéric said to me by email that he will try and look into this further, so
I'm leaving this bug open for now.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From zac at zacbrown.org Mon Oct 6 16:35:30 2008 From: zac at zacbrown.org (Zac Brown) Date: Mon, 06 Oct 2008 12:35:30 -0400 Subject: [Biopython-dev] taxonomic labels Message-ID: <48EA3E52.3070704@zacbrown.org>

Hi all,

Just a quick question with regard to using the Entrez module. I am looking
for a way to get a dictionary for an organism's taxonomy, that is something
like:

blah = {'domain':'xyz','family':'xyz','class':'xyz'...} and so on. Is there
some uniform way to generate this type of information?

Thanks,

Zac

From biopython at maubp.freeserve.co.uk Mon Oct 6 17:10:52 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 6 Oct 2008 18:10:52 +0100 Subject: [Biopython-dev] taxonomic labels In-Reply-To: <48EA3E52.3070704@zacbrown.org> References: <48EA3E52.3070704@zacbrown.org> Message-ID: <320fb6e00810061010u490b9257x4f9a908917504d90@mail.gmail.com>

On Mon, Oct 6, 2008 at 5:35 PM, Zac Brown wrote:
> Hi all,
>
> Just a quick question with regard to using the Entrez module. I am looking
> for a way to get a dictionary for an organism's taxonomy, that is something
> like:
>
> blah = {'domain':'xyz','family':'xyz','class':'xyz'...} and so on. Is there
> some uniform way to generate this type of information?
>
> Thanks,
>
> Zac

This isn't really a question for the dev-mailing list, the general
discussion list would be better. Anyway, have you looked at the
taxonomy lineage entries?

    from Bio import Entrez
    ncbi_taxon_id = "9606"
    handle = Entrez.efetch(db="taxonomy",id=ncbi_taxon_id,retmode="XML")
    records = Entrez.read(handle)
    assert len(records)==1
    lineage = records[0]["LineageEx"]
    print lineage

This should contain the information you want, but there are a number
of "no rank" entries.
To turn it into a dictionary as requested, try something
like the following (on python 2.4 or later):

    answer = dict((x["Rank"],x["ScientificName"]) for x in lineage
                  if x["Rank"] != "no rank")
    print answer

Peter

From fkauff at biologie.uni-kl.de Mon Oct 6 17:15:56 2008 From: fkauff at biologie.uni-kl.de (Frank Kauff) Date: Mon, 06 Oct 2008 19:15:56 +0200 Subject: [Biopython-dev] Python 2.6 In-Reply-To: <320fb6e00810060622g2f0f9107mc3c7528d5ce3cb21@mail.gmail.com> References: <320fb6e00810030544l510d76f7g93d805ec5840c1d4@mail.gmail.com> <320fb6e00810030952q74d595d6l89adf06890d5311@mail.gmail.com> <320fb6e00810060622g2f0f9107mc3c7528d5ce3cb21@mail.gmail.com> Message-ID: <48EA47CC.5070202@biologie.uni-kl.de>

> Nexus/Nexus.py and Nexus/Trees.py - with my suggested fix, the unit
> test output will currently say Set or set depending on the version of
> python. A further minor change to test_Nexus.py would be needed to
> cope with this.
>
>

Nexus.py and Trees.py fixed in cvs (together with some other changes).

test_Nexus.py has been changed by removing the troublesome output. I assume
when printing the elements of a set, their order is undefined, and so such
an output should not be part of a test because it could potentially fail.

Frank

From biopython at maubp.freeserve.co.uk Mon Oct 6 17:36:53 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 6 Oct 2008 18:36:53 +0100 Subject: [Biopython-dev] Python 2.6 In-Reply-To: <48EA47CC.5070202@biologie.uni-kl.de> References: <320fb6e00810030544l510d76f7g93d805ec5840c1d4@mail.gmail.com> <320fb6e00810030952q74d595d6l89adf06890d5311@mail.gmail.com> <320fb6e00810060622g2f0f9107mc3c7528d5ce3cb21@mail.gmail.com> <48EA47CC.5070202@biologie.uni-kl.de> Message-ID: <320fb6e00810061036w4f161de3o5ccefd8d0a8bcee1@mail.gmail.com>

Frank wrote:
>
>Peter wrote:
>> Nexus/Nexus.py and Nexus/Trees.py - with my suggested fix, the unit
>> test output will currently say Set or set depending on the version of
>> python.
A further minor change to test_Nexus.py would be needed to
>> cope with this.
>
> Nexus.py and Trees.py fixed in cvs (together with some other changes).

Great.

> test_Nexus.py has been changed by removing the troublesome output. I assume
> when printing the elements of a set, their order is undefined, and so such
> an output should not be part of a test because it could potentially fail.

Yes, in theory we cannot expect the order of the elements in a set to
be consistent - so this looks like a simple solution :)

I'll rerun the test suite tomorrow on Python 2.6, but apart from
Bio.Restriction I think we are OK on the set/Set issue.

There's a complex __init__ / super issue in Bio.Restriction on Bug
2604 which may be solved (Eric is hoping to investigate further time
permitting). Any additional eyes on this couldn't hurt. See
http://bugzilla.open-bio.org/show_bug.cgi?id=2604

Are there any other python 2.6 issues?

Peter

From bugzilla-daemon at portal.open-bio.org Mon Oct 6 22:32:57 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 6 Oct 2008 18:32:57 -0400 Subject: [Biopython-dev] [Bug 2601] Seq find() method: proposal In-Reply-To: Message-ID: <200810062232.m96MWv4T017893@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2601

------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-06 18:32 EST -------
Could you try out the Bio/Seq.py revision 1.35 from CVS in which the Seq object
now has a find() method which acts like that of a python string (plus strip and
split - see Bug 2596).

Comments/revisions/improvements/objections here or on the mailing list please.

We can also discuss additional behaviour, either as additional Seq methods
(e.g. search? finditer?) or perhaps via additional arguments to find().
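To make "acts like that of a python string" concrete, here is a toy sketch. ToySeq is a hypothetical stand-in written for this illustration; it is not Bio.Seq.Seq and it skips the alphabet checking mentioned under Bug 2596. It simply delegates to str.find().

```python
class ToySeq(object):
    """Hypothetical stand-in for a sequence object; not Bio.Seq.Seq."""

    def __init__(self, data):
        self.data = data

    def find(self, sub, start=0, end=None):
        """Return the lowest index of sub in data[start:end], or -1."""
        if isinstance(sub, ToySeq):
            sub = sub.data  # accept another ToySeq as well as a plain string
        if end is None:
            end = len(self.data)
        return self.data.find(sub, start, end)


seq = ToySeq("ATGCATGC")
print(seq.find("CAT"))         # 3
print(seq.find("CAT", 4))      # -1, no match at or after index 4
print(seq.find(ToySeq("GC")))  # 2
```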
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Oct 6 22:36:08 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 6 Oct 2008 18:36:08 -0400 Subject: [Biopython-dev] [Bug 2596] Add string like split, strip, rstrip and lstrip methods to the Seq object In-Reply-To: Message-ID: <200810062236.m96Ma8Ax018176@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2596 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-06 18:36 EST ------- Checked in a variant of this code but with alphabet checking and additions to test_seq.py as well (plus a provisional .find() method - see Bug 2601). CVS changes: Bio/Seq.py revision 1.35 Tests/test_seq.py revision 1.18 Tests/output/test_seq revision 1.15 I'm marking this bug as fixed, but feel free to add any comments/revisions/improvments/objections here or on the mailing list please. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Oct 6 22:36:10 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 6 Oct 2008 18:36:10 -0400 Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string, even subclass string? In-Reply-To: Message-ID: <200810062236.m96MaAnb018189@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2351 Bug 2351 depends on bug 2596, which changed state. 
Bug 2596 Summary: Add string like split, strip, rstrip and lstrip methods to the Seq object http://bugzilla.open-bio.org/show_bug.cgi?id=2596 What |Old Value |New Value ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Oct 6 23:45:05 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 6 Oct 2008 19:45:05 -0400 Subject: [Biopython-dev] [Bug 2613] test_Wise and test_psw fail under Python 2.3 In-Reply-To: Message-ID: <200810062345.m96Nj5Ui021964@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2613 ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2008-10-06 19:45 EST ------- If you look at test_psw and test_wise, you'll see that these make use of Python's generic test framework, with asserts in the test code. Instead, Biopython's testing framework expects each test code to print out stuff, which then gets matched to an output file. Sometimes it makes more sense to use Python's testing framework directly; there are several more tests for which the output file required by Biopython does not contain useful information (output/test_Cluster is another example). In such cases, I suggest we stop requiring the output file and simply rely on Python's testing framework directly. This will solve the issue with test_Wise and test_psw, and will let us get rid of unnecessary output files. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
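A minimal sketch of the style suggested in the Bug 2613 comment above: a test that relies on Python's unittest assertions and needs no expected-output file. parse_score and the test names are hypothetical and do not come from the Biopython test suite.

```python
import unittest


def parse_score(line):
    """Hypothetical helper: return the integer in the second column."""
    return int(line.split()[1])


class ParseScoreTest(unittest.TestCase):
    def test_plain_line(self):
        self.assertEqual(parse_score("match 42"), 42)

    def test_bad_line(self):
        self.assertRaises(ValueError, parse_score, "match forty-two")


def run_suite():
    """Run the tests and return True when all of them pass."""
    suite = unittest.TestLoader().loadTestsFromTestCase(ParseScoreTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

A driver script would then only need the pass/fail result from run_suite(), rather than a line-by-line comparison against a stored output file.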
From bugzilla-daemon at portal.open-bio.org Tue Oct 7 08:03:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 7 Oct 2008 04:03:53 -0400 Subject: [Biopython-dev] [Bug 2543] Bio.Nexus.Trees can't handle named ancestors In-Reply-To: Message-ID: <200810070803.m9783rxd015866@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2543

fkauff at biologie.uni-kl.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

------- Comment #6 from fkauff at biologie.uni-kl.de 2008-10-07 04:03 EST -------
Nexus.Trees has been extended to deal with internal node names, or "special
comments" in the format [& blablabla]. Such comments can appear
directly after the taxon label, after the closing parenthesis, or between
branchlength / support values attached to a node or a taxon label, such as

(a,(b,(c,d)[&hi there]))
(a,(b[&hi there],c))
(a,(b:0.123[&hi there],c[&heyho]:0.3))
(a,(b,c)0.4[&comment]:0.95)

The comments are stored without change in the corresponding node object and can
be accessed like

>>> t=Trees.Tree('(a,(b:0.123[&hi there],c[&heyho]:0.3))')
>>> print t.node(3).data.comment
[&hi there]
>>> print t.node(4).data.comment
[&heyho]
>>>

The comments are not parsed in any way - internal labels vary greatly in
syntax, and are used to store all kinds of information. But at least they are
now read and stored, and users can deal with them the way they like.

Frank

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
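For readers who only want to see which special comments a tree string carries, a regular expression is enough. This is an illustrative sketch, not the way Bio.Nexus.Trees handles them, and it assumes a comment never contains a "]" character. Both function names are hypothetical.

```python
import re

# Matches one "special comment": "[&", then anything up to the next "]".
SPECIAL_COMMENT = re.compile(r"\[&[^\]]*\]")


def special_comments(newick):
    """Return the [&...] comments of a tree string in order of appearance."""
    return SPECIAL_COMMENT.findall(newick)


def strip_special_comments(newick):
    """Return the tree string with every [&...] comment removed."""
    return SPECIAL_COMMENT.sub("", newick)


tree = "(a,(b:0.123[&hi there],c[&heyho]:0.3))"
print(special_comments(tree))        # ['[&hi there]', '[&heyho]']
print(strip_special_comments(tree))  # (a,(b:0.123,c:0.3))
```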
From bugzilla-daemon at portal.open-bio.org Tue Oct 7 17:07:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 7 Oct 2008 13:07:33 -0400 Subject: [Biopython-dev] [Bug 2613] test_Wise and test_psw fail under Python 2.3 In-Reply-To: Message-ID: <200810071707.m97H7XhN015588@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2613 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-07 13:07 EST ------- (In reply to comment #2) > If you look at test_psw and test_wise, you'll see that these make use of > Python's generic test framework, with asserts in the test code. Instead, > Biopython's testing framework expects each test code to print out stuff, > which then gets matched to an output file. Sometimes it makes more sense > to use Python's testing framework directly; there are several more tests > for which the output file required by Biopython does not contain useful > information (output/test_Cluster is another example). In such cases, I > suggest we stop requiring the output file and simply rely on Python's > testing framework directly. This will solve the issue with test_Wise and > test_psw, and will let us get rid of unnecessary output files. So if there is an expected output file, then run_tests.py will continue to do the comparison as now. However, if there is no output file it will instead just run the code - which presumably will throw an exception if something is wrong (even just an assert statement)? I haven't looked at run_tests.py to see how easy such a change would be, but in principle it sounds fine. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Oct 7 23:17:50 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 7 Oct 2008 19:17:50 -0400 Subject: [Biopython-dev] [Bug 2613] test_Wise and test_psw fail under Python 2.3 In-Reply-To: Message-ID: <200810072317.m97NHo5v024624@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2613 ------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp 2008-10-07 19:17 EST ------- (In reply to comment #3) > So if there is an expected output file, then run_tests.py will continue to do > the comparison as now. However, if there is no output file it will instead > just run the code - which presumably will throw an exception if something is > wrong (even just an assert statement)? > A safer approach might be to check if the test generates any output, since tests that use an output file now print the name of the test first. Another approach is to do the output comparison inside of each test script that produces output instead of in run_tests.py. Basically, this means that the compare_output function in run_tests.py should be moved to a separate script, which gets imported by each test script that wants to use compare_output. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
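A rough sketch of what a standalone comparison helper for Bug 2613 might look like. The name compare_output comes from the traceback in the bug report, but the signature and body here are assumptions, not the code in run_tests.py. The normalise step is one possible form of the "magic" needed to cope with the two doctest header spellings shown in the report.

```python
import re

# Header spellings seen in the Bug 2613 report:
#   'doctest of Bio.Wise.psw.parse_line ... ok'
#   'Doctest: Bio.Wise.psw.parse_line ... ok'
_DOCTEST_PREFIX = re.compile(r"^(doctest of |Doctest: )")


def normalise(line):
    """Map both doctest header spellings onto one form."""
    return _DOCTEST_PREFIX.sub("Doctest: ", line)


def compare_output(expected_lines, output_lines):
    """Raise AssertionError at the first line that differs."""
    expected_lines = list(expected_lines)
    output_lines = list(output_lines)
    pairs = zip(expected_lines, output_lines)
    for number, (expected, output) in enumerate(pairs):
        if normalise(expected) != normalise(output):
            raise AssertionError("Line %d\nOutput  : %r\nExpected: %r"
                                 % (number + 1, output, expected))
    if len(expected_lines) != len(output_lines):
        raise AssertionError("Expected %d lines, got %d"
                             % (len(expected_lines), len(output_lines)))
```

A test script that wants the old behaviour could import this helper and call it on its own captured output, leaving the other tests free to rely on assertions alone.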
From bugzilla-daemon at portal.open-bio.org Wed Oct 8 15:23:21 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 11:23:21 -0400 Subject: [Biopython-dev] [Bug 2547] Translation of ambiguous codons like NNN and TAN In-Reply-To: Message-ID: <200810081523.m98FNLYW026623@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2547

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 11:23 EST -------
Bug 2530 and Bug 2547 are fixed in CVS as of Bio/Seq.py revision 1.37 (with
the unit test updated in test_seq.py revision 1.20).

Old behaviour (e.g. Biopython 1.48):
translate("TAT") -> "Y"
translate("TAG") -> "*"
translate("TAR") -> "*"
translate("TAN") -> TranslationError (Bug 2547)
translate("NNN") -> TranslationError (Bug 2547)
translate("TA?") -> "*" (Bug 2530)

New behaviour (CVS as things stand):
translate("TAT") -> "Y"
translate("TAG") -> "*"
translate("TAR") -> "*"
translate("TAN") -> "X"
translate("NNN") -> "X"
translate("TA?") -> TranslationError

Note that this new behaviour (translation of ambiguous possible stop codons)
could be made optional for backwards compatibility, but I would be surprised if
anyone would want the old behaviour. Also, we could make the possible stop
character an optional argument, but that then brings up questions about how to
represent this in the Alphabet objects.

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
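One way to arrive at the new behaviour listed in the Bug 2547 comment above is to expand an ambiguous codon into every concrete codon it could stand for and look at the set of products. The sketch below does that for the standard genetic code only. It is an illustration under that assumption, not the implementation in Bio/Seq.py: the function name translate_codon is hypothetical, ValueError stands in for TranslationError, and it always returns "X" for mixed results.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code: codons enumerated in TCAG order line up with
# the letters of AMINO_ACIDS ("*" marks a stop codon).
CODON_TABLE = dict(
    ("".join(codon), amino_acid)
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
)

# IUPAC ambiguity codes for DNA.
AMBIGUOUS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon):
    """Translate one codon, possibly ambiguous, to a single letter."""
    if len(codon) != 3:
        raise ValueError("Codon %r is not three letters long" % codon)
    try:
        choices = [AMBIGUOUS[letter] for letter in codon.upper()]
    except KeyError:
        raise ValueError("Codon %r contains an invalid letter" % codon)
    # Every amino acid (or stop) the ambiguous codon could encode.
    products = set(CODON_TABLE["".join(c)] for c in product(*choices))
    if len(products) == 1:
        return products.pop()  # unambiguous result, e.g. TAR -> "*"
    return "X"  # mixed result, e.g. TAN could be "Y" or a stop
```

With this rule "TAR" still gives "*" because both TAA and TAG are stops, while "TAN" gives "X" because it covers both tyrosine and stop codons.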
From bugzilla-daemon at portal.open-bio.org Wed Oct 8 15:25:07 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 11:25:07 -0400 Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid codons as stops In-Reply-To: Message-ID: <200810081525.m98FP7cV026776@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2530

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 11:25 EST -------
(In reply to comment #13)
> If there is agreement that changing the behaviour of Bio.Seq.translate() as
> described in Bug 2547 is desirable, then we end up fixing both issues at the
> same time.

I think an agreement was reached. Bug 2530 and Bug 2547 are fixed in CVS as of
Bio/Seq.py revision 1.37 (with the unit test updated in test_seq.py revision
1.20).

Old behaviour (e.g. Biopython 1.48):
translate("TAT") -> "Y"
translate("TAG") -> "*"
translate("TAR") -> "*"
translate("TAN") -> TranslationError (Bug 2547)
translate("NNN") -> TranslationError (Bug 2547)
translate("TA?") -> "*" (Bug 2530)

New behaviour (CVS as things stand):
translate("TAT") -> "Y"
translate("TAG") -> "*"
translate("TAR") -> "*"
translate("TAN") -> "X"
translate("NNN") -> "X"
translate("TA?") -> TranslationError

I dare say the implementation might be improved or optimised, but I think this
is a good improvement for the functionality.

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Wed Oct 8 15:35:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 11:35:39 -0400 Subject: [Biopython-dev] [Bug 2583] small bug in NCBIXML.py In-Reply-To: Message-ID: <200810081535.m98FZdsd027605@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2583 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 11:35 EST ------- As per comment 2, I'm assuming this is a duplicate of the previously reported issue (for which no bug was filed). Marking this bug as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 8 15:37:11 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 11:37:11 -0400 Subject: [Biopython-dev] [Bug 2528] NCBIStandalone.blastall(): Replace os.popen3 with subprocess.Popen In-Reply-To: Message-ID: <200810081537.m98FbBPr027708@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2528 ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 11:37 EST ------- See also Bug 2480 which suggests using the subprocess module to deal with Windows only issues with spaces in filenames. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
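On the spaces-in-paths problem (Bug 2480 and Bug 2528), one commonly used pattern is to hand subprocess.Popen a list of arguments with shell=False, so that no quoting or escaping is needed. The sketch below uses the Python interpreter as a stand-in for the BLAST executable and a throwaway script as its input. Whether this behaves identically on every platform discussed in the bug reports is an assumption that would still need checking. The function name run_with_spaces is hypothetical.

```python
import os
import subprocess
import sys
import tempfile


def run_with_spaces():
    """Run a script stored under a path containing spaces.

    Returns a (return code, stripped stdout) tuple.
    """
    folder = tempfile.mkdtemp(prefix="dir with spaces ")
    script = os.path.join(folder, "hello world.py")
    with open(script, "w") as handle:
        handle.write("print('ran')\n")
    try:
        # Argument list plus the default shell=False: each list element
        # reaches the child as one argument, spaces included.
        child = subprocess.Popen([sys.executable, script],
                                 stdout=subprocess.PIPE,
                                 stderr=subprocess.PIPE,
                                 universal_newlines=True)
        stdout, stderr = child.communicate()
        return child.returncode, stdout.strip()
    finally:
        os.remove(script)
        os.rmdir(folder)


print(run_with_spaces())
```

Passing a single command string instead of a list is where the shell=True / shell=False choice starts to matter, since the string then has to be split and quoted in a platform-specific way.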
From bugzilla-daemon at portal.open-bio.org Wed Oct 8 15:48:18 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 11:48:18 -0400 Subject: [Biopython-dev] [Bug 2589] Errors in running tests in 1.48 In-Reply-To: Message-ID: <200810081548.m98FmIDe028487@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2589 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 11:48 EST ------- I believe the only remaining issue on this bug is improving the failure/skip message from the BioSQL tests. On a CVS checkout, this can vary depending on if MySQLdb is installed or not. If MySQLdb is not installed, the message is: > Install MySQLdb or correct Tests/setup_BioSQL.py (not important if > you do not plan to use BioSQL). If MySQLdb is installed, currently setup_BioSQL.py includes the default settings used on http://www.biopython.org/wiki/BioSQL which if not setup gives: > Connection failed, check settings in Tests/setup_BioSQL.py > if you plan to use BioSQL: ... (The actual database driver error is included as I found this very helpful in actually getting BioSQL setup and working.) Alternatively, we can leave setup_BioSQL.py with missing settings, which would currently show the following message: > Enter your settings in Tests/setup_BioSQL.py > (not important if you do not plan to use BioSQL). My intention with setup_BioSQL.py was that it would all be "ready to go" for people trying out BioSQL following the wiki. People without mySQLdb installed wouldn't see a nasty message. The only downside (the message you saw) is for people who have mySQLdb installed, but have not setup BioSQL yet. I suggest we either leave this as it is, or change Tests/setup_BioSQL.py to have no default settings (making setting up and testing BioSQL just a little bit harder). 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 8 15:50:20 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 11:50:20 -0400 Subject: [Biopython-dev] [Bug 2600] enhance Seq and SeqRecord to new style classes In-Reply-To: Message-ID: <200810081550.m98FoKpU028611@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2600 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 11:50 EST ------- Does anyone have any objections to this three-line change? It's "just" doing this to the Seq, MutableSeq and SeqRecord classes: old: class Seq : ... new: class Seq(object) : ... Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 8 15:51:06 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 11:51:06 -0400 Subject: [Biopython-dev] [Bug 2251] [PATCH] NumPy support for BioPython In-Reply-To: Message-ID: <200810081551.m98Fp6pK028664@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2251 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #17 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 11:51 EST ------- I think this is all done now :) Marking as fixed -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Oct 8 15:56:46 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 11:56:46 -0400 Subject: [Biopython-dev] [Bug 2475] BioSQL.Loader should reuse existing taxon entries in lineage In-Reply-To: Message-ID: <200810081556.m98Fuk0s029044@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2475 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #36 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 11:56 EST ------- as the main use cases are now covered, I'm marking this bug as fixed. For SeqRecord objects with no taxonomy information, nothing has changed. For SeqRecord objects with taxonomy information AND an NCBI taxon ID, we now record either the full taxonomy via Bio.Entrez if requested, or a stub entry which can be completed by running load_ncbi_taxonomy.pl later. For the atypical case of sequences with taxonomy information but NO NCBI taxon ID, the old behaviour continues - new entries will be created in the taxon tables for the given lineage, without attempting to match existing entries. To do this properly would require some clever heuristics. If this final situation is a real issue, we can re-visit this. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Oct 8 16:03:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 12:03:26 -0400 Subject: [Biopython-dev] [Bug 2592] numpy migration for Bio.PDB.Vector In-Reply-To: Message-ID: <200810081603.m98G3QUA029538@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2592 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 12:03 EST ------- Now that we have decided to drop Numeric support, it would be possible to press ahead with a gradual move from the numpy.oldnumeric.* to the new numpy.* API. Note that the suggested code would need to be tweaked slightly not to use scipy for the determinant. See: http://lists.open-bio.org/pipermail/biopython/2008-September/004509.html -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Wed Oct 8 17:12:07 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 8 Oct 2008 18:12:07 +0100 Subject: [Biopython-dev] Time to deprecate Bio.Transcribe? Message-ID: <320fb6e00810081012x54d82b44ga0f7bc0dcb0cf4b9@mail.gmail.com> In Biopython 1.48 the module Bio.Transcribe was described as obsolete, both in the docstring and the tutorial which also warned it was likely to be deprecated: > 3.9 Transcription and Translation Continued > > In the previous sections we talked about the transcription and > translation functions in the Bio.Seq module, which are intended > to be very simple and easy to use. > > There is also an older Bio.Translate module which has a few > more advanced options, but is more complicated to use. > Additionally there is also an older Bio.Transcribe module, but > as this is now obsolete and likely to be deprecated, we will not > discuss it here. 
So, I'd like to now deprecate Bio.Transcribe for the next release. Any objections/comments? Peter P.S. I'm also hoping that for the next release we can finish Bug 2381 as well, and then mark Bio.Translate as obsolete. From bugzilla-daemon at portal.open-bio.org Wed Oct 8 17:20:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 13:20:26 -0400 Subject: [Biopython-dev] [Bug 2509] Deprecating the .data property of the Seq and MutableSeq objects In-Reply-To: Message-ID: <200810081720.m98HKQjU002529@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2509 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- BugsThisDependsOn| |2600 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 13:20 EST ------- Just to note that issuing a deprecation warning requires using new style properties, which requires making the Seq and MutableSeq objects into new style classes - this was filed as a separate issue, Bug 2600. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
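The dependency noted in Comment #3 on Bug 2509 (a deprecation warning on the .data attribute needs a property, which in turn needs a new style class) can be illustrated with a minimal sketch. The class MiniSeq, the helper read_data_capturing_warnings and the warning text are hypothetical names made up for this example; this is not the code in Bio/Seq.py.

```python
import warnings


class MiniSeq(object):
    """Hypothetical stand-in for a sequence class; not Biopython's Seq."""

    def __init__(self, data):
        self._data = data

    def __str__(self):
        return self._data

    def _get_data(self):
        # Emit a DeprecationWarning when the legacy attribute is read;
        # stacklevel=2 attributes the warning to the calling code.
        warnings.warn("MiniSeq.data is deprecated; use str(seq) instead",
                      DeprecationWarning, stacklevel=2)
        return self._data

    # Under Python 2, properties only behave fully on new style classes
    # (assignments bypass the setter on old style ones), hence the
    # explicit (object) base above.
    data = property(_get_data)


def read_data_capturing_warnings(seq):
    """Return (value, warning categories) for a single read of seq.data."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        value = seq.data
    return value, [w.category for w in caught]
```

Existing callers that read seq.data keep working with this approach, but they see a warning that points at their own line of code.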
From bugzilla-daemon at portal.open-bio.org Wed Oct 8 17:20:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 13:20:29 -0400 Subject: [Biopython-dev] [Bug 2600] enhance Seq and SeqRecord to new style classes In-Reply-To: Message-ID: <200810081720.m98HKT9W002546@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2600 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- OtherBugsDependingO| |2509 nThis| | -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 8 17:25:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 13:25:23 -0400 Subject: [Biopython-dev] [Bug 2613] test_Wise and test_psw fail under Python 2.3 In-Reply-To: Message-ID: <200810081725.m98HPNCt002893@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2613 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 13:25 EST ------- (In reply to comment #4) > A safer approach might be to check if the test generates any output, since > tests that use an output file now print the name of the test first. That sounds fine - but currently test_Wise, test_psw and test_Cluster DO have some output, e.g. test_Cluster test_clusterdistance (test_Cluster.TestCluster) ... ok test_distancematrix_kmedoids (test_Cluster.TestCluster) ... ok test_kcluster (test_Cluster.TestCluster) ... ok test_matrix_parse (test_Cluster.TestCluster) ... ok test_median_mean (test_Cluster.TestCluster) ... ok test_somcluster (test_Cluster.TestCluster) ... ok test_treecluster (test_Cluster.TestCluster) ... 
ok ---------------------------------------------------------------------- Ran 7 tests in 0.015s OK > Another approach is to do the output comparison inside of each test script > that produces output instead of in run_tests.py. Basically, this means that > the compare_output function in run_tests.py should be moved to a separate > script, which gets imported by each test script that wants to use > compare_output. I can see what you have in mind here, but if we can avoid a separate "helper script" it would be nicer (and reduce end user confusion). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 8 19:44:05 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 15:44:05 -0400 Subject: [Biopython-dev] [Bug 2589] Errors in running tests in 1.48 In-Reply-To: Message-ID: <200810081944.m98Ji56C013548@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2589 ------- Comment #6 from bsouthey at gmail.com 2008-10-08 15:44 EST ------- (In reply to comment #5) > I believe the only remaining issue on this bug is improving the failure/skip > message from the BioSQL tests. On a CVS checkout, this can vary depending on > if MySQLdb is installed or not. > > If MySQLdb is not installed, the message is: > > > Install MySQLdb or correct Tests/setup_BioSQL.py (not important if > > you do not plan to use BioSQL). > > If MySQLdb is installed, currently setup_BioSQL.py includes the default > settings used on http://www.biopython.org/wiki/BioSQL which if not setup gives: > > > Connection failed, check settings in Tests/setup_BioSQL.py > > if you plan to use BioSQL: ... > > (The actual database driver error is included as I found this very helpful in > actually getting BioSQL setup and working.) 
> > Alternatively, we can leave setup_BioSQL.py with missing settings, which would > currently show the following message: > > > Enter your settings in Tests/setup_BioSQL.py > > (not important if you do not plan to use BioSQL). > > My intention with setup_BioSQL.py was that it would all be "ready to go" for > people trying out BioSQL following the wiki. People without mySQLdb installed > wouldn't see a nasty message. The only downside (the message you saw) is for > people who have mySQLdb installed, but have not setup BioSQL yet. > > I suggest we either leave this as it is, or change Tests/setup_BioSQL.py to > have no default settings (making setting up and testing BioSQL just a little > bit harder). > I think that a user must be forced to change Tests/setup_BioSQL.py or similar because these settings may not be correct. Especially if dbuser is not root, dbuser lacks permissions and necessary privileges or dbuser has a password (security). So the current message you get if DBDRIVER is not defined is okay: "Enter your settings in Tests/setup_BioSQL.py (not important if you do not plan to use BioSQL)." A stray thought is to create a BioSQL configuration file (which is what Tests/setup_BioSQL.py is) when a user installs BioSQL. This would permit removing the need to enter that information when using BioSQL. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Oct 8 20:30:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 8 Oct 2008 16:30:59 -0400 Subject: [Biopython-dev] [Bug 2589] Errors in running tests in 1.48 In-Reply-To: Message-ID: <200810082030.m98KUxO6016609@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2589 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-08 16:30 EST ------- (In reply to comment #6) > > My intention with setup_BioSQL.py was that it would all be "ready to go" for > > people trying out BioSQL following the wiki. People without mySQLdb > > installed wouldn't see a nasty message. The only downside (the message > > you saw) is for people who have mySQLdb installed, but have not setup > > BioSQL yet. > > > > I suggest we either leave this as it is, or change Tests/setup_BioSQL.py to > > have no default settings (making setting up and testing BioSQL just a little > > bit harder). > > > > I think that a user must be forced to change Tests/setup_BioSQL.py or similar > because these settings may not be correct. Especially if dbuser is not root, > dbuser lacks permissions and necessary privileges or dbuser has a password > (security). So the current message you get if DBDRIVER is not defined is okay: > > "Enter your settings in Tests/setup_BioSQL.py (not important if you do > not plan to use BioSQL)." Done in Tests/setup_BioSQL.py CVS revision 1.4, and I've also reworded http://www.biopython.org/wiki/BioSQL slightly as a result. Marking this bug as fixed. Thanks Bruce, Peter > A stray thought is to create a BioSQL configuration file (which is what > Tests/setup_BioSQL.py is) when a user installs BioSQL. 
This would permit > removing the need to enter that information when using BioSQL. Sadly I don't think that would be easy. Currently installing BioSQL is a largely manual process, with lots and lots of options (database program, name, username, password etc). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Oct 9 15:08:00 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 9 Oct 2008 11:08:00 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200810091508.m99F80WA030837@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #836 is|0 |1 obsolete| | ------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-09 11:07 EST ------- (From update of attachment 836) I've just added transcribe and back_transcribe methods to the Seq object in CVS. Bio/Seq.py revision 1.40 Tests/test_seq.py revision 1.24 Tests/output/test_seq revision 1.18 This bug is still open to cover the translation method(s). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
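As a rough illustration of what transcribe and back_transcribe methods do, here is a minimal sketch on a toy class. MiniSeq is a hypothetical name invented for this example and is not the Seq class from Bio/Seq.py; the sketch assumes the input is the coding strand, so transcription simply swaps T for U.

```python
class MiniSeq(object):
    """Hypothetical minimal sequence wrapper; not Biopython's Seq class."""

    def __init__(self, text):
        self._text = text

    def __str__(self):
        return self._text

    def transcribe(self):
        """Return the RNA form of a DNA coding strand (T becomes U)."""
        return MiniSeq(self._text.replace("T", "U").replace("t", "u"))

    def back_transcribe(self):
        """Return the DNA form of an RNA sequence (U becomes T)."""
        return MiniSeq(self._text.replace("U", "T").replace("u", "t"))
```

Both methods return a new object rather than modifying the sequence in place, which is one reasonable design for an immutable sequence type.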
From biopython at maubp.freeserve.co.uk Thu Oct 9 15:31:04 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 9 Oct 2008 16:31:04 +0100 Subject: [Biopython-dev] Modules to be removed from Biopython In-Reply-To: References: <20080923120809.GG13074@localdomain> Message-ID: <320fb6e00810090831x28015a2bg43931849acfecf34@mail.gmail.com> Leighton Pritchard wrote: > Hi all, > > It looks like Bio.DBXRef provides a dictionary of dictionaries that > associate database identifiers from a number of file formats with the > appropriate databases. This sort of thing might be useful to keep around > (i.e. not to have to rebuild from scratch) if there is an intention to > populate the dbxref table with consistent Dbnames for divergent identifiers. > However, Peter appears to have noted in the code for Loader.py that this > behaviour would be inconsistent with the other Bio* projects, and mentions > bug 2405 in that context. > > L. As things stand, we don't use this kind of mapping in BioSQL, so I see no reason not to deprecate Bio.DBXRef now. Of course, I can be talked out of this if anyone has a good use case example. Brad wrote: >> DBXref is associated with all the Martel parsing, so it can be >> removed/deprecated as well. It was used in building SeqRecords from >> Martel descriptions (Bio.builders.SeqRecord.sequence). I've just marked Bio.DBXRef as deprecated for 1.49. Returning to an earlier point on this thread, I have also removed Bio.SGMLExtractor (which was deprecated in 1.46). I think that wraps up the Martel/Mindy deprecations for now - in a few releases' time we'll have the much simpler task of removing these modules. Peter From biopython at maubp.freeserve.co.uk Thu Oct 9 16:22:05 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 9 Oct 2008 17:22:05 +0100 Subject: [Biopython-dev] Bio.Ndb obsolete? Message-ID: <320fb6e00810090922s61bd6679we9377924d3b7fa5d@mail.gmail.com> Hi all, I just had a very superficial look at the Bio.Ndb module. 
This is an HTML parser written six years ago, with its last real update five years ago. The given URL doesn't work, but the server is still up - however from first glance the whole page layout has changed. For example, compare the old HTML example under Bio/Ndb/PR0004.html to what seems to be the current equivalent: http://ndbserver.rutgers.edu/servlet/IDSearch.NDBSearch1?id=PR0004 I think it is safe to say Bio.Ndb stopped working some time ago due to the website's HTML changing. Does anyone here use this database? Maybe we should ask on the mailing list, and assuming no one is interested, just deprecate this code. For future, should we have a statement http://www.biopython.org/wiki/Contributing and in the tutorial that we don't want to add any HTML parsers to Biopython? Peter From biopython at maubp.freeserve.co.uk Thu Oct 9 17:19:11 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 9 Oct 2008 18:19:11 +0100 Subject: [Biopython-dev] Python 2.6 In-Reply-To: <320fb6e00810061036w4f161de3o5ccefd8d0a8bcee1@mail.gmail.com> References: <320fb6e00810030544l510d76f7g93d805ec5840c1d4@mail.gmail.com> <320fb6e00810030952q74d595d6l89adf06890d5311@mail.gmail.com> <320fb6e00810060622g2f0f9107mc3c7528d5ce3cb21@mail.gmail.com> <48EA47CC.5070202@biologie.uni-kl.de> <320fb6e00810061036w4f161de3o5ccefd8d0a8bcee1@mail.gmail.com> Message-ID: <320fb6e00810091019q64214738nbc8a55f19c1e5eaa@mail.gmail.com> Peter wrote, > I'll rerun the test suite tomorrow on Python 2.6, but apart from > Bio.Restriction I think we are OK on the the set/Set issue. > > There's a complex __init__ / super issue in Bio.Restriction on Bug > 2604 which may be solved (Eric is hoping to investigate further time > permitting). Any additional eyes on this couldn't hurt. See > http://bugzilla.open-bio.org/show_bug.cgi?id=2604 Using CVS, Bio.Restriction seems happy now - in addition to the "super" change for Bug 2604, I have also made the sets/set change. > Are there any other python 2.6 issues? 
I'd forgotten about the Bio.Crystal exception problem (we didn't file a bug on this): .../Bio/Crystal/__init__.py:42: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6 self.message = message Otherwise all the core tests pass on my Linux python 2.6 machine (skipping those needing reportlab, MySQLdb or other optional modules). Peter From biopython at maubp.freeserve.co.uk Thu Oct 9 20:21:15 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 9 Oct 2008 21:21:15 +0100 Subject: [Biopython-dev] Bio.Ndb obsolete? In-Reply-To: <320fb6e00810090922s61bd6679we9377924d3b7fa5d@mail.gmail.com> References: <320fb6e00810090922s61bd6679we9377924d3b7fa5d@mail.gmail.com> Message-ID: <320fb6e00810091321lb3ec34eua44aeeac462ced1c@mail.gmail.com> On Thu, Oct 9, 2008 at 5:22 PM, Peter wrote: > Hi all, > > I just had a very superficial look at the Bio.Ndb module. This is an > HTML parser written six years ago, with its last real update five > years ago. The given URL doesn't work, but the server is still up - > however from first glance the whole page layout has changed. > > For example, compare the old HTML example under Bio/Ndb/PR0004.html to > what seems to be the current equivalent: > http://ndbserver.rutgers.edu/servlet/IDSearch.NDBSearch1?id=PR0004 > > I think it is safe to say Bio.Ndb stopped working some time ago due to > the website's HTML changing. Does anyone here use this database? > Maybe we should ask on the mailing list, and assuming no one is > interested, just deprecate this code. If we do drop Bio.Ndb, then I wonder if the related Bio.Crystal module is still relevant? > For future, should we have a statement > http://www.biopython.org/wiki/Contributing and in the tutorial that we > don't want to add any HTML parsers to Biopython? I've made some fairly small changes to the wiki "Contributing" page, which includes this and also mentioning unit tests and documentation for code contributions. 
Peter From biopython at maubp.freeserve.co.uk Thu Oct 9 20:56:43 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 9 Oct 2008 21:56:43 +0100 Subject: [Biopython-dev] Bio.mathfns obsolete? And Bio.clistfns too? Message-ID: <320fb6e00810091356k36f1fca5ib431504eaeb83818@mail.gmail.com> I'm still in clean up mode! Until recently Bio.mathfns was used in Bio/NaiveBayes.py but that now uses numpy more heavily instead. I think that Bio.mathfns (and its C implementation) are no longer used anywhere in Biopython (and I would be surprised if anyone else is using this module). I'm suggesting deprecating Bio.mathfns and Bio.cmathfns for the next release. Similarly, Bio.listfns and its C implementation Bio.clistfns might also be deprecated with a little effort. Some of this code seems to predate things like the python sets module (and its replacement, the built in set). Based on a quick grep, only three modules currently use Bio.listfns: Bio.MarkovModel - uses only listfns.itemindex Bio.NaiveBayes - uses listfns.itemindex, listfns.items and listfns.contents Bio.MaxEntropy - uses listfns.itemindex and listfns.items At first glance, listfns.items(...) might be replaced with list(set(...)) leaving just two trivial functions listfns.itemindex and listfns.contents which don't really justify an entire module (plus C code). On the other hand, these may be performance bottlenecks for Bio.NaiveBayes and Bio.MaxEntropy which could justify keeping the C code. Peter From biopython at maubp.freeserve.co.uk Thu Oct 9 21:03:42 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 9 Oct 2008 22:03:42 +0100 Subject: [Biopython-dev] Bio.mathfns obsolete? And Bio.clistfns too? And Bio.stringfns? Message-ID: <320fb6e00810091403k2c0d09bbk4a6962bd9e614ab3@mail.gmail.com> On Thu, Oct 9, 2008 at 9:56 PM, Peter wrote: > I'm still in clean up mode! ... I think that Bio.mathfns (and its C > implementation) are no longer used anywhere in Biopython ... 
> I'm suggesting deprecating Bio.mathfns and Bio.cmathfns for > the next release. > > Similarly, Bio.listfns and its C implementation Bio.clistfns might > also be deprecated with a little effort. ... And on a related note, I think Bio.stringfns and its C implementation Bio.cstringfns are also now unused in Biopython, and like Bio.mathfns and Bio.cmathfns should be deprecated for the next release. Peter From biopython at maubp.freeserve.co.uk Fri Oct 10 09:42:46 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 10 Oct 2008 10:42:46 +0100 Subject: [Biopython-dev] Python 2.6 In-Reply-To: <320fb6e00810091019q64214738nbc8a55f19c1e5eaa@mail.gmail.com> References: <320fb6e00810030544l510d76f7g93d805ec5840c1d4@mail.gmail.com> <320fb6e00810030952q74d595d6l89adf06890d5311@mail.gmail.com> <320fb6e00810060622g2f0f9107mc3c7528d5ce3cb21@mail.gmail.com> <48EA47CC.5070202@biologie.uni-kl.de> <320fb6e00810061036w4f161de3o5ccefd8d0a8bcee1@mail.gmail.com> <320fb6e00810091019q64214738nbc8a55f19c1e5eaa@mail.gmail.com> Message-ID: <320fb6e00810100242y30faa5f5od9ff344605344e27@mail.gmail.com> > I'd forgotten about the Bio.Crystal exception problem (we didn't file > a bug on this): Fixed in CVS. > Otherwise all core the tests pass on my Linux python 2.6 machine > (skipping those needing reportlab, MySQLdb or other optional modules). All core tests, plus the graphics ones using reportlab, and the BioSQL ones using MySQLdb, now pass on my Linux python 2.6 machine. The only things I have not covered are: test_GFF, test_PopGen_FDist, test_PopGen_SimCoal, test_Wise, test_psw which require additional command line tools etc. Note that with reportlab 2.2 under python 2.6 there is a deprecation warning from reportlab/pdfgen/canvas.py about md5, this has been fixed to use hashlib in the reportlab SVN. 
Note that with MySQLdb 1.2.2 under python 2.6 there is a deprecation warning from MySQLdb/__init__.py about the sets module, which does not seem to have been fixed on the 1.2 branch or the trunk in their SVN. I have reported this issue as a bug on the MySQLdb sourceforge page. So, as far as I can see, we are OK with python 2.6 on Linux. We should probably try and get this tested on Windows and on the Mac too for completeness. Peter From biopython at maubp.freeserve.co.uk Fri Oct 10 14:39:50 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 10 Oct 2008 15:39:50 +0100 Subject: [Biopython-dev] Python 2.6 In-Reply-To: <320fb6e00810100242y30faa5f5od9ff344605344e27@mail.gmail.com> References: <320fb6e00810030544l510d76f7g93d805ec5840c1d4@mail.gmail.com> <320fb6e00810030952q74d595d6l89adf06890d5311@mail.gmail.com> <320fb6e00810060622g2f0f9107mc3c7528d5ce3cb21@mail.gmail.com> <48EA47CC.5070202@biologie.uni-kl.de> <320fb6e00810061036w4f161de3o5ccefd8d0a8bcee1@mail.gmail.com> <320fb6e00810091019q64214738nbc8a55f19c1e5eaa@mail.gmail.com> <320fb6e00810100242y30faa5f5od9ff344605344e27@mail.gmail.com> Message-ID: <320fb6e00810100739y7363c0efl8fdfa86455770666@mail.gmail.com> > > So, as far as I can see, we are OK with python 2.6 on Linux. We > should probably try and get this tested on Windows and on the Mac too > for completeness. > Something the unit tests didn't flag up is the deprecation of popen2, os.popen2, os.popen3, and os.popen4 in python 2.6 - see http://www.python.org/dev/peps/pep-0361/ Ignoring deprecated code, this affects the following modules: Bio.Application Bio.Blast.NCBIStandalone - see also Bug 2528 Bio.Clustalw Bio.Emboss.Applications Bio.PDB.NACCESS It might make sense to ensure all these use Bio.Application rather than re-inventing the wheel? We would then have a single point for calling command line tools, which could use the subprocess module on Python 2.4+, falling back on os.popen* for python 2.3. 
As a bonus this might cope with filenames with spaces better on Windows. While we are discussing this, does anyone know why Bio.Blast.NCBIStandalone doesn't use Bio.Blast.Applications (which subclasses Bio.Application)? Looking over the CVS, eight years ago in revision 1.5 of Bio/Blast/NCBIStandalone.py Jeff added the code for calling standalone BLAST. Then Brad added Bio/Blast/Applications.py later (about six years ago). Note that we also have plenty of modules using os.system too (where there is no need to capture the command's output): Bio.PDB.DSSP Bio.PDB.NACCESS Bio.PDB.PDBList Bio.PDB.PSEA Bio.PDB.ResidueDepth Bio.Wise Bio.PopGen.FDist.Controller Bio.PopGen.SimCoal.Controller Peter From bugzilla-daemon at portal.open-bio.org Fri Oct 10 21:07:54 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 10 Oct 2008 17:07:54 -0400 Subject: [Biopython-dev] [Bug 2528] NCBIStandalone.blastall(): Replace os.popen3 with subprocess.Popen In-Reply-To: Message-ID: <200810102107.m9AL7sSq013518@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2528 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-10 17:07 EST ------- Note that os.popen3 is deprecated in python 2.6 which gives another reason for moving to subprocess. This issue is fixed in Bio/Blast/NCBIStandalone.py revision 1.82, based on changes discussed on Bug 2480. We use subprocess where present (i.e. python 2.4+) and fall back to os.popen3 (for python 2.3). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Oct 10 21:12:16 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 10 Oct 2008 17:12:16 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200810102112.m9ALCGi8013788@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #36 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-10 17:12 EST ------- Note that os.popen3 is deprecated in python 2.6 which gives another reason for moving to subprocess. After testing on Linux as well, I have updated Bio/Blast/NCBIStandalone.py in CVS revision 1.82, based on changes discussed here. See: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIStandalone.py?cvsroot=biopython We now use subprocess where present (i.e. python 2.4+) and fall back to os.popen3 (for python 2.3). This fixes Bug 2528, and should fix this as well (Bug 2480) - assuming we leave things as they are for spaces in the database argument. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
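For readers following Bugs 2480 and 2528, here is a minimal sketch of calling a command line tool through subprocess with an argument list, which avoids hand-quoting paths that contain spaces. It is not the code committed in revision 1.82 of Bio/Blast/NCBIStandalone.py; the function name run_command and the demo path are made up for illustration, and the Python interpreter stands in for a BLAST executable.

```python
import subprocess
import sys


def run_command(args):
    """Run a command given as an argument list; return (stdout, stderr, code)."""
    # With a list and the default shell=False, each argument is handed to
    # the program as-is, so spaces inside a path need no manual quoting.
    child = subprocess.Popen(args,
                             stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE,
                             universal_newlines=True)
    out, err = child.communicate()
    return out, err, child.returncode


if __name__ == "__main__":
    # Hypothetical stand-in for a BLAST call: echo back one argument that
    # contains spaces, to show it arrives as a single value.
    demo_path = "C:\\Program Files\\blast\\my query.fasta"
    out, err, code = run_command(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", demo_path])
    print(repr(out), code)
```

Whether a list with shell=False behaves identically on Windows, Mac OS X and Linux for a given BLAST installation would still need checking on each platform, as the comments on Bug 2480 suggest.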
From bugzilla-daemon at portal.open-bio.org Fri Oct 10 21:26:34 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 10 Oct 2008 17:26:34 -0400 Subject: [Biopython-dev] [Bug 2600] enhance Seq and SeqRecord to new style classes In-Reply-To: Message-ID: <200810102126.m9ALQYMH014735@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2600 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-10 17:26 EST ------- Change made in CVS, marking as fixed. Bio/Seq.py revision 1.42 Bio/SeqRecord.py revision 1.21 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Oct 10 21:26:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 10 Oct 2008 17:26:37 -0400 Subject: [Biopython-dev] [Bug 2509] Deprecating the .data property of the Seq and MutableSeq objects In-Reply-To: Message-ID: <200810102126.m9ALQb3Q014747@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2509 Bug 2509 depends on bug 2600, which changed state. Bug 2600 Summary: enhance Seq and SeqRecord to new style classes http://bugzilla.open-bio.org/show_bug.cgi?id=2600 What |Old Value |New Value ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Oct 10 22:03:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 10 Oct 2008 18:03:21 -0400
Subject: [Biopython-dev] [Bug 2525] The unit tests GUI run_tests.py does not track skipped tests
In-Reply-To:
Message-ID: <200810102203.m9AM3LgK017028@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2525

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-10 18:03 EST -------
Unit test GUI removed in CVS, marking as fixed.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Sat Oct 11 05:08:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 11 Oct 2008 01:08:34 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values
In-Reply-To:
Message-ID: <200810110508.m9B58Y2K013621@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480

------- Comment #37 from drpatnaik at yahoo.com 2008-10-11 01:08 EST -------
Thank you. Confirming that CVS version 1.82 of the file works fine on Windows XP SP2 with Python 2.5.2.

A note: A custom script using Bio/Blast can appear to hang, and the results file truncated, if the 'error handle' is used before the 'result handle':

    res_hdl, err_hdl = NCBIStandalone.blastall(my_blast, 'blastn', my_db, my_seq)

    # OK
    my_result = res_hdl.read()
    my_error = err_hdl.read()

    # Not OK
    my_error = err_hdl.read()
    my_result = res_hdl.read()

Some recapitulated notes:

1. File-names, file-paths, or database values can contain spaces.
2. There is no special, Windows-specific requirement to use backslash (\) as the directory separator.
3. There is no special, Windows-specific requirement to enclose a value inside double-quotes (") instead of single-quotes ('), or to use Python's 'r'.
4. Except for database values, DOS 8.3 file-names (short file-names) can be used.
5. If the database value contains a space, it should be enclosed in double-quotes (").
6. If the database value refers to multiple databases, and at least one of them has a space in it, then the pointer for that database should be additionally enclosed in backslash-escaped double-quotes (\").

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Sat Oct 11 05:44:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 11 Oct 2008 01:44:50 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values
In-Reply-To:
Message-ID: <200810110544.m9B5iouP016206@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480

------- Comment #38 from drpatnaik at yahoo.com 2008-10-11 01:44 EST -------
(In reply to comment #37)
> 4. Except for database values, DOS 8.3 file-names (short file-names) can be
> used.

Sorry, short file-names _can_ be used for database values [but they cannot be generated by win32api.GetShortPathName, etc.].

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
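The hang described in comment #37 looks like the usual pipe-buffer problem: if the child is still writing results to its standard output while the parent is blocked reading standard error to the end, neither side can make progress once the output pipe fills up. A minimal sketch, using a small Python child process as a stand-in for blastall (it is not the NCBIStandalone API), lets communicate() drain both pipes together so that the reading order no longer matters.

```python
import subprocess
import sys

# Stand-in child: writes a large "result" to stdout and a short note to stderr.
CHILD_CODE = (
    "import sys\n"
    "sys.stdout.write('x' * 500000)\n"
    "sys.stderr.write('warning: demo\\n')\n"
)

child = subprocess.Popen(
    [sys.executable, "-c", CHILD_CODE],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)

# communicate() reads both pipes concurrently until the child exits,
# so a full stdout pipe cannot stall a pending read on stderr.
my_result, my_error = child.communicate()
```

With the two-handle interface quoted above, reading the result handle first remains the safe order; the sketch only shows one way to avoid depending on it.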
From bugzilla-daemon at portal.open-bio.org Sat Oct 11 11:52:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 11 Oct 2008 07:52:37 -0400 Subject: [Biopython-dev] [Bug 2524] Handle missing libraries like numpy or reportlab in run_tests.py In-Reply-To: Message-ID: <200810111152.m9BBqb6x006207@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2524 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|normal |minor Summary|Handle missing libraries |Handle missing libraries |like TextTools in |like numpy or reportlab in |run_tests.py |run_tests.py ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-11 07:52 EST ------- After the switch from Numeric to numpy, and the deprecation of Martel/Mindy, this only applies to two libraries: import numpy import reportlab Retitling bug, and downgrading to minor. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Oct 11 12:37:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 11 Oct 2008 08:37:49 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200810111237.m9BCbndK009847@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #17 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-11 08:37 EST ------- For the sake of discussion, here is a simple (i.e. minimal) translate method for the Seq object (any checked in code should also simplify the current Seq module's translate function to call this for Seq objects). 
    def translate(self, table = "Standard", stop_symbol = "*"):
        """Turns a nucleotide sequence into a protein sequence (amino acids).

        This method will translate DNA or RNA sequences, but for a protein
        sequence an exception is raised.

        table - Which codon table to use?  This can be either a name
                (string) or an NCBI identifier (integer).

        NOTE - Ambiguous codons like "TAN" or "NNN" could be an amino acid
        or a stop codon.  These are translated as "X".  Any invalid codon
        (e.g. "TA?" or "T-A") will throw a TranslationError.

        NOTE - Does NOT support gapped sequences.

        NOTE - This does NOT behave like the python string's translate
        method.  For that use str(my_seq).translate(...) instead.
        """
        # An integer (or integer-like string) is an NCBI table id,
        # anything else is treated as a table name.
        try:
            table_id = int(table)
        except ValueError:
            table_id = None
        if isinstance(self.alphabet, Alphabet.ProteinAlphabet) :
            raise ValueError("Proteins cannot be translated!")
        # Pick the codon table matching the sequence alphabet.
        if self.alphabet==IUPAC.unambiguous_dna:
            if table_id is None:
                codon_table = CodonTable.unambiguous_dna_by_name[table]
            else:
                codon_table = CodonTable.unambiguous_dna_by_id[table_id]
        elif self.alphabet==IUPAC.ambiguous_dna:
            if table_id is None:
                codon_table = CodonTable.ambiguous_dna_by_name[table]
            else:
                codon_table = CodonTable.ambiguous_dna_by_id[table_id]
        elif self.alphabet==IUPAC.unambiguous_rna:
            if table_id is None:
                codon_table = CodonTable.unambiguous_rna_by_name[table]
            else:
                codon_table = CodonTable.unambiguous_rna_by_id[table_id]
        elif self.alphabet==IUPAC.ambiguous_rna:
            if table_id is None:
                codon_table = CodonTable.ambiguous_rna_by_name[table]
            else:
                codon_table = CodonTable.ambiguous_rna_by_id[table_id]
        else:
            if table_id is None:
                codon_table = CodonTable.ambiguous_generic_by_name[table]
            else:
                codon_table = CodonTable.ambiguous_generic_by_id[table_id]
        protein = _translate_str(str(self), codon_table, stop_symbol)
        # Only flag the alphabet as having stop codons if one was produced.
        if stop_symbol in protein :
            alphabet = Alphabet.HasStopCodon(codon_table.protein_alphabet,
                                             stop_symbol = stop_symbol)
        else :
            alphabet = codon_table.protein_alphabet
        return Seq(protein, alphabet)

Unlike my earlier comment 11, I'm now leaning to a single translation method (perhaps with extra arguments).

You'll notice here I am suggesting using the method name "translate" even though this clashes with the python string method of the same name. This could cause confusion if the Seq object is passed to non-Biopython code which expects a string, but overall seems much simpler for end users.

Other method names could be:
* translate_ (trailing underscore, see PEP8) which I think is ugly.
* translation (noun rather than verb), differs from established style.
* bio_translate which is I think too long.

I'm thinking we could also support "start" and "end" optional arguments (named after those used in the python string methods, and behaving in the same way) for specifying a sub-sequence to be translated. Using start=0, 1 or 2 would give the three forward reading frames.

An optional boolean argument could enable treating the sequence as a CDS - verifying it starts with a start codon (which would always be translated as M) and verifying it ends with a stop codon (with no other stop codons in frame), which would not be translated. Following BioPerl, this argument could be called "complete".

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
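To make the behaviour discussed in comment #17 concrete, here is a small self-contained sketch that works on plain strings rather than Seq objects. It is not the Biopython implementation: the function name sketch_translate, the to_stop flag and the handling of "N" are assumptions made for illustration, and only the standard genetic code (NCBI table 1) is built in. The start and end arguments follow the python string convention suggested above.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code (NCBI table 1), built from the compact NCBI layout
# in which codons are ordered with T, C, A, G at each position.
STANDARD_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def sketch_translate(seq, stop_symbol="*", start=0, end=None, to_stop=False):
    """Translate a DNA/RNA string; hypothetical stand-in for Seq.translate."""
    nucleotides = seq[start:end].upper().replace("U", "T")
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for pos in range(0, len(nucleotides) - len(nucleotides) % 3, 3):
        codon = nucleotides[pos:pos + 3]
        if codon in STANDARD_TABLE:
            amino = STANDARD_TABLE[codon]
        elif set(codon) <= set("ACGTN"):
            amino = "X"  # simplification: any codon containing N becomes X
        else:
            raise ValueError("Invalid codon %r" % codon)
        if amino == "*":
            if to_stop:
                break  # stop at the first in-frame stop codon
            amino = stop_symbol
        protein.append(amino)
    return "".join(protein)


cds = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
full_translation = sketch_translate(cds)
first_orf = sketch_translate(cds, to_stop=True)
second_frame = sketch_translate(cds, start=1, to_stop=True)
```

Here sketch_translate(cds, start=1) gives the same result as sketch_translate(cds[1:]), which is the "two ways of doing the same thing" trade-off that comes with start/end arguments.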
From biopython at maubp.freeserve.co.uk Mon Oct 13 12:00:08 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 13 Oct 2008 13:00:08 +0100 Subject: [Biopython-dev] Python 2.6 In-Reply-To: <320fb6e00810100739y7363c0efl8fdfa86455770666@mail.gmail.com> References: <320fb6e00810030544l510d76f7g93d805ec5840c1d4@mail.gmail.com> <320fb6e00810030952q74d595d6l89adf06890d5311@mail.gmail.com> <320fb6e00810060622g2f0f9107mc3c7528d5ce3cb21@mail.gmail.com> <48EA47CC.5070202@biologie.uni-kl.de> <320fb6e00810061036w4f161de3o5ccefd8d0a8bcee1@mail.gmail.com> <320fb6e00810091019q64214738nbc8a55f19c1e5eaa@mail.gmail.com> <320fb6e00810100242y30faa5f5od9ff344605344e27@mail.gmail.com> <320fb6e00810100739y7363c0efl8fdfa86455770666@mail.gmail.com> Message-ID: <320fb6e00810130500mf1f20c1gc04f1aa782d5e1f@mail.gmail.com> > Something the unit tests didn't flag up is the deprecation of popen2, > os.popen2, os.popen3, and os.popen4 in python 2.6 - see > http://www.python.org/dev/peps/pep-0361/ Some progress: Bio.Blast.NCBIStandalone - fixed in CVS, uses subprocess where available Bio.Application - fixed in CVS, uses subprocess where available These two changes passed my own hand testing, but as we don't have any unit tests covering these having a 3rd party double check would be a good idea. Bio.Clustalw - actually only uses os.popen which is still OK Bio.Emboss.Applications - only via Bio.Application, so OK Leaving just: Bio.PDB.NACCESS - uses os.popen3, looks simple to update but I don't have naccess installed yet. I suppose this would make a nice unit test too. 
See http://www.bioinf.manchester.ac.uk/naccess/ Peter From bugzilla-daemon at portal.open-bio.org Tue Oct 14 10:16:17 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 14 Oct 2008 06:16:17 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200810141016.m9EAGHma005952@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-14 06:16 EST ------- We seem to have reached a consensus on the mailing list to use "translate" for the Seq object method (even though this clashes with the python string method of the same name). See: http://lists.open-bio.org/pipermail/biopython/2008-October/004575.html I've checked some code based on that in comment 17 into CVS, and updated the test_seq.py unit test to cover this: Bio/Seq.py revision 1.44 Tests/test_seq.py revision 1.26 I'm leaving this bug open to discuss possible further optional arguments for the translate method (and perhaps for the Bio.Seq.translate function too). e.g. As I wrote in comment 17, > I'm thinking we could also support "start" and "end" optional arguments (named > after those used in the python string methods, and behaving in the same way) > for specifying a sub-sequence to be translated. Using start=0, 1 or 2 would > give the three forward reading frames. This would give an alternative to: my_seq[i:j].translate(table) as: my_seq.translate(table, start=i, end=j) As with the python string methods, potentially the implementation could be slightly faster as a new Seq object doesn't need to be created for the slice. On the other hand, it does then offer two ways of doing the same thing. 
> An optional boolean argument could enable treating the sequence as a CDS - > verifying it starts with a start codon (which would always be translated as M) > and verifying it ends with a stop codon (with no other stop codons in frame), > which would not be translated. Following BioPerl, this argument could be > called "complete". Related to this, it would be useful to have a boolean option to stop translation at the first in frame stop codon (possible argument names for this include "stop" if not used as above, "to_stop", "auto_stop", "terminate" etc). For comparison, see the translate_to_stop method in the semi-obsolete Bio.Translate.Translator object. We will also need to support back_translate before we can deprecate the old Bio.Translate module (see comment 6 and comment 7). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 14 16:42:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 14 Oct 2008 12:42:41 -0400 Subject: [Biopython-dev] [Bug 2616] New: BioSQL support for Psycopg2 Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2616 Summary: BioSQL support for Psycopg2 Product: Biopython Version: 1.48 Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P2 Component: BioSQL AssignedTo: biopython-dev at biopython.org ReportedBy: cymon.cox at gmail.com Biopython 1.48 BioSQL does not support the psycopg2 PostgreSQL driver (http://www.initd.org/pub/software/psycopg/). Current support is for the psycopg1 driver only - the latest of which is 3 yrs old and no longer developed. As far as I can tell the only change is to how autocommit is flagged. 
PATCH:
=========================================================================
diff -ruN BioSQL/BioSeqDatabase.py /usr/local/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py
--- BioSQL/BioSeqDatabase.py 2008-08-27 17:34:16.000000000 +0100
+++ /usr/local/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py 2008-10-14 15:57:07.000000000 +0100
@@ -53,7 +53,7 @@
     if kw.has_key("passwd"):
         kw["password"] = kw["passwd"]
         del kw["passwd"]
-    if driver == "psycopg" and not kw.get("database"):
+    if driver in ["psycopg", "psycopg2"] and not kw.get("database"):
         kw["database"] = "template1"
     try:
         conn = connect(**kw)
@@ -134,7 +134,7 @@
         # 1. PostgreSQL can load it all at once and actually needs to
         # due to FUNCTION defines at the end of the SQL which mess up
         # the splitting by semicolons
-        if self.module_name in ["psycopg"]:
+        if self.module_name in ["psycopg", "psycopg2"]:
             self.adaptor.cursor.execute(sql)
         # 2. MySQL needs the database loading split up into single lines of
         # SQL executed one at a time
diff -ruN BioSQL/DBUtils.py /usr/local/lib/python2.5/site-packages/BioSQL/DBUtils.py
--- BioSQL/DBUtils.py 2008-03-21 10:48:32.000000000 +0000
+++ /usr/local/lib/python2.5/site-packages/BioSQL/DBUtils.py 2008-10-14 15:57:28.000000000 +0100
@@ -68,7 +68,17 @@
     def autocommit(self, conn, y = True):
         conn.autocommit(y)
 
+
 _dbutils["psycopg"] = Psycopg_dbutils
+
+class Psycopg2_dbutils(Psycopg_dbutils):
+    def autocommit(self, conn, y = True):
+        if y:
+            conn.set_isolation_level(0)
+        else:
+            conn.set_isolation_level(1)
+
+_dbutils["psycopg2"] = Psycopg2_dbutils
 
 class Pgdb_dbutils(Generic_dbutils):
     """Add support for pgdb in the PyGreSQL database connectivity package.
========================================================================

Tests/test_BioSQL.py :

[cymon at chara Tests]$ python test_BioSQL.py
Load SeqRecord objects into a BioSQL database. ... ok
Get a list of all items in the database. ... ok
Test retrieval of items using various ids. ... ok
Make sure Seqs from BioSQL implement the right interface. ... ok
Check SeqFeatures of a sequence. ... ok
Make sure SeqRecords from BioSQL implement the right interface. ... ok
Check that slices of sequences are retrieved properly. ... ok
Make sure all records are correctly loaded. ... ok
Indepth check that SeqFeatures are transmitted through the db. ... ok

----------------------------------------------------------------------
Ran 9 tests in 19.749s

OK

With a tweak to test_BioSQL_SeqIO.py :

    else :
        #Should both be lists of strings...
        old_f.qualifiers[key].sort()
        new_f.qualifiers[key].sort()
        assert old_f.qualifiers[key] == new_f.qualifiers[key]

One record in the tests has two \allele features "T" and "C" so they need to be sorted before comparison.

$ python test_BioSQL_SeqIO.py > out
$ diff out output/test_BioSQL_SeqIO
0a1
> test_BioSQL_SeqIO
$

BUT both _FAIL_ when run with the run_tests.py.

The short exercises in the BioSQL wiki (after the unit tests) also run fine.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Oct 14 17:25:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 14 Oct 2008 13:25:24 -0400
Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2
In-Reply-To:
Message-ID: <200810141725.m9EHPOgt003394@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2616

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-14 13:25 EST -------
Supporting psycopg2 sounds good :)

What version of Biopython do you have? 1.48 or CVS as there have been some BioSQL changes recently (mostly to do with the taxonomy tables).

I'm surprised the order of the qualifiers isn't being preserved - I think we should fix that rather than tweaking test_BioSQL_SeqIO.py to ignore this.
Which version of the BioSQL schema do you have? It is possible that this is a BioSQL issue/difference in the PostgreSQL schema compared to the BioSQL schema which I have been using when running the tests. Also your problem about the two tests failing when run via run_tests.py is concerning. What happens if you do this: python run_tests.py -g test_BioSQL python run_tests.py -g test_BioSQL_SeqIO python run_tests.py test_BioSQL test_BioSQL_SeqIO cvs diff output/test_BioSQL output/test_BioSQL_SeqIO Thanks Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 14 18:09:08 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 14 Oct 2008 14:09:08 -0400 Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2 In-Reply-To: Message-ID: <200810141809.m9EI98Bs007989@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2616 ------- Comment #2 from cymon.cox at gmail.com 2008-10-14 14:09 EST ------- (In reply to comment #1) > Supporting psycopg2 sounds good :) > > What version of Biopython do you have? 1.48 or CVS 1.48 > as there have been some > BioSQL changes recently (mostly to do with the taxonomy tables). I loaded taxonomy in with the Pg driver and load_ncbi_taxonomy.pl with no problem. > I'm surprised the order of the qualifiers isn't being preserved - I think we > should fix that rather than tweaking test_BioSQL_SeqIO.py to ignore this. Sure, that's probably a better approach :) > Which version of the BioSQL schema do you have? biosql-1.0.1 > It is possible that this is a > BioSQL issue/difference in the PostgreSQL schema compared to the BioSQL schema > which I have been using when running the tests. > > Also your problem about the two tests failing when run via run_tests.py is > concerning. 
What happens if you do this: > > python run_tests.py -g test_BioSQL > python run_tests.py -g test_BioSQL_SeqIO > python run_tests.py test_BioSQL test_BioSQL_SeqIO > cvs diff output/test_BioSQL output/test_BioSQL_SeqIO It really is broken when using run_tests.py, after running with the -g flag: $ cat output/test_BioSQL test_BioSQL Load SeqRecord objects into a BioSQL database. ... ERROR Get a list of all items in the database. ... ERROR Test retrieval of items using various ids. ... ERROR Make sure Seqs from BioSQL implement the right interface. ... ERROR Check SeqFeatures of a sequence. ... ERROR Make sure SeqRecords from BioSQL implement the right interface. ... ERROR Check that slices of sequences are retrieved properly. ... ERROR Make sure all records are correctly loaded. ... ERROR Indepth check that SeqFeatures are transmitted through the db. ... ERROR etc... Probably not the solution we're looking for... The problem is that run_test.py is not picking up the psycopg2 adapter and is deferring to the generic adapter, consequently it throws on "InternalError: DROP DATABASE cannot run inside a transaction block". Why that's the case when individually the test work OK is something I tried to track this down but just couldn't figure it... Cheers, C. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
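The failure mode described in comment #2 can be sketched without a database. In the patch, autocommit for psycopg2 is requested through set_isolation_level(0); if the driver-name lookup misses and a generic helper is used instead, autocommit is never switched on and statements such as DROP DATABASE end up inside a transaction block. The class and function names below are simplified stand-ins invented for the example (the real lookup in BioSQL.DBUtils may differ), and FakeConnection only records what a real psycopg2 connection would be asked to do.

```python
class GenericDBUtils:
    """Fallback used for drivers without special handling."""

    def autocommit(self, conn, y=True):
        # Generic drivers: leave the connection's transaction mode untouched.
        pass


class Psycopg2DBUtils(GenericDBUtils):
    """psycopg2 selects autocommit through the isolation level (0 = autocommit)."""

    def autocommit(self, conn, y=True):
        conn.set_isolation_level(0 if y else 1)


# Registry keyed on the driver module name, as in the patch.
_dbutils = {"psycopg2": Psycopg2DBUtils}


def get_dbutils(module_name):
    """Return the helper registered for a driver name, else the generic one."""
    return _dbutils.get(module_name, GenericDBUtils)()


class FakeConnection:
    """Stand-in for a psycopg2 connection; records the requested level."""

    def __init__(self):
        self.isolation_level = None

    def set_isolation_level(self, level):
        self.isolation_level = level


conn = FakeConnection()
get_dbutils("psycopg2").autocommit(conn)
level_with_helper = conn.isolation_level       # autocommit requested

other = FakeConnection()
get_dbutils("some_other_driver").autocommit(other)
level_with_fallback = other.isolation_level    # untouched by the generic helper
```

Printing the name that is actually passed to the lookup when the tests run under run_tests.py would be one way to check whether the generic fallback is being hit.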
From bugzilla-daemon at portal.open-bio.org Wed Oct 15 09:14:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 15 Oct 2008 05:14:47 -0400 Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2 In-Reply-To: Message-ID: <200810150914.m9F9Elra032490@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2616 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-15 05:14 EST ------- (In reply to comment #2) > (In reply to comment #1) > > Supporting psycopg2 sounds good :) > > > > What version of Biopython do you have? 1.48 or CVS as there have > > been some BioSQL changes recently (mostly to do with the taxonomy > > tables). > > 1.48 > Could you update to Biopython CVS please? This now populates the taxon/taxon_name tables differently when there is an NCBI taxon ID (with the option to fetch lineages from Entrez). Once you're running CVS, could you attach a patch to this bug. That should make it easier for me to look at this. > > Which version of the BioSQL schema do you have? > > biosql-1.0.1 Good. > > Also your problem about the two tests failing when run via run_tests.py is > > concerning. What happens if you do this: > > > > python run_tests.py -g test_BioSQL > > python run_tests.py -g test_BioSQL_SeqIO > > python run_tests.py test_BioSQL test_BioSQL_SeqIO > > cvs diff output/test_BioSQL output/test_BioSQL_SeqIO > > It really is broken when using run_tests.py, after running with the -g flag: > $ cat output/test_BioSQL > test_BioSQL > Load SeqRecord objects into a BioSQL database. ... ERROR > Get a list of all items in the database. ... ERROR > Test retrieval of items using various ids. ... ERROR > Make sure Seqs from BioSQL implement the right interface. ... ERROR > Check SeqFeatures of a sequence. ... ERROR > Make sure SeqRecords from BioSQL implement the right interface. ... ERROR > Check that slices of sequences are retrieved properly. ... 
ERROR > Make sure all records are correctly loaded. ... ERROR > Indepth check that SeqFeatures are transmitted through the db. ... ERROR > etc... > > Probably not the solution we're looking for... This was really a diagnostic step, rather than a solution. > The problem is that run_test.py is not picking up the psycopg2 adapter and is > deferring to the generic adapter, consequently it throws on "InternalError: > DROP DATABASE cannot run inside a transaction block". Why that's the case when > individually the test work OK is something I tried to track this down but just > couldn't figure it... You must have edited test_setup_BioSQL.py correctly, so that's probably not the problem. Where did you install psycopg2? Using run_tests.py does some magic with the python path to make sure the local copy of Biopython you've just built is used, rather than any existing system installation of Biopython. Perhaps this is preventing python from finding psycopg2 somehow. You don't have any test files present called psycopg2.py do you? Alternatively, maybe there is something wrong with your adaptor code - but presumably this works outside the test suite? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From lpritc at scri.ac.uk Wed Oct 15 10:00:31 2008 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Wed, 15 Oct 2008 11:00:31 +0100 Subject: [Biopython-dev] Bio.Graphics and GenomeDiagram Message-ID: Hi, A while ago I wrote the GenomeDiagram library for drawing images of genomes and other large biological sequences, and collections of sequences (http://bioinf.scri.ac.uk/lp/programs.php#genomediagram). This library already uses Biopython objects (Seq, SeqFeature, etc.) and, like other modules in Bio.Graphics, has a dependency on Reportlab only. 
It's been published, and has found use in other groups, who seem to be using it without any issues - there's been a trickle of maintenance requests, but nothing of late other than questions from people new to Python. Now that I have managed to free up a little bit of time I'd like to revisit GenomeDiagram, tidy up the internals some more (there's some clunky stuff in there...), and contribute it to Bio.Graphics - which hasn't seen much traffic for a while. Looking at the current Bio.Graphics structure, I think that incorporating the (revised) library as Bio.Graphics.GenomeDiagram in a directory under Bio.Graphics would be a suitable approach. I'm happy to maintain this code for the foreseeable future, also - though help is, of course, welcome. There is written documentation, which I would happily move over to the wiki, and some testing in __name__ == '__main__', which could be expanded upon and moved over to a unit test format for consistency. One of the things I would like to do to expand on current functionality is to provide some library methods that produce commonly-desired output, similar to that in GenomeAtlas (http://www.cbs.dtu.dk/services/GenomeAtlas/), so that users don't have to know about the internals of GenomeDiagram, and something like a Bio.Graphics.GenomeDiagram.draw_seqrecord_cds(style='circular', gc_content=True, outfile='cds1.pdf') call would produce a simple circular diagram of CDS features with accompanying graph of GC content. I suggested something doing similar a while ago and got no feedback - does anyone object to this contribution, in principle or in practice? Or are there any other comments? I'm all (well, mostly) ears... L. 
-- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system. Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). 
______________________________________________________________________ From biopython at maubp.freeserve.co.uk Wed Oct 15 10:24:42 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 15 Oct 2008 11:24:42 +0100 Subject: [Biopython-dev] Bio.Graphics and GenomeDiagram In-Reply-To: References: Message-ID: <320fb6e00810150324j4fd20253i86c0001cfb143bfd@mail.gmail.com> On Wed, Oct 15, 2008 at 11:00 AM, Leighton Pritchard wrote: > Hi, > > A while ago I wrote the GenomeDiagram library for drawing images of genomes > and other large biological sequences, and collections of sequences > (http://bioinf.scri.ac.uk/lp/programs.php#genomediagram). ... > Now that I have managed to free up a little bit of time I'd like to revisit > GenomeDiagram, tidy up the internals some more ..., and contribute it > to Bio.Graphics .. > I suggested something doing similar a while ago and got no feedback - does > anyone object to this contribution, in principle or in practice? Or are > there any other comments? I'm all (well, mostly) ears... I'm in favour of this (and have actually chatted to Leighton about this off list). One small thing I would change is switching colour to color for the argument/properties (the American spelling of color is the norm in all programming usage). Anyone using the existing stand alone GenomeDiagram library would have to make some small changes anyway (new import statements), so if there are going to be any other API changes it would be best to do them at the same time. 
Peter From lpritc at scri.ac.uk Wed Oct 15 10:39:11 2008 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Wed, 15 Oct 2008 11:39:11 +0100 Subject: [Biopython-dev] Bio.Graphics and GenomeDiagram In-Reply-To: <320fb6e00810150324j4fd20253i86c0001cfb143bfd@mail.gmail.com> Message-ID: On 15/10/2008 11:24, "Peter" wrote: > On Wed, Oct 15, 2008 at 11:00 AM, Leighton Pritchard > wrote: >> Hi, >> >> A while ago I wrote the GenomeDiagram library for drawing images of genomes >> and other large biological sequences, and collections of sequences >> (http://bioinf.scri.ac.uk/lp/programs.php#genomediagram). ... >> Now that I have managed to free up a little bit of time I'd like to revisit >> GenomeDiagram, tidy up the internals some more ..., and contribute it >> to Bio.Graphics .. >> I suggested something doing similar a while ago and got no feedback - does >> anyone object to this contribution, in principle or in practice? Or are >> there any other comments? I'm all (well, mostly) ears... > One small thing I would change is switching colour to color for the > argument/properties (the American spelling of color is the norm in all > programming usage). Fair point - I'll do that. Though, like those pesky Canadians (see Maple), I'm inclined to permit either spelling out of sheer bloody-mindedness ;). Historically (rather than etymologically), it's a holdover from working with interim EMBL-ish .tab files from Sanger, which use the British English spelling: """ FT /class="3.1.03" FT /colour=7 FT /gene="asnA" """ Would people see permitting either form of colour/color as potentially confusing? If so, I'm happy to go with the majority spelling. > Anyone using the existing stand alone > GenomeDiagram library would have to make some small changes anyway > (new import statements), so if there are going to be any other API > changes it would be best to do them at the same time. I agree. 
I see this as a break from the standalone library, and would be branching this version of GenomeDiagram from what had gone before. While I'd like to make API changes as low-impact as possible, I'm in favour of such changes where they support functional improvement. Cheers, L. -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system. Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). 
______________________________________________________________________ From biopython at maubp.freeserve.co.uk Wed Oct 15 11:41:25 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 15 Oct 2008 12:41:25 +0100 Subject: [Biopython-dev] Sequences and simple plots In-Reply-To: <320fb6e00809261429i464e0ee8qe81f7090c2141292@mail.gmail.com> References: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com> <5321F5EB-F2C1-4D1A-9A67-878C695C0945@northwestern.edu> <320fb6e00809251239i6308d6b9i6a334701ce1cd5f1@mail.gmail.com> <320fb6e00809260315x62634eadw6b0dd17e074bdeb2@mail.gmail.com> <320fb6e00809260911rf91432cp8f89904330550d6b@mail.gmail.com> <0ACA5A64-645F-4D1F-AC93-EB23D983C987@northwestern.edu> <320fb6e00809260928u4182ee34la768e7fe9f1f7842@mail.gmail.com> <52356F04-48AA-454D-A0F6-83E24BBD03EE@northwestern.edu> <320fb6e00809261429i464e0ee8qe81f7090c2141292@mail.gmail.com> Message-ID: <320fb6e00810150441t23250eeeqe44bb07cc6480595@mail.gmail.com> On Fri, Sep 26 Peter wrote: > On Fri, Sep 26 Jared wrote: >> On Sep 26 Peter wrote: >> >>> Did you try the dot-plot example? >> >> I didn't, but it looked good. > > Hopefully I've pitched it right - I've tried to make it as simple as > possible, but the nested list comprehension is perhaps non-obvious. Old output: http://biopython.org/DIST/docs/tutorial/images/dot_plot.png I recently wanted to draw a dot plot for a larger pair of sequences, and found that the example code didn't scale well. There were two issues, the naive calculation and the fact that pylab.imshow has an upper limit for the size of matrix (due to memory). 
I've added a second, more complicated version to the Tutorial in CVS,
using pylab.scatter for the plotting:
http://biopython.org/DIST/docs/tutorial/images/dot_plot_scatter.png

#Load two SeqRecord objects
from Bio import SeqIO
handle = open("ls_orchid.fasta")
record_iterator = SeqIO.parse(handle, "fasta")
rec_one = record_iterator.next()
rec_two = record_iterator.next()
handle.close()

window = 7
step = 1

#Map every window sized sub-sequence's location in a dict
dict_one = {}
dict_two = {}
for (seq, section_dict) in [(rec_one.seq.tostring().upper(), dict_one),
                            (rec_two.seq.tostring().upper(), dict_two)] :
    for i in range(0, len(seq)-window, step) :
        section = seq[i:i+window]
        try :
            section_dict[section].append(i)
        except KeyError :
            section_dict[section] = [i]

#Now find any sub-sequences found in both sequences
matches = set(dict_one).intersection(dict_two)
print "%i unique matches" % len(matches)

#Create lists of x and y co-ordinates for scatter plot
x = []
y = []
for section in matches :
    for i in dict_one[section] :
        for j in dict_two[section] :
            x.append(i)
            y.append(j)

#Now draw it
import pylab
pylab.gray()
pylab.scatter(x,y)
#Use the record lengths for the axis limits
pylab.xlim(0, len(rec_one)-window)
pylab.ylim(0, len(rec_two)-window)
pylab.xlabel("%s (length %i bp)" % (rec_one.id, len(rec_one)))
pylab.ylabel("%s (length %i bp)" % (rec_two.id, len(rec_two)))
pylab.title("Dot plot using window size %i\n(allowing no mis-matches)" % window)
pylab.show()

Using pylab.scatter is still a bit slow, but it does actually work. I
was wondering: if this dot-plot code were to use reportlab instead,
would it make a sensible addition to the Bio.Graphics module?

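
For anyone who wants to try the indexing step without Biopython or pylab
installed, here is a minimal sketch of the same idea on plain strings, in
current Python 3 syntax. The function names and the toy sequences are
made up for illustration and are not part of the Tutorial code. It also
includes the final window (note the "+ 1" in the range), which is a small
departure from the loop above.

```python
from collections import defaultdict


def window_index(seq, window):
    """Map each window-sized sub-sequence to the list of its start positions."""
    index = defaultdict(list)
    # The "+ 1" includes the final window; the loop in the email stops one short.
    for i in range(len(seq) - window + 1):
        index[seq[i:i + window]].append(i)
    return index


def dot_plot_points(seq_one, seq_two, window):
    """Return x and y start positions of exact window matches."""
    index_one = window_index(seq_one.upper(), window)
    index_two = window_index(seq_two.upper(), window)
    x, y = [], []
    # Only sub-sequences present in both sequences contribute points.
    for section in set(index_one) & set(index_two):
        for i in index_one[section]:
            for j in index_two[section]:
                x.append(i)
                y.append(j)
    return x, y


# Hypothetical toy sequences, not taken from ls_orchid.fasta.
x, y = dot_plot_points("ACGTACGTTT", "TTACGTAC", window=4)
points = sorted(zip(x, y))
print(points)
```

Because each sequence is indexed once and only shared sub-sequences are
expanded into points, the work grows with the number of matches rather
than with the full len(seq_one) * len(seq_two) matrix.
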
Peter From bugzilla-daemon at portal.open-bio.org Wed Oct 15 12:29:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 15 Oct 2008 08:29:33 -0400 Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2 In-Reply-To: Message-ID: <200810151229.m9FCTXmq017109@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2616 ------- Comment #4 from cymon.cox at gmail.com 2008-10-15 08:29 EST ------- Created an attachment (id=1006) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1006&action=view) Psycopg2 support for BioSQL -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 15 12:31:34 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 15 Oct 2008 08:31:34 -0400 Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2 In-Reply-To: Message-ID: <200810151231.m9FCVYMn017238@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2616 ------- Comment #5 from cymon.cox at gmail.com 2008-10-15 08:31 EST ------- (In reply to comment #3) > (In reply to comment #2) > > (In reply to comment #1) > > > Supporting psycopg2 sounds good :) > > > > > > What version of Biopython do you have? 1.48 or CVS as there have > > > been some BioSQL changes recently (mostly to do with the taxonomy > > > tables). > > > > 1.48 > > > > Could you update to Biopython CVS please? This now populates the > taxon/taxon_name tables differently when there is an NCBI taxon ID (with the > option to fetch lineages from Entrez). > > Once you're running CVS, could you attach a patch to this bug. That should > make it easier for me to look at this. 
OK, so a clean install from CVS seemed to do the trick and now both tests pass
from within the test suite after applying the patch (attached)

[cymon at chara Tests]$ cat setup_BioSQL.py |grep "DBDRIVER \="
16:#DBDRIVER = 'MySQLdb'
19:DBDRIVER = 'psycopg2'

[cymon at chara Tests]$ python run_tests.py test_BioSQL.py test_BioSQL_SeqIO.py
test_BioSQL ... ok
test_BioSQL_SeqIO ... ok
----------------------------------------------------------------------
Ran 2 tests in 32.882s

OK

(NB: the buglet in preserving qualifier order - sorted the lists in
test_BioSQL_SeqIO.py in order to pass test).

Cheers, C.

From lpritc at scri.ac.uk Wed Oct 15 12:57:37 2008
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Wed, 15 Oct 2008 13:57:37 +0100
Subject: [Biopython-dev] Sequences and simple plots
In-Reply-To: <320fb6e00810150441t23250eeeqe44bb07cc6480595@mail.gmail.com>
Message-ID:

On 15/10/2008 12:41, "Peter" wrote:

> Using pylab.scatter is still a bit slow, but it does actually work. I
> was wondering if this dot-plot code were to use reportlab instead,
> would it make a sensible addition to the Bio.Graphics module?

I'd welcome it as an addition there. Maybe there are other small functions
of convenience that might find a home there? A graphical rendering of a
BLAST record, for example...

L.

--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405

From bugzilla-daemon at portal.open-bio.org Wed Oct 15 13:42:55 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 15 Oct 2008 09:42:55 -0400
Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2
In-Reply-To:
Message-ID: <200810151342.m9FDgtc3022442@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2616

------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-15 09:42 EST -------
(In reply to comment #5)
> OK, so a clean install from CVS seemed to do the trick and now both tests pass
> from within the test suite after applying the patch (attached)

OK, good. I guess you had CVS tests against old Biopython or something like
that happening.

I've had a quick look at the patch - it looks fine to me, but I have not
actually tested it.
In BioSQL/BioSeqDatabase.py there is a comment about building a DSN being required for older releases of psycopg - do we still need this for psycopg2? I guess it doesn't hurt. > (NB: the buglet in preserving qualifier order - sorted the lists in > test_BioSQL_SeqIO.py in order to pass test). I suggest you check in this patch, plus the tweak to test_BioSQL_SeqIO.py (conditional on the database driver) with a "TODO" comment next to it about checking why the order wasn't preserved. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Wed Oct 15 14:09:30 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 15 Oct 2008 15:09:30 +0100 Subject: [Biopython-dev] Sequences and simple plots In-Reply-To: References: <320fb6e00810150441t23250eeeqe44bb07cc6480595@mail.gmail.com> Message-ID: <320fb6e00810150709u2aed9855kb8cf91318f287765@mail.gmail.com> On Wed, Oct 15, 2008 at 1:57 PM, Leighton Pritchard wrote: > > On 15/10/2008 12:41, Peter wrote: > >> Using pylab.scatter is still a bit slow, but it does actually work. I >> was wondering if this dot-plot code were to use reportlab instead, >> would it make a sensible addition to the Bio.Graphics module? > > I'd welcome it as an addition there. Maybe there are other small functions > of convenience that might find a home there? A graphical rendering of a > BLAST record, for example... Sequence logos are another obvious little addition. 
Peter

From lpritc at scri.ac.uk Wed Oct 15 14:16:23 2008
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Wed, 15 Oct 2008 15:16:23 +0100
Subject: [Biopython-dev] Sequences and simple plots
In-Reply-To: <320fb6e00810150709u2aed9855kb8cf91318f287765@mail.gmail.com>
Message-ID:

On 15/10/2008 15:09, "Peter" wrote:

> On Wed, Oct 15, 2008 at 1:57 PM, Leighton Pritchard wrote:
>>
>> On 15/10/2008 12:41, Peter wrote:
>>
>>> Using pylab.scatter is still a bit slow, but it does actually work. I
>>> was wondering if this dot-plot code were to use reportlab instead,
>>> would it make a sensible addition to the Bio.Graphics module?
>>
>> I'd welcome it as an addition there. Maybe there are other small functions
>> of convenience that might find a home there? A graphical rendering of a
>> BLAST record, for example...
>
> Sequence logos are another obvious little addition.

Also eyecharts (like sequence logos, but in a grid), and graphical rendering
of HMMs as state diagrams could be useful. There's some Python code for
rendering logos at http://code.google.com/p/weblogo/ - maybe they'd like to
contribute, or the code could be adapted?

L.

--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405

From biopython at maubp.freeserve.co.uk Wed Oct 15 14:42:52 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 15 Oct 2008 15:42:52 +0100
Subject: [Biopython-dev] Biopython with python 2.6 on Windows
Message-ID: <320fb6e00810150742s44a0eacdm4e50cbbcc7d560c0@mail.gmail.com>

Has anyone been able to try out Biopython CVS with python 2.6 on Windows?

I don't think ANY version of numpy is available pre-compiled for
python 2.6 on Windows yet, so we can't easily try the numpy dependent
parts of Biopython. However, checking everything without a numpy
and/or C dependency should be fairly straightforward...

Peter

From bugzilla-daemon at portal.open-bio.org Wed Oct 15 16:39:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 15 Oct 2008 12:39:18 -0400
Subject: [Biopython-dev] [Bug 2551] Adding advanced __getitem__ to generic alignment, e.g.
align[1:2, 5:-5] In-Reply-To: Message-ID: <200810151639.m9FGdILQ008553@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2551 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-15 12:39 EST ------- Until recently we supported only: align[r] gives a row as a SeqRecord Updated in CVS to support row-slicing: align[start:end:step] gives a new (sub)alignment e.g. align[1:5] - new four row sub-alignment align[::2] - sub alignment using every second row align[:] - makes a copy align[::-1] - makes a copy with the row order reversed The current implementation could be improved after fixing enhancement Bug 2554. This leaves the door open for double indexes as previously outlined (blocking on Bug 2507). -- In reply to Jose's comment 1 and comment 2, this really is a complete replacement for the current alignment object, and would be better off on a separate bug. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 15 16:43:30 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 15 Oct 2008 12:43:30 -0400 Subject: [Biopython-dev] [Bug 2591] GenBank files misparsed for long organism names In-Reply-To: Message-ID: <200810151643.m9FGhUZM009070@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2591 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-15 12:43 EST ------- Hi Joel, Did you get any reply from the NCBI on this issue? Thanks, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
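
As a rough illustration of the row-slicing behaviour listed in the Bug 2551
comment above, the following sketch shows one way a __getitem__ method can
return a single row for an integer index and a new (sub)alignment for a
slice. ToyAlignment and its rows are hypothetical stand-ins written in
current Python 3; this is not the code that was checked in to CVS.

```python
class ToyAlignment:
    """Hypothetical row container; not Biopython's alignment class."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, index):
        if isinstance(index, int):
            # align[r] gives a single row
            return self._rows[index]
        if isinstance(index, slice):
            # align[start:end:step] gives a new (sub)alignment
            return ToyAlignment(self._rows[index])
        raise TypeError("Invalid index type.")


align = ToyAlignment(["ACGT-", "ACGTT", "AC-TT", "ACCTT", "A-GTT", "ACGTA"])
sub = align[1:5]            # new four row sub-alignment
every_second = align[::2]   # every second row
full_copy = align[:]        # a copy
reversed_copy = align[::-1] # a copy with the row order reversed
print(len(sub), len(every_second), len(full_copy), reversed_copy[0])
```

Double indexes such as align[1:2, 5:-5] would arrive in __getitem__ as a
tuple, which this sketch deliberately rejects with a TypeError.
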
From bugzilla-daemon at portal.open-bio.org Wed Oct 15 17:21:03 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 15 Oct 2008 13:21:03 -0400 Subject: [Biopython-dev] [Bug 2509] Deprecating the .data property of the Seq and MutableSeq objects In-Reply-To: Message-ID: <200810151721.m9FHL3Xc012897@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2509 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-15 13:21 EST ------- Checking in Seq.py; /home/repository/biopython/biopython/Bio/Seq.py,v <-- Seq.py new revision: 1.46; previous revision: 1.45 done Checking in ../DEPRECATED; /home/repository/biopython/biopython/DEPRECATED,v <-- DEPRECATED new revision: 1.31; previous revision: 1.30 done The Seq object's .data is now a new style property and will issue a warning if written to. We can then easily make this into a read only property for the next release (and perhaps make even reading the property trigger a warning). If we do keep the MutableSeq's data property as read/write, it should check the alphabet if Bug 2597 is fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
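
The pattern described in the Bug 2509 comment above (a property that still
accepts writes but issues a warning) can be sketched roughly as follows.
ToySeq is a hypothetical class, not Biopython's Seq, and the decorator
syntax is current Python 3; decorators only arrived in Python 2.4, so code
that still had to run on Python 2.3 could not have been written this way.

```python
import warnings


class ToySeq:
    """Hypothetical stand-in for a sequence class; not Biopython's Seq."""

    def __init__(self, data):
        self._data = data

    @property
    def data(self):
        # Reading is still silent at this stage.
        return self._data

    @data.setter
    def data(self, value):
        # Writing still works, but tells the caller it is on the way out.
        warnings.warn(
            "Writing to the .data property is deprecated.",
            DeprecationWarning,
            stacklevel=2,
        )
        self._data = value


seq = ToySeq("ACGT")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    seq.data = "TTGA"
print(seq.data, len(caught))
```

Turning this into a read-only property later would only mean dropping the
setter, at which point assignment raises AttributeError.
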
From robert.cadena at gmail.com Wed Oct 15 17:32:59 2008
From: robert.cadena at gmail.com (Robert Cadena)
Date: Wed, 15 Oct 2008 10:32:59 -0700
Subject: [Biopython-dev] Bio.Graphics and GenomeDiagram
In-Reply-To:
References: <320fb6e00810150324j4fd20253i86c0001cfb143bfd@mail.gmail.com>
Message-ID:

I'd been working on and off on writing clones of bioperl's and
bioruby's graphics libraries:
http://www.bioperl.org/wiki/HOWTO:Graphics
http://bio-graphics.rubyforge.org/

I have very rudimentary drawing of tracks and vertically shifting
subfeatures as can be seen here:
http://machine501.com/images/bio_graphics_test_1.jpg

The prep code to draw that is very similar to the bioruby example:

--
p = Panel(100, start=10, width=480, pad_left=10, pad_right=10)
generic_track = p.add_track('generic', glyph=GenericGlyph, label="Constant")
directed_track = p.add_track('directed', glyph=DirectedBoxGlyph, label="Variable Test")

generic_track.add_feature(SeqFeature(FeatureLocation(250, 375), 'clone1'))
generic_track.add_feature(SeqFeature(FeatureLocation(54, 124), 'clone2'))
---

I'd be happy to volunteer some time to help with GenomeDiagram and
maybe there's the possibility of incorporating the bit of code I have
to create a bioperl::graphics library clone.

thanks.
/r

From lpritc at scri.ac.uk Thu Oct 16 09:17:57 2008
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Thu, 16 Oct 2008 10:17:57 +0100
Subject: [Biopython-dev] Bio.Graphics and GenomeDiagram
In-Reply-To:
Message-ID:

On 15/10/2008 18:32, "Robert Cadena" wrote:

> I'd been working on and off on writing clones of bioperl's and
> bioruby's graphics libraries:
> http://www.bioperl.org/wiki/HOWTO:Graphics
> http://bio-graphics.rubyforge.org/

[...]

> maybe there's the possibility of incorporating the bit of code I have
> to create a bioperl::graphics library clone.

I would see GenomeDiagram as existing alongside a Bioperl::Graphics clone,
providing extra functionality that (for now) is not present in
Bioperl/Bioruby, so I don't see our approaches clashing on that level.

> I'd be happy to volunteer some time to help with GenomeDiagram

Thanks Robert, that's very welcome.

The way I would like to move forward is to branch code off from the current
version of GenomeDiagram, to make it work as though it's part of Biopython
(sitting under Bio.Graphics), then neaten up the internals, add unit tests
and documentation, before adding enhancement features/fixing a couple of
outstanding issues.

You can get a copy of the current GenomeDiagram code and documentation at
http://bioinf.scri.ac.uk/lp/programs.php, and I'm happy to field design
questions/comments either here or off-list.

Initially I had thought to handle the first stages of this (up to and
including neatening internals) myself before seeking code inclusion in
Biopython, as I have an informal plan for what needs to be done, already.
I'm open to advice and suggestions - including "I can do all that, if you
like" - though ;)

Cheers,

L.

--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405

From bugzilla-daemon at portal.open-bio.org Thu Oct 16 09:53:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 16 Oct 2008 05:53:50 -0400
Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2
In-Reply-To:
Message-ID: <200810160953.m9G9roWp029842@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2616

------- Comment #7 from cymon.cox at gmail.com 2008-10-16 05:53 EST -------
(In reply to comment #6)
> (In reply to comment #5)
> > OK, so a clean install from CVS seemed to do the trick and now both tests pass
> > from within the test suite after applying the patch (attached)
>
> OK, good. I guess you had CVS tests against old Biopython or something like
> that happening.
>
> I've had a quick look at the patch - it looks fine to me, but I have not
> actually tested it.
>
> In BioSQL/BioSeqDatabase.py there is a comment about building a DSN being
> required for older releases of psycopg - do we still need this for psycopg2?

Apparently not (tests run after editing out code...).

> I
> guess it doesn't hurt.

I assume we want to maintain support for the older psycopg driver.
Unfortunately, I can't get the old drivers to compile on my box - they
configure, but gcc chokes on the make... I can't see why the patch should
affect the operation of the old driver, but it would be worth checking.

> > (NB: the buglet in preserving qualifier order - sorted the lists in > > test_BioSQL_SeqIO.py in order to pass test). > > I suggest you check in this patch, plus the tweak to test_BioSQL_SeqIO.py > (conditional on the database driver) with a "TODO" comment next to it about > checking why the order wasn't preserved. Unfortunately, I dont have direct CVS access - proxy hassles - I downloaded the cvs tarball previously. Besides I'm not a developer. Perhaps, someone else would check it in. Cheers, C. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Oct 16 10:18:28 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 16 Oct 2008 06:18:28 -0400 Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2 In-Reply-To: Message-ID: <200810161018.m9GAISK8031655@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2616 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-16 06:18 EST ------- (In reply to comment #7) > (In reply to comment #6) > > > > In BioSQL/BioSeqDatabase.py there is a comment about building a DSN being > > required for older releases of psycopg - do we still need this for psycopg2? > > Apparently not (tests run after editing out code...). > I assume we want to maintain support for the older psycopg driver. > Unfortunately, I cant get the old drivers to compile on my box - they > configure, but gcc chokes on the make... I cant see why the patch should > effect the operation of the old driver, but it would be worth checking. OK - but I'll leave it in rather than risk breaking the old psycopg driver. > > > (NB: the buglet in preserving qualifier order - sorted the lists in > > > test_BioSQL_SeqIO.py in order to pass test). 
> > > > I suggest you check in this patch, plus the tweak to test_BioSQL_SeqIO.py > > (conditional on the database driver) with a "TODO" comment next to it > > about checking why the order wasn't preserved. > > Unfortunately, I dont have direct CVS access - proxy hassles - I > downloaded the cvs tarball previously. Besides I'm not a developer. > Perhaps, someone else would check it in. Sorry - I knew Frank had CVS access and had assumed you did too. If you think you'll need CVS access, send an email on the dev-list. In the meantime, I'm happy to check this in on your behalf. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Oct 16 10:57:52 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 16 Oct 2008 06:57:52 -0400 Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2 In-Reply-To: Message-ID: <200810161057.m9GAvqRp001465@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2616 ------- Comment #9 from cymon.cox at gmail.com 2008-10-16 06:57 EST ------- (In reply to comment #8) > (In reply to comment #7) > > (In reply to comment #6)In the meantime, I'm > happy to check this in on your behalf. Yes, please do - thanks Peter. Cheers, C. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Oct 16 15:57:00 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 16 Oct 2008 11:57:00 -0400 Subject: [Biopython-dev] [Bug 2618] New: back_translate method for the Seq object (in Bio.Seq)? 
Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2618 Summary: back_translate method for the Seq object (in Bio.Seq)? Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Should we add a back_translate method to the Seq object (mirroring the translate method added on Bug 2381)? Mailing list discussion: http://lists.open-bio.org/pipermail/biopython/2008-October/004588.html Issues include how to cope with the ambiguous nature of the genetic code, e.g. "P" -> "CCT" or "CCN"? What about "L" -> "CTN" versus "TTR" or other options? Possible implementation to follow as a patch. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Oct 16 16:09:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 16 Oct 2008 12:09:23 -0400 Subject: [Biopython-dev] [Bug 2618] back_translate method for the Seq object (in Bio.Seq)? In-Reply-To: Message-ID: <200810161609.m9GG9NSB032767@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2618 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-16 12:09 EST ------- Created an attachment (id=1009) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1009&action=view) Patch to Bio/Seq.py for back translation This follows Bio.Translate and simply uses whatever arbitrary unambiguous codon Bio.Data.CodonTable object supplies via its back_table. e.g. 
>>> from Bio.Seq import Seq
>>> Seq("ACBDEF*").back_translate()
Seq('GCUUGUNNNGAUGAGUUUUAA', IUPACAmbiguousRNA())
>>> Seq("ACBDEF*").back_translate().translate()
Seq('ACXDEF*', HasStopCodon(ExtendedIUPACProtein(), '*'))

If instead we want to return ambiguous codons (e.g. "P" -> "CCN"), then
handling of back-translation of the special cases B (D/N), J (I/L) and
Z (E/Q) could also be improved (here just "NNN" is used). e.g. For the
standard table, "SAR" codes for "Z". I haven't checked if something like
this is possible for all of B, J and Z in all NCBI codon tables.

From biopython at maubp.freeserve.co.uk Fri Oct 17 11:54:13 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 17 Oct 2008 12:54:13 +0100
Subject: [Biopython-dev] What would we gain by dropping python 2.3?
Message-ID: <320fb6e00810170454v6ed86d88se8252abb2c2ca57@mail.gmail.com>

I was wondering what benefits we would see by dropping support for
Python 2.3 after the next release (or next couple of releases?).

Note that Mac OS X 10.4 Tiger uses Python 2.3.5, so there could still
be a fair number of people out there interested in using Biopython on
Python 2.3 (in addition to my own current Windows development machine).
Before making any plans to drop Python 2.3 support we should canvass
the main mailing list.

See http://docs.python.org/dev/whatsnew/2.4.html

There are two additions in Python 2.4 which are interesting with regard
to supporting 2.6:

PEP 324: New subprocess Module
http://www.python.org/dev/peps/pep-0324/

PEP 218: Built-In Set Objects
http://www.python.org/dev/peps/pep-0218/

In Python 2.6, popen2 and os.popen3 etc are deprecated (so we need
subprocess instead) and the sets module is deprecated (so we need the
builtin set and frozenset).
Most of Biopython now handles this gracefully with a import try/except handler. Once we drop python 2.3, these become slightly cleaner, but this in itself isn't a compelling reason. There are a couple more things I thought would be useful - but nothing pressing, e.g. PEP 289: Generator Expressions http://www.python.org/dev/peps/pep-0289/ There are a couple of places in the code where I have wanted to use a generator expressions, but have fallen back on a list comprehension or a generator function for Python 2.3 compatibility. PEP 318: Decorators for Functions and Methods http://www.python.org/dev/peps/pep-0318/ Again, decorators could be useful but I am not aware of any pressing need for their functionality in Biopython. Peter From biopython at maubp.freeserve.co.uk Fri Oct 17 14:15:57 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 17 Oct 2008 15:15:57 +0100 Subject: [Biopython-dev] What would we gain by dropping python 2.3? In-Reply-To: <320fb6e00810170454v6ed86d88se8252abb2c2ca57@mail.gmail.com> References: <320fb6e00810170454v6ed86d88se8252abb2c2ca57@mail.gmail.com> Message-ID: <320fb6e00810170715i62b38308p2fdae9465be8bc05@mail.gmail.com> On Fri, Oct 17, 2008 at 12:54 PM, Peter wrote: > I was wondering what benefits we would see by dropping support for > Python 2.3 after the next release (or next couple of releases?). > ... > See http://docs.python.org/dev/whatsnew/2.4.html One other pretty trivial thing is the string object gained the rsplit method in python 2.4 (while partition and rpartition are in Python 2.5+). I've updated the Seq object's new rsplit method accordingly. Peter From bsouthey at gmail.com Fri Oct 17 14:19:06 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 17 Oct 2008 09:19:06 -0500 Subject: [Biopython-dev] What would we gain by dropping python 2.3? 
In-Reply-To: <320fb6e00810170454v6ed86d88se8252abb2c2ca57@mail.gmail.com> References: <320fb6e00810170454v6ed86d88se8252abb2c2ca57@mail.gmail.com> Message-ID: <48F89EDA.70307@gmail.com> Peter wrote: > I was wondering what benefits we would see by dropping support for > Python 2.3 after the next release (or next couple of releases?). > Support for Numpy 1.2 as I suspect that most people would have (or should have) upgraded to 2.4 for bug and performance gains. I have not looked at the major Linux distros like Fedora and Ubuntu to know when these dropped Python 2.3 for the standard Python install. But I also must add that there is no numpy Windows binary installation for Python 2.6 and does not seem likely to be an official one in the near future (technical issues with regards to the official Windows binary for Python 2.6). > Note that Mac OS X 10.4 Tiger uses Python 2.3.5, so there could still > be a fair number of people out there still interested in using > Biopython on Python 2.3 (in addition to my own current Windows > development machine). Before making any plans to drop Python 2.3 > support we should canvas the main mailing list. > Also some of the older Red Hat / Centos systems still run it - joys of these long-term releases. How many bug reports are with Python 2.3 from people with an interest in Python 2.3 not just testing it? To me the issue is about supporting different versions in the medium term (5 years) given that NumPy and Biopython will have been rewritten for Python 3.0 and most people will be using Python 3.0. I think that if the burden is too great to support a Python version it should be officially dropped. Of course any criteria bug or feature can be backported to earlier versions if requested. I would recommend that this starts a new minor version i.e 1.5 so it is clear that Biopython 1.5+ is Python 2.4+ only. (I also note the recent changes in the cvs that would justify this anyhow.) 
Bruce

From biopython at maubp.freeserve.co.uk Fri Oct 17 14:58:40 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 17 Oct 2008 15:58:40 +0100
Subject: [Biopython-dev] What would we gain by dropping python 2.3?
In-Reply-To: <48F89EDA.70307@gmail.com>
References: <320fb6e00810170454v6ed86d88se8252abb2c2ca57@mail.gmail.com>
 <48F89EDA.70307@gmail.com>
Message-ID: <320fb6e00810170758n62815862i970498087d8dfacc@mail.gmail.com>

>> I was wondering what benefits we would see by dropping support for
>> Python 2.3 after the next release (or next couple of releases?).
>
> Support for Numpy 1.2 ...

We've tested that Biopython CVS works on Python 2.3, 2.4 and 2.5, and we
are almost ready for 2.6. We've also tested that Biopython CVS works on
Numpy 1.0, 1.1 and 1.2. The fact that Numpy 1.2 requires Python 2.4+
isn't really linked to whether or not Biopython continues to work on
Python 2.3.

> I have not looked at the major Linux distros like Fedora and Ubuntu to know
> when these dropped Python 2.3 for the standard Python install.

According to http://packages.ubuntu.com/intrepid/python and linked pages,
Ubuntu hardy comes with Python 2.3 (very old)
Ubuntu dapper comes with Python 2.4 (pretty old)
Ubuntu gutsy, feisty and intrepid come with Python 2.5

> But I also must add that there is no numpy Windows binary installation for
> Python 2.6 and does not seem likely to be an official one in the near future
> (technical issues with regards to the official Windows binary for Python
> 2.6).

I've been keeping an eye on the numpy list and that is rather
disappointing news - hopefully they can resolve this shortly and maybe
there will be a numpy 1.2.x release for Windows.

>> Note that Mac OS X 10.4 Tiger uses Python 2.3.5, so there could still
>> be a fair number of people out there still interested in using
>> Biopython on Python 2.3 (in addition to my own current Windows
>> development machine). Before making any plans to drop Python 2.3
>> support we should canvas the main mailing list.
>
> Also some of the older Red Hat / Centos systems still run it - joys of these
> long-term releases.

Yes - this is why I am loath to just drop Python 2.3 support without some
benefits. Some of the Linux machines I have access to at work still run
Python 2.3, for example.

> How many bug reports are with Python 2.3 from people with an interest in
> Python 2.3 not just testing it?

Our Bugzilla doesn't track the Python version, so we can't easily work
that out.

> To me the issue is about supporting different versions in the medium term (5
> years) given that NumPy and Biopython will have been rewritten for Python
> 3.0 and most people will be using Python 3.0. I think that if the burden is
> too great to support a Python version it should be officially dropped. Of
> course any criteria bug or feature can be backported to earlier versions if
> requested.
>
> I would recommend that this starts a new minor version i.e 1.5 so it is
> clear that Biopython 1.5+ is Python 2.4+ only.

Biopython doesn't currently have minor version numbers. On a related note,
perhaps doing the first numpy supporting release as Biopython 1.50 rather
than 1.49 would be more memorable / eye pleasing.

> (I also note the recent changes in the cvs that would justify this anyhow.)

Did you mean justify a version number bump, or justify dropping Python 2.3?

With hindsight, trying to support both Python 2.3 and Python 2.6 was more
work than I expected - but I think it's done now (apart from
Bio.PDB.NACCESS). If Python 2.7 makes a similar volume of deprecations
needing similar workarounds for Python 2.3, then we may have more of an
incentive to drop Python 2.3.

We've seen some of the drawbacks of continuing to support the old
Python 2.3 while avoiding deprecation warnings in Python 2.6, but what I
wanted to hear was ideas on how any of the newer language features added
in Python 2.4 could be useful (in the short to medium term).
Peter From bugzilla-daemon at portal.open-bio.org Fri Oct 17 15:24:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 17 Oct 2008 11:24:47 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200810171524.m9HFOlh9004587@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #39 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-17 11:24 EST ------- (In reply to comment #37) > Thank you. Confirming that CVS version 1.82 of the file works fine on Windows > XP SP2 with Python 2.5.2. Great - marking this bug as fixed. > A note: > > A custom script using Bio/Blast can appear to hang, and the results file > truncated, if the 'error handle' is used before the 'result handle': > > res_hdl, err_hdl = NCBIStandalone.blastall(my_blast, 'blastn', my_db, my_seq) > > # OK > my_result = res_hdl.read() > my_error = err_hdl.read() > > # Not OK > my_error = err_hdl.read() > my_result = res_hdl.read() This is known and mentioned in the tutorial: >> The error info can be hard to deal with, because if you try >> to do a error_handle.read() and there was no error info >> returned, then the read() call will block and not return, >> locking your script. In my opinion, the best way to deal >> with the error is only to print it out if you are not >> getting result_handle results to be parsed, but otherwise >> to leave it alone. Thanks, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
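The read-order problem described in comment #39 is commonly sidestepped by letting the subprocess module drain both pipes together. What follows is only a minimal sketch, assuming a current Python; run_and_collect is a hypothetical helper (it is not part of Bio.Blast.NCBIStandalone), and the child command is a stand-in for a real blastall call:

```python
import subprocess
import sys


def run_and_collect(cmd):
    """Run cmd and return (returncode, stdout_text, stderr_text).

    Hypothetical helper for illustration. communicate() reads stdout and
    stderr together, so the order in which the caller later inspects the
    two results cannot make the script hang.
    """
    child = subprocess.Popen(cmd,
                             stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE,
                             universal_newlines=True)
    out, err = child.communicate()
    return child.returncode, out, err


# Stand-in for a BLAST run: a child process that only writes to stdout.
code, result, error = run_and_collect(
    [sys.executable, "-c", "print('fake blast output')"])
```

Here the error text can be looked at before or after the result text, because both have already been read in full by the time the helper returns.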
From bugzilla-daemon at portal.open-bio.org Fri Oct 17 15:59:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 17 Oct 2008 11:59:12 -0400 Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2 In-Reply-To: Message-ID: <200810171559.m9HFxC9r007553@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2616 ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-17 11:59 EST ------- (In reply to comment #9) > Yes, please do - thanks Peter. > > Cheers, C. OK, your patch is now in CVS: Checking in BioSeqDatabase.py; /home/repository/biopython/biopython/BioSQL/BioSeqDatabase.py,v <-- BioSeqDatabase.py new revision: 1.20; previous revision: 1.19 done Checking in DBUtils.py; /home/repository/biopython/biopython/BioSQL/DBUtils.py,v <-- DBUtils.py new revision: 1.8; previous revision: 1.7 done We still need to sort out the feature qualifiers loss of ordering... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Oct 17 17:19:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 17 Oct 2008 13:19:23 -0400 Subject: [Biopython-dev] [Bug 2443] Specifying the alphabet in Bio.SeqIO and Bio.AlignIO In-Reply-To: Message-ID: <200810171719.m9HHJNVZ014015@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2443 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #853 is|0 |1 obsolete| | ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-17 13:19 EST ------- (From update of attachment 853) Something similar is now in CVS (covering both the Bio.SeqIO and Bio.AlignIO modules). 
I still need to extend the unit tests and update the documentation
accordingly.

From bugzilla-daemon at portal.open-bio.org Sat Oct 18 19:31:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 18 Oct 2008 15:31:40 -0400
Subject: [Biopython-dev] [Bug 2619] New: Bio.PDB.MMCIFParser component MMCIFlex commented out in setup.py
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2619

           Summary: Bio.PDB.MMCIFParser component MMCIFlex commented out in setup.py
           Product: Biopython
           Version: 1.48
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: cjoldfield at gmail.com

MMCIFParser is a documented feature of Bio.PDB, but it is broken by default
because the MMCIFlex build is commented out in the distribution setup.py.
According to http://osdir.com/ml/python.bio.devel/2006-02/msg00038.html this
is because it doesn't compile on Windows. Though the feature is documented,
the changes needed to enable it are not, so this seems like an installation
bug to me.

The fix on Linux is to uncomment setup.py lines 486 on. A general workaround
might be to condition the compile on the os.sys.platform variable. I'd offer
a diff, but I'm new to Biopython and Python in general, so please forgive my
ignorance.

Source install of version 1.48, Gentoo Linux 2008, x86_64.
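The workaround suggested in this report (deciding at build time whether to compile MMCIFlex) could look roughly like the sketch below. It is an illustration only: should_build_mmciflex is a hypothetical name, nothing here is taken from Biopython's setup.py, and shutil.which assumes a much newer Python than the ones discussed in this thread.

```python
import shutil
import sys


def should_build_mmciflex(platform=sys.platform, which=shutil.which):
    """Guess whether the flex-based extension is worth trying to compile.

    Hypothetical helper: skip Windows outright, and elsewhere only build
    when a flex executable can be found on the PATH.
    """
    if platform.startswith("win"):
        return False
    return which("flex") is not None


# A setup script could consult this before adding the extension to its list.
build_it = should_build_mmciflex()
```

The platform and lookup function are parameters purely so the decision can be exercised without changing the machine it runs on.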
From bugzilla-daemon at portal.open-bio.org Sun Oct 19 12:46:43 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 19 Oct 2008 08:46:43 -0400 Subject: [Biopython-dev] [Bug 2619] Bio.PDB.MMCIFParser component MMCIFlex commented out in setup.py In-Reply-To: Message-ID: <200810191246.m9JCkhm6030332@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2619 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-19 08:46 EST ------- http://lists.open-bio.org/pipermail/biopython/2006-February/002923.html Michiel wrote: > This is a recurring problem and is not limited to > Windows, but to any machine without flex installed. Certainly, as things stood back in Feb 2006, getting Bio.PDB.mmCIF.MMCIFlex to compile on Windows was tricky (or impossible). However, even on Linux/Mac we really need to be able to check if flex is installed without blindly trying to compile it. A non-flex version would be another option - something Thomas didn't have the time or inclination to tackle. In the short term, a note in the documentation would help... were you refering to "The Biopython Structural Bioinformatics FAQ"? http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Oct 19 16:02:55 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 19 Oct 2008 12:02:55 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200810191602.m9JG2tGJ010540@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #19 from mmokrejs at ribosome.natur.cuni.cz 2008-10-19 12:02 EST ------- (In reply to comment #18) > e.g. 
As I wrote in comment 17,
> > I'm thinking we could also support "start" and "end" optional arguments
> > (named
> > after those used in the python string methods, and behaving in the same way)
> > for specifying a sub-sequence to be translated. Using start=0, 1 or 2 would
> > give the three forward reading frames.
>
> This would give an alternative to:
>
> my_seq[i:j].translate(table)
>
> as:
>
> my_seq.translate(table, start=i, end=j)
>
> As with the python string methods, potentially the implementation could be
> slightly faster as a new Seq object doesn't need to be created for the slice.
> On the other hand, it does then offer two ways of doing the same thing.

The second approach would, I think, often be handy.

> > An optional boolean argument could enable treating the sequence as a CDS -
> > verifying it starts with a start codon (which would always be translated as M)
> > and verifying it ends with a stop codon (with no other stop codons in frame),
> > which would not be translated. Following BioPerl, this argument could be
> > called "complete".

The name "complete" is cryptic; I wouldn't be fond of it. I think everybody
would rather check a.startswith('M') and a.endswith('*') themselves instead.
But what would be useful is a.find_orf(offset=0).

>
> Related to this, it would be useful to have a boolean option to stop
> translation at the first in frame stop codon (possible argument names for this
> include "stop" if not used as above, "to_stop", "auto_stop", "terminate" etc).

Yes, find_orf(offset) with default offset=0.

I hope there will always be a way to translate the whole NA sequence into
protein residues in a desired frame, so that one could inspect the positions
of the various STOP codons, etc.
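To make the start/end idea from this exchange concrete, here is a small sketch on plain strings rather than Seq objects. The function name codons and its arguments are assumptions made for illustration; no such function is proposed in the bug. It shows that start=0, 1 or 2 selects the three forward reading frames, and that passing start/end gives the same codons as slicing first.

```python
def codons(seq, start=0, end=None):
    """Split seq[start:end] into complete codons.

    start and end behave like the arguments of Python's string methods;
    a trailing partial codon is silently dropped.
    """
    sub = seq[start:end]
    usable = len(sub) - len(sub) % 3
    return [sub[i:i + 3] for i in range(0, usable, 3)]


dna = "ATGGCCATTGTAATGGGCCGCTGA"

# The three forward reading frames, selected purely by the start offset.
frames = [codons(dna, start=offset) for offset in (0, 1, 2)]

# Using start/end is equivalent to slicing before splitting.
same = codons(dna, 3, 9) == codons(dna[3:9])
```

Whether the convenience is worth having two ways to do the same thing is exactly the trade-off raised in comment #18.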
From bugzilla-daemon at portal.open-bio.org Sun Oct 19 16:06:24 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 19 Oct 2008 12:06:24 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200810191606.m9JG6OmD010729@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #20 from mmokrejs at ribosome.natur.cuni.cz 2008-10-19 12:06 EST ------- (In reply to comment #17) > def translate(self, table = "Standard", stop_symbol = "*"): > """Terms a nucleotide sequence into a protein sequence (amino acids). > > This method will translate DNA or RNA sequences, but for a protein > sequence an exception is raised. > > table - Which codon table to use? This can be either a name > (string) or an NCBI identifier (integer). Would be nice to document a URL to a page documenting the translation tables in the doc string. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Oct 20 08:24:25 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 20 Oct 2008 04:24:25 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200810200824.m9K8OPf0029113@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #21 from lpritc at scri.sari.ac.uk 2008-10-20 04:24 EST ------- (In reply to comment #19) > (In reply to comment #18) > The "complete" is a cryptic naming, I wouldn't be fond of it. I think everybody > would rather him/herself rather check is a.startswith('M') and a.endswith('*') > instead. But, what would be useful is a.find_orf(offset=0). Ditto the 'complete' naming - it's not clear at all. 
> > Related to this, it would be useful to have a boolean option to stop
> > translation at the first in frame stop codon (possible argument names for this
> > include "stop" if not used as above, "to_stop", "auto_stop", "terminate" etc).
>
> Yes, find_orf(offset) with default offset=0.

I would like to raise the issue that 'ORF' has taken on (at least) two
meanings over the years, and it's not yet clear which is being discussed
here.

The correct definition of 'Open Reading Frame' is an uninterrupted sequence
of nucleotides that does not contain an in-frame stop codon. However, more
restrictive definitions have found a way in erroneously over the years,
asserting that the sequence must have an in-frame start codon, or
additionally that the ORF begins at that start codon. This latter case in
particular would be a putative coding sequence (CDS), rather than an ORF.
See a Google define: orf search for details...
(http://www.google.com/search?q=define:+orf). As an implementation example,
Sanger's Artemis (http://www.sanger.ac.uk/Software/Artemis/) correctly
identifies ORFs. See also Doolittle's 'Of URFS and ORFS', available on
Google Books: http://books.google.com/books?id=jIlMMx6Ji-sC - it's 22 years
old now, and a good candidate for the first manual on bioinformatics. The
Wikipedia page for ORF is typically egregious, and also incorrect.

Also, by 'offset' in the proposed syntax above, is 'reading_frame' intended?
If so I think it would be clearer to indicate that the reading frame is what
is desired, as specifying a reading frame of -1 implies something different
to an offset of -1.

I propose that the default behaviour is to find all ORFs in all reading
frames, leaving it to the user to decide whether that behaviour is
appropriate for their sequence and optionally specify a reading frame.

For discussion purposes, I'm attaching code for an ORF search I implemented
locally in a subclass of the Seq object.
As ever, I don't claim that it's perfect, but it did what I needed at the
time. In particular the returned index for ORFs is 1-based, as that is what
I wanted then.

    def find_ORFs(self, codon_table=1, min_length=100):
        """ find_ORFs(self, codon_table=1, min_length=100)

            codon_table     Integer, must be one of the integers in
                            Bio.Data.CodonTable.generic_by_id; these are
                            the standard codon table numbers used by
                            sequence databases.

            min_length      Integer, the shortest length of consecutive
                            nucleotides to consider as an ORF

            Finds ORFs within the SeqRecord sequence, and returns them as
            a list of tuples in the format: (frame, start, end, sequence)
            where start and end are the start and end points on the
            sequence (i.e. the first and last base positions, NOT the
            values you should use when indexing sequences in Python), and
            sequence is a Seq object.
        """
        assert self.alphabet.__class__ in dna_alphabets, \
               "Alphabet is not a known DNA alphabet"
        # Get the codon table; raises a KeyError if an invalid table number
        codon_table = CodonTable.generic_by_id[codon_table]
        # Loop over the record's sequence in all six forward and reverse
        # frames, returning a list of (frame, start, end, sequence) tuples
        # List of tuples
        orflist = []
        # Forward frames first
        forward_orfs = self.__find_orfs_in_sequence(self.data, codon_table)
        for frame, start, end, sequence in forward_orfs:
            if len(sequence) >= min_length:
                orflist.append(('+%d' % frame, start, end,
                                Seq(sequence, self.alphabet)))
        # Then reverse frames
        seq = reverse_complement(self.data)
        reverse_orfs = self.__find_orfs_in_sequence(seq, codon_table)
        for frame, start, end, sequence in reverse_orfs:
            if len(sequence) >= min_length:
                start = len(self.data) - start + 1
                end = len(self.data) - end + 1
                start, end = end, start
                orflist.append(('-%d' % frame, start, end,
                                Seq(sequence, self.alphabet)))
        return orflist

    def __find_orfs_in_sequence(self, sequence, codon_table):
        """ Returns a list of ORFs for a passed sequence, in three forward
            frames, as tuples (frame, start, end, sequence)
        """
        orflist = []
        for frame, offset in [(1, 0), (2, 1), (3, 2)]:
            tmporf = []
            orfstart = offset
            i = offset
            while i < len(sequence):
                codon = sequence[i:i+3]
                if len(codon) == 3 and codon not in codon_table.stop_codons:
                    tmporf.append(codon)
                else:
                    if codon in codon_table.stop_codons:
                        tmporf.append(codon)
                    tmporf = ''.join(tmporf)
                    orflist.append((frame, orfstart+1,
                                    orfstart+len(tmporf), tmporf))
                    orfstart += len(tmporf)
                    tmporf = []
                i += 3
            # Catch ORFs that run up to the end of the sequence, by checking
            # for an empty tmporf list
            if tmporf != []:
                tmporf = ''.join(tmporf)
                orflist.append((frame, orfstart+1,
                                orfstart+len(tmporf), tmporf))
        return orflist

In order to obtain a potential coding sequence that begins with a
methionine, I would translate, and then use this method in a subclass of Seq
for the translated sequence:

    def trim_to_first_met(self):
        """ Assuming that the sequence is a protein sequence, trims to the
            first methionine in the sequence and returns a Seq object

            If the sequence has no methionine, then the full sequence is
            returned
        """
        # Crop the sequence to the first Methionine. If there is no
        # methionine the full sequence is returned
        # We assert that we have a protein sequence
        assert self.alphabet.__class__ in protein_alphabets, \
               "Sequence alphabet is not a known ProteinAlphabet"
        if self.data.count('M'):
            seq = self.data[self.data.index('M'):]
        else:
            seq = self.data
        return Seq(seq, self.alphabet)
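Because find_ORFs reports 1-based, inclusive positions, and maps reverse-frame hits back onto the forward strand, a short sketch of that coordinate arithmetic may help. It works on plain strings; slice_one_based and reverse_to_forward are hypothetical helper names that do not appear in the attached code, although reverse_to_forward applies the same arithmetic as the "Then reverse frames" branch above.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def slice_one_based(seq, start, end):
    """Return bases start..end inclusive, both counted from 1."""
    return seq[start - 1:end]


def reverse_to_forward(length, start, end):
    """Map a 1-based inclusive span on the reverse complement back onto
    the forward strand (length - position + 1 for each end, then swap)."""
    return length - end + 1, length - start + 1


dna = "AACCGGTTAC"
rc = reverse_complement(dna)            # the strand the reverse frames scan
hit = slice_one_based(rc, 2, 4)         # a span reported on that strand
fwd_start, fwd_end = reverse_to_forward(len(dna), 2, 4)
same_bases = reverse_complement(slice_one_based(dna, fwd_start, fwd_end))
```

The last line recovers the same bases from the forward strand, which is a quick way to sanity-check reported reverse-frame coordinates.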
From bugzilla-daemon at portal.open-bio.org Mon Oct 20 09:14:42 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 20 Oct 2008 05:14:42 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200810200914.m9K9EgDv031522@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #22 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-20 05:14 EST ------- Martin wrote in comment #19: > Peter wrote in comment #18: > > > e.g. As I wrote in comment 17, > > > I'm thinking we could also support "start" and "end" optional arguments > > > (named after those used in the python string methods, and behaving in > > > the same way) for specifying a sub-sequence to be translated. Using > > > start=0, 1 or 2 would give the three forward reading frames. > > > > This would give an alternative to: > > > > my_seq[i:j].translate(table) > > > > as: > > > > my_seq.translate(table, start=i, end=j) > > > > As with the python string methods, potentially the implementation could > > be slightly faster as a new Seq object doesn't need to be created for > > the slice. On the other hand, it does then offer two ways of doing the > > same thing. > > The second approach would be I think often handy. If we did add this, then arguably we should do this for all the other methods too (transcribe, reverse_complement, etc). I'm not convinced this adds any value. Martin, why do you like the second approach (using start & end arguments) over the first (slicing the sequence before translation)? ------------------------------------------------------ Using BioPerl's idea of a "complete" argument (boolean) isn't popular: Martin wrote in comment #19 >> >> The "complete" is a cryptic naming, I wouldn't be fond of it... >> Leighton wrote in comment #21 > > Ditto the 'complete' naming - it's not clear at all. 
>

This was to control two related features:

(a) Validate the first codon is a valid start codon, and translate it as M
(even if going on the genetic code it would normally be say L). This should
be a boolean argument defaulting to False, possible names "start",
"check_start", "from_start", ...

Variations on this like "find the first in frame start codon" are getting
into gene/ORF finding and I don't see this as part of the remit for a
translate method.

(b) Stop translating at the first in frame stop codon (see my comment 18).
Again, a boolean argument, and for compatibility with previous Biopython
conventions, defaulting to False (i.e. read through). Possible names "stop",
"to_stop", "auto_stop", "terminate", ...

In this case, how should the method behave if there is no final stop codon -
raise an error or not? Also, should the stop codon be included in the
returned sequence? (Note that the Bio.Translate module did not include the
stop symbol.)

You might want to control these two options independently, so having them as
two arguments is more flexible.

------------------------------------------------------

This bug has started discussing ORF/gene finding - I see this as separate to
the translate method. Could we do this on the mailing list or a separate bug
please?

------------------------------------------------------

Peter
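Options (a) and (b) from comment #22 can be sketched as two independent boolean arguments. This is an illustration on plain DNA strings, not the method that went into Bio.Seq: translate_sketch is a hypothetical name, check_start and to_stop are taken from the candidate names listed above, the table is the standard genetic code only, and the open questions are settled arbitrarily here (no error when a stop codon is missing, stop symbol left out when to_stop is set).

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in T, C, A, G order.
TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for k, c in enumerate(BASES)}
START_CODONS = ("TTG", "CTG", "ATG")  # start codons of the standard table


def translate_sketch(dna, check_start=False, to_stop=False):
    """Translate unambiguous DNA, ignoring any trailing partial codon.

    check_start: require a valid start codon first and translate it as M.
    to_stop:     stop at the first in-frame stop codon, excluding it.
    """
    usable = len(dna) - len(dna) % 3
    protein = []
    for i in range(0, usable, 3):
        codon = dna[i:i + 3]
        if i == 0 and check_start:
            if codon not in START_CODONS:
                raise ValueError("%r is not a start codon" % codon)
            protein.append("M")
            continue
        amino = TABLE[codon]
        if amino == "*" and to_stop:
            break
        protein.append(amino)
    return "".join(protein)


full = translate_sketch("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
trimmed = translate_sketch("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG",
                           to_stop=True)
```

With both flags off the function reads through every stop codon, which matches the default behaviour argued for in this thread.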
From bugzilla-daemon at portal.open-bio.org Mon Oct 20 09:48:16 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 20 Oct 2008 05:48:16 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200810200948.m9K9mGVo001679@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #23 from lpritc at scri.sari.ac.uk 2008-10-20 05:48 EST ------- (In reply to comment #22) > (a) Validate the first codon is a valid start codon, and translate it as M > (even if going on the genetic code it would normally be say L). This should be > a boolean argument defaulting to False, possible names "start", "check_start", > "from_start", ... > (b) Stop translating at the first in frame stop codon (see my comment 18). > Again, a boolean argument, and for compatibility with previous Biopython > conventions, defaulting to False (i.e. read through). Possible names "stop", > "to_stop", "auto_stop", "terminate", ... [...] > In this case, how should the method behave if there is no final stop codon - > raise an error or not? Also should the stop codon be included in the returned > sequence (note that the Bio.Translate module did not include the stop symbol). > > You might want to control these two options independently, so having them as > two arguments is more flexible. Further to the above (and keeping away from ORF-finding) another use-case would be translation of ESTs, which may come with or without either a start or a stop codon. 
Often I am handed compilations of EST sets that have been obtained using different experimental methods, and are not consistently 3` or 5` sequenced (nor, to be fair, are they uniformly in the correct orientation...), and in those cases I would wish to translate the entire sequence without regard to the presence of a start or stop codon (really I'd like to find ORFs, but I promised I'd keep away from that, for now ;) ). I would prefer that default behaviour did not enforce either a start or stop codon check, but that each of these could be optional arguments. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Oct 20 13:36:56 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 20 Oct 2008 09:36:56 -0400 Subject: [Biopython-dev] [Bug 2509] Deprecating the .data property of the Seq and MutableSeq objects In-Reply-To: Message-ID: <200810201336.m9KDauP6014867@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2509 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #929 is|0 |1 obsolete| | ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-20 09:36 EST ------- (From update of attachment 929) This patch is now obsolete. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Oct 21 11:28:57 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 21 Oct 2008 07:28:57 -0400 Subject: [Biopython-dev] [Bug 2622] New: Parsing between position locations like 5933^5934 in GenBank/EMBL files Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2622 Summary: Parsing between position locations like 5933^5934 in GenBank/EMBL files Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk GenBank and EMBL files can contain features with locations like 123^456, handled in Biopython as BetweenPosition objects. Quoting ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt > A site between two residues, such as an endonuclease cleavage site, is > indicated by listing the two bases separated by a carat (e.g., 23^24). A small GenBank example containing examples of this is NC_005816.gbk available here: ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Yersinia_pestis_biovar_Microtus_91001/NC_005816.gbk e.g. variation 5933^5934 /note="compared to AL109969" /replace="a" variation 5933^5934 /note="compared to AF053945" /replace="aa" For a larger example, see NC_005027.gbk ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Pirellula_sp/NC_005027.gbk e.g. misc_feature 41855^41856 /note="cosmid pircos-a3a12/ cosmid pircos-a1d04 joining point" See also one of the Biopython unit test examples, SC10H5.embl, a pre-2006 style EMBL file from BioPerl. As the following example script and its output will show, Biopython CVS (and I presume several releases) does not parse these locations sensibly. 
There are at least two issues. Firstly, there is a numerical error from treating 5933^5934 as 5932^11866 (position versus extension). Secondly, the representation of these locations might be better not using separate start/end objects.

From bugzilla-daemon at portal.open-bio.org Tue Oct 21 11:30:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 21 Oct 2008 07:30:50 -0400
Subject: [Biopython-dev] [Bug 2622] Parsing between position locations like 5933^5934 in GenBank/EMBL files
In-Reply-To:
Message-ID: <200810211130.m9LBUoE3032234@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2622

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-21 07:30 EST -------
Sample script showing the problem:

from Bio import SeqIO
#filename = "NC_005027.gbk"
filename = "NC_005816.gbk"
print "=" * 50
for line in open(filename) :
    if "^" in line :
        print line.rstrip()
print "=" * 50
record = SeqIO.read(open(filename), "genbank")
print record.id
for feature in record.features :
    if "^" in str(feature.location) :
        print feature

And its output:

==================================================
variation 5933^5934
variation 5933^5934
variation 8529^8530
==================================================
NC_005816.1
type: variation
location: [(5932^11866):(5932^11866)]
ref: None:None
strand: 1
qualifiers:
    Key: note, Value: ['compared to AL109969']
    Key: replace, Value: ['a']

type: variation
location: [(5932^11866):(5932^11866)]
ref: None:None
strand: 1
qualifiers:
    Key: note, Value: ['compared to AF053945']
    Key: replace, Value: ['aa']

type: variation
location: [(8528^17058):(8528^17058)]
ref: None:None
strand: 1
qualifiers:
    Key: note, Value: ['compared to AL109969']
    Key: replace, Value: ['tt']

-- Configure
bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Oct 21 12:07:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 21 Oct 2008 08:07:01 -0400
Subject: [Biopython-dev] [Bug 2622] Parsing between position locations like 5933^5934 in GenBank/EMBL files
In-Reply-To:
Message-ID: <200810211207.m9LC71c1002617@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2622

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-21 08:07 EST -------
Part of the problem is in Bio/GenBank/__init__.py around line 793,

# case 4 -- we've got 100^101
elif isinstance(position, LocationParser.Between):
    final_pos = SeqFeature.BetweenPosition(position.low.val,
                                           position.high.val)
# case 5 -- we've got (100.101)
elif isinstance(position, LocationParser.TwoBound):
    final_pos = SeqFeature.WithinPosition(position.low.val,
                                          position.high.val)

The BetweenPosition and WithinPosition objects expect the (low) position and the extension, not the low position and the high position.
Thus instead:

# case 4 -- we've got 100^101 => position 100, extension 1
elif isinstance(position, LocationParser.Between):
    final_pos = SeqFeature.BetweenPosition(position.low.val,
                    position.high.val-position.low.val)
# case 5 -- we've got (100.101) => position 100, extension 1
elif isinstance(position, LocationParser.TwoBound):
    final_pos = SeqFeature.WithinPosition(position.low.val,
                    position.high.val-position.low.val)

However, things still don't seem quite right with the SeqFeature.location object (even with this change) as the same object is used for both the start and end, which means both have zero-based locations:

==================================================
variation 5933^5934
variation 5933^5934
variation 8529^8530
==================================================
NC_005816.1
type: variation
location: [(5932^5933):(5932^5933)]
ref: None:None
strand: 1
qualifiers:
    Key: note, Value: ['compared to AL109969']
    Key: replace, Value: ['a']

type: variation
location: [(5932^5933):(5932^5933)]
ref: None:None
strand: 1
qualifiers:
    Key: note, Value: ['compared to AF053945']
    Key: replace, Value: ['aa']

type: variation
location: [(8528^8529):(8528^8529)]
ref: None:None
strand: 1
qualifiers:
    Key: note, Value: ['compared to AL109969']
    Key: replace, Value: ['tt']

Note that a location string "5933..5934" (2bp) becomes in Biopython a typical range between two exact positions, representing the slice [5932:5934] (2bp). Perhaps locations like 5933^5934 (0bp) should be held similarly, akin to a slice [5933:5933] (0bp).

e.g. for a sequence "ACTG...", a location string "2^3" means between "AC" and "TG...", or in python speak the empty slice [2:2]

The GenBank release notes do say:

> 3. A site between two bases;
> ...
> A site between two residues, such as an endonuclease cleavage site, is
> indicated by listing the two bases separated by a carat (e.g., 23^24).
I think they mean implicitly two neighbouring bases - after all "23^25" can just be written as "24" or "23^26" as "24..26". The need for the caret "23^25" is a result of the one-based counting system - avoided in python slice notation. Finally, it is not clear to me from the GenBank release notes if locations like "23^34" can be joined as part of more complex location, or not. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 21 17:42:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 21 Oct 2008 13:42:47 -0400 Subject: [Biopython-dev] [Bug 2619] Bio.PDB.MMCIFParser component MMCIFlex commented out in setup.py In-Reply-To: Message-ID: <200810211742.m9LHglaY020907@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2619 ------- Comment #2 from cjoldfield at gmail.com 2008-10-21 13:42 EST ------- > In the short term, a note in the documentation would help... were you refering > to "The Biopython Structural Bioinformatics FAQ"? > http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf The FAQ in part, but there is also a link from RCSB that claims BioPython can parse mmCIF: http://sw-tools.rcsb.org/ I've run the Bio.PDB mmCIF parser over all of PDB, and it plain fails on >10% of files (>40,000 files, >5,000 failures, mostly spurious missing key exceptions). From what I've seen, it seems that an inconsistency in one table of a mmCIF file throws a wrench in the whole parse. I tried the C++ mmCIF parser from ncbi (only on a few files so far) and it doesn't suffer these parse problems (though it reports the faulty entries). If Bio.PDB were to be updated, this seems like a good candidate for a back end (assuming its portable). 
I have the inclination, maybe not the time ;), to do this, unless this should fall to Thomas or others. Chris -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 22 08:51:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 22 Oct 2008 04:51:33 -0400 Subject: [Biopython-dev] [Bug 2619] Bio.PDB.MMCIFParser component MMCIFlex commented out in setup.py In-Reply-To: Message-ID: <200810220851.m9M8pXb3002091@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2619 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-22 04:51 EST ------- (In reply to comment #2) > > In the short term, a note in the documentation would help... were you > > refering to "The Biopython Structural Bioinformatics FAQ"? > > http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf > > The FAQ in part, but there is also a link from RCSB that claims BioPython can > parse mmCIF: > > http://sw-tools.rcsb.org/ I'll make some documentation updates along the lines of "The Bio.PDB mmCIF parser isn't installed by default due to cross platform compilation problems", and see if anyone on the dev mailing list has any bright ideas for detecting flex at install time. > I've run the Bio.PDB mmCIF parser over all of PDB, and it plain fails on >10% > of files (>40,000 files, >5,000 failures, mostly spurious missing key > exceptions). From what I've seen, it seems that an inconsistency in one table > of a mmCIF file throws a wrench in the whole parse. Would you mind reporting a separate bug on this (tiny sample script, the exception error(s), and URLs for a couple of the 5000+ failures)? > I tried the C++ mmCIF parser from ncbi (only on a few files so far) and it > doesn't suffer these parse problems (though it reports the faulty entries). 
Given the number of PDB problems Bio.PDB has to deal with, it's sadly not surprising that mmCIF files also suffer from this kind of thing.

> If Bio.PDB were to be updated, this seems like a good candidate for a back
> end (assuming its portable). I have the inclination, maybe not the time ;),
> to do this, unless this should fall to Thomas or others.

I would worry about the cross platform support (in particular Windows), but also the additional complication to building Biopython. It's certainly worth discussing if you or Thomas are keen.

Peter

From biopython at maubp.freeserve.co.uk Wed Oct 22 09:32:44 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 22 Oct 2008 10:32:44 +0100
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
Message-ID: <320fb6e00810220232if63772ejda05f6d5e692b24e@mail.gmail.com>

Dear all,

Back in Feb 2006 (shortly before Biopython 1.42), in CVS revision 1.109 setup.py was modified to comment out building & installation of the Bio.PDB.mmCIF module which requires flex to be installed. For background see:
http://lists.open-bio.org/pipermail/biopython/2006-February/002923.html
http://lists.open-bio.org/pipermail/biopython-dev/2006-February/002280.html

This issue was recently re-opened with Bug 2619:
http://bugzilla.open-bio.org/show_bug.cgi?id=2619

It looks like Bio.PDB.mmCIF didn't (and probably still doesn't) compile on Windows, but should compile on Unix provided flex is installed. Ideally setup.py would check the platform and whether flex is installed, and if so install Bio.PDB.mmCIF - rather than the current situation of never installing it (unless the user edits setup.py by hand).
Alternatively, we could have a simple prompt (on Unix) asking if we should try and build/install Bio.PDB.mmCIF (like the ugly KDTree prompt when that was written in C++)?

Does anyone have any code handy for checking if flex is installed from within python?

Perhaps ideally we could replace the flex version of Bio.PDB.mmCIF with pure python - but this is a big job.

Peter

From bugzilla-daemon at portal.open-bio.org Wed Oct 22 09:49:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 22 Oct 2008 05:49:42 -0400
Subject: [Biopython-dev] [Bug 2619] Bio.PDB.MMCIFParser component MMCIFlex commented out in setup.py
In-Reply-To:
Message-ID: <200810220949.m9M9ngE0005805@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2619

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-22 05:49 EST -------
I've been looking at modifying setup.py, but need to be able to tell if flex is installed AND if its headers are installed (required to compile the mmCIF code). The following is only a partial solution:

def is_flex_installed():
    """try and work out if flex (and its headers) are installed."""
    if sys.platform.startswith("win") :
        return False
    import commands
    #TODO - This only checks the command line tool, not the headers
    return "not found" not in commands.getoutput("flex --version")
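For comparison, a hedged sketch of the same check for a current Python 3 interpreter, where the commands module no longer exists. It uses shutil.which from the standard library to look for the executable on the PATH instead of parsing shell output. The helper name tool_on_path and the platform argument are illustrative additions, and like the partial solution in comment #4 it does not check for the headers.

```python
import shutil
import sys


def tool_on_path(name):
    """Return True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None


def is_flex_installed(platform=sys.platform):
    """Guess whether the flex command line tool is available.

    Only the executable is checked, not the headers needed to compile.
    The platform argument exists so the Windows branch can be exercised
    from any machine; by default the running platform is used.
    """
    if platform.startswith("win"):
        return False
    return tool_on_path("flex")
```

Looking the tool up on the PATH avoids depending on the wording of the shell's "not found" message, which differs between shells and locales.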
From bugzilla-daemon at portal.open-bio.org Wed Oct 22 11:42:08 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 22 Oct 2008 07:42:08 -0400 Subject: [Biopython-dev] [Bug 2176] XML Blast parser: miscellaneous bug fixes and cleanup In-Reply-To: Message-ID: <200810221142.m9MBg8Hf012645@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2176 ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-22 07:42 EST ------- Created an attachment (id=1011) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1011&action=view) Changes to Bio/Blast/NCBIStandalone.py and Bio/Blast/Record.py I'd like to make the XML and text parser agree on representing the HSP identities, positives and gaps as integers. Currently the text parser (and the default values in the HSP object) use a tuple of the value and the alignment length. The upside is it brings the objects returned by the XML and plain text parsers into better agreement. In this case I find storing these properties as simple integers makes much more sense than as a tuple (a choice probably based on the layout of the BLAST plain text output itself). The downside of applying this patch is it could break some existing scripts parsing the plain text output. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 22 15:43:06 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 22 Oct 2008 11:43:06 -0400 Subject: [Biopython-dev] [Bug 2618] back_translate method for the Seq object (in Bio.Seq)? 
In-Reply-To: Message-ID: <200810221543.m9MFh6Yu009327@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2618 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |WONTFIX ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-22 11:43 EST ------- After some lively debate on the mailing list, we failed to come up with any real world examples where a simple back_translate method (or a Bio.Seq back_translate function) giving a string or Seq object would be useful. A simple string or the current Seq object simply cannot represent all the possible codons in a back translation. Consider the standard table for leucine, Leu/L = {TTA, TTG, CTT, CTC, CTA, CTG} = {TTR, CTN} which covers 6 unambiguous codons. This is a subset of YTN = {TTC, TTA, TTG, TTT, CTC, CTA, CTG, CTT} which covers 8 unambiguous codons. Having back_translate("L") == "CTN" means translate(back_translate("L")) == "L", but doesn't cover the two codons TTR (i.e. TTA or TTG). At least this is better than back_translate("L") == "TTR" which still has translate(back_translate("L")) == "L", but doesn't cover the four codons CTN. Picking any one of the six codons also ensures translate(back_translate("L")) == "L" but of course doesn't cover the other five codons. In all three cases, the utility of the back translation is limited (e.g. no help for searches). Having back_translate("L") == "YTN" means translate(back_translate("L")) == "X", which would surprise many. Using "YTN" covers all the codons plus some extra ones. This might be useful for searching purposes, but otherwise its very misleading. However, while I am marking this bug as WONTFIX, returning a more complex ambiguous sequence representation (e.g. using regular expressions) may have merit. 
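The codon counts in the leucine example can be checked with a short sketch that expands an ambiguous codon into the unambiguous codons it covers. The IUPAC mapping below includes only the letters needed for this example, and the names IUPAC, LEUCINE and expand are illustrative, not part of Bio.Seq.

```python
from itertools import product

# Only the IUPAC nucleotide codes needed for the leucine example.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "N": "ACGT",  # any base
}

# The six leucine codons in the standard table, as listed above.
LEUCINE = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}


def expand(codon):
    """Return the set of unambiguous codons matched by an ambiguous codon."""
    choices = [IUPAC[letter] for letter in codon]
    return {"".join(bases) for bases in product(*choices)}


# CTN and TTR together cover exactly the six leucine codons.
covered_by_pair = expand("CTN") | expand("TTR")

# YTN covers eight codons; the two extras are not leucine codons.
extras = expand("YTN") - LEUCINE
```

Here covered_by_pair equals LEUCINE, while extras holds TTC and TTT (phenylalanine in the standard table), which is why a translation of "YTN" cannot be plain "L".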
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 22 16:08:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 22 Oct 2008 12:08:29 -0400 Subject: [Biopython-dev] [Bug 2176] XML Blast parser: miscellaneous bug fixes and cleanup In-Reply-To: Message-ID: <200810221608.m9MG8TcN011450@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2176 ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-22 12:08 EST ------- Query Length ============ XML output includes this information once, currently recorded as .query_letters only. Plain text output includes this twice, recorded as .query_letters (associated with the query header) and .query_length (associated with the pairwise alignments). e.g. ... Query= gi|120291|sp|P21297|FLBT_CAUCR FLBT PROTEIN (141 letters) ... >gi|120291|sp|P21297|FLBT_CAUCR FLBT PROTEIN Length = 141 ... As far as I know, these are always the same. An assertion could be added to the plain text parser to verify this... For consistency, the XML parser could just populate both .query_length and .query_letters - a simple change that won't break any old code and makes migrating from the text parser to the XML parser a little easier. This does perpetuate the confusion of two names. We could go further and make one of these properties officially deprecated (e.g. using a property method to issue a warning). But which one should we keep? Currently the XML parser only supports .query_letters but .query_length is more natural. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
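A minimal sketch of the "property method to issue a warning" idea from comment #12. The class name Header here is a hypothetical stand-in rather than the real Bio.Blast.Record class, and keeping query_length as the preferred name is assumed purely for illustration, since the comment leaves that choice open.

```python
import warnings


class Header:
    """Hypothetical stand-in for a BLAST record holding the query length."""

    def __init__(self, query_length=None):
        self.query_length = query_length

    @property
    def query_letters(self):
        """Deprecated alias for query_length."""
        warnings.warn("query_letters is deprecated; use query_length",
                      DeprecationWarning, stacklevel=2)
        return self.query_length

    @query_letters.setter
    def query_letters(self, value):
        warnings.warn("query_letters is deprecated; use query_length",
                      DeprecationWarning, stacklevel=2)
        self.query_length = value
```

Old scripts reading or writing .query_letters keep working and see the same value as .query_length, so both parsers could populate a single attribute while the alias is phased out.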
From bugzilla-daemon at portal.open-bio.org Wed Oct 22 16:28:48 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 22 Oct 2008 12:28:48 -0400 Subject: [Biopython-dev] [Bug 2176] XML Blast parser: miscellaneous bug fixes and cleanup In-Reply-To: Message-ID: <200810221628.m9MGSmiV014465@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2176 ------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-22 12:28 EST ------- Database Length =============== I wanted to record my notes on this based on findings reported on the mailing list. See this thread: http://lists.open-bio.org/pipermail/biopython-dev/2008-August/004101.html The plain text BLAST format contains the database length information three times (!), once in the header (for each query) and then again at the end of the file in the database report and the parameters "total letters" and again as "length of database", e.g. http://bugzilla.open-bio.org/attachment.cgi?id=676 ... Database: Leigo 4,535,438 sequences; 1,573,298,872 total letters ... Database: Leigo Posted date: Jan 22, 2007 11:26 AM Number of letters in database: 1,573,298,872 Number of sequences in database: 4,535,438 ... Length of database: 1,573,298,872 ... The Bio.Record.Header class defines "database_letters" (this is repeated every query), Bio.Record.DatabaseReport defines "num_letters_in_database", and Bio.Record.Parameters class defines "database_length" (where the names reflect the NCBI strings). The Bio.Record.Record inherits from all three, so ends up with "database_letters", "database_length" and "num_letters_in_database" (all coming from different bits of a plain text BLAST file). If the -z option is used, only the last of these three databases in the plain text output is changed (tested using standalone BLAST 2.2.18, which Biopython can parse for single queries). 
Using the Biopython plain text parser, "database_letters" and "num_letters_in_database" reflect the real database size, while "database_length" reflects the -z argument (which is used in the statistics). If the -z option is used with XML output, then is updated. As far as I can tell, the "real" database size is not reported. The XML parser stores this as "num_letters_in_database". So from plain text BLAST we have two pieces of information, actual database size - "database_letters" and "num_letters_in_database specified database size - "database_length" While for XML BLAST we only get one piece of information, specified database size - "num_letters_in_database" while "database_letters" and "database_length" default to None. This is a horrid mess. In the short term I propose the XML parser also record the specified database size as "database_length", and perhaps also as "database_letters" which would facilitate anyone trying to migrate a script from the plain text parser to the XML parser. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Wed Oct 22 17:04:20 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 22 Oct 2008 18:04:20 +0100 Subject: [Biopython-dev] Deprecating Bio.mathfns, Bio.stringfns and their C code? Message-ID: <320fb6e00810221004u2d02f518p267f7aa539bd8b80@mail.gmail.com> This is about three Biopython "support" modules: Bio.mathfns, Bio.listfns, Bio.stringfns, each of which has its own C implementation for speed. These haven't been touched for 6 years (which suggests they are stable and well tested), but they are now hardly used in Biopython. 
By removing these we not only reduce the amount of C code in Biopython (although here it is optional) which is a good thing for portability and supporting other python variants, but we also can reduce the "clutter" under the Bio.* namespace, e.g. >>> import Bio >>> help(Bio) On 9th Oct I wrote: > Until recently Bio.mathfns was used in Bio/NaiveBayes.py but that now > uses numpy more heavily instead. I think that Bio.mathfns (and its C > implementation) are no longer used anywhere in Biopython (and I would > be surprised if anyone else is using this module). I'm suggesting > deprecating Bio.mathfns and Bio.cmathfns for the next release. Any objections to deprecating Bio.mathfns and Bio.cmathfns? On 9th Oct I wrote: > I think Bio.stringfns and its C implementation Bio.cstringfns are also > now unused in Biopython, and like Bio.mathfns and Bio.cmathfns > should be deprecated for the next release. Any objections to deprecating Bio.stringfns and Bio.cstringfns? On 9th Oct I wrote: > Similarly, Bio.listfns and its C implementation Bio.clistfns might > also be deprecated with a little effort ... only three modules > currently use Bio.listfns We could just label Bio.listfns (and Bio.clistfns) as obsolete for the next release, or just add a note in the docstring that this might be deprecated shortly. Peter From bsouthey at gmail.com Thu Oct 23 16:28:48 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 23 Oct 2008 11:28:48 -0500 Subject: [Biopython-dev] Deprecating Bio.mathfns, Bio.stringfns and their C code? In-Reply-To: <320fb6e00810221004u2d02f518p267f7aa539bd8b80@mail.gmail.com> References: <320fb6e00810221004u2d02f518p267f7aa539bd8b80@mail.gmail.com> Message-ID: <4900A640.8050102@gmail.com> Peter wrote: > This is about three Biopython "support" modules: Bio.mathfns, > Bio.listfns, Bio.stringfns, each of which has its own C implementation > for speed. 
These haven't been touched for 6 years (which suggests > they are stable and well tested), but they are now hardly used in > Biopython. > > By removing these we not only reduce the amount of C code in Biopython > (although here it is optional) which is a good thing for portability > and supporting other python variants, but we also can reduce the > "clutter" under the Bio.* namespace, e.g. > >>>> import Bio >>>> help(Bio) >>>> > > On 9th Oct I wrote: > >> Until recently Bio.mathfns was used in Bio/NaiveBayes.py but that now >> uses numpy more heavily instead. I think that Bio.mathfns (and its C >> implementation) are no longer used anywhere in Biopython (and I would >> be surprised if anyone else is using this module). I'm suggesting >> deprecating Bio.mathfns and Bio.cmathfns for the next release. >> > > Any objections to deprecating Bio.mathfns and Bio.cmathfns? > Nope, the functions used by Bio/NaiveBayes.py are: mathfns.safe_log (also defines safe_log2) but is not very good because it sets a hard constant (1E-100) as a limit. mathfns.safe_exp The other functions included are: fcmp Compare two floating point numbers, up to a specified precision. intd Represent a floating point number as an integer. I presume that you mean adding mathfns.safe_log and mathfns.safe_exp to Bio/NaiveBayes.py first because these are needed by Bio/NaiveBayes.py. Note that the safe_log in Bio/MarkovModel.py is not the same as mathfns.safe_log. > On 9th Oct I wrote: > >> I think Bio.stringfns and its C implementation Bio.cstringfns are also >> now unused in Biopython, and like Bio.mathfns and Bio.cmathfns >> should be deprecated for the next release. >> > > Any objections to deprecating Bio.stringfns and Bio.cstringfns? > Nope, as you say these are not used. But just to be clear, the functions, lost are splitany Split a string using many delimiters. find_anychar Find one of a list of characters in a string. rfind_anychar Find one of a list of characters in a string, from end to start. 
starts_with Check whether a string starts with another string [DEPRECATED]. > On 9th Oct I wrote: > >> Similarly, Bio.listfns and its C implementation Bio.clistfns might >> also be deprecated with a little effort ... only three modules >> currently use Bio.listfns >> > > We could just label Bio.listfns (and Bio.clistfns) as obsolete for the > next release, or just add a note in the docstring that this might be > deprecated shortly. > Used by: Bio/MaxEntropy.py Bio/NaiveBayes.py Bio/MarkovModel.py Bio/pairwise2.py Functions directly used: itemindex Make an index of the items in the list. items Get one of each item in a list. contents Calculate percentage each item appears in a list. Functions indirectly or not used: asdict Make the list into a dictionary (for fast testing of membership). count Count the number of times each item appears. intersection Get the items in common between 2 lists. difference Get the items in 1 list, but not the other. indexesof Get a list of the indexes of some items in a list. take Take some items from a list. Also Bio.listfns used by pairwise2.py which also has a c implementation (cpairwise2) that I would also suggest is a candidate for removal. At present I do not know enough about Bio/MaxEntropy.py, Bio/NaiveBayes.py, and Bio/MarkovModel.py to indicate if Bio.listfns functions are really required or to port them to numpy. (I may try look at trying to port them but not soon.) In summary I have no objection to removing the c code associated with this code. Bruce From biopython at maubp.freeserve.co.uk Thu Oct 23 16:48:23 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 23 Oct 2008 17:48:23 +0100 Subject: [Biopython-dev] Deprecating Bio.mathfns, Bio.stringfns and their C code? 
In-Reply-To: <4900A640.8050102@gmail.com> References: <320fb6e00810221004u2d02f518p267f7aa539bd8b80@mail.gmail.com> <4900A640.8050102@gmail.com> Message-ID: <320fb6e00810230948yada623fg647f75b8d752eef1@mail.gmail.com> Bruce: >Peter: >> Any objections to deprecating Bio.mathfns and Bio.cmathfns? > > Nope, the functions used by Bio/NaiveBayes.py are ... You must be looking at Bio/NaiveBayes.py an older CVS checkout - it doesn't use Bio.mathfns at all now, but rather makes more use of numpy. >> We could just label Bio.listfns (and Bio.clistfns) as obsolete for the >> next release, or just add a note in the docstring that this might be >> deprecated shortly. > > Used by: > Bio/MaxEntropy.py > Bio/NaiveBayes.py > Bio/MarkovModel.py > Bio/pairwise2.py > > Functions directly used: > ... > At present I do not know enough about Bio/MaxEntropy.py, Bio/NaiveBayes.py, > and Bio/MarkovModel.py to indicate if Bio.listfns functions are really > required or to port them to numpy. (I may try look at trying to port them > but not soon.) I haven't dug too deeply either - which is why I wasn't going to push to deprecate Bio.listfns yet. I did mention some of this in the earlier email, but you have gone into more detail. http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004406.html As you will have noticed, many of the things in Bio.listfns could nowadays be done in pure python with a set. Bruce wrote: > Also Bio.listfns used by pairwise2.py which also has a c implementation > (cpairwise2) that I would also suggest is a candidate for removal. I think Bio.pairwise2 is actually potentially quite useful. It could do with a little documentation love - even a short "cookbook" entry for the Tutorial would help. Peter From bsouthey at gmail.com Thu Oct 23 18:50:06 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 23 Oct 2008 13:50:06 -0500 Subject: [Biopython-dev] Deprecating Bio.mathfns, Bio.stringfns and their C code? 
In-Reply-To: <320fb6e00810230948yada623fg647f75b8d752eef1@mail.gmail.com> References: <320fb6e00810221004u2d02f518p267f7aa539bd8b80@mail.gmail.com> <4900A640.8050102@gmail.com> <320fb6e00810230948yada623fg647f75b8d752eef1@mail.gmail.com> Message-ID: <4900C75E.9020907@gmail.com> Peter wrote: > Bruce: > >> Peter: >> >>> Any objections to deprecating Bio.mathfns and Bio.cmathfns? >>> >> Nope, the functions used by Bio/NaiveBayes.py are ... >> > > You must be looking at Bio/NaiveBayes.py an older CVS checkout - it > doesn't use Bio.mathfns at all now, but rather makes more use of > numpy. > Sorry, yes, I just pulled a new cvs version and I now see that it has been removed. > >>> We could just label Bio.listfns (and Bio.clistfns) as obsolete for the >>> next release, or just add a note in the docstring that this might be >>> deprecated shortly. >>> >> Used by: >> Bio/MaxEntropy.py >> Bio/NaiveBayes.py >> Bio/MarkovModel.py >> Bio/pairwise2.py >> >> Functions directly used: >> ... >> At present I do not know enough about Bio/MaxEntropy.py, Bio/NaiveBayes.py, >> and Bio/MarkovModel.py to indicate if Bio.listfns functions are really >> required or to port them to numpy. (I may try look at trying to port them >> but not soon.) >> > > I haven't dug too deeply either - which is why I wasn't going to push > to deprecate Bio.listfns yet. > > I did mention some of this in the earlier email, but you have gone > into more detail. > http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004406.html > > As you will have noticed, many of the things in Bio.listfns could > nowadays be done in pure python with a set. > My brief look at the function defined by Bio.listfns indicated that these were only useful if a list could not be converted into a dictionary. I am not sure that this is required at least for Bio/MaxEntropy.py, Bio/NaiveBayes.py, and Bio/MarkovModel.py. Personally I would be inclined to incorporate Bio.listfns into each of these and depreciate Bio.listfns. 
I know this is duplication but hopefully this would be addressed if someone updates the code. Bruce From biopython at maubp.freeserve.co.uk Thu Oct 23 21:25:47 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 23 Oct 2008 22:25:47 +0100 Subject: [Biopython-dev] Deprecating Bio.mathfns, Bio.stringfns and their C code? In-Reply-To: <4900C75E.9020907@gmail.com> References: <320fb6e00810221004u2d02f518p267f7aa539bd8b80@mail.gmail.com> <4900A640.8050102@gmail.com> <320fb6e00810230948yada623fg647f75b8d752eef1@mail.gmail.com> <4900C75E.9020907@gmail.com> Message-ID: <320fb6e00810231425y1bafec09ufbd773a479ac8e6d@mail.gmail.com> Bruce wrote: > Personally I would be inclined > to incorporate Bio.listfns into each of these and depreciate Bio.listfns. I > know this is duplication but hopefully this would be addressed if someone > updates the code. If it were only pure python, I might agree with you. But given there is C code which might well be worthwhile for speed, I'm not in any hurry to remove Bio.listfns until Bio.MaxEntropy.py, Bio.NaiveBayes and Bio.MarkovModel don't need it. Labelling Bio.listfns as likely to be deprecated and removed in the future should suffice for the time being. Peter From bugzilla-daemon at portal.open-bio.org Fri Oct 24 03:03:09 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 23 Oct 2008 23:03:09 -0400 Subject: [Biopython-dev] [Bug 2626] New: Bio.PDB mmCIFParser parse exceptions Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2626 Summary: Bio.PDB mmCIFParser parse exceptions Product: Biopython Version: 1.48 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Other AssignedTo: biopython-dev at biopython.org ReportedBy: cjoldfield at gmail.com I recently ran the mmCIFParser object over all of PDB's mmCIF files and found a large number of files failed to parse correctly (a short script at the end to demonstrate). 
Of ~50k mmCIF files, 3891 files failed to parse and another 1980 were missing fields in the mmCIF dictionary.

A few examples of files that failed to parse:
http://www.rcsb.org/pdb/files/1alw.cif.gz
http://www.rcsb.org/pdb/files/1det.cif.gz
http://www.rcsb.org/pdb/files/1tmy.cif.gz

A few with missing fields:
http://www.rcsb.org/pdb/files/1mfl.cif.gz
http://www.rcsb.org/pdb/files/1tfj.cif.gz
http://www.rcsb.org/pdb/files/1zn8.cif.gz

The problem seems to be that an error in one mmCIF table, such as an extra field, propagates through the rest of the parse.

x86_64 gentoo linux 2008, src BioPython install

__CODE__
import sys
from Bio.PDB import *

if len(sys.argv) != 2:
    print "usage: mmCifParseCheck.py "
    sys.exit(0)

structFile = sys.argv[1]
resultString = ""

# parse to structure object, counting non-hetero residues
numRes = 0
parser = MMCIFParser()
try:
    structure = parser.get_structure('test', structFile)
    for model in structure:
        for chain in model:
            for residue in chain:
                if residue.id[0][:2] != "H_":
                    numRes += 1
except:
    resultString += "parse to structure object failed\n"
else:
    resultString += "parse to structure object succeeded\n"

# parse whole mmCIF file to dict
try:
    mmcif_dict = MMCIF2Dict.MMCIF2Dict(structFile)
except:
    resultString += "parse to dict failed\n"
else:
    resultString += "parse to dict succeeded\n"

# get a required entry
try:
    id = mmcif_dict['_entry.id']
except:
    resultString += "key lookup failed\n"
else:
    resultString += "key lookup succeeded\n"

print resultString
print "number of non-het residues " + str(numRes)

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Oct 24 13:30:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 24 Oct 2008 09:30:15 -0400 Subject: [Biopython-dev] [Bug 2627] New: Updated Bio.MarkovModel to remove oldnumeric and listfns imports Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2627 Summary: Updated Bio.MarkovModel to remove oldnumeric and listfns imports Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P4 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com I have updated Bio.MarkovModel to remove using numpy.oldnumeric and Bio.listfns. Hopefully I found the correct places because of the usage of 'from import *'. The test_MarkovModel.py does pass and the commented section using Baum-Welch does run without errors. However, this is not my area so it may not be completely correct. So I would appreciate any other testing. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Oct 24 13:32:03 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 24 Oct 2008 09:32:03 -0400 Subject: [Biopython-dev] [Bug 2627] Updated Bio.MarkovModel to remove oldnumeric and listfns imports In-Reply-To: Message-ID: <200810241332.m9ODW3wJ004124@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2627 ------- Comment #1 from bsouthey at gmail.com 2008-10-24 09:32 EST ------- Created an attachment (id=1012) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1012&action=view) Updated MarkovModel.py to remove numpy.oldnumeric and listfns -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Oct 24 13:35:20 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 24 Oct 2008 09:35:20 -0400 Subject: [Biopython-dev] [Bug 2627] Updated Bio.MarkovModel to remove oldnumeric and listfns imports In-Reply-To: Message-ID: <200810241335.m9ODZKP4004615@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2627 ------- Comment #2 from bsouthey at gmail.com 2008-10-24 09:35 EST ------- Created an attachment (id=1013) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1013&action=view) Modified test_MarkovModel to remove numpy.oldnumeric import This is a modified version of the Bio test for Bio.MarkovModel. Remove the triple quotes to use Baum-Welch because this was commented out in the original test. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bsouthey at gmail.com Fri Oct 24 13:50:01 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 24 Oct 2008 08:50:01 -0500 Subject: [Biopython-dev] Deprecating Bio.mathfns, Bio.stringfns and their C code? In-Reply-To: <320fb6e00810231425y1bafec09ufbd773a479ac8e6d@mail.gmail.com> References: <320fb6e00810221004u2d02f518p267f7aa539bd8b80@mail.gmail.com> <4900A640.8050102@gmail.com> <320fb6e00810230948yada623fg647f75b8d752eef1@mail.gmail.com> <4900C75E.9020907@gmail.com> <320fb6e00810231425y1bafec09ufbd773a479ac8e6d@mail.gmail.com> Message-ID: <4901D289.6070503@gmail.com> Peter wrote: > Bruce wrote: > >> Personally I would be inclined >> to incorporate Bio.listfns into each of these and depreciate Bio.listfns. I >> know this is duplication but hopefully this would be addressed if someone >> updates the code. >> > > If it were only pure python, I might agree with you. But given there > is C code which might well be worthwhile for speed, I'm not in any > hurry to remove Bio.listfns until Bio.MaxEntropy.py, Bio.NaiveBayes > and Bio.MarkovModel don't need it. Labelling Bio.listfns as likely to > be deprecated and removed in the future should suffice for the time > being. > > Peter > > Hi, Bug 2627 should include a modified version of Bio.MarkovModel and the associated test that removes Bio.listfns as well as the usage of numpy.oldnumeric (I hate the use of 'from x import *'). I really don't know enough about MarkovModel to know how correct it really is. I only worked with getting the test code to work including the Baum-Welsh part that is commented out in the test. I'll try to get to Bio.MaxEntropy.py and Bio.NaiveBayes over the next week as time permits. Also, Bio.pairwise2 appears to require the functionality of Bio.listfns. Since speed is relative, I think you need some type of benchmarking on this, which in turn needs a good example. 
Bruce

From bugzilla-daemon at portal.open-bio.org Fri Oct 24 14:51:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 24 Oct 2008 10:51:00 -0400
Subject: [Biopython-dev] [Bug 2628] New: Have Bio.SeqIO.write(...) and Bio.AlignIO.write(...) return number of records
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2628

           Summary: Have Bio.SeqIO.write(...) and Bio.AlignIO.write(...) return number of records
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

Motivation: When creating a sequence (or alignment) file, it is sometimes useful to know how many records (or alignments) were written out. This is easy if your records are in a list:

    records = list(...)
    SeqIO.write(records, handle, format)
    print "Wrote %i records" % len(records)

If however your records come from a generator/iterator (e.g. a generator expression, or some other iterator) you cannot use len(records). You could turn this into a list just to count them, but this wastes memory. It would therefore be useful to have the count returned:

    records = some_generator
    count = SeqIO.write(records, handle, format)
    print "Wrote %i records" % count

For a precedent, the BioSQL loader returns the number of records loaded into the database.

Currently Bio.SeqIO.write(...) and Bio.AlignIO.write(...) have no return value, so adding a return value is a backwards compatible enhancement.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
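The behaviour requested in Bug 2628 can be illustrated without Biopython. What follows is only a minimal sketch in present-day Python, not the Bio.SeqIO implementation: write_fasta and its (name, sequence) tuples are hypothetical stand-ins, used to show a writer that counts records while it consumes a generator, so the caller never needs len().

```python
import io


def write_fasta(records, handle):
    """Write (name, sequence) pairs to handle and return how many were written.

    Hypothetical helper for illustration only; it is not Bio.SeqIO.write.
    """
    count = 0
    for name, sequence in records:
        handle.write(">%s\n%s\n" % (name, sequence))
        count += 1
    return count


# A generator has no len(), so the count has to come from the writer itself.
records = (("seq%i" % i, "ACGT" * i) for i in range(1, 4))
handle = io.StringIO()
count = write_fasta(records, handle)
print("Wrote %i records" % count)
```

Because the counter lives inside the loop that already visits every record, no intermediate list is built and memory use stays flat however many records the generator yields.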
From bugzilla-daemon at portal.open-bio.org Fri Oct 24 20:38:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 24 Oct 2008 16:38:39 -0400 Subject: [Biopython-dev] [Bug 2629] New: Updated Bio.NaiveBayes to listfns import Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2629 Summary: Updated Bio.NaiveBayes to listfns import Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P4 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com I have attempted to modify Bio/NaiveBayes.py to remove the dependency on Bio.listfns functions. Also, made use of the numpy namespace rather than using 'from numpy import *'. Also, I made a testing file with two examples, the car data is not mine but has a worked example and the other is Fisher's iris data. Fisher's data is not really appropriate because it has continuous data and Bio.NaiveBayes only handles discrete data. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Oct 24 20:40:38 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 24 Oct 2008 16:40:38 -0400 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200810242040.m9OKecUX004894@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 ------- Comment #1 from bsouthey at gmail.com 2008-10-24 16:40 EST ------- Created an attachment (id=1014) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1014&action=view) Modified NaiveBayes code This is the modified code for Bio/NaiveBayes.py that removes Bio.listfns requirements and a minor change to how numpy is imported. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Oct 24 20:42:03 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 24 Oct 2008 16:42:03 -0400 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200810242042.m9OKg3pT005077@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 ------- Comment #2 from bsouthey at gmail.com 2008-10-24 16:42 EST ------- Created an attachment (id=1015) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1015&action=view) Example code on using the new NaiveBayes code This has two example data sets that use the new NaiveBayes code. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From idoerg at gmail.com Sun Oct 26 21:34:15 2008 From: idoerg at gmail.com (Iddo Friedberg) Date: Sun, 26 Oct 2008 14:34:15 -0700 Subject: [Biopython-dev] CE implementation in Python Message-ID: Interesting donation from Jason Vertrees. CE is a well-known structural alignment program from Phil Bourne's lab at UCSD. -------- Original Message -------- Subject: BiopPython & Structure Alignments Date: Fri, 17 Oct 2008 23:45:35 -0400 From: Jason Vertrees To: idoerg at burnham.org Iddo, I'm not sure if this might be of assitance to the BioPython project, but I implemented a version of CE Align (structure alignment algorithm) as an extension to the PyMOL program. The code is open-source (BSDL) and I have two versions: (1) Pure Python -- slow; essentially unusable, but all Python. A typica alignment might take about 30 seconds. (2) Mixed Python/C implementation: the math is done in C/C++ (implemented both) so it's very fast. 
The good news for BioPython is that I used distutils and the typical Python Extending procedures to write the code: BioPython might very easily be able to plug in the code so you don't have to access a web server to do structure alignments. A typical alignment takes about 0.2 seconds. If you're at all interested, please look at http://www.pymolwiki.org/index.php/Cealign as that's there the code lives. I don't have time to port the code for the project, but I don't think it would be all that hard given some effort. I'm happy to answer any questions. HTH, -- Jason Vertrees -- Jason Vertrees, PhD Boston U. -- jasonv at bu.edu Dartmouth -- jv at cs.dartmouth.edu -- Iddo Friedberg, Ph.D. Atkinson Hall, mail code 0446 University of California, San Diego 9500 Gilman Drive La Jolla, CA 92093-0446, USA T: +1 (858) 534-0570 http://iddo-friedberg.org From biopython at maubp.freeserve.co.uk Mon Oct 27 09:57:06 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 27 Oct 2008 09:57:06 +0000 Subject: [Biopython-dev] CE implementation in Python In-Reply-To: References: Message-ID: <320fb6e00810270257x43f8de09scd544b3a3f43ef73@mail.gmail.com> On Sun, Oct 26, 2008 at 9:34 PM, Iddo Friedberg wrote: > Interesting donation from Jason Vertrees. CE is a well-known > structural alignment program from Phil Bourne's lab at UCSD. This sounds interesting - presumably to integrate it into Biopython we would need to make it work on the Bio.PDB chain/structure objects. For example, I would expect the CE code would need to be able to get the secondary structure of each residue (i.e. where did the PDB file say the alpha helices and beta sheets were). However, I will not have the time nor the motivation to do this myself. Its nice that in addition to pure python he has both C and C++ back ends. My gut instinct is to avoid the C++ code and stick with the C code (given previous cross platform fun with C++). 
Its a little surprising his python code is so much slower - but maybe it doesn't take full advantage of numpy at the moment? Peter From bugzilla-daemon at portal.open-bio.org Mon Oct 27 13:37:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 27 Oct 2008 09:37:53 -0400 Subject: [Biopython-dev] [Bug 2631] New: Updated Bio.MaxEntropy to remove listfns import Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2631 Summary: Updated Bio.MaxEntropy to remove listfns import Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com I have updated Bio/MaxEntropy.py to remove the dependency on Bio/listfns.py and replaced the from numpy.oldnumeric import. Also, I created a small example of the usage. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Oct 27 13:38:32 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 27 Oct 2008 09:38:32 -0400 Subject: [Biopython-dev] [Bug 2631] Updated Bio.MaxEntropy to remove listfns import In-Reply-To: Message-ID: <200810271338.m9RDcWv5018326@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2631 ------- Comment #1 from bsouthey at gmail.com 2008-10-27 09:38 EST ------- Created an attachment (id=1016) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1016&action=view) Modified MaxEntropy code -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
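Several of the Bio.listfns helpers discussed in this thread (items, count, intersection, difference) have close equivalents in plain Python. The lines below are a rough sketch in present-day Python and make one assumption that should be stated loudly: every item is hashable. They are not the Bio.listfns code, and they return sets and a Counter rather than the lists and dictionaries that module produces.

```python
from collections import Counter

a = ["A", "C", "G", "A", "T", "A"]
b = ["G", "T", "N"]

# items: one of each item in the list (order is arbitrary, as with a set)
unique_items = set(a)

# count: number of times each item appears
item_counts = Counter(a)

# intersection: items common to both lists
in_both = set(a) & set(b)

# difference: items in the first list but not in the second
only_in_a = set(a) - set(b)

print(sorted(unique_items), item_counts["A"], sorted(in_both), sorted(only_in_a))
```

If the items cannot be hashed, these shortcuts do not apply, which may be the case the original list-based functions were written for.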
From bugzilla-daemon at portal.open-bio.org Mon Oct 27 13:46:43 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 27 Oct 2008 09:46:43 -0400 Subject: [Biopython-dev] [Bug 2631] Updated Bio.MaxEntropy to remove listfns import In-Reply-To: Message-ID: <200810271346.m9RDkhGb018836@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2631 ------- Comment #2 from bsouthey at gmail.com 2008-10-27 09:46 EST ------- Created an attachment (id=1017) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1017&action=view) Example code on using the new MaxEntrophy code This is an example of using the MaxEntrophy code. Perhaps the most important aspect of using MaxEntrophy is that it requires an iterable type that contains the functions for classification. MaxEntrophy will then iterate through each of these functions to predict the status based on that function. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bsouthey at gmail.com Mon Oct 27 14:02:07 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 27 Oct 2008 09:02:07 -0500 Subject: [Biopython-dev] Deprecating Bio.mathfns, Bio.stringfns and their C code? In-Reply-To: <320fb6e00810231425y1bafec09ufbd773a479ac8e6d@mail.gmail.com> References: <320fb6e00810221004u2d02f518p267f7aa539bd8b80@mail.gmail.com> <4900A640.8050102@gmail.com> <320fb6e00810230948yada623fg647f75b8d752eef1@mail.gmail.com> <4900C75E.9020907@gmail.com> <320fb6e00810231425y1bafec09ufbd773a479ac8e6d@mail.gmail.com> Message-ID: <4905C9DF.7000602@gmail.com> Peter wrote: > Bruce wrote: > >> Personally I would be inclined >> to incorporate Bio.listfns into each of these and depreciate Bio.listfns. I >> know this is duplication but hopefully this would be addressed if someone >> updates the code. 
>>
>
> If it were only pure python, I might agree with you. But given there
> is C code which might well be worthwhile for speed, I'm not in any
> hurry to remove Bio.listfns until Bio.MaxEntropy.py, Bio.NaiveBayes
> and Bio.MarkovModel don't need it. Labelling Bio.listfns as likely to
> be deprecated and removed in the future should suffice for the time
> being.
>
> Peter
>
>
Hi,
Just an update: I think that I have removed the listfns dependency from Bio.MaxEntropy.py (Bug 2631), Bio.NaiveBayes (Bug 2629) and Bio.MarkovModel (Bug 2627). They are listed in order from the fewest to the most changes required. I also removed every 'from x import *', especially where x=numpy.oldnumeric.

I also added examples that use MaxEntropy and NaiveBayes, although these modules should really have bioinformatics examples. Note that NaiveBayes is the discrete version.

Bruce

From bugzilla-daemon at portal.open-bio.org Mon Oct 27 16:16:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 27 Oct 2008 12:16:52 -0400
Subject: [Biopython-dev] [Bug 2631] Updated Bio.MaxEntropy to remove listfns import
In-Reply-To:
Message-ID: <200810271616.m9RGGqBY031121@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2631

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-27 12:16 EST -------
I don't know if it matters for MaxEntropy, but your re-implementation of Bio.listfns.itemindex does not preserve the current behaviour with duplicate entries:

>>> x = [1,2,3,3,2,5]
>>> from Bio.listfns import itemindex
>>> itemindex(x)
{1: 0, 2: 1, 3: 2, 5: 5}
>>> class2index ={}
>>> for i, j in enumerate(x):
        class2index.update({j:i})

>>> class2index
{1: 0, 2: 4, 3: 3, 5: 5}

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Oct 27 16:55:04 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 27 Oct 2008 12:55:04 -0400
Subject: [Biopython-dev] [Bug 2631] Updated Bio.MaxEntropy to remove listfns import
In-Reply-To:
Message-ID: <200810271655.m9RGt43L001261@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2631

------- Comment #4 from bsouthey at gmail.com 2008-10-27 12:55 EST -------
(In reply to comment #3)
> I don't know if it matters for MaxEntropy, but your re-implementation of
> Bio.listfns.itemindex does not preserve the current behaviour with duplicate
> entries:
>
> >>> x = [1,2,3,3,2,5]
> >>> from Bio.listfns import itemindex
> >>> itemindex(x)
> {1: 0, 2: 1, 3: 2, 5: 5}
> >>> class2index ={}
> >>> for i, j in enumerate(x):
>         class2index.update({j:i})
>
> >>> class2index
> {1: 0, 2: 4, 3: 3, 5: 5}
>

In this case, x is the return value of the listfns.items() function, whose docstring in listfns reads:

    """items(l) -> list of items

    Generate a list of one of each item in l.  The items are returned
    in arbitrary order.

    """

Therefore duplicates cannot occur (and duplicates would not make sense anyhow). But to match the behaviour of the original code, just avoid updating if the key already exists:

    class2index ={}
    for i, j in enumerate(x):
        if not class2index.has_key(j):
            class2index.update({j:i})

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
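The difference between the two indexing behaviours in Bug 2631 can be reproduced in a few lines. This is a sketch in present-day Python, where dict.has_key no longer exists; first_index and last_index are illustrative names, not functions from Bio.listfns or from the attached patch.

```python
def first_index(items):
    """Map each item to the index of its first occurrence."""
    index = {}
    for i, item in enumerate(items):
        # setdefault leaves an existing entry untouched, so later
        # duplicates do not overwrite the first position seen.
        index.setdefault(item, i)
    return index


def last_index(items):
    """Map each item to the index of its last occurrence."""
    index = {}
    for i, item in enumerate(items):
        index[item] = i  # unconditional update: the last duplicate wins
    return index


x = [1, 2, 3, 3, 2, 5]
print(first_index(x))  # {1: 0, 2: 1, 3: 2, 5: 5}
print(last_index(x))   # {1: 0, 2: 4, 3: 3, 5: 5}
```

On input without duplicates the two functions agree, which is why the distinction only matters if the list passed in has not already been reduced to unique items.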
From bugzilla-daemon at portal.open-bio.org Mon Oct 27 19:08:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 27 Oct 2008 15:08:48 -0400
Subject: [Biopython-dev] [Bug 2634] New: PAM30 Matrix doesn't work with
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2634

           Summary: PAM30 Matrix doesn't work with
           Product: Biopython
           Version: 1.48
          Platform: PC
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: ngcrawfo at bu.edu

I send it this code:

    result_handle = NCBIWWW.qblast("blastp", "nr", seq_record.seq.tostring(),
                                   matrix_name = 'PAM30', word_size='2',
                                   expect='30000',
                                   composition_based_statistics='no adjustment')

And I get this:

    ValueError: invalid literal for int() with base 10:

    function qblast in NCBIWWW.py at line 769
    rid, rtoe = _parse_qblast_ref_page(handle)

    function _parse_qblast_ref_page in NCBIWWW.py at line 828
    return rid, int(rtoe)

Note: if I change the matrix name to 'PAM70' or 'BLOSUM62' there is no error. I'm trying to emulate the short sequence parameters in BLAST, so I need to use PAM30.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
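The traceback in Bug 2634 ends in int(rtoe), which suggests the reply page did not contain the values the parser expected. The sketch below, in present-day Python, shows one way such a parser could fail with a more readable message. It is not the code in NCBIWWW.py: parse_rid_rtoe is a hypothetical name, and the "RID = ..." / "RTOE = ..." text layout and both sample pages are assumptions made purely for illustration.

```python
import re


def parse_rid_rtoe(page_text):
    """Return (rid, rtoe) from a reply page, or raise a descriptive ValueError.

    Hypothetical helper; the 'RID = ' and 'RTOE = ' layout is an assumption.
    """
    rid = re.search(r"RID = (\S+)", page_text)
    rtoe = re.search(r"RTOE = (\d+)", page_text)
    if rid is None or rtoe is None:
        # Show the start of the page so the caller can see what came back.
        snippet = page_text.strip()[:200]
        raise ValueError("No RID/RTOE found in reply; page starts: %r" % snippet)
    return rid.group(1), int(rtoe.group(1))


# Made-up sample pages, not real server output.
ok_page = "RID = ABC123\nRTOE = 12\n"
bad_page = "Error: Cannot validate the Blast options"

print(parse_rid_rtoe(ok_page))
try:
    parse_rid_rtoe(bad_page)
except ValueError as err:
    print(err)
```

Quoting the start of the page avoids having to pick the error message out of the HTML, at the cost of a noisier exception.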
From bugzilla-daemon at portal.open-bio.org Mon Oct 27 20:14:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 27 Oct 2008 16:14:53 -0400 Subject: [Biopython-dev] [Bug 2634] PAM30 Matrix doesn't work with qblast In-Reply-To: Message-ID: <200810272014.m9RKErND028218@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2634 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|PAM30 Matrix doesn't work |PAM30 Matrix doesn't work |with |with qblast ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-27 16:14 EST ------- The error from Biopython is because it can't find the RID and RTOE references in what is normally a "please wait" HTML page. The reason being you have triggered an error. From dumping the HTML requested: ------------------------------- Message ID#33 Error: Cannot validate the Blast options: Gap existence and extension values of 11 and 1 not supported for PAM30 supported values are: 32767, 32767 7, 2 6, 2 5, 2 10, 1 9, 1 8, 1 This error message indicates that the combination of options for this Blast search is inconsistent or invalid. This can happen when the selected Blast program does not support one of the options provided, when two or more options have conflicting values, etc. If you are using URL API, please check the options mentioned in the error message string and re-submit your search. Please note that the current version of the Blast CGI application is stricter at validating Blast options than it has been historically. If this error persists, please, contact Blast-help at ncbi.nlm.nih.gov for more help. ------------------------------- Short of printing out the whole HTML dump, I'm not sure how best to tell the user about this kind of error - automatically extracting the error message looks unreliable. 
In anycase, I think you need to investigate the gap options and see if you can match what you are trying to mimic. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Oct 27 20:24:44 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 27 Oct 2008 16:24:44 -0400 Subject: [Biopython-dev] [Bug 2634] PAM30 Matrix doesn't work with qblast In-Reply-To: Message-ID: <200810272024.m9RKOir7028873@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2634 ------- Comment #2 from ngcrawfo at bu.edu 2008-10-27 16:24 EST ------- Peter, Thanks for the insight. I'll mess about with the gap and existence parameters. Thanks again! - Nick (In reply to comment #1) > The error from Biopython is because it can't find the RID and RTOE references > in what is normally a "please wait" HTML page. The reason being you have > triggered an error. From dumping the HTML requested: > > ------------------------------- > Message ID#33 Error: Cannot validate the Blast options: Gap existence and > extension values of 11 and 1 not supported for PAM30 > supported values are: > 32767, 32767 > 7, 2 > 6, 2 > 5, 2 > 10, 1 > 9, 1 > 8, 1 > > This error message indicates that the combination of options for this Blast > search is inconsistent or invalid. This can happen when the selected Blast > program does not support one of the options provided, when two or more options > have conflicting values, etc. If you are using URL API, please check the > options mentioned in the error message string and re-submit your search. Please > note that the current version of the Blast CGI application is stricter at > validating Blast options than it has been historically. 
If this error persists, > please, contact Blast-help at ncbi.nlm.nih.gov for > more help. > ------------------------------- > > Short of printing out the whole HTML dump, I'm not sure how best to tell the > user about this kind of error - automatically extracting the error message > looks unreliable. > > In anycase, I think you need to investigate the gap options and see if you can > match what you are trying to mimic. > > Peter > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 28 15:25:57 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 28 Oct 2008 11:25:57 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200810281525.m9SFPvCV031306@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #24 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-28 11:25 EST ------- (In reply to comment #22) > ... Stop translating at the first in frame stop codon (see my comment 18). > Again, a boolean argument, and for compatibility with previous Biopython > conventions, defaulting to False (i.e. read through). Possible names "stop", > "to_stop", "auto_stop", "terminate", ... > > In this case, how should the method behave if there is no final stop codon - > raise an error or not? Also should the stop codon be included in the returned > sequence (note that the Bio.Translate module did not include the stop symbol). Added in CVS with the optional argument named "to_stop" (boolean), defaulting to False (continue translating through any stops).
See Bio/Seq.py revision 1.51 and Tests/test_seq.py revision 1.28 I'm happy to discuss alternative names for this argument (up until the next release of Biopython). This choice was influenced by the existing method name translate_to_stop in Bio.Translate (which can now be declared obsolete and awaiting deprecation). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 28 16:33:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 28 Oct 2008 12:33:49 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200810281633.m9SGXnBO017256@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #25 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-28 12:33 EST ------- I have updated the Tutorial, and also: The undocumented Bio.utils.translate() and Bio.utils.transcribe() etc have been deprecated. The undocumented Bio.SeqUtils.translate() has been deprecated. Bio.Translate has been labelled as obsolete. This was the documented way to do a translation, so I'd like to give people some advance warning before we add a deprecation message. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 28 18:13:42 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 28 Oct 2008 14:13:42 -0400 Subject: [Biopython-dev] [Bug 2628] Have Bio.SeqIO.write(...) and Bio.AlignIO.write(...) 
return number of records In-Reply-To: Message-ID: <200810281813.m9SIDgt8027734@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2628 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-28 14:13 EST ------- Created an attachment (id=1023) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1023&action=view) Patch for Bio.SeqIO and Bio.AlignIO and their unit tests Adds an integer return value to Bio.SeqIO.write() and Bio.AlignIO.write() giving the number of records/alignments written to the handle. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 28 22:12:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 28 Oct 2008 18:12:19 -0400 Subject: [Biopython-dev] [Bug 2622] Parsing between position locations like 5933^5934 in GenBank/EMBL files In-Reply-To: Message-ID: <200810282212.m9SMCJrr018731@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2622 chapmanb at 50mail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #3 from chapmanb at 50mail.com 2008-10-28 18:12 EST ------- Peter -- this is a great summary of the problem. The fix you propose in the _FeatureParser should definitely go in. As for the higher-level question of how we should treat this, it wasn't really thought about too heavily in the initial implementation since it's a fringe case. Your proposal for handling it sounds fine.
In this case BetweenPosition would change to something like:

    def __init__(self, position, extension = 0):
        AbstractPosition.__init__(self, position + 1, 0)
        self._between_position = position
        self._between_extension = extension

    def __str__(self):
        return "(%s^%s)" % (self._between_position,
                            self._between_position + self._between_extension)

We would just hide the "between" junk from the standard Position stuff and represent it as your proposal. How does that sound generally?

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Wed Oct 29 16:57:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 29 Oct 2008 12:57:05 -0400
Subject: [Biopython-dev] [Bug 1492] Martel Parser fails on Bio.db["protein-genbank-cgi"] entry
In-Reply-To:
Message-ID: <200810291657.m9TGv5Y0006396@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1492

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|Martel/Mindy                |Main Distribution

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-29 12:57 EST -------
Refiling this bug under "Main Distribution" in order to delete the "Martel/Mindy" entry in Bugzilla. Sorry about the pointless emails this will trigger.

Peter

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
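
[Editorial sketch] The BetweenPosition change proposed in comment #3 on Bug 2622 can be tried out in isolation roughly as follows. This is only a sketch under stated assumptions: the AbstractPosition class below is a simplified stand-in written for this example (Biopython's own base class may differ), and the input 5933 with extension 1 is simply taken from the bug title. It shows the "between" coordinates being kept in private attributes while the base class sees an ordinary position.

```python
class AbstractPosition(object):
    """Simplified stand-in for the real base class (assumption for this sketch)."""

    def __init__(self, position, extension):
        self.position = position
        self.extension = extension


class BetweenPosition(AbstractPosition):
    """A location between two bases, such as 5933^5934."""

    def __init__(self, position, extension=0):
        # The base class is given position + 1 with no extension, as in the proposal.
        AbstractPosition.__init__(self, position + 1, 0)
        # The original "between" coordinates are kept privately for display.
        self._between_position = position
        self._between_extension = extension

    def __str__(self):
        return "(%s^%s)" % (self._between_position,
                            self._between_position + self._between_extension)


between = BetweenPosition(5933, 1)
print(between)           # the "between" representation
print(between.position)  # what the standard Position machinery would see
```

Whether position + 1 is the right value to expose depends on the coordinate convention settled on in the bug discussion; the sketch only mirrors the code as proposed.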
From bugzilla-daemon at portal.open-bio.org Wed Oct 29 16:57:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 29 Oct 2008 12:57:19 -0400 Subject: [Biopython-dev] [Bug 1589] Parsing fails at "operon" tag with RecordParser or FeatureParser In-Reply-To: Message-ID: <200810291657.m9TGvJ0H006452@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1589 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Martel/Mindy |Main Distribution ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-29 12:57 EST ------- Refiling this bug under "Main Distribution" in order to delete the "Martel/Mindy" entry in Bugzilla. Sorry about the pointless emails this will trigger. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 29 16:57:36 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 29 Oct 2008 12:57:36 -0400 Subject: [Biopython-dev] [Bug 1758] genbank parser chokes on /transl_except In-Reply-To: Message-ID: <200810291657.m9TGvaxs006520@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1758 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Martel/Mindy |Main Distribution ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-29 12:57 EST ------- Refiling this bug under "Main Distribution" in order to delete the "Martel/Mindy" entry in Bugzilla. Sorry about the pointless emails this will trigger. 
Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 29 16:58:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 29 Oct 2008 12:58:23 -0400 Subject: [Biopython-dev] [Bug 2072] GenBank parser breaks: LOCUS line does not contain valid sequence type (DNA, RNA, ...) In-Reply-To: Message-ID: <200810291658.m9TGwNrH006661@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2072 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Martel/Mindy |Main Distribution ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-29 12:58 EST ------- Refiling this bug under "Main Distribution" in order to delete the "Martel/Mindy" entry in Bugzilla. Sorry about the pointless emails this will trigger. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Oct 29 16:58:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 29 Oct 2008 12:58:47 -0400 Subject: [Biopython-dev] [Bug 2076] EMBL to GenBank converter should fix unterminated lines In-Reply-To: Message-ID: <200810291658.m9TGwld5006730@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2076 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Martel/Mindy |Main Distribution ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-29 12:58 EST ------- Refiling this bug under "Main Distribution" in order to delete the "Martel/Mindy" entry in Bugzilla. Sorry about the pointless emails this will trigger. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 29 16:59:03 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 29 Oct 2008 12:59:03 -0400 Subject: [Biopython-dev] [Bug 1920] Bio.Geo does not support recent GEO files In-Reply-To: Message-ID: <200810291659.m9TGx3Fx006785@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1920 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Martel/Mindy |Main Distribution ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-29 12:59 EST ------- Refiling this bug under "Main Distribution" in order to delete the "Martel/Mindy" entry in Bugzilla. Sorry about the pointless emails this will trigger. 
Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 29 16:59:30 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 29 Oct 2008 12:59:30 -0400 Subject: [Biopython-dev] [Bug 1773] Martel.Parser.ParserPositionException In-Reply-To: Message-ID: <200810291659.m9TGxUhd006827@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1773 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Martel/Mindy |Main Distribution ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-29 12:59 EST ------- Refiling this bug under "Main Distribution" in order to delete the "Martel/Mindy" entry in Bugzilla. Sorry about the pointless emails this will trigger. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Oct 29 16:59:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 29 Oct 2008 12:59:41 -0400 Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with egenix mxTextTools 3.0 In-Reply-To: Message-ID: <200810291659.m9TGxf0Z006863@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2361 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Martel/Mindy |Main Distribution ------- Comment #39 from biopython-bugzilla at maubp.freeserve.co.uk 2008-10-29 12:59 EST ------- Refiling this bug under "Main Distribution" in order to delete the "Martel/Mindy" entry in Bugzilla. Sorry about the pointless emails this will trigger. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From tiagoantao at gmail.com Thu Oct 30 21:35:31 2008 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Thu, 30 Oct 2008 21:35:31 +0000 Subject: [Biopython-dev] [BioPython] calculate F-Statistics from SNP data In-Reply-To: <6d941f120810251834q87495d5re558cf179356a8b0@mail.gmail.com> References: <5aa3b3570810160302q48df31d8h777cb760b763b77d@mail.gmail.com> <5aa3b3570810200657i4ff7ded1p5198a801ff9eccd7@mail.gmail.com> <5aa3b3570810220325g563f6a22x3f30185ae3a01b4e@mail.gmail.com> <320fb6e00810220334n6aedc5a2m7a560c25ff703917@mail.gmail.com> <6d941f120810220903s6cdc034fhec369677ac5896c9@mail.gmail.com> <5aa3b3570810221010h787c74c7h65084e05964de71d@mail.gmail.com> <6d941f120810230810k4e48c48cp5c55722a851005cf@mail.gmail.com> <5aa3b3570810230925k1eccff39kd47f022842576a46@mail.gmail.com> <6d941f120810251804o31ed44cat49b407db36a6891e@mail.gmail.com> <6d941f120810251834q87495d5re558cf179356a8b0@mail.gmail.com> Message-ID: <6d941f120810301435m7c151ad5u77def486eb24a70c@mail.gmail.com> Hi, FYI, I am going to continue this discussion to biopython-dev, as I think it makes more sense there. Especially the parts about implementation suggestions. On Sun, Oct 26, 2008 at 1:34 AM, Tiago Ant?o wrote: > I just want add on an extra comment explaining why I oppose doing an > individual object: > > I have the following questions (and others) in my mind, which I don't > know the answer. I am not looking for answers to them, I am just > trying to illustrate the difficulty of the problem. > > 1. For a certain marker, do we store the genomic position of the > marker? Some (most) statistics don't use this information. For many > species this information is not even available. But for some > statistics this information is mandatory... > 2. For a microsatellite do we store the motif and number of repeats or > the whole sequence? (see 4) > 3. If one is interested in SNPs and one has the full sequences does > one store the full sequences or just the SNPs? 
If you store just the > SNPs then you cannot do sequence based analysis in the future (say > Tajima D). If you store everything then you are consuming memory and > cpu. > 4. If one just wants to do frequency statistics (Fst), do you store > the marker or just the assign each one an ID and store the ID? It is > much cheaper to store an ID than a full sequence. > > Populations > 1. Support for landscape genetics? I mean geo-referentiation > 2. Support for hierarchical population structure? > 3. Do we cache statistics results on Population objects? > > > Let me take your class marker: > class Marker: > total_heterozygotes_count = 0 > total_population_count = 0 > total_Purines_count = 0 # this could be renamed, of course > total_Pyrimidines_count = 0 > > How would this be useful for microsatellites? Why purines, and if my > marker is a protein? If it is a SNP I want to know the nucleotide? And > if I am studying proteins and I want to have the aminoacid? > > Dont take me wrong, I have done this path. To solve my particular > problems is not very hard. To have a framework that is usable by > everybody, it is a damn hard problem. And we dont really need to solve > it (ok, it would be nice to do things to populations in general, that > I agree). But the fundamental is: read file, calculate statistics. > That doesnt need population and individual objects. > > If we end up having too many formats a consolidation step might be > needed in the future (to avoid having 10 split_in_pops). That I agree. > -- "Data always beats theories. 'Look at data three times and then come to a conclusion,' versus 'coming to a conclusion and searching for some data.' The former will win every time." 
- Matthew Simmons, http://www.tiago.org

From tiagoantao at gmail.com Thu Oct 30 23:58:57 2008
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Thu, 30 Oct 2008 23:58:57 +0000
Subject: [Biopython-dev] Statistics in population genetics module - Part I
Message-ID: <6d941f120810301658wec8678ald332abb8ddbdf80d@mail.gmail.com>

Hi,

Statistics is the most important part of population genetics modules. In fact one could say that statistics were invented FOR population genetics (check http://en.wikipedia.org/wiki/Ronald_Fisher ).
When I started to work on the population genetics module I decided to delay the statistics module a bit, in order to get experience with the whole biopython project before committing to do the most important thing.
Irrespective of whether it is possible to link scipy, now seems to be the time to advance, especially considering that Giovanni is interested in participating.
A few points need to be made before suggesting how to put statistics in Bio.PopGen:

1. Whatever design is put in should be reasonably future-proof: we should not find ourselves breaking older code a few releases from now. That should be avoided as much as possible.
2. It goes without saying that the code should be useful to everybody doing population genetics and not only the authors of Bio.PopGen: it should be possible to accommodate all kinds of markers and population structures in the future.
3. For reasons that I've partially explained on the biopython list, I don't think an OO model explicitly based on individuals or populations is good (or even necessary).
4. Any framework should be more pragmatic than anything else. I would envision a typical use case like this:
   a) read data (from a certain data source)
   b) do some basic processing (changing individuals or populations, converting markers)
   c) calculate statistics

A few comments regarding each of these points:

   a) data sources, file formats: file formats in population genetics exist in large quantities and are essentially completely ad hoc, most made in a very naive way. Good or BAD, that is what there is. The most used format (some kind of de facto standard, GenePop) can only be used for frequency-based statistics; for all the rest things are fragmented (although, if there is no population structure and the data is sequences, then standard sequence-based formats can be used - but from my experience this is a small minority).
   b) basic processing: this is the point where an OO model of individuals and populations would pay off, but I think it is not the "meat of the issue".
   c) statistics: there are statistics of every type and for every taste. If you want to have an idea of what is out there, an interesting place to look is the arlequin3 manual: http://cmpg.unibe.ch/software/arlequin3/arlequin31.pdf (part of the manual is UI description, but especially starting at page 89 - the table there is a good overview - there are descriptions of the overall panorama).

With time, and after at least 3 failed attempts to think in terms of individuals/populations, I started to crystallize around a model centered on types of statistics. This model still ends up with models of populations and individuals; they are there, just implicit and not unified: different kinds of statistics have different implicit models.
The model that I would like to propose, centered around statistics, will be the subject of my next email (which I will send in the next couple of days - still under design and lost sleep). I might split it in 2 parts (concepts and suggestions for implementation).
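
[Editorial sketch] The read / process / calculate use case described in the message above can be illustrated without any individual or population classes, along these lines. Everything here is hypothetical and made up for the example: the three-column "population allele1 allele2" input, the "0" missing-data code, and the function names are not Bio.PopGen APIs and not the GenePop format. It is meant only to show how far plain dictionaries and lists go for frequency-based statistics.

```python
from collections import Counter


def read_genotypes(lines):
    """Step a: parse 'population allele1 allele2' lines (a made-up format)."""
    populations = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        pop, allele1, allele2 = fields
        populations.setdefault(pop, []).append((allele1, allele2))
    return populations


def drop_missing(populations, missing="0"):
    """Step b: basic processing, here removing genotypes with a missing allele."""
    return dict((pop, [g for g in genotypes if missing not in g])
                for pop, genotypes in populations.items())


def allele_frequencies(genotypes):
    """Step c: a frequency-based statistic for one population at one locus."""
    counts = Counter()
    for allele1, allele2 in genotypes:
        counts[allele1] += 1
        counts[allele2] += 1
    total = float(sum(counts.values()))
    return dict((allele, n / total) for allele, n in counts.items())


data = ["pop1 A A", "pop1 A B", "pop1 0 B", "pop2 B B"]
pops = drop_missing(read_genotypes(data))
freqs = dict((pop, allele_frequencies(g)) for pop, g in pops.items())
print(freqs)
```

The population structure is still present (the dictionary keys), but only implicitly, which is the trade-off the message argues for.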
From bsouthey at gmail.com Fri Oct 31 02:18:56 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Thu, 30 Oct 2008 21:18:56 -0500
Subject: [Biopython-dev] Statistics in population genetics module - Part I
In-Reply-To: <6d941f120810301658wec8678ald332abb8ddbdf80d@mail.gmail.com>
References: <6d941f120810301658wec8678ald332abb8ddbdf80d@mail.gmail.com>
Message-ID:

Hi,
Can you please be more specific, especially in terms of:
What statistics do you want to compute?
What type of data?

Obviously these are rather interdependent.

In my experience, the statistic and the data type really dictate how to proceed. Typically you start with pedigree and data files, then add more files for genetic markers (often chromosome specific) etc. Each requires a specific format and appropriate links between them. Again, this really depends on what you want to calculate and how you do it.

You will probably find that an object-oriented approach with individuals, families, populations, models, data types etc. may actually be helpful and necessary depending on what you want to do. This really helped me with QTL mapping code, especially the overall design, because it makes you think exactly where things should go, and that was far more important than the actual coding. While some of it is implicit, separating out some components will be necessary, especially for getting population-based statistics for data values recorded on individuals.

Bruce

On Thu, Oct 30, 2008 at 6:58 PM, Tiago Antão wrote:
> Hi,
>
> Statistics is the most important part of population genetics modules.
> In fact one could say that statistics where invented FOR population
> genetics (check http://en.wikipedia.org/wiki/Ronald_Fisher ).
> When I started to work on the population genetics module I decided to
> delay the statistics module a bit, in order to get experience with the
> whole biopython project before committing to do the most important
> thing.
> Irrespective of it is possible or not to link scipy or not, now seems > to be the time to advance, especially considering that Giovanni is > interested in participating. > A few of points need to be said before suggesting on how to put > statistics in Bio.PopGen > > 1. Whatever design is put in, it should be reasonably future proof: in > a few releases it should not be a good idea to break older code. That > should be avoided in as much as possible. > 2. It goes without saying that the code should be useful to everybody > doing population genetics and not only the authors of Bio.PopGen: all > kinds of markers and population structures should be accommodatable in > the future . > 3. For reasons that I've partially explained on the biopython list, I > don't think a OO model explicitly based on individuals or populations > e good (or even necessary) > 4. Any framework should be more pragmatic than anything else. I would > envision a typical use case like this > a) read data (from a certain data source) > b) Do some basic processing (changing individuals or populations, > converting markers) > c) calculate statistics > A few comments regarding each of these points: > a) data sources, file formats: file formats in population > genetics exist in large quantities and are essencialy completely > ad-hoc, most made in a very naive way. Good or BAD, that is what there > is. The most used format (some kind of de facto standard, GenePop) can > only be used for frequency-based statistics, for all the rest things > are fragmented (although, if there are no population structure and the > data is sequences than standard sequence based formats can be used - > but from my experience this is a small minority) > b) basic processing: This is the point where a OO model of > individuals and populations would pay, but I think it is not the "meat > of the issue" > c) statistics: there are of every type and for every taste. 
If > you want to have an idea of what is out there an interesting place to > look at is the arlequin3 manual: > http://cmpg.unibe.ch/software/arlequin3/arlequin31.pdf > (part of the manual is UI description, but especially starting at page > 89 - the table there is a good overview - there are descriptions of > the overall panorama). > > With time, and after at least 3 failed attempts to think in terms of > individuals/populations I started to cristalize around a model > centered on types of statistics. This model ends up actually having > implicit models of populations and individuals, and that is, in fact, > there. It is just implicit and not unified: different kinds of > statistics have different implicit models. > The model that I would like to propose, centered around statistics, > will be the subject of my next email (which I will send in the next > couple of days - still under design and lost sleep). I might split it > in 2 parts (concepts and suggestions for implementation). > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From tiagoantao at gmail.com Fri Oct 31 10:03:28 2008 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Fri, 31 Oct 2008 10:03:28 +0000 Subject: [Biopython-dev] Statistics in population genetics module - Part I In-Reply-To: References: <6d941f120810301658wec8678ald332abb8ddbdf80d@mail.gmail.com> Message-ID: <6d941f120810310303q439c1225r26511944066ab49@mail.gmail.com> On Fri, Oct 31, 2008 at 2:18 AM, Bruce Southey wrote: > Can you please be more specific especially in terms of: > What statistics do you want to compute? > What type of data ? > > Obviously these are rather interdependent. I want a framework that can accommodate all statistics and all types of data (this will be subject of my next email). 
I personally am concerned for now with F statistics, allelic diversity, expected heterozygosity and such, i.e., frequency-based statistics. To put it another way: marker-independent. A great deal of the studies in population genetics are actually frequency based.
But I don't want a particular view of the world (mine or anyone else's) to dictate the end result. My expectation is that in a few weeks the statistics above will be in biopython (they are already implemented in functioning code), and that this will not impair the ability to continue in other directions (marker-dependent statistics, genome-wide statistics).

> In my experience, the statistic and the data type really dictate how
> to proceed. Typically you start with pedigree and data files then add
> more files for genetic markers (often chromosome specific) etc. Each
> requires a specific format and appropriate links between them. Again
> this really depends on what you want to calculate and how you do it.

I think the key point is precisely that diversity of statistics and data types, and how they drive the whole thing. I have also found that different people do completely different things: from people working with humans with lots of data and money, to people with model species, to people working in conservation of endangered species. Some people have thousands of markers and lots of individuals, others have 10 individuals and 20 markers ("poor-man" markers like microsatellites). Not to mention population and landscape genetics statistics, or hierarchical population structure, or new sequencing methods and the creative uses that we are starting to see with them.

> You will probably find that object orientated approach with
> individuals, families, populations, models and data type etc. may
> actually be helpful and necessary depending on what you want to do.

I've tried to implement several OO frameworks with these kinds of relations and they all failed. They fail precisely because of the immense diversity of statistics, data formats and use cases. I always ended up trashing everything because of a use case or statistic that would render the model awkward or useless. It is bad over-engineering. Correcting things is not bad, but in biopython we don't want to break interfaces in every release. Even if there is a good, future-proof model, it will always be a poor fit in some situations or have performance problems (performance is becoming a more serious issue every day).
I think the first approach is to think: let's do OO with populations, individuals, and so on. But experience in trying to do that will lower the expectations of what can be delivered.

> This it really help me with QTL mapping code especially the overall
> design because you makes think exactly where things should go and that
> was far more important than the actual coding. While some of it is
> implicit, separating out some components will be necessary especially
> getting population-based statistics for data values recorded on
> individuals.

Getting a correct, future-proof design using concepts like individuals and populations is above my pay grade. And I believe it is above the pay grade of 100% of the people that I know in this area. I think there is no need for it anyway. I will try to write about this in the next part of my emails.
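
[Editorial sketch] As an illustration of what "frequency-based, marker-independent" means for the statistics named above, expected heterozygosity and a simple fixation index can be computed from allele frequencies alone. This sketch makes several assumptions that are not from the thread: it uses the textbook definitions He = 1 - sum(p_i^2) and Fst = (Ht - Hs) / Ht, averages populations with equal weight (i.e. it ignores sample sizes and applies no bias correction), and all function names are hypothetical. It is not a description of what Bio.PopGen implements.

```python
def expected_heterozygosity(freqs):
    """He = 1 - sum of squared allele frequencies (freqs maps allele -> frequency)."""
    return 1.0 - sum(p * p for p in freqs.values())


def mean_frequencies(pop_freqs):
    """Unweighted mean allele frequencies across populations (assumes equal sample sizes)."""
    alleles = set()
    for freqs in pop_freqs:
        alleles.update(freqs)
    n = float(len(pop_freqs))
    return dict((a, sum(f.get(a, 0.0) for f in pop_freqs) / n) for a in alleles)


def fst(pop_freqs):
    """Fst = (Ht - Hs) / Ht, with Hs the mean within-population He."""
    hs = sum(expected_heterozygosity(f) for f in pop_freqs) / float(len(pop_freqs))
    ht = expected_heterozygosity(mean_frequencies(pop_freqs))
    if ht == 0.0:
        return 0.0  # a single fixed allele everywhere: no diversity to partition
    return (ht - hs) / ht


# Two populations fixed for different alleles are completely differentiated;
# two populations with identical frequencies are not differentiated at all.
print(fst([{"A": 1.0}, {"B": 1.0}]))
print(fst([{"A": 0.5, "B": 0.5}, {"A": 0.5, "B": 0.5}]))
```

Nothing in these functions depends on whether the alleles are SNP bases, microsatellite repeat counts or arbitrary IDs, which is the point being made about marker independence.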