From bugzilla-daemon at portal.open-bio.org Sat Oct 2 22:51:14 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 2 Oct 2010 22:51:14 -0400
Subject: [Biopython-dev] [Bug 2608] Gcc "differ in signedness" warnings with trie.c
Message-ID: <201010030251.o932pEUM020278@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2608

mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2010-10-02 22:51 EST -------
The problem here was that strdup is not an ANSI C function, and its implementation differs between platforms. Replacing strdup removes the need for unsigned chars.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Sun Oct 3 09:51:50 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 3 Oct 2010 09:51:50 -0400
Subject: [Biopython-dev] [Bug 2938] Bio.Entrez.read() returns empty string for HTML (not an error)
Message-ID: <201010031351.o93DpoGZ023133@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2938

mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp 2010-10-03 09:51 EST -------
(In reply to comment #7)
> Does the current funny XML file have anything useful in it?

Yes, but I doubt that many people (if any) are using the Journals database. If they do, we could make a straightforward parser for the plain-text output from the Journals database, which is supported by NCBI.
See this discussion on the mailing list:
http://lists.open-bio.org/pipermail/biopython-dev/2010-September/008239.html

To resolve this bug, I have modified the parser such that an error is raised whenever the XML data do not start with the XML declaration (
References: <486264729.08793@eyou.net>

On Tue, Oct 5, 2010 at 8:45 AM, Yong wrote:
> Hello everyone,
>
> I am testing a database and its web interface
> (http://pbl.neau.edu.cn:8080/) established with Plone4Bio, BioPython and
> BioSQL, when query database from webpage it always return the default date
> for sequence: "01-JAN-1980".
>
> I found that the error happened here in file Bio::SeqIO::InsdcIO.py (lines:
> 366-371) of BioPython:
>
>     def _get_date(self, record) :
>         default = "01-JAN-1980"
>         try :
>             date = record.annotations["date"]
>         except KeyError :
>             return default
>
> It looks like that it does not have "date" key, is it a bug of BioPython or
> Plone4Bio? anybody know how to solve it?

Hi

As I recall, reading/writing a GenBank file with Bio.SeqIO (note single dot in Python, two colons is Perl - grin), the date is preserved. I think the problem is in Biopython loading/retrieving a GenBank file in BioSQL, and I thought there was a bug open on this...

I can probably suggest a hack in the Plone4Bio code, but it would be better to tweak Biopython.

Peter

From bugzilla-daemon at portal.open-bio.org Tue Oct 5 05:16:05 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 5 Oct 2010 05:16:05 -0400
Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements
Message-ID: <201010050916.o959G53F031667@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2681

------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2010-10-05 05:16 EST -------
(In reply to comment #4)
> (In reply to comment #2)
> > (In reply to comment #0)
> > > 1) Fixed date/dates typo.
> >
> > Why is it a typo? Change not checked in.
>
> The function _load_bioentry_date in Loader.py inserts the annotation 'date',
> if present, or the current date if not, into the bioentry_qualifier_value
> table. This is pulled by BioSeq.py _retrieve_qualifier_value and stored as
> the attribute 'dates'. Hence I considered line 307 in BioSeq.py to be a typo,
> which should be 'date' and not 'dates'. Also, because Loader.py handles dates
> separately, they should not be handled by the function load_annotations.

I'd forgotten about this issue - I was just reminded by a query on the Plone4Bio mailing list. Yes, I think you are right:

http://github.com/biopython/biopython/commit/6aca2c0dbc17a172e76483d925248184080bb654

Thanks!

From biopython at maubp.freeserve.co.uk Tue Oct 5 05:18:16 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 5 Oct 2010 10:18:16 +0100
Subject: [Biopython-dev] [P4b] Always return default date: "01-JAN-1980"
References: <486264729.08793@eyou.net>

On Tue, Oct 5, 2010 at 10:00 AM, Peter wrote:
> On Tue, Oct 5, 2010 at 8:45 AM, Yong wrote:
>> Hello everyone,
>>
>> I am testing a database and its web interface
>> (http://pbl.neau.edu.cn:8080/) established with Plone4Bio, BioPython and
>> BioSQL, when query database from webpage it always return the default date
>> for sequence: "01-JAN-1980".
>>
>> I found that the error happened here in file Bio::SeqIO::InsdcIO.py (lines:
>> 366-371) of BioPython:
>>
>>     def _get_date(self, record) :
>>         default = "01-JAN-1980"
>>         try :
>>             date = record.annotations["date"]
>>         except KeyError :
>>             return default
>>
>> It looks like that it does not have "date" key, is it a bug of BioPython or
>> Plone4Bio? anybody know how to solve it?
>
> Hi
>
> As I recall, reading/writing a GenBank file with Bio.SeqIO (note single
> dot in Python, two colons is Perl - grin), the date is preserved. I think
> the problem is in Biopython loading/retrieving a GenBank file in BioSQL,
> and I thought there was a bug open on this...
>
> I can probably suggest a hack in the Plone4Bio code, but it would
> be better to tweak Biopython.
>
> Peter

Hi Yong,

I found the open Biopython bug report I was thinking of, and committed a fix:
http://bugzilla.open-bio.org/show_bug.cgi?id=2681#c9

Are you able to update your copy of Biopython to the latest source code to test this fix?

Thanks,

Peter

From biopython at maubp.freeserve.co.uk Mon Oct 11 04:53:45 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 11 Oct 2010 09:53:45 +0100
Subject: [Biopython-dev] Continuous integration

2010/9/28 Tiago Antão:
> Hi,
>
> I've been playing with buildbot a bit (for continuous integration
> stuff). I am creating a page on the wiki with some info on that front.
>
> This is just concept/exploratory stuff: if people don't like it, it is
> just a question to delete the page. Hopefully this will at least
> permit to see if continuous integration is worthwhile the effort and
> if buildbot is a good platform for Biopython.
>
> Any comments most welcome. I expect to have a working
> prototype very soon. If people don't like it, I just trash it (no
> problems with
> that).
>
> Tiago

I see from your notes on the wiki you have been making good progress. I have a couple of queries/ideas:

(1) Several of our tests go online to the NCBI or UniProt etc. These tests can and do fail sometimes due to network issues. Also, having some/many buildbot slaves running on a regular basis (once a week? once a day?) would add up and this load may be unwelcome. Perhaps we need to add an -offline flag to run_tests.py which can skip any online tests?
(2) You mention buildbot doesn't have built-in support for spotting changes in a git repository - but can it do this for SVN? Since github.com also allows access to the git repo via svn that might be a more elegant workaround.

(3) Does the buildbot master require the buildbot slaves be online most/all of the time? Would a desktop machine which is typically only on during office hours on week days still be useful? I could probably answer this myself with a bit more background reading ;)

Thanks

Peter

From tiagoantao at gmail.com Mon Oct 11 08:21:01 2010
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 11 Oct 2010 13:21:01 +0100
Subject: [Biopython-dev] Continuous integration

Hi Peter,

2010/10/11 Peter:
> (1) Several of our tests go online to the NCBI or UniProt etc.
> These tests can and do fail sometimes due to network issues.
> Also, having some/many buildbot slaves running on a regular
> basis (once a week? once a day?) would add up and this
> load may be unwelcome. Perhaps we need to add an -offline
> flag to run_tests.py which can skip any online tests?

That might be a good idea (to have an --offline flag, I mean). A very good idea, indeed.

I would like to put the infrastructure in place (if people are interested in going ahead with this...), but after that we need to stabilize a test policy and that will mean answering questions like that.

As far as I see we will have many builders (tests under different conditions). Say 5 different Python versions (Jython included), at least 3 OSes. This is already 15 builders. This can easily creep up. Though the numbers are high, it is quite easy to maintain all this stuff: 3 volunteer machines (one for each OS) are enough. The cool thing about buildbot is that it is designed for volunteer machines to be added, so you can start your buildbot slave on your laptop when you are idle. It does not need an array of servers on demand to produce the tests.
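Tiago's arithmetic (5 Python versions times 3 operating systems gives 15 builders), together with the split he suggests next (network tests weekly on one Python version per OS, offline dailies everywhere else), can be sketched in plain Python. This is not a buildbot configuration: the version and OS labels, and the choice of which Python runs the network tests, are illustrative assumptions.

```python
from itertools import product

# Illustrative labels only: the thread says "say 5 different Python versions
# (Jython included), at least 3 OSes" without naming them.
PYTHON_VERSIONS = ["python2.4", "python2.5", "python2.6", "python2.7", "jython2.5"]
OPERATING_SYSTEMS = ["linux", "macosx", "windows"]
# Assumed choice of the single Python version per OS that runs network tests.
NETWORK_PYTHON = "python2.7"


def all_builders():
    """One builder name per (OS, Python version) combination."""
    return ["%s-%s" % (os_name, py)
            for os_name, py in product(OPERATING_SYSTEMS, PYTHON_VERSIONS)]


def schedule(kind):
    """Return (builder, run_online_tests) pairs for a 'daily' or 'weekly' run.

    Dailies use every builder but skip online tests; the weekly run uses one
    builder per OS and includes the tests that contact NCBI, UniProt, etc.
    """
    if kind == "daily":
        return [(name, False) for name in all_builders()]
    if kind == "weekly":
        return [("%s-%s" % (os_name, NETWORK_PYTHON), True)
                for os_name in OPERATING_SYSTEMS]
    raise ValueError("unknown schedule: %r" % kind)


print(len(all_builders()))  # 15
print(schedule("weekly"))
# [('linux-python2.7', True), ('macosx-python2.7', True), ('windows-python2.7', True)]
```

Under this split only 3 builders run the online tests each week, instead of all 15 every day.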
NCBI and Uniprot might not like to see 30 daily connections for tests :( . So we might need to have, say, one weekly test for each OS doing the network stuff (just a single Python version per OS, maybe) and dailies not doing network loads.

> (2) You mention buildbot doesn't have built-in support for
> spotting changes in a git repository - but can it do this for
> SVN? Since github.com also allows access to the git repo
> via svn that might be a more elegant workaround.

There are 2 different things to consider:
1. Spotting the git repository. There is no builtin support, but this is TRIVIAL nonetheless with the general adaptor of buildbot. It works like this:
    a. a developer does a push
    b. github has a hook system which allows for reporting a change to the repository to a certain URL/CGI. Fully automated, transparent to the developer.
    c. We supply a CGI that receives the event and informs buildbot. There are CGIs for github. We just have to stuff one in a webserver.
2. The slaves/builders have to download github code. In this case, buildbot HAS NATIVE SUPPORT.

> (3) Does the buildbot master require the buildbot slaves
> be online most/all of the time? Would a desktop machine
> which is typically only on during office hours on week
> days still be useful? I could probably answer this myself
> with a bit more background reading ;)

That is one of the wonders of buildbot. Just the server needs to be online. You can indeed have a desktop machine: whenever it suits you better you start your buildbot slave, it connects to the server to see if there is work to do and the server supplies work to be done. The server can be instructed to only allow the slave to do a single task at a time (to avoid overloading the slave).

I am now at a stage where I really need a server to test (with a public address). I would volunteer to do the installation myself, but I would need shell access to a machine where I could run a server process.
No root access is needed, but a web server is not enough as buildbot is Twisted-based. Maybe we can convince the OBF to help. Again, I can volunteer to do the installation. No root access is needed, just the ability to run a server process and a couple of open ports.

Tiago

From biopython at maubp.freeserve.co.uk Mon Oct 11 09:05:40 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 11 Oct 2010 14:05:40 +0100
Subject: [Biopython-dev] Continuous integration

2010/10/11 Tiago Antão:
> Hi Peter,
>
> 2010/10/11 Peter:
>> (1) Several of our tests go online to the NCBI or UniProt etc.
>> These tests can and do fail sometimes due to network issues.
>> Also, having some/many buildbot slaves running on a regular
>> basis (once a week? once a day?) would add up and this
>> load may be unwelcome. Perhaps we need to add an -offline
>> flag to run_tests.py which can skip any online tests?
>
> That might be a good idea (to have an --offline flag, I mean). A very
> good idea, indeed.
>
> I would like to put the infrastructure in place (if people are
> interested in going ahead with this...), but after that we need to
> stabilize a test policy and that will mean answering questions like
> that.
>
> As far as I see we will have many builders (tests under different
> conditions). Say 5 different Python versions (Jython included), at
> least 3 OSes. This is already 15 builders. This can easily creep up.
> Though the numbers are high, it is quite easy to maintain all this
> stuff: 3 volunteer machines (one for each OS) are enough. The cool
> thing about buildbot is that it is designed for volunteer machines to
> be added, so you can start your buildbot slave on your laptop when you
> are idle. It does not need an array of servers on demand to produce
> the tests.
>
> NCBI and Uniprot might not like to see 30 daily connections for tests
> :( .
> So we might need to have, say, one weekly test for each OS doing
> the network stuff (just a single Python version per OS, maybe) and
> dailies not doing network loads.

Exactly.

>> (2) You mention buildbot doesn't have built-in support for
>> spotting changes in a git repository - but can it do this for
>> SVN? Since github.com also allows access to the git repo
>> via svn that might be a more elegant workaround.
>
> There are 2 different things to consider:
> 1. Spotting the git repository. There is no builtin support, but this
> is TRIVIAL nonetheless with the general adaptor of buildbot. It works
> like this:
>     a. a developer does a push
>     b. github has a hook system which allows for reporting a change
> to the repository to a certain URL/CGI. Fully automated, transparent
> to the developer.
>     c. We supply a CGI that receives the event and informs buildbot.
> There are CGIs for github. We just have to stuff one in a webserver.

Do you even need a post-commit hook? Unless you want to automatically run the tests after every commit (which might be useful) wouldn't it be enough to do a daily checkout?

> 2. The slaves/builders have to download github code. In this case,
> buildbot HAS NATIVE SUPPORT.

Understood.

>> (3) Does the buildbot master require the buildbot slaves
>> be online most/all of the time? Would a desktop machine
>> which is typically only on during office hours on week
>> days still be useful? I could probably answer this myself
>> with a bit more background reading ;)
>
> That is one of the wonders of buildbot. Just the server needs to be online.
> You can indeed have a desktop machine: Whenever it suits you better
> you start your buildbot slave, it connects to the server to see if
> there is work to do and the server supplies work to be done. The
> server can be instructed to only allow the slave to do a single task
> at a time (to avoid overloading the slave).

Excellent.
I guess the specifics of starting the buildbot slave will be OS specific, thus it would be up to the machine owner if this should happen automatically at login or not.

> I am now at a stage where I really need a server to test (with a public
> address). I would volunteer to do the installation myself, but I would
> need shell access to a machine where I could run a server process. No
> root access is needed, but a web server is not enough as buildbot is
> Twisted-based. Maybe we can convince the OBF to help. Again, I can
> volunteer to do the installation. No root access is needed, just the
> ability to run a server process and a couple of open ports.

I think we should have a word with the OBF as running this on one of their servers does seem the best plan. I'll email you.

Peter

From tiagoantao at gmail.com Mon Oct 11 09:23:15 2010
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 11 Oct 2010 14:23:15 +0100
Subject: [Biopython-dev] Continuous integration

2010/10/11 Peter:
> Do you even need a post-commit hook? Unless you want
> to automatically run the tests after every commit (which might
> be useful) wouldn't it be enough to do a daily checkout?

I have actually been doing this for my tests: a daily checkout. So we do not need the hook. I would go with the simpler solution for now: ignore the post-commit hook, get something useful working (maybe a nightly build) and in the future we might revisit this when things are better understood and tested.
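The --offline switch for run_tests.py discussed in this thread was only a proposal at this point. A minimal sketch of how such a flag could filter the list of test modules, assuming a hand-maintained set of modules that need network access; the set contents and the functions select_tests and main are hypothetical, not Biopython's actual run_tests.py:

```python
import argparse

# Hypothetical: test modules that talk to NCBI, UniProt, etc.
ONLINE_TESTS = {"test_Entrez_online", "test_NCBI_qblast", "test_SeqIO_online"}


def select_tests(all_tests, offline):
    """Drop tests that need network access when running offline."""
    if not offline:
        return list(all_tests)
    return [name for name in all_tests if name not in ONLINE_TESTS]


def main(argv=None):
    """Parse command-line options and return the test modules to run."""
    parser = argparse.ArgumentParser(description="Run the test suite.")
    parser.add_argument("--offline", action="store_true",
                        help="skip tests that need network access")
    parser.add_argument("tests", nargs="*", help="test modules to run")
    args = parser.parse_args(argv)
    return select_tests(args.tests, args.offline)
```

For example, main(["--offline", "test_SeqIO", "test_Entrez_online"]) returns only ["test_SeqIO"], which is how a daily buildbot run could avoid loading remote servers while a weekly run omits the flag.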
From bugzilla-daemon at portal.open-bio.org Mon Oct 18 07:16:49 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 18 Oct 2010 07:16:49 -0400
Subject: [Biopython-dev] [Bug 3146] New: DSSP ungraceful failure

http://bugzilla.open-bio.org/show_bug.cgi?id=3146

           Summary: DSSP ungraceful failure
           Product: Biopython
           Version: 1.53
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: minor
          Priority: P4
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: patrick.winters at gmail.com

The DSSP annotator should probably fail gracefully when the PDBParser and DSSP disagree about the existence of a residue at a certain position. Here, DSSP reports values for residue 115 of chain A, while the PDBParser raises a KeyError.

from Bio.PDB import PDBParser
parser = PDBParser()
from Bio.PDB.DSSP import DSSP
structure=parser.get_structure("2p0i","pdb2p0i.ent")
model=structure[0]
dssp=DSSP(model, "pdb2p0i.ent")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.6/Bio/PDB/DSSP.py", line 175, in __init__
    res=chain[res_id]
  File "/usr/lib/pymodules/python2.6/Bio/PDB/Chain.py", line 71, in __getitem__
    return Entity.__getitem__(self, id)
  File "/usr/lib/pymodules/python2.6/Bio/PDB/Entity.py", line 38, in __getitem__
    return self.child_dict[id]
KeyError: (' ', 115, ' ')
>>> model['A'][114]
>>> model['A'][116]
>>> model['A'][115]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.6/Bio/PDB/Chain.py", line 71, in __getitem__
    return Entity.__getitem__(self, id)
  File "/usr/lib/pymodules/python2.6/Bio/PDB/Entity.py", line 38, in __getitem__
    return self.child_dict[id]
KeyError: (' ', 115, ' ')
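Failing gracefully here could mean skipping, with a warning, any residue that DSSP reports but the parsed structure lacks, instead of letting the KeyError escape from the chain lookup. A minimal sketch using plain dictionaries in place of Bio.PDB Chain and DSSP objects; the function name, the (residue id, secondary structure) record layout and the sample data are made-up assumptions for illustration, not the Bio.PDB.DSSP API:

```python
import warnings


def match_dssp_to_chain(chain_residues, dssp_records):
    """Pair DSSP records with parsed residues, skipping any that are missing.

    chain_residues: dict mapping residue ids such as (' ', 115, ' ') to residues.
    dssp_records: iterable of (residue_id, secondary_structure) pairs.
    Returns a dict of residue_id -> (residue, secondary_structure).
    """
    matched = {}
    skipped = []
    for res_id, sec_struct in dssp_records:
        try:
            residue = chain_residues[res_id]
        except KeyError:
            # The parser and DSSP disagree: remember it and carry on.
            skipped.append(res_id)
            continue
        matched[res_id] = (residue, sec_struct)
    if skipped:
        warnings.warn("DSSP residues not found in the structure were skipped: %r"
                      % (skipped,))
    return matched


# Made-up data mirroring the report: residue 115 is missing from the chain.
chain = {(" ", 114, " "): "res114", (" ", 116, " "): "res116"}
dssp = [((" ", 114, " "), "H"), ((" ", 115, " "), "H"), ((" ", 116, " "), "-")]
result = match_dssp_to_chain(chain, dssp)
print(sorted(result))  # [(' ', 114, ' '), (' ', 116, ' ')]
```

Collecting the skipped ids and warning once keeps the output short when many residues disagree.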
From barwil at gmail.com Tue Oct 19 08:34:45 2010
From: barwil at gmail.com (Bartek Wilczynski)
Date: Tue, 19 Oct 2010 14:34:45 +0200
Subject: [Biopython-dev] Moving Bio.Motif documentation into Tutorial.tex

Hi,

I've started to look into merging Bio.Motif docs with the Tutorial. I have a few questions:

- First, I need to find a good place in the tutorial to put it.
    One possibility is to make a separate chapter for it, another option is to put it as a subchapter in chapter 15 (cookbook).
    I think it would be better to make it a separate chapter, similar to the ones discussing Bio.PopGen or Bio.Phylo, so I thought it would make sense to create it as a new chapter 13, entitled "Sequence motif analysis with Bio.Motif".
- Second, I have links and references to papers in there. The question would be: should I remove those to keep to the style of the tutorial?

Any thoughts are welcome.

Bartek

On Sat, Sep 18, 2010 at 3:04 PM, Bartek Wilczynski wrote:
> Hi,
>
> On Sat, Sep 18, 2010 at 2:25 PM, Peter wrote:
>
>> Hi Bartek,
>>
>> I think it would be good to try and move your Bio.Motif
>> documentation from file Docs/cookbook/motif/motif.tex
>> into the main Docs/Tutorial.tex as a new chapter.
>> Currently it isn't obvious that Biopython supports
>> things like a Position Weight Matrix (PWM).
>>
>> What do you think?
>>
>> The text will need a slight update since we have now
>> deprecated and removed Bio.AlignAce and Bio.MEME,
>> but that should be easy.
>>
>
> In general, I'm all for it. It's just that right now is not necessarily the
> best time for me to put much work into it. I'm trying to meet a RECOMB
> deadline of Oct. 8th with a paper, so if it would not be a problem, I could
> update it to the current state of the API after that. On the other hand, if
> there's anybody who wants to do it before then, I can review the changes
> even earlier.
>
> thanks for remembering about it.
>
> Bartek
>

--
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1, 69012 Heidelberg, Germany
tel: +49 6221 387 8433

From biopython at maubp.freeserve.co.uk Tue Oct 19 08:45:47 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 19 Oct 2010 13:45:47 +0100
Subject: [Biopython-dev] Moving Bio.Motif documentation into Tutorial.tex

On Tue, Oct 19, 2010 at 1:34 PM, Bartek Wilczynski wrote:
> Hi,
>
> I've started to look into merging Bio.Motif docs with the Tutorial. I have a
> few questions:
> - First, I need to find a good place in the tutorial to put it.
>     One possibility is to make a separate chapter for it, another option is
> to put it as a subchapter in chapter 15 (cookbook).
>     I think it would be better to make it a separate chapter, similar to
> the ones discussing Bio.PopGen or Bio.Phylo, so I thought it would make
> sense to create it as a new chapter 13, entitled "Sequence motif analysis
> with Bio.Motif".

I agree, create a new chapter (and add yourself to the authors list). I'd definitely put it before the "Cookbook" chapter, and between the Phylogenetics and "Supervised learning methods" chapters seems reasonable.

> - Second, I have links and references to papers in there. The question would
> be: should I remove those to keep to the style of the tutorial?

Keep them - links to external webpages are fine - they work well in both PDF and HTML. For references we currently don't have a formal bibliography - but we do have some existing cases of links to papers already, e.g.
http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec:SeqIO-fastq-conversion

Peter

From bugzilla-daemon at portal.open-bio.org Tue Oct 19 10:17:22 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 19 Oct 2010 10:17:22 -0400
Subject: [Biopython-dev] [Bug 3026] Bio.SeqIO.InsdcIO._split_multi_line(): Your description cannot be broken into nice lines!
Message-ID: <201010191417.o9JEHM0x029641@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3026

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2010-10-19 10:17 EST -------
(In reply to comment #4)
> I do not know what I would like to happen here in addition to the improved
> error message. Probably not get an error at all and have biopython able to
> cope with these cases as well. I have just asked asimpson at ludwig.org.br
> whether fix of the data in dbEST would be feasible.

The plain text GenBank file from the NCBI is fine (see comment 1), but the HTML version is not. I don't think this is really a problem with the raw data...

Anyway, I've just committed a fix which means Biopython will write an over-long line and issue a warning:
http://github.com/biopython/biopython/commit/f25ccef1e07129a377954021e08e980b82b6e795

Marking as fixed.

From biopython at maubp.freeserve.co.uk Tue Oct 19 11:54:43 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 19 Oct 2010 16:54:43 +0100
Subject: [Biopython-dev] Merging Uniprot XML parser?

Hi all,

I've fixed a few issues I felt were holding up merging Andrea's UniProt XML parser.
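The Bug 3026 fix described above (write the over-long line and warn, rather than raise an error) can be illustrated with a simplified word-wrapper. This is a sketch under assumptions: the function name and signature are hypothetical and it is not the actual Bio.SeqIO.InsdcIO._split_multi_line code. It also uses a fixed warning text, in the spirit of Peter's "same text each time" suggestion in the message that follows.

```python
import warnings


def split_multi_line(text, max_len):
    """Word-wrap text into lines of at most max_len characters.

    A single word longer than max_len (such as a long URL) cannot be broken
    nicely, so it is written on its own over-long line and a warning issued,
    instead of raising an exception.
    """
    lines = []
    current = ""
    for word in text.split():
        if len(word) > max_len:
            # Fixed message text, so Python's default warning filter shows
            # it only once per source location rather than once per word.
            warnings.warn("Cannot nicely wrap a word longer than the line width")
        if not current:
            current = word
        elif len(current) + 1 + len(word) <= max_len:
            current += " " + word
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


print(split_multi_line("aa bb cc", 5))  # ['aa bb', 'cc']
```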
I've now tested that uniprot_sprot.txt and uniprot_sprot.xml are parsed into more or less equivalent objects, and that these can be written out as GenBank (well, GenPept) files or as EMBL/IMGT files (given recent work to support protein EMBL files - which do exist but are rarely used).

This required "fixing" Bug 3026 to cope with long annotation that cannot be line wrapped nicely (lots of long URL strings in UniProt XML comments).
http://bugzilla.open-bio.org/show_bug.cgi?id=3026

I'm tempted to remove the warning because it is so common... or make it use the same text each time so you get warned once.

There are also some additions to the Bio.SeqFeature position classes, since SwissProt/UniProt files can have uncertain positions.

Could someone take a look at the code here (a rebased branch), as I'd like some independent testing (and better yet, code review):
http://github.com/peterjc/biopython/tree/uniprot

Thanks,

Peter

From eric.talevich at gmail.com Tue Oct 19 22:01:20 2010
From: eric.talevich at gmail.com (Eric Talevich)
Date: Tue, 19 Oct 2010 22:01:20 -0400
Subject: [Biopython-dev] Bio.PDB on Python 3

On Mon, Aug 16, 2010 at 9:47 AM, Peter wrote:
> Hi all,
>
> A while back I installed NumPy from their svn under Python 3, so that I
> could test more of Biopython. I hadn't really looked at Bio.PDB until
> recently because test_PDB.py depended on Bio.KDTree which needs
> some C code to be compiled (which we haven't tried yet).
>
[...]
>
> This has revealed there are at least two issues with Bio.PDB to be
> addressed (see below).
>
[...]
>
> ======================================================================
> ERROR: test_ExposureCN (__main__.Exposure)
> HSExposureCN.
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "test_PDB.py", line 612, in setUp
>     structure=PDBParser(PERMISSIVE=True).get_structure('X', pdb_filename)
>   File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py",
> line 64, in get_structure
>     self._parse(file.readlines())
>   File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py",
> line 84, in _parse
>     self.trailer=self._parse_coordinates(coords_trailer)
>   File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py",
> line 200, in _parse_coordinates
>     fullname, serial_number, element)
>   File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py",
> line 185, in init_atom
>     duplicate_atom=residue[name]
> TypeError: 'DisorderedResidue' object is not subscriptable
>

These errors occur when parsing Tests/PDB/a_structure.pdb under permissive mode. In this structure, residue 3 is disordered, and that triggers some exciting things. The bug seems to be related to this method of DisorderedEntityWrapper in Bio/PDB/Entity.py:

    def __getattr__(self, method):
        "Forward the method call to the selected child."
        if not hasattr(self, 'selected_child'):
            # Avoid problems with pickling
            # Unpickling goes into infinite loop!
            raise AttributeError
        return getattr(self.selected_child, method)

When running the test script, where we reach lines 185-186 in StructureBuilder.py:

    if residue.has_id(name):
        duplicate_atom=residue[name]

it gets magical. The method has_id is not defined on the DisorderedResidue class. Since residue here is an instance of DisorderedResidue (a subclass of DisorderedEntityWrapper) rather than Residue (a subclass of Entity), accessing residue.has_id calls __getattr__, which in turn calls residue.selected_child.has_id(name). The next line raises a TypeError in Python 3, but not in Python 2 -- residue[name] seems to find the appropriate __getitem__ implementation in Python 2 only.
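The behaviour described above can be reproduced without Bio.PDB. In Python 3, implicit special-method lookups such as obj[key], len(obj) or iter(obj) go straight to the object's type and never consult __getattr__, whereas ordinary attribute access such as obj.has_id does. In Python 2, DisorderedEntityWrapper is a classic class (the patch context shows `class DisorderedEntityWrapper:`), and classic-class instances do route those operations through __getattr__, which would explain why the same code worked there. The class names below are simplified stand-ins, not the real Bio.PDB classes:

```python
class Child:
    """Stand-in for a Residue: supports has_id() and subscripting."""

    def __init__(self, atoms):
        self.child_dict = dict(atoms)

    def has_id(self, id):
        return id in self.child_dict

    def __getitem__(self, id):
        return self.child_dict[id]


class ForwardingWrapper:
    """Stand-in for DisorderedEntityWrapper: forwards via __getattr__ only."""

    def __init__(self, child):
        self.selected_child = child

    def __getattr__(self, name):
        return getattr(self.selected_child, name)


class FixedWrapper(ForwardingWrapper):
    """Adds an explicit __getitem__, mirroring the approach Eric takes."""

    def __getitem__(self, id):
        return self.selected_child[id]


child = Child({"CA": "alpha carbon"})
wrapped = ForwardingWrapper(child)
print(wrapped.has_id("CA"))  # True: normal attribute lookup reaches __getattr__
try:
    wrapped["CA"]
except TypeError as err:
    print("TypeError:", err)  # __getitem__ is looked up on the type, not forwarded
print(FixedWrapper(child)["CA"])  # alpha carbon
```

The same type-level lookup applies to len(), iter() and the - operator, which is consistent with the patch also needing explicit __iter__, __len__ and __sub__ methods.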
My hypothesis is that Python 2 treats this magic-method call to residue.__getitem__ as an attribute access, allowing DisorderedEntityWrapper.__getattr__ to forward this access to the appropriate child, some Residue instance, which does implement __getitem__. In Python 3, __getitem__-related syntax could be implemented slightly differently, so it's not seen as a __getattr__ access and everything falls apart. (I could be wrong about all of this.)

So here's what I'm doing:
 - In DisorderedEntityWrapper, implement __getitem__(self, id) such that self.selected_child[id] is returned instead. This fixes most of the errors but produces/uncovers three new ones. These new errors also seem to indicate that magic methods on DisorderedEntityWrapper aren't being handled through __getattr__ in Python 3.
 - Fix the new errors.

I'll post the patch here before pushing it upstream once I get it working.

Best,
Eric

From eric.talevich at gmail.com Tue Oct 19 22:52:27 2010
From: eric.talevich at gmail.com (Eric Talevich)
Date: Tue, 19 Oct 2010 22:52:27 -0400
Subject: [Biopython-dev] Bio.PDB on Python 3

On Tue, Oct 19, 2010 at 10:01 PM, Eric Talevich wrote:
> On Mon, Aug 16, 2010 at 9:47 AM, Peter wrote:
>> Hi all,
>>
>> A while back I installed NumPy from their svn under Python 3, so that I
>> could test more of Biopython. I hadn't really looked at Bio.PDB until
>> recently because test_PDB.py depended on Bio.KDTree which needs
>> some C code to be compiled (which we haven't tried yet).
>>
> [...]
>>
>> This has revealed there are at least two issues with Bio.PDB to be
>> addressed (see below).
>>
> [...]
>>
>> ======================================================================
>> ERROR: test_ExposureCN (__main__.Exposure)
>> HSExposureCN.
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>>   File "test_PDB.py", line 612, in setUp
>>     structure=PDBParser(PERMISSIVE=True).get_structure('X', pdb_filename)
>>   File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py",
>> line 64, in get_structure
>>     self._parse(file.readlines())
>>   File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py",
>> line 84, in _parse
>>     self.trailer=self._parse_coordinates(coords_trailer)
>>   File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py",
>> line 200, in _parse_coordinates
>>     fullname, serial_number, element)
>>   File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py",
>> line 185, in init_atom
>>     duplicate_atom=residue[name]
>> TypeError: 'DisorderedResidue' object is not subscriptable
>>
> [...]
>
> So here's what I'm doing:
>  - In DisorderedEntityWrapper, implement __getitem__(self, id) such
> that self.selected_child[id] is returned instead. This fixes most of
> the errors but produces/uncovers three new ones. These new errors also
> seem to indicate that magic methods on DisorderedEntityWrapper aren't
> being handled through __getattr__ in Python 3.
>  - Fix the new errors.
>
>
> I'll post the patch here before pushing it upstream once I get it working.

As if we didn't have a better mechanism for this... here's a patch that seems to work on both Pythons.

-Eric

diff --git a/Bio/PDB/Entity.py b/Bio/PDB/Entity.py
index ed17308..af2fcc7 100644
--- a/Bio/PDB/Entity.py
+++ b/Bio/PDB/Entity.py
@@ -165,10 +165,27 @@ class DisorderedEntityWrapper:
                 raise AttributeError
             return getattr(self.selected_child, method)
 
+    def __getitem__(self, id):
+        "Return the child with the given id."
+        return self.selected_child[id]
+
     def __setitem__(self, id, child):
         "Add a child, associated with a certain id."
         self.child_dict[id]=child
 
+    def __iter__(self):
+        "Iterate over the children."
+        return iter(self.selected_child)
+
+    def __len__(self):
+        "Return the number of children."
+ return len(self.selected_child) + + def __sub__(self, other): + """Subtraction with another object.""" + return self.selected_child - other + + # Public methods def get_id(self): From bugzilla-daemon at portal.open-bio.org Wed Oct 20 02:22:56 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 20 Oct 2010 02:22:56 -0400 Subject: [Biopython-dev] [Bug 3147] New: AlignIO.parse doesn't raise StopIteration on empty files Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3147 Summary: AlignIO.parse doesn't raise StopIteration on empty files Product: Biopython Version: 1.55 Platform: PC OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: mdehoon at ims.u-tokyo.ac.jp For example: $ rm -rf test.aln $ touch test.aln $ python Python 2.7 (r27:82500, Jul 6 2010, 13:27:45) [GCC 4.3.4 20090804 (release) 1] on cygwin Type "help", "copyright", "credits" or "license" for more information. >>> from Bio import AlignIO >>> records = AlignIO.parse(open("test.aln"), 'clustal') >>> records.next() >>> -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
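Regarding the Bio.PDB thread above: the behaviour Eric suspects matches how Python resolves special methods. The hunk header shows the wrapper declared as `class DisorderedEntityWrapper:`, which on Python 2 is an old-style class, and old-style instances look up `__getitem__` and friends through `__getattr__`. On Python 3 every class is new-style, and implicit special-method lookup (`obj[key]`, `len(obj)`, `iter(obj)`) goes straight to the type, bypassing `__getattr__`. The sketch below is a minimal illustration, not Biopython code; `Child`, `ForwardingOnly` and `ForwardingWithSpecials` are hypothetical stand-ins for Residue and the wrapper before and after the patch.

```python
class Child:
    """Hypothetical stand-in for a Residue: supports indexing and len()."""

    def __init__(self, atoms):
        self._atoms = dict(atoms)

    def __getitem__(self, key):
        return self._atoms[key]

    def __len__(self):
        return len(self._atoms)

    def describe(self):
        return "child with %d atoms" % len(self._atoms)


class ForwardingOnly:
    """Hypothetical wrapper relying on __getattr__ alone (pre-patch style)."""

    def __init__(self, child):
        self.selected_child = child

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails; forwards to the child.
        return getattr(self.selected_child, name)


class ForwardingWithSpecials(ForwardingOnly):
    """Same wrapper, but with the special methods defined on the class."""

    def __getitem__(self, key):
        return self.selected_child[key]

    def __len__(self):
        return len(self.selected_child)


child = Child({"CA": 1, "CB": 2})
plain = ForwardingOnly(child)
print(plain.describe())         # ordinary method: forwarded by __getattr__
print(plain.__getitem__("CA"))  # explicit attribute access is forwarded too
try:
    plain["CA"]                 # implicit lookup on type(plain) skips __getattr__
except TypeError:
    print("TypeError: wrapper is not subscriptable")

fixed = ForwardingWithSpecials(child)
print(fixed["CA"], len(fixed))  # both work once defined on the class
```

Only the implicit form `plain["CA"]` fails; this is the same kind of failure as the `'DisorderedResidue' object is not subscriptable` error in the traceback, which is why the patch adds `__getitem__`, `__iter__`, `__len__` and `__sub__` explicitly.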
From bugzilla-daemon at portal.open-bio.org Wed Oct 20 05:12:06 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 20 Oct 2010 05:12:06 -0400 Subject: [Biopython-dev] [Bug 3147] AlignIO.parse doesn't raise StopIteration on empty files In-Reply-To: Message-ID: <201010200912.o9K9C6og005150@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3147 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-10-20 05:12 EST ------- In this case you are getting back None - which may have been allowed back on Python 2.2, see also: http://docs.python.org/release/2.4/lib/typeiter.html I'm used to iterators either returning None or raising StopIteration at the end of the elements - but quite often I've had to write code like this:

    while True:
        try:
            record = i.next()
        except StopIteration:
            record = None
        if record is None:
            break
        ...

The above documentation implies it would be correct to expect a StopIteration exception here. I'm sure this also applies to some of the Bio.SeqIO parsers, and potentially other parsers in Biopython. To identify most issues we can just change test_SeqIO.py and test_AlignIO.py to check for the exception... Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
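The iterator behaviour requested in bug 3147 falls out naturally from a generator. The `parse_blocks` function below is a hypothetical toy parser (records separated by blank lines), not the AlignIO code: because it is a generator, an empty file simply yields no records, so `next()` raises StopIteration instead of returning None, and a plain `for` loop replaces the `while True` / `try` / None-check pattern. Where a default really is wanted, the built-in `next(iterator, default)` (available since Python 2.6) covers it.

```python
import io


def parse_blocks(handle):
    """Yield lists of non-blank lines separated by blank lines.

    Hypothetical toy parser: being a generator, an empty handle
    yields nothing, so next() raises StopIteration.
    """
    block = []
    for line in handle:
        line = line.rstrip("\n")
        if line.strip():
            block.append(line)
        elif block:
            yield block
            block = []
    if block:
        yield block


records = parse_blocks(io.StringIO(""))
try:
    next(records)
except StopIteration:
    print("empty file: StopIteration")

# next() with a default returns it instead of raising:
print(next(parse_blocks(io.StringIO("")), None))

# A for loop stops cleanly at StopIteration, so no None check is needed:
for record in parse_blocks(io.StringIO("A\nB\n\nC\n")):
    print(record)
```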
From bugzilla-daemon at portal.open-bio.org Wed Oct 20 06:03:27 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 20 Oct 2010 06:03:27 -0400 Subject: [Biopython-dev] [Bug 3147] AlignIO.parse doesn't raise StopIteration on empty files In-Reply-To: Message-ID: <201010201003.o9KA3RY5006926@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3147 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-10-20 06:03 EST ------- Added test and fixed for AlignIO, http://github.com/biopython/biopython/commit/208d926d8e2e706a8bd5d0eee215a26c0457946c Added test for SeqIO (passes already), http://github.com/biopython/biopython/commit/246bd426094ecba9943aba2f58da8f3b7cc4a5f5 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Wed Oct 20 06:45:33 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 20 Oct 2010 11:45:33 +0100 Subject: [Biopython-dev] Bio.PDB on Python 3 In-Reply-To: References: Message-ID: On Wed, Oct 20, 2010 at 3:52 AM, Eric Talevich wrote: > > As if we didn't have a better mechanism for this... here's a patch > that seems to work on both Pythons. > -Eric Ha ha. Thanks for digging into this - I thought it was going to be complicated, and it sounds like it was. The patch works fine for me - please check it into the master. 
Cheers, Peter From eric.talevich at gmail.com Wed Oct 20 22:41:19 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 20 Oct 2010 22:41:19 -0400 Subject: [Biopython-dev] DendroPy is now BSD-licensed (was: [Nexml-discuss] NexML schema question) Message-ID: Folks, About two months ago, Jeet Sukumaran mentioned on the NeXML-discuss mailing list that he would be willing to relicense DendroPy, an excellent phylogenetics library for Python, from GPL to the more permissive BSD license. And shortly thereafter, he did: http://github.com/jeetsukumaran/DendroPy/commit/d3a91621fb62b37c311a462cae150772dd735771 For those just tuning in, DendroPy supports tree I/O in the NeXML format, but not phyloXML; Biopython supports phyloXML but not NeXML. Since the licenses are now compatible, we could probably make good use of Jeet's NeXML parsing code at the very least. Unfortunately, as evidenced by my two-month delay, I don't really have the leeway to do the integration myself this semester. But here's a heads-up anyway. Relatedly, Jaime Huerta Cepas (author of ETE, a Python Environment for Tree Exploration) indicated interest in generating phyloXML and NeXML parsers from XSD schemas -- another one to keep an eye out for. Regards, Eric ---------- Forwarded message ---------- From: Jeet Sukumaran Date: Wed, Sep 22, 2010 at 12:51 PM Subject: Re: [Nexml-discuss] NexML schema question To: Eric Talevich Cc: Jaime Huerta Cepas , "NeXML-discuss (list)" < nexml-discuss at lists.sourceforge.net> Hi Eric, Neither Mark nor I have any objections to releasing the DendroPy code to the Biopython library under the BSD license. Not sure what legalities are involved beyond saying "go for it", but if that's all it takes then "go for it!". 
-- jeet On 9/22/10 10:49 AM, Eric Talevich wrote: > On Sep 22, 2010, at 9:13 AM, Jaime Huerta Cepas wrote: > >> >> all I know is that Eric Talevich (the person who wrote the phyloXML parser >> in biopython) seems to be working on this, as claimed in the biopython wiki. >> But I don't think is ready yet. >> > > I took a crack at NeXML parsing a while ago, but it's nowhere near > ready, and I don't expect to be able to work on it again for several > more months. > > If you're looking for a currently usable library for working with > NeXML (I didn't catch the rest of this discussion), DendroPy is nice. > Its internal representation of tree objects isn't the same as > Biopython's Bio.Phylo, and it's GPL, so we can't just plug it directly > into Biopython (which uses a more permissive BSD-style license). But > serializing a tree to Newick from Biopython and then parsing the > Newick string in DendroPy, or the reverse, would give you some basic > interoperability. > > > What I think is that XSD schemas could be automatically parsed to >> generate parsers :) This would allow us to have a comprehensive and up to >> date parser for the NexML schema that everyone can use. >> > > Sure! PhyloXML is defined by an XSD schema, too. With this approach, > would it be possible for parsed phyloXML and NeXML tree objects to > share a base class, so the same methods are available on each? > > -Eric > > From mjldehoon at yahoo.com Sat Oct 23 05:19:26 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 23 Oct 2010 02:19:26 -0700 (PDT) Subject: [Biopython-dev] Tracking DTD files in Bio.Entrez Message-ID: <893615.35060.qm@web62401.mail.re1.yahoo.com> Hi everybody, As you may know, the parser for XML data generated by NCBI in Bio.Entrez makes use of DTD files (from NCBI) to correctly interpret the XML data. 
Most (if not all) DTD files are included in the Biopython distribution in Bio/Entrez/DTDs, but particularly when NCBI updates their DTD files, a required DTD file may be missing. I have now modified the parser to track the URL of each DTD file, so that it can access DTDs over the internet if they are not available locally. Still, parsing local DTD files is much faster than retrieving a remote DTD file, so when a DTD file is missing the parser will show a warning with the missing DTD, the URL where it can be found, and which directory it should be saved in (which typically is something like /usr/local/lib/python2.7/site-packages/Bio/Entrez/DTDs). For users who do not have write permission to this directory, it may be good to also allow storing these files in the user's home directory, for example in ~/.biopython/Bio/Entrez/DTDs. If we start using such a directory, we could also consider automatically retrieving DTD files and saving them in that directory without asking the user to do that manually. I guess it's a trade-off between convenience for the user (if we download and save DTDs automatically), and transparency (we would be saving files in the user's home directory without him/her being aware of it). Any opinions? Is this a good idea? -Michiel. From biopython at maubp.freeserve.co.uk Mon Oct 25 17:28:24 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 25 Oct 2010 22:28:24 +0100 Subject: [Biopython-dev] [Biopython] Getting involved In-Reply-To: References: Message-ID: On Mon, Oct 25, 2010 at 9:32 PM, Dragoslav Zaric wrote: > Dear Peter, > > I think that this: > > "Can you program in C and are you familiar with the C/Python > API? We will need to look at porting our C code from Python 2 > to Python 3, and this is quite complicated." > > is best idea for start. I can code in C, and have experience > both with python 2.7 and 3. Will read tomorrow about C/Python
> > Kind regards Hi Dragoslav, I'm glad you sound enthusiastic, and I hope you can make some progress... Our plan (following what the NumPy project are doing) is to have a single code base targeting Python 2.x. All the Python code is automatically converted using the 2to3 script into Python 3. There are a few special cases, but that work is mostly done now. All the C code will need to use #ifdef statements to make the same C file work on both Python 2 and Python 3. The bad news is that the basic API for writing C extension modules for Python has changed. What I suggest you do first, is make sure you can get the latest Biopython source code from git, compile it under Python 2, and run the unit tests. Then try 2to3 and running the tests under Python 3 (see the README file). Next I would trying updating one of the smaller C modules in Biopython to work on Python 3. You'll need to edit our setup.py to compile what you are working on (currently we compile none of the C code on Python 3). I don't yet have a feel for how much work this will be. Please sign up to the biopython-dev mailing list where we can discuss things in more detail. The main list is more for user support and general discussion. Thanks, and good luck! Peter From zaricdragoslav at gmail.com Mon Oct 25 19:34:29 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Tue, 26 Oct 2010 03:34:29 +0400 Subject: [Biopython-dev] [Biopython] Getting involved In-Reply-To: References: Message-ID: Dear Peter, I have subscribed to biopython-dev mailing list and I have downloaded source code with git. kind regards On Tue, Oct 26, 2010 at 1:28 AM, Peter wrote: > On Mon, Oct 25, 2010 at 9:32 PM, Dragoslav Zaric wrote: >> Dear Peter, >> >> I think that this: >> >> "Can you program in C and are you familiar with the C/Python >> API? We will need to look at porting our C code from Python 2 >> to Python 3, and this is quite complicated." >> >> is best idea for start. 
I can code in C, and have experience >> both with python 2.7 and 3. Will read tomorrow about C/Python >> API. >> >> Kind regards > > Hi Dragoslav, > > I'm glad you sound enthusiastic, and I hope you can make > some progress... > > Our plan (following what the NumPy project are doing) is > to have a single code base targeting Python 2.x. > > All the Python code is automatically converted using the > 2to3 script into Python 3. There are a few special cases, > but that work is mostly done now. > > All the C code will need to use #ifdef statements to make > the same C file work on both Python 2 and Python 3. The > bad news is that the basic API for writing C extension > modules for Python has changed. > > What I suggest you do first, is make sure you can get > the latest Biopython source code from git, compile it > under Python 2, and run the unit tests. Then try 2to3 > and running the tests under Python 3 (see the README > file). > > Next I would trying updating one of the smaller C > modules in Biopython to work on Python 3. You'll > need to edit our setup.py to compile what you are > working on (currently we compile none of the C > code on Python 3). I don't yet have a feel for how > much work this will be. > > Please sign up to the biopython-dev mailing list where > we can discuss things in more detail. The main list is > more for user support and general discussion. > > Thanks, and good luck! 
> > Peter > -- Dragoslav Zaric Professional Programmer MSc Astrophysics From zaricdragoslav at gmail.com Tue Oct 26 03:24:50 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Tue, 26 Oct 2010 11:24:50 +0400 Subject: [Biopython-dev] Plan for upgrade Message-ID: Dear Peter, This is what I found on python 3 web pages:
----------------------------------------------------------------------------------------------------------------
Porting To Python 3.0

For porting existing Python 2.5 or 2.6 source code to Python 3.0, the best strategy is the following:

0. (Prerequisite:) Start with excellent test coverage.
1. Port to Python 2.6. This should be no more work than the average port from Python 2.x to Python 2.(x+1). Make sure all your tests pass.
2. (Still using 2.6:) Turn on the -3 command line switch. This enables warnings about features that will be removed (or change) in 3.0. Run your test suite again, and fix code that you get warnings about until there are no warnings left, and all your tests still pass.
3. Run the 2to3 source-to-source translator over your source code tree. (See 2to3 - Automated Python 2 to 3 code translation for more on this tool.) Run the result of the translation under Python 3.0. Manually fix up any remaining issues, fixing problems until all tests pass again.

It is not recommended to try to write source code that runs unchanged under both Python 2.6 and 3.0; you'd have to use a very contorted coding style, e.g. avoiding print statements, metaclasses, and much more. If you are maintaining a library that needs to support both Python 2.6 and Python 3.0, the best approach is to modify step 3 above by editing the 2.6 version of the source code and running the 2to3 translator again, rather than editing the 3.0 version of the source code.
----------------------------------------------------------------------------------------------------------------
And this is the page for the 2to3 translator: http://docs.python.org/release/3.0.1/library/2to3.html#to3-reference So can we start by agreeing on the approach and tactics? Kind regards -- Dragoslav Zaric Professional Programmer MSc Astrophysics From zaricdragoslav at gmail.com Tue Oct 26 03:34:31 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Tue, 26 Oct 2010 11:34:31 +0400 Subject: [Biopython-dev] Changed in C API Message-ID: Dear Peter, The list of changes to the C API in Python 3 is not complete. This is the current list:
------------------------------------------------------------------------------------------------
Due to time constraints, here is a very incomplete list of changes to the C API.

- Support for several platforms was dropped, including but not limited to Mac OS 9, BeOS, RISCOS, Irix, and Tru64.
- PEP 3118: New Buffer API.
- PEP 3121: Extension Module Initialization & Finalization.
- PEP 3123: Making PyObject_HEAD conform to standard C.
- No more C API support for restricted execution.
- PyNumber_Coerce, PyNumber_CoerceEx, PyMember_Get, and PyMember_Set C APIs are removed.
- New C API PyImport_ImportModuleNoBlock, works like PyImport_ImportModule but won't block on the import lock (returning an error instead).
- Renamed the boolean conversion C-level slot and method: nb_nonzero is now nb_bool.
- Removed METH_OLDARGS and WITH_CYCLE_GC from the C API.
------------------------------------------------------------------------------------------------
Can you tell me exactly which versions we are converting between: from 2.7 to 3.0.1? This is also what I have read on the Python 3 web site:
------------------------------------------------------------------------------------------------
The net result of the 3.0 generalizations is that Python 3.0 runs the pystone benchmark around 10% slower than Python 2.5.
Most likely the biggest cause is the removal of special-casing for small integers. There's room for improvement, but it will happen after 3.0 is released!
------------------------------------------------------------------------------------------------
This means that Python 3 is still not optimized, or, like all software, it gets worse with new versions :) Kind regards -- Dragoslav Zaric Professional Programmer MSc Astrophysics From biopython at maubp.freeserve.co.uk Tue Oct 26 04:43:48 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 26 Oct 2010 09:43:48 +0100 Subject: [Biopython-dev] Plan for upgrade In-Reply-To: References: Message-ID: On Tue, Oct 26, 2010 at 8:24 AM, Dragoslav Zaric wrote: > Dear Peter, > > This is what I found on python 3 web pages: > ---------------------------------------------------------------------------------------------------------------- > Porting To Python 3.0 > For porting existing Python 2.5 or 2.6 source code to Python 3.0, the > best strategy is the following: > > (Prerequisite:) Start with excellent test coverage. > Port to Python 2.6. This should be no more work than the average port > from Python 2.x to Python 2.(x+1). Make sure all your tests pass. > (Still using 2.6:) Turn on the -3 command line switch. This enables > warnings about features that will be removed (or change) in 3.0. Run > your test suite again, and fix code that you get warnings about until > there are no warnings left, and all your tests still pass. > Run the 2to3 source-to-source translator over your source code tree. > (See 2to3 - Automated Python 2 to 3 code translation for more on this > tool.) Run the result of the translation under Python 3.0. Manually > fix up any remaining issues, fixing problems until all tests pass > again. > It is not recommended to try to write source code that runs unchanged > under both Python 2.6 and 3.0; you'd have to use a very contorted > coding style, e.g.
avoiding print statements, metaclasses, and much > more. If you are maintaining a library that needs to support both > Python 2.6 and Python 3.0, the best approach is to modify step 3 above > by editing the 2.6 version of the source code and running the 2to3 > translator again, rather than editing the 3.0 version of the source > code. > ---------------------------------------------------------------------------------------------------------------- > > And this is page for 2to3 translator: > > http://docs.python.org/release/3.0.1/library/2to3.html#to3-reference > > So can we start to agree on approach and tactics. > > Kind regards Hi Dragoslav, Yes, that is basically what we are doing for the pure python code. We still write our code for Python 2.x (currently Python 2.4 to 2.7), and then use 2to3 to convert it to work on Python 3.x (currently testing on 3.1; at the end of the year we'll be trying the planned Python 3.2 beta as well). That is the easy part - it's the C code we need to handle now for our extension modules (and the 2to3 script does not do this). Perhaps I was too concise earlier. http://lists.open-bio.org/pipermail/biopython-dev/2010-October/008311.html Peter From biopython at maubp.freeserve.co.uk Tue Oct 26 04:47:58 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 26 Oct 2010 09:47:58 +0100 Subject: [Biopython-dev] Changed in C API In-Reply-To: References: Message-ID: On Tue, Oct 26, 2010 at 8:34 AM, Dragoslav Zaric wrote: > Dear Peter, > > The list of changes in Python 3 is not complete. This is current list: > ... > > Can you tell me what are exactly versions that we are converting, from > 2.7 to 3.0.1 ?? We currently support Python 2.4 to 2.7 (but plan to drop support for Python 2.4 soon). We've been testing on Python 3.1 and expect to support later versions as they are released. I personally don't really care about Python 3.0 (it would be nice if that works too, but it is not essential).
> > This means that python 3 is still no optimized or like all software > start to be worse with new versions :) Python 3.1 is already out and is faster than Python 3.0. Some things are still slower than Python 2 though; in particular, we've noticed this for parsing, since by default Python 3 uses unicode strings instead of byte strings. Peter From zaricdragoslav at gmail.com Tue Oct 26 05:11:32 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Tue, 26 Oct 2010 13:11:32 +0400 Subject: [Biopython-dev] Changed in C API In-Reply-To: References: Message-ID: Ok Peter, First, it was my mistake to talk about the Python code upgrade. I understand that you want me to do the C code upgrade, but in the end it should all work together, which is why I looked at the overall upgrade process. I have downloaded the latest Biopython source code, searched for .c and .h files, and this is what I found:

Bio\Cluster\cluster.c
Bio\Cluster\clustermodule.c
Bio\cMarkovModelmodule.c
Bio\cpairwise2module.c
Bio\csupport.c
Bio\KDTree\KDTree.c
Bio\KDTree\KDTreemodule.c
Bio\Motif\_pwm.c
Bio\Nexus\cnexus.c
Bio\PDB\mmCIF\lex.yy.c
Bio\PDB\mmCIF\mmcif_test.c
Bio\PDB\mmCIF\MMCIFlexmodule.c
Bio\trie.c
Bio\triemodule.c

Bio\Cluster\cluster.h
Bio\csupport.h
Bio\KDTree\KDTree.h
Bio\KDTree\Neighbor.h
Bio\trie.h

Are these all the files you want me to upgrade to Python 3.1? Kind regards On Tue, Oct 26, 2010 at 12:47 PM, Peter wrote: > On Tue, Oct 26, 2010 at 8:34 AM, Dragoslav Zaric > wrote: >> Dear Peter, >> >> The list of changes in Python 3 is not complete. This is current list: >> ... >> >> Can you tell me what are exactly versions that we are converting, from >> 2.7 to 3.0.1 ?? > > We currently support Python 2.4 to 2.7 (but plan to drop support > for Python 2.4 soon). We've been testing on Python 3.1 and except > to support later versions as they are released. > > I personally don't really care about Python 3.0 (it would be nice if > that works too, but it is not essential).
> >> This means that python 3 is still no optimized or like all software >> start to be worse with new versions :) > > Python 3.1 is already out and is faster than Python 3.0. Some things > are still slower than Python 2 though, in particular we've noticed this > for parsing since by default Python 3 uses unicode instead of byte > strings. > > Peter > -- Dragoslav Zaric Professional Programmer MSc Astrophysics From biopython at maubp.freeserve.co.uk Tue Oct 26 05:47:21 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 26 Oct 2010 10:47:21 +0100 Subject: [Biopython-dev] Changed in C API In-Reply-To: References: Message-ID: On Tue, Oct 26, 2010 at 10:11 AM, Dragoslav Zaric wrote: > Ok Peter, > > First it is my mistake that I was talking about python code upgrade. > I understand that you want me to do C code upgrade, but at the > end all should work together so this is why I looked at overall upgrade > process. > > I have downloaded latest python source code and I have searched > for .c files and .h files and this is what I have found: > > Bio\Cluster\cluster.c > Bio\Cluster\clustermodule.c > Bio\cMarkovModelmodule.c > Bio\cpairwise2module.c > Bio\csupport.c > Bio\KDTree\KDTree.c > Bio\KDTree\KDTreemodule.c > Bio\Motif\_pwm.c > Bio\Nexus\cnexus.c > Bio\PDB\mmCIF\lex.yy.c > Bio\PDB\mmCIF\mmcif_test.c > Bio\PDB\mmCIF\MMCIFlexmodule.c > Bio\trie.c > Bio\triemodule.c > > Bio\Cluster\cluster.h > Bio\csupport.h > Bio\KDTree\KDTree.h > Bio\KDTree\Neighbor.h > Bio\trie.h What OS are you using? From the slashes I'd guess Windows (which may complicate things - getting the compilers all set up is more work). > > Are these all files you want me to upgrade to python 3.1 ? > Yes - but not all of them are equally important, and some will be more complicated to port. For example, the Nexus, MarkovModelmodule and cMarkovModelmodule C code have a Python fallback (i.e. the C code is not essential, just faster). Some of those (e.g.
Bio.Cluster and Bio.KDTree) depend on NumPy, which may make things more complicated. You will need to install NumPy (for both Python 2 and 3). Some may have string encoding issues (bytes vs unicode), e.g. Nexus, Motif. The mmCIF module is not urgent. This is a file parser for the Bio.PDB code, and we have discussed replacing this in C. One reason for this is that it currently depends on the 3rd party library flex. I think Bio/Motif/_pwm.c would be a good module to start with. It is a short, simple module exposing a single function to Python. You should read this: http://wiki.python.org/moin/PortingExtensionModulesToPy3k Peter From zaricdragoslav at gmail.com Tue Oct 26 06:01:46 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Tue, 26 Oct 2010 14:01:46 +0400 Subject: [Biopython-dev] Changed in C API In-Reply-To: References: Message-ID: Do not worry Peter, I am writing from work, which is why I am using Windows. At home I have two laptops and both run Linux :) I do not have Windows on any partition :) Outside of my main job I use and develop only on Linux. OK, when I get home I will read http://wiki.python.org/moin/PortingExtensionModulesToPy3k and start work on Bio/Motif/_pwm.c Kind regards On Tue, Oct 26, 2010 at 1:47 PM, Peter wrote: > On Tue, Oct 26, 2010 at 10:11 AM, Dragoslav Zaric > wrote: >> Ok Peter, >> >> First it is my mistake that I was talking about python code upgrade. >> I understand that you want me to do C code upgrade, but at the >> end all should work together so this is why I looked at overall upgrade >> process.
>> >> I have downloaded latest python source code and I have searched >> for .c files and .h files and this is what I have found: >> >> Bio\Cluster\cluster.c >> Bio\Cluster\clustermodule.c >> Bio\cMarkovModelmodule.c >> Bio\cpairwise2module.c >> Bio\csupport.c >> Bio\KDTree\KDTree.c >> Bio\KDTree\KDTreemodule.c >> Bio\Motif\_pwm.c >> Bio\Nexus\cnexus.c >> Bio\PDB\mmCIF\lex.yy.c >> Bio\PDB\mmCIF\mmcif_test.c >> Bio\PDB\mmCIF\MMCIFlexmodule.c >> Bio\trie.c >> Bio\triemodule.c >> >> Bio\Cluster\cluster.h >> Bio\csupport.h >> Bio\KDTree\KDTree.h >> Bio\KDTree\Neighbor.h >> Bio\trie.h > > What OS are you using? From the slashes I'd guess > Windows (which may complicate things - getting the > compilers all setup is more work). > >> >> Are these all files you want me to upgrade to python 3.1 ? >> > > Yes - but not all of them are equally important, and some > will be more complicated to port. > > For example, the Nexus, MarkovModelmodule and > cMarkovModelmodule C code have a Python fallback > (i.e. the C code is not essential, just faster). > > Some of those (e.g. Bio.Cluster and Bio.KDTree) depend > on NumPy, which may make things more complicated. > You will need to install NumPy (for both Python 2 and 3). > > Some may have string encoding issues (bytes vs unicode), > e.g. Nexus, Motif > > The mmCIF module is not urgent. This is a file parser for > the Bio.PDB code, and we have discussed replacing this > in C. One reason for this is it currently depends on the > 3rd party library flex. > > I think Bio/Motif/_pwm.c would be a good module to start > with. It is a short simple module exposing a single > function to Python. 
> > You should read this: > http://wiki.python.org/moin/PortingExtensionModulesToPy3k > > Peter > -- Dragoslav Zaric Professional Programmer MSc Astrophysics From zaricdragoslav at gmail.com Tue Oct 26 06:03:30 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Tue, 26 Oct 2010 14:03:30 +0400 Subject: [Biopython-dev] Changed in C API In-Reply-To: References: Message-ID: Dear Peter, You wrote this: "What I suggest you do first, is make sure you can get the latest Biopython source code from git, compile it under Python 2, and run the unit tests. Then try 2to3 and running the tests under Python 3 (see the README file)." Can you tell me how to run the unit tests under any Python version? Are there unit tests for the C modules, or do these tests cover everything? Kind regards On Tue, Oct 26, 2010 at 2:01 PM, Dragoslav Zaric wrote: > Do not worry Peter, > > I am writing from work, that is why I am using windows. At home I have > two lap tops > and both are Linux :) I do not have windows on any partition :) > > I use and developo only on Linux outside of main job. > > Ok, when I get home I will read > > http://wiki.python.org/moin/PortingExtensionModulesToPy3k > > and start to work on > > Bio/Motif/_pwm.c > > Kind regards > > > On Tue, Oct 26, 2010 at 1:47 PM, Peter wrote: >> On Tue, Oct 26, 2010 at 10:11 AM, Dragoslav Zaric >> wrote: >>> Ok Peter, >>> >>> First it is my mistake that I was talking about python code upgrade. >>> I understand that you want me to do C code upgrade, but at the >>> end all should work together so this is why I looked at overall upgrade >>> process.
>>> >>> I have downloaded latest python source code and I have searched >>> for .c files and .h files and this is what I have found: >>> >>> Bio\Cluster\cluster.c >>> Bio\Cluster\clustermodule.c >>> Bio\cMarkovModelmodule.c >>> Bio\cpairwise2module.c >>> Bio\csupport.c >>> Bio\KDTree\KDTree.c >>> Bio\KDTree\KDTreemodule.c >>> Bio\Motif\_pwm.c >>> Bio\Nexus\cnexus.c >>> Bio\PDB\mmCIF\lex.yy.c >>> Bio\PDB\mmCIF\mmcif_test.c >>> Bio\PDB\mmCIF\MMCIFlexmodule.c >>> Bio\trie.c >>> Bio\triemodule.c >>> >>> Bio\Cluster\cluster.h >>> Bio\csupport.h >>> Bio\KDTree\KDTree.h >>> Bio\KDTree\Neighbor.h >>> Bio\trie.h >> >> What OS are you using? From the slashes I'd guess >> Windows (which may complicate things - getting the >> compilers all setup is more work). >> >>> >>> Are these all files you want me to upgrade to python 3.1 ? >>> >> >> Yes - but not all of them are equally important, and some >> will be more complicated to port. >> >> For example, the Nexus, MarkovModelmodule and >> cMarkovModelmodule C code have a Python fallback >> (i.e. the C code is not essential, just faster). >> >> Some of those (e.g. Bio.Cluster and Bio.KDTree) depend >> on NumPy, which may make things more complicated. >> You will need to install NumPy (for both Python 2 and 3). >> >> Some may have string encoding issues (bytes vs unicode), >> e.g. Nexus, Motif >> >> The mmCIF module is not urgent. This is a file parser for >> the Bio.PDB code, and we have discussed replacing this >> in C. One reason for this is it currently depends on the >> 3rd party library flex. >> >> I think Bio/Motif/_pwm.c would be a good module to start >> with. It is a short simple module exposing a single >> function to Python. 
>> >> You should read this: >> http://wiki.python.org/moin/PortingExtensionModulesToPy3k >> >> Peter >> > > > > -- > Dragoslav Zaric > > Professional Programmer > MSc Astrophysics > -- Dragoslav Zaric Professional Programmer MSc Astrophysics From biopython at maubp.freeserve.co.uk Tue Oct 26 06:12:02 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 26 Oct 2010 11:12:02 +0100 Subject: [Biopython-dev] Changed in C API In-Reply-To: References: Message-ID: On Tue, Oct 26, 2010 at 11:03 AM, Dragoslav Zaric wrote: > Dear Peter, > > You write this: > > "What I suggest you do first, is make sure you can get > the latest Biopython source code from git, compile it > under Python 2, and run the unit tests. Then try 2to3 > and running the tests under Python 3 (see the README > file)." > > Can you tell me how do I run unit tests in any python version ?? Have a look at the "The Biopython testing framework" chapter in the tutorial (although this does not talk about Python 3). For Python 2.x, from the Tests directory do:

    python run_tests.py

For a particular version of Python, do:

    python2.6 run_tests.py

For Python 3.x, first convert the code with 2to3 as described in the README file, then:

    python3 run_tests.py

For a particular version of Python 3, do:

    python3.1 run_tests.py

You can run selected tests rather than all of them, e.g.

    python run_tests.py test_Motif.py

>
> Are there unit tests for C modules or these tests cover everything ??
>

The tests are all written in Python, and will test the C modules via their Python interface.
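As a rough illustration of two points made in this thread (some of the C extensions have a pure-Python fallback, and the tests reach the C code only through its Python interface), here is a minimal sketch. The module name `_gc_speedup` and the function `gc_count` are hypothetical and not part of Biopython: the import falls back to a Python implementation when the compiled module is missing (as it would be on Python 3 while setup.py compiles no C code there), and the test case only calls the Python-level name, so the same tests cover whichever implementation was loaded.

```python
import unittest

try:
    # Hypothetical compiled accelerator; not a real Biopython module.
    from _gc_speedup import gc_count
except ImportError:
    def gc_count(seq):
        """Pure-Python fallback: count G and C bases (case-insensitive)."""
        return sum(1 for base in seq.upper() if base in "GC")


class GCCountTests(unittest.TestCase):
    """Exercises only the Python-level interface, so it covers C or fallback."""

    def test_known_values(self):
        self.assertEqual(gc_count(""), 0)
        self.assertEqual(gc_count("ACGT"), 2)
        self.assertEqual(gc_count("gggccc"), 6)
        self.assertEqual(gc_count("ATATAT"), 0)

    def test_returns_int(self):
        self.assertIsInstance(gc_count("GC"), int)


# Run with, for example: python -m unittest <module_name>
```

The design choice is that callers never import the C module directly; they import the public name, and whether it is backed by C or Python is an implementation detail the tests do not need to know about.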
Peter From zaricdragoslav at gmail.com Tue Oct 26 06:59:05 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Tue, 26 Oct 2010 14:59:05 +0400 Subject: [Biopython-dev] Changed in C API In-Reply-To: References: Message-ID: Ok Peter, I will work on this tonight and let you know how is it going, Kind regards On Tue, Oct 26, 2010 at 2:12 PM, Peter wrote: > On Tue, Oct 26, 2010 at 11:03 AM, Dragoslav Zaric > wrote: >> Dear Peter, >> >> You write this: >> >> "What I suggest you do first, is make sure you can get >> the latest Biopython source code from git, compile it >> under Python 2, and run the unit tests. Then try 2to3 >> and running the tests under Python 3 (see the README >> file)." >> >> Can you tell me how do I run unit tests in any python version ?? > > Have a look at the "The Biopython testing framework" > chapter in the tutorial (although this does not talk about > Python 3). > > For python 2.x, from the Tests directory do: > > python run_tests.py > > For a particular version of Python, do: > > python2.6 run_tests.py > > For Python 3.x first convert the code with 2to3 as described > in the README file, then: > > python3 run_tests.py > > For a particular version of Python 3, do: > > python3.1 run_tests.py > > You can run selected tests rather than all of them, e.g. > > python run_tests.py test_Motif.py > >> >> Are there unit tests for C modules or these tests cover everything ?? >> > > The tests are all written in Python, and will test the C modules via > their Python interface. 
> > Peter > -- Dragoslav Zaric Professional Programmer MSc Astrophysics From zaricdragoslav at gmail.com Tue Oct 26 12:28:00 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Tue, 26 Oct 2010 20:28:00 +0400 Subject: [Biopython-dev] Test python 2.6 Message-ID: Dear Peter, I run python run_tests.py in Tests folder with python 2.6.2 I got one error in test file test_SeqIO_online.py I open the file and went to line that caused error and it looks like it is not functional error, it is just data error, because there is no data for in database for supplied parameters: ("genome", ["fasta", "gb"], "X52960", 248, "Ktxz0HgMlhQmrKTuZpOxPZJ6zGU") So I commented this line and leave other two and all tests passed after this. Anyway, now I am installing python 3.1.2 and will run tests when finish installation. Kind regards -- Dragoslav Zaric Professional Programmer MSc Astrophysics From biopython at maubp.freeserve.co.uk Tue Oct 26 12:41:57 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 26 Oct 2010 17:41:57 +0100 Subject: [Biopython-dev] Test python 2.6 In-Reply-To: References: Message-ID: On Tue, Oct 26, 2010 at 5:28 PM, Dragoslav Zaric wrote: > Dear Peter, > > I run > > python run_tests.py in Tests folder with python 2.6.2 I got one error > in test file test_SeqIO_online.py > I open the file and went to line that caused error and it looks like > it is not functional error, it is just data error, because there is no data > for in database for supplied parameters: > > ("genome", ["fasta", "gb"], "X52960", 248, "Ktxz0HgMlhQmrKTuZpOxPZJ6zGU") > > So I commented this line and leave other two and all tests passed after this. I'd noticed that failing a little while back, and had assumed it was just a temporary network problem. In fact looks like the NCBI have changed how searching against the genome database works. 
This update fixes the test on Python 2.6: http://github.com/biopython/biopython/commit/ad1dd31828c1488c72bffba3bc769c012439ea90 > Anyway, now I am installing python 3.1.2 and will run tests when > finish installation. Note there are some known failures on Python 3, this includes test_SeqIO_online.py (bytes vs unicode). Peter From zaricdragoslav at gmail.com Tue Oct 26 12:45:37 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Tue, 26 Oct 2010 20:45:37 +0400 Subject: [Biopython-dev] Test python 2.6 In-Reply-To: References: Message-ID: I run now 2to3 on biopython folder and when it finish I will run run_tests.py This will also test C modules ? Kind regards On Tue, Oct 26, 2010 at 8:41 PM, Peter wrote: > On Tue, Oct 26, 2010 at 5:28 PM, Dragoslav Zaric > wrote: >> Dear Peter, >> >> I run >> >> python run_tests.py in Tests folder with python 2.6.2 I got one error >> in test file test_SeqIO_online.py >> I open the file and went to line that caused error and it looks like >> it is not functional error, it is just data error, because there is no data >> for in database for supplied parameters: >> >> ("genome", ["fasta", "gb"], "X52960", 248, "Ktxz0HgMlhQmrKTuZpOxPZJ6zGU") >> >> So I commented this line and leave other two and all tests passed after this. > > I'd noticed that failing a little while back, and had assumed it was just a > temporary network problem. In fact looks like the NCBI have changed > how searching against the genome database works. This update fixes > the test on Python 2.6: > > http://github.com/biopython/biopython/commit/ad1dd31828c1488c72bffba3bc769c012439ea90 > >> Anyway, now I am installing python 3.1.2 and will run tests when >> finish installation. > > Note there are some known failures on Python 3, this includes > test_SeqIO_online.py (bytes vs unicode). 
> > Peter > -- Dragoslav Zaric Professional Programmer MSc Astrophysics From biopython at maubp.freeserve.co.uk Tue Oct 26 12:59:44 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 26 Oct 2010 17:59:44 +0100 Subject: [Biopython-dev] Test python 2.6 In-Reply-To: References: Message-ID: On Tue, Oct 26, 2010 at 5:45 PM, Dragoslav Zaric wrote: > I run now 2to3 on biopython folder and when it finish I will run run_tests.py > This will also test C modules ? Using run_tests.py would cover everything unless it has been disabled on Python 3, or depends on some C code which hasn't been compiled (in which case the test should be skipped). Note we've edited setup.py not to try and compile any C code on Python 3 (because currently none of it works). You'll need to edit setup.py to compile any C code you work on for Python 3. For C modules which don't use NumPy, change this bit:

...
elif sys.version_info[0] == 3:
    # TODO - Must update our C extensions for Python 3
    EXTENSIONS = []
...

For extensions using NumPy, see class build_ext_biopython Peter From barwil at gmail.com Tue Oct 26 16:46:50 2010 From: barwil at gmail.com (Bartek Wilczynski) Date: Tue, 26 Oct 2010 22:46:50 +0200 Subject: [Biopython-dev] Moving Bio.Motif documentation into Tutorial.tex In-Reply-To: References: Message-ID: Hi all, I've added the Bio.Motif section to the tutorial and pushed this to github. I was able to build the tutorial in pdf, but I'm not sure about the html version and whether it works for other people. Any other comments are welcome as well cheers Bartek On Tue, Oct 19, 2010 at 2:45 PM, Peter wrote: > On Tue, Oct 19, 2010 at 1:34 PM, Bartek Wilczynski > wrote: > > Hi, > > > > I've started to look into merging Bio.Motif docs with the Tutorial. I > have a > > few questions: > > - First, I need to find a good place in the tutorial to put it. > > One possibility is to make a separate chapter for it, another option > is > > to put it as a subchapter in chapter 15 (cookbook).
> > I think it would be better to make it a separate chapter, similar to > one > > the ones discussing Bio.popgen or bio.phylo, So i thought it would make > > sense to create it as a new chapter 13, entitled Sequence motif analysis > > with Bio.Motif > > I agree, create a new chapter (and add yourself to the authors list). > I'd definitely put it before the "Cookbook Chapter", and between the > Phylogenetics and "Supervised learning methods" chapters seems > reasonable. > > > -second, I have links and references to papers in there. The question > would > > be should I remove those to keep to the style of the tutorial > > Keep them - links to external webpages are fine - they work well in both > PDF > and HTML. For references we currently don't have a formal bibliography - > but > we do have some existing case of links to papers already, e.g. > > http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec:SeqIO-fastq-conversion > > Peter > -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From biopython at maubp.freeserve.co.uk Tue Oct 26 17:53:47 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 26 Oct 2010 22:53:47 +0100 Subject: [Biopython-dev] Tests in python 3.1.2 In-Reply-To: References: Message-ID: On Tue, Oct 26, 2010 at 9:41 PM, Dragoslav Zaric wrote: > Dear Peter, > > I installed python 3.1.2, than run > > 2to3 -w biopython Don't do that - see our README file:

2to3 --nofix=long --no-diffs -n -w Bio BioSQL Tests Scripts Doc/examples
2to3 --nofix=long --no-diffs -n -w -d Bio BioSQL Tests Scripts Doc/examples

You have to run 2to3 twice (strange design choice in the tool, this is once for the code, and again with -d for the doctests which are code examples within the docstring comments). You also need to turn off the "long" fixer (otherwise it causes problems in Bio.Phylo).
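The two README commands above can be scripted so that neither pass is forgotten. This sketch only builds the same two argument lists and runs them in order with Python's `subprocess` module. It assumes the `2to3` found on your PATH is a version that accepts these flags; the helper names are hypothetical.

```python
import subprocess

# Directories named in the README commands quoted above.
TARGETS = ("Bio", "BioSQL", "Tests", "Scripts", "Doc/examples")


def two_to_three_commands(targets=TARGETS, tool="2to3"):
    """Return both 2to3 invocations: the code pass, then the doctest (-d) pass."""
    # --nofix=long turns off the "long" fixer; -n skips backups; -w writes files.
    base = [tool, "--nofix=long", "--no-diffs", "-n", "-w"]
    code_pass = base + list(targets)
    doctest_pass = base + ["-d"] + list(targets)
    return [code_pass, doctest_pass]


def convert_tree(targets=TARGETS, tool="2to3"):
    """Run both passes in order; check=True stops at the first failing pass."""
    for cmd in two_to_three_commands(targets, tool):
        subprocess.run(cmd, check=True)
```

Run `convert_tree()` from the top of a fresh checkout, since both passes rewrite files in place.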
> and after that > > python3.1 run_tests.py > > I capture screen output in log.txt file that I am sending you in attachment. > > Based on this log, can you advise me which way to go. Fix error one by one, > or maybe I made mistake in installation/upgrade. The attachment is too big for the mailing list, so your message was rejected. I hope that helps. Peter From biopython at maubp.freeserve.co.uk Tue Oct 26 18:01:08 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 26 Oct 2010 23:01:08 +0100 Subject: [Biopython-dev] Moving Bio.Motif documentation into Tutorial.tex In-Reply-To: References: Message-ID: On Tue, Oct 26, 2010 at 9:46 PM, Bartek Wilczynski wrote: > Hi all, > > I've added the Bio.Motif section to the tutorial and pushed this to github. > I was able to build the tutorial in pdf, but I'm not sure about the html > version and whether it works for other people. > > Any other comments are welcome as well > > cheers > Bartek Thanks Bartek - the HTML looks fine (but I haven't read it all yet): http://biopython.org/DIST/docs/tutorial/Tutorial-dev.html That should be updated automatically by a cron task running under my username - let me know if it looks out of date. Peter From biopython at maubp.freeserve.co.uk Wed Oct 27 06:22:59 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Oct 2010 11:22:59 +0100 Subject: [Biopython-dev] Bio.Motif and FASTA output Message-ID: Hi Bartek, I noticed a concern with one of your examples in the tutorial, going from a Motif object to FASTA format,

>>> print m.format("fasta")
> instance 0
TATAA
> instance 1
TATTA
> instance 2
TATAA
> instance 3
TATAA

Our FASTA parser will treat that as having no identifiers (because it goes greater than sign, space, text). How about this:

>>> print m.format("fasta")
>instance0
TATAA
>instance1
TATTA
>instance2
TATAA
>instance3
TATAA

With the above output, each sequence gets a unique identifier.
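Why the space matters can be shown with a simplified title-line rule. This sketch assumes the identifier is the first whitespace-separated word after `>`, which is the common FASTA convention but not a copy of Bio.SeqIO's parser. Under this rule the spaced headers all collapse to the same ID; a parser that instead splits at the first space would get an empty ID. Either way the records cannot be told apart. Both helper functions are hypothetical.

```python
def fasta_ids(text):
    """Take the first whitespace-separated word of each '>' title line as its ID."""
    ids = []
    for line in text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            ids.append(words[0] if words else "")
    return ids


def instances_to_fasta(instances, prefix="instance"):
    """Write each sequence under a numbered identifier with no internal space."""
    return "".join(">%s%d\n%s\n" % (prefix, i, seq) for i, seq in enumerate(instances))


instances = ["TATAA", "TATTA", "TATAA", "TATAA"]
spaced = "".join("> instance %d\n%s\n" % (i, seq) for i, seq in enumerate(instances))

print(fasta_ids(spaced))                         # every record gets the same ID
print(fasta_ids(instances_to_fasta(instances)))  # one unique ID per record
```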
Peter From barwil at gmail.com Wed Oct 27 06:34:52 2010 From: barwil at gmail.com (Bartek Wilczynski) Date: Wed, 27 Oct 2010 12:34:52 +0200 Subject: [Biopython-dev] Bio.Motif and FASTA output In-Reply-To: References: Message-ID: Thanks for spotting the problem. Fixed now. cheers Bartek On Wed, Oct 27, 2010 at 12:22 PM, Peter wrote: > Hi Bartek, > > I noticed a concern with one of your examples in the tutorial, going > from a Motif object to FASTA format, > > >>> print m.format("fasta") > > instance 0 > TATAA > > instance 1 > TATTA > > instance 2 > TATAA > > instance 3 > TATAA > > Our FASTA parser will treat that has having no identifiers (because > it goes greater than sign, space, text). How about this: > > >>> print m.format("fasta") > >instance0 > TATAA > >instance1 > TATTA > >instance2 > TATAA > >instance3 > TATAA > > With the above output, each sequence gets a unique identifier. > > Peter > -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From biopython at maubp.freeserve.co.uk Wed Oct 27 06:34:39 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Oct 2010 11:34:39 +0100 Subject: [Biopython-dev] Bio.Motif length Message-ID: Hi Bartek, (Another query after scanning over your new text in the tutorial) Why do you have motif.length when len(motif) seems to do basically the same thing? Can we deprecate the length property (Zen of Python: There should be one -- and preferably only one -- obvious way to do it)? Thanks, Peter From biopython at maubp.freeserve.co.uk Wed Oct 27 06:46:21 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Oct 2010 11:46:21 +0100 Subject: [Biopython-dev] Bio.Motif and FASTA output In-Reply-To: References: Message-ID: On Wed, Oct 27, 2010 at 11:34 AM, Bartek Wilczynski wrote: > Thanks for spotting the problem. Fixed now. > > cheers > Bartek Thanks. BTW - Do you have two git usernames? 
Your recent commits show up as authored by barwil but committed by bartekw - curious. Peter From barwil at gmail.com Wed Oct 27 06:53:58 2010 From: barwil at gmail.com (Bartek Wilczynski) Date: Wed, 27 Oct 2010 12:53:58 +0200 Subject: [Biopython-dev] Bio.Motif length In-Reply-To: References: Message-ID: Hi Peter, On Wed, Oct 27, 2010 at 12:34 PM, Peter wrote: > > Why do you have motif.length when len(motif) seems to do > basically the same thing? Can we deprecate the length > property (Zen of Python: There should be one -- and > preferably only one -- obvious way to do it)? > > I guess this is there just out of habit. I know that the .length property and I tend to use it, but I agree that in the tutorial we should use len(m) instead of m.length. Speaking more globally, the length property is there from the beginning, I don't think we should remove it. If we really want to make the API clean, we could rename it to m._length to indicate that it should not be used directly (especially setting it to some other value could have unwanted consequences). I can make the change in the tutorial (I need to change the expected output of m.format("fasta") anyway), but making the change from .length to ._length in the code would require a bit more time to make sure I'm not using it anywhere in the code. What is your suggestion here? cheers B From biopython at maubp.freeserve.co.uk Wed Oct 27 07:03:15 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Oct 2010 12:03:15 +0100 Subject: [Biopython-dev] Bio.Motif length In-Reply-To: References: Message-ID: On Wed, Oct 27, 2010 at 11:53 AM, Bartek Wilczynski wrote: > Hi Peter, > > On Wed, Oct 27, 2010 at 12:34 PM, Peter wrote: > >> >> Why do you have motif.length when len(motif) seems to do >> basically the same thing? Can we deprecate the length >> property (Zen of Python: There should be one -- and >> preferably only one -- obvious way to do it)? >> > > I guess this is there just out of habit. 
I know that the .length property > and I tend to use it, but I agree that in the tutorial we should use len(m) > instead of m.length. > > Speaking more globally, the length property is there from the beginning, I > don't think we should remove it. If we really want to make the API clean, we > could rename it to m._length to indicate that it should not be used directly > (especially setting it to some other value could have unwanted > consequences). > > I can make the change in the tutorial (I need to change the expected output > of m.format("fasta") anyway), but making the change from .length to ._length > in the code would require a bit more time to make sure I'm not using it > anywhere in the code. What is your suggestion here? What I would suggest is right now: (1) Use len(...) in the tutorial and any docstrings. Also in the __len__ docstring you could mention that using the .length property is discouraged. Then later as your time permits, (2) Rename self.length to self._length throughout the code, check tests pass (3) Add a property length which acts as a proxy for self._length and say in the docstring that you encourage len(...) instead. This is to ensure existing code using .length still works. Then later, (4) Add a deprecation warning to the new length property. One year and two releases later: (5) Remove the length property (leaving the private _length property only). Peter From barwil at gmail.com Wed Oct 27 09:10:46 2010 From: barwil at gmail.com (Bartek Wilczynski) Date: Wed, 27 Oct 2010 15:10:46 +0200 Subject: [Biopython-dev] Bio.Motif length In-Reply-To: References: Message-ID: Hi, On Wed, Oct 27, 2010 at 1:03 PM, Peter wrote: > > What I would suggest is right now: > > (1) Use len(...) in the tutorial and any docstrings. Also in the > __len__ docstring you could mention that using the .length > property is discouraged. > > This is now done and commited to the trunk. 
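Steps (2) to (4) of that plan could look roughly like the following. This is a toy class written for illustration, not the actual Bio.Motif code. The value lives in a private `_length` attribute, `len()` is the supported interface, and a read-only `length` property forwards to it while emitting a `DeprecationWarning`.

```python
import warnings


class Motif(object):
    """Toy stand-in for a motif class; not the real Bio.Motif implementation."""

    def __init__(self, instances):
        self.instances = list(instances)
        # Step (2): keep the length in a private attribute.
        self._length = len(self.instances[0]) if self.instances else 0

    def __len__(self):
        """Preferred interface: use len(motif)."""
        return self._length

    @property
    def length(self):
        """Steps (3) and (4): backwards-compatible alias that warns on use."""
        warnings.warn("motif.length is deprecated, use len(motif) instead",
                      DeprecationWarning, stacklevel=2)
        return self._length
```

Because the property defines no setter, `m.length = 5` raises `AttributeError`, which also guards against the accidental reassignment Bartek mentions. Step (5) then amounts to deleting the property.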
> Then later as your time permits, > > (2) Rename self.length to self._length throughout the code, check > tests pass > (3) Add a property length which acts as a proxy for self._length > and say in the docstring that you encourage len(...) instead. > This is to ensure existing code using .length still works. > > I'll put these things on my todo list, and I'll make them on a branch, not to mess things up. > Then later, > > (4) Add a deprecation warning to the new length property. > > One year and two releases later: > > (5) Remove the length property (leaving the private _length > property only). > > Is there a scheduled time for the next release? I'm just asking to see whether I can try to still squeeze it into the nearest release or it will need to wait for the next one. Thanks for your input Bartek From biopython at maubp.freeserve.co.uk Wed Oct 27 09:25:02 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Oct 2010 14:25:02 +0100 Subject: [Biopython-dev] Bio.Motif length In-Reply-To: References: Message-ID: On Wed, Oct 27, 2010 at 2:10 PM, Bartek Wilczynski wrote: > > Is there a scheduled time for the next release? I'm just asking to see > whether I can try to still squeeze it into the nearest release or it will > need to wait for the next one. > I was thinking some point next month (November 2010), certainly we want to do this well before the end of the year (when the NCBI will be changing the DTD files for Entrez). Peter From zaricdragoslav at gmail.com Wed Oct 27 11:19:13 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Wed, 27 Oct 2010 19:19:13 +0400 Subject: [Biopython-dev] upgrade to python 3.1.2 Message-ID: Hi Peter, I did everything from scratch, get biopython with git, than tun those two commands for 2to3 from README file and at the end I run run_tests.py from Tests folder. I am sending you log file just to check am I on right track. I will continue to investigate errors. 
One error is related to numpy module, so I will try to install numpy for python 3.1.2 Anyway, two test FAIL, test_SeqIO_online and test_Wise Kind regards -- Dragoslav Zaric Professional Programmer MSc Astrophysics -------------- next part -------------- test_Ace ... ok test_AlignIO ... ok test_AlignIO_convert ... ok test_BioSQL ... /home/maiev/work/biopython/BioSQL/Loader.py:799: UserWarning: order location operators are not fully supported % feature.location_operator) ok test_BioSQL_SeqIO ... /home/maiev/work/biopython/BioSQL/Loader.py:799: UserWarning: bond location operators are not fully supported % feature.location_operator) ok test_CAPS ... ok test_Clustalw ... /home/maiev/work/biopython/Bio/Clustalw/__init__.py:83: PendingDeprecationWarning: This function is obsolete, and any new code should call Bio.AlignIO instead. warnings.warn("This function is obsolete, and any new code should call Bio.AlignIO instead.", PendingDeprecationWarning) ok test_Clustalw_tool ... skipping. Install clustalw or clustalw2 if you want to use Bio.Clustalw. test_Cluster ... skipping. If you want to use Bio.Cluster, install NumPy first and then reinstall Biopython test_CodonTable ... ok test_CodonUsage ... ok test_Compass ... ok test_Crystal ... ok test_Dialign_tool ... skipping. Install DIALIGN2-2 if you want to use the Bio.Align.Applications wrapper. test_DocSQL ... skipping. Install MySQLdb if you want to use Bio.DocSQL. test_Emboss ... skipping. Install EMBOSS if you want to use Bio.Emboss. test_EmbossPhylipNew ... skipping. Install the Emboss package 'PhylipNew' if you want to use the Bio.Emboss.Applications wrappers for phylogenetic tools. test_EmbossPrimer ... ok test_Entrez ... ok test_Enzyme ... ok test_FSSP ... ok test_File ... ok test_GACrossover ... ok test_GAMutation ... ok test_GAOrganism ... ok test_GAQueens ... ok test_GARepair ... ok test_GASelection ... ok test_GFF ... skipping. 
Environment is not configured for this test (not important if you do not plan to use Bio.GFF). test_GFF2 ... skipping. Install MySQLdb if you want to use Bio.GFF. test_GenBank ... ok test_GenomeDiagram ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsBitmaps ... skipping. Install ReportLab if you want to use Bio.Graphics. test_GraphicsChromosome ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsDistribution ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsGeneral ... skipping. Install reportlab if you want to use Bio.Graphics. test_HMMCasino ... ok test_HMMGeneral ... ok test_HotRand ... ok test_IsoelectricPoint ... ok test_KDTree ... skipping. Install NumPy if you want to use Bio.KDTree. test_KEGG ... ok test_KeyWList ... ok test_Location ... ok test_LocationParser ... skipping. This deprecated module doesn't work on Python 3. test_LogisticRegression ... skipping. Install NumPy if you want to use Bio.LogisticRegression. test_Mafft_tool ... skipping. Install MAFFT if you want to use the Bio.Align.Applications wrapper. test_MarkovModel ... skipping. Install NumPy if you want to use Bio.MarkovModel. test_Medline ... ok test_Motif ... ok test_Muscle_tool ... skipping. Install MUSCLE if you want to use the Bio.Align.Applications wrapper. test_NCBIStandalone ... /home/maiev/work/biopython/Bio/Blast/NCBIStandalone.py:53: PendingDeprecationWarning: The plain text parser in this module still works at the time of writing, but is considered obsolete and updating it to cope with the latest versions of BLAST is not a priority for us. 
warnings.warn("The plain text parser in this module still works at the time of writing, but is considered obsolete and updating it to cope with the latest versions of BLAST is not a priority for us.", PendingDeprecationWarning) /home/maiev/work/biopython/Bio/Blast/NCBIStandalone.py:1850: PendingDeprecationWarning: This function is obsolete, you are encouraged to the command line wrapper Bio.Blast.Applications.BlastpgpCommandline instead. warnings.warn("This function is obsolete, you are encouraged to the command line wrapper Bio.Blast.Applications.BlastpgpCommandline instead.", PendingDeprecationWarning) /home/maiev/work/biopython/Bio/Blast/NCBIStandalone.py:1970: PendingDeprecationWarning: This function is obsolete, you are encouraged to the command line wrapper Bio.Blast.Applications.BlastrpsCommandline instead. warnings.warn("This function is obsolete, you are encouraged to the command line wrapper Bio.Blast.Applications.BlastrpsCommandline instead.", PendingDeprecationWarning) ok test_NCBITextParser ... ok test_NCBIXML ... ok test_NCBI_BLAST_tools ... skipping. Install the NCBI BLAST+ command line tools if you want to use the Bio.Blast.Applications wrapper. test_NCBI_qblast ... ok test_NNExclusiveOr ... ok test_NNGene ... ok test_NNGeneral ... ok test_Nexus ... ok test_PDB ... skipping. Install NumPy if you want to use Bio.PDB. test_PDB_KDTree ... skipping. Install NumPy if you want to use Bio.PDB. test_ParserSupport ... ok test_Pathway ... ok test_Phd ... ok test_Phylo ... ok test_PhyloXML ... ok test_Phylo_depend ... skipping. Install NetworkX if you want to use Bio.Phylo._utils. test_PopGen_FDist ... skipping. Install FDist if you want to use Bio.PopGen.FDist. test_PopGen_FDist_nodepend ... ok test_PopGen_GenePop ... skipping. Install GenePop if you want to use Bio.PopGen.GenePop. test_PopGen_GenePop_EasyController ... skipping. Install GenePop if you want to use Bio.PopGen.GenePop. test_PopGen_GenePop_nodepend ... ok test_PopGen_SimCoal ... skipping. 
Install SIMCOAL2 if you want to use Bio.PopGen.SimCoal. test_PopGen_SimCoal_nodepend ... ok test_Prank_tool ... skipping. Install PRANK if you want to use the Bio.Align.Applications wrapper. test_Probcons_tool ... skipping. Install PROBCONS if you want to use the Bio.Align.Applications wrapper. test_ProtParam ... ok test_Restriction ... ok test_SCOP_Astral ... ok test_SCOP_Cla ... ok test_SCOP_Des ... ok test_SCOP_Dom ... ok test_SCOP_Hie ... ok test_SCOP_Raf ... ok test_SCOP_Residues ... ok test_SCOP_Scop ... ok test_SVDSuperimposer ... skipping. Install NumPy if you want to use Bio.SVDSuperimposer. test_SeqIO ... ok test_SeqIO_FastaIO ... ok test_SeqIO_QualityIO ... ok test_SeqIO_convert ... ok test_SeqIO_features ... ok test_SeqIO_index ... skipping. Skipping since currently this is very slow on Python 3. test_SeqIO_online ... FAIL test_SeqRecord ... ok test_SeqUtils ... ok test_Seq_objs ... ok test_SubsMat ... ok test_SwissProt ... ok test_TCoffee_tool ... skipping. Install TCOFFEE if you want to use the Bio.Align.Applications wrapper. test_UniGene ... ok test_UniGene_obsolete ... ok test_Wise ... FAIL test_align ... ok test_geo ... ok test_kNN ... ERROR test_lowess ... skipping. Install NumPy if you want to use Bio.Statistics.lowess. test_pairwise2 ... ok test_prodoc ... ok test_property_manager ... skipping. This deprecated module doesn't work on Python 3. test_prosite1 ... ok test_prosite2 ... ok test_prosite_patterns ... skipping. The (deprecated) Bio.Prosite module uses the Python library sgmllib which is not supported on Python 3 test_psw ... ok test_seq ... ok test_translate ... ok test_trie ... skipping. Could not import Bio.trie, check C code was compiled. Bio.Alphabet docstring test ... ok Bio.Application docstring test ... ok Bio.SeqFeature docstring test ... ok Bio.SeqRecord docstring test ... ok Bio.SeqIO docstring test ... ok Bio.SeqIO.AceIO docstring test ... ok Bio.SeqIO.PhdIO docstring test ... ok Bio.SeqIO.QualityIO docstring test ... 
ok Bio.SeqIO.SffIO docstring test ... ok Bio.SeqUtils docstring test ... ok Bio.Align docstring test ... ok Bio.Align.Generic docstring test ... ok Bio.AlignIO docstring test ... ok Bio.AlignIO.StockholmIO docstring test ... ok Bio.Blast.Applications docstring test ... ok Bio.Clustalw docstring test ... ok Bio.Emboss.Applications docstring test ... ok Bio.KEGG.Compound docstring test ... ok Bio.KEGG.Enzyme docstring test ... ok Bio.Wise docstring test ... ok Bio.Wise.psw docstring test ... ok Bio.Motif docstring test ... ok ====================================================================== ERROR: test_nuccore_X52960 (test_SeqIO_online.EntrezTests) Bio.Entrez.efetch(nuccore, X52960, ...) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/maiev/work/biopython/Tests/test_SeqIO_online.py", line 85, in method = lambda x : x.simple(d, f, e, l, c) File "/home/maiev/work/biopython/Tests/test_SeqIO_online.py", line 63, in simple record = SeqIO.read(handle, f) File "/home/maiev/work/biopython/Bio/SeqIO/__init__.py", line 585, in read first = next(iterator) File "/home/maiev/work/biopython/Bio/SeqIO/FastaIO.py", line 39, in FastaIterator if line[0] == ">": IndexError: index out of range ====================================================================== ERROR: test_nucleotide_6273291 (test_SeqIO_online.EntrezTests) Bio.Entrez.efetch(nucleotide, 6273291, ...) 
---------------------------------------------------------------------- Traceback (most recent call last): File "/home/maiev/work/biopython/Tests/test_SeqIO_online.py", line 85, in method = lambda x : x.simple(d, f, e, l, c) File "/home/maiev/work/biopython/Tests/test_SeqIO_online.py", line 63, in simple record = SeqIO.read(handle, f) File "/home/maiev/work/biopython/Bio/SeqIO/__init__.py", line 585, in read first = next(iterator) File "/home/maiev/work/biopython/Bio/SeqIO/FastaIO.py", line 39, in FastaIterator if line[0] == ">": IndexError: index out of range ====================================================================== ERROR: test_protein_16130152 (test_SeqIO_online.EntrezTests) Bio.Entrez.efetch(protein, 16130152, ...) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/maiev/work/biopython/Tests/test_SeqIO_online.py", line 85, in method = lambda x : x.simple(d, f, e, l, c) File "/home/maiev/work/biopython/Tests/test_SeqIO_online.py", line 63, in simple record = SeqIO.read(handle, f) File "/home/maiev/work/biopython/Bio/SeqIO/__init__.py", line 585, in read first = next(iterator) File "/home/maiev/work/biopython/Bio/SeqIO/FastaIO.py", line 39, in FastaIterator if line[0] == ">": IndexError: index out of range ====================================================================== FAIL: test_dnal (test_Wise.TestWiseDryRun) Call dnal, and do a trivial check on its output. ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/maiev/work/biopython/Tests/test_Wise.py", line 26, in test_dnal self.assertTrue(sys.stdout.getvalue().startswith("dnal -kbyte 100000 seq1.fna seq2.fna")) AssertionError: False is not True ====================================================================== FAIL: test_psw (test_Wise.TestWiseDryRun) Call psw, and do a trivial check on its output. 
---------------------------------------------------------------------- Traceback (most recent call last): File "/home/maiev/work/biopython/Tests/test_Wise.py", line 31, in test_psw self.assertTrue(sys.stdout.getvalue().startswith("psw -kbyte 4 seq1.faa seq2.faa")) AssertionError: False is not True ====================================================================== ERROR: test_kNN ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/maiev/work/biopython/Tests/test_kNN.py", line 12, in import numpy ImportError: No module named numpy During handling of the above exception, another exception occurred: Traceback (most recent call last): File "run_tests.py", line 289, in runTest suite = unittest.TestLoader().loadTestsFromName(name) File "/usr/local/lib/python3.1/unittest.py", line 1266, in loadTestsFromName module = __import__('.'.join(parts_copy)) File "/home/maiev/work/biopython/Tests/test_kNN.py", line 15, in raise MissingPythonDependencyError( NameError: name 'MissingPythonDependencyError' is not defined ---------------------------------------------------------------------- Ran 140 tests in 478.298 seconds FAILED (failures = 3) From biopython at maubp.freeserve.co.uk Wed Oct 27 11:33:05 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Oct 2010 16:33:05 +0100 Subject: [Biopython-dev] upgrade to python 3.1.2 In-Reply-To: References: Message-ID: On Wed, Oct 27, 2010 at 4:19 PM, Dragoslav Zaric wrote: > Hi Peter, > > I did everything from scratch, get biopython with git, than tun those > two commands for 2to3 from README file and at the end I run > run_tests.py from Tests folder. > > I am sending you log file just to check am I on right track. I will > continue to investigate errors. 
> > One error is related to numpy module, so I will try to install numpy > for python 3.1.2 > > Anyway, two test FAIL, test_SeqIO_online and test_Wise > > Kind regards The problem with test_kNN.py was my mistake - it is meant to be skipped when numpy is not installed. Fixed here: http://github.com/biopython/biopython/commit/2ae15f94e7e90b237e982145f9697157ed1f801e The "IndexError: index out of range" problem on Python 3 with test_SeqIO_online.py is the known failure I mentioned before. This is to do with bytes versus unicode handles. The output from test_Wise.py is unexpected through (I don't have Wise installed on my Mac - I should do that): ====================================================================== FAIL: test_psw (test_Wise.TestWiseDryRun) Call psw, and do a trivial check on its output. ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/maiev/work/biopython/Tests/test_Wise.py", line 31, in test_psw self.assertTrue(sys.stdout.getvalue().startswith("psw -kbyte 4 seq1.faa seq2.faa")) AssertionError: False is not True Hopefully with the following change we'll get a more useful message: http://github.com/biopython/biopython/commit/811f5ced0305fa41539b8867c594a119135ef682 Could you update your Biopython and re-test? You'll have to repeat the 2to3 conversion, e.g. git reset --hard 2to3 ... etc Peter From biopython at maubp.freeserve.co.uk Wed Oct 27 11:46:13 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Oct 2010 16:46:13 +0100 Subject: [Biopython-dev] upgrade to python 3.1.2 In-Reply-To: References: Message-ID: On Wed, Oct 27, 2010 at 4:40 PM, Dragoslav Zaric wrote: > > ok, will do that and send you log file again, > You can just cut and paste the error messages - that should be all we need. 
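The bytes versus unicode failure is easy to reproduce in isolation. The sketch below uses a simplified reader, written for text handles (hypothetical, not Biopython's FastaIterator). On Python 3, `readline()` on a binary handle returns `b""` at end of file, which never equals `""`. The loop therefore runs past the end, and indexing the empty bytes object raises `IndexError`, matching the traceback above. Indexing non-empty bytes also gives an integer, so `line[0] == ">"` is never true. Wrapping the binary handle in `io.TextIOWrapper` is one way to get text lines back.

```python
import io


def naive_titles(handle):
    """Collect FASTA title lines; written assuming `handle` yields text (str)."""
    titles = []
    while True:
        line = handle.readline()
        if line == "":  # end-of-file test: never true for bytes, which give b""
            return titles
        if line[0] == ">":  # on b"" this raises IndexError; on bytes it is an int
            titles.append(line[1:].strip())


data = b">seq1 demo\nACGT\n>seq2\nTTGA\n"  # stands in for bytes from a network handle

try:
    naive_titles(io.BytesIO(data))
except IndexError as err:
    print("binary handle fails:", err)

text_handle = io.TextIOWrapper(io.BytesIO(data), encoding="ascii")
print("text handle works:", naive_titles(text_handle))
```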
Thanks, Peter From biopython at maubp.freeserve.co.uk Wed Oct 27 12:35:11 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Oct 2010 17:35:11 +0100 Subject: [Biopython-dev] upgrade to python 3.1.2 In-Reply-To: References: Message-ID: On Wed, Oct 27, 2010 at 5:14 PM, Dragoslav Zaric wrote: > > Ok, errors: > > test_SeqIO_online ... FAIL > test_Wise ... FAIL > > ====================================================================== > ERROR: test_nuccore_X52960 (test_SeqIO_online.EntrezTests) > Bio.Entrez.efetch(nuccore, X52960, ...) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ... > IndexError: index out of range > > ====================================================================== > ERROR: test_nucleotide_6273291 (test_SeqIO_online.EntrezTests) > Bio.Entrez.efetch(nucleotide, 6273291, ...) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ... > IndexError: index out of range > > ====================================================================== > ERROR: test_protein_16130152 (test_SeqIO_online.EntrezTests) > Bio.Entrez.efetch(protein, 16130152, ...) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ... > IndexError: index out of range We're ignoring the above problem with test_SeqIO_online.py on Python 3 for now. > ====================================================================== > FAIL: test_dnal (test_Wise.TestWiseDryRun) > Call dnal, and do a trivial check on its output. > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/home/maiev/work/biopython/Tests/test_Wise.py", line 27, in test_dnal > self.assertTrue(output.startswith("dnal -kbyte 100000 seq1.fna > seq2.fna"), output[:200]) > AssertionError: dnal -kbyte 100000 -quiet seq1.fna seq2.fna > /tmp/tmpEVkZM8 > > > ====================================================================== > FAIL: test_psw (test_Wise.TestWiseDryRun) > Call psw, and do a trivial check on its output. > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/home/maiev/work/biopython/Tests/test_Wise.py", line 33, in test_psw > self.assertTrue(output.startswith("psw -kbyte 4 seq1.faa > seq2.faa"), output[:200]) > AssertionError: psw -kbyte 4 -quiet seq1.faa seq2.faa > /tmp/tmpOJ3QL3 I remember this issue now: http://lists.open-bio.org/pipermail/biopython-dev/2010-June/007904.html (very end) ... http://lists.open-bio.org/pipermail/biopython-dev/2010-June/007908.html This was due to the psw/dnal wrappers sometimes automatically including the -quiet command line switch. It happens if you redirect the unit test output to a file. This change should solve it: http://github.com/biopython/biopython/commit/4f430adad7a5b8bc021dec8b188963ca76612393 Thanks! Peter From tiagoantao at gmail.com Wed Oct 27 14:36:46 2010 From: tiagoantao at gmail.com (Tiago Antão) Date: Wed, 27 Oct 2010 19:36:46 +0100 Subject: [Biopython-dev] README and python3 Message-ID: Hi, Just a minor issue with the README and python3. The option --nofix does not exist in 2to3 for the 2.x version. So that line will not work if the 2to3 happens to be from Python 2.X (can happen if you have several versions installed). -- "If you want to get laid, go to college. If you want an education, go to the library."
- Frank Zappa From biopython at maubp.freeserve.co.uk Thu Oct 28 05:17:09 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 28 Oct 2010 10:17:09 +0100 Subject: [Biopython-dev] README and python3 In-Reply-To: References: Message-ID: 2010/10/27 Tiago Antão : > Hi, > > Just a minor issue with the README and python3. > The option --nofix does not exist in 2to3 for the 2.x version. So that > line will not work if the 2to3 happens to be from Python 2.X (can > happen if you have several versions installed). > Hi Tiago, Can you work out which version of 2to3 lacks the --nofix (or -x) option, and which version of Python it came from? The (Apple provided) Python 2.6.1 on my Mac seems to have a 2to3 with the --nofix option, and I don't have Python 3 installed on this machine. In addition to running 2to3 as a command line script, you can call the library from within Python: $ python2.6 Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) [GCC 4.2.1 (Apple Inc. build 5646)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> from lib2to3.main import main >>> main("lib2to3.fixes", ["--help"]) Usage: refactor.py [options] file|dir ... Options: -h, --help show this help message and exit -d, --doctests_only Fix up doctests only -f FIX, --fix=FIX Each FIX specifies a transformation; default: all -x NOFIX, --nofix=NOFIX Prevent a fixer from being run. -l, --list-fixes List available transformations (fixes/fix_*.py) -p, --print-function Modify the grammar so that print() is a function -v, --verbose More verbose logging -w, --write Write back modified files -n, --nobackups Don't write backups for modified files. Likewise on our Linux server the 2to3 from Python 2.6.6, 2.7 and 3.1.2 all seem to have it: $ python2.6 Python 2.6.6 (r266:84292, Aug 31 2010, 16:21:14) [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2 Type "help", "copyright", "credits" or "license" for more information.
>>> from lib2to3.main import main >>> main("lib2to3.fixes", ["--help"]) Usage: 2to3 [options] file|dir ... Options: -h, --help show this help message and exit -d, --doctests_only Fix up doctests only -f FIX, --fix=FIX Each FIX specifies a transformation; default: all -j PROCESSES, --processes=PROCESSES Run 2to3 concurrently -x NOFIX, --nofix=NOFIX Prevent a fixer from being run. -l, --list-fixes List available transformations -p, --print-function Modify the grammar so that print() is a function -v, --verbose More verbose logging --no-diffs Don't show diffs of the refactoring -w, --write Write back modified files -n, --nobackups Don't write backups for modified files. $ python2.7 Python 2.7 (r27:82500, Jul 13 2010, 14:02:41) [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from lib2to3.main import main >>> main("lib2to3.fixes", ["--help"]) Usage: 2to3 [options] file|dir ... Options: -h, --help show this help message and exit -d, --doctests_only Fix up doctests only -f FIX, --fix=FIX Each FIX specifies a transformation; default: all -j PROCESSES, --processes=PROCESSES Run 2to3 concurrently -x NOFIX, --nofix=NOFIX Prevent a fixer from being run. -l, --list-fixes List available transformations -p, --print-function Modify the grammar so that print() is a function -v, --verbose More verbose logging --no-diffs Don't show diffs of the refactoring -w, --write Write back modified files -n, --nobackups Don't write backups for modified files. $ python3.1 Python 3.1.2 (r312:79147, Jul 15 2010, 12:43:37) [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from lib2to3.main import main >>> main("lib2to3.fixes", ["--help"]) Usage: 2to3 [options] file|dir ... 
Options: -h, --help show this help message and exit -d, --doctests_only Fix up doctests only -f FIX, --fix=FIX Each FIX specifies a transformation; default: all -j PROCESSES, --processes=PROCESSES Run 2to3 concurrently -x NOFIX, --nofix=NOFIX Prevent a fixer from being run. -l, --list-fixes List available transformations (fixes/fix_*.py) -p, --print-function Modify the grammar so that print() is a function -v, --verbose More verbose logging --no-diffs Don't show diffs of the refactoring -w, --write Write back modified files -n, --nobackups Don't write backups for modified files. Note that we *need* the --nofix option for the conversion of Bio.Phylo to work (it uses long as an argument name, short for longitude). Peter From zaricdragoslav at gmail.com Thu Oct 28 10:16:34 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Thu, 28 Oct 2010 18:16:34 +0400 Subject: [Biopython-dev] _pwm.c Message-ID: Dear Peter, I wrote you this yesterday: I put this in setup.py: class build_ext_biopython(build_ext): def run(self): if not check_dependencies_once(): return # add software that requires NumPy to install # TODO - Convert these for Python 3 if is_Numpy_installed(): import numpy numpy_include_dir = numpy.get_include() #self.extensions.append( # Extension('Bio.Cluster.cluster', # ['Bio/Cluster/clustermodule.c', # 'Bio/Cluster/cluster.c'], # include_dirs=[numpy_include_dir], # )) #self.extensions.append( # Extension('Bio.KDTree._CKDTree', # ["Bio/KDTree/KDTree.c", # "Bio/KDTree/KDTreemodule.c"], # include_dirs=[numpy_include_dir], # )) self.extensions.append( Extension('Bio.Motif._pwm', ["Bio/Motif/_pwm.c"], include_dirs=[numpy_include_dir], )) build_ext.run(self) and then I run: python3.1 setup.py build_ext This is output: Biopython does not yet officially support Python 3, but you can try it by first using the 2to3 script on our source code. For details on how to use 2to3 with Biopython see README. If you still haven't applied 2to3 to Biopython please abort now.
Do you want to continue this installation? (y/N): y running build_ext building 'Bio.Motif._pwm' extension creating build/temp.linux-i686-3.1 creating build/temp.linux-i686-3.1/Bio creating build/temp.linux-i686-3.1/Bio/Motif gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/local/lib/python3.1/site-packages/numpy/core/include -I/usr/local/include/python3.1 -c Bio/Motif/_pwm.c -o build/temp.linux-i686-3.1/Bio/Motif/_pwm.o Bio/Motif/_pwm.c: In function ‘init_pwm’: Bio/Motif/_pwm.c:123: warning: ‘return’ with a value, in function returning void Bio/Motif/_pwm.c:125: warning: implicit declaration of function ‘Py_InitModule4’ Bio/Motif/_pwm.c:129: warning: assignment makes pointer from integer without a cast gcc -pthread -shared build/temp.linux-i686-3.1/Bio/Motif/_pwm.o -o build/lib/Bio/Motif/_pwm.so So as you can see this is compiling, but there are some warnings. So what is plan, to compile totally without warnings ?? regards -- Dragoslav Zaric Professional Programmer MSc Astrophysics From biopython at maubp.freeserve.co.uk Thu Oct 28 10:27:04 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 28 Oct 2010 15:27:04 +0100 Subject: [Biopython-dev] _pwm.c In-Reply-To: References: Message-ID: On Thu, Oct 28, 2010 at 3:16 PM, Dragoslav Zaric wrote: > Dear Peter, > > I wrote you this yesterday: > > I put this in setup.py: > > class build_ext_biopython(build_ext): > def run(self):
# ? ? ? ? ? ? ?)) > ? ? ? ? ? #self.extensions.append( > ? ? ? ? ? # ? ?Extension('Bio.KDTree._CKDTree', > ? ? ? ? ? # ? ? ? ? ? ? ?["Bio/KDTree/KDTree.c", > ? ? ? ? ? # ? ? ? ? ? ? ? "Bio/KDTree/KDTreemodule.c"], > ? ? ? ? ? # ? ? ? ? ? ? ?include_dirs=[numpy_include_dir], > ? ? ? ? ? # ? ? ? ? ? ? ?)) > ? ? ? ? ? self.extensions.append( > ? ? ? ? ? ? ? Extension('Bio.Motif._pwm', > ? ? ? ? ? ? ? ? ? ? ? ? ["Bio/Motif/_pwm.c"], > ? ? ? ? ? ? ? ? ? ? ? ? include_dirs=[numpy_include_dir], > ? ? ? ? ? ? ? ? ? ? ? ? )) > ? ? ? build_ext.run(self) > > and than I run: > > python3.1 setup.py build_ext > > This is output: > > Biopython does not yet officially support Python 3, but you > can try it by first using the 2to3 script on our source code. > For details on how to use 2to3 with Biopython see README. > If you still haven't applied 2to3 to Biopython please abort now. > Do you want to continue this installation? (y/N): > y > running build_ext > building 'Bio.Motif._pwm' extension > creating build/temp.linux-i686-3.1 > creating build/temp.linux-i686-3.1/Bio > creating build/temp.linux-i686-3.1/Bio/Motif > gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall > -Wstrict-prototypes -fPIC > -I/usr/local/lib/python3.1/site-packages/numpy/core/include > -I/usr/local/include/python3.1 -c Bio/Motif/_pwm.c -o > build/temp.linux-i686-3.1/Bio/Motif/_pwm.o > Bio/Motif/_pwm.c: In function ?init_pwm?: > Bio/Motif/_pwm.c:123: warning: ?return? with a value, in function returning void > Bio/Motif/_pwm.c:125: warning: implicit declaration of function ?Py_InitModule4? > Bio/Motif/_pwm.c:129: warning: assignment makes pointer from integer > without a cast > gcc -pthread -shared build/temp.linux-i686-3.1/Bio/Motif/_pwm.o -o > build/lib/Bio/Motif/_pwm.so > > > So as you can see this is compiling, but there are some warnings. So what is > plan, to compile totally without warnings ?? 
Well ideally no warnings - but of those three warnings only the one about Py_InitModule4 strikes me as important. This was part of the Python 2.x C API used to tell Python about the functions your code provides, and has been changed in Python 3.x (I think you must use PyModule_Create instead). What happens if you try to use the compiled module in Python 3? e.g. from Bio import Motif from Bio.Motif import _pwm Bartek - could you give us a short (Python 2) example of Bio.Motif which uses the C module _pwm? Peter From barwil at gmail.com Thu Oct 28 10:37:18 2010 From: barwil at gmail.com (Bartek Wilczynski) Date: Thu, 28 Oct 2010 16:37:18 +0200 Subject: [Biopython-dev] _pwm.c In-Reply-To: References: Message-ID: On Thu, Oct 28, 2010 at 4:27 PM, Peter wrote: > On Thu, Oct 28, 2010 at 3:16 PM, Dragoslav Zaric > wrote: > > running build_ext > > building 'Bio.Motif._pwm' extension > > creating build/temp.linux-i686-3.1 > > creating build/temp.linux-i686-3.1/Bio > > creating build/temp.linux-i686-3.1/Bio/Motif > > gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall > > -Wstrict-prototypes -fPIC > > -I/usr/local/lib/python3.1/site-packages/numpy/core/include > > -I/usr/local/include/python3.1 -c Bio/Motif/_pwm.c -o > > build/temp.linux-i686-3.1/Bio/Motif/_pwm.o > > Bio/Motif/_pwm.c: In function ‘init_pwm’: > > Bio/Motif/_pwm.c:123: warning: ‘return’ with a value, in function > returning void > > Bio/Motif/_pwm.c:125: warning: implicit declaration of function > ‘Py_InitModule4’ > > Bio/Motif/_pwm.c:129: warning: assignment makes pointer from integer > > without a cast > > gcc -pthread -shared build/temp.linux-i686-3.1/Bio/Motif/_pwm.o -o > > build/lib/Bio/Motif/_pwm.so > > > > > > So as you can see this is compiling, but there are some warnings. So what > is > > plan, to compile totally without warnings ?? > > Well ideally no warnings - but of those three warnings only the one about > Py_InitModule4 strikes me as important.
This was part of the Python 2.x > C API used to tell Python about the functions your code provides, and has > been changed in Python 3.x (I think you must use PyModule_Create instead). > > What happens if you try to use the compiled module in Python 3? e.g. > > from Bio import Motif > from Bio.Motif import _pwm > > Bartek - could you give us a short (Python 2) example of Bio.Motif > which uses the C module _pwm? > Hi, this is the fast implementation of DNA motif searching written by Michiel some time ago. It is exposed in the Bio.Motif API in the form of .scanPWM method: Definition: m.scanPWM(self, seq) Docstring: Matrix of log-odds scores for a nucleotide sequence. scans (using a fast C extension) a nucleotide sequence and returns the matrix of log-odds scores for all positions - the result is a one-dimensional numpy array - the sequence can only be a DNA sequence - the search is performed only on one strand It's a very simple module so it should be relatively easy to convert it to python3. Unfortunately, I have no experience in c extensions so I cannot help much. If you need a snippet for testing, you can use this: from Bio import Seq from Bio import Motif m=Motif.read(open("Doc/cookbook/motif/SRF.pfm"),"jaspar-pfm") m.scanPWM(Seq.Seq("ACGTGTGCGTAGTGCGT",m.alphabet)) result should be: array([-29.18363571, -38.3365097 , -29.17756271, -38.04542542, -20.3014183 , -25.18009186], dtype=float32) hope this helps -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From biopython at maubp.freeserve.co.uk Thu Oct 28 11:54:07 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 28 Oct 2010 16:54:07 +0100 Subject: [Biopython-dev] _pwm.c In-Reply-To: References: Message-ID: On Thu, Oct 28, 2010 at 3:37 PM, Bartek Wilczynski wrote: > > this is the fast implementation of DNA motif searching written by Michiel > some time ago. 
It is exposed in the Bio.Motif API in the form of .scanPWM > method: > On a related topic, is there a pure Python fall back for _pwm.c in Bio.Motif? If not, would it be easy to add (e.g. for Jython). Thanks, Peter From zaricdragoslav at gmail.com Thu Oct 28 12:24:05 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Thu, 28 Oct 2010 20:24:05 +0400 Subject: [Biopython-dev] Build & Test Message-ID: Hi Peter, To bring me up to speed for build&test process, can I ask you how exactly this process should go. For example if I put this in setup.py file: class build_ext_biopython(build_ext): def run(self): if not check_dependencies_once(): return if is_Numpy_installed(): import numpy numpy_include_dir = numpy.get_include() self.extensions.append( Extension('Bio.Motif._pwm', ["Bio/Motif/_pwm.c"], include_dirs=[numpy_include_dir], )) build_ext.run(self) what command I should run from command line to build and test: python3.1 setup.py build or/and python3.1 setup.py install After this I will have folder build/lib/Bio, so should I go to folder build/lib and start python3.1 to test this, or after python3.1 setup.py install it is copied to root folder. Also after building I will have -pwm.so file, is this final file that is imported from python code ? Currently i can import Seq and Motif but when I run m.scanPWM(Seq.Seq("ACGTGTGCGTAGTGCGT",m.alphabet)) I get: Traceback (most recent call last): File "", line 1, in File "Bio/Motif/_Motif.py", line 778, in scanPWM import _pwm ImportError: No module named _pwm regards -- Dragoslav Zaric Professional Programmer MSc Astrophysics From biopython at maubp.freeserve.co.uk Thu Oct 28 12:36:31 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 28 Oct 2010 17:36:31 +0100 Subject: [Biopython-dev] Build & Test In-Reply-To: References: Message-ID: On Thu, Oct 28, 2010 at 5:24 PM, Dragoslav Zaric wrote: > Hi Peter, > > To bring me up to speed for build&test process, can I ask you how > exactly this process should go. 
> > For example if I put this in setup.py file: > > class build_ext_biopython(build_ext): > def run(self): > if not check_dependencies_once(): > return > if is_Numpy_installed(): > import numpy > numpy_include_dir = numpy.get_include() > self.extensions.append( > Extension('Bio.Motif._pwm', > ["Bio/Motif/_pwm.c"], > include_dirs=[numpy_include_dir], > )) > build_ext.run(self) > > what command I should run from command line to build and test: > > python3.1 setup.py build > or/and > python3.1 setup.py install > > After this I will have folder build/lib/Bio, so should I go to folder build/lib > and start python3.1 to test this, or after python3.1 setup.py install it is > copied to root folder. You can do this: python3.1 setup.py build python3.1 setup.py test and it should use the compiled C code from the build folder. This is equivalent to: python3.1 setup.py build cd Tests python3.1 run_tests.py The advantage of calling run_tests.py directly is you can test particular bits of Biopython rather than all of it, e.g. python3.1 setup.py build cd Tests python3.1 run_tests.py test_Motif.py If you try and run a test directly (e.g. python3.1 test_Motif.py) then it will use the installed version of Biopython (and it will fail if you haven't installed Biopython). > Also after building I will have -pwm.so file, is this final file that > is imported from python code ? > > Currently i can import Seq and Motif but when I run > > m.scanPWM(Seq.Seq("ACGTGTGCGTAGTGCGT",m.alphabet)) > > I get: > > Traceback (most recent call last): > File "", line 1, in > File "Bio/Motif/_Motif.py", line 778, in scanPWM >
?import _pwm > ImportError: No module named _pwm > On Python 2.6 (on my Mac) I have these files: $ ls build/lib.macosx-10.6-universal-2.6/Bio/Motif/ Applications Thresholds.py _Motif.py __init__.py _pwm.so Parsers Thresholds.pyc _Motif.pyc __init__.pyc Trying to import _pwm will load the _pwm.so file. It sounds like you were able to compile _pwm.so under Python 3, but it doesn't import. Is your _pwm.c file still using Py_InitModule4 or have you changed it to something Python 3 compatible yet like PyModule_Create? [It may not have been clear to you earlier, but porting Python C extensions from Python 2 to Python 3 requires quite a lot of background knowledge about Python, C, compiling, and so on. I hope this wasn't too ambitious.] Regards, Peter From biopython at maubp.freeserve.co.uk Thu Oct 28 12:45:52 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 28 Oct 2010 17:45:52 +0100 Subject: [Biopython-dev] _pwm.c In-Reply-To: References: Message-ID: On Thu, Oct 28, 2010 at 3:37 PM, Bartek Wilczynski wrote: > > If you need a snippet for testing, you can use this: > from Bio import Seq > from Bio import Motif > m=Motif.read(open("Doc/cookbook/motif/SRF.pfm"),"jaspar-pfm") > m.scanPWM(Seq.Seq("ACGTGTGCGTAGTGCGT",m.alphabet)) > > result should be: > array([-29.18363571, -38.3365097 , -29.17756271, -38.04542542, -20.3014183 , > -25.18009186], dtype=float32) > > hope this helps I've made that into a new unit test, file Tests/test_Motif_pwm.py http://github.com/biopython/biopython/commit/b265f341352b7c59ceaf7fa0fc4bfafc32185408 Peter From devaniranjan at gmail.com Thu Oct 28 12:49:41 2010 From: devaniranjan at gmail.com (George Devaniranjan) Date: Thu, 28 Oct 2010 12:49:41 -0400 Subject: [Biopython-dev] RMSD calculation Message-ID: I was wondering why there is two functions for calculating RMSD 1)in the SVDSuperimposer() 2)in PDB.Superimposer() In the code its says RMS-is RMS being calculated instead of RMSD??? 
I ask because VMD gives a different value for RMSD to the one from Biopython Thank you From biopython at maubp.freeserve.co.uk Thu Oct 28 13:04:53 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 28 Oct 2010 18:04:53 +0100 Subject: [Biopython-dev] RMSD calculation In-Reply-To: References: Message-ID: On Thu, Oct 28, 2010 at 5:49 PM, George Devaniranjan wrote: > I was wondering why there is two functions for calculating RMSD > > 1)in the SVDSuperimposer() > 2)in PDB.Superimposer() Can you clarify? There is no function in Bio.PDB.Superimposer to calculate RMSD (or RMS), but it does call the Bio.SVDSuperimposer module internally. > In the code its says RMS-is RMS being calculated instead of RMSD??? There could be some confusion in the comments about root mean squared (RMS) *deviation* (i.e. statistical standard deviation) versus RMS *distance*. > I ask because VMD gives a different value for RMSD to the one from Biopython Very different? There are bound to be some floating point differences. 
Peter From devaniranjan at gmail.com Thu Oct 28 13:14:44 2010 From: devaniranjan at gmail.com (George Devaniranjan) Date: Thu, 28 Oct 2010 13:14:44 -0400 Subject: [Biopython-dev] RMSD calculation In-Reply-To: References: Message-ID: For SVD x, y being arrays-I took this from the example/comment section of SVDSuperimposer.py sup=SVDSuperimposer() sup.set(x, y) sup.run() rms=sup.get_rms() For PDBSuperimposer super_imposer = Bio.PDB.Superimposer() super_imposer.set_atoms(ref_atoms, alt_atoms) super_imposer.apply(alt_model.get_atoms()) print "RMS( using PDB superimposer ) = %0.4f" % ( super_imposer.rms) (I took this from the Warwick example and modified for my use) http://www2.warwick.ac.uk/fac/sci/moac/students/peter_cock/python/protein_superposition/ Yes there is a difference-for 2 proteins having exact same residues of 36 residues the values from 4 sources are as follows VMD RMSD=1.61 SVD RMSD =3.2 PDB RMSD=3.2 From the EU Bioinformatics server (link below) RMSD =1.75 (http://www.ebi.ac.uk/msd-srv/ssm/cgi-bin/ssmserver) So Biopython really is computing the RMSD and not RMS? Thanks you On Thu, Oct 28, 2010 at 1:04 PM, Peter wrote: > On Thu, Oct 28, 2010 at 5:49 PM, George Devaniranjan > wrote: > > I was wondering why there is two functions for calculating RMSD > > > > 1)in the SVDSuperimposer() > > 2)in PDB.Superimposer() > > Can you clarify? There is no function in Bio.PDB.Superimposer to calculate > RMSD (or RMS), but it does call the Bio.SVDSuperimposer module internally. > > > In the code its says RMS-is RMS being calculated instead of RMSD??? > > There could be some confusion in the comments about root mean squared > (RMS) *deviation* (i.e. statistical standard deviation) versus RMS > *distance*. > > > I ask because VMD gives a different value for RMSD to the one from > Biopython > > Very different? There are bound to be some floating point differences.
> > Peter > From biopython at maubp.freeserve.co.uk Thu Oct 28 13:46:29 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 28 Oct 2010 18:46:29 +0100 Subject: [Biopython-dev] RMSD calculation In-Reply-To: References: Message-ID: On Thu, Oct 28, 2010 at 6:14 PM, George Devaniranjan wrote: > Yes there is a difference-for 2 proteins having exact same residues of 36 > residues the values from 4 sources are as follows > VMD RMSD=1.61 > SVD RMSD =3.2 > PDB RMSD=3.2 > > From the EU Bioinformatics server (link below) RMSD =1.75 > (http://www.ebi.ac.uk/msd-srv/ssm/cgi-bin/ssmserver) > > So Biopython really is computing the RMSD and not RMS? > Thanks you It has been a while since I looked at this (but I can still edit the Warwick page if it is unclear). Which definition of RMSD are you using? Bio.PDB uses Bio.SVDSuperimposer, so they should be the same. The comment for this code *says* it calculates the RMS deviation, here: diff=coords1-coords2 l=coords1.shape[0] return sqrt(sum(sum(diff*diff))/l) Here variable l will be the number of atoms. What are the two examples you are using? Can you perhaps share a small example pair of PDB files? Peter From zaricdragoslav at gmail.com Thu Oct 28 14:05:21 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Thu, 28 Oct 2010 22:05:21 +0400 Subject: [Biopython-dev] Build & Test In-Reply-To: References: Message-ID: I am sorry Peter, but you are right. It is too ambitious for me. I can not do it. I am sorry if I wasted your time. Good luck and all best. On Thu, Oct 28, 2010 at 8:36 PM, Peter wrote: > On Thu, Oct 28, 2010 at 5:24 PM, Dragoslav Zaric > wrote: >> Hi Peter, >> >> To bring me up to speed for build&test process, can I ask you how >> exactly this process should go. >> >> For example if I put this in setup.py file: >> >> class build_ext_biopython(build_ext): >> def run(self): >> if not check_dependencies_once(): >> return >> if is_Numpy_installed(): >>
import numpy >> numpy_include_dir = numpy.get_include() >> self.extensions.append( >> Extension('Bio.Motif._pwm', >> ["Bio/Motif/_pwm.c"], >> include_dirs=[numpy_include_dir], >> )) >> build_ext.run(self) >> >> what command I should run from command line to build and test: >> >> python3.1 setup.py build >> or/and >> python3.1 setup.py install >> >> After this I will have folder build/lib/Bio, so should I go to folder build/lib >> and start python3.1 to test this, or after python3.1 setup.py install it is >> copied to root folder. > > You can do this: > > python3.1 setup.py build > python3.1 setup.py test > > and it should use the compiled C code from the build folder. > This is equivalent to: > > python3.1 setup.py build > cd Tests > python3.1 run_tests.py > > The advantage of calling run_tests.py directly is you can test particular > bits of Biopython rather than all of it, e.g. > > python3.1 setup.py build > cd Tests > python3.1 run_tests.py test_Motif.py > > If you try and run a test directly (e.g. python3.1 test_Motif.py) then > it will use the installed version of Biopython (and it will fail if you > haven't installed Biopython). > >> Also after building I will have -pwm.so file, is this final file that >> is imported from python code ? >> >> Currently i can import Seq and Motif but when I run >> >> m.scanPWM(Seq.Seq("ACGTGTGCGTAGTGCGT",m.alphabet)) >> >> I get: >> >> Traceback (most recent call last): >> File "", line 1, in >> File "Bio/Motif/_Motif.py", line 778, in scanPWM >> import _pwm >> ImportError: No module named _pwm >> > > On Python 2.6 (on my Mac) I have these files: > > $ ls build/lib.macosx-10.6-universal-2.6/Bio/Motif/ > Applications Thresholds.py _Motif.py __init__.py _pwm.so > Parsers Thresholds.pyc _Motif.pyc __init__.pyc > > Trying to import _pwm will load the _pwm.so file.
> > It sounds like you were able to compile _pwm.so under > Python 3, but it doesn't import. Is your _pwm.c file still > using Py_InitModule4 or have you changed it to something > Python 3 compatible yet like PyModule_Create? > > [It may not have been clear to you earlier, but porting > Python C extensions from Python 2 to Python 3 requires > quite a lot of background knowledge about Python, C, > compiling, and so on. I hope this wasn't too ambitious.] > > Regards, > > Peter > -- Dragoslav Zaric Professional Programmer MSc Astrophysics From biopython at maubp.freeserve.co.uk Thu Oct 28 14:30:52 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 28 Oct 2010 19:30:52 +0100 Subject: [Biopython-dev] Build & Test In-Reply-To: References: Message-ID: On Thu, Oct 28, 2010 at 7:05 PM, Dragoslav Zaric wrote: > I am sorry Peter, but you are right. It is too ambitious for me. > > I can not do it. I am sorry if I wasted your time. > > Good luck and all best. You tried - and in the process we've made some small improvements to Biopython (like the new test for the PWM code), so I'm happy. Maybe there is something less complicated you could try... Anyway, thank you for your time. Peter From tiagoantao at gmail.com Thu Oct 28 17:10:02 2010 From: tiagoantao at gmail.com (Tiago Antão) Date: Thu, 28 Oct 2010 22:10:02 +0100 Subject: [Biopython-dev] README and python3 In-Reply-To: References: Message-ID: This was a local installation (on my home dir). As soon as I noticed the problem I just removed 2to3 (as I had others - more recent). But the python from which it was installed reports >>> print sys.version_info (2, 6, 0, 'final', 0) 2010/10/28 Peter : > 2010/10/27 Tiago Antão : >> Hi, >> >> Just a minor issue with the README and python3. >> The option --nofix does not exist in 2to3 for the 2.x version. So that >> line will not work if the 2to3 happens to be from Python 2.X (can >> happen if you have several versions installed).
>> > > Hi Tiago, > > Can you work out which version of 2to3 lacks the --nofix (or -x) > option, and which version of Python it came from? > > The (Apple provided) Python 2.6.1 on my Mac seems to have > a 2to3 with the --nofix option, and I don't have Python 3 installed > on this machine. In addition to running 2to3 as a command line > script, you can call the library from within Python: > > $ python2.6 > Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) > [GCC 4.2.1 (Apple Inc. build 5646)] on darwin > Type "help", "copyright", "credits" or "license" for more information. >>>> from lib2to3.main import main >>>> main("lib2to3.fixes", ["--help"]) > Usage: refactor.py [options] file|dir ... > > Options: > -h, --help show this help message and exit > -d, --doctests_only Fix up doctests only > -f FIX, --fix=FIX Each FIX specifies a transformation; default: all > -x NOFIX, --nofix=NOFIX > Prevent a fixer from being run. > -l, --list-fixes List available transformations (fixes/fix_*.py) > -p, --print-function Modify the grammar so that print() is a function > -v, --verbose More verbose logging > -w, --write Write back modified files > -n, --nobackups Don't write backups for modified files. > > Likewise on our Linux server the 2to3 from Python 2.6.6, 2.7 and > 3.1.2 all seem to have it: > > $ python2.6 > Python 2.6.6 (r266:84292, Aug 31 2010, 16:21:14) > [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> from lib2to3.main import main >>>> main("lib2to3.fixes", ["--help"]) > Usage: 2to3 [options] file|dir ... > > Options: > -h, --help show this help message and exit > -d, --doctests_only Fix up doctests only > -f FIX, --fix=FIX Each FIX specifies a transformation; default: all > -j PROCESSES, --processes=PROCESSES > Run 2to3 concurrently > -x NOFIX, --nofix=NOFIX >
? ? ? ? ? ?Prevent a fixer from being run. > ?-l, --list-fixes ? ? ?List available transformations > ?-p, --print-function ?Modify the grammar so that print() is a function > ?-v, --verbose ? ? ? ? More verbose logging > ?--no-diffs ? ? ? ? ? ?Don't show diffs of the refactoring > ?-w, --write ? ? ? ? ? Write back modified files > ?-n, --nobackups ? ? ? Don't write backups for modified files. > > $ python2.7 > Python 2.7 (r27:82500, Jul 13 2010, 14:02:41) > [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> from lib2to3.main import main >>>> main("lib2to3.fixes", ["--help"]) > Usage: 2to3 [options] file|dir ... > > Options: > ?-h, --help ? ? ? ? ? ?show this help message and exit > ?-d, --doctests_only ? Fix up doctests only > ?-f FIX, --fix=FIX ? ? Each FIX specifies a transformation; default: all > ?-j PROCESSES, --processes=PROCESSES > ? ? ? ? ? ? ? ? ? ? ? ?Run 2to3 concurrently > ?-x NOFIX, --nofix=NOFIX > ? ? ? ? ? ? ? ? ? ? ? ?Prevent a fixer from being run. > ?-l, --list-fixes ? ? ?List available transformations > ?-p, --print-function ?Modify the grammar so that print() is a function > ?-v, --verbose ? ? ? ? More verbose logging > ?--no-diffs ? ? ? ? ? ?Don't show diffs of the refactoring > ?-w, --write ? ? ? ? ? Write back modified files > ?-n, --nobackups ? ? ? Don't write backups for modified files. > > > $ python3.1 > Python 3.1.2 (r312:79147, Jul 15 2010, 12:43:37) > [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> from lib2to3.main import main >>>> main("lib2to3.fixes", ["--help"]) > Usage: 2to3 [options] file|dir ... > > Options: > ?-h, --help ? ? ? ? ? ?show this help message and exit > ?-d, --doctests_only ? Fix up doctests only > ?-f FIX, --fix=FIX ? ? Each FIX specifies a transformation; default: all > ?-j PROCESSES, --processes=PROCESSES > ? ? ? ? ? ? ? ? ? ? ? 
?Run 2to3 concurrently > ?-x NOFIX, --nofix=NOFIX > ? ? ? ? ? ? ? ? ? ? ? ?Prevent a fixer from being run. > ?-l, --list-fixes ? ? ?List available transformations (fixes/fix_*.py) > ?-p, --print-function ?Modify the grammar so that print() is a function > ?-v, --verbose ? ? ? ? More verbose logging > ?--no-diffs ? ? ? ? ? ?Don't show diffs of the refactoring > ?-w, --write ? ? ? ? ? Write back modified files > ?-n, --nobackups ? ? ? Don't write backups for modified files. > > > Note that we *need* the --nofix option for the conversion of > Bio.Phylo to work (it uses long as an argument name, > short longitude). > > Peter > -- "If you want to get laid, go to college.? If you want an education, go to the library." - Frank Zappa From devaniranjan at gmail.com Thu Oct 28 16:42:16 2010 From: devaniranjan at gmail.com (George Devaniranjan) Date: Thu, 28 Oct 2010 16:42:16 -0400 Subject: [Biopython-dev] RMSD calculation In-Reply-To: References: Message-ID: Hello everyone, I tried with pymol and it gives a value of 1.792 for the RMSD after alignment The EU bioinformatics server gives a value of 1.74 VMD 1.62 But SVD and PDB Superimposer gives a value 3.2 I have attached the 2 PDB files concerned-is it something I am doing in calculating the RMSD using biopython? Thank you On Thu, Oct 28, 2010 at 1:46 PM, Peter wrote: > On Thu, Oct 28, 2010 at 6:14 PM, George Devaniranjan > wrote: > > Yes there is a difference-for 2 proteins having exact same residues of 36 > > residues the values from 4 sources are as follows > > VMD RMSD=1.61 > > SVD RMSD =3.2 > > PDB RMSD=3.2 > > > > From the EU Bioinformatics server (link below) RMSD =1.75 > > (http://www.ebi.ac.uk/msd-srv/ssm/cgi-bin/ssmserver) > > > > So Biopython really is computing the RMSD and not RMS? > > Thanks you > > It has been a while since I looked at this (but I can still edit > the Warwick page if is is unclear). > > Which definition of RMSD are you using? > > Bio.PDB uses Bio.SVDSuperimposer, so they should be the same. 
> The comment for this code *says* it calculates the RMS deviation,
> here:
>
> diff=coords1-coords2
> l=coords1.shape[0]
> return sqrt(sum(sum(diff*diff))/l)
>
> Here variable l will be the number of atoms.
>
> What are the two examples you are using? Can you perhaps
> share a small example pair of PDB files?
>
> Peter
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: protein1.pdb
Type: chemical/x-pdb
Size: 16983 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: protein2.pdb
Type: chemical/x-pdb
Size: 16981 bytes
Desc: not available
URL:

From biopython at maubp.freeserve.co.uk Fri Oct 29 07:37:08 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 29 Oct 2010 12:37:08 +0100
Subject: [Biopython-dev] RMSD calculation
In-Reply-To:
References:
Message-ID:

On Thu, Oct 28, 2010 at 9:42 PM, George Devaniranjan wrote:
> Hello everyone,
> I tried with pymol and it gives a value of 1.792 for the RMSD after
> alignment
> The EU bioinformatics server gives a value of 1.74
> VMD 1.62
> But SVD and PDB Superimposer gives a value 3.2
> I have attached the 2 PDB files concerned-is it something I am doing in
> calculating the RMSD using biopython?
> Thank you

Are you doing the same comparison in each case? For example,
are some doing only C-alpha atoms, while others use all atoms?

Thanks for the example files - but you forgot the sample Python code ;)

import Bio.PDB
import numpy as np
structure1 = Bio.PDB.PDBParser().get_structure("one", "protein1.pdb")
structure2 = Bio.PDB.PDBParser().get_structure("two", "protein2.pdb")
super_imposer = Bio.PDB.Superimposer()
super_imposer.set_atoms(list(structure1.get_atoms()),
                        list(structure2.get_atoms()))
print(super_imposer.rms)

This gives 3.19274942026 for all atoms, as you said. Using
Bio.SVDSuperimposer,

coord1 = np.array([atom.coord for atom in structure1.get_atoms()])
coord2 = np.array([atom.coord for atom in structure2.get_atoms()])
from Bio.SVDSuperimposer import SVDSuperimposer
sup = SVDSuperimposer()
sup.set(coord1, coord2)
sup.run()
print(sup.get_rms())

Again, 3.19274953249 after moving the atoms. Alternatively if we
bypass the alignment step and calculate the RMS of the unaligned
structures we get a much higher RMS:

sup = SVDSuperimposer()
print(sup._rms(coord1, coord2))  # private method, don't use normally
14.8866750536

This matches what I get by doing it explicitly via numpy:

import numpy as np
coord1 = np.array([atom.coord for atom in structure1.get_atoms()])
coord2 = np.array([atom.coord for atom in structure2.get_atoms()])
assert coord1.shape == coord2.shape
diff = coord1 - coord2
l = len(diff)  # number of atoms
from math import sqrt
print(sqrt(sum(sum(diff*diff))/l))
print(np.sqrt(np.sum(diff**2)/l))  # should give same result as above line

(This should be the same calculation as Bio.PDB.Superimposer uses)

So, I think there are two potential sources of the disagreement
(1) The alignment, and (2) the RMS calculation. Can you use
the other tools to get the RMS without aligning the structures?
Alternatively, can you save their aligned structures and give
that to Biopython?

Peter

P.S. Why doesn't file protein2.pdb have the elements column?

From devaniranjan at gmail.com Fri Oct 29 09:33:36 2010
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Fri, 29 Oct 2010 09:33:36 -0400
Subject: [Biopython-dev] RMSD calculation
In-Reply-To:
References:
Message-ID:

Hello Peter,
Thanks for the answer - I also got the same values using biopython, both SVD and PDB modules (3.2).
My concern arose as a result of VMD + pymol + EU Bioinformatics server giving a value of 1.7 (well, 1.6 for VMD) but biopython giving 3.2.
Even if the two groups calculate RMSD differently (that is: which atoms they consider, only CA or backbone or all atoms) there is no way there can be such a big discrepancy between biopython and VMD/Pymol/EUServer.
I tried RMSD calculation with VMD WITHOUT alignment and got 14.02 as the answer.
For biopython PDB module only CA gives 3.2 while backbone gives 3.1, which is acceptable.
Thanks once again, let me know if you have any further thoughts on this.

On Fri, Oct 29, 2010 at 12:37 PM, Peter wrote:
> On Thu, Oct 28, 2010 at 9:42 PM, George Devaniranjan
> wrote:
> > Hello everyone,
> > I tried with pymol and it gives a value of 1.792 for the RMSD after
> > alignment
> > The EU bioinformatics server gives a value of 1.74
> > VMD 1.62
> > But SVD and PDB Superimposer gives a value 3.2
> > I have attached the 2 PDB files concerned-is it something I am doing in
> > calculating the RMSD using biopython?
> > Thank you
>
> Are you doing the same comparison in each case? For example,
> are some doing only C-alpha atoms, while others use all atoms?
>
> Thanks for the example files - but you forgot the sample Python code ;)
>
> import Bio.PDB
> import numpy as np
> structure1 = Bio.PDB.PDBParser().get_structure("one", "protein1.pdb")
> structure2 = Bio.PDB.PDBParser().get_structure("two", "protein2.pdb")
> super_imposer = Bio.PDB.Superimposer()
> super_imposer.set_atoms(list(structure1.get_atoms()),
> list(structure2.get_atoms()))
> print(super_imposer.rms)
>
> This gives 3.19274942026 for all atoms, as you said. Using
> Bio.SVDSuperimposer,
>
> coord1 = np.array([atom.coord for atom in structure1.get_atoms()])
> coord2 = np.array([atom.coord for atom in structure2.get_atoms()])
> from Bio.SVDSuperimposer import SVDSuperimposer
> sup = SVDSuperimposer()
> sup.set(coord1, coord2)
> sup.run()
> print(sup.get_rms())
>
> Again, 3.19274953249 after moving the atoms. Alternatively if we
> bypass the alignment step and calculate the RMS of the unaligned
> structures we get a much higher RMS:
>
> sup = SVDSuperimposer()
> print(sup._rms(coord1, coord2))  # private method, don't use normally
> 14.8866750536
>
> This matches what I get by doing it explicitly via numpy:
>
> import numpy as np
> coord1 = np.array([atom.coord for atom in structure1.get_atoms()])
> coord2 = np.array([atom.coord for atom in structure2.get_atoms()])
> assert coord1.shape == coord2.shape
> diff = coord1 - coord2
> l = len(diff)  # number of atoms
> from math import sqrt
> print(sqrt(sum(sum(diff*diff))/l))
> print(np.sqrt(np.sum(diff**2)/l))  # should give same result as above line
>
> (This should be the same calculation as Bio.PDB.Superimposer uses)
>
> So, I think there are two potential sources of the disagreement
> (1) The alignment, and (2) the RMS calculation. Can you use
> the other tools to get the RMS without aligning the structures?
> Alternatively, can you save their aligned structures and give
> that to Biopython?
>
> Peter
>
> P.S. Why doesn't file protein2.pdb have the elements column?
>

From eric.talevich at gmail.com Fri Oct 29 17:39:55 2010
From: eric.talevich at gmail.com (Eric Talevich)
Date: Fri, 29 Oct 2010 17:39:55 -0400
Subject: [Biopython-dev] RMSD calculation
In-Reply-To:
References:
Message-ID:

On Thu, Oct 28, 2010 at 12:49 PM, George Devaniranjan <
devaniranjan at gmail.com> wrote:

> I was wondering why there are two functions for calculating RMSD
>
> 1)in the SVDSuperimposer()
> 2)in PDB.Superimposer()
>
> In the code it says RMS - is RMS being calculated instead of RMSD???
> I ask because VMD gives a different value for RMSD to the one from
> Biopython
>
>
Hello George,

Here's my understanding of it:

1. RMSD and "RMS distance" both mean root mean square deviation, in terms of the distances in 3D space between each corresponding pair of atoms.
The RMSD between all atoms in two aligned structures may be different than the RMSD between backbone atoms only. Or, if the two structures don't have the same peptide sequence, that raises another set of issues. 2. In Biopython, PDB.Superimposer internally uses SVDSuperimposer. It's a simplified wrapper. 3. The SVDSuperimposer module allows you to either (i) align two structures in 3D space and then calculate RMSD, or (ii) just calculate RMSD without spatially (re-)aligning the structures. PDB.Superimposer just does the former. If the structures weren't already aligned, these can yield very different values. 4. There are many ways to perform a structural alignment; SVDSuperimposer implements a simple one. PyMOL, VMD, ce, DALI, and other programs implement more advanced methods. So don't be alarmed that VMD gives you a smaller RMSD than PDB.Superimposer -- it just means VMD found a better alignment between the two structures. Best, Eric From tiagoantao at gmail.com Fri Oct 29 19:23:16 2010 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Fri, 29 Oct 2010 23:23:16 +0000 Subject: [Biopython-dev] Continuous integration server Message-ID: Hi all, I've been hacking with buildbot, an integration server. This is to allow continuous testing of Biopython. So that we are alerted of any problems as soon as somebody does a dreadful commit (I have the top 5 of most dreadful commits, so it was fair that I should try to do something about it). Things are still incomplete, but I think it is time to inform the list of this effort... To know more about buildbot you can either go to the buildbot site http://buildbot.net/ or see the draft doc that I have been preparing http://biopython.org/wiki/Continuous_integration There is a draft server here: http://events.open-bio.org:8010/ The cool thing about buildbot is that actual testing is done by volunteer computers. Want to test on OS y, Python version z? You can offer the idle time of your laptop for that... 
Obvious things missing:
0. First and foremost, see if people like this?
1. Changing the biopython test code to avoid stressing the network (i.e., having a run_tests option that will not run network tests). This is to avoid imposing continuous traffic on GenBank and friends. This is a show stopper.
2. Maybe warn the mailing list when some fundamental build stops working (e.g. send an email when a Python 2.x build stops working).
3. Have test servers with all the applications installed (do you want to volunteer? This is more to do with volunteers).
4. Maybe change run_tests to require all tests to be done. If we are doing integration testing, we want all tests to be done (missing applications or libraries should be an error). As an example, none of my tests are complete.
5. Support Mac (my access to Mr Jobs' fashion machines is limited). Again this is more a volunteer issue.
6. Discuss policies: One test a day? Full tests or updates? Full network tests (probably sporadically)? Send emails?
7. Find volunteers to cover several OSes and several Python versions. Ensure that people do full tests (i.e. with all applications and libraries).
8. While I have volunteered for Windows testing myself, I will not be able to maintain it regularly.

Opinions are most welcome

--
"If you want to get laid, go to college.  If you want an education, go to the library." - Frank Zappa

From devaniranjan at gmail.com Fri Oct 29 19:42:12 2010
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Sat, 30 Oct 2010 00:42:12 +0100
Subject: [Biopython-dev] RMSD calculation
In-Reply-To:
References:
Message-ID:

Thanks Eric and Peter,
Your patience in answering this question is very much appreciated.
I think Eric may be right: I tried the RMSD calculation for several structures and VMD does give a lower value for them all.
George
Thanks once again to all of you for your answers

On Fri, Oct 29, 2010 at 10:39 PM, Eric Talevich wrote:
> On Thu, Oct 28, 2010 at 12:49 PM, George Devaniranjan <
> devaniranjan at gmail.com> wrote:
>
>> I was wondering why there are two functions for calculating RMSD
>>
>> 1)in the SVDSuperimposer()
>> 2)in PDB.Superimposer()
>>
>> In the code it says RMS - is RMS being calculated instead of RMSD???
>> I ask because VMD gives a different value for RMSD to the one from
>> Biopython
>>
>>
> Hello George,
>
> Here's my understanding of it:
>
> 1. RMSD and "RMS distance" both mean root mean square deviation, in terms
> of the distances in 3D space between each corresponding pair of atoms. The
> RMSD between all atoms in two aligned structures may be different than the
> RMSD between backbone atoms only. Or, if the two structures don't have the
> same peptide sequence, that raises another set of issues.
>
> 2. In Biopython, PDB.Superimposer internally uses SVDSuperimposer. It's a
> simplified wrapper.
>
> 3. The SVDSuperimposer module allows you to either (i) align two structures
> in 3D space and then calculate RMSD, or (ii) just calculate RMSD without
> spatially (re-)aligning the structures. PDB.Superimposer just does the
> former. If the structures weren't already aligned, these can yield very
> different values.
>
> 4. There are many ways to perform a structural alignment; SVDSuperimposer
> implements a simple one. PyMOL, VMD, ce, DALI, and other programs implement
> more advanced methods.
>
> So don't be alarmed that VMD gives you a smaller RMSD than PDB.Superimposer
> -- it just means VMD found a better alignment between the two structures.
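To make point 3 concrete, here is a small pure-Python sketch (not Biopython's code) built on the same RMSD formula Peter quoted earlier in the thread: the square root of the summed squared per-atom distances divided by the number of atoms. It compares raw coordinates with coordinates whose translation has been removed by centring each set on its centroid. SVDSuperimposer additionally finds the best rotation via singular value decomposition, which this sketch deliberately leaves out, and the coordinates are made-up toy values.

```python
import math


def rmsd(coords1, coords2):
    """RMSD between two equal-length lists of (x, y, z) tuples."""
    if len(coords1) != len(coords2) or not coords1:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum the squared distance of every corresponding atom pair, then divide
    # by the number of atoms (the same formula as the quoted Biopython code).
    total = sum(
        (a - b) ** 2
        for p, q in zip(coords1, coords2)
        for a, b in zip(p, q)
    )
    return math.sqrt(total / len(coords1))


def centred(coords):
    """Translate a coordinate list so its centroid sits at the origin."""
    n = len(coords)
    cx, cy, cz = (sum(p[i] for p in coords) / n for i in range(3))
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


# Toy example: structure "two" is structure "one" shifted 10 A along x,
# with the last atom nudged slightly in y.
one = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.5, 0.0), (4.5, 0.5, 0.5)]
two = [(x + 10.0, y, z) for x, y, z in one]
two[3] = (14.5, 0.7, 0.5)

print(rmsd(one, two))                    # unaligned: dominated by the shift
print(rmsd(centred(one), centred(two)))  # translation removed: small
```

Even without any rotation, removing the 10 Å offset drops the RMSD from about 10 to below 0.1. This is the same effect, in miniature, as the gap between the unaligned (14.89) and superimposed (3.19) values in this thread. Differences between tools that *do* superimpose come instead from how each one chooses the alignment and the atom subset.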
> > Best, > Eric > > > From mjldehoon at yahoo.com Sat Oct 30 00:06:06 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 29 Oct 2010 21:06:06 -0700 (PDT) Subject: [Biopython-dev] _pwm.c In-Reply-To: Message-ID: <297370.12835.qm@web62402.mail.re1.yahoo.com> --- On Thu, 10/28/10, Peter wrote: > On a related topic, is there a pure Python fall back for > _pwm.c in Bio.Motif? I added a pure Python fall back just now. Bartek, feel free to modify the code if needed. --Michiel. From mjldehoon at yahoo.com Sat Oct 30 00:13:45 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 29 Oct 2010 21:13:45 -0700 (PDT) Subject: [Biopython-dev] Bio/cMarkovModelmodule.c Message-ID: <80876.89806.qm@web62403.mail.re1.yahoo.com> Hi everybody, I was looking at our C modules to see if they can be made ready for Python 3. I noticed that Bio/cMarkovModelmodule.c currently contains only one function, _logadd, which is used to speed up Bio.MarkovModel. Numpy 1.3 and later contain a function (logaddexp) that does exactly the same as _logadd. Since Bio.MarkovModel itself already uses Numpy, I think we can remove Bio/cMarkovModelmodule.c. For this to work, we either need to require Numpy >= 1.3 in setup.py, or check for logaddexp when importing numpy in Bio.MarkovModel. I think requiring Numpy >= 1.3 in setup.py is better in the long run, so I would prefer that. Any other opinions? --Michiel From biopython at maubp.freeserve.co.uk Sat Oct 30 06:47:46 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 30 Oct 2010 11:47:46 +0100 Subject: [Biopython-dev] Bio/cMarkovModelmodule.c In-Reply-To: <80876.89806.qm@web62403.mail.re1.yahoo.com> References: <80876.89806.qm@web62403.mail.re1.yahoo.com> Message-ID: On Sat, Oct 30, 2010 at 5:13 AM, Michiel de Hoon wrote: > Hi everybody, > > I was looking at our C modules to see if they can be made > ready for Python 3. 
> I noticed that Bio/cMarkovModelmodule.c
> currently contains only one function, _logadd, which is used
> to speed up Bio.MarkovModel. Numpy 1.3 and later contain
> a function (logaddexp) that does exactly the same as _logadd.
> Since Bio.MarkovModel itself already uses Numpy, I think
> we can remove Bio/cMarkovModelmodule.c.

Sounds good :)

> For this to work, we either need to require Numpy >= 1.3
> in setup.py, or check for logaddexp when importing numpy
> in Bio.MarkovModel. I think requiring Numpy >= 1.3 in
> setup.py is better in the long run, so I would prefer that.
> Any other opinions?

The setup.py check sounds best.

We should check that NumPy >= 1.3 will be available for
Python 2.4 - this is relevant for Biopython 1.56, which
will still support Python 2.4.

The most recent NumPy for Windows installer
for Python 2.4 was NumPy 1.2.1, but most Windows
users able to install Biopython via our Windows
installer would also be able to install a more recent
Python and NumPy - so not a big issue.

The old INSTALL.txt in NumPy's github
repository says that for numpy 1.3.0 they still
supported Python 2.4.

http://github.com/numpy/numpy/blob/v1.3.0/INSTALL.txt

If there is any doubt about getting NumPy 1.3.x on
Python 2.4, we could postpone this change until
after we do the Biopython 1.56 release (probably in
November 2010) and drop support for Python 2.4.

Peter

From mjldehoon at yahoo.com Sat Oct 30 07:39:46 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 30 Oct 2010 04:39:46 -0700 (PDT)
Subject: [Biopython-dev] Bio/cMarkovModelmodule.c
In-Reply-To:
Message-ID: <643043.25292.qm@web62401.mail.re1.yahoo.com>

Currently there is a pure-Python fall back for _logadd in MarkovModel.py. We could check if the numpy version is at least 1.3 in setup.py, show a warning if an older numpy is found, and use the fall back in MarkovModel.py if numpy does not contain logaddexp.
Then if we remove cMarkovModelmodule.c, in the worst case (a Windows user with Python 2.4 who cannot update to a more recent Python) MarkovModel.py will be a bit slower, but no functionality is lost. --Michiel. --- On Sat, 10/30/10, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] Bio/cMarkovModelmodule.c > To: "Michiel de Hoon" > Cc: biopython-dev at biopython.org > Date: Saturday, October 30, 2010, 6:47 AM > On Sat, Oct 30, 2010 at 5:13 AM, > Michiel de Hoon wrote: > > Hi everybody, > > > > I was looking at our C modules to see if they can be > made > > ready for Python 3. I noticed that > Bio/cMarkovModelmodule.c > > currently contains only one function, _logadd, which > is used > > to speed up Bio.MarkovModel. Numpy 1.3 and later > contain > > a function (logaddexp) that does exactly the same as > _logadd. > > Since Bio.MarkovModel itself already uses Numpy, I > think > > we can remove Bio/cMarkovModelmodule.c. > > Sounds good :) > > > For this to work, we either need to require Numpy > >= 1.3 > > in setup.py, or check for logaddexp when importing > numpy > > in Bio.MarkovModel. I think requiring Numpy >= 1.3 > in > > setup.py is better in the long run, so I would prefer > that. > > Any other opinions? > > The setup.py check sounds best. > > We should check Numpy >= 1.3 is will be available for > Python 2.4 - this is relevant for Biopython 1.56 which > will still support Python 2.4 > > The most recent NumPy for Windows installer > for Python 2.4 was NumPy 1.2.1, but most Windows > users able to install Biopython via our Windows > installer would also be able to install a more recent > Python and NumPy - so not a big issue. > > According to the old INSTALL.txt in Nump's github > repository it says that for numpy 1.3.0 they still > supported Python 2.4. 
> > http://github.com/numpy/numpy/blob/v1.3.0/INSTALL.txt > > If there is any doubt about getting NumPy 1.3.x on > Python 2.4, we could postpone this change until > after we do the Biopython 1.56 release (probably in > November 2010) and drop support for Python 2.4. > > Peter > From biopython at maubp.freeserve.co.uk Sat Oct 30 08:43:53 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 30 Oct 2010 13:43:53 +0100 Subject: [Biopython-dev] Bio/cMarkovModelmodule.c In-Reply-To: <643043.25292.qm@web62401.mail.re1.yahoo.com> References: <643043.25292.qm@web62401.mail.re1.yahoo.com> Message-ID: On Sat, Oct 30, 2010 at 12:39 PM, Michiel de Hoon wrote: > Currently there is a pure-Python fall back for _logadd in > MarkovModel.py. We could check if the numpy version > is at least 1.3 in setup.py, show a warning if an older > numpy is found, and use the fall back in MarkovModel.py > if numpy does not contain logaddexp. Then if we remove > cMarkovModelmodule.c, in the worst case (a Windows > user with Python 2.4 who cannot update to a more recent > Python) MarkovModel.py will be a bit slower, but no > functionality is lost. > > --Michiel. That sounds OK to me :) Once we drop Python 2.4 maybe we can also list NumPy 1.3 as the minimum supported NumPy? Peter From bugzilla-daemon at portal.open-bio.org Sat Oct 30 09:07:36 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 30 Oct 2010 09:07:36 -0400 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <201010301307.o9UD7aaP029881@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 Bug 2866 depends on bug 2870, which changed state. 
Bug 2870 Summary: Add BioSQL schema for SQLite http://bugzilla.open-bio.org/show_bug.cgi?id=2870 What |Old Value |New Value ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Sat Oct 30 10:23:31 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 30 Oct 2010 07:23:31 -0700 (PDT) Subject: [Biopython-dev] Bio/cMarkovModelmodule.c In-Reply-To: Message-ID: <781588.85801.qm@web62407.mail.re1.yahoo.com> OK, done. In the end, I put the warning message in MarkovModel.py anyway, since it's very easy to miss if it's in setup.py. --Michiel. --- On Sat, 10/30/10, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] Bio/cMarkovModelmodule.c > To: "Michiel de Hoon" > Cc: biopython-dev at biopython.org > Date: Saturday, October 30, 2010, 8:43 AM > On Sat, Oct 30, 2010 at 12:39 PM, > Michiel de Hoon wrote: > > Currently there is a pure-Python fall back for _logadd > in > > MarkovModel.py. We could check if the numpy version > > is at least 1.3 in setup.py, show a warning if an > older > > numpy is found, and use the fall back in > MarkovModel.py > > if numpy does not contain logaddexp. Then if we > remove > > cMarkovModelmodule.c, in the worst case (a Windows > > user with Python 2.4 who cannot update to a more > recent > > Python) MarkovModel.py will be a bit slower, but no > > functionality is lost. > > > > --Michiel. > > That sounds OK to me :) > > Once we drop Python 2.4 maybe we can also list > NumPy 1.3 as the minimum supported NumPy? 
> > Peter > From bugzilla-daemon at portal.open-bio.org Sun Oct 3 02:51:14 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 2 Oct 2010 22:51:14 -0400 Subject: [Biopython-dev] [Bug 2608] Gcc "differ in signedness" warnings with trie.c In-Reply-To: Message-ID: <201010030251.o932pEUM020278@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2608 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2010-10-02 22:51 EST ------- The problem here was the strdup is not an ANSI-C function, and its implementation show differences between platforms. Replacing strdup removes the need for unsigned chars. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Oct 3 13:51:50 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 3 Oct 2010 09:51:50 -0400 Subject: [Biopython-dev] [Bug 2938] Bio.Entrez.read() returns empty string for HTML (not an error) In-Reply-To: Message-ID: <201010031351.o93DpoGZ023133@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2938 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp 2010-10-03 09:51 EST ------- (In reply to comment #7) > Does the current funny XML file have anything useful in it? Yes, but I doubt that many people (if any) are using the Journals database. If they do, we could make a straightforward parser for plain-text output from the Journals database, which is supported by NCBI. 
See this discussion on the mailing list:
http://lists.open-bio.org/pipermail/biopython-dev/2010-September/008239.html

To resolve this bug, I have modified the parser such that an error is raised whenever the XML data do not start with the XML declaration ( References: <486264729.08793@eyou.net>
Message-ID:

On Tue, Oct 5, 2010 at 8:45 AM, Yong wrote:
> Hello everyone,
>
> I am testing a database and its web interface
> (http://pbl.neau.edu.cn:8080/) established with Plone4Bio, BioPython and
> BioSQL, when query database from webpage it always return the default date
> for sequence: "01-JAN-1980".
>
> I found that the error happened here in file Bio::SeqIO::InsdcIO.py (lines:
> 366-371) of BioPython:
>
>     def _get_date(self, record) :
>         default = "01-JAN-1980"
>         try :
>             date = record.annotations["date"]
>         except KeyError :
>             return default
>
> It looks like that it does not have "date" key, is it a bug of BioPython or
> Plone4Bio? anybody know how to solve it?

Hi

As I recall, reading/writing a GenBank file with Bio.SeqIO (note single
dot in Python, two colons is Perl - grin), the date is preserved. I think
the problem is in Biopython loading/retrieving a GenBank file in BioSQL,
and I thought there was a bug open on this...

I can probably suggest a hack in the Plone4Bio code, but it would
be better to tweak Biopython.

Peter

From bugzilla-daemon at portal.open-bio.org Tue Oct 5 09:16:05 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 5 Oct 2010 05:16:05 -0400
Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements
In-Reply-To:
Message-ID: <201010050916.o959G53F031667@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2681

------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2010-10-05 05:16 EST -------
(In reply to comment #4)
> (In reply to comment #2)
> > (In reply to comment #0)
> > > 1) Fixed date/dates typo.
> > >
> > Why is it a typo? Change not checked in.
>
> The function _load_bioentry_date in Loader.py inserts the annotation 'date',
> if present, or the current date if not, into the bioentry_qualifier_value
> table. This is pulled by BioSeq.py _retrieve_qualifier_value and stored as
> the attribute 'dates'. Hence I considered line 307 in BioSeq.py to be a typo,
> which should be 'date' and not 'dates'. Also, because Loader.py handles dates
> separately, they should not be handled by the function load_annotations.

I'd forgotten about this issue - I was just reminded by a query on the
Plone4Bio mailing list. Yes, I think you are right:
http://github.com/biopython/biopython/commit/6aca2c0dbc17a172e76483d925248184080bb654

Thanks!

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk Tue Oct 5 09:18:16 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 5 Oct 2010 10:18:16 +0100
Subject: [Biopython-dev] [P4b] Always return default date: "01-JAN-1980"
In-Reply-To:
References: <486264729.08793@eyou.net>
Message-ID:

On Tue, Oct 5, 2010 at 10:00 AM, Peter wrote:
> On Tue, Oct 5, 2010 at 8:45 AM, Yong wrote:
>> Hello everyone,
>>
>> I am testing a database and its web interface
>> (http://pbl.neau.edu.cn:8080/) established with Plone4Bio, BioPython and
>> BioSQL, when query database from webpage it always return the default date
>> for sequence: "01-JAN-1980".
>>
>> I found that the error happened here in file Bio::SeqIO::InsdcIO.py (lines:
>> 366-371) of BioPython:
>>
>>     def _get_date(self, record) :
>>         default = "01-JAN-1980"
>>         try :
>>             date = record.annotations["date"]
>>         except KeyError :
>>             return default
>>
>> It looks like that it does not have "date" key, is it a bug of BioPython or
>> Plone4Bio? anybody know how to solve it?
>
> Hi
>
> As I recall, reading/writing a GenBank file with Bio.SeqIO (note single
> dot in Python, two colons is Perl - grin), the date is preserved. I think
> the problem is in Biopython loading/retrieving a GenBank file in BioSQL,
> and I thought there was a bug open on this...
>
> I can probably suggest a hack in the Plone4Bio code, but it would
> be better to tweak Biopython.
>
> Peter

Hi Yong,

I found the open Biopython bug report I was thinking of, and
committed a fix:
http://bugzilla.open-bio.org/show_bug.cgi?id=2681#c9

Are you able to update your copy of Biopython to the latest
source code to test this fix?

Thanks,

Peter

From biopython at maubp.freeserve.co.uk Mon Oct 11 08:53:45 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 11 Oct 2010 09:53:45 +0100
Subject: [Biopython-dev] Continuous integration
In-Reply-To:
References:
Message-ID:

2010/9/28 Tiago Antão :
> Hi,
>
> I've been playing with buildbot a bit (for continuous integration
> stuff). I am creating a page on the wiki with some info on that front.
>
> This is just concept/exploratory stuff: if people don't like it, it is
> just a question to delete the page. Hopefully this will at least
> permit to see if continuous integration is worthwhile the effort and
> if buildbot is a good platform for Biopython.
>
> Any comments most welcome. I expect to have a working
> prototype very soon. If people don't like it, I just trash it (no
> problems with
> that).
>
> Tiago

I see from your notes on the wiki you have been making good progress.
I have a couple of queries/ideas:

(1) Several of our tests go online to the NCBI or UniProt etc.
These tests can and do fail sometimes due to network issues.
Also, having some/many buildbot slaves running on a regular
basis (once a week? once a day?) would add up and this
load may be unwelcome. Perhaps we need to add an -offline
flag to run_tests.py which can skip any online tests?
(2) You mention buildbot doesn't have built in support for spotting changes in a git repository - but can it do this for SVN? Since github.com also allow access to the git repo via svn that might be a more elegant workaround. (3) Does the buildbot master require the buildbot slaves be online most/all of the time? Would a desktop machine which is typically only on during office hours on week days still be useful? I could probably answer this myself with a bit more background reading ;) Thanks Peter From tiagoantao at gmail.com Mon Oct 11 12:21:01 2010 From: tiagoantao at gmail.com (Tiago Antão) Date: Mon, 11 Oct 2010 13:21:01 +0100 Subject: [Biopython-dev] Continuous integration In-Reply-To: References: Message-ID: Hi Peter, 2010/10/11 Peter : > (1) Several of our tests go online to the NCBI or UniProt etc. > These tests can and do fail sometimes due to network issues. > Also, having some/many buildbot slaves running on a regular > basis (once a week? once a day?) would add up and this > load may be unwelcome. Perhaps we need to add an -offline > flag to run_tests.py which can skip any online tests? That might be a good idea (to have an --offline flag, I mean). A very good idea, indeed. I would like to put the infrastructure in place (if people are interested in going ahead with this...), but after that we need to stabilize a test policy and that will mean answering questions like that. As far as I see we will have many builders (tests under different conditions). Say 5 different Python versions (Jython included), at least 3 OSes. This is already 15 builders. This can easily creep up. Though the numbers are high, it is quite easy to maintain all this stuff: 3 volunteer machines (one for each OS) are enough. The cool thing about buildbot is that it is designed for volunteer machines to be added, so you can start your buildbot slave on your laptop when you are idle. It does not need an array of servers on demand to produce the tests.
NCBI and Uniprot might not like to see 30 daily connections for tests :( . So we might need to have, say, one weekly test for each OS doing the network stuff (just a single Python version per OS, maybe) and dailies not doing network loads. > (2) You mention buildbot doesn't have built in support for > spotting changes in a git repository - but can it do this for > SVN? Since github.com also allow access to the git repo > via svn that might be a more elegant workaround. There are 2 different things to consider: 1. Spotting the git repository. There is no builtin support, but this is TRIVIAL nonetheless with the general adaptor of buildbot. It works like this: a. a developer does a push b. github has a hook system which allows for reporting a change to the repository to a certain URL/CGI. Fully automated, transparent to the developer. c. We supply a CGI that receives the event and informs buildbot. There are CGIs for github. We just have to stuff one in a webserver. 2. The slaves/builders have to download github code. In this case, buildbot HAS NATIVE SUPPORT. > (3) Does the buildbot master require the buildbot slaves > be online most/all of the time? Would a desktop machine > which is typically only on during office hours on week > days still be useful? I could probably answer this myself > with a bit more background reading ;) That is one of the wonders of buildbot. Just the server needs to be online. You can indeed have a desktop machine: Whenever it suits you better you start your buildbot slave, it connects to the server to see if there is work to do and the server supplies work to be done. The server can be instructed to only allow the slave to do a single task at a time (to avoid overloading the slave). I am now at a stage where I really need a server to test (with a public address). I would volunteer to do the installation myself, but I would need shell access to a machine where I could run a server process.
No root access is needed, but a web server is not enough as buildbot is twisted based. Maybe we can convince the OBF to help. Again, I can volunteer to do the installation. No root access is needed, just the ability to run a server process and a couple of open ports. Tiago From biopython at maubp.freeserve.co.uk Mon Oct 11 13:05:40 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 11 Oct 2010 14:05:40 +0100 Subject: [Biopython-dev] Continuous integration In-Reply-To: References: Message-ID: 2010/10/11 Tiago Antão : > Hi Peter, > > 2010/10/11 Peter : >> (1) Several of our tests go online to the NCBI or UniProt etc. >> These tests can and do fail sometimes due to network issues. >> Also, having some/many buildbot slaves running on a regular >> basis (once a week? once a day?) would add up and this >> load may be unwelcome. Perhaps we need to add an -offline >> flag to run_tests.py which can skip any online tests? > > That might be a good idea (to have an --offline flag, I mean). A very > good idea, indeed. > > I would like to put the infrastructure in place (if people are > interested in going ahead with this...), but after that we need to > stabilize a test policy and that will mean answering questions like > that. > > As far as I see we will have many builders (tests under different > conditions). Say 5 different Python versions (Jython included), at > least 3 OSes. This is already 15 builders. This can easily creep up. > Though the numbers are high, it is quite easy to maintain all this > stuff: 3 volunteer machines (one for each OS) are enough. The cool > thing about buildbot is that it is designed for volunteer machines to > be added, so you can start your buildbot slave on your laptop when you > are idle. It does not need an array of servers on demand to produce > the tests. > > NCBI and Uniprot might not like to see 30 daily connections for tests > :( .
So we might need to have, say, one weekly test for each OS doing > the network stuff (just a single Python version per OS, maybe) and > dailies not doing network loads. Exactly. >> (2) You mention buildbot doesn't have built in support for >> spotting changes in a git repository - but can it do this for >> SVN? Since github.com also allow access to the git repo >> via svn that might be a more elegant workaround. > > There are 2 different things to consider: > 1. Spotting the git repository. There is no builtin support, but this > is TRIVIAL nonetheless with the general adaptor of buildbot. It works > like this: >     a. a developer does a push >     b. github has a hook system which allows for reporting a change > to the repository to a certain URL/CGI. Fully automated, transparent > to the developer. >     c. We supply a CGI that receives the event and informs buildbot. > There are CGIs for github. We just have to stuff one in a webserver. Do you even need a post-commit hook? Unless you want to automatically run the tests after every commit (which might be useful) wouldn't it be enough to do a daily checkout? > 2. The slaves/builders have to download github code. In this case, > buildbot HAS NATIVE SUPPORT. Understood. >> (3) Does the buildbot master require the buildbot slaves >> be online most/all of the time? Would a desktop machine >> which is typically only on during office hours on week >> days still be useful? I could probably answer this myself >> with a bit more background reading ;) > > That is one of the wonders of buildbot. Just the server needs to be online. > You can indeed have a desktop machine: Whenever it suits you better > you start your buildbot slave, it connects to the server to see if > there is work to do and the server supplies work to be done. The > server can be instructed to only allow the slave to do a single task > at a time (to avoid overloading the slave). Excellent.
I guess the specifics of starting the buildbot slave will be OS specific, thus it would be up to the machine owner if this should happen automatically at login or not. > I am now at a stage where I really need a server to test (with a public > address). I would volunteer to do the installation myself, but I would > need shell access to a machine where I could run a server process. No > root access is needed, but a web server is not enough as buildbot is > twisted based. Maybe we can convince the OBF to help. Again, I can > volunteer to do the installation. No root access is needed, just the > ability to run a server process and a couple of open ports. I think we should have a word with the OBF as running this on one of their servers does seem the best plan. I'll email you. Peter From tiagoantao at gmail.com Mon Oct 11 13:23:15 2010 From: tiagoantao at gmail.com (Tiago Antão) Date: Mon, 11 Oct 2010 14:23:15 +0100 Subject: [Biopython-dev] Continuous integration In-Reply-To: References: Message-ID: 2010/10/11 Peter : > Do you even need a post-commit hook? Unless you want > to automatically run the tests after every commit (which might > be useful) wouldn't it be enough to do a daily checkout? I have actually been doing this for my tests: a daily checkout. So we do not need the hook. I would go with the simpler solution for now: ignore the post-commit hook, get something useful working (maybe a nightly build) and in the future we might revisit this when things are better understood and tested.
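Peter's --offline idea, combined with Tiago's plan of network-free daily builds and weekly network runs, might look roughly like the sketch below. It is hypothetical: run_tests.py's real option handling is not shown in this thread, and the `OFFLINE` switch, the `requires_internet` decorator, the two test classes and `main()` are names invented for illustration. The skip is decided when each test runs rather than at import time, so the option can be parsed after the test modules are loaded.

```python
import argparse
import functools
import unittest

# Hypothetical module-level switch, set from the command line by main().
OFFLINE = False


def requires_internet(test_method):
    """Skip the decorated test at run time when --offline was requested."""
    @functools.wraps(test_method)
    def wrapper(self, *args, **kwargs):
        if OFFLINE:
            self.skipTest("needs network access (--offline given)")
        return test_method(self, *args, **kwargs)
    return wrapper


class OnlineTests(unittest.TestCase):
    @requires_internet
    def test_remote_service(self):
        # A real test would contact the NCBI or UniProt here.
        self.assertTrue(True)


class OfflineTests(unittest.TestCase):
    def test_local_parsing(self):
        self.assertEqual(len("ACGT"), 4)


def main(argv=None):
    """Parse options, run every test case and return the TestResult."""
    global OFFLINE
    parser = argparse.ArgumentParser(description="Run the test suite.")
    parser.add_argument("--offline", action="store_true",
                        help="skip tests that go online")
    args = parser.parse_args(argv)
    OFFLINE = args.offline
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for case in (OnlineTests, OfflineTests):
        suite.addTests(loader.loadTestsFromTestCase(case))
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

A daily builder would then pass `--offline`, while one weekly builder per OS would omit it so the network tests still run occasionally.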
From bugzilla-daemon at portal.open-bio.org Mon Oct 18 11:16:49 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 18 Oct 2010 07:16:49 -0400 Subject: [Biopython-dev] [Bug 3146] New: DSSP ungraceful failure Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3146 Summary: DSSP ungraceful failure Product: Biopython Version: 1.53 Platform: PC OS/Version: All Status: NEW Severity: minor Priority: P4 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: patrick.winters at gmail.com The DSSP annotator should probably fail gracefully when the PDBParser and DSSP disagree about the existence of a residue at a certain position. Here, DSSP reports values for residue 115 of chain A, while the PDBParser throws a key error.

from Bio.PDB import PDBParser
parser = PDBParser()
from Bio.PDB.DSSP import DSSP
structure=parser.get_structure("2p0i","pdb2p0i.ent")
model=structure[0]
dssp=DSSP(model, "pdb2p0i.ent")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.6/Bio/PDB/DSSP.py", line 175, in __init__
    res=chain[res_id]
  File "/usr/lib/pymodules/python2.6/Bio/PDB/Chain.py", line 71, in __getitem__
    return Entity.__getitem__(self, id)
  File "/usr/lib/pymodules/python2.6/Bio/PDB/Entity.py", line 38, in __getitem__
    return self.child_dict[id]
KeyError: (' ', 115, ' ')
>>> model['A'][114]
>>> model['A'][116]
>>> model['A'][115]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.6/Bio/PDB/Chain.py", line 71, in __getitem__
    return Entity.__getitem__(self, id)
  File "/usr/lib/pymodules/python2.6/Bio/PDB/Entity.py", line 38, in __getitem__
    return self.child_dict[id]
KeyError: (' ', 115, ' ')

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
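One way the DSSP annotator could "fail gracefully" here is to check each residue id DSSP reports against the parsed chain and issue a warning instead of letting `chain[res_id]` raise KeyError. The sketch below is hypothetical and is not the code in Bio/PDB/DSSP.py: `map_dssp_onto_chain` is an invented name, the DSSP output is assumed to be already parsed into (chain_id, res_id, code) tuples, and a plain dict stands in for the Bio.PDB Chain object.

```python
import warnings


def map_dssp_onto_chain(dssp_rows, chain):
    """Attach DSSP secondary structure codes to residues, skipping missing ones.

    dssp_rows: iterable of (chain_id, res_id, sec_struct) tuples, where res_id
        is a Bio.PDB-style (hetero_flag, seq_number, insertion_code) tuple such
        as (' ', 115, ' ').
    chain: mapping from res_id to a residue object. A plain dict stands in for
        a Bio.PDB Chain here (hypothetical; with a real Chain the membership
        test could use chain.has_id(res_id)).
    Returns a dict mapping each matched res_id to its DSSP code.
    """
    annotated = {}
    for chain_id, res_id, sec_struct in dssp_rows:
        if res_id not in chain:
            # Warn and move on instead of raising KeyError.
            warnings.warn(
                "DSSP reports residue %r in chain %s but the parsed structure "
                "lacks it; skipping" % (res_id, chain_id)
            )
            continue
        annotated[res_id] = sec_struct
    return annotated
```

Whether a missing residue should be a warning, a silent skip, or an error selectable by the caller is a design decision the bug report leaves open; the sketch only shows the warning variant.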
From barwil at gmail.com Tue Oct 19 12:34:45 2010 From: barwil at gmail.com (Bartek Wilczynski) Date: Tue, 19 Oct 2010 14:34:45 +0200 Subject: [Biopython-dev] Moving Bio.Motif documentation into Tutorial.tex In-Reply-To: References: Message-ID: Hi, I've started to look into merging Bio.Motif docs with the Tutorial. I have a few questions: - First, I need to find a good place in the tutorial to put it. One possibility is to make a separate chapter for it, another option is to put it as a subchapter in chapter 15 (cookbook). I think it would be better to make it a separate chapter, similar to the ones discussing Bio.PopGen or Bio.Phylo, so I thought it would make sense to create it as a new chapter 13, entitled Sequence motif analysis with Bio.Motif. - Second, I have links and references to papers in there. The question would be should I remove those to keep to the style of the tutorial? Any thoughts are welcome. Bartek On Sat, Sep 18, 2010 at 3:04 PM, Bartek Wilczynski wrote: > Hi, > > On Sat, Sep 18, 2010 at 2:25 PM, Peter wrote: > >> Hi Bartek, >> >> I think it would be good to try and move your Bio.Motif >> documentation from file Docs/cookbook/motif/motif.tex >> into the main Docs/Tutorial.tex as a new chapter. >> Currently it isn't obvious that Biopython supports >> things like a Position Weight Matrix (PWM). >> >> What do you think? >> >> The text will need a slight update since we have now >> deprecated and removed Bio.AlignAce and Bio.MEME, >> but that should be easy. >> > > In general, I'm all for it. It's just that right now is not necessarily the > best time for me to put much work into it. I'm trying to meet a RECOMB > deadline of Oct. 8th with a paper, so if it would not be a problem, I could > update it to the current state of the API after that. On the other hand, if > there's anybody who wants to do it before then, I can review the changes > even earlier. > > thanks for remembering about it.
> > Bartek > -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From biopython at maubp.freeserve.co.uk Tue Oct 19 12:45:47 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 19 Oct 2010 13:45:47 +0100 Subject: [Biopython-dev] Moving Bio.Motif documentation into Tutorial.tex In-Reply-To: References: Message-ID: On Tue, Oct 19, 2010 at 1:34 PM, Bartek Wilczynski wrote: > Hi, > > I've started to look into merging Bio.Motif docs with the Tutorial. I have a > few questions: > - First, I need to find a good place in the tutorial to put it. > One possibility is to make a separate chapter for it, another option is > to put it as a subchapter in chapter 15 (cookbook). > I think it would be better to make it a separate chapter, similar to the > ones discussing Bio.PopGen or Bio.Phylo, so I thought it would make > sense to create it as a new chapter 13, entitled Sequence motif analysis > with Bio.Motif. I agree, create a new chapter (and add yourself to the authors list). I'd definitely put it before the "Cookbook Chapter", and between the Phylogenetics and "Supervised learning methods" chapters seems reasonable. > - Second, I have links and references to papers in there. The question would > be should I remove those to keep to the style of the tutorial? Keep them - links to external webpages are fine - they work well in both PDF and HTML. For references we currently don't have a formal bibliography - but we do have some existing cases of links to papers already, e.g.
http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec:SeqIO-fastq-conversion Peter From bugzilla-daemon at portal.open-bio.org Tue Oct 19 14:17:22 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 19 Oct 2010 10:17:22 -0400 Subject: [Biopython-dev] [Bug 3026] Bio.SeqIO.InsdcIO._split_multi_line(): Your description cannot be broken into nice lines! In-Reply-To: Message-ID: <201010191417.o9JEHM0x029641@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3026 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2010-10-19 10:17 EST ------- (In reply to comment #4) > I do not know what I would like to happen here in addition to the improved > error message. Probably not get an error at all and have biopython able to > cope with these cases as well. I have just asked asimpson at ludwig.org.br > whether fix of the data in dbEST would be feasible. The plain text GenBank file from the NCBI is fine (see comment 1), but the HTML version is not. I don't think this is really a problem with the raw data... Anyway, I've just committed a fix which means Biopython will write an over long line and issue a warning: http://github.com/biopython/biopython/commit/f25ccef1e07129a377954021e08e980b82b6e795 Marking as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Tue Oct 19 15:54:43 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 19 Oct 2010 16:54:43 +0100 Subject: [Biopython-dev] Merging Uniprot XML parser? Message-ID: Hi all, I've fixed a few issues I felt were holding up merging Andrea's UniProt XML parser. 
I've now tested that uniprot_sprot.txt and uniprot_sprot.xml are parsed into more or less equivalent objects, and that these can be written out as GenBank (well, GenPept) files or as EMBL/IMGT files (given recent work to support protein EMBL files - which do exist but are rarely used). This required "fixing" Bug 3026 to cope with long annotation that cannot be line wrapped nicely (lots of long URL strings in UniProt XML comments). http://bugzilla.open-bio.org/show_bug.cgi?id=3026 I'm tempted to remove the warning because it is so common... or make it use the same text each time so you get warned once. There are also some additions to the Bio.SeqFeature position classes, since SwissProt/UniProt files can have uncertain positions. Could someone take a look at the code here (a rebased branch), as I'd like some independent testing (and better yet, code review): http://github.com/peterjc/biopython/tree/uniprot Thanks, Peter From eric.talevich at gmail.com Wed Oct 20 02:01:20 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Tue, 19 Oct 2010 22:01:20 -0400 Subject: [Biopython-dev] Bio.PDB on Python 3 In-Reply-To: References: Message-ID: On Mon, Aug 16, 2010 at 9:47 AM, Peter wrote: > Hi all, > > A while back I installed NumPy from their svn under Python 3, so that I > could test more of Biopython. I hadn't really looked at Bio.PDB until > recently because test_PDB.py depended on Bio.KDTree which needs > some C code to be compiled (which we haven't tried yet). > [...] > > This has revealed there are at least two issues with Bio.PDB to be > addressed (see below). > [...] > > ====================================================================== > ERROR: test_ExposureCN (__main__.Exposure) > HSExposureCN. > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "test_PDB.py", line 612, in setUp >
structure=PDBParser(PERMISSIVE=True).get_structure('X', pdb_filename) > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", > line 64, in get_structure > self._parse(file.readlines()) > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", > line 84, in _parse > self.trailer=self._parse_coordinates(coords_trailer) > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", > line 200, in _parse_coordinates > fullname, serial_number, element) > File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py", > line 185, in init_atom > duplicate_atom=residue[name] > TypeError: 'DisorderedResidue' object is not subscriptable > These errors occur when parsing Tests/PDB/a_structure.pdb under permissive mode. In this structure, residue 3 is disordered, and that triggers some exciting things. The bug seems to be related to this method of DisorderedEntityWrapper in Bio/PDB/Entity.py:

    def __getattr__(self, method):
        "Forward the method call to the selected child."
        if not hasattr(self, 'selected_child'):
            # Avoid problems with pickling
            # Unpickling goes into infinite loop!
            raise AttributeError
        return getattr(self.selected_child, method)

When running the test script, where we reach lines 185-186 in StructureBuilder.py:

    if residue.has_id(name):
        duplicate_atom=residue[name]

it gets magical. The method 'has_id' is not defined on the DisorderedResidue class. Instead, if residue is an instance of DisorderedResidue (subclass of DisorderedEntityWrapper), instead of Residue (subclass of Entity), then accessing residue.has_id on that object calls __getattr__, which in turn calls residue.selected_child.has_id(id). The next line raises a TypeError in Python 3, but not in Python 2 -- residue[name] seems to find the appropriate __getitem__ implementation in Python 2 only.
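The difference can be reproduced without Bio.PDB. Python 3 looks up the special method behind `obj[key]` on the object's type rather than on the instance, so `__getattr__` is never consulted; Python 2 old-style classes did route such lookups through `__getattr__`, and the diff hunk header in the patch further down suggests `DisorderedEntityWrapper` was declared as an old-style `class DisorderedEntityWrapper:`. The Python 3 sketch below uses invented stand-in classes (`Child`, `Wrapper`, `PatchedWrapper`), not the real Bio.PDB ones.

```python
class Child:
    """Stand-in for a Residue: supports subscripting and has_id()."""

    def __init__(self):
        self.atoms = {"CA": "alpha carbon"}

    def __getitem__(self, atom_id):
        return self.atoms[atom_id]

    def has_id(self, atom_id):
        return atom_id in self.atoms


class Wrapper:
    """Stand-in for DisorderedEntityWrapper: forwards unknown attributes."""

    def __init__(self, child):
        self.selected_child = child

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name == "selected_child":
            # Guard against infinite recursion, as in the original code.
            raise AttributeError(name)
        return getattr(self.selected_child, name)


class PatchedWrapper(Wrapper):
    def __getitem__(self, atom_id):
        # Explicit forwarding, the approach taken by the patch in this thread.
        return self.selected_child[atom_id]


wrapper = Wrapper(Child())
print(wrapper.has_id("CA"))       # explicit attribute access goes via __getattr__
print(wrapper.__getitem__("CA"))  # so does an explicit dunder attribute access
try:
    wrapper["CA"]                 # implicit lookup on type(wrapper) bypasses it
except TypeError as err:
    print("TypeError:", err)
print(PatchedWrapper(Child())["CA"])
```

This is also why the patch has to define `__iter__`, `__len__` and `__sub__` explicitly: each implicit protocol (`iter()`, `len()`, `-`) is looked up on the type in the same way.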
My hypothesis is that Python 2 treats this magic-method call to residue.__getitem__ as an attribute access, allowing DisorderedEntityWrapper.__getattr__ to forward this access to the appropriate child, some Residue instance, which does implement __getitem__. In Python 3, __getitem__-related syntax could be implemented slightly differently, so it's not seen as a __getattr__ access and everything falls apart. (I could be wrong about all of this.) So here's what I'm doing: - In DisorderedEntityWrapper, implement __getitem__(self, id) such that self.selected_child[id] is returned instead. This fixes most of the errors but produces/uncovers three new ones. These new errors also seem to indicate that magic methods on DisorderedEntityWrapper aren't being handled through __getattr__ in Python 3. - Fix the new errors. I'll post the patch here before pushing it upstream once I get it working. Best, Eric From eric.talevich at gmail.com Wed Oct 20 02:52:27 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Tue, 19 Oct 2010 22:52:27 -0400 Subject: [Biopython-dev] Bio.PDB on Python 3 In-Reply-To: References: Message-ID: On Tue, Oct 19, 2010 at 10:01 PM, Eric Talevich wrote: > On Mon, Aug 16, 2010 at 9:47 AM, Peter wrote: >> Hi all, >> >> A while back I installed NumPy from their svn under Python 3, so that I >> could test more of Biopython. I hadn't really looked at Bio.PDB until >> recently because test_PDB.py depended on Bio.KDTree which needs >> some C code to be compiled (which we haven't tried yet). >> > [...] >> >> This has revealed there are at least two issues with Bio.PDB to be >> addressed (see below). >> > [...] >> >> ====================================================================== >> ERROR: test_ExposureCN (__main__.Exposure) >> HSExposureCN. >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "test_PDB.py", line 612, in setUp >>
structure=PDBParser(PERMISSIVE=True).get_structure('X', pdb_filename) >> File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", >> line 64, in get_structure >> self._parse(file.readlines()) >> File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", >> line 84, in _parse >> self.trailer=self._parse_coordinates(coords_trailer) >> File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/PDBParser.py", >> line 200, in _parse_coordinates >> fullname, serial_number, element) >> File "/home/xxx/lib/python3.1/site-packages/Bio/PDB/StructureBuilder.py", >> line 185, in init_atom >> duplicate_atom=residue[name] >> TypeError: 'DisorderedResidue' object is not subscriptable >> > [...] > > So here's what I'm doing: > - In DisorderedEntityWrapper, implement __getitem__(self, id) such > that self.selected_child[id] is returned instead. This fixes most of > the errors but produces/uncovers three new ones. These new errors also > seem to indicate that magic methods on DisorderedEntityWrapper aren't > being handled through __getattr__ in Python 3. > - Fix the new errors. > > > I'll post the patch here before pushing it upstream once I get it working. As if we didn't have a better mechanism for this... here's a patch that seems to work on both Pythons. -Eric

diff --git a/Bio/PDB/Entity.py b/Bio/PDB/Entity.py
index ed17308..af2fcc7 100644
--- a/Bio/PDB/Entity.py
+++ b/Bio/PDB/Entity.py
@@ -165,10 +165,27 @@ class DisorderedEntityWrapper:
             raise AttributeError
         return getattr(self.selected_child, method)
 
+    def __getitem__(self, id):
+        "Return the child with the given id."
+        return self.selected_child[id]
+
     def __setitem__(self, id, child):
         "Add a child, associated with a certain id."
         self.child_dict[id]=child
 
+    def __iter__(self):
+        "Return the number of children."
+        return iter(self.selected_child)
+
+    def __len__(self):
+        "Return the number of children."
+        return len(self.selected_child)
+
+    def __sub__(self, other):
+        """Subtraction with another object."""
+        return self.selected_child - other
+
+
     # Public methods
 
     def get_id(self):

From bugzilla-daemon at portal.open-bio.org Wed Oct 20 06:22:56 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 20 Oct 2010 02:22:56 -0400 Subject: [Biopython-dev] [Bug 3147] New: AlignIO.parse doesn't raise StopIteration on empty files Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3147 Summary: AlignIO.parse doesn't raise StopIteration on empty files Product: Biopython Version: 1.55 Platform: PC OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: mdehoon at ims.u-tokyo.ac.jp For example:

$ rm -rf test.aln
$ touch test.aln
$ python
Python 2.7 (r27:82500, Jul 6 2010, 13:27:45)
[GCC 4.3.4 20090804 (release) 1] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import AlignIO
>>> records = AlignIO.parse(open("test.aln"), 'clustal')
>>> records.next()
>>>

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Wed Oct 20 09:12:06 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 20 Oct 2010 05:12:06 -0400 Subject: [Biopython-dev] [Bug 3147] AlignIO.parse doesn't raise StopIteration on empty files In-Reply-To: Message-ID: <201010200912.o9K9C6og005150@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3147 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-10-20 05:12 EST ------- In this case you are getting back None - which may have been allowed back on Python 2.2, see also: http://docs.python.org/release/2.4/lib/typeiter.html I'm used to iterators either returning None or raising StopIteration at the end of the elements - but quite often I've had to write code like this:

while True:
    try:
        record = i.next()
    except StopIteration:
        record = None
    if record is None:
        break
    ...

The above documentation implies it would be correct to expect a StopIteration exception here. This also applies to some of the Bio.SeqIO parsers too I'm sure, and potentially other parsers in Biopython. To identify most issues we can just change test_SeqIO.py and test_AlignIO.py to check for the exception... Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
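For comparison, a parser written as a generator function gets the iterator protocol right automatically: when the function returns, including straight away on an empty file, `next()` raises StopIteration rather than returning None, and a `for` loop simply ends. The sketch below is a hypothetical blank-line-separated block reader, `parse_blocks`, and not Bio.AlignIO's actual Clustal parser; it uses the built-in `next()` (available since Python 2.6) instead of the `.next()` method shown above.

```python
def parse_blocks(handle):
    """Yield lists of non-blank lines, one list per blank-line-separated block.

    Because this is a generator, an empty handle produces an iterator whose
    first next() call raises StopIteration instead of returning None.
    """
    block = []
    for line in handle:
        if line.strip():
            block.append(line.rstrip("\n"))
        elif block:
            # A blank line closes the current block.
            yield block
            block = []
    if block:
        # Emit the final block when the file does not end with a blank line.
        yield block
```

With an iterator like this, the `while True` loop and the `None` check above reduce to a plain `for record in iterator:` loop.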
From bugzilla-daemon at portal.open-bio.org Wed Oct 20 10:03:27 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 20 Oct 2010 06:03:27 -0400 Subject: [Biopython-dev] [Bug 3147] AlignIO.parse doesn't raise StopIteration on empty files In-Reply-To: Message-ID: <201010201003.o9KA3RY5006926@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3147 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-10-20 06:03 EST ------- Added test and fixed for AlignIO, http://github.com/biopython/biopython/commit/208d926d8e2e706a8bd5d0eee215a26c0457946c Added test for SeqIO (passes already), http://github.com/biopython/biopython/commit/246bd426094ecba9943aba2f58da8f3b7cc4a5f5 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Wed Oct 20 10:45:33 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 20 Oct 2010 11:45:33 +0100 Subject: [Biopython-dev] Bio.PDB on Python 3 In-Reply-To: References: Message-ID: On Wed, Oct 20, 2010 at 3:52 AM, Eric Talevich wrote: > > As if we didn't have a better mechanism for this... here's a patch > that seems to work on both Pythons. > -Eric Ha ha. Thanks for digging into this - I thought it was going to be complicated, and it sounds like it was. The patch works fine for me - please check it into the master. 
Cheers, Peter From eric.talevich at gmail.com Thu Oct 21 02:41:19 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 20 Oct 2010 22:41:19 -0400 Subject: [Biopython-dev] DendroPy is now BSD-licensed (was: [Nexml-discuss] NexML schema question) Message-ID: Folks, About two months ago, Jeet Sukumaran mentioned on the NeXML-discuss mailing list that he would be willing to relicense DendroPy, an excellent phylogenetics library for Python, from GPL to the more permissive BSD license. And shortly thereafter, he did: http://github.com/jeetsukumaran/DendroPy/commit/d3a91621fb62b37c311a462cae150772dd735771 For those just tuning in, DendroPy supports tree I/O in the NeXML format, but not phyloXML; Biopython supports phyloXML but not NeXML. Since the licenses are now compatible, we could probably make good use of Jeet's NeXML parsing code at the very least. Unfortunately, as evidenced by my two-month delay, I don't really have the leeway to do the integration myself this semester. But here's a heads-up anyway. Relatedly, Jaime Huerta Cepas (author of ETE, a Python Environment for Tree Exploration) indicated interest in generating phyloXML and NeXML parsers from XSD schemas -- another one to keep an eye out for. Regards, Eric ---------- Forwarded message ---------- From: Jeet Sukumaran Date: Wed, Sep 22, 2010 at 12:51 PM Subject: Re: [Nexml-discuss] NexML schema question To: Eric Talevich Cc: Jaime Huerta Cepas , "NeXML-discuss (list)" < nexml-discuss at lists.sourceforge.net> Hi Eric, Neither Mark nor I have any objections to releasing the DendroPy code to the Biopython library under the BSD license. Not sure what legalities are involved beyond saying "go for it", but if that's all it takes then "go for it!". 
-- jeet On 9/22/10 10:49 AM, Eric Talevich wrote: > On Sep 22, 2010, at 9:13 AM, Jaime Huerta Cepas wrote: > >> >> all I know is that Eric Talevich (the person who wrote the phyloXML parser >> in biopython) seems to be working on this, as claimed in the biopython wiki. >> But I don't think is ready yet. >> > > I took a crack at NeXML parsing a while ago, but it's nowhere near > ready, and I don't expect to be able to work on it again for several > more months. > > If you're looking for a currently usable library for working with > NeXML (I didn't catch the rest of this discussion), DendroPy is nice. > Its internal representation of tree objects isn't the same as > Biopython's Bio.Phylo, and it's GPL, so we can't just plug it directly > into Biopython (which uses a more permissive BSD-style license). But > serializing a tree to Newick from Biopython and then parsing the > Newick string in DendroPy, or the reverse, would give you some basic > interoperability. > > > What I think is that XSD schemas could be automatically parsed to >> generate parsers :) This would allow us to have a comprehensive and up to >> date parser for the NexML schema that everyone can use. >> > > Sure! PhyloXML is defined by an XSD schema, too. With this approach, > would it be possible for parsed phyloXML and NeXML tree objects to > share a base class, so the same methods are available on each? > > -Eric > > From mjldehoon at yahoo.com Sat Oct 23 09:19:26 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 23 Oct 2010 02:19:26 -0700 (PDT) Subject: [Biopython-dev] Tracking DTD files in Bio.Entrez Message-ID: <893615.35060.qm@web62401.mail.re1.yahoo.com> Hi everybody, As you may know, the parser for XML data generated by NCBI in Bio.Entrez makes use of DTD files (from NCBI) to correctly interpret the XML data. 
Most (if not all) DTD files are included in the Biopython distribution in Bio/Entrez/DTDs, but particularly when NCBI updates their DTD files it may happen that a required DTD file is missing. I have now modified the parser so that it tracks the URL of DTD files, so that it can access DTDs over the internet if they are not available locally. Still, parsing local DTD files is much faster than retrieving a remote DTD file, so when a DTD file is missing the parser will show a warning with the missing DTD, the URL where it can be found, and which directory it should be saved in (which typically is something like /usr/local/lib/python2.7/site-packages/Bio/Entrez/DTDs). For users who do not have write permission to this directory, it may be good to also allow storing these files in the user's home directory, for example in ~/.biopython/Bio/Entrez/DTDs. If we start using such a directory, we could also consider automatically retrieving DTD files and saving them in that directory without asking the user to do that manually. I guess it's a trade-off between convenience for the user (if we download and save DTDs automatically), and transparency (we would be saving files in the user's home directory without him/her being aware of it). Any opinions? Is this a good idea? -Michiel. From biopython at maubp.freeserve.co.uk Mon Oct 25 21:28:24 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 25 Oct 2010 22:28:24 +0100 Subject: [Biopython-dev] [Biopython] Getting involved In-Reply-To: References: Message-ID: On Mon, Oct 25, 2010 at 9:32 PM, Dragoslav Zaric wrote: > Dear Peter, > > I think that this: > > "Can you program in C and are you familiar with the C/Python > API? We will need to look at porting our C code from Python 2 > to Python 3, and this is quite complicated." > > is best idea for start. I can code in C, and have experience > both with python 2.7 and 3. Will read tomorrow about C/Python > API.
> > Kind regards Hi Dragoslav, I'm glad you sound enthusiastic, and I hope you can make some progress... Our plan (following what the NumPy project are doing) is to have a single code base targeting Python 2.x. All the Python code is automatically converted using the 2to3 script into Python 3. There are a few special cases, but that work is mostly done now. All the C code will need to use #ifdef statements to make the same C file work on both Python 2 and Python 3. The bad news is that the basic API for writing C extension modules for Python has changed. What I suggest you do first, is make sure you can get the latest Biopython source code from git, compile it under Python 2, and run the unit tests. Then try 2to3 and running the tests under Python 3 (see the README file). Next I would try updating one of the smaller C modules in Biopython to work on Python 3. You'll need to edit our setup.py to compile what you are working on (currently we compile none of the C code on Python 3). I don't yet have a feel for how much work this will be. Please sign up to the biopython-dev mailing list where we can discuss things in more detail. The main list is more for user support and general discussion. Thanks, and good luck! Peter From zaricdragoslav at gmail.com Mon Oct 25 23:34:29 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Tue, 26 Oct 2010 03:34:29 +0400 Subject: [Biopython-dev] [Biopython] Getting involved In-Reply-To: References: Message-ID: Dear Peter, I have subscribed to biopython-dev mailing list and I have downloaded source code with git. kind regards On Tue, Oct 26, 2010 at 1:28 AM, Peter wrote: > On Mon, Oct 25, 2010 at 9:32 PM, Dragoslav Zaric wrote: >> Dear Peter, >> >> I think that this: >> >> "Can you program in C and are you familiar with the C/Python >> API? We will need to look at porting our C code from Python 2 >> to Python 3, and this is quite complicated." >> >> is best idea for start.
I can code in C, and have experience >> both with python 2.7 and 3. Will read tomorrow about C/Python >> API. >> >> Kind regards > > Hi Dragoslav, > > I'm glad you sound enthusiastic, and I hope you can make > some progress... > > Our plan (following what the NumPy project are doing) is > to have a single code base targeting Python 2.x. > > All the Python code is automatically converted using the > 2to3 script into Python 3. There are a few special cases, > but that work is mostly done now. > > All the C code will need to use #ifdef statements to make > the same C file work on both Python 2 and Python 3. The > bad news is that the basic API for writing C extension > modules for Python has changed. > > What I suggest you do first, is make sure you can get > the latest Biopython source code from git, compile it > under Python 2, and run the unit tests. Then try 2to3 > and running the tests under Python 3 (see the README > file). > > Next I would trying updating one of the smaller C > modules in Biopython to work on Python 3. You'll > need to edit our setup.py to compile what you are > working on (currently we compile none of the C > code on Python 3). I don't yet have a feel for how > much work this will be. > > Please sign up to the biopython-dev mailing list where > we can discuss things in more detail. The main list is > more for user support and general discussion. > > Thanks, and good luck! 
> > Peter > -- Dragoslav Zaric Professional Programmer MSc Astrophysics From zaricdragoslav at gmail.com Tue Oct 26 07:24:50 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Tue, 26 Oct 2010 11:24:50 +0400 Subject: [Biopython-dev] Plan for upgrade Message-ID: Dear Peter, This is what I found on python 3 web pages:
----------------------------------------------------------------------------------------------------------------
Porting To Python 3.0

For porting existing Python 2.5 or 2.6 source code to Python 3.0, the best strategy is the following:

0. (Prerequisite:) Start with excellent test coverage.
1. Port to Python 2.6. This should be no more work than the average port from Python 2.x to Python 2.(x+1). Make sure all your tests pass.
2. (Still using 2.6:) Turn on the -3 command line switch. This enables warnings about features that will be removed (or change) in 3.0. Run your test suite again, and fix code that you get warnings about until there are no warnings left, and all your tests still pass.
3. Run the 2to3 source-to-source translator over your source code tree. (See 2to3 - Automated Python 2 to 3 code translation for more on this tool.) Run the result of the translation under Python 3.0. Manually fix up any remaining issues, fixing problems until all tests pass again.

It is not recommended to try to write source code that runs unchanged under both Python 2.6 and 3.0; you'd have to use a very contorted coding style, e.g. avoiding print statements, metaclasses, and much more. If you are maintaining a library that needs to support both Python 2.6 and Python 3.0, the best approach is to modify step 3 above by editing the 2.6 version of the source code and running the 2to3 translator again, rather than editing the 3.0 version of the source code.
----------------------------------------------------------------------------------------------------------------
And this is page for 2to3 translator: http://docs.python.org/release/3.0.1/library/2to3.html#to3-reference So can we start to agree on approach and tactics. Kind regards -- Dragoslav Zaric Professional Programmer MSc Astrophysics From zaricdragoslav at gmail.com Tue Oct 26 07:34:31 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Tue, 26 Oct 2010 11:34:31 +0400 Subject: [Biopython-dev] Changed in C API Message-ID: Dear Peter, The list of changes in Python 3 is not complete. This is current list:
------------------------------------------------------------------------------------------------
Due to time constraints, here is a very incomplete list of changes to the C API.

- Support for several platforms was dropped, including but not limited to Mac OS 9, BeOS, RISCOS, Irix, and Tru64.
- PEP 3118: New Buffer API.
- PEP 3121: Extension Module Initialization & Finalization.
- PEP 3123: Making PyObject_HEAD conform to standard C.
- No more C API support for restricted execution.
- PyNumber_Coerce, PyNumber_CoerceEx, PyMember_Get, and PyMember_Set C APIs are removed.
- New C API PyImport_ImportModuleNoBlock, works like PyImport_ImportModule but won't block on the import lock (returning an error instead).
- Renamed the boolean conversion C-level slot and method: nb_nonzero is now nb_bool.
- Removed METH_OLDARGS and WITH_CYCLE_GC from the C API.
------------------------------------------------------------------------------------------------
Can you tell me what are exactly versions that we are converting, from 2.7 to 3.0.1 ?? This is also what I have read on python 3 web site:
------------------------------------------------------------------------------------------------
The net result of the 3.0 generalizations is that Python 3.0 runs the pystone benchmark around 10% slower than Python 2.5.
Most likely the biggest cause is the removal of special-casing for small integers. There's room for improvement, but it will happen after 3.0 is released!
------------------------------------------------------------------------------------------------
This means that python 3 is still no optimized or like all software start to be worse with new versions :) Kind regards -- Dragoslav Zaric Professional Programmer MSc Astrophysics From biopython at maubp.freeserve.co.uk Tue Oct 26 08:43:48 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 26 Oct 2010 09:43:48 +0100 Subject: [Biopython-dev] Plan for upgrade In-Reply-To: References: Message-ID: On Tue, Oct 26, 2010 at 8:24 AM, Dragoslav Zaric wrote: > Dear Peter, > > This is what I found on python 3 web pages: > ---------------------------------------------------------------------------------------------------------------- > Porting To Python 3.0 > For porting existing Python 2.5 or 2.6 source code to Python 3.0, the > best strategy is the following: > > (Prerequisite:) Start with excellent test coverage. > Port to Python 2.6. This should be no more work than the average port > from Python 2.x to Python 2.(x+1). Make sure all your tests pass. > (Still using 2.6:) Turn on the -3 command line switch. This enables > warnings about features that will be removed (or change) in 3.0. Run > your test suite again, and fix code that you get warnings about until > there are no warnings left, and all your tests still pass. > Run the 2to3 source-to-source translator over your source code tree. > (See 2to3 - Automated Python 2 to 3 code translation for more on this > tool.) Run the result of the translation under Python 3.0. Manually > fix up any remaining issues, fixing problems until all tests pass > again. > It is not recommended to try to write source code that runs unchanged > under both Python 2.6 and 3.0; you'd have to use a very contorted > coding style, e.g.
avoiding print statements, metaclasses, and much > more. If you are maintaining a library that needs to support both > Python 2.6 and Python 3.0, the best approach is to modify step 3 above > by editing the 2.6 version of the source code and running the 2to3 > translator again, rather than editing the 3.0 version of the source > code. > ---------------------------------------------------------------------------------------------------------------- > > And this is page for 2to3 translator: > > http://docs.python.org/release/3.0.1/library/2to3.html#to3-reference > > So can we start to agree on approach and tactics. > > Kind regards Hi Dragoslav, Yes, that is basically what we are doing for the pure python code. We still write our code for Python 2.x (currently Python 2.4 to 2.7), and then use 2to3 to convert it to work on Python 3.x (currently testing on 3.1, at the end of the year we'll be trying the planned Python 3.2 beta as well). That is the easy part - it's the C code we need to handle now for our extension modules (and the 2to3 script does not do this). Perhaps I was too concise earlier. http://lists.open-bio.org/pipermail/biopython-dev/2010-October/008311.html Peter From biopython at maubp.freeserve.co.uk Tue Oct 26 08:47:58 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 26 Oct 2010 09:47:58 +0100 Subject: [Biopython-dev] Changed in C API In-Reply-To: References: Message-ID: On Tue, Oct 26, 2010 at 8:34 AM, Dragoslav Zaric wrote: > Dear Peter, > > The list of changes in Python 3 is not complete. This is current list: > ... > > Can you tell me what are exactly versions that we are converting, from > 2.7 to 3.0.1 ?? We currently support Python 2.4 to 2.7 (but plan to drop support for Python 2.4 soon). We've been testing on Python 3.1 and expect to support later versions as they are released. I personally don't really care about Python 3.0 (it would be nice if that works too, but it is not essential).
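The same Python 2 source is run through 2to3 and then tested on Python 3.1. Any code that handles file contents therefore has to cope with Python 3 separating byte strings from text. Here is a minimal sketch of a version-aware helper, assuming ASCII-compatible data. The name `as_text` is hypothetical and not part of Biopython.

```python
import sys


def as_text(data, encoding="ascii"):
    """Return data as a native string (hypothetical helper).

    On Python 3, bytes are decoded and str is returned unchanged.
    On Python 2, the input is returned unchanged, since str is
    already a byte string there.
    """
    if sys.version_info[0] >= 3:
        if isinstance(data, bytes):
            return data.decode(encoding)
        return data
    return data
```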
> This means that python 3 is still no optimized or like all software > start to be worse with new versions :) Python 3.1 is already out and is faster than Python 3.0. Some things are still slower than Python 2 though, in particular we've noticed this for parsing since by default Python 3 uses unicode instead of byte strings. Peter From zaricdragoslav at gmail.com Tue Oct 26 09:11:32 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Tue, 26 Oct 2010 13:11:32 +0400 Subject: [Biopython-dev] Changed in C API In-Reply-To: References: Message-ID: Ok Peter, First it is my mistake that I was talking about python code upgrade. I understand that you want me to do C code upgrade, but at the end all should work together so this is why I looked at overall upgrade process. I have downloaded latest python source code and I have searched for .c files and .h files and this is what I have found: Bio\Cluster\cluster.c Bio\Cluster\clustermodule.c Bio\cMarkovModelmodule.c Bio\cpairwise2module.c Bio\csupport.c Bio\KDTree\KDTree.c Bio\KDTree\KDTreemodule.c Bio\Motif\_pwm.c Bio\Nexus\cnexus.c Bio\PDB\mmCIF\lex.yy.c Bio\PDB\mmCIF\mmcif_test.c Bio\PDB\mmCIF\MMCIFlexmodule.c Bio\trie.c Bio\triemodule.c Bio\Cluster\cluster.h Bio\csupport.h Bio\KDTree\KDTree.h Bio\KDTree\Neighbor.h Bio\trie.h Are these all files you want me to upgrade to python 3.1 ? Kind regards On Tue, Oct 26, 2010 at 12:47 PM, Peter wrote: > On Tue, Oct 26, 2010 at 8:34 AM, Dragoslav Zaric > wrote: >> Dear Peter, >> >> The list of changes in Python 3 is not complete. This is current list: >> ... >> >> Can you tell me what are exactly versions that we are converting, from >> 2.7 to 3.0.1 ?? > > We currently support Python 2.4 to 2.7 (but plan to drop support > for Python 2.4 soon). We've been testing on Python 3.1 and except > to support later versions as they are released. > > I personally don't really care about Python 3.0 (it would be nice if > that works too, but it is not essential). 
> >> This means that python 3 is still no optimized or like all software >> start to be worse with new versions :) > > Python 3.1 is already out and is faster than Python 3.0. Some things > are still slower than Python 2 though, in particular we've noticed this > for parsing since by default Python 3 uses unicode instead of byte > strings. > > Peter > -- Dragoslav Zaric Professional Programmer MSc Astrophysics From biopython at maubp.freeserve.co.uk Tue Oct 26 09:47:21 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 26 Oct 2010 10:47:21 +0100 Subject: [Biopython-dev] Changed in C API In-Reply-To: References: Message-ID: On Tue, Oct 26, 2010 at 10:11 AM, Dragoslav Zaric wrote: > Ok Peter, > > First it is my mistake that I was talking about python code upgrade. > I understand that you want me to do C code upgrade, but at the > end all should work together so this is why I looked at overall upgrade > process. > > I have downloaded latest python source code and I have searched > for .c files and .h files and this is what I have found: > > Bio\Cluster\cluster.c > Bio\Cluster\clustermodule.c > Bio\cMarkovModelmodule.c > Bio\cpairwise2module.c > Bio\csupport.c > Bio\KDTree\KDTree.c > Bio\KDTree\KDTreemodule.c > Bio\Motif\_pwm.c > Bio\Nexus\cnexus.c > Bio\PDB\mmCIF\lex.yy.c > Bio\PDB\mmCIF\mmcif_test.c > Bio\PDB\mmCIF\MMCIFlexmodule.c > Bio\trie.c > Bio\triemodule.c > > Bio\Cluster\cluster.h > Bio\csupport.h > Bio\KDTree\KDTree.h > Bio\KDTree\Neighbor.h > Bio\trie.h What OS are you using? From the slashes I'd guess Windows (which may complicate things - getting the compilers all setup is more work). > > Are these all files you want me to upgrade to python 3.1 ? > Yes - but not all of them are equally important, and some will be more complicated to port. For example, the Nexus, MarkovModelmodule and cMarkovModelmodule C code have a Python fallback (i.e. the C code is not essential, just faster). Some of those (e.g. 
Bio.Cluster and Bio.KDTree) depend on NumPy, which may make things more complicated. You will need to install NumPy (for both Python 2 and 3). Some may have string encoding issues (bytes vs unicode), e.g. Nexus, Motif The mmCIF module is not urgent. This is a file parser for the Bio.PDB code, and we have discussed replacing this in C. One reason for this is it currently depends on the 3rd party library flex. I think Bio/Motif/_pwm.c would be a good module to start with. It is a short simple module exposing a single function to Python. You should read this: http://wiki.python.org/moin/PortingExtensionModulesToPy3k Peter From zaricdragoslav at gmail.com Tue Oct 26 10:01:46 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Tue, 26 Oct 2010 14:01:46 +0400 Subject: [Biopython-dev] Changed in C API In-Reply-To: References: Message-ID: Do not worry Peter, I am writing from work, that is why I am using windows. At home I have two lap tops and both are Linux :) I do not have windows on any partition :) I use and developo only on Linux outside of main job. Ok, when I get home I will read http://wiki.python.org/moin/PortingExtensionModulesToPy3k and start to work on Bio/Motif/_pwm.c Kind regards On Tue, Oct 26, 2010 at 1:47 PM, Peter wrote: > On Tue, Oct 26, 2010 at 10:11 AM, Dragoslav Zaric > wrote: >> Ok Peter, >> >> First it is my mistake that I was talking about python code upgrade. >> I understand that you want me to do C code upgrade, but at the >> end all should work together so this is why I looked at overall upgrade >> process. 
>> >> I have downloaded latest python source code and I have searched >> for .c files and .h files and this is what I have found: >> >> Bio\Cluster\cluster.c >> Bio\Cluster\clustermodule.c >> Bio\cMarkovModelmodule.c >> Bio\cpairwise2module.c >> Bio\csupport.c >> Bio\KDTree\KDTree.c >> Bio\KDTree\KDTreemodule.c >> Bio\Motif\_pwm.c >> Bio\Nexus\cnexus.c >> Bio\PDB\mmCIF\lex.yy.c >> Bio\PDB\mmCIF\mmcif_test.c >> Bio\PDB\mmCIF\MMCIFlexmodule.c >> Bio\trie.c >> Bio\triemodule.c >> >> Bio\Cluster\cluster.h >> Bio\csupport.h >> Bio\KDTree\KDTree.h >> Bio\KDTree\Neighbor.h >> Bio\trie.h > > What OS are you using? From the slashes I'd guess > Windows (which may complicate things - getting the > compilers all setup is more work). > >> >> Are these all files you want me to upgrade to python 3.1 ? >> > > Yes - but not all of them are equally important, and some > will be more complicated to port. > > For example, the Nexus, MarkovModelmodule and > cMarkovModelmodule C code have a Python fallback > (i.e. the C code is not essential, just faster). > > Some of those (e.g. Bio.Cluster and Bio.KDTree) depend > on NumPy, which may make things more complicated. > You will need to install NumPy (for both Python 2 and 3). > > Some may have string encoding issues (bytes vs unicode), > e.g. Nexus, Motif > > The mmCIF module is not urgent. This is a file parser for > the Bio.PDB code, and we have discussed replacing this > in C. One reason for this is it currently depends on the > 3rd party library flex. > > I think Bio/Motif/_pwm.c would be a good module to start > with. It is a short simple module exposing a single > function to Python. 
> > You should read this: > http://wiki.python.org/moin/PortingExtensionModulesToPy3k > > Peter > -- Dragoslav Zaric Professional Programmer MSc Astrophysics From zaricdragoslav at gmail.com Tue Oct 26 10:03:30 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Tue, 26 Oct 2010 14:03:30 +0400 Subject: [Biopython-dev] Changed in C API In-Reply-To: References: Message-ID: Dear Peter, You write this: "What I suggest you do first, is make sure you can get the latest Biopython source code from git, compile it under Python 2, and run the unit tests. Then try 2to3 and running the tests under Python 3 (see the README file)." Can you tell me how do I run unit tests in any python version ?? Are there unit tests for C modules or these tests cover everything ?? Kind regards On Tue, Oct 26, 2010 at 2:01 PM, Dragoslav Zaric wrote: > Do not worry Peter, > > I am writing from work, that is why I am using windows. At home I have > two lap tops > and both are Linux :) I do not have windows on any partition :) > > I use and developo only on Linux outside of main job. > > Ok, when I get home I will read > > http://wiki.python.org/moin/PortingExtensionModulesToPy3k > > and start to work on > > Bio/Motif/_pwm.c > > Kind regards > > > On Tue, Oct 26, 2010 at 1:47 PM, Peter wrote: >> On Tue, Oct 26, 2010 at 10:11 AM, Dragoslav Zaric >> wrote: >>> Ok Peter, >>> >>> First it is my mistake that I was talking about python code upgrade. >>> I understand that you want me to do C code upgrade, but at the >>> end all should work together so this is why I looked at overall upgrade >>> process. 
>>> >>> I have downloaded latest python source code and I have searched >>> for .c files and .h files and this is what I have found: >>> >>> Bio\Cluster\cluster.c >>> Bio\Cluster\clustermodule.c >>> Bio\cMarkovModelmodule.c >>> Bio\cpairwise2module.c >>> Bio\csupport.c >>> Bio\KDTree\KDTree.c >>> Bio\KDTree\KDTreemodule.c >>> Bio\Motif\_pwm.c >>> Bio\Nexus\cnexus.c >>> Bio\PDB\mmCIF\lex.yy.c >>> Bio\PDB\mmCIF\mmcif_test.c >>> Bio\PDB\mmCIF\MMCIFlexmodule.c >>> Bio\trie.c >>> Bio\triemodule.c >>> >>> Bio\Cluster\cluster.h >>> Bio\csupport.h >>> Bio\KDTree\KDTree.h >>> Bio\KDTree\Neighbor.h >>> Bio\trie.h >> >> What OS are you using? From the slashes I'd guess >> Windows (which may complicate things - getting the >> compilers all setup is more work). >> >>> >>> Are these all files you want me to upgrade to python 3.1 ? >>> >> >> Yes - but not all of them are equally important, and some >> will be more complicated to port. >> >> For example, the Nexus, MarkovModelmodule and >> cMarkovModelmodule C code have a Python fallback >> (i.e. the C code is not essential, just faster). >> >> Some of those (e.g. Bio.Cluster and Bio.KDTree) depend >> on NumPy, which may make things more complicated. >> You will need to install NumPy (for both Python 2 and 3). >> >> Some may have string encoding issues (bytes vs unicode), >> e.g. Nexus, Motif >> >> The mmCIF module is not urgent. This is a file parser for >> the Bio.PDB code, and we have discussed replacing this >> in C. One reason for this is it currently depends on the >> 3rd party library flex. >> >> I think Bio/Motif/_pwm.c would be a good module to start >> with. It is a short simple module exposing a single >> function to Python. 
> > You should read this: > http://wiki.python.org/moin/PortingExtensionModulesToPy3k > > Peter > -- Dragoslav Zaric Professional Programmer MSc Astrophysics From biopython at maubp.freeserve.co.uk Tue Oct 26 10:12:02 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 26 Oct 2010 11:12:02 +0100 Subject: [Biopython-dev] Changed in C API In-Reply-To: References: Message-ID: On Tue, Oct 26, 2010 at 11:03 AM, Dragoslav Zaric wrote: > Dear Peter, > > You write this: > > "What I suggest you do first, is make sure you can get > the latest Biopython source code from git, compile it > under Python 2, and run the unit tests. Then try 2to3 > and running the tests under Python 3 (see the README > file)." > > Can you tell me how do I run unit tests in any python version ?? Have a look at the "The Biopython testing framework" chapter in the tutorial (although this does not talk about Python 3). For Python 2.x, from the Tests directory do:

    python run_tests.py

For a particular version of Python, do:

    python2.6 run_tests.py

For Python 3.x first convert the code with 2to3 as described in the README file, then:

    python3 run_tests.py

For a particular version of Python 3, do:

    python3.1 run_tests.py

You can run selected tests rather than all of them, e.g.

    python run_tests.py test_Motif.py

> > Are there unit tests for C modules or these tests cover everything ?? > The tests are all written in Python, and will test the C modules via their Python interface.
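The tests reach the C code only through its Python interface. A test module can therefore import the compiled extension and skip itself when the extension was not built, for example on Python 3, where setup.py currently compiles no C code. Below is a minimal sketch using the standard unittest skip decorators (Python 2.7 and 3.1+). The module name `_example_speedups` and its `count_letters` function are hypothetical placeholders, and Biopython's run_tests.py may handle missing dependencies differently.

```python
import unittest

try:
    # Hypothetical compiled extension; ImportError means it was not built
    import _example_speedups as speedups
except ImportError:
    speedups = None


@unittest.skipIf(speedups is None, "C extension not compiled")
class SpeedupsTests(unittest.TestCase):
    def test_count_letters(self):
        # The C code is exercised only through its Python-level interface
        self.assertEqual(speedups.count_letters("ACGT"), 4)
```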
Peter From zaricdragoslav at gmail.com Tue Oct 26 10:59:05 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Tue, 26 Oct 2010 14:59:05 +0400 Subject: [Biopython-dev] Changed in C API In-Reply-To: References: Message-ID: Ok Peter, I will work on this tonight and let you know how is it going, Kind regards On Tue, Oct 26, 2010 at 2:12 PM, Peter wrote: > On Tue, Oct 26, 2010 at 11:03 AM, Dragoslav Zaric > wrote: >> Dear Peter, >> >> You write this: >> >> "What I suggest you do first, is make sure you can get >> the latest Biopython source code from git, compile it >> under Python 2, and run the unit tests. Then try 2to3 >> and running the tests under Python 3 (see the README >> file)." >> >> Can you tell me how do I run unit tests in any python version ?? > > Have a look at the "The Biopython testing framework" > chapter in the tutorial (although this does not talk about > Python 3). > > For python 2.x, from the Tests directory do: > > python run_tests.py > > For a particular version of Python, do: > > python2.6 run_tests.py > > For Python 3.x first convert the code with 2to3 as described > in the README file, then: > > python3 run_tests.py > > For a particular version of Python 3, do: > > python3.1 run_tests.py > > You can run selected tests rather than all of them, e.g. > > python run_tests.py test_Motif.py > >> >> Are there unit tests for C modules or these tests cover everything ?? >> > > The tests are all written in Python, and will test the C modules via > their Python interface. 
> > Peter > -- Dragoslav Zaric Professional Programmer MSc Astrophysics From zaricdragoslav at gmail.com Tue Oct 26 16:28:00 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Tue, 26 Oct 2010 20:28:00 +0400 Subject: [Biopython-dev] Test python 2.6 Message-ID: Dear Peter, I run python run_tests.py in Tests folder with python 2.6.2 I got one error in test file test_SeqIO_online.py I open the file and went to line that caused error and it looks like it is not functional error, it is just data error, because there is no data for in database for supplied parameters: ("genome", ["fasta", "gb"], "X52960", 248, "Ktxz0HgMlhQmrKTuZpOxPZJ6zGU") So I commented this line and leave other two and all tests passed after this. Anyway, now I am installing python 3.1.2 and will run tests when finish installation. Kind regards -- Dragoslav Zaric Professional Programmer MSc Astrophysics From biopython at maubp.freeserve.co.uk Tue Oct 26 16:41:57 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 26 Oct 2010 17:41:57 +0100 Subject: [Biopython-dev] Test python 2.6 In-Reply-To: References: Message-ID: On Tue, Oct 26, 2010 at 5:28 PM, Dragoslav Zaric wrote: > Dear Peter, > > I run > > python run_tests.py in Tests folder with python 2.6.2 I got one error > in test file test_SeqIO_online.py > I open the file and went to line that caused error and it looks like > it is not functional error, it is just data error, because there is no data > for in database for supplied parameters: > > ("genome", ["fasta", "gb"], "X52960", 248, "Ktxz0HgMlhQmrKTuZpOxPZJ6zGU") > > So I commented this line and leave other two and all tests passed after this. I'd noticed that failing a little while back, and had assumed it was just a temporary network problem. In fact looks like the NCBI have changed how searching against the genome database works. 
This update fixes the test on Python 2.6: http://github.com/biopython/biopython/commit/ad1dd31828c1488c72bffba3bc769c012439ea90 > Anyway, now I am installing python 3.1.2 and will run tests when > finish installation. Note there are some known failures on Python 3, this includes test_SeqIO_online.py (bytes vs unicode). Peter From zaricdragoslav at gmail.com Tue Oct 26 16:45:37 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Tue, 26 Oct 2010 20:45:37 +0400 Subject: [Biopython-dev] Test python 2.6 In-Reply-To: References: Message-ID: I run now 2to3 on biopython folder and when it finish I will run run_tests.py This will also test C modules ? Kind regards On Tue, Oct 26, 2010 at 8:41 PM, Peter wrote: > On Tue, Oct 26, 2010 at 5:28 PM, Dragoslav Zaric > wrote: >> Dear Peter, >> >> I run >> >> python run_tests.py in Tests folder with python 2.6.2 I got one error >> in test file test_SeqIO_online.py >> I open the file and went to line that caused error and it looks like >> it is not functional error, it is just data error, because there is no data >> for in database for supplied parameters: >> >> ("genome", ["fasta", "gb"], "X52960", 248, "Ktxz0HgMlhQmrKTuZpOxPZJ6zGU") >> >> So I commented this line and leave other two and all tests passed after this. > > I'd noticed that failing a little while back, and had assumed it was just a > temporary network problem. In fact looks like the NCBI have changed > how searching against the genome database works. This update fixes > the test on Python 2.6: > > http://github.com/biopython/biopython/commit/ad1dd31828c1488c72bffba3bc769c012439ea90 > >> Anyway, now I am installing python 3.1.2 and will run tests when >> finish installation. > > Note there are some known failures on Python 3, this includes > test_SeqIO_online.py (bytes vs unicode). 
> > Peter > -- Dragoslav Zaric Professional Programmer MSc Astrophysics From biopython at maubp.freeserve.co.uk Tue Oct 26 16:59:44 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 26 Oct 2010 17:59:44 +0100 Subject: [Biopython-dev] Test python 2.6 In-Reply-To: References: Message-ID: On Tue, Oct 26, 2010 at 5:45 PM, Dragoslav Zaric wrote: > I run now 2to3 on biopython folder and when it finish I will run run_tests.py > This will also test C modules ? Using run_tests.py would cover everything unless it has been disabled on Python 3, or depends on some C code which hasn't been compiled (in which case the test should be skipped). Note we've edited setup.py not to try and compile any C code on Python 3 (because currently none of it works). You'll need to edit setup.py to compile any C code you work on for Python 3. For C modules which don't use NumPy, change this bit:

    ...
    elif sys.version_info[0] == 3:
        # TODO - Must update our C extensions for Python 3
        EXTENSIONS = []
    ...

For extensions using NumPy, see class build_ext_biopython. Peter From barwil at gmail.com Tue Oct 26 20:46:50 2010 From: barwil at gmail.com (Bartek Wilczynski) Date: Tue, 26 Oct 2010 22:46:50 +0200 Subject: [Biopython-dev] Moving Bio.Motif documentation into Tutorial.tex In-Reply-To: References: Message-ID: Hi all, I've added the Bio.Motif section to the tutorial and pushed this to github. I was able to build the tutorial in pdf, but I'm not sure about the html version and whether it works for other people. Any other comments are welcome as well cheers Bartek On Tue, Oct 19, 2010 at 2:45 PM, Peter wrote: > On Tue, Oct 19, 2010 at 1:34 PM, Bartek Wilczynski > wrote: > > Hi, > > > > I've started to look into merging Bio.Motif docs with the Tutorial. I > have a > > few questions: > > - First, I need to find a good place in the tutorial to put it. > > One possibility is to make a separate chapter for it, another option > is > > to put it as a subchapter in chapter 15 (cookbook).
> > I think it would be better to make it a separate chapter, similar to > one > > the ones discussing Bio.popgen or bio.phylo, So i thought it would make > > sense to create it as a new chapter 13, entitled Sequence motif analysis > > with Bio.Motif > > I agree, create a new chapter (and add yourself to the authors list). > I'd definitely put it before the "Cookbook Chapter", and between the > Phylogenetics and "Supervised learning methods" chapters seems > reasonable. > > > -second, I have links and references to papers in there. The question > would > > be should I remove those to keep to the style of the tutorial > > Keep them - links to external webpages are fine - they work well in both > PDF > and HTML. For references we currently don't have a formal bibliography - > but > we do have some existing case of links to papers already, e.g. > > http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec:SeqIO-fastq-conversion > > Peter > -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From biopython at maubp.freeserve.co.uk Tue Oct 26 21:53:47 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 26 Oct 2010 22:53:47 +0100 Subject: [Biopython-dev] Tests in python 3.1.2 In-Reply-To: References: Message-ID: On Tue, Oct 26, 2010 at 9:41 PM, Dragoslav Zaric wrote: > Dear Peter, > > I installed python 3.1.2, than run > > 2to3 -w biopython Don't do that - see our README file: 2to3 --nofix=long --no-diffs -n -w Bio BioSQL Tests Scripts Doc/examples 2to3 --nofix=long --no-diffs -n -w -d Bio BioSQL Tests Scripts Doc/examples You have to run 2to3 twice (strange design choice in the tool, this is once for the code, and again with -d for the doctests which are code examples within the docstring comments). You also need to turn off the "long" fixer (otherwise it causes problems in Bio.Phylo). 
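The two-pass 2to3 conversion Peter describes can also be driven from Python. This is a minimal sketch, assuming the directory list and flags from the README commands quoted above. The helper names `two_to_three_passes` and `run_2to3` are hypothetical. It relies on `lib2to3.main.main`, which shipped in the standard library from Python 2.6 and was removed in Python 3.13.

```python
# Hypothetical helpers: build and run the two 2to3 passes from the README.
README_TARGETS = ["Bio", "BioSQL", "Tests", "Scripts", "Doc/examples"]


def two_to_three_passes(targets=README_TARGETS):
    """Return the argument lists for the code pass and the doctest pass."""
    # Skip the "long" fixer (it breaks Bio.Phylo), hide diffs,
    # write changes in place (-w) without .bak backups (-n).
    common = ["--nofix=long", "--no-diffs", "-n", "-w"]
    code_pass = common + list(targets)
    doctest_pass = common + ["-d"] + list(targets)  # -d: doctests only
    return [code_pass, doctest_pass]


def run_2to3(targets=README_TARGETS):
    """Run both passes in-process, stopping at the first non-zero status."""
    from lib2to3.main import main  # not available on Python 3.13+
    for args in two_to_three_passes(targets):
        status = main("lib2to3.fixes", args)
        if status:
            return status
    return 0
```

Calling `run_2to3()` from the top of a fresh checkout should amount to typing the two README commands. As with the command line tool, `-w` rewrites the files in place.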
> and after that > > python3.1 run_tests.py > > I capture screen output in log.txt file that I am sending you in attachment. > > Based on this log, can you advise me which way to go. Fix error one by one, > or maybe I made mistake in installation/upgrade. The attachment is too big for the mailing list, so your message was rejected. I hope that helps. Peter From biopython at maubp.freeserve.co.uk Tue Oct 26 22:01:08 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 26 Oct 2010 23:01:08 +0100 Subject: [Biopython-dev] Moving Bio.Motif documentation into Tutorial.tex In-Reply-To: References: Message-ID: On Tue, Oct 26, 2010 at 9:46 PM, Bartek Wilczynski wrote: > Hi all, > > I've added the Bio.Motif section to the tutorial and pushed this to github. > I was able to build the tutorial in pdf, but I'm not sure about the html > version and whether it works for other people. > > Any other comments are welcome as well > > cheers > Bartek Thanks Bartek - the HTML looks fine (but I haven't read it all yet): http://biopython.org/DIST/docs/tutorial/Tutorial-dev.html That should be updated automatically by a cron task running under my username - let me know if it looks out of date. Peter From biopython at maubp.freeserve.co.uk Wed Oct 27 10:22:59 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Oct 2010 11:22:59 +0100 Subject: [Biopython-dev] Bio.Motif and FASTA output Message-ID: Hi Bartek, I noticed a concern with one of your examples in the tutorial, going from a Motif object to FASTA format, >>> print m.format("fasta") > instance 0 TATAA > instance 1 TATTA > instance 2 TATAA > instance 3 TATAA Our FASTA parser will treat that has having no identifiers (because it goes greater than sign, space, text). How about this: >>> print m.format("fasta") >instance0 TATAA >instance1 TATTA >instance2 TATAA >instance3 TATAA With the above output, each sequence gets a unique identifier. 
Peter From barwil at gmail.com Wed Oct 27 10:34:52 2010 From: barwil at gmail.com (Bartek Wilczynski) Date: Wed, 27 Oct 2010 12:34:52 +0200 Subject: [Biopython-dev] Bio.Motif and FASTA output In-Reply-To: References: Message-ID: Thanks for spotting the problem. Fixed now. cheers Bartek On Wed, Oct 27, 2010 at 12:22 PM, Peter wrote: > Hi Bartek, > > I noticed a concern with one of your examples in the tutorial, going > from a Motif object to FASTA format, > > >>> print m.format("fasta") > > instance 0 > TATAA > > instance 1 > TATTA > > instance 2 > TATAA > > instance 3 > TATAA > > Our FASTA parser will treat that has having no identifiers (because > it goes greater than sign, space, text). How about this: > > >>> print m.format("fasta") > >instance0 > TATAA > >instance1 > TATTA > >instance2 > TATAA > >instance3 > TATAA > > With the above output, each sequence gets a unique identifier. > > Peter > -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From biopython at maubp.freeserve.co.uk Wed Oct 27 10:34:39 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Oct 2010 11:34:39 +0100 Subject: [Biopython-dev] Bio.Motif length Message-ID: Hi Bartek, (Another query after scanning over your new text in the tutorial) Why do you have motif.length when len(motif) seems to do basically the same thing? Can we deprecate the length property (Zen of Python: There should be one -- and preferably only one -- obvious way to do it)? Thanks, Peter From biopython at maubp.freeserve.co.uk Wed Oct 27 10:46:21 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Oct 2010 11:46:21 +0100 Subject: [Biopython-dev] Bio.Motif and FASTA output In-Reply-To: References: Message-ID: On Wed, Oct 27, 2010 at 11:34 AM, Bartek Wilczynski wrote: > Thanks for spotting the problem. Fixed now. > > cheers > Bartek Thanks. BTW - Do you have two git usernames? 
Your recent commits show up as authored by barwil but committed by bartekw - curious. Peter From barwil at gmail.com Wed Oct 27 10:53:58 2010 From: barwil at gmail.com (Bartek Wilczynski) Date: Wed, 27 Oct 2010 12:53:58 +0200 Subject: [Biopython-dev] Bio.Motif length In-Reply-To: References: Message-ID: Hi Peter, On Wed, Oct 27, 2010 at 12:34 PM, Peter wrote: > > Why do you have motif.length when len(motif) seems to do > basically the same thing? Can we deprecate the length > property (Zen of Python: There should be one -- and > preferably only one -- obvious way to do it)? > > I guess this is there just out of habit. I know that the .length property and I tend to use it, but I agree that in the tutorial we should use len(m) instead of m.length. Speaking more globally, the length property is there from the beginning, I don't think we should remove it. If we really want to make the API clean, we could rename it to m._length to indicate that it should not be used directly (especially setting it to some other value could have unwanted consequences). I can make the change in the tutorial (I need to change the expected output of m.format("fasta") anyway), but making the change from .length to ._length in the code would require a bit more time to make sure I'm not using it anywhere in the code. What is your suggestion here? cheers B From biopython at maubp.freeserve.co.uk Wed Oct 27 11:03:15 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Oct 2010 12:03:15 +0100 Subject: [Biopython-dev] Bio.Motif length In-Reply-To: References: Message-ID: On Wed, Oct 27, 2010 at 11:53 AM, Bartek Wilczynski wrote: > Hi Peter, > > On Wed, Oct 27, 2010 at 12:34 PM, Peter wrote: > >> >> Why do you have motif.length when len(motif) seems to do >> basically the same thing? Can we deprecate the length >> property (Zen of Python: There should be one -- and >> preferably only one -- obvious way to do it)? >> > > I guess this is there just out of habit. 
I know that the .length property > and I tend to use it, but I agree that in the tutorial we should use len(m) > instead of m.length. > > Speaking more globally, the length property is there from the beginning, I > don't think we should remove it. If we really want to make the API clean, we > could rename it to m._length to indicate that it should not be used directly > (especially setting it to some other value could have unwanted > consequences). > > I can make the change in the tutorial (I need to change the expected output > of m.format("fasta") anyway), but making the change from .length to ._length > in the code would require a bit more time to make sure I'm not using it > anywhere in the code. What is your suggestion here? What I would suggest is right now: (1) Use len(...) in the tutorial and any docstrings. Also in the __len__ docstring you could mention that using the .length property is discouraged. Then later as your time permits, (2) Rename self.length to self._length throughout the code, check tests pass (3) Add a property length which acts as a proxy for self._length and say in the docstring that you encourage len(...) instead. This is to ensure existing code using .length still works. Then later, (4) Add a deprecation warning to the new length property. One year and two releases later: (5) Remove the length property (leaving the private _length property only). Peter From barwil at gmail.com Wed Oct 27 13:10:46 2010 From: barwil at gmail.com (Bartek Wilczynski) Date: Wed, 27 Oct 2010 15:10:46 +0200 Subject: [Biopython-dev] Bio.Motif length In-Reply-To: References: Message-ID: Hi, On Wed, Oct 27, 2010 at 1:03 PM, Peter wrote: > > What I would suggest is right now: > > (1) Use len(...) in the tutorial and any docstrings. Also in the > __len__ docstring you could mention that using the .length > property is discouraged. > > This is now done and commited to the trunk. 
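Steps (2) to (4) of Peter's plan can be sketched on a stand-in class. This is a minimal sketch, not the actual Bio.Motif code. The `Motif` class below is hypothetical and only models the instance list and the length. It also merges steps (3) and (4) by having the `length` proxy emit a `DeprecationWarning` immediately.

```python
import warnings


class Motif(object):
    """Hypothetical stand-in for a Bio.Motif motif (not the real class)."""

    def __init__(self, instances=()):
        self.instances = list(instances)
        # Step (2): keep the length in a private attribute
        self._length = len(self.instances[0]) if self.instances else 0

    def __len__(self):
        """Return the motif length; prefer len(motif) over motif.length."""
        return self._length

    @property
    def length(self):
        """Deprecated read-only proxy for self._length (steps 3 and 4)."""
        warnings.warn("motif.length is deprecated, use len(motif) instead",
                      DeprecationWarning, stacklevel=2)
        return self._length
```

Because the property has no setter, `m.length = 7` raises `AttributeError`. That blocks the accidental reassignment Bartek mentioned, but any existing code that assigns to `.length` would also need a setter. Step (5) would then simply delete the property.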
> Then later as your time permits, > > (2) Rename self.length to self._length throughout the code, check > tests pass > (3) Add a property length which acts as a proxy for self._length > and say in the docstring that you encourage len(...) instead. > This is to ensure existing code using .length still works. > > I'll put these things on my todo list, and I'll make them on a branch, not to mess things up. > Then later, > > (4) Add a deprecation warning to the new length property. > > One year and two releases later: > > (5) Remove the length property (leaving the private _length > property only). > > Is there a scheduled time for the next release? I'm just asking to see whether I can try to still squeeze it into the nearest release or it will need to wait for the next one. Thanks for your input Bartek From biopython at maubp.freeserve.co.uk Wed Oct 27 13:25:02 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Oct 2010 14:25:02 +0100 Subject: [Biopython-dev] Bio.Motif length In-Reply-To: References: Message-ID: On Wed, Oct 27, 2010 at 2:10 PM, Bartek Wilczynski wrote: > > Is there a scheduled time for the next release? I'm just asking to see > whether I can try to still squeeze it into the nearest release or it will > need to wait for the next one. > I was thinking some point next month (November 2010), certainly we want to do this well before the end of the year (when the NCBI will be changing the DTD files for Entrez). Peter From zaricdragoslav at gmail.com Wed Oct 27 15:19:13 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Wed, 27 Oct 2010 19:19:13 +0400 Subject: [Biopython-dev] upgrade to python 3.1.2 Message-ID: Hi Peter, I did everything from scratch, get biopython with git, than tun those two commands for 2to3 from README file and at the end I run run_tests.py from Tests folder. I am sending you log file just to check am I on right track. I will continue to investigate errors. 
One error is related to numpy module, so I will try to install numpy for python 3.1.2 Anyway, two test FAIL, test_SeqIO_online and test_Wise Kind regards -- Dragoslav Zaric Professional Programmer MSc Astrophysics -------------- next part -------------- test_Ace ... ok test_AlignIO ... ok test_AlignIO_convert ... ok test_BioSQL ... /home/maiev/work/biopython/BioSQL/Loader.py:799: UserWarning: order location operators are not fully supported % feature.location_operator) ok test_BioSQL_SeqIO ... /home/maiev/work/biopython/BioSQL/Loader.py:799: UserWarning: bond location operators are not fully supported % feature.location_operator) ok test_CAPS ... ok test_Clustalw ... /home/maiev/work/biopython/Bio/Clustalw/__init__.py:83: PendingDeprecationWarning: This function is obsolete, and any new code should call Bio.AlignIO instead. warnings.warn("This function is obsolete, and any new code should call Bio.AlignIO instead.", PendingDeprecationWarning) ok test_Clustalw_tool ... skipping. Install clustalw or clustalw2 if you want to use Bio.Clustalw. test_Cluster ... skipping. If you want to use Bio.Cluster, install NumPy first and then reinstall Biopython test_CodonTable ... ok test_CodonUsage ... ok test_Compass ... ok test_Crystal ... ok test_Dialign_tool ... skipping. Install DIALIGN2-2 if you want to use the Bio.Align.Applications wrapper. test_DocSQL ... skipping. Install MySQLdb if you want to use Bio.DocSQL. test_Emboss ... skipping. Install EMBOSS if you want to use Bio.Emboss. test_EmbossPhylipNew ... skipping. Install the Emboss package 'PhylipNew' if you want to use the Bio.Emboss.Applications wrappers for phylogenetic tools. test_EmbossPrimer ... ok test_Entrez ... ok test_Enzyme ... ok test_FSSP ... ok test_File ... ok test_GACrossover ... ok test_GAMutation ... ok test_GAOrganism ... ok test_GAQueens ... ok test_GARepair ... ok test_GASelection ... ok test_GFF ... skipping. 
Environment is not configured for this test (not important if you do not plan to use Bio.GFF). test_GFF2 ... skipping. Install MySQLdb if you want to use Bio.GFF. test_GenBank ... ok test_GenomeDiagram ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsBitmaps ... skipping. Install ReportLab if you want to use Bio.Graphics. test_GraphicsChromosome ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsDistribution ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsGeneral ... skipping. Install reportlab if you want to use Bio.Graphics. test_HMMCasino ... ok test_HMMGeneral ... ok test_HotRand ... ok test_IsoelectricPoint ... ok test_KDTree ... skipping. Install NumPy if you want to use Bio.KDTree. test_KEGG ... ok test_KeyWList ... ok test_Location ... ok test_LocationParser ... skipping. This deprecated module doesn't work on Python 3. test_LogisticRegression ... skipping. Install NumPy if you want to use Bio.LogisticRegression. test_Mafft_tool ... skipping. Install MAFFT if you want to use the Bio.Align.Applications wrapper. test_MarkovModel ... skipping. Install NumPy if you want to use Bio.MarkovModel. test_Medline ... ok test_Motif ... ok test_Muscle_tool ... skipping. Install MUSCLE if you want to use the Bio.Align.Applications wrapper. test_NCBIStandalone ... /home/maiev/work/biopython/Bio/Blast/NCBIStandalone.py:53: PendingDeprecationWarning: The plain text parser in this module still works at the time of writing, but is considered obsolete and updating it to cope with the latest versions of BLAST is not a priority for us. 
warnings.warn("The plain text parser in this module still works at the time of writing, but is considered obsolete and updating it to cope with the latest versions of BLAST is not a priority for us.", PendingDeprecationWarning) /home/maiev/work/biopython/Bio/Blast/NCBIStandalone.py:1850: PendingDeprecationWarning: This function is obsolete, you are encouraged to the command line wrapper Bio.Blast.Applications.BlastpgpCommandline instead. warnings.warn("This function is obsolete, you are encouraged to the command line wrapper Bio.Blast.Applications.BlastpgpCommandline instead.", PendingDeprecationWarning) /home/maiev/work/biopython/Bio/Blast/NCBIStandalone.py:1970: PendingDeprecationWarning: This function is obsolete, you are encouraged to the command line wrapper Bio.Blast.Applications.BlastrpsCommandline instead. warnings.warn("This function is obsolete, you are encouraged to the command line wrapper Bio.Blast.Applications.BlastrpsCommandline instead.", PendingDeprecationWarning) ok test_NCBITextParser ... ok test_NCBIXML ... ok test_NCBI_BLAST_tools ... skipping. Install the NCBI BLAST+ command line tools if you want to use the Bio.Blast.Applications wrapper. test_NCBI_qblast ... ok test_NNExclusiveOr ... ok test_NNGene ... ok test_NNGeneral ... ok test_Nexus ... ok test_PDB ... skipping. Install NumPy if you want to use Bio.PDB. test_PDB_KDTree ... skipping. Install NumPy if you want to use Bio.PDB. test_ParserSupport ... ok test_Pathway ... ok test_Phd ... ok test_Phylo ... ok test_PhyloXML ... ok test_Phylo_depend ... skipping. Install NetworkX if you want to use Bio.Phylo._utils. test_PopGen_FDist ... skipping. Install FDist if you want to use Bio.PopGen.FDist. test_PopGen_FDist_nodepend ... ok test_PopGen_GenePop ... skipping. Install GenePop if you want to use Bio.PopGen.GenePop. test_PopGen_GenePop_EasyController ... skipping. Install GenePop if you want to use Bio.PopGen.GenePop. test_PopGen_GenePop_nodepend ... ok test_PopGen_SimCoal ... skipping. 
Install SIMCOAL2 if you want to use Bio.PopGen.SimCoal. test_PopGen_SimCoal_nodepend ... ok test_Prank_tool ... skipping. Install PRANK if you want to use the Bio.Align.Applications wrapper. test_Probcons_tool ... skipping. Install PROBCONS if you want to use the Bio.Align.Applications wrapper. test_ProtParam ... ok test_Restriction ... ok test_SCOP_Astral ... ok test_SCOP_Cla ... ok test_SCOP_Des ... ok test_SCOP_Dom ... ok test_SCOP_Hie ... ok test_SCOP_Raf ... ok test_SCOP_Residues ... ok test_SCOP_Scop ... ok test_SVDSuperimposer ... skipping. Install NumPy if you want to use Bio.SVDSuperimposer. test_SeqIO ... ok test_SeqIO_FastaIO ... ok test_SeqIO_QualityIO ... ok test_SeqIO_convert ... ok test_SeqIO_features ... ok test_SeqIO_index ... skipping. Skipping since currently this is very slow on Python 3. test_SeqIO_online ... FAIL test_SeqRecord ... ok test_SeqUtils ... ok test_Seq_objs ... ok test_SubsMat ... ok test_SwissProt ... ok test_TCoffee_tool ... skipping. Install TCOFFEE if you want to use the Bio.Align.Applications wrapper. test_UniGene ... ok test_UniGene_obsolete ... ok test_Wise ... FAIL test_align ... ok test_geo ... ok test_kNN ... ERROR test_lowess ... skipping. Install NumPy if you want to use Bio.Statistics.lowess. test_pairwise2 ... ok test_prodoc ... ok test_property_manager ... skipping. This deprecated module doesn't work on Python 3. test_prosite1 ... ok test_prosite2 ... ok test_prosite_patterns ... skipping. The (deprecated) Bio.Prosite module uses the Python library sgmllib which is not supported on Python 3 test_psw ... ok test_seq ... ok test_translate ... ok test_trie ... skipping. Could not import Bio.trie, check C code was compiled. Bio.Alphabet docstring test ... ok Bio.Application docstring test ... ok Bio.SeqFeature docstring test ... ok Bio.SeqRecord docstring test ... ok Bio.SeqIO docstring test ... ok Bio.SeqIO.AceIO docstring test ... ok Bio.SeqIO.PhdIO docstring test ... ok Bio.SeqIO.QualityIO docstring test ... 
ok Bio.SeqIO.SffIO docstring test ... ok Bio.SeqUtils docstring test ... ok Bio.Align docstring test ... ok Bio.Align.Generic docstring test ... ok Bio.AlignIO docstring test ... ok Bio.AlignIO.StockholmIO docstring test ... ok Bio.Blast.Applications docstring test ... ok Bio.Clustalw docstring test ... ok Bio.Emboss.Applications docstring test ... ok Bio.KEGG.Compound docstring test ... ok Bio.KEGG.Enzyme docstring test ... ok Bio.Wise docstring test ... ok Bio.Wise.psw docstring test ... ok Bio.Motif docstring test ... ok ====================================================================== ERROR: test_nuccore_X52960 (test_SeqIO_online.EntrezTests) Bio.Entrez.efetch(nuccore, X52960, ...) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/maiev/work/biopython/Tests/test_SeqIO_online.py", line 85, in method = lambda x : x.simple(d, f, e, l, c) File "/home/maiev/work/biopython/Tests/test_SeqIO_online.py", line 63, in simple record = SeqIO.read(handle, f) File "/home/maiev/work/biopython/Bio/SeqIO/__init__.py", line 585, in read first = next(iterator) File "/home/maiev/work/biopython/Bio/SeqIO/FastaIO.py", line 39, in FastaIterator if line[0] == ">": IndexError: index out of range ====================================================================== ERROR: test_nucleotide_6273291 (test_SeqIO_online.EntrezTests) Bio.Entrez.efetch(nucleotide, 6273291, ...) 
---------------------------------------------------------------------- Traceback (most recent call last): File "/home/maiev/work/biopython/Tests/test_SeqIO_online.py", line 85, in method = lambda x : x.simple(d, f, e, l, c) File "/home/maiev/work/biopython/Tests/test_SeqIO_online.py", line 63, in simple record = SeqIO.read(handle, f) File "/home/maiev/work/biopython/Bio/SeqIO/__init__.py", line 585, in read first = next(iterator) File "/home/maiev/work/biopython/Bio/SeqIO/FastaIO.py", line 39, in FastaIterator if line[0] == ">": IndexError: index out of range ====================================================================== ERROR: test_protein_16130152 (test_SeqIO_online.EntrezTests) Bio.Entrez.efetch(protein, 16130152, ...) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/maiev/work/biopython/Tests/test_SeqIO_online.py", line 85, in method = lambda x : x.simple(d, f, e, l, c) File "/home/maiev/work/biopython/Tests/test_SeqIO_online.py", line 63, in simple record = SeqIO.read(handle, f) File "/home/maiev/work/biopython/Bio/SeqIO/__init__.py", line 585, in read first = next(iterator) File "/home/maiev/work/biopython/Bio/SeqIO/FastaIO.py", line 39, in FastaIterator if line[0] == ">": IndexError: index out of range ====================================================================== FAIL: test_dnal (test_Wise.TestWiseDryRun) Call dnal, and do a trivial check on its output. ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/maiev/work/biopython/Tests/test_Wise.py", line 26, in test_dnal self.assertTrue(sys.stdout.getvalue().startswith("dnal -kbyte 100000 seq1.fna seq2.fna")) AssertionError: False is not True ====================================================================== FAIL: test_psw (test_Wise.TestWiseDryRun) Call psw, and do a trivial check on its output. 
---------------------------------------------------------------------- Traceback (most recent call last): File "/home/maiev/work/biopython/Tests/test_Wise.py", line 31, in test_psw self.assertTrue(sys.stdout.getvalue().startswith("psw -kbyte 4 seq1.faa seq2.faa")) AssertionError: False is not True ====================================================================== ERROR: test_kNN ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/maiev/work/biopython/Tests/test_kNN.py", line 12, in import numpy ImportError: No module named numpy During handling of the above exception, another exception occurred: Traceback (most recent call last): File "run_tests.py", line 289, in runTest suite = unittest.TestLoader().loadTestsFromName(name) File "/usr/local/lib/python3.1/unittest.py", line 1266, in loadTestsFromName module = __import__('.'.join(parts_copy)) File "/home/maiev/work/biopython/Tests/test_kNN.py", line 15, in raise MissingPythonDependencyError( NameError: name 'MissingPythonDependencyError' is not defined ---------------------------------------------------------------------- Ran 140 tests in 478.298 seconds FAILED (failures = 3) From biopython at maubp.freeserve.co.uk Wed Oct 27 15:33:05 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Oct 2010 16:33:05 +0100 Subject: [Biopython-dev] upgrade to python 3.1.2 In-Reply-To: References: Message-ID: On Wed, Oct 27, 2010 at 4:19 PM, Dragoslav Zaric wrote: > Hi Peter, > > I did everything from scratch, get biopython with git, than tun those > two commands for 2to3 from README file and at the end I run > run_tests.py from Tests folder. > > I am sending you log file just to check am I on right track. I will > continue to investigate errors. 
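The test_kNN.py traceback in this log is a skip that went wrong. The `ImportError` for NumPy was caught, but the exception meant to signal "skip this test" was not defined in that module, so a `NameError` surfaced instead. This is a minimal sketch of the guard pattern, assuming a local stand-in exception class so it runs on its own. In the Biopython test suite the real `MissingPythonDependencyError` would have to be imported from Biopython instead. The helper `require` is hypothetical.

```python
class MissingPythonDependencyError(Exception):
    """Local stand-in for Biopython's exception of the same name."""


def require(module_name, message):
    """Import and return a module, or signal that the test should be skipped."""
    try:
        return __import__(module_name)
    except ImportError:
        # Intended to be reported by run_tests.py as "skipping." plus the message
        raise MissingPythonDependencyError(message)
```

At the top of a test module this would read, for example, `numpy = require("numpy", "Install NumPy if you want to use Bio.kNN.")`.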
> > One error is related to numpy module, so I will try to install numpy > for python 3.1.2 > > Anyway, two test FAIL, test_SeqIO_online and test_Wise > > Kind regards The problem with test_kNN.py was my mistake - it is meant to be skipped when numpy is not installed. Fixed here: http://github.com/biopython/biopython/commit/2ae15f94e7e90b237e982145f9697157ed1f801e The "IndexError: index out of range" problem on Python 3 with test_SeqIO_online.py is the known failure I mentioned before. This is to do with bytes versus unicode handles. The output from test_Wise.py is unexpected though (I don't have Wise installed on my Mac - I should do that): ====================================================================== FAIL: test_psw (test_Wise.TestWiseDryRun) Call psw, and do a trivial check on its output. ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/maiev/work/biopython/Tests/test_Wise.py", line 31, in test_psw self.assertTrue(sys.stdout.getvalue().startswith("psw -kbyte 4 seq1.faa seq2.faa")) AssertionError: False is not True Hopefully with the following change we'll get a more useful message: http://github.com/biopython/biopython/commit/811f5ced0305fa41539b8867c594a119135ef682 Could you update your Biopython and re-test? You'll have to repeat the 2to3 conversion, e.g. git reset --hard 2to3 ... etc Peter From biopython at maubp.freeserve.co.uk Wed Oct 27 15:46:13 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Oct 2010 16:46:13 +0100 Subject: [Biopython-dev] upgrade to python 3.1.2 In-Reply-To: References: Message-ID: On Wed, Oct 27, 2010 at 4:40 PM, Dragoslav Zaric wrote: > > ok, will do that and send you log file again, > You can just cut and paste the error messages - that should be all we need.
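Peter's "bytes versus unicode handles" diagnosis can be reproduced without network access. This is a minimal sketch of a read loop with the same `line[0] == ">"` check seen in the traceback. It is an illustration, not Biopython's actual `FastaIterator`, and `naive_first_title` is a hypothetical name. On Python 3 a bytes handle returns `b""` at end of input, which never equals `""`, so the loop falls through to indexing an empty bytes object and raises `IndexError`. Wrapping the handle in `io.TextIOWrapper` yields text lines, and the same loop works.

```python
import io


def naive_first_title(handle):
    """Return the text of the first ">" title line, or None at end of input."""
    while True:
        line = handle.readline()
        if line == "":          # end-of-input test that only matches text handles
            return None
        if line[0] == ">":      # for bytes, line[0] is an int, never ">"
            return line[1:].rstrip()


data = b">seq1 example\nACGT\n"

# Text handle: works as intended
text_handle = io.TextIOWrapper(io.BytesIO(data), encoding="ascii")
print(naive_first_title(text_handle))  # seq1 example

# Bytes handle: b"" != "" at end of input, so b""[0] raises IndexError
try:
    naive_first_title(io.BytesIO(data))
except IndexError as err:
    print("IndexError:", err)
```

Testing `if not line:` and `line[:1] == ">"` instead would avoid the crash. However, a bytes handle would then silently yield no records, which is why decoding the handle to text is the more direct fix.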
Thanks, Peter From biopython at maubp.freeserve.co.uk Wed Oct 27 16:35:11 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 27 Oct 2010 17:35:11 +0100 Subject: [Biopython-dev] upgrade to python 3.1.2 In-Reply-To: References: Message-ID: On Wed, Oct 27, 2010 at 5:14 PM, Dragoslav Zaric wrote: > > Ok, errors: > > test_SeqIO_online ... FAIL > test_Wise ... FAIL > > ====================================================================== > ERROR: test_nuccore_X52960 (test_SeqIO_online.EntrezTests) > Bio.Entrez.efetch(nuccore, X52960, ...) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ... > IndexError: index out of range > > ====================================================================== > ERROR: test_nucleotide_6273291 (test_SeqIO_online.EntrezTests) > Bio.Entrez.efetch(nucleotide, 6273291, ...) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ... > IndexError: index out of range > > ====================================================================== > ERROR: test_protein_16130152 (test_SeqIO_online.EntrezTests) > Bio.Entrez.efetch(protein, 16130152, ...) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ... > IndexError: index out of range We're ignoring the above problem with test_SeqIO_online.py on Python 3 for now. > ====================================================================== > FAIL: test_dnal (test_Wise.TestWiseDryRun) > Call dnal, and do a trivial check on its output. > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/home/maiev/work/biopython/Tests/test_Wise.py", line 27, in test_dnal >
self.assertTrue(output.startswith("dnal -kbyte 100000 seq1.fna > seq2.fna"), output[:200]) > AssertionError: dnal -kbyte 100000 -quiet seq1.fna seq2.fna > /tmp/tmpEVkZM8 > > > ====================================================================== > FAIL: test_psw (test_Wise.TestWiseDryRun) > Call psw, and do a trivial check on its output. > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/home/maiev/work/biopython/Tests/test_Wise.py", line 33, in test_psw > self.assertTrue(output.startswith("psw -kbyte 4 seq1.faa > seq2.faa"), output[:200]) > AssertionError: psw -kbyte 4 -quiet seq1.faa seq2.faa > /tmp/tmpOJ3QL3 I remember this issue now: http://lists.open-bio.org/pipermail/biopython-dev/2010-June/007904.html (very end) ... http://lists.open-bio.org/pipermail/biopython-dev/2010-June/007908.html This was due to the psw/dnal wrappers sometimes automatically including the -quiet command line switch. It happens if you redirect the unit test output to a file. This change should solve it: http://github.com/biopython/biopython/commit/4f430adad7a5b8bc021dec8b188963ca76612393 Thanks! Peter From tiagoantao at gmail.com Wed Oct 27 18:36:46 2010 From: tiagoantao at gmail.com (Tiago Antão) Date: Wed, 27 Oct 2010 19:36:46 +0100 Subject: [Biopython-dev] README and python3 Message-ID: Hi, Just a minor issue with the README and python3. The option --nofix does not exist in 2to3 for the 2.x version. So that line will not work if the 2to3 happens to be from Python 2.X (can happen if you have several versions installed). -- "If you want to get laid, go to college. If you want an education, go to the library."
- Frank Zappa From biopython at maubp.freeserve.co.uk Thu Oct 28 09:17:09 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 28 Oct 2010 10:17:09 +0100 Subject: [Biopython-dev] README and python3 In-Reply-To: References: Message-ID: 2010/10/27 Tiago Antão : > Hi, > > Just a minor issue with the README and python3. > The option --nofix does not exist in 2to3 for the 2.x version. So that > line will not work if the 2to3 happens to be from Python 2.X (can > happen if you have several versions installed). > Hi Tiago, Can you work out which version of 2to3 lacks the --nofix (or -x) option, and which version of Python it came from? The (Apple provided) Python 2.6.1 on my Mac seems to have a 2to3 with the --nofix option, and I don't have Python 3 installed on this machine. In addition to running 2to3 as a command line script, you can call the library from within Python: $ python2.6 Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) [GCC 4.2.1 (Apple Inc. build 5646)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> from lib2to3.main import main >>> main("lib2to3.fixes", ["--help"]) Usage: refactor.py [options] file|dir ... Options: -h, --help show this help message and exit -d, --doctests_only Fix up doctests only -f FIX, --fix=FIX Each FIX specifies a transformation; default: all -x NOFIX, --nofix=NOFIX Prevent a fixer from being run. -l, --list-fixes List available transformations (fixes/fix_*.py) -p, --print-function Modify the grammar so that print() is a function -v, --verbose More verbose logging -w, --write Write back modified files -n, --nobackups Don't write backups for modified files. Likewise on our Linux server the 2to3 from Python 2.6.6, 2.7 and 3.1.2 all seem to have it: $ python2.6 Python 2.6.6 (r266:84292, Aug 31 2010, 16:21:14) [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2 Type "help", "copyright", "credits" or "license" for more information.
>>> from lib2to3.main import main >>> main("lib2to3.fixes", ["--help"]) Usage: 2to3 [options] file|dir ... Options: -h, --help show this help message and exit -d, --doctests_only Fix up doctests only -f FIX, --fix=FIX Each FIX specifies a transformation; default: all -j PROCESSES, --processes=PROCESSES Run 2to3 concurrently -x NOFIX, --nofix=NOFIX Prevent a fixer from being run. -l, --list-fixes List available transformations -p, --print-function Modify the grammar so that print() is a function -v, --verbose More verbose logging --no-diffs Don't show diffs of the refactoring -w, --write Write back modified files -n, --nobackups Don't write backups for modified files. $ python2.7 Python 2.7 (r27:82500, Jul 13 2010, 14:02:41) [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from lib2to3.main import main >>> main("lib2to3.fixes", ["--help"]) Usage: 2to3 [options] file|dir ... Options: -h, --help show this help message and exit -d, --doctests_only Fix up doctests only -f FIX, --fix=FIX Each FIX specifies a transformation; default: all -j PROCESSES, --processes=PROCESSES Run 2to3 concurrently -x NOFIX, --nofix=NOFIX Prevent a fixer from being run. -l, --list-fixes List available transformations -p, --print-function Modify the grammar so that print() is a function -v, --verbose More verbose logging --no-diffs Don't show diffs of the refactoring -w, --write Write back modified files -n, --nobackups Don't write backups for modified files. $ python3.1 Python 3.1.2 (r312:79147, Jul 15 2010, 12:43:37) [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from lib2to3.main import main >>> main("lib2to3.fixes", ["--help"]) Usage: 2to3 [options] file|dir ... 
Options: -h, --help show this help message and exit -d, --doctests_only Fix up doctests only -f FIX, --fix=FIX Each FIX specifies a transformation; default: all -j PROCESSES, --processes=PROCESSES Run 2to3 concurrently -x NOFIX, --nofix=NOFIX Prevent a fixer from being run. -l, --list-fixes List available transformations (fixes/fix_*.py) -p, --print-function Modify the grammar so that print() is a function -v, --verbose More verbose logging --no-diffs Don't show diffs of the refactoring -w, --write Write back modified files -n, --nobackups Don't write backups for modified files. Note that we *need* the --nofix option for the conversion of Bio.Phylo to work (it uses long as an argument name, short for longitude). Peter From zaricdragoslav at gmail.com Thu Oct 28 14:16:34 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Thu, 28 Oct 2010 18:16:34 +0400 Subject: [Biopython-dev] _pwm.c Message-ID: Dear Peter, I wrote you this yesterday: I put this in setup.py:

class build_ext_biopython(build_ext):
    def run(self):
        if not check_dependencies_once():
            return
        # add software that requires NumPy to install
        # TODO - Convert these for Python 3
        if is_Numpy_installed():
            import numpy
            numpy_include_dir = numpy.get_include()
            #self.extensions.append(
            #    Extension('Bio.Cluster.cluster',
            #              ['Bio/Cluster/clustermodule.c',
            #               'Bio/Cluster/cluster.c'],
            #              include_dirs=[numpy_include_dir],
            #              ))
            #self.extensions.append(
            #    Extension('Bio.KDTree._CKDTree',
            #              ["Bio/KDTree/KDTree.c",
            #               "Bio/KDTree/KDTreemodule.c"],
            #              include_dirs=[numpy_include_dir],
            #              ))
            self.extensions.append(
                Extension('Bio.Motif._pwm',
                          ["Bio/Motif/_pwm.c"],
                          include_dirs=[numpy_include_dir],
                          ))
        build_ext.run(self)

and then I ran:

python3.1 setup.py build_ext

This is the output: Biopython does not yet officially support Python 3, but you can try it by first using the 2to3 script on our source code. For details on how to use 2to3 with Biopython see README. If you still haven't applied 2to3 to Biopython please abort now.
Do you want to continue this installation? (y/N): y running build_ext building 'Bio.Motif._pwm' extension creating build/temp.linux-i686-3.1 creating build/temp.linux-i686-3.1/Bio creating build/temp.linux-i686-3.1/Bio/Motif gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/local/lib/python3.1/site-packages/numpy/core/include -I/usr/local/include/python3.1 -c Bio/Motif/_pwm.c -o build/temp.linux-i686-3.1/Bio/Motif/_pwm.o Bio/Motif/_pwm.c: In function 'init_pwm': Bio/Motif/_pwm.c:123: warning: 'return' with a value, in function returning void Bio/Motif/_pwm.c:125: warning: implicit declaration of function 'Py_InitModule4' Bio/Motif/_pwm.c:129: warning: assignment makes pointer from integer without a cast gcc -pthread -shared build/temp.linux-i686-3.1/Bio/Motif/_pwm.o -o build/lib/Bio/Motif/_pwm.so So as you can see this is compiling, but there are some warnings. So what is the plan, to compile with no warnings at all? regards -- Dragoslav Zaric Professional Programmer MSc Astrophysics From biopython at maubp.freeserve.co.uk Thu Oct 28 14:27:04 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 28 Oct 2010 15:27:04 +0100 Subject: [Biopython-dev] _pwm.c In-Reply-To: References: Message-ID: On Thu, Oct 28, 2010 at 3:16 PM, Dragoslav Zaric wrote: > Dear Peter, > > I wrote you this yesterday: > > I put this in setup.py: > > class build_ext_biopython(build_ext): > def run(self): > if not check_dependencies_once(): > return > # add software that requires NumPy to install > # TODO - Convert these for Python 3 > if is_Numpy_installed(): > import numpy > numpy_include_dir = numpy.get_include() > #self.extensions.append( > # Extension('Bio.Cluster.cluster', > # ['Bio/Cluster/clustermodule.c', > # 'Bio/Cluster/cluster.c'], > # include_dirs=[numpy_include_dir], >
# )) > #self.extensions.append( > # Extension('Bio.KDTree._CKDTree', > # ["Bio/KDTree/KDTree.c", > # "Bio/KDTree/KDTreemodule.c"], > # include_dirs=[numpy_include_dir], > # )) > self.extensions.append( > Extension('Bio.Motif._pwm', > ["Bio/Motif/_pwm.c"], > include_dirs=[numpy_include_dir], > )) > build_ext.run(self) > > and then I ran: > > python3.1 setup.py build_ext > > This is the output: > > Biopython does not yet officially support Python 3, but you > can try it by first using the 2to3 script on our source code. > For details on how to use 2to3 with Biopython see README. > If you still haven't applied 2to3 to Biopython please abort now. > Do you want to continue this installation? (y/N): > y > running build_ext > building 'Bio.Motif._pwm' extension > creating build/temp.linux-i686-3.1 > creating build/temp.linux-i686-3.1/Bio > creating build/temp.linux-i686-3.1/Bio/Motif > gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall > -Wstrict-prototypes -fPIC > -I/usr/local/lib/python3.1/site-packages/numpy/core/include > -I/usr/local/include/python3.1 -c Bio/Motif/_pwm.c -o > build/temp.linux-i686-3.1/Bio/Motif/_pwm.o > Bio/Motif/_pwm.c: In function 'init_pwm': > Bio/Motif/_pwm.c:123: warning: 'return' with a value, in function returning void > Bio/Motif/_pwm.c:125: warning: implicit declaration of function 'Py_InitModule4' > Bio/Motif/_pwm.c:129: warning: assignment makes pointer from integer > without a cast > gcc -pthread -shared build/temp.linux-i686-3.1/Bio/Motif/_pwm.o -o > build/lib/Bio/Motif/_pwm.so > > > So as you can see this is compiling, but there are some warnings. So what is the > plan, to compile with no warnings at all?
Well ideally no warnings - but of those three warnings only the one about Py_InitModule4 strikes me as important. This was part of the Python 2.x C API used to tell Python about the functions your code provides, and has been changed in Python 3.x (I think you must use PyModule_Create instead). What happens if you try to use the compiled module in Python 3? e.g.

from Bio import Motif
from Bio.Motif import _pwm

Bartek - could you give us a short (Python 2) example of Bio.Motif which uses the C module _pwm? Peter From barwil at gmail.com Thu Oct 28 14:37:18 2010 From: barwil at gmail.com (Bartek Wilczynski) Date: Thu, 28 Oct 2010 16:37:18 +0200 Subject: [Biopython-dev] _pwm.c In-Reply-To: References: Message-ID: On Thu, Oct 28, 2010 at 4:27 PM, Peter wrote: > On Thu, Oct 28, 2010 at 3:16 PM, Dragoslav Zaric > wrote: > > running build_ext > > building 'Bio.Motif._pwm' extension > > creating build/temp.linux-i686-3.1 > > creating build/temp.linux-i686-3.1/Bio > > creating build/temp.linux-i686-3.1/Bio/Motif > > gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall > > -Wstrict-prototypes -fPIC > > -I/usr/local/lib/python3.1/site-packages/numpy/core/include > > -I/usr/local/include/python3.1 -c Bio/Motif/_pwm.c -o > > build/temp.linux-i686-3.1/Bio/Motif/_pwm.o > > Bio/Motif/_pwm.c: In function 'init_pwm': > > Bio/Motif/_pwm.c:123: warning: 'return' with a value, in function > returning void > > Bio/Motif/_pwm.c:125: warning: implicit declaration of function > 'Py_InitModule4' > > Bio/Motif/_pwm.c:129: warning: assignment makes pointer from integer > > without a cast > > gcc -pthread -shared build/temp.linux-i686-3.1/Bio/Motif/_pwm.o -o > > build/lib/Bio/Motif/_pwm.so > > > > > > So as you can see this is compiling, but there are some warnings. So what > is the > > plan, to compile with no warnings at all? > > Well ideally no warnings - but of those three warnings only the one about > Py_InitModule4 strikes me as important.
This was part of the Python 2.x > C API used to tell Python about the functions your code provides, and has > been changed in Python 3.x (I think you must use PyModule_Create instead). > > What happens if you try to use the compiled module in Python 3? e.g. > > from Bio import Motif > from Bio.Motif import _pwm > > Bartek - could you give us a short (Python 2) example of Bio.Motif > which uses the C module _pwm? > Hi, this is the fast implementation of DNA motif searching written by Michiel some time ago. It is exposed in the Bio.Motif API in the form of the .scanPWM method: Definition: m.scanPWM(self, seq) Docstring: Matrix of log-odds scores for a nucleotide sequence. scans (using a fast C extension) a nucleotide sequence and returns the matrix of log-odds scores for all positions - the result is a one-dimensional numpy array - the sequence can only be a DNA sequence - the search is performed only on one strand It's a very simple module so it should be relatively easy to convert it to python3. Unfortunately, I have no experience with C extensions so I cannot help much. If you need a snippet for testing, you can use this:

from Bio import Seq
from Bio import Motif
m=Motif.read(open("Doc/cookbook/motif/SRF.pfm"),"jaspar-pfm")
m.scanPWM(Seq.Seq("ACGTGTGCGTAGTGCGT",m.alphabet))

result should be: array([-29.18363571, -38.3365097 , -29.17756271, -38.04542542, -20.3014183 , -25.18009186], dtype=float32) hope this helps -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From biopython at maubp.freeserve.co.uk Thu Oct 28 15:54:07 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 28 Oct 2010 16:54:07 +0100 Subject: [Biopython-dev] _pwm.c In-Reply-To: References: Message-ID: On Thu, Oct 28, 2010 at 3:37 PM, Bartek Wilczynski wrote: > > this is the fast implementation of DNA motif searching written by Michiel > some time ago.
> It is exposed in the Bio.Motif API in the form of the .scanPWM > method: > On a related topic, is there a pure Python fallback for _pwm.c in Bio.Motif? If not, would it be easy to add (e.g. for Jython)? Thanks, Peter From zaricdragoslav at gmail.com Thu Oct 28 16:24:05 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Thu, 28 Oct 2010 20:24:05 +0400 Subject: [Biopython-dev] Build & Test Message-ID: Hi Peter, To bring me up to speed for build&test process, can I ask you how exactly this process should go. For example if I put this in setup.py file:

class build_ext_biopython(build_ext):
    def run(self):
        if not check_dependencies_once():
            return
        if is_Numpy_installed():
            import numpy
            numpy_include_dir = numpy.get_include()
            self.extensions.append(
                Extension('Bio.Motif._pwm',
                          ["Bio/Motif/_pwm.c"],
                          include_dirs=[numpy_include_dir],
                          ))
        build_ext.run(self)

what commands should I run from the command line to build and test:

python3.1 setup.py build
or/and
python3.1 setup.py install

After this I will have the folder build/lib/Bio, so should I go to the folder build/lib and start python3.1 to test this, or is it copied to the root folder after python3.1 setup.py install? Also, after building I will have a _pwm.so file; is this the final file that is imported from Python code? Currently I can import Seq and Motif but when I run m.scanPWM(Seq.Seq("ACGTGTGCGTAGTGCGT",m.alphabet)) I get: Traceback (most recent call last): File "", line 1, in File "Bio/Motif/_Motif.py", line 778, in scanPWM import _pwm ImportError: No module named _pwm regards -- Dragoslav Zaric Professional Programmer MSc Astrophysics From biopython at maubp.freeserve.co.uk Thu Oct 28 16:36:31 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 28 Oct 2010 17:36:31 +0100 Subject: [Biopython-dev] Build & Test In-Reply-To: References: Message-ID: On Thu, Oct 28, 2010 at 5:24 PM, Dragoslav Zaric wrote: > Hi Peter, > > To bring me up to speed for build&test process, can I ask you how > exactly this process should go.
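As a rough answer to the fallback question, the sketch below scores a DNA sequence against a position weight matrix in plain Python, following the behaviour Bartek describes (a log-odds score for every start position, DNA only, one strand). It is not the algorithm in _pwm.c: the matrix format (one dict of log-odds scores per motif position), the function name scan_pwm and the decision to raise an error on non-ACGT letters are assumptions made for illustration, and it returns a plain list rather than a float32 NumPy array.

```python
"""Illustrative pure-Python PWM scan (a sketch, not Biopython's implementation)."""

try:
    # Prefer the compiled extension when it was built and imports cleanly;
    # a real wrapper would call into it here (its C API is not shown).
    from Bio.Motif import _pwm
except ImportError:
    # e.g. on Jython, or when the C code was not compiled for this Python
    _pwm = None


def scan_pwm(seq, log_odds):
    """Score every window of seq against a log-odds matrix (one strand only).

    log_odds is a list with one dict per motif position, mapping each of
    "A", "C", "G", "T" to a log-odds score.  Returns one score per start
    position, i.e. len(seq) - len(log_odds) + 1 values ([] if seq is shorter).
    """
    seq = str(seq).upper()
    width = len(log_odds)
    scores = []
    for start in range(len(seq) - width + 1):
        total = 0.0
        for offset, column in enumerate(log_odds):
            base = seq[start + offset]
            if base not in column:
                raise ValueError("unsupported letter %r at position %d"
                                 % (base, start + offset))
            total += column[base]
        scores.append(total)
    return scores


# Toy two-position motif that favours "AC" (made-up scores)
example_motif = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
]
print(scan_pwm("ACAC", example_motif))  # [2.0, -2.0, 2.0]
```

The nested loop costs O(len(seq) x motif width), which is the kind of work the C extension exists to speed up; a fallback like this would mainly matter where the compiled module is unavailable.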
> > For example if I put this in setup.py file: > > class build_ext_biopython(build_ext): > def run(self): > if not check_dependencies_once(): > return > if is_Numpy_installed(): > import numpy > numpy_include_dir = numpy.get_include() > self.extensions.append( > Extension('Bio.Motif._pwm', > ["Bio/Motif/_pwm.c"], > include_dirs=[numpy_include_dir], > )) > build_ext.run(self) > > what commands should I run from the command line to build and test: > > python3.1 setup.py build > or/and > python3.1 setup.py install > > After this I will have the folder build/lib/Bio, so should I go to the folder build/lib > and start python3.1 to test this, or is it copied to the root folder after > python3.1 setup.py install? You can do this:

python3.1 setup.py build
python3.1 setup.py test

and it should use the compiled C code from the build folder. This is equivalent to:

python3.1 setup.py build
cd Tests
python3.1 run_tests.py

The advantage of calling run_tests.py directly is that you can test particular bits of Biopython rather than all of it, e.g.

python3.1 setup.py build
cd Tests
python3.1 run_tests.py test_Motif.py

If you try to run a test directly (e.g. python3.1 test_Motif.py) then it will use the installed version of Biopython (and it will fail if you haven't installed Biopython). > Also, after building I will have a _pwm.so file; is this the final file that > is imported from Python code? > > Currently I can import Seq and Motif but when I run > > m.scanPWM(Seq.Seq("ACGTGTGCGTAGTGCGT",m.alphabet)) > > I get: > > Traceback (most recent call last): > File "", line 1, in > File "Bio/Motif/_Motif.py", line 778, in scanPWM >
import _pwm > ImportError: No module named _pwm > On Python 2.6 (on my Mac) I have these files: $ ls build/lib.macosx-10.6-universal-2.6/Bio/Motif/ Applications Thresholds.py _Motif.py __init__.py _pwm.so Parsers Thresholds.pyc _Motif.pyc __init__.pyc Trying to import _pwm will load the _pwm.so file. It sounds like you were able to compile _pwm.so under Python 3, but it doesn't import. Is your _pwm.c file still using Py_InitModule4 or have you changed it to something Python 3 compatible yet like PyModule_Create? [It may not have been clear to you earlier, but porting Python C extensions from Python 2 to Python 3 requires quite a lot of background knowledge about Python, C, compiling, and so on. I hope this wasn't too ambitious.] Regards, Peter From biopython at maubp.freeserve.co.uk Thu Oct 28 16:45:52 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 28 Oct 2010 17:45:52 +0100 Subject: [Biopython-dev] _pwm.c In-Reply-To: References: Message-ID: On Thu, Oct 28, 2010 at 3:37 PM, Bartek Wilczynski wrote: > > If you need a snippet for testing, you can use this: > from Bio import Seq > from Bio import Motif > m=Motif.read(open("Doc/cookbook/motif/SRF.pfm"),"jaspar-pfm") > m.scanPWM(Seq.Seq("ACGTGTGCGTAGTGCGT",m.alphabet)) > > result should be: > array([-29.18363571, -38.3365097 , -29.17756271, -38.04542542, -20.3014183 , > -25.18009186], dtype=float32) > > hope this helps I've made that into a new unit test, file Tests/test_Motif_pwm.py http://github.com/biopython/biopython/commit/b265f341352b7c59ceaf7fa0fc4bfafc32185408 Peter From devaniranjan at gmail.com Thu Oct 28 16:49:41 2010 From: devaniranjan at gmail.com (George Devaniranjan) Date: Thu, 28 Oct 2010 12:49:41 -0400 Subject: [Biopython-dev] RMSD calculation Message-ID: I was wondering why there are two functions for calculating RMSD: 1) in the SVDSuperimposer() 2) in PDB.Superimposer() In the code it says RMS: is RMS being calculated instead of RMSD?
I ask because VMD gives a different value for RMSD to the one from Biopython. Thank you From biopython at maubp.freeserve.co.uk Thu Oct 28 17:04:53 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 28 Oct 2010 18:04:53 +0100 Subject: [Biopython-dev] RMSD calculation In-Reply-To: References: Message-ID: On Thu, Oct 28, 2010 at 5:49 PM, George Devaniranjan wrote: > I was wondering why there are two functions for calculating RMSD: > > 1) in the SVDSuperimposer() > 2) in PDB.Superimposer() Can you clarify? There is no function in Bio.PDB.Superimposer to calculate RMSD (or RMS), but it does call the Bio.SVDSuperimposer module internally. > In the code it says RMS: is RMS being calculated instead of RMSD? There could be some confusion in the comments about root mean squared (RMS) *deviation* (i.e. statistical standard deviation) versus RMS *distance*. > I ask because VMD gives a different value for RMSD to the one from Biopython. Very different? There are bound to be some floating point differences.
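To make the definition concrete: the RMS deviation over N paired atoms is the square root of the summed squared coordinate differences divided by N, the same form as the Bio.SVDSuperimposer line quoted in this thread (sqrt(sum(sum(diff*diff))/l), with l the number of atoms). The plain-Python sketch below (rmsd is a made-up function name) computes exactly that on the coordinates as given, with no superposition step.

```python
import math


def rmsd(coords1, coords2):
    """Root-mean-square deviation between two paired lists of (x, y, z) points.

    Squared coordinate differences are summed over all atoms, divided by
    the number of atoms, then square-rooted.  No fitting is done here; the
    points are compared exactly as given.
    """
    if len(coords1) != len(coords2) or not coords1:
        raise ValueError("need two non-empty lists of the same length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords1, coords2):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords1))


# Two atoms, each displaced by 1 unit along z: RMSD is exactly 1.0
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))  # 1.0
```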
Peter From devaniranjan at gmail.com Thu Oct 28 17:14:44 2010 From: devaniranjan at gmail.com (George Devaniranjan) Date: Thu, 28 Oct 2010 13:14:44 -0400 Subject: [Biopython-dev] RMSD calculation In-Reply-To: References: Message-ID: For SVD, with x and y being arrays (I took this from the example/comment section of SVDSuperimposer.py):

sup=SVDSuperimposer()
sup.set(x, y)
sup.run()
rms=sup.get_rms()

For PDBSuperimposer:

super_imposer = Bio.PDB.Superimposer()
super_imposer.set_atoms(ref_atoms, alt_atoms)
super_imposer.apply(alt_model.get_atoms())
print "RMS( using PDB superimposer ) = %0.4f" % ( super_imposer.rms)

(I took this from the Warwick example and modified it for my use) http://www2.warwick.ac.uk/fac/sci/moac/students/peter_cock/python/protein_superposition/ Yes, there is a difference. For 2 proteins having exactly the same 36 residues, the values from 4 sources are as follows: VMD RMSD=1.61 SVD RMSD =3.2 PDB RMSD=3.2 From the EU Bioinformatics server (link below) RMSD =1.75 (http://www.ebi.ac.uk/msd-srv/ssm/cgi-bin/ssmserver) So Biopython really is computing the RMSD and not RMS? Thank you On Thu, Oct 28, 2010 at 1:04 PM, Peter wrote: > On Thu, Oct 28, 2010 at 5:49 PM, George Devaniranjan > wrote: > > I was wondering why there are two functions for calculating RMSD: > > > > 1) in the SVDSuperimposer() > > 2) in PDB.Superimposer() > > Can you clarify? There is no function in Bio.PDB.Superimposer to calculate > RMSD (or RMS), but it does call the Bio.SVDSuperimposer module internally. > > > In the code it says RMS: is RMS being calculated instead of RMSD? > > There could be some confusion in the comments about root mean squared > (RMS) *deviation* (i.e. statistical standard deviation) versus RMS > *distance*. > > > I ask because VMD gives a different value for RMSD to the one from > Biopython. > > Very different? There are bound to be some floating point differences.
> > Peter > From biopython at maubp.freeserve.co.uk Thu Oct 28 17:46:29 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 28 Oct 2010 18:46:29 +0100 Subject: [Biopython-dev] RMSD calculation In-Reply-To: References: Message-ID: On Thu, Oct 28, 2010 at 6:14 PM, George Devaniranjan wrote: > Yes, there is a difference. For 2 proteins having exactly the same 36 > residues, the values from 4 sources are as follows: > VMD RMSD=1.61 > SVD RMSD =3.2 > PDB RMSD=3.2 > > From the EU Bioinformatics server (link below) RMSD =1.75 > (http://www.ebi.ac.uk/msd-srv/ssm/cgi-bin/ssmserver) > > So Biopython really is computing the RMSD and not RMS? > Thank you It has been a while since I looked at this (but I can still edit the Warwick page if it is unclear). Which definition of RMSD are you using? Bio.PDB uses Bio.SVDSuperimposer, so they should be the same. The comment for this code *says* it calculates the RMS deviation, here:

diff=coords1-coords2
l=coords1.shape[0]
return sqrt(sum(sum(diff*diff))/l)

Here variable l will be the number of atoms. What are the two examples you are using? Can you perhaps share a small example pair of PDB files? Peter From zaricdragoslav at gmail.com Thu Oct 28 18:05:21 2010 From: zaricdragoslav at gmail.com (Dragoslav Zaric) Date: Thu, 28 Oct 2010 22:05:21 +0400 Subject: [Biopython-dev] Build & Test In-Reply-To: References: Message-ID: I am sorry Peter, but you are right. It is too ambitious for me. I cannot do it. I am sorry if I wasted your time. Good luck and all best. On Thu, Oct 28, 2010 at 8:36 PM, Peter wrote: > On Thu, Oct 28, 2010 at 5:24 PM, Dragoslav Zaric > wrote: >> Hi Peter, >> >> To bring me up to speed for build&test process, can I ask you how >> exactly this process should go. >> >> For example if I put this in setup.py file: >> >> class build_ext_biopython(build_ext): >> def run(self): >> if not check_dependencies_once(): >> return >> if is_Numpy_installed(): >>
import numpy >> numpy_include_dir = numpy.get_include() >> self.extensions.append( >> Extension('Bio.Motif._pwm', >> ["Bio/Motif/_pwm.c"], >> include_dirs=[numpy_include_dir], >> )) >> build_ext.run(self) >> >> what commands should I run from the command line to build and test: >> >> python3.1 setup.py build >> or/and >> python3.1 setup.py install >> >> After this I will have the folder build/lib/Bio, so should I go to the folder build/lib >> and start python3.1 to test this, or is it copied to the root folder after >> python3.1 setup.py install? > > You can do this: > > python3.1 setup.py build > python3.1 setup.py test > > and it should use the compiled C code from the build folder. > This is equivalent to: > > python3.1 setup.py build > cd Tests > python3.1 run_tests.py > > The advantage of calling run_tests.py directly is that you can test particular > bits of Biopython rather than all of it, e.g. > > python3.1 setup.py build > cd Tests > python3.1 run_tests.py test_Motif.py > > If you try to run a test directly (e.g. python3.1 test_Motif.py) then > it will use the installed version of Biopython (and it will fail if you > haven't installed Biopython). > >> Also, after building I will have a _pwm.so file; is this the final file that >> is imported from Python code? >> >> Currently I can import Seq and Motif but when I run >> >> m.scanPWM(Seq.Seq("ACGTGTGCGTAGTGCGT",m.alphabet)) >> >> I get: >> >> Traceback (most recent call last): >> File "", line 1, in >> File "Bio/Motif/_Motif.py", line 778, in scanPWM >> import _pwm >> ImportError: No module named _pwm >> > > On Python 2.6 (on my Mac) I have these files: > > $ ls build/lib.macosx-10.6-universal-2.6/Bio/Motif/ > Applications Thresholds.py _Motif.py __init__.py _pwm.so > Parsers Thresholds.pyc _Motif.pyc __init__.pyc > > Trying to import _pwm will load the _pwm.so file.
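A quick way to see which file an import will actually load, for example whether a freshly built _pwm.so is being picked up, is to ask the import system directly. This is a sketch for current Python 3 releases only: importlib.util.find_spec exists from Python 3.4 onwards, so it would not have worked on the Python 3.1 discussed here, and module_origin is a made-up helper name. Running it with the build output directory first on sys.path (for example, starting Python from inside that directory) checks the built copy rather than an installed one.

```python
import importlib.util


def module_origin(name):
    """Return the file `name` would be imported from, or None if not found.

    For a compiled extension this is the shared library (e.g. a _pwm.so
    file); for a pure-Python module it is the .py file.
    """
    try:
        # For dotted names this imports the parent packages first
        spec = importlib.util.find_spec(name)
    except ImportError:
        # A parent package (e.g. "Bio" or "Bio.Motif") could not be imported
        return None
    return None if spec is None else spec.origin


# Which file (if any) would Python load for the extension module?
print(module_origin("Bio.Motif._pwm"))
```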
> > It sounds like you were able to compile _pwm.so under > Python 3, but it doesn't import. Is your _pwm.c file still > using Py_InitModule4 or have you changed it to something > Python 3 compatible yet like PyModule_Create? > > [It may not have been clear to you earlier, but porting > Python C extensions from Python 2 to Python 3 requires > quite a lot of background knowledge about Python, C, > compiling, and so on. I hope this wasn't too ambitious.] > > Regards, > > Peter > -- Dragoslav Zaric Professional Programmer MSc Astrophysics From biopython at maubp.freeserve.co.uk Thu Oct 28 18:30:52 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 28 Oct 2010 19:30:52 +0100 Subject: [Biopython-dev] Build & Test In-Reply-To: References: Message-ID: On Thu, Oct 28, 2010 at 7:05 PM, Dragoslav Zaric wrote: > I am sorry Peter, but you are right. It is too ambitious for me. > > I cannot do it. I am sorry if I wasted your time. > > Good luck and all best. You tried - and in the process we've made some small improvements to Biopython (like the new test for the PWM code), so I'm happy. Maybe there is something less complicated you could try... Anyway, thank you for your time. Peter From tiagoantao at gmail.com Thu Oct 28 21:10:02 2010 From: tiagoantao at gmail.com (Tiago Antão) Date: Thu, 28 Oct 2010 22:10:02 +0100 Subject: [Biopython-dev] README and python3 In-Reply-To: References: Message-ID: This was a local installation (in my home dir). As soon as I noticed the problem I just removed that 2to3 (as I had other, more recent ones). But the Python from which it was installed reports: >>> print sys.version_info (2, 6, 0, 'final', 0) 2010/10/28 Peter : > 2010/10/27 Tiago Antão : >> Hi, >> >> Just a minor issue with the README and python3. >> The option --nofix does not exist in 2to3 for the 2.x version. So that >> line will not work if the 2to3 happens to be from Python 2.X (can >> happen if you have several versions installed).
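Since Peter points out that the Bio.Phylo conversion needs --nofix (to skip the long fixer), one hedged way to catch an older 2to3 like Tiago's is to check its --help text before running it. The helper names below are made up, the -x, -w and -n options are the ones listed in the help output quoted in this thread, and this is not the exact command from Biopython's README. The script needs Python 3.7+ for subprocess.run(..., capture_output=True); lib2to3 and the 2to3 script were removed in Python 3.13, so on recent Pythons no 2to3 may be found at all.

```python
import shutil
import subprocess


def has_nofix_option(help_text):
    """Return True if a 2to3 help text advertises the -x/--nofix option."""
    return "--nofix" in help_text


def two_to_three_args(paths, skip_fixers=("long",)):
    """Build 2to3 arguments that skip some fixers and write changes in place.

    -x/--nofix skips a fixer, -w writes back modified files and
    -n (--nobackups) avoids creating .bak copies.
    """
    args = []
    for fixer in skip_fixers:
        args += ["-x", fixer]
    return args + ["-w", "-n"] + list(paths)


if __name__ == "__main__":
    exe = shutil.which("2to3")  # whichever 2to3 comes first on PATH
    if exe is None:
        print("No 2to3 script found on PATH")
    else:
        try:
            result = subprocess.run([exe, "--help"], capture_output=True, text=True)
        except OSError as err:  # e.g. a broken script or interpreter path
            print("Could not run", exe, ":", err)
        else:
            print(exe, "supports --nofix:", has_nofix_option(result.stdout))
            print("Possible call:", [exe] + two_to_three_args(["Bio"]))
```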
>> > > Hi Tiago, > > Can you work out which version of 2to3 lacks the --nofix (or -x) > option, and which version of Python it came from? > > The (Apple provided) Python 2.6.1 on my Mac seems to have > a 2to3 with the --nofix option, and I don't have Python 3 installed > on this machine. In addition to running 2to3 as a command line > script, you can call the library from within Python: > > $ python2.6 > Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) > [GCC 4.2.1 (Apple Inc. build 5646)] on darwin > Type "help", "copyright", "credits" or "license" for more information. >>>> from lib2to3.main import main >>>> main("lib2to3.fixes", ["--help"]) > Usage: refactor.py [options] file|dir ... > > Options: > -h, --help show this help message and exit > -d, --doctests_only Fix up doctests only > -f FIX, --fix=FIX Each FIX specifies a transformation; default: all > -x NOFIX, --nofix=NOFIX > Prevent a fixer from being run. > -l, --list-fixes List available transformations (fixes/fix_*.py) > -p, --print-function Modify the grammar so that print() is a function > -v, --verbose More verbose logging > -w, --write Write back modified files > -n, --nobackups Don't write backups for modified files. > > Likewise on our Linux server the 2to3 from Python 2.6.6, 2.7 and > 3.1.2 all seem to have it: > > $ python2.6 > Python 2.6.6 (r266:84292, Aug 31 2010, 16:21:14) > [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> from lib2to3.main import main >>>> main("lib2to3.fixes", ["--help"]) > Usage: 2to3 [options] file|dir ... > > Options: > -h, --help show this help message and exit > -d, --doctests_only Fix up doctests only > -f FIX, --fix=FIX Each FIX specifies a transformation; default: all > -j PROCESSES, --processes=PROCESSES > Run 2to3 concurrently > -x NOFIX, --nofix=NOFIX >
Prevent a fixer from being run. > -l, --list-fixes List available transformations > -p, --print-function Modify the grammar so that print() is a function > -v, --verbose More verbose logging > --no-diffs Don't show diffs of the refactoring > -w, --write Write back modified files > -n, --nobackups Don't write backups for modified files. > > $ python2.7 > Python 2.7 (r27:82500, Jul 13 2010, 14:02:41) > [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> from lib2to3.main import main >>>> main("lib2to3.fixes", ["--help"]) > Usage: 2to3 [options] file|dir ... > > Options: > -h, --help show this help message and exit > -d, --doctests_only Fix up doctests only > -f FIX, --fix=FIX Each FIX specifies a transformation; default: all > -j PROCESSES, --processes=PROCESSES > Run 2to3 concurrently > -x NOFIX, --nofix=NOFIX > Prevent a fixer from being run. > -l, --list-fixes List available transformations > -p, --print-function Modify the grammar so that print() is a function > -v, --verbose More verbose logging > --no-diffs Don't show diffs of the refactoring > -w, --write Write back modified files > -n, --nobackups Don't write backups for modified files. > > > $ python3.1 > Python 3.1.2 (r312:79147, Jul 15 2010, 12:43:37) > [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> from lib2to3.main import main >>>> main("lib2to3.fixes", ["--help"]) > Usage: 2to3 [options] file|dir ... > > Options: > -h, --help show this help message and exit > -d, --doctests_only Fix up doctests only > -f FIX, --fix=FIX Each FIX specifies a transformation; default: all > -j PROCESSES, --processes=PROCESSES >
Run 2to3 concurrently > -x NOFIX, --nofix=NOFIX > Prevent a fixer from being run. > -l, --list-fixes List available transformations (fixes/fix_*.py) > -p, --print-function Modify the grammar so that print() is a function > -v, --verbose More verbose logging > --no-diffs Don't show diffs of the refactoring > -w, --write Write back modified files > -n, --nobackups Don't write backups for modified files. > > > Note that we *need* the --nofix option for the conversion of > Bio.Phylo to work (it uses long as an argument name, > short for longitude). > > Peter > -- "If you want to get laid, go to college. If you want an education, go to the library." - Frank Zappa From devaniranjan at gmail.com Thu Oct 28 20:42:16 2010 From: devaniranjan at gmail.com (George Devaniranjan) Date: Thu, 28 Oct 2010 16:42:16 -0400 Subject: [Biopython-dev] RMSD calculation In-Reply-To: References: Message-ID: Hello everyone, I tried with pymol and it gives a value of 1.792 for the RMSD after alignment. The EU bioinformatics server gives a value of 1.74, VMD 1.62, but SVD and PDB Superimposer give a value of 3.2. I have attached the 2 PDB files concerned. Is it something I am doing in calculating the RMSD using Biopython? Thank you On Thu, Oct 28, 2010 at 1:46 PM, Peter wrote: > On Thu, Oct 28, 2010 at 6:14 PM, George Devaniranjan > wrote: > > Yes, there is a difference. For 2 proteins having exactly the same 36 > > residues, the values from 4 sources are as follows: > > VMD RMSD=1.61 > > SVD RMSD =3.2 > > PDB RMSD=3.2 > > > > From the EU Bioinformatics server (link below) RMSD =1.75 > > (http://www.ebi.ac.uk/msd-srv/ssm/cgi-bin/ssmserver) > > > > So Biopython really is computing the RMSD and not RMS? > > Thank you > > It has been a while since I looked at this (but I can still edit > the Warwick page if it is unclear). > > Which definition of RMSD are you using? > > Bio.PDB uses Bio.SVDSuperimposer, so they should be the same.
> The comment for this code *says* it calculates the RMS deviation, > here: > > diff=coords1-coords2 > l=coords1.shape[0] > return sqrt(sum(sum(diff*diff))/l) > > Here variable l will be the number of atoms. > > What are the two examples you are using? Can you perhaps > share a small example pair of PDB files? > > Peter > -------------- next part -------------- A non-text attachment was scrubbed... Name: protein1.pdb Type: chemical/x-pdb Size: 16983 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: protein2.pdb Type: chemical/x-pdb Size: 16981 bytes Desc: not available URL: From biopython at maubp.freeserve.co.uk Fri Oct 29 11:37:08 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 29 Oct 2010 12:37:08 +0100 Subject: [Biopython-dev] RMSD calculation In-Reply-To: References: Message-ID: On Thu, Oct 28, 2010 at 9:42 PM, George Devaniranjan wrote: > Hello everyone, > I tried with pymol and it gives a value of 1.792 for the RMSD after > alignment. > The EU bioinformatics server gives a value of 1.74, > VMD 1.62, > but SVD and PDB Superimposer give a value of 3.2. > I have attached the 2 PDB files concerned. Is it something I am doing in > calculating the RMSD using Biopython? > Thank you Are you doing the same comparison in each case? For example, are some doing only C-alpha atoms, while others use all atoms? Thanks for the example files - but you forgot the sample Python code ;)

import Bio.PDB
import numpy as np
structure1 = Bio.PDB.PDBParser().get_structure("one", "protein1.pdb")
structure2 = Bio.PDB.PDBParser().get_structure("two", "protein2.pdb")
super_imposer = Bio.PDB.Superimposer()
super_imposer.set_atoms(list(structure1.get_atoms()),
                        list(structure2.get_atoms()))
print super_imposer.rms

This gives 3.19274942026 for all atoms, as you said.
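Bio.PDB.Superimposer hands the fitting to Bio.SVDSuperimposer, which, as its name suggests, uses a singular value decomposition to find the least-squares rotation and translation before the RMS is computed. The NumPy sketch below is an independent illustration of that general technique (the Kabsch algorithm), not Biopython's code; the function name superimpose and the row-vector convention (moving @ rotation + translation) are choices made here. Because the identity transform is one candidate fit, the RMSD after fitting can never exceed the RMSD of the unfitted coordinates.

```python
import numpy as np


def superimpose(fixed, moving):
    """Least-squares fit of `moving` onto `fixed` (Kabsch algorithm via SVD).

    fixed, moving: (N, 3) arrays of paired coordinates in the same atom order.
    Returns (rotation, translation, rmsd) such that moving @ rotation + translation
    is the best fit onto fixed; rmsd is measured after that fit.
    """
    fixed = np.asarray(fixed, dtype=float)
    moving = np.asarray(moving, dtype=float)
    if fixed.shape != moving.shape or fixed.ndim != 2 or fixed.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of the same shape")
    fixed_centre = fixed.mean(axis=0)
    moving_centre = moving.mean(axis=0)
    # Correlation matrix of the centred coordinates, then its SVD
    u, _, vt = np.linalg.svd((moving - moving_centre).T @ (fixed - fixed_centre))
    # Flip the last axis if necessary so the result is a rotation, not a reflection
    sign = 1.0 if np.linalg.det(u @ vt) > 0 else -1.0
    rotation = u @ np.diag([1.0, 1.0, sign]) @ vt
    translation = fixed_centre - moving_centre @ rotation
    diff = moving @ rotation + translation - fixed
    rmsd = float(np.sqrt((diff * diff).sum() / len(fixed)))
    return rotation, translation, rmsd


# Made-up test data: a rigid copy rotated 90 degrees about z and shifted
points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
moved = points @ rot_z.T + np.array([5.0, -2.0, 1.0])
rotation, translation, fitted = superimpose(points, moved)
print(round(fitted, 6))  # 0.0: a rigid copy can be fitted exactly
```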
Using Bio.SVDSuperimposer, coord1 = np.array([atom.coord for atom in structure1.get_atoms()]) coord2 = np.array([atom.coord for atom in structure2.get_atoms()]) from Bio.SVDSuperimposer import SVDSuperimposer sup=SVDSuperimposer() sup.set(coord1, coord2) print sup.run() print sup.get_rms() Again, 3.19274953249 after moving the atoms. Alternatively if we bypass the alignment step and calculate the RMS of the unaligned structures we get a much higher RMS: sup=SVDSuperimposer() print sup._rms(coord1, coord2) #private method, don't use normally 14.8866750536 This matches what I get by doing it explicitly via numpy: import numpy as np coord1 = np.array([atom.coord for atom in structure1.get_atoms()]) coord2 = np.array([atom.coord for atom in structure2.get_atoms()]) assert coord1.shape == coord2.shape diff = coord1-coord2 l = len(diff) #number of atoms from math import sqrt print sqrt(sum(sum(diff*diff))/l) print np.sqrt(np.sum(diff**2)/l) #should give same result as above line (This should be the same calculation as Bio.PDB.Superimposer uses) So, I think there are two potential sources of the disagreement (1) The alignment, and (2) the RMS calculation. Can you use the other tools to get the RMS without aligning the structures? Alternatively, can you save their aligned structures and give that to Biopython? Peter P.S. Why doesn't file protein2.pdb have the elements column? From devaniranjan at gmail.com Fri Oct 29 13:33:36 2010 From: devaniranjan at gmail.com (George Devaniranjan) Date: Fri, 29 Oct 2010 09:33:36 -0400 Subject: [Biopython-dev] RMSD calculation In-Reply-To: References: Message-ID: Hello Peter, Thanks for the answer-I also got the same values using biopython both SVD and PDB modules (3.2) My concern arose as a result of VMD + pymol + EU Bioinformatics server giving a value of 1.7 (well 1.6 for VMD) but biopython giving 3.2.
Even if the two groups calculate RMSD differently (that is: what atoms they consider, only CA, backbone or all atoms) there is no way there can be such a big discrepancy between biopython and VMD/Pymol/EUServer. I tried RMSD calculation with VMD WITHOUT alignment and got 14.02 as the answer. For biopython PDB module only CA gives 3.2 while backbone gives 3.1 which is acceptable. Thanks once again, let me know if you have any further thoughts on this. On Fri, Oct 29, 2010 at 12:37 PM, Peter wrote: > On Thu, Oct 28, 2010 at 9:42 PM, George Devaniranjan > wrote: > > Hello everyone, > > I tried with pymol and it gives a value of 1.792 for the RMSD after > > alignment > > The EU bioinformatics server gives a value of 1.74 > > VMD 1.62 > > But SVD and PDB Superimposer gives a value 3.2 > > I have attached the 2 PDB files concerned-is it something I am doing in > > calculating the RMSD using biopython? > > Thank you > > Are you doing the same comparison in each case? For example, > are some doing only C-alpha atoms, while others use all atoms? > > Thanks for the example files - but you forgot the sample Python code ;) > > import Bio.PDB > import numpy as np > structure1 = Bio.PDB.PDBParser().get_structure("one", "protein1.pdb") > structure2 = Bio.PDB.PDBParser().get_structure("two", "protein2.pdb") > super_imposer = Bio.PDB.Superimposer() > super_imposer.set_atoms(list(structure1.get_atoms()), > list(structure2.get_atoms())) > print super_imposer.rms > > This gives 3.19274942026 for all atoms, as you said. Using > Bio.SVDSuperimposer, > > coord1 = np.array([atom.coord for atom in structure1.get_atoms()]) > coord2 = np.array([atom.coord for atom in structure2.get_atoms()]) > from Bio.SVDSuperimposer import SVDSuperimposer > sup=SVDSuperimposer() > sup.set(coord1, coord2) > print sup.run() > print sup.get_rms() > > Again, 3.19274953249 after moving the atoms.
Alternatively if we > bypass the alignment step and calculate the RMS of the unaligned > structures we get a much higher RMS: > > sup=SVDSuperimposer() > print sup._rms(coord1, coord2) #private method, don't use normally > 14.8866750536 > > This matches what I get by doing it explicitly via numpy: > > import numpy as np > coord1 = np.array([atom.coord for atom in structure1.get_atoms()]) > coord2 = np.array([atom.coord for atom in structure2.get_atoms()]) > assert coord1.shape == coord2.shape > diff = coord1-coord2 > l = len(diff) #number of atoms > from math import sqrt > print sqrt(sum(sum(diff*diff))/l) > print np.sqrt(np.sum(diff**2)/l) #should give same result as above line > > (This should be the same calculation as Bio.PDB.Superimposer uses) > > So, I think there are two potential sources of the disagreement > (1) The alignment, and (2) the RMS calculation. Can you use > the other tools to get the RMS without aligning the structures? > Alternatively, can you save their aligned structures and give > that to Biopython? > > Peter > > P.S. Why doesn't file protein2.pdb have the elements column? > From eric.talevich at gmail.com Fri Oct 29 21:39:55 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 29 Oct 2010 17:39:55 -0400 Subject: [Biopython-dev] RMSD calculation In-Reply-To: References: Message-ID: On Thu, Oct 28, 2010 at 12:49 PM, George Devaniranjan < devaniranjan at gmail.com> wrote: > I was wondering why there is two functions for calculating RMSD > > 1)in the SVDSuperimposer() > 2)in PDB.Superimposer() > > In the code its says RMS-is RMS being calculated instead of RMSD??? > I ask because VMD gives a different value for RMSD to the one from > Biopython > > Hello George, Here's my understanding of it: 1. RMSD and "RMS distance" both mean root mean square deviation, in terms of the distances in 3D space between each corresponding pair of atoms. 
The RMSD between all atoms in two aligned structures may be different than the RMSD between backbone atoms only. Or, if the two structures don't have the same peptide sequence, that raises another set of issues. 2. In Biopython, PDB.Superimposer internally uses SVDSuperimposer. It's a simplified wrapper. 3. The SVDSuperimposer module allows you to either (i) align two structures in 3D space and then calculate RMSD, or (ii) just calculate RMSD without spatially (re-)aligning the structures. PDB.Superimposer just does the former. If the structures weren't already aligned, these can yield very different values. 4. There are many ways to perform a structural alignment; SVDSuperimposer implements a simple one. PyMOL, VMD, ce, DALI, and other programs implement more advanced methods. So don't be alarmed that VMD gives you a smaller RMSD than PDB.Superimposer -- it just means VMD found a better alignment between the two structures. Best, Eric From tiagoantao at gmail.com Fri Oct 29 23:23:16 2010 From: tiagoantao at gmail.com (Tiago Antão) Date: Fri, 29 Oct 2010 23:23:16 +0000 Subject: [Biopython-dev] Continuous integration server Message-ID: Hi all, I've been hacking with buildbot, an integration server. This is to allow continuous testing of Biopython, so that we are alerted of any problems as soon as somebody does a dreadful commit (I have the top 5 of most dreadful commits, so it was fair that I should try to do something about it). Things are still incomplete, but I think it is time to inform the list of this effort... To know more about buildbot you can either go to the buildbot site http://buildbot.net/ or see the draft doc that I have been preparing http://biopython.org/wiki/Continuous_integration There is a draft server here: http://events.open-bio.org:8010/ The cool thing about buildbot is that actual testing is done by volunteer computers. Want to test on OS y, Python version z? You can offer the idle time of your laptop for that...
Obvious things missing: 0. First and foremost, see if people like this? 1. Changing the biopython test code to avoid stressing the network (i.e., having a run_tests option that will not run network tests). This is to avoid imposing continuous traffic on genbank and friends. This is a show stopper. 2. Maybe warn the mailing list when some fundamental build stops working (e.g. send an email when a python 2.x build stops working) 3. Have test servers with all the applications installed (do you want to volunteer? This is more to do with volunteers) 4. Maybe change run_tests to require all tests to be done. If we are doing integration testing, we want all tests to be done (missing applications or libraries should be an error). As an example, none of my tests are complete 5. Support Mac (my access to Mr Jobs's fashion machines is limited). Again this is more a volunteer issue. 6. Discuss policies: One test a day? Full tests or updates? Full network tests (probably sporadically)? Send emails? 7. Find volunteers to cover several OSes and several Python versions. Ensure that people do full tests (i.e. with all applications and libraries) 8. While I have volunteered Windows testing myself, I will not be able to maintain it regularly. Opinions are most welcome -- "If you want to get laid, go to college. If you want an education, go to the library." - Frank Zappa From devaniranjan at gmail.com Fri Oct 29 23:42:12 2010 From: devaniranjan at gmail.com (George Devaniranjan) Date: Sat, 30 Oct 2010 00:42:12 +0100 Subject: [Biopython-dev] RMSD calculation In-Reply-To: References: Message-ID: Thanks Eric and Peter, Your patience in answering this question is very much appreciated. I think Eric may be right: I tried the RMSD calculation for several structures and VMD does give a lower value for them all.
George Thanks once again for all of you for your answers On Fri, Oct 29, 2010 at 10:39 PM, Eric Talevich wrote: > On Thu, Oct 28, 2010 at 12:49 PM, George Devaniranjan < > devaniranjan at gmail.com> wrote: > >> I was wondering why there is two functions for calculating RMSD >> >> 1)in the SVDSuperimposer() >> 2)in PDB.Superimposer() >> >> In the code its says RMS-is RMS being calculated instead of RMSD??? >> I ask because VMD gives a different value for RMSD to the one from >> Biopython >> >> > Hello George, > > Here's my understanding of it: > > 1. RMSD and "RMS distance" both mean root mean square deviation, in terms > of the distances in 3D space between each corresponding pair of atoms. The > RMSD between all atoms in two aligned structures may be different than the > RMSD between backbone atoms only. Or, if the two structures don't have the > same peptide sequence, that raises another set of issues. > > 2. In Biopython, PDB.Superimposer internally uses SVDSuperimposer. It's a > simplified wrapper. > > 3. The SVDSuperimposer module allows you to either (i) align two structures > in 3D space and then calculate RMSD, or (ii) just calculate RMSD without > spatially (re-)aligning the structures. PDB.Superimposer just does the > former. If the structures weren't already aligned, these can yield very > different values. > > 4. There are many ways to perform a structural alignment; SVDSuperimposer > implements a simple one. PyMOL, VMD, ce, DALI, and other programs implement > more advanced methods. > > So don't be alarmed that VMD gives you a smaller RMSD than PDB.Superimposer > -- it just means VMD found a better alignment between the two structures. 
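Point 3 in Eric's reply (superimpose first and then take the RMSD, versus taking the RMSD on the raw coordinates) can be illustrated with a short NumPy sketch, written for Python 3. It assumes two equal-length (N, 3) coordinate arrays whose rows are already paired atom-for-atom, like the `coord1`/`coord2` arrays built earlier in the thread. The function names `rmsd` and `superimposed_rmsd` are hypothetical, and this is a from-scratch Kabsch-style least-squares fit via SVD for illustration, not the Bio.SVDSuperimposer source.

```python
import numpy as np


def rmsd(a, b):
    """Root mean square deviation between two paired (N, 3) coordinate arrays."""
    diff = a - b
    return float(np.sqrt((diff * diff).sum() / len(a)))


def superimposed_rmsd(fixed, moving):
    """RMSD after the least-squares rotation + translation of `moving` onto `fixed`."""
    # Remove the translation by centring both point sets on their centroids.
    fixed_c = fixed - fixed.mean(axis=0)
    moving_c = moving - moving.mean(axis=0)
    # SVD of the 3x3 covariance matrix yields the optimal rotation (Kabsch).
    u, _, vt = np.linalg.svd(moving_c.T @ fixed_c)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    return rmsd(fixed_c, moving_c @ rotation)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fixed = rng.normal(size=(36, 3)) * 10.0
    # Same shape, different placement: rotate 90 degrees about z, then shift.
    rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    moving = fixed @ rot_z + np.array([5.0, -3.0, 2.0])
    print(rmsd(fixed, moving))               # large: structures not superimposed
    print(superimposed_rmsd(fixed, moving))  # ~0, up to floating-point error
```

For a fixed pairing of atoms this fit gives the smallest RMSD any rigid superposition can reach, so when tools disagree it is worth checking which atoms each one pairs up and includes; passing only the C-alpha rows of both arrays, as Peter asked about, is one way to compare like with like.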
> > Best, > Eric > > > From mjldehoon at yahoo.com Sat Oct 30 04:06:06 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 29 Oct 2010 21:06:06 -0700 (PDT) Subject: [Biopython-dev] _pwm.c In-Reply-To: Message-ID: <297370.12835.qm@web62402.mail.re1.yahoo.com> --- On Thu, 10/28/10, Peter wrote: > On a related topic, is there a pure Python fall back for > _pwm.c in Bio.Motif? I added a pure Python fall back just now. Bartek, feel free to modify the code if needed. --Michiel. From mjldehoon at yahoo.com Sat Oct 30 04:13:45 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 29 Oct 2010 21:13:45 -0700 (PDT) Subject: [Biopython-dev] Bio/cMarkovModelmodule.c Message-ID: <80876.89806.qm@web62403.mail.re1.yahoo.com> Hi everybody, I was looking at our C modules to see if they can be made ready for Python 3. I noticed that Bio/cMarkovModelmodule.c currently contains only one function, _logadd, which is used to speed up Bio.MarkovModel. Numpy 1.3 and later contain a function (logaddexp) that does exactly the same as _logadd. Since Bio.MarkovModel itself already uses Numpy, I think we can remove Bio/cMarkovModelmodule.c. For this to work, we either need to require Numpy >= 1.3 in setup.py, or check for logaddexp when importing numpy in Bio.MarkovModel. I think requiring Numpy >= 1.3 in setup.py is better in the long run, so I would prefer that. Any other opinions? --Michiel From biopython at maubp.freeserve.co.uk Sat Oct 30 10:47:46 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 30 Oct 2010 11:47:46 +0100 Subject: [Biopython-dev] Bio/cMarkovModelmodule.c In-Reply-To: <80876.89806.qm@web62403.mail.re1.yahoo.com> References: <80876.89806.qm@web62403.mail.re1.yahoo.com> Message-ID: On Sat, Oct 30, 2010 at 5:13 AM, Michiel de Hoon wrote: > Hi everybody, > > I was looking at our C modules to see if they can be made > ready for Python 3. 
I noticed that Bio/cMarkovModelmodule.c > currently contains only one function, _logadd, which is used > to speed up Bio.MarkovModel. Numpy 1.3 and later contain > a function (logaddexp) that does exactly the same as _logadd. > Since Bio.MarkovModel itself already uses Numpy, I think > we can remove Bio/cMarkovModelmodule.c. Sounds good :) > For this to work, we either need to require Numpy >= 1.3 > in setup.py, or check for logaddexp when importing numpy > in Bio.MarkovModel. I think requiring Numpy >= 1.3 in > setup.py is better in the long run, so I would prefer that. > Any other opinions? The setup.py check sounds best. We should check Numpy >= 1.3 will be available for Python 2.4 - this is relevant for Biopython 1.56 which will still support Python 2.4. The most recent NumPy for Windows installer for Python 2.4 was NumPy 1.2.1, but most Windows users able to install Biopython via our Windows installer would also be able to install a more recent Python and NumPy - so not a big issue. According to the old INSTALL.txt in NumPy's github repository it says that for numpy 1.3.0 they still supported Python 2.4. http://github.com/numpy/numpy/blob/v1.3.0/INSTALL.txt If there is any doubt about getting NumPy 1.3.x on Python 2.4, we could postpone this change until after we do the Biopython 1.56 release (probably in November 2010) and drop support for Python 2.4. Peter From mjldehoon at yahoo.com Sat Oct 30 11:39:46 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 30 Oct 2010 04:39:46 -0700 (PDT) Subject: [Biopython-dev] Bio/cMarkovModelmodule.c In-Reply-To: Message-ID: <643043.25292.qm@web62401.mail.re1.yahoo.com> Currently there is a pure-Python fall back for _logadd in MarkovModel.py. We could check if the numpy version is at least 1.3 in setup.py, show a warning if an older numpy is found, and use the fall back in MarkovModel.py if numpy does not contain logaddexp.
Then if we remove cMarkovModelmodule.c, in the worst case (a Windows user with Python 2.4 who cannot update to a more recent Python) MarkovModel.py will be a bit slower, but no functionality is lost. --Michiel. --- On Sat, 10/30/10, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] Bio/cMarkovModelmodule.c > To: "Michiel de Hoon" > Cc: biopython-dev at biopython.org > Date: Saturday, October 30, 2010, 6:47 AM > On Sat, Oct 30, 2010 at 5:13 AM, > Michiel de Hoon wrote: > > Hi everybody, > > > > I was looking at our C modules to see if they can be > made > > ready for Python 3. I noticed that > Bio/cMarkovModelmodule.c > > currently contains only one function, _logadd, which > is used > > to speed up Bio.MarkovModel. Numpy 1.3 and later > contain > > a function (logaddexp) that does exactly the same as > _logadd. > > Since Bio.MarkovModel itself already uses Numpy, I > think > > we can remove Bio/cMarkovModelmodule.c. > > Sounds good :) > > > For this to work, we either need to require Numpy > >= 1.3 > > in setup.py, or check for logaddexp when importing > numpy > > in Bio.MarkovModel. I think requiring Numpy >= 1.3 > in > > setup.py is better in the long run, so I would prefer > that. > > Any other opinions? > > The setup.py check sounds best. > > We should check Numpy >= 1.3 will be available for > Python 2.4 - this is relevant for Biopython 1.56 which > will still support Python 2.4 > > The most recent NumPy for Windows installer > for Python 2.4 was NumPy 1.2.1, but most Windows > users able to install Biopython via our Windows > installer would also be able to install a more recent > Python and NumPy - so not a big issue. > > According to the old INSTALL.txt in NumPy's github > repository it says that for numpy 1.3.0 they still > supported Python 2.4.
> > http://github.com/numpy/numpy/blob/v1.3.0/INSTALL.txt > > If there is any doubt about getting NumPy 1.3.x on > Python 2.4, we could postpone this change until > after we do the Biopython 1.56 release (probably in > November 2010) and drop support for Python 2.4. > > Peter > From biopython at maubp.freeserve.co.uk Sat Oct 30 12:43:53 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 30 Oct 2010 13:43:53 +0100 Subject: [Biopython-dev] Bio/cMarkovModelmodule.c In-Reply-To: <643043.25292.qm@web62401.mail.re1.yahoo.com> References: <643043.25292.qm@web62401.mail.re1.yahoo.com> Message-ID: On Sat, Oct 30, 2010 at 12:39 PM, Michiel de Hoon wrote: > Currently there is a pure-Python fall back for _logadd in > MarkovModel.py. We could check if the numpy version > is at least 1.3 in setup.py, show a warning if an older > numpy is found, and use the fall back in MarkovModel.py > if numpy does not contain logaddexp. Then if we remove > cMarkovModelmodule.c, in the worst case (a Windows > user with Python 2.4 who cannot update to a more recent > Python) MarkovModel.py will be a bit slower, but no > functionality is lost. > > --Michiel. That sounds OK to me :) Once we drop Python 2.4 maybe we can also list NumPy 1.3 as the minimum supported NumPy? Peter From bugzilla-daemon at portal.open-bio.org Sat Oct 30 13:07:36 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 30 Oct 2010 09:07:36 -0400 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <201010301307.o9UD7aaP029881@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 Bug 2866 depends on bug 2870, which changed state. 
Bug 2870 Summary: Add BioSQL schema for SQLite http://bugzilla.open-bio.org/show_bug.cgi?id=2870 What |Old Value |New Value ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Sat Oct 30 14:23:31 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 30 Oct 2010 07:23:31 -0700 (PDT) Subject: [Biopython-dev] Bio/cMarkovModelmodule.c In-Reply-To: Message-ID: <781588.85801.qm@web62407.mail.re1.yahoo.com> OK, done. In the end, I put the warning message in MarkovModel.py anyway, since it's very easy to miss if it's in setup.py. --Michiel. --- On Sat, 10/30/10, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] Bio/cMarkovModelmodule.c > To: "Michiel de Hoon" > Cc: biopython-dev at biopython.org > Date: Saturday, October 30, 2010, 8:43 AM > On Sat, Oct 30, 2010 at 12:39 PM, > Michiel de Hoon wrote: > > Currently there is a pure-Python fall back for _logadd > in > > MarkovModel.py. We could check if the numpy version > > is at least 1.3 in setup.py, show a warning if an > older > > numpy is found, and use the fall back in > MarkovModel.py > > if numpy does not contain logaddexp. Then if we > remove > > cMarkovModelmodule.c, in the worst case (a Windows > > user with Python 2.4 who cannot update to a more > recent > > Python) MarkovModel.py will be a bit slower, but no > > functionality is lost. > > > > --Michiel. > > That sounds OK to me :) > > Once we drop Python 2.4 maybe we can also list > NumPy 1.3 as the minimum supported NumPy? > > Peter >