From mdehoon at c2b2.columbia.edu Sat Jun 2 23:14:36 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sun, 03 Jun 2007 12:14:36 +0900 Subject: [Biopython-dev] interaction networks in biopython In-Reply-To: <12c863fe0705311652w44074269y2256aa127b90843b@mail.gmail.com> References: <12c863fe0705161025p46b1ff6v8c6b1e5999b29244@mail.gmail.com> <12c863fe0705311652w44074269y2256aa127b90843b@mail.gmail.com> Message-ID: <4662321C.2030002@c2b2.columbia.edu> Jake Feala wrote: > Here is an example that worked fine for me: > from Network import * > f = open() > parser = GRIDIterator(f): > net = create_network() > net.load(parser) > To be more consistent with recent parsers in Biopython, this would be more appropriate: >>> import Network >>> f = open() >>> net = Network.parse(f, format="GRID") Also, assuming that >>> net = create_network() creates a NetworkObject representing an empty network, you could instead use the initialization function of Network objects. As in >>> import Network >>> net = Network.NetworkObject() For the example above, you might then also consider >>> import Network >>> f = open() >>> net = Network.NetworkObject(f, format="GRID") instead of using "parse". I'm using "NetworkObject" here only as a placeholder to distinguish it from the Network module; there are probably better names. --Michiel. From bugzilla-daemon at portal.open-bio.org Mon Jun 4 09:46:13 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 4 Jun 2007 09:46:13 -0400 Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup In-Reply-To: Message-ID: <200706041346.l54DkDUc003011@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2269 ------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2007-06-04 09:46 EST ------- Do we actually need the shebangs? 
Unless the Biopython scripts are intended to be run as stand-alone programs (and appear as such to the user), we may as well remove all the shebang lines.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Mon Jun 4 10:01:54 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 4 Jun 2007 10:01:54 -0400
Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup
In-Reply-To:
Message-ID: <200706041401.l54E1stN003819@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2269

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-06-04 10:01 EST -------
From looking at the code, some of the files ARE intended to be run as a script - they do a "__main__" check to detect this and then do something based on the command line arguments. I personally have never used any of them in this way - but there may be some users out there who do.

From bugzilla-daemon at portal.open-bio.org Mon Jun 4 10:19:47 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 4 Jun 2007 10:19:47 -0400
Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup
In-Reply-To:
Message-ID: <200706041419.l54EJlAV004749@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2269

------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp 2007-06-04 10:19 EST -------
> From looking at the code, some of the files ARE intended to be run as a script
> - they do a "__main__" check to detect this and then do something based on the
> command line arguments.
If we remove the shebangs, users can still do python script.py if they want to run script.py as a script. With the shebangs, users can do ./script.py instead. But IMHO, for Biopython scripts the advantage is minimal, and won't work on Windows. Well let's wait a few days to see if somebody steps forward who really needs the shebangs. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jun 4 10:56:39 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 4 Jun 2007 10:56:39 -0400 Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup In-Reply-To: Message-ID: <200706041456.l54Eudxx006919@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2269 ------- Comment #4 from dalloliogm at gmail.com 2007-06-04 10:56 EST ------- Well, I think it's a good practice to add the shebang to the beginning of every script, I don't know why you want to remove it ;) In the future, if I'm going to be able to contribute to the biopython code with some script, I will use the #!/usr/bin/env form. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
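For reference, the two conventions debated in this bug (the "#!/usr/bin/env" shebang and the "__main__" check) can be combined as in the minimal sketch below. The file name script.py and the function main are hypothetical and not taken from any Biopython file.

```python
#!/usr/bin/env python
"""Hypothetical script.py: importable as a module, runnable as a script."""
import sys


def main(argv):
    """Report each file name given on the command line; return how many."""
    if not argv:
        print("usage: script.py FILE [FILE ...]")
    for name in argv:
        print("would process %s" % name)
    return len(argv)


# This branch runs only when the file is executed ("python script.py" or,
# with the shebang and execute permission, "./script.py"), not on import.
if __name__ == "__main__":
    main(sys.argv[1:])
```

As noted in comment #3, "python script.py" works with or without the first line; the shebang only matters for the "./script.py" form.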
From bugzilla-daemon at portal.open-bio.org Mon Jun 4 12:04:56 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 4 Jun 2007 12:04:56 -0400 Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup In-Reply-To: Message-ID: <200706041604.l54G4uDE010091@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2269 ------- Comment #5 from chris.lasher at gmail.com 2007-06-04 12:04 EST ------- FWIW, the shebang appears to convey the Python file is meant to be executed in a standalone fashion*, rather than be used as a module. Since Biopython consists largely of modules, I can agree with Michiel that, with exception of the test scripts, the shebangs ought to be removed. I need to check more of the files, but it seems modules which are written with an "if __name__ == '__main__'" check simply execute some test code, or even nothing, if true. This indicates they truly are meant to be modules rather than standalone scripts, making a case for removing their shebangs. That said, there's no real harm in having the shebangs in the files. I think consistency is more key. If we do allow shebangs, we ought to set a standard. * See alternative TinyURL: -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From jfeala at gmail.com Mon Jun 4 13:55:11 2007 From: jfeala at gmail.com (Jake Feala) Date: Mon, 4 Jun 2007 10:55:11 -0700 Subject: [Biopython-dev] interaction networks in biopython Message-ID: <12c863fe0706041055x5e362cfcm8fb8b55e399778fc@mail.gmail.com> Michiel - Thanks for the advice. For consistency with Biopython parsers, I did write an __init__.py for the module that contains a parse(f,format) function to behave like you suggest. I'll put the script up on the website (http://cmrg.ucsd.edu/JakeFeala#software) with the others. 
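A parse(f, format) entry point of the kind described above might look roughly like the sketch below. Everything in it is an assumption made for illustration: the SimpleNetwork class stands in for the real Network object, and the "pairs" format (two tab-separated node names per line) is invented here and is not the GRID format.

```python
import io


class SimpleNetwork(object):
    """Hypothetical stand-in for the Network object discussed in this thread."""

    def __init__(self):
        self.edges = []

    def load(self, records):
        """Add every (source, target) record to the network."""
        for source, target in records:
            self.edges.append((source, target))


def _pair_iterator(handle):
    """Yield (source, target) from tab-separated lines (assumed layout)."""
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        source, target = line.split("\t")[:2]
        yield source, target


# Registry mapping a format name to the iterator that reads it.
_ITERATORS = {"pairs": _pair_iterator}


def parse(handle, format):
    """Build a network from an open handle in the named format."""
    try:
        iterator = _ITERATORS[format]
    except KeyError:
        raise ValueError("Unknown format: %r" % format)
    net = SimpleNetwork()
    net.load(iterator(handle))
    return net


net = parse(io.StringIO("A\tB\nB\tC\n"), format="pairs")
print(net.edges)
```

Keeping the per-format iterators in a dictionary means a new format only needs a new iterator function and one registry entry.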
As for the create_network function, I think it is necessary to pass a "directed" argument in order to create a Network object with the right superclass. (sorry this was not obvious from the usage example I gave). My Network objects inherit either a directed or undirected graph class from the existing NetworkX package. The code looks like this:

    from networkx import XGraph,XDiGraph

    def create_network(directed=False):
        """Generates Network object derived from a directed or undirected
        NetworkX Graph class
        """
        if not directed:
            GraphClass = XGraph
        else:
            GraphClass = XDiGraph

        class Network(GraphClass):
            """Biological network based on NetworkX XGraph class.

            This wrapper bundles biological annotations (from InteractionRecord)
            with the graph representation and offers compatibility with
            Biopython, SBML, and Cytoscape
            """
            def __init__(self):
                """Initializes a new Network object."""
                super(Network,self).__init__(selfloops=True)
                self.__interaction_recs = {}

            def ...

I tried to think of a better way to do this but this was the easiest I could think of. What I could do is add to parse(f,format) the capability of choosing the type of GraphClass based on the input file format, but I am afraid of taking away the flexibility of choosing to treat directed links as undirected. Any ideas?

-Jake

On 6/2/07, Michiel de Hoon wrote:
> Jake Feala wrote:
> > Here is an example that worked fine for me:
> > from Network import *
> > f = open()
> > parser = GRIDIterator(f):
> > net = create_network()
> > net.load(parser)
> >
> To be more consistent with recent parsers in Biopython, this would be
> more appropriate:
>
> >>> import Network
> >>> f = open()
> >>> net = Network.parse(f, format="GRID")
>
>
> Also, assuming that
>
> >>> net = create_network()
>
> creates a NetworkObject representing an empty network, you could instead
> use the initialization function of Network objects.
As in > > >>> import Network > >>> net = Network.NetworkObject() > > For the example above, you might then also consider > > >>> import Network > >>> f = open() > >>> net = Network.NetworkObject(f, format="GRID") > > instead of using "parse". > > I'm using "NetworkObject" here only as a placeholder to distinguish it > from the Network module; there are probably better names. > > > --Michiel. > From bugzilla-daemon at portal.open-bio.org Wed Jun 13 05:33:50 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 13 Jun 2007 05:33:50 -0400 Subject: [Biopython-dev] [Bug 2090] Blast.NCBIStandalone BlastParser fails with blastall 2.2.14 In-Reply-To: Message-ID: <200706130933.l5D9XoSo003082@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2090 ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2007-06-13 05:33 EST ------- Created an attachment (id=676) --> (http://bugzilla.open-bio.org/attachment.cgi?id=676&action=view) BLASTX 2.2.15 plain text output Example from Italo Maia (see mailing list) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Jun 13 05:35:25 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 13 Jun 2007 05:35:25 -0400 Subject: [Biopython-dev] [Bug 2090] Blast.NCBIStandalone BlastParser fails with blastall 2.2.14 In-Reply-To: Message-ID: <200706130935.l5D9ZPbj003179@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2090 ------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2007-06-13 05:35 EST ------- Created an attachment (id=677) --> (http://bugzilla.open-bio.org/attachment.cgi?id=677&action=view) BLASTX 2.2.15 plain text output with no hits Another example from Italo Maia via the mailing list -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jun 13 06:18:43 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 13 Jun 2007 06:18:43 -0400 Subject: [Biopython-dev] [Bug 2090] Blast.NCBIStandalone BlastParser fails with blastall 2.2.14 In-Reply-To: Message-ID: <200706131018.l5DAIhkk005806@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2090 ------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2007-06-13 06:18 EST ------- I've updated CVS with an improved version of my patch (attachment 520) which also copes with Italo Maia's two BLASTX 2.2.15 plain text output files. I'd welcome a few more (small) examples to add to Biopython as test cases (I'll ask Italo Maia on the mailing list if we can include his two files). Note that this only solves some of the issues with plain-text parsing. It still won't read recent multi-query plain text output (as the header is not repeated). 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jun 18 10:57:26 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 18 Jun 2007 10:57:26 -0400 Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely In-Reply-To: Message-ID: <200706181457.l5IEvQrX017868@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2268 ------- Comment #20 from chris.lasher at gmail.com 2007-06-18 10:57 EST ------- Those patched versions of cluster.c and clustermodule.c work great, Michiel! Stick them in CVS and mark this bug successfully squashed! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From chris.lasher at gmail.com Mon Jun 18 15:24:19 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Mon, 18 Jun 2007 15:24:19 -0400 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <465E906F.1080704@maubp.freeserve.co.uk> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <464A07BB.8020206@maubp.freeserve.co.uk> <128a885f0705191321k32354ecdnafb9912443b9367f@mail.gmail.com> <128a885f0705302130t628794e7v681dc02058244913@mail.gmail.com> <465E906F.1080704@maubp.freeserve.co.uk> Message-ID: <128a885f0706181224o609d4a28oe69cb12c5383d45b@mail.gmail.com> On 5/31/07, Peter wrote: > Chris Lasher wrote: > > I'm obviously missing another target, and BOSC 2007 is fast > > approaching. > > Are you going to BOSC 2007 Chris? I wish I were going to BOSC, but unfortunately, I will not go. 
I do plan on making it to SciPy on the business of Software Carpentry, though. Are any Biopythonistas going to either?

> > I'm being held up by 4 files that are in the CVS
> > repository that were foolishly committed with carriage returns (i.e.,
> > "\r") in the filenames. How that's possible, I have no clue, but I
> > need to alter the data in the CVS repository so those filenames are
> > correct, or otherwise completely removed, over the entire history of
> > those files. Does anyone have any experience with the internals of CVS
> > repositories? I definitely do not.
>
> How strange! I have no experience with the internals of CVS so can't
> help you there. What are the four offending files? Maybe we could just
> purge them for the move to SVN.

Renaming the files turns out to solve this problem. Yet again, I should have tried the simplest solution first. Thanks go to mhagger on #cvs2svn on freenode for helping with this.

> Also, I suspect (but have not checked this) that a few of the examples
> files in the unit tests have been checked in as binary files rather than
> text (due to some odd differences in new lines across platforms). Again,
> a CVS expert would probably be able to generate a list of all "binary"
> files in the repository fairly easily.

Good to know. Anyone have any experience here?
Chris From bugzilla-daemon at portal.open-bio.org Tue Jun 19 09:58:44 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 19 Jun 2007 09:58:44 -0400 Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely In-Reply-To: Message-ID: <200706191358.l5JDwitq023040@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2268 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #21 from mdehoon at ims.u-tokyo.ac.jp 2007-06-19 09:58 EST ------- Committed to CVS; thanks for noticing this bug. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython-dev at maubp.freeserve.co.uk Tue Jun 19 13:24:40 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 19 Jun 2007 18:24:40 +0100 Subject: [Biopython-dev] Subversion Repository / BOSC 2007 In-Reply-To: <128a885f0706181224o609d4a28oe69cb12c5383d45b@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <464A07BB.8020206@maubp.freeserve.co.uk> <128a885f0705191321k32354ecdnafb9912443b9367f@mail.gmail.com> <128a885f0705302130t628794e7v681dc02058244913@mail.gmail.com> <465E906F.1080704@maubp.freeserve.co.uk> <128a885f0706181224o609d4a28oe69cb12c5383d45b@mail.gmail.com> Message-ID: <46781158.4090708@maubp.freeserve.co.uk> Chris Lasher wrote: > On 5/31/07, Peter wrote: >> Chris Lasher wrote: >>> I'm obviously missing another target, and BOSC 2007 is fast >>> approaching. >> Are you going to BOSC 2007 Chris? 
>
> I wish I were going to BOSC, but unfortunately, I will not go. I do
> plan on making it to SciPy on the business of Software Carpentry,
> though. Are any Biopythonistas going to either?

Iddo and I will be there this year, although he is going to be spending most of his time at the AFP/Biosapiens SIG. I had better start thinking about the Biopython project talk...

http://www.open-bio.org/wiki/BOSC_2007

Peter

From sbassi at gmail.com Wed Jun 20 18:10:23 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Wed, 20 Jun 2007 19:10:23 -0300
Subject: [Biopython-dev] Code to submit: CRC64
Message-ID:

Hello,

I don't have write access to CVS so I post this code here.
I included a CRC64 checksum; it is used in several genomic databases.

--
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318
-------------- next part --------------
A non-text attachment was scrubbed...
Name: utils.py
Type: text/x-python
Size: 4803 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/biopython-dev/attachments/20070620/235d454e/attachment-0001.py
-------------- next part --------------
A non-text attachment was scrubbed...
Name: utilsDIFF
Type: application/octet-stream
Size: 1412 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/biopython-dev/attachments/20070620/235d454e/attachment-0001.obj

From biopython-dev at maubp.freeserve.co.uk Thu Jun 21 09:02:39 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Thu, 21 Jun 2007 14:02:39 +0100
Subject: [Biopython-dev] Code to submit: CRC64
In-Reply-To:
References:
Message-ID: <320fb6e00706210602m103e9d6ehe3cd90f42f8d2c75@mail.gmail.com>

Sebastian Bassi wrote:
> Hello,
>
> I don't have write access to CVS so I post this code here.
> I included a CRC64 checksum; it is used in several genomic databases.

Hi Sebastian,

Please could you file an enhancement bug, and attach the code to it - it makes keeping track of requests and patches much easier.
Could you also give a couple of examples of how you might use this?

In typical usage, does the case of the sequences matter? As it stands it would be up to the user to adjust the case before calling the CRC64 function.

Looking at the code, it looks like it would fail when used on sequences (Seq objects) where the "letters" are not single characters (e.g. sequences using the three letter amino acid codes). This is probably not a big problem.

Regarding the implementation, I'm not sure if Bio/utils.py is the best place - anyone?

You introduce a few "top level" variables:

    POLY64REVh = 0xd8000000L
    CRCTableh = [0] * 256
    CRCTablel = [0] * 256
    isInitialized = False

I think it would be better if their names started with an underscore to mark them as "private" to the utils.py module. I would also use a CRC64 prefix to make it explicit what they are for, given the utils.py file contains a range of different functions. In particular, the name isInitialized is too vague and uses mixed case. Maybe:

    _CRC64_POLY64REVh = 0xd8000000L
    _CRC64_tableh = [0] * 256
    _CRC64_tablel = [0] * 256
    _CRC64_initialized = False

Peter

P.S. You misspelt recipe in the comments.

From sbassi at gmail.com Thu Jun 21 09:57:49 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Thu, 21 Jun 2007 10:57:49 -0300
Subject: [Biopython-dev] Code to submit: CRC64
In-Reply-To: <320fb6e00706210602m103e9d6ehe3cd90f42f8d2c75@mail.gmail.com>
References: <320fb6e00706210602m103e9d6ehe3cd90f42f8d2c75@mail.gmail.com>
Message-ID:

On 6/21/07, Peter wrote:
> Please could you file an enhancement bug, and attach the code to it -

By attach do you mean to include it into the "description" field? Or is there an attach option in the bug report form that I am missing?

> it makes keeping track of requests and patches much easier.
> Could you also give a couple of examples of how you might use this?
1) Check if the data you have is the same as data in a public DB without downloading the whole sequences: just download the CRC info, calculate the CRC of your local sequences, and compare them. There is a chance of a random match, but it is very low.
2) You have your own sequences and want to store them in fasta format and want to include CRC64 in the description, to retrieve it later to check for consistency.

> In typical usage, does the case of the sequences matter? As it stands

Case matters. AA is checksummed in uppercase and DNA in lowercase. I will see if I can force this for seq objects (and leave it alone if it is a plain string).

> Looking at the code, it looks like it would fail when used on
> sequences (Seq objects) where the "letters" are not single characters
> (e.g. sequences using the three letter amino acid codes). This is
> probably not a big problem.

CRC is always calculated in one letter code.

I will correct the other problems.

Best,
SB.

--
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From biopython-dev at maubp.freeserve.co.uk Thu Jun 21 11:37:05 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Thu, 21 Jun 2007 16:37:05 +0100
Subject: [Biopython-dev] Code to submit: CRC64
In-Reply-To:
References: <320fb6e00706210602m103e9d6ehe3cd90f42f8d2c75@mail.gmail.com>
Message-ID: <467A9B21.7030308@maubp.freeserve.co.uk>

Sebastian Bassi wrote:
> On 6/21/07, Peter wrote:
>> Please could you file an enhancement bug, and attach the code to it -
>
> By attach do you mean to include it into the "description" field? Or
> is there an attach option in the bug report form that I am missing?

You have to file the bug first, and then you can attach files afterwards. It's a bit odd - possibly a limitation in bugzilla, or just the way it's set up here.
> 1) Check if the data you have is the same as data in a public DB
> without downloading the whole sequences: just download the CRC info,
> calculate the CRC of your local sequences, and compare them.
> There is a chance of a random match, but it is very low.

Could you give a couple of explicit examples (URLs), as I personally don't remember ever noticing CRC information.

> 2) You have your own sequences and want to store them in fasta format
> and want to include CRC64 in the description, to retrieve it later to
> check for consistency.

Nice example; this should work on any sequence format that can store annotations (provided the CRC64 is calculated purely from the sequence).

>> In typical usage, does the case of the sequences matter?
>
> Case matters. AA is checksummed in uppercase and DNA in lowercase. I
> will see if I can force this for seq objects (and leave it alone if it
> is a plain string).

Seq objects can have upper or lower case letters (or a mixture) regardless of the alphabet. Rather than writing some complicated code to convert amino acids into upper case and DNA into lowercase, maybe just state these conventions in the function's docstring and leave it up to the user.

>> Looking at the code, it looks like it would fail when used on
>> sequences (Seq objects) where the "letters" are not single characters
>> (e.g. sequences using the three letter amino acid codes). This is
>> probably not a big problem.
>
> CRC is always calculated in one letter code.

Fine. I would state this explicitly in the function's docstring (the comment at the beginning which documents the arguments etc).
Peter

From sbassi at gmail.com Thu Jun 21 14:06:29 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Thu, 21 Jun 2007 15:06:29 -0300
Subject: [Biopython-dev] Code to submit: CRC64
In-Reply-To: <467A9B21.7030308@maubp.freeserve.co.uk>
References: <320fb6e00706210602m103e9d6ehe3cd90f42f8d2c75@mail.gmail.com> <467A9B21.7030308@maubp.freeserve.co.uk>
Message-ID:

On 6/21/07, Peter wrote:
> You have to file the bug first, and then you can attach files
> afterwards. It's a bit odd - possibly a limitation in bugzilla, or just
> the way it's set up here.

OK, I am working on it.

> Could you give a couple of explicit examples (URLs), as I personally
> don't remember ever noticing CRC information.

I did notice because the first bioinformatics program I ever used was DNAstar and it included checksum information.

http://expasy.org/uniprot/P04293 (near the bottom, in Sequence information).
http://bioinformatics.anl.gov/seguid/overview.aspx (seguid proposed as a stronger id than crc64).
http://lists.open-bio.org/pipermail/bioruby-cvs/2007-February.txt

On this page there are some formats using some kind of checksum (including crc64): http://www.ebi.ac.uk/help/formats_frame.html

BTW, I could also code the GCG checksum based on the BioPerl implementation.

And from the Swiss-Prot manual:

"The SQ (SeQuence header) line marks the beginning of the sequence data and gives a quick summary of its content. The format of the SQ line is:

SQ SEQUENCE XXXX AA; XXXXX MW; XXXXXXXXXXXXXXXX CRC64;

The line contains the length of the sequence in amino acids ('AA') followed by the molecular weight ('MW') rounded to the nearest mass unit (Dalton) and the sequence 64-bit CRC (Cyclic Redundancy Check) value ('CRC64')."

> Seq objects can have upper or lower case letters (or a mixture)
> regardless of the alphabet.
Rather than writing some complicated code to > convert amino acids into upper case and DNA into lowercase, maybe just > state these conventions in the function's doc string at leave it up to > the user. I am doing both versions for you to choose. Best, SB. -- Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From bugzilla-daemon at portal.open-bio.org Fri Jun 22 02:19:13 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 22 Jun 2007 02:19:13 -0400 Subject: [Biopython-dev] [Bug 2323] New: New functions: GCG Checksum and CRC64 Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2323 Summary: New functions: GCG Checksum and CRC64 Product: Biopython Version: 1.43 Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P5 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: sbassi at gmail.com Functions to calculate CRC64 and GCG Checksum. CRC64 is used by Uniprot and GCG Checksum is used by GCG software. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jun 22 02:21:16 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 22 Jun 2007 02:21:16 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706220621.l5M6LGmt022191@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #1 from sbassi at gmail.com 2007-06-22 02:21 EST ------- Created an attachment (id=689) --> (http://bugzilla.open-bio.org/attachment.cgi?id=689&action=view) Proposed functions (CRC64 and GCG checksum) This could be in utils.py, but I am not sure. 

From bugzilla-daemon at portal.open-bio.org Sat Jun 23 06:52:11 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 23 Jun 2007 06:52:11 -0400
Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup
In-Reply-To:
Message-ID: <200706231052.l5NAqBX9011089@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2269

------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2007-06-23 06:52 EST -------
I looked at the Python files in the Python standard library. Some have the shebangs, some don't. The ones that do usually have the "if __name__ == '__main__'" check, but not always, and some files have the "if __name__ == '__main__'" check but no shebang. I guess there's no strong rule in Python for shebang lines, except that when they exist, they're always of the "#!/usr/bin/env python" form. So I agree now with Chris' original proposal to change all shebangs to the "#!/usr/bin/env python" form.

From bugzilla-daemon at portal.open-bio.org Sun Jun 24 01:04:21 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 24 Jun 2007 01:04:21 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200706240504.l5O54LqE027096@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323

------- Comment #2 from sbassi at gmail.com 2007-06-24 01:04 EST -------
I spotted a bug in my code. The code works under Python>=2.4. In Python 2.3, it raises a syntax error. Should I make it Python 2.3 compatible?
I guess so, therefore I will submit another version. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jun 24 09:47:24 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 24 Jun 2007 09:47:24 -0400 Subject: [Biopython-dev] [Bug 1982] Patch to BioSQL/Loader.py In-Reply-To: Message-ID: <200706241347.l5ODlO4q031404@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1982 ------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp 2007-06-24 09:47 EST ------- I have committed your patch with some small modifications to CVS. Could you check if everything still works for you? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jun 24 09:50:48 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 24 Jun 2007 09:50:48 -0400 Subject: [Biopython-dev] [Bug 2324] New: Data file for the Bio.UniGene unit test is missing. Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2324 Summary: Data file for the Bio.UniGene unit test is missing. 
Product: Biopython Version: 1.43 Platform: All OS/Version: All Status: NEW Severity: critical Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: mdehoon at ims.u-tokyo.ac.jp A missing file for the Bio.UniGene unit test causes the following error when running the Biopython test suite: ====================================================================== ERROR: test_UniGene ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 149, in runTest self.runSafeTest() File "run_tests.py", line 162, in runSafeTest cur_test = __import__(self.test_name) File "/Users/mdehoon/biopython/Tests/test_UniGene.py", line 5, in handle = open("UniGene/Mdm_partial.data") IOError: [Errno 2] No such file or directory: 'UniGene/Mdm_partial.data' -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jun 24 09:56:05 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 24 Jun 2007 09:56:05 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706241356.l5ODu5ri031819@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp 2007-06-24 09:56 EST ------- Note that something similar already exists in Bio/crc.py. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
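The CRC64 use case raised in this thread (store a checksum next to a sequence, recompute it later, and compare) can be sketched as follows. This is a standalone bitwise illustration, not the attached code and not the contents of Bio/crc.py. The names crc64, matches and _CRC64_POLY are hypothetical, and the 64-bit polynomial is an assumption: it takes the high word 0xd8000000 quoted earlier in the thread and pairs it with a zero low word.

```python
# Reflected polynomial as a single 64-bit constant (assumed, see above).
_CRC64_POLY = 0xD800000000000000


def crc64(sequence):
    """Return the CRC64 of an ASCII sequence string as 16 hex digits.

    The checksum is case sensitive and expects one-letter codes;
    callers should adjust the case before calling.
    """
    crc = 0
    for byte in sequence.encode("ascii"):
        crc ^= byte
        # Process the byte one bit at a time, least significant bit first.
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ _CRC64_POLY
            else:
                crc >>= 1
    return "%016X" % crc


def matches(sequence, stored_checksum):
    """Compare a freshly computed checksum with one stored earlier."""
    return crc64(sequence) == stored_checksum.upper()


stored = crc64("ACGT")          # e.g. kept in a FASTA description line
print(matches("ACGT", stored))  # same sequence
print(matches("acgt", stored))  # same letters, different case
```

Because the checksum depends on case, the second comparison fails, which is why the thread suggests documenting the expected case in the docstring rather than converting silently.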
From bugzilla-daemon at portal.open-bio.org Sun Jun 24 12:35:20 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 24 Jun 2007 12:35:20 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706241635.l5OGZKtp006088@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #4 from sbassi at gmail.com 2007-06-24 12:35 EST ------- (In reply to comment #3) > Note that something similar already exists in Bio/crc.py. > Oops, so I will drop the CRC64 function. The GCG checksum could be included in Bio/crc.py. I will make a new version of crc.py with the GCG checksum included. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jun 24 17:47:05 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 24 Jun 2007 17:47:05 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706242147.l5OLl5qq019715@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 sbassi at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #689 is|0 |1 obsolete| | ------- Comment #5 from sbassi at gmail.com 2007-06-24 17:47 EST ------- Created an attachment (id=690) --> (http://bugzilla.open-bio.org/attachment.cgi?id=690&action=view) New version of GCG_checksum and SEGUID This version drops crc64 since it was already in Biopython, fixes GCG_checksum and adds a new checksum: SEGUID. This code could be in Bio/crc.py together with crc64.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jun 25 08:38:32 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 25 Jun 2007 08:38:32 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706251238.l5PCcW9g027177@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2007-06-25 08:38 EST ------- Can you give a real-life example of how you would use this code? Maybe that would give us some hint on what the best place for this code is. I agree that GCG_checksum and SEGUID should be together with crc64, but I am not sure if a separate Bio/crc.py module containing these three functions is ideal. It may also be a good idea to rename GCG_checksum to gcg for consistency with crc64 and seguid. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Jun 26 17:45:36 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 26 Jun 2007 17:45:36 -0400 Subject: [Biopython-dev] [Bug 2324] Data file for the Bio.UniGene unit test is missing.
In-Reply-To: Message-ID: <200706262145.l5QLjar5013517@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2324 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-06-26 17:45 EST ------- Checked in missing file, which I should have done 7 weeks ago when I added test_UniGene.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jun 27 10:13:41 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 27 Jun 2007 10:13:41 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706271413.l5REDffl020255@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #7 from sbassi at gmail.com 2007-06-27 10:13 EST ------- Response to comment #6: Problem 1: There are two FASTA files, each containing several sequences, and I want to check whether any sequences match between the two files. IDs can't be used for comparison since the data come from different sources and there is no correlation between them. The sequences themselves must be compared. Solution: To avoid comparing whole sequences and to make comparisons faster, I work with a small digest of each sequence. The SEGUID algorithm is based on SHA-1, which is designed to have the following property: "it is computationally not feasible to find two different messages which produce the same message digest."
(see http://bioinformatics.anl.gov/seguid/overview.aspx for more information on seguid)

===========================================================
from Bio import SeqIO
# seguid was later committed to Bio/SeqUtils/CheckSum.py (see comment #15)
from Bio.SeqUtils.CheckSum import seguid

# Collect the digests of every sequence in the first file
seq1=set()
handle=open("14gustavoUniprot.fas","r")
for record in SeqIO.parse(handle,"fasta"):
    seq1.add(seguid(record.seq))
handle.close()

# Collect the digests of every sequence in the second file
seq2=set()
handle=open("pdbaa","r")
for record in SeqIO.parse(handle,"fasta"):
    seq2.add(seguid(record.seq))
handle.close()

# Digests present in both files
shared_elements=seq1.intersection(seq2)

# Report the IDs (from the first file) of the shared sequences
handle=open("14gustavoUniprot.fas","r")
for record in SeqIO.parse(handle,"fasta"):
    if seguid(record.seq) in shared_elements:
        print(record.id)
handle.close()
===========================================================
Output:
P00700|LYSC_COLVI
P02185|MYG_PHYCA
P03521|NCAP_VSIVA
P04050|RPB1_YEAST
P05803|NRAM_IAWHM
P0A5Y6|INHA_MYCTU
P0A5Y7|INHA_MYCBO
P0AA04|PTHP_ECOLI
P0AE72|CHPR_ECOLI
P0C0S5|H2AZ_HUMAN
P14223|ALF_PLAFA
P17313|VG31_BPT4
P17670|SODF_MYCTU
P19821|DPO1_THEAQ
P25786|PSA1_HUMAN
P31939|PUR9_HUMAN
P62314|SMD1_HUMAN
P62826|RAN_HUMAN
Q08129|CATA_MYCTU
Q99497|PARK7_HUMAN
===========================================================

Problem 2: I want to include GCG checksum information in the description field, as an integrity check and for comparison against other GCG files.

Solution: Read the input file, add the GCG checksum to the description, and write the output file in FASTA format.
===========================================================
from Bio import SeqIO
# gcg was later committed to Bio/SeqUtils/CheckSum.py (see comment #15)
from Bio.SeqUtils.CheckSum import gcg

seqs=[]
handle=open("14gustavoUniprot.fas","r")
for record in SeqIO.parse(handle,"fasta"):
    # Append the checksum to the existing description
    record.description=record.description+" "+str(gcg(record.seq))
    seqs.append(record)
handle.close()

output_handle = open("uniprotGCG.fas", "w")
SeqIO.write(seqs, output_handle, "fasta")
output_handle.close()

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
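
For readers who want to try the digest idea from comment #7 without the attachment or the FASTA files, here is a minimal sketch. It assumes a SEGUID is the base64-encoded SHA-1 digest of the uppercased sequence with the trailing padding removed; seguid_sketch and shared_ids are illustrative names, not the functions from the attachment, and the toy records are made up.

```python
import base64
import hashlib


def seguid_sketch(seq):
    """Return a SEGUID-style digest (assumed: base64 of SHA-1, padding stripped)."""
    digest = hashlib.sha1(str(seq).upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def shared_ids(records_a, records_b):
    """Return IDs from records_a whose sequence also occurs in records_b.

    Each argument is a list of (identifier, sequence) pairs.
    """
    digests_b = set(seguid_sketch(seq) for _, seq in records_b)
    return [name for name, seq in records_a if seguid_sketch(seq) in digests_b]


# Hypothetical toy data standing in for the two FASTA files.
file_a = [("a1", "MKTAYIAKQR"), ("a2", "GATTACA")]
file_b = [("b1", "gattaca"), ("b2", "MSTNPKPQRK")]
print(shared_ids(file_a, file_b))  # prints ['a2']
```

Comparing short fixed-length digests instead of whole sequences keeps the set small, and uppercasing first means the match is case-insensitive.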
From bugzilla-daemon at portal.open-bio.org Wed Jun 27 10:20:17 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 27 Jun 2007 10:20:17 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706271420.l5REKHUo020860@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 sbassi at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #690 is|0 |1 obsolete| | ------- Comment #8 from sbassi at gmail.com 2007-06-27 10:20 EST ------- Created an attachment (id=691) --> (http://bugzilla.open-bio.org/attachment.cgi?id=691&action=view) gcg and seguid functions Modified version of gcg and seguid. It works for both old and new (>=2.4) python versions and for both str or Seq argument. gcg function requires importing gcg24.py in Python 2.4+. This way I avoid to use "eval" to make it faster. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jun 27 10:21:30 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 27 Jun 2007 10:21:30 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706271421.l5RELUb7020932@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #9 from sbassi at gmail.com 2007-06-27 10:21 EST ------- Created an attachment (id=692) --> (http://bugzilla.open-bio.org/attachment.cgi?id=692&action=view) gcg function for >=Python2.4. It should be used from crc.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Jun 27 10:52:40 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 27 Jun 2007 10:52:40 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706271452.l5REqesw023253@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2007-06-27 10:52 EST ------- I liked the example in comment 7. And yes, we do still try and support "older" versions of Python - our download pages states python 2.3 or later (which I myself use on Windows). I don't know why you attached the pyc file, these are normally generated automatically by python from python scripts (and are platform specific): http://bugzilla.open-bio.org/attachment.cgi?id=692 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jun 27 11:44:30 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 27 Jun 2007 11:44:30 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706271544.l5RFiUa4028039@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 sbassi at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #692 is|0 |1 obsolete| | ------- Comment #11 from sbassi at gmail.com 2007-06-27 11:44 EST ------- Created an attachment (id=693) --> (http://bugzilla.open-bio.org/attachment.cgi?id=693&action=view) gcg function for >=Python2.4. It should be used from crc.py Replaces .pyc file I uploaded by mistake. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jun 27 14:48:01 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 27 Jun 2007 14:48:01 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706271848.l5RIm1Wn007412@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 sbassi at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #691 is|0 |1 obsolete| | ------- Comment #12 from sbassi at gmail.com 2007-06-27 14:48 EST ------- Created an attachment (id=694) --> (http://bugzilla.open-bio.org/attachment.cgi?id=694&action=view) gcg and seguid functions This version removes some debugging statements. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jun 29 12:27:07 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 29 Jun 2007 12:27:07 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706291627.l5TGR7PG002534@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #13 from mdehoon at ims.u-tokyo.ac.jp 2007-06-29 12:27 EST ------- I'm thinking that Bio/SeqUtils might be a good place for this code, together with the code that is now in Bio/crc.py. Then we would have a file CheckSum.py in Bio/SeqUtils containing gcg, seguid, and crc64. Any objections? I'd prefer Bio/SeqUtils over Bio/utils.py. 
The latter largely overlaps what is already in Bio/Seq.py and Bio/SeqUtils and might need some cleanup. We should also add your example from #7 to the manual. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jun 29 12:57:28 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 29 Jun 2007 12:57:28 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706291657.l5TGvSIo004048@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #14 from sbassi at gmail.com 2007-06-29 12:57 EST ------- (In reply to comment #13) > I'm thinking that Bio/SeqUtils might be a good place for this code, together > with the code that is now in Bio/crc.py. Then we would have a file CheckSum.py > in Bio/SeqUtils containing gcg, seguid, and crc64. Any objections? I'd prefer It's OK for me. > We should also add your example from #7 to the manual. I could add it to the wiki, but after the code is in its place in CVS so the sample would refer to the proper module. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mdehoon at c2b2.columbia.edu Sat Jun 30 02:48:42 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sat, 30 Jun 2007 15:48:42 +0900 Subject: [Biopython-dev] TempFastaWriter, TempFastaWriterSingle in Bio/GFF/easy.py Message-ID: <4685FCCA.4090904@c2b2.columbia.edu> Hi everybody, Is anybody using the classes TempFastaWriter, TempFastaWriterSingle in Bio/GFF/easy.py? They are currently using the old Fasta writer in Bio.SeqIO.FASTA instead of the new one in Bio.SeqIO.FastaIO. 
We can either update them to use the new Fasta writer, or simply remove them, since currently these classes are not used anywhere in Biopython. --Michiel. From biopython-dev at maubp.freeserve.co.uk Sat Jun 30 15:14:44 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sat, 30 Jun 2007 21:14:44 +0200 Subject: [Biopython-dev] TempFastaWriter, TempFastaWriterSingle in Bio/GFF/easy.py In-Reply-To: <4685FCCA.4090904@c2b2.columbia.edu> References: <4685FCCA.4090904@c2b2.columbia.edu> Message-ID: <320fb6e00706301214p41c33329o98126738d77fad19@mail.gmail.com> > Is anybody using the classes TempFastaWriter, TempFastaWriterSingle in > Bio/GFF/easy.py? They are currently using the old Fasta writer in > Bio.SeqIO.FASTA instead of the new one in Bio.SeqIO.FastaIO. We can > either update them to use the new Fasta writer, or simply remove them, > since currently these classes are not used anywhere in Biopython. This is for Bug 2284 right? http://bugzilla.open-bio.org/show_bug.cgi?id=2284 I'm inclined to remove classes TempFastaWriter and TempFastaWriterSingle Peter From jacobporter2002 at yahoo.com Sat Jun 30 16:25:57 2007 From: jacobporter2002 at yahoo.com (Jacob Porter) Date: Sat, 30 Jun 2007 13:25:57 -0700 (PDT) Subject: [Biopython-dev] I'd like to help out Message-ID: <296434.41950.qm@web33706.mail.mud.yahoo.com> Hello, I am an undergraduate at UC Berkeley in Applied Mathematics with computational biology going to graduate school at UC Davis in Applied Mathematics, and I would like to help out with the Biopython project. I've taken a compiler class with Python, and I'm pretty familiar with the core of Python. I work creating scripts in Python. I've done a little bit of work in phylogenomics, and that information can be found at this website: http://www.math.tamu.edu/~lgp/small-trees/ . I wrote Maple code to perform many of the computations on that website. 
I need to keep track of the number of hours that I work, and I want a signed document stating how many hours that I work after I am done. Perhaps I can start out by creating RPMs for Suse Linux. Other than that, I would like to work on projects for phylogenomics. I can also do other related things. If anyone has projects, please let me know. If anyone has information about RPMs and BioPython, or if you are responsible for that, please let me know. From bugzilla-daemon at portal.open-bio.org Sat Jun 30 22:55:31 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 30 Jun 2007 22:55:31 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200707010255.l612tVwN022655@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #15 from mdehoon at ims.u-tokyo.ac.jp 2007-06-30 22:55 EST ------- I've added the functions gcg and seguid to Bio/SeqUtils/CheckSum.py. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sat Jun 30 23:23:02 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 30 Jun 2007 23:23:02 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707010323.l613N24V023919@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323

------- Comment #16 from sbassi at gmail.com 2007-06-30 23:23 EST -------
(In reply to comment #15)
> I've added the functions gcg and seguid to Bio/SeqUtils/CheckSum.py.
>

This code won't run on Python 2.3:
=============================================
sbassi at hp:~/bioinfo$ python
Python 2.3.4 (#2, Jun 16 2005, 18:52:31)
[GCC 3.3.5 (Debian 1:3.3.5-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import CheckSum
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "CheckSum.py", line 50
    return sum(n*ord(c.upper()) for (n,c) in izip(cycle(range(1,58)),seq)) % 10000
                                  ^
SyntaxError: invalid syntax
==========================================
That is why I made a separate module for Python 2.4+

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
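
The SyntaxError in comment #16 comes from the generator expression, a construct that only arrived in Python 2.4. Below is a minimal sketch of the same checksum written as a plain loop, assuming the formula shown in the traceback (weights cycling from 1 to 57, sum taken modulo 10000). The names gcg_loop and gcg_genexp are illustrative and are not the functions in CheckSum.py; the block as a whole targets current Python (zip stands in for izip), and only gcg_loop avoids the newer syntax.

```python
from itertools import cycle


def gcg_loop(seq):
    """GCG-style checksum using a plain loop (no generator expression)."""
    total = 0
    weight = 0
    for char in seq:
        weight += 1
        total += weight * ord(char.upper())
        if weight == 57:
            weight = 0  # weights cycle through 1..57
    return total % 10000


def gcg_genexp(seq):
    """Same formula as the line in the traceback, for comparison only."""
    return sum(n * ord(c.upper()) for n, c in zip(cycle(range(1, 58)), seq)) % 10000


print(gcg_loop("ACGT"))  # prints 748
```

Keeping a single loop-based function would avoid maintaining a separate module per Python version, at the cost of some speed on long sequences.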
From mdehoon at c2b2.columbia.edu Sun Jun 3 03:14:36 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sun, 03 Jun 2007 12:14:36 +0900 Subject: [Biopython-dev] interaction networks in biopython In-Reply-To: <12c863fe0705311652w44074269y2256aa127b90843b@mail.gmail.com> References: <12c863fe0705161025p46b1ff6v8c6b1e5999b29244@mail.gmail.com> <12c863fe0705311652w44074269y2256aa127b90843b@mail.gmail.com> Message-ID: <4662321C.2030002@c2b2.columbia.edu> Jake Feala wrote: > Here is an example that worked fine for me: > from Network import * > f = open() > parser = GRIDIterator(f): > net = create_network() > net.load(parser) > To be more consistent with recent parsers in Biopython, this would be more appropriate: >>> import Network >>> f = open() >>> net = Network.parse(f, format="GRID") Also, assuming that >>> net = create_network() creates a NetworkObject representing an empty network, you could instead use the initialization function of Network objects. As in >>> import Network >>> net = Network.NetworkObject() For the example above, you might then also consider >>> import Network >>> f = open() >>> net = Network.NetworkObject(f, format="GRID") instead of using "parse". I'm using "NetworkObject" here only as a placeholder to distinguish it from the Network module; there are probably better names. --Michiel. From bugzilla-daemon at portal.open-bio.org Mon Jun 4 13:46:13 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 4 Jun 2007 09:46:13 -0400 Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup In-Reply-To: Message-ID: <200706041346.l54DkDUc003011@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2269 ------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2007-06-04 09:46 EST ------- Do we actually need the shebangs? 
Unless the Biopython scripts are intended to be run as a stand-alone program (and appear as such to the user), we may as well remove all the shebang lines. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jun 4 14:01:54 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 4 Jun 2007 10:01:54 -0400 Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup In-Reply-To: Message-ID: <200706041401.l54E1stN003819@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2269 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-06-04 10:01 EST ------- >From looking at the code, some of the files ARE intended to be run as a script - they do a "__main__" check to detect this and then do something based on the command line arguments. I personally have never used any of them in this way - but there may be some users out there who do. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jun 4 14:19:47 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 4 Jun 2007 10:19:47 -0400 Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup In-Reply-To: Message-ID: <200706041419.l54EJlAV004749@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2269 ------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp 2007-06-04 10:19 EST ------- > From looking at the code, some of the files ARE intended to be run as a script > - they do a "__main__" check to detect this and then do something based on the > command line arguments. 
If we remove the shebangs, users can still do python script.py if they want to run script.py as a script. With the shebangs, users can do ./script.py instead. But IMHO, for Biopython scripts the advantage is minimal, and won't work on Windows. Well let's wait a few days to see if somebody steps forward who really needs the shebangs. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jun 4 14:56:39 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 4 Jun 2007 10:56:39 -0400 Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup In-Reply-To: Message-ID: <200706041456.l54Eudxx006919@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2269 ------- Comment #4 from dalloliogm at gmail.com 2007-06-04 10:56 EST ------- Well, I think it's a good practice to add the shebang to the beginning of every script, I don't know why you want to remove it ;) In the future, if I'm going to be able to contribute to the biopython code with some script, I will use the #!/usr/bin/env form. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
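
To make the shebang discussion concrete, here is a minimal sketch of a file that works both as an importable module and as a stand-alone script. The module contents (count_lines, main) are hypothetical and not taken from Biopython; only the "#!/usr/bin/env python" form and the "__main__" check come from the thread.

```python
#!/usr/bin/env python
"""Hypothetical module that can also be run as a small script."""
import sys


def count_lines(text):
    """Return the number of lines in a string."""
    return len(text.splitlines())


def main(argv):
    """Print the line count of each file named on the command line."""
    for path in argv[1:]:
        with open(path) as handle:
            print(path, count_lines(handle.read()))
    return 0


# Runs only when the file is executed directly, not when it is imported.
if __name__ == "__main__":
    main(sys.argv)
```

With the shebang and the executable bit set, "./script.py file.txt" works on Unix-like systems; without them, "python script.py file.txt" behaves the same, which is the point made in comment #3.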
From bugzilla-daemon at portal.open-bio.org Mon Jun 4 16:04:56 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 4 Jun 2007 12:04:56 -0400 Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup In-Reply-To: Message-ID: <200706041604.l54G4uDE010091@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2269 ------- Comment #5 from chris.lasher at gmail.com 2007-06-04 12:04 EST ------- FWIW, the shebang appears to convey the Python file is meant to be executed in a standalone fashion*, rather than be used as a module. Since Biopython consists largely of modules, I can agree with Michiel that, with exception of the test scripts, the shebangs ought to be removed. I need to check more of the files, but it seems modules which are written with an "if __name__ == '__main__'" check simply execute some test code, or even nothing, if true. This indicates they truly are meant to be modules rather than standalone scripts, making a case for removing their shebangs. That said, there's no real harm in having the shebangs in the files. I think consistency is more key. If we do allow shebangs, we ought to set a standard. * See alternative TinyURL: -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From jfeala at gmail.com Mon Jun 4 17:55:11 2007 From: jfeala at gmail.com (Jake Feala) Date: Mon, 4 Jun 2007 10:55:11 -0700 Subject: [Biopython-dev] interaction networks in biopython Message-ID: <12c863fe0706041055x5e362cfcm8fb8b55e399778fc@mail.gmail.com> Michiel - Thanks for the advice. For consistency with Biopython parsers, I did write an __init__.py for the module that contains a parse(f,format) function to behave like you suggest. I'll put the script up on the website (http://cmrg.ucsd.edu/JakeFeala#software) with the others. 
As for the create_network function, I think it is necessary to pass a
"directed" argument in order to create a Network object with the right
superclass. (sorry this was not obvious from the usage example I gave).
My Network objects inherit either a directed or undirected graph class
from the existing NetworkX package. The code looks like this:

from networkx import XGraph,XDiGraph

def create_network(directed=False):
    """Generates Network object derived from a directed or undirected
    NetworkX Graph class
    """
    if not directed:
        GraphClass = XGraph
    else:
        GraphClass = XDiGraph

    class Network(GraphClass):
        """Biological network based on NetworkX XGraph class.

        This wrapper bundles biological annotations (from InteractionRecord)
        with the graph representation and offers compatibility with
        Biopython, SBML, and Cytoscape
        """
        def __init__(self):
            """Initializes a new Network object."""
            super(Network,self).__init__(selfloops=True)
            self.__interaction_recs = {}

        def ...

I tried to think of a better way to do this but this was the easiest I
could think of. What I could do is add to parse(f,format) the capability
of choosing the type of GraphClass based on the input file format, but I
am afraid of taking away the flexibility of choosing to treat directed
links as undirected. Any ideas?

-Jake

On 6/2/07, Michiel de Hoon wrote:
> Jake Feala wrote:
> > Here is an example that worked fine for me:
> > from Network import *
> > f = open()
> > parser = GRIDIterator(f):
> > net = create_network()
> > net.load(parser)
> >
> To be more consistent with recent parsers in Biopython, this would be
> more appropriate:
>
> >>> import Network
> >>> f = open()
> >>> net = Network.parse(f, format="GRID")
>
>
> Also, assuming that
>
> >>> net = create_network()
>
> creates a NetworkObject representing an empty network, you could instead
> use the initialization function of Network objects.
As in > > >>> import Network > >>> net = Network.NetworkObject() > > For the example above, you might then also consider > > >>> import Network > >>> f = open() > >>> net = Network.NetworkObject(f, format="GRID") > > instead of using "parse". > > I'm using "NetworkObject" here only as a placeholder to distinguish it > from the Network module; there are probably better names. > > > --Michiel. > From bugzilla-daemon at portal.open-bio.org Wed Jun 13 09:33:50 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 13 Jun 2007 05:33:50 -0400 Subject: [Biopython-dev] [Bug 2090] Blast.NCBIStandalone BlastParser fails with blastall 2.2.14 In-Reply-To: Message-ID: <200706130933.l5D9XoSo003082@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2090 ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2007-06-13 05:33 EST ------- Created an attachment (id=676) --> (http://bugzilla.open-bio.org/attachment.cgi?id=676&action=view) BLASTX 2.2.15 plain text output Example from Italo Maia (see mailing list) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Jun 13 09:35:25 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 13 Jun 2007 05:35:25 -0400 Subject: [Biopython-dev] [Bug 2090] Blast.NCBIStandalone BlastParser fails with blastall 2.2.14 In-Reply-To: Message-ID: <200706130935.l5D9ZPbj003179@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2090 ------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2007-06-13 05:35 EST ------- Created an attachment (id=677) --> (http://bugzilla.open-bio.org/attachment.cgi?id=677&action=view) BLASTX 2.2.15 plain text output with no hits Another example from Italo Maia via the mailing list -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jun 13 10:18:43 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 13 Jun 2007 06:18:43 -0400 Subject: [Biopython-dev] [Bug 2090] Blast.NCBIStandalone BlastParser fails with blastall 2.2.14 In-Reply-To: Message-ID: <200706131018.l5DAIhkk005806@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2090 ------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2007-06-13 06:18 EST ------- I've updated CVS with an improved version of my patch (attachment 520) which also copes with Italo Maia's two BLASTX 2.2.15 plain text output files. I'd welcome a few more (small) examples to add to Biopython as test cases (I'll ask Italo Maia on the mailing list if we can include his two files). Note that this only solves some of the issues with plain-text parsing. It still won't read recent multi-query plain text output (as the header is not repeated). 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jun 18 14:57:26 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 18 Jun 2007 10:57:26 -0400 Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely In-Reply-To: Message-ID: <200706181457.l5IEvQrX017868@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2268 ------- Comment #20 from chris.lasher at gmail.com 2007-06-18 10:57 EST ------- Those patched versions of cluster.c and clustermodule.c work great, Michiel! Stick them in CVS and mark this bug successfully squashed! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From chris.lasher at gmail.com Mon Jun 18 19:24:19 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Mon, 18 Jun 2007 15:24:19 -0400 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <465E906F.1080704@maubp.freeserve.co.uk> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <464A07BB.8020206@maubp.freeserve.co.uk> <128a885f0705191321k32354ecdnafb9912443b9367f@mail.gmail.com> <128a885f0705302130t628794e7v681dc02058244913@mail.gmail.com> <465E906F.1080704@maubp.freeserve.co.uk> Message-ID: <128a885f0706181224o609d4a28oe69cb12c5383d45b@mail.gmail.com> On 5/31/07, Peter wrote: > Chris Lasher wrote: > > I'm obviously missing another target, and BOSC 2007 is fast > > approaching. > > Are you going to BOSC 2007 Chris? I wish I were going to BOSC, but unfortunately, I will not go. 
I do plan on making it to SciPy for on the business of Software Carpentry, though. Are any Biopythonistas going to either? > > I'm being held up by 4 files that are in the CVS > > repository that were foolishly committed with carriage returns (i.e., > > "\r") in the filenames. How that's possible, I have no clue, but I > > need to alter the data in the CVS repository so those filenames are > > correct, or otherwise completely removed, over the entire history of > > those files. Does anyone have any experience with the internals of CVS > > repositories? I definitely do not. > > How strange! I have no experience with the internals of CVS so can't > help you there. What are the four offending files? Maybe we could just > purge them for the move to SVN. Renaming the files turns out to solve this problem. Yet again, I should have tried the simplest solution first. Thanks go to mhagger on #cvs2svn on freenode for helping with this. > Also, I suspect (but have not checked this) that a few of the examples > files in the unit tests have been checked in as binary files rather than > text (due to some odd differences in new lines across platforms). Again, > a CVS expert would probably be able to generate a list of all "binary" > files in the repository fairly easily. Good to know. Anyone have any experience here? 
Chris From bugzilla-daemon at portal.open-bio.org Tue Jun 19 13:58:44 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 19 Jun 2007 09:58:44 -0400 Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely In-Reply-To: Message-ID: <200706191358.l5JDwitq023040@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2268 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #21 from mdehoon at ims.u-tokyo.ac.jp 2007-06-19 09:58 EST ------- Committed to CVS; thanks for noticing this bug. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython-dev at maubp.freeserve.co.uk Tue Jun 19 17:24:40 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 19 Jun 2007 18:24:40 +0100 Subject: [Biopython-dev] Subversion Repository / BOSC 2007 In-Reply-To: <128a885f0706181224o609d4a28oe69cb12c5383d45b@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <464A07BB.8020206@maubp.freeserve.co.uk> <128a885f0705191321k32354ecdnafb9912443b9367f@mail.gmail.com> <128a885f0705302130t628794e7v681dc02058244913@mail.gmail.com> <465E906F.1080704@maubp.freeserve.co.uk> <128a885f0706181224o609d4a28oe69cb12c5383d45b@mail.gmail.com> Message-ID: <46781158.4090708@maubp.freeserve.co.uk> Chris Lasher wrote: > On 5/31/07, Peter wrote: >> Chris Lasher wrote: >>> I'm obviously missing another target, and BOSC 2007 is fast >>> approaching. >> Are you going to BOSC 2007 Chris? 
> > I wish I were going to BOSC, but unfortunately, I will not go. I do > plan on making it to SciPy for on the business of Software Carpentry, > though. Are any Biopythonistas going to either? Iddo and I will be there this year, although he is going to be spending most of his time at the AFP/Biosapiens SIG. I had better start thinking about the Biopython project talk... http://www.open-bio.org/wiki/BOSC_2007 Peter From sbassi at gmail.com Wed Jun 20 22:10:23 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Wed, 20 Jun 2007 19:10:23 -0300 Subject: [Biopython-dev] Code to submit: CRC64 Message-ID: Hello, I don't have write access to CVS so I post this code here. I included CRC64 checksum, it is used in several genomic databases. -- Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 -------------- next part -------------- A non-text attachment was scrubbed... Name: utils.py Type: text/x-python Size: 4803 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: utilsDIFF Type: application/octet-stream Size: 1412 bytes Desc: not available URL: From biopython-dev at maubp.freeserve.co.uk Thu Jun 21 13:02:39 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Thu, 21 Jun 2007 14:02:39 +0100 Subject: [Biopython-dev] Code to submit: CRC64 In-Reply-To: References: Message-ID: <320fb6e00706210602m103e9d6ehe3cd90f42f8d2c75@mail.gmail.com> Sebastian Bassi wrote: > Hello, > > I don't have write access to CVS so I post this code here. > I included CRC64 checksum, it is used in several genomic databases. HI Sebastian, Please could you fill an enhancement bug, and attach the code to it - it makes keeping track of requests and patches much easier. Could you also give a couple of examples of how you might use this? In typical usage, does the case of the sequences matter? 
As it stands it would be up to the user to adjust the case before calling the CRC64 function. Looking at the code, it looks like it would fail when used on sequences (Seq objects) where the "letters" are not single characters (e.g. sequences using the three letter amino acid codes). This is probably not a big problem. Regarding the implementation, I'm not sure if Bio/utils.py is the best place - anyone? You introduce a few "top level" variables:
POLY64REVh = 0xd8000000L
CRCTableh = [0] * 256
CRCTablel = [0] * 256
isInitialized = False
I think it would be better if their names started with an underscore to mark them as "private" to the utils.py module. I would also use a CRC64 prefix to make it explicit what they are for, given that the utils.py file contains a range of different functions. In particular, the name isInitialized is too vague and uses mixed case. Maybe:
_CRC64_POLY64REVh = 0xd8000000L
_CRC64_tableh = [0] * 256
_CRC64_tablel = [0] * 256
_CRC64_initialized = False
Peter P.S. You misspelt recipe in the comments. From sbassi at gmail.com Thu Jun 21 13:57:49 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Thu, 21 Jun 2007 10:57:49 -0300 Subject: [Biopython-dev] Code to submit: CRC64 In-Reply-To: <320fb6e00706210602m103e9d6ehe3cd90f42f8d2c75@mail.gmail.com> References: <320fb6e00706210602m103e9d6ehe3cd90f42f8d2c75@mail.gmail.com> Message-ID: On 6/21/07, Peter wrote: > Please could you file an enhancement bug, and attach the code to it - By attach do you mean to include it in the "description" field? Or is there an attach option in the bug report form that I am missing? > it makes keeping track of requests and patches much easier. > Could you also give a couple of examples of how you might use this? 1) Check if the data you have is the same as data in a public DB without downloading the whole sequences: just download the CRC info, calculate the CRC of your local sequences, and compare them. There is a chance of a random match, but it is very low.
2) You have your own sequences and want to store them in fasta format and want to include CRC64 in the description, to retrieve it later to check for consistency. > In typical usage, does the case of the sequences matter? As it stands Case matters. AA is checksumed in uppercase and DNA in lowercase. I will see if I can force this for seq objects (and leave it alone if it is a plain string). > Looking at the code, it looks like it would fail when used on > sequences (Seq objects) where the "letters" are non single characters > (e.g. sequences using the three letter amino acid codes). This is > probably not a big problem. CRC is always calculated in one letter code. I will correct the other problems. Best, SB. -- Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From biopython-dev at maubp.freeserve.co.uk Thu Jun 21 15:37:05 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Thu, 21 Jun 2007 16:37:05 +0100 Subject: [Biopython-dev] Code to submit: CRC64 In-Reply-To: References: <320fb6e00706210602m103e9d6ehe3cd90f42f8d2c75@mail.gmail.com> Message-ID: <467A9B21.7030308@maubp.freeserve.co.uk> Sebastian Bassi wrote: > On 6/21/07, Peter wrote: >> Please could you fill an enhancement bug, and attach the code to it - > > By attach do you mean to include it into the "description" field? Or > is there an attach option in the bug report form that I am missing? You have to file the bug first, and then you can attach files afterwards. Its a bit odd - possibly a limitation in bugzilla, or just the way its setup here. > 1) Check if the data you have is the same as data in a public DB > without downloading the whole sequences, just download the CRC info > and calculate the CRC with your local sequences and compare them. > There are chances by a random match but it's very low. Could you give a couple of explicit examples (URLs), as I personally don't remember ever noticing CRC information. 
> 2) You have your own sequences and want to store them in fasta format > and want to include CRC64 in the description, to retrieve it later to > check for consistency. Nice example; this should work on any sequence format that can store annotations (provided the CRC64 is calculated purely from the sequence). >> In typical usage, does the case of the sequences matter? > > Case matters. AA is checksumed in uppercase and DNA in lowercase. I > will see if I can force this for seq objects (and leave it alone if it > is a plain string). Seq objects can have upper or lower case letters (or a mixture) regardless of the alphabet. Rather than writing some complicated code to convert amino acids into upper case and DNA into lowercase, maybe just state these conventions in the function's doc string at leave it up to the user. >> Looking at the code, it looks like it would fail when used on >> sequences (Seq objects) where the "letters" are not single characters >> (e.g. sequences using the three letter amino acid codes). This is >> probably not a big problem. > > CRC is always calculated in one letter code. Fine. I would state this explicitly in the function's doc string (the comment at the beginning which documents the arguments etc). Peter From sbassi at gmail.com Thu Jun 21 18:06:29 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Thu, 21 Jun 2007 15:06:29 -0300 Subject: [Biopython-dev] Code to submit: CRC64 In-Reply-To: <467A9B21.7030308@maubp.freeserve.co.uk> References: <320fb6e00706210602m103e9d6ehe3cd90f42f8d2c75@mail.gmail.com> <467A9B21.7030308@maubp.freeserve.co.uk> Message-ID: On 6/21/07, Peter wrote: > You have to file the bug first, and then you can attach files > afterwards. Its a bit odd - possibly a limitation in bugzilla, or just > the way its setup here. OK, I am working on it. > Could you give a couple of explicit examples (URLs), as I personally > don't remember ever noticing CRC information. 
I noticed because the first bioinformatics program I ever used was DNAstar, and it included checksum information.
http://expasy.org/uniprot/P04293 (near the bottom, in Sequence information).
http://bioinformatics.anl.gov/seguid/overview.aspx (seguid proposed as a stronger ID than crc64).
http://lists.open-bio.org/pipermail/bioruby-cvs/2007-February.txt
On this page there are some formats using some kind of checksum (including crc64): http://www.ebi.ac.uk/help/formats_frame.html
BTW, I could also code the GCG checksum based on the BioPerl implementation. And from the Swiss-Prot manual: "The SQ (SeQuence header) line marks the beginning of the sequence data and gives a quick summary of its content. The format of the SQ line is:
SQ SEQUENCE XXXX AA; XXXXX MW; XXXXXXXXXXXXXXXX CRC64;
The line contains the length of the sequence in amino acids ('AA') followed by the molecular weight ('MW') rounded to the nearest mass unit (Dalton) and the sequence 64-bit CRC (Cyclic Redundancy Check) value ('CRC64')."
> Seq objects can have upper or lower case letters (or a mixture) > regardless of the alphabet. Rather than writing some complicated code to > convert amino acids into upper case and DNA into lowercase, maybe just > state these conventions in the function's doc string and leave it up to > the user. I am doing both versions for you to choose from. Best, SB.
-- Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From bugzilla-daemon at portal.open-bio.org Fri Jun 22 06:19:13 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 22 Jun 2007 02:19:13 -0400 Subject: [Biopython-dev] [Bug 2323] New: New functions: GCG Checksum and CRC64 Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2323 Summary: New functions: GCG Checksum and CRC64 Product: Biopython Version: 1.43 Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P5 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: sbassi at gmail.com Functions to calculate CRC64 and GCG Checksum. CRC64 is used by Uniprot and GCG Checksum is used by GCG software. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jun 22 06:21:16 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 22 Jun 2007 02:21:16 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706220621.l5M6LGmt022191@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #1 from sbassi at gmail.com 2007-06-22 02:21 EST ------- Created an attachment (id=689) --> (http://bugzilla.open-bio.org/attachment.cgi?id=689&action=view) Proposed functions (CRC64 and GCG checksum) This could be in utils.py, but I am not sure. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
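For readers following the thread, the CRC64 value discussed here (the one printed on the Swiss-Prot SQ line) can be sketched roughly as below. This is a minimal bit-by-bit illustration written for current Python, not the code attached to the bug: it assumes a reflected CRC with a zero initial value and a 64-bit reversed polynomial whose high word is the 0xd8000000 constant quoted earlier in the thread. The names crc64 and crc64_hex and the 16-digit hex formatting are choices made for this sketch only.

```python
# Reversed 64-bit polynomial; its high 32 bits match the POLY64REVh
# constant quoted in the thread (assumption: low 32 bits are zero).
POLY64REV = 0xD800000000000000


def crc64(sequence):
    """Return a 64-bit CRC of a one-letter-code sequence as an integer.

    Case matters: the caller is expected to normalise the case first.
    """
    crc = 0  # assumed initial value of zero
    for byte in sequence.encode("ascii"):
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ POLY64REV
            else:
                crc >>= 1
    return crc


def crc64_hex(sequence):
    """Format the CRC as 16 upper-case hex digits (formatting is illustrative)."""
    return "%016X" % crc64(sequence)


if __name__ == "__main__":
    print(crc64_hex("MKVLAAGIVGL"))
```

A table-driven version, like the CRCTableh/CRCTablel approach discussed above, computes the same value one byte at a time instead of one bit at a time.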
From bugzilla-daemon at portal.open-bio.org Sat Jun 23 10:52:11 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 23 Jun 2007 06:52:11 -0400 Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup In-Reply-To: Message-ID: <200706231052.l5NAqBX9011089@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2269 ------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2007-06-23 06:52 EST ------- I looked at the Python files in the Python standard library. Some have the shebangs, some don't. The ones that do usually have the "if __name__ == '__main__'" check, but not always, and some files have the "if __name__ == '__main__'" check but no shebang. I guess there's no strong rule in Python for shebang lines, except that when they exist, they're always of the "#!/usr/bin/env python" form. So I agree now with Chris' original proposal to change all shebangs to the "#!/usr/bin/env python" form. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jun 24 05:04:21 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 24 Jun 2007 01:04:21 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706240504.l5O54LqE027096@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #2 from sbassi at gmail.com 2007-06-24 01:04 EST ------- I spotted a bug in my code. The code works under Python>=2.4. In Python 2.3, it raises a syntax error. Should I make it Python 2.3 compatible? I guess so, therefore I will submit another version.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jun 24 13:47:24 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 24 Jun 2007 09:47:24 -0400 Subject: [Biopython-dev] [Bug 1982] Patch to BioSQL/Loader.py In-Reply-To: Message-ID: <200706241347.l5ODlO4q031404@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1982 ------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp 2007-06-24 09:47 EST ------- I have committed your patch with some small modifications to CVS. Could you check if everything still works for you? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jun 24 13:50:48 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 24 Jun 2007 09:50:48 -0400 Subject: [Biopython-dev] [Bug 2324] New: Data file for the Bio.UniGene unit test is missing. Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2324 Summary: Data file for the Bio.UniGene unit test is missing. 
Product: Biopython
Version: 1.43
Platform: All
OS/Version: All
Status: NEW
Severity: critical
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: mdehoon at ims.u-tokyo.ac.jp

A missing file for the Bio.UniGene unit test causes the following error when running the Biopython test suite:

======================================================================
ERROR: test_UniGene
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 149, in runTest
    self.runSafeTest()
  File "run_tests.py", line 162, in runSafeTest
    cur_test = __import__(self.test_name)
  File "/Users/mdehoon/biopython/Tests/test_UniGene.py", line 5, in
    handle = open("UniGene/Mdm_partial.data")
IOError: [Errno 2] No such file or directory: 'UniGene/Mdm_partial.data'

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Sun Jun 24 13:56:05 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 24 Jun 2007 09:56:05 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200706241356.l5ODu5ri031819@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323

------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp 2007-06-24 09:56 EST -------
Note that something similar already exists in Bio/crc.py.

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sun Jun 24 16:35:20 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 24 Jun 2007 12:35:20 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706241635.l5OGZKtp006088@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #4 from sbassi at gmail.com 2007-06-24 12:35 EST ------- (In reply to comment #3) > Note that something similar already exists in Bio/crc.py. > Oops, so I will drop the CRC64 function. The GCG checksum could be included in Bio/crc.py. I will make a new version of the file crc.py with the GCG checksum included. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jun 24 21:47:05 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 24 Jun 2007 17:47:05 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706242147.l5OLl5qq019715@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 sbassi at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #689 is|0 |1 obsolete| | ------- Comment #5 from sbassi at gmail.com 2007-06-24 17:47 EST ------- Created an attachment (id=690) --> (http://bugzilla.open-bio.org/attachment.cgi?id=690&action=view) New version of GCG_checksum and SEGUID This version drops crc64 since it was already in Biopython, fixes GCG_checksum and adds a new checksum: SEGUID. This code could be in Bio/crc.py together with crc64.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jun 25 12:38:32 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 25 Jun 2007 08:38:32 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706251238.l5PCcW9g027177@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2007-06-25 08:38 EST ------- Can you give a real-life example of how you would use this code? Maybe that would give us some hint on what the best place for this code is. I agree that GCG_checksum and SEGUID should be together with crc64, but I am not sure if a separate Bio/crc.py module containing these three functions is ideal. It may also be a good idea to rename GCG_checksum to gcg for consistency with crc64 and seguid. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Jun 26 21:45:36 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 26 Jun 2007 17:45:36 -0400 Subject: [Biopython-dev] [Bug 2324] Data file for the Bio.UniGene unit test is missing.
In-Reply-To: Message-ID: <200706262145.l5QLjar5013517@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2324 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-06-26 17:45 EST ------- Checked in missing file, which I should have done 7 weeks ago when I added test_UniGene.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jun 27 14:13:41 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 27 Jun 2007 10:13:41 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706271413.l5REDffl020255@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #7 from sbassi at gmail.com 2007-06-27 10:13 EST ------- Response to comment #6: Problem 1: There are two FASTA files, each with several sequences, and I want to check whether any sequences match between the two files. IDs can't be used for comparison since the data comes from different sources and there is no correlation between them. The sequences themselves must be compared. Solution: To avoid comparing whole sequences and to make comparisons faster, I work with a small digest of each sequence. The SEGUID algorithm is based on SHA-1, which is designed to have the following property: "it is computationally not feasible to find two different messages which produce the same message digest."
(see http://bioinformatics.anl.gov/seguid/overview.aspx for more information on seguid)

===========================================================
from Bio import SeqIO

seq1=set()
handle=open("14gustavoUniprot.fas","r")
for record in SeqIO.parse(handle,"fasta"):
    seq1.add(seguid(record.seq))
handle.close()

seq2=set()
handle=open("pdbaa","r")
for record in SeqIO.parse(handle,"fasta"):
    seq2.add(seguid(record.seq))
handle.close()

shared_elements=seq1.intersection(seq2)

handle=open("14gustavoUniprot.fas","r")
for record in SeqIO.parse(handle,"fasta"):
    if seguid(record.seq) in shared_elements:
        print record.id
handle.close()
===========================================================
Output:
P00700|LYSC_COLVI
P02185|MYG_PHYCA
P03521|NCAP_VSIVA
P04050|RPB1_YEAST
P05803|NRAM_IAWHM
P0A5Y6|INHA_MYCTU
P0A5Y7|INHA_MYCBO
P0AA04|PTHP_ECOLI
P0AE72|CHPR_ECOLI
P0C0S5|H2AZ_HUMAN
P14223|ALF_PLAFA
P17313|VG31_BPT4
P17670|SODF_MYCTU
P19821|DPO1_THEAQ
P25786|PSA1_HUMAN
P31939|PUR9_HUMAN
P62314|SMD1_HUMAN
P62826|RAN_HUMAN
Q08129|CATA_MYCTU
Q99497|PARK7_HUMAN
===========================================================

Problem 2: I want to include GCG Checksum information in the description field. As an integrity check and for comparison against other GCG files.

Solution: Read the input file, add the GCG checksum into the description and write the output file in FASTA format.

===========================================================
from Bio import SeqIO

seqs=[]
handle=open("14gustavoUniprot.fas","r")
for record in SeqIO.parse(handle,"fasta"):
    record.description=record.description+" "+str(gcg(record.seq))
    seqs.append(record)
handle.close()

output_handle = open("uniprotGCG.fas", "w")
SeqIO.write(seqs, output_handle, "fasta")
output_handle.close()

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
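The digest-based comparison in comment #7 can be tried without Biopython or the attached patch. The sketch below is only an illustration for current Python: it assumes SEGUID is the base64-encoded SHA-1 digest of the upper-cased sequence with the trailing "=" padding removed, and it works on plain (identifier, sequence) pairs instead of FASTA files. The helper name shared_ids and the sample records are made up for this example.

```python
import base64
import hashlib


def seguid(sequence):
    """SEGUID-style digest: SHA-1 of the upper-cased sequence, base64, no '=' padding.

    Assumption: this mirrors the published SEGUID definition; it is not
    the code from the bug attachment.
    """
    digest = hashlib.sha1(str(sequence).upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def shared_ids(records_a, records_b):
    """Return identifiers from records_a whose sequence also occurs in records_b.

    Both arguments are iterables of (identifier, sequence) pairs.
    """
    digests_b = set(seguid(seq) for _, seq in records_b)
    return [ident for ident, seq in records_a if seguid(seq) in digests_b]


if __name__ == "__main__":
    # Hypothetical records standing in for the two FASTA files.
    mine = [("seq1", "MKVLAAGIVGL"), ("seq2", "MSTNPKPQRK")]
    public = [("entryA", "mkvlaagivgl"), ("entryB", "MAAAAAAAAA")]
    print(shared_ids(mine, public))
```

Storing only the short digests keeps the second data set small in memory, which is the point made in the comment; the set lookup replaces a pairwise comparison of full sequences.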
From bugzilla-daemon at portal.open-bio.org Wed Jun 27 14:20:17 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 27 Jun 2007 10:20:17 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706271420.l5REKHUo020860@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 sbassi at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #690 is|0 |1 obsolete| | ------- Comment #8 from sbassi at gmail.com 2007-06-27 10:20 EST ------- Created an attachment (id=691) --> (http://bugzilla.open-bio.org/attachment.cgi?id=691&action=view) gcg and seguid functions Modified version of gcg and seguid. It works for both old and new (>=2.4) python versions and for both str or Seq argument. gcg function requires importing gcg24.py in Python 2.4+. This way I avoid to use "eval" to make it faster. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jun 27 14:21:30 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 27 Jun 2007 10:21:30 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706271421.l5RELUb7020932@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #9 from sbassi at gmail.com 2007-06-27 10:21 EST ------- Created an attachment (id=692) --> (http://bugzilla.open-bio.org/attachment.cgi?id=692&action=view) gcg function for >=Python2.4. It should be used from crc.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
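The gcg function mentioned in these comments is not reproduced in the archive. As a rough guide, the GCG checksum is usually described as a position-weighted sum of character codes, with the position counter restarting after every 57 residues and the total reduced modulo 10000. The sketch below follows that description for current Python; it is an assumption-laden illustration, not the attached gcg24.py.

```python
def gcg(sequence):
    """GCG-style checksum of a sequence given as a string (or str()-able object).

    Assumed definition: sum of position * ord(upper-case residue), where the
    position runs 1..57 and then restarts, taken modulo 10000.
    """
    index = 0
    checksum = 0
    for char in str(sequence):
        index += 1
        checksum += index * ord(char.upper())
        if index == 57:
            index = 0  # restart the position weight after 57 residues
    return checksum % 10000


if __name__ == "__main__":
    print(gcg("ACGTACGTACGT"))
```

Because the residues are upper-cased before summing, the result in this sketch does not depend on the case of the input, and str() lets it accept either a plain string or a sequence object that converts to one.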
From bugzilla-daemon at portal.open-bio.org Wed Jun 27 14:52:40 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 27 Jun 2007 10:52:40 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706271452.l5REqesw023253@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2007-06-27 10:52 EST ------- I liked the example in comment 7. And yes, we do still try and support "older" versions of Python - our download pages states python 2.3 or later (which I myself use on Windows). I don't know why you attached the pyc file, these are normally generated automatically by python from python scripts (and are platform specific): http://bugzilla.open-bio.org/attachment.cgi?id=692 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jun 27 15:44:30 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 27 Jun 2007 11:44:30 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706271544.l5RFiUa4028039@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 sbassi at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #692 is|0 |1 obsolete| | ------- Comment #11 from sbassi at gmail.com 2007-06-27 11:44 EST ------- Created an attachment (id=693) --> (http://bugzilla.open-bio.org/attachment.cgi?id=693&action=view) gcg function for >=Python2.4. It should be used from crc.py Replaces .pyc file I uploaded by mistake. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jun 27 18:48:01 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 27 Jun 2007 14:48:01 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706271848.l5RIm1Wn007412@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 sbassi at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #691 is|0 |1 obsolete| | ------- Comment #12 from sbassi at gmail.com 2007-06-27 14:48 EST ------- Created an attachment (id=694) --> (http://bugzilla.open-bio.org/attachment.cgi?id=694&action=view) gcg and seguid functions This version removes some debugging statements. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jun 29 16:27:07 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 29 Jun 2007 12:27:07 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200706291627.l5TGR7PG002534@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #13 from mdehoon at ims.u-tokyo.ac.jp 2007-06-29 12:27 EST ------- I'm thinking that Bio/SeqUtils might be a good place for this code, together with the code that is now in Bio/crc.py. Then we would have a file CheckSum.py in Bio/SeqUtils containing gcg, seguid, and crc64. Any objections? I'd prefer Bio/SeqUtils over Bio/utils.py. 
The latter largely overlaps what is already in Bio/Seq.py and Bio/SeqUtils
and might need some cleanup.

We should also add your example from #7 to the manual.

From bugzilla-daemon at portal.open-bio.org Fri Jun 29 16:57:28 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 29 Jun 2007 12:57:28 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200706291657.l5TGvSIo004048@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323

------- Comment #14 from sbassi at gmail.com 2007-06-29 12:57 EST -------
(In reply to comment #13)
> I'm thinking that Bio/SeqUtils might be a good place for this code, together
> with the code that is now in Bio/crc.py. Then we would have a file CheckSum.py
> in Bio/SeqUtils containing gcg, seguid, and crc64. Any objections? I'd prefer

It's OK for me.

> We should also add your example from #7 to the manual.

I could add it to the wiki, but only after the code is in its place in CVS,
so that the sample refers to the proper module.

From mdehoon at c2b2.columbia.edu Sat Jun 30 06:48:42 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sat, 30 Jun 2007 15:48:42 +0900
Subject: [Biopython-dev] TempFastaWriter, TempFastaWriterSingle in Bio/GFF/easy.py
Message-ID: <4685FCCA.4090904@c2b2.columbia.edu>

Hi everybody,

Is anybody using the classes TempFastaWriter, TempFastaWriterSingle in
Bio/GFF/easy.py? They are currently using the old Fasta writer in
Bio.SeqIO.FASTA instead of the new one in Bio.SeqIO.FastaIO.
We can either update them to use the new Fasta writer, or simply remove them,
since currently these classes are not used anywhere in Biopython.

--Michiel.

From biopython-dev at maubp.freeserve.co.uk Sat Jun 30 19:14:44 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sat, 30 Jun 2007 21:14:44 +0200
Subject: [Biopython-dev] TempFastaWriter, TempFastaWriterSingle in Bio/GFF/easy.py
In-Reply-To: <4685FCCA.4090904@c2b2.columbia.edu>
References: <4685FCCA.4090904@c2b2.columbia.edu>
Message-ID: <320fb6e00706301214p41c33329o98126738d77fad19@mail.gmail.com>

> Is anybody using the classes TempFastaWriter, TempFastaWriterSingle in
> Bio/GFF/easy.py? They are currently using the old Fasta writer in
> Bio.SeqIO.FASTA instead of the new one in Bio.SeqIO.FastaIO. We can
> either update them to use the new Fasta writer, or simply remove them,
> since currently these classes are not used anywhere in Biopython.

This is for Bug 2284, right?
http://bugzilla.open-bio.org/show_bug.cgi?id=2284

I'm inclined to remove the classes TempFastaWriter and TempFastaWriterSingle.

Peter

From jacobporter2002 at yahoo.com Sat Jun 30 20:25:57 2007
From: jacobporter2002 at yahoo.com (Jacob Porter)
Date: Sat, 30 Jun 2007 13:25:57 -0700 (PDT)
Subject: [Biopython-dev] I'd like to help out
Message-ID: <296434.41950.qm@web33706.mail.mud.yahoo.com>

Hello,

I am an undergraduate at UC Berkeley in Applied Mathematics with
computational biology, going on to graduate school at UC Davis in Applied
Mathematics, and I would like to help out with the Biopython project. I've
taken a compiler class with Python, and I'm pretty familiar with the core of
Python. I work creating scripts in Python. I've done a little bit of work in
phylogenomics, and that information can be found at this website:
http://www.math.tamu.edu/~lgp/small-trees/ . I wrote Maple code to perform
many of the computations on that website.
I need to keep track of the number of hours that I work, and I would like a
signed document stating how many hours I worked once I am done.

Perhaps I can start out by creating RPMs for SUSE Linux. Other than that, I
would like to work on projects for phylogenomics. I can also do other related
things. If anyone has projects, please let me know. If anyone has information
about RPMs and Biopython, or if you are responsible for that, please let me
know.