From bugzilla-daemon at portal.open-bio.org Tue May 1 08:01:49 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 May 2007 08:01:49 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705011201.l41C1nXg017300@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2268

------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp 2007-05-01 08:01 EST -------
Chris,

I was not able to replicate this bug on any of the platforms I've tried so far
(32-bit Windows, Mac OS X, Unix, Linux). However, since it does occur on your
system, I still feel that this is a true bug that should be fixed. Would you be
willing to compile and run some test cases on your platform to find the source
of this problem?

One possibility is that the k-means algorithm gets stuck in an infinite
(periodic) loop in which genes are assigned back and forth between clusters. I
thought that with the current implementation that was no longer possible, but
maybe there is some case that I overlooked. Since the k-means algorithm starts
from a random initial state, maybe on your platform it starts from some funny
initial state that doesn't appear on the other platforms, causing this bug to
appear on your platform only.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
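
[Editor's sketch] The periodic-loop hypothesis in comment #8 can be illustrated
with a toy example. This is not the Bio.Cluster C code; kmeans_1d and all of its
parameters are hypothetical names invented for this note, and it is written for
current Python 3. It stops when an assignment of points to clusters has been
seen before, which covers both normal convergence and a back-and-forth cycle,
and it never relies on comparing floating-point totals.

```python
import random


def kmeans_1d(values, k, seed=0, max_iter=1000):
    """Toy 1-D k-means that stops when a cluster assignment repeats.

    Illustration only: names and defaults are assumptions, not Bio.Cluster's.
    """
    rng = random.Random(seed)
    # Random initial clustering, as described in the comment above.
    assign = [rng.randrange(k) for _ in values]
    seen = {tuple(assign)}
    for iteration in range(1, max_iter + 1):
        # Recompute each centroid as the mean of its current members.
        centroids = []
        for c in range(k):
            members = [v for v, a in zip(values, assign) if a == c]
            if members:
                centroids.append(sum(members) / len(members))
            else:
                # Empty cluster: re-seed it from a randomly chosen data point.
                centroids.append(rng.choice(values))
        # Reassign every point to its nearest centroid.
        assign = [
            min(range(k), key=lambda c: abs(v - centroids[c]))
            for v in values
        ]
        state = tuple(assign)
        if state in seen:
            # Same state as before: either converged or caught in a cycle.
            return assign, iteration
        seen.add(state)
    return assign, max_iter


labels, n_iter = kmeans_1d([1.0, 1.1, 0.9, 10.0, 10.2, 9.8], k=2)
print(labels, n_iter)
```

Because the stopping rule compares integer assignments, it does not depend on
how the platform rounds floating-point sums; the max_iter cap is a second
safety net.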

From bugzilla-daemon at portal.open-bio.org Tue May 1 14:31:06 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 May 2007 14:31:06 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705011831.l41IV6ZU000918@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2268

------- Comment #9 from chris.lasher at gmail.com 2007-05-01 14:31 EST -------
I'd definitely be willing to run any tests. Just to note, I am not the one who
discovered this bug; I was only the one who filed it. Credit for discovering it
goes to Alex Lancaster, who sent notification of this to the Biopython mailing
list on April 11th (see ). That was on a Fedora Core installation, so this is
not specific to 32-bit Ubuntu.

Could this involve the source of the Numeric and mxTextTools packages? I
installed Numeric Python and eGenix mxTextTools from the Ubuntu distribution
packages, rather than from direct sources for both software packages. I can't
see why this would make a difference, but it is something to consider. Also,
there's a possibility that I don't have all the required software, but I did
not get any warnings when installing from CVS.

From bugzilla-daemon at portal.open-bio.org Tue May 1 14:48:35 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 May 2007 14:48:35 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705011848.l41ImZOB001726@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2268

------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-01 14:48 EST -------
Checking the version of Numeric may be worthwhile - I recall from the MMTK
mailing list that some versions appeared to cause subtle bugs. In late 2005
Konrad Hinsen was suggesting MMTK users downgrade from version 24 to version
23, but I don't know if he ever pinned down what the problem was (or indeed, if
there really was a problem).

From bugzilla-daemon at portal.open-bio.org Tue May 1 15:06:30 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 May 2007 15:06:30 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705011906.l41J6UED002525@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2268

------- Comment #11 from chris.lasher at gmail.com 2007-05-01 15:06 EST -------
(In reply to comment #10)
> Checking the version of Numeric may be worthwhile - I recall from the MMTK
> mailing list that some versions appeared to cause subtle bugs. In late 2005
> Konrad Hinsen was suggesting MMTK users downgrade from version 24 to version
> 23, but I don't know if he ever pinned down what the problem was (or indeed, if
> there really was a problem).
>

On Dapper Drake, Edgy Eft and Feisty Fawn, the Numeric packages are 24.2.

From bugzilla-daemon at portal.open-bio.org Tue May 1 15:50:47 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 May 2007 15:50:47 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705011950.l41JolgE004634@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2268

------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-01 15:50 EST -------
For reference, on my 64bit Ubuntu Dapper Drake system (where test_Cluster.py
works) I have the following packages installed:

python                     2.4.2-0ubuntu3
python-reportlab           1.20debian-3ubuntu1
python-numeric             24.2-1ubuntu2
python-egenix-mxtexttools  2.0.6ubuntu1-1ubuntu4

i.e. Numeric 24.2 does work with test_Cluster.py for me.

From bugzilla-daemon at portal.open-bio.org Wed May 2 14:44:01 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 May 2007 14:44:01 -0400
Subject: [Biopython-dev] [Bug 2285] Creating Bio.AlignIO to cope with alignments like Bio.SeqIO does sequences
In-Reply-To:
Message-ID: <200705021844.l42Ii154024905@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2285

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-02 14:44 EST -------
Created an attachment (id=643)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=643&action=view)
ZIP file containing four python scripts to go in Bio/AlignIO/*.py

There is a follow up patch to Bio/SeqIO/__init__.py to basically use
Bio.AlignIO for reading/writing clustal, stockholm and phylip instead. The
corresponding parsers under Bio/SeqIO/*.py would then be removed.

I have not yet worked out what a Nexus file looks like when it holds more than
one alignment (if in fact this is possible).

From bugzilla-daemon at portal.open-bio.org Fri May 4 05:20:31 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 May 2007 05:20:31 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705040920.l449KVW3015656@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2268

------- Comment #13 from mdehoon at ims.u-tokyo.ac.jp 2007-05-04 05:20 EST -------
Chris,

I found one Linux system on which test_Cluster.py hangs in the call to kmedoids
instead of the call to kcluster. It turned out that this was due to a
floating-point comparison in the kmedoids function.
Since the same comparison occurs in the kcluster function, this may very well
be the reason test_Cluster.py hangs on your platform in the call to kcluster.
The comparison involves two floating-point variables which are bit-wise
identical to each other. However, variable1 <= variable2 returns False.

Could you have a look at line 2071 in Bio/Cluster/cluster.c (Biopython release
1.43) and print out the two variables "total" and "previous"? (You may find
that test_Cluster.py no longer hangs when you add the printf statement; at
least that is what happened with the call to kmedoids). If total and previous
have the same value, but total>=previous returns False, then that would explain
why the call to kcluster hangs.

From biopython-dev at maubp.freeserve.co.uk Mon May 7 10:16:38 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Mon, 07 May 2007 15:16:38 +0100
Subject: [Biopython-dev] Unified alignment input/output, Bio.AlignIO?
In-Reply-To: <463240F9.8010907@maubp.freeserve.co.uk>
References: <463240F9.8010907@maubp.freeserve.co.uk>
Message-ID: <463F34C6.90008@maubp.freeserve.co.uk>

Peter wrote:
> Following the release of Biopython 1.43 with Bio.SeqIO, I would like to
> do a better job for multiple sequence alignment file formats - creating
> a new module Bio.AlignIO
>
> While most multiple sequence alignment files usually contain a single
> alignment (made up of multiple sequences), this is not the general case.
>
> In the PHYLIP suite, concatenated alignments in phylip format are
> produced by the seqboot program for tasks like bootstrapping of a
> phylogenetic tree. Currently SeqIO chokes on these!
>
> Another example is that the output of some of the EMBOSS programs can
> contain many multiple sequence alignments, for example the water and
> needle tools can produce many pairwise alignments.
>
> In such cases, being able to write code like the following seems to be
> the logical extension of the Bio.SeqIO style we have agreed on:
>
> from Bio import AlignIO
> for alignment in AlignIO.parse("many.phy", "phylip") :
>     print "Alignment with %i sequences of length %i" \
>           % (len(alignment.get_all_seqs()),
>              alignment.get_alignment_length())
>     ...
>
> i.e. The AlignIO.parse() function would be an iterator returning
> alignment objects. Does this sound reasonable so far?

I have pressed ahead with this; there is a version attached to bug 2285
http://bugzilla.open-bio.org/show_bug.cgi?id=2285

This handles reading and writing of clustal, phylip, stockholm/pfam. I have
not yet converted the Bio.SeqIO Nexus parser. Also, I plan to add a parser for
reading the EMBOSS alignment format.

As a side effect, this will actually remove a lot of the Bio.SeqIO code, as
handling any alignment file can be delegated to Bio.AlignIO instead.

Would anyone like to comment on the scheme?

Peter

From bugzilla-daemon at portal.open-bio.org Mon May 7 13:45:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 May 2007 13:45:32 -0400
Subject: [Biopython-dev] [Bug 2285] Creating Bio.AlignIO to cope with alignments like Bio.SeqIO does sequences
In-Reply-To:
Message-ID: <200705071745.l47HjWGl031779@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2285

biopython-bugzilla at maubp.freeserve.co.uk changed:

              What |Removed                      |Added
----------------------------------------------------------------------------
Attachment #643 is |0                            |1
          obsolete |                             |
        AssignedTo |biopython-dev at biopython.org |biopython-bugzilla at maubp.freeserve.co.uk
            Status |NEW                          |ASSIGNED

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-07 13:45 EST -------
Created an attachment (id=646)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=646&action=view)
ZIP file containing four python scripts to go in Bio/AlignIO/*.py

Misc updates to previous version

From bugzilla-daemon at portal.open-bio.org Mon May 7 15:42:15 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 May 2007 15:42:15 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705071942.l47JgFi3004609@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2268

------- Comment #14 from chris.lasher at gmail.com 2007-05-07 15:42 EST -------
Created an attachment (id=648)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=648&action=view)
modified_Cluster.c_output.txt

This is output from Cluster.c modified with a printf statement prior to line
2071 for total and previous.

From bugzilla-daemon at portal.open-bio.org Mon May 7 15:56:27 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 May 2007 15:56:27 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705071956.l47JuRcu005295@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2268

------- Comment #15 from chris.lasher at gmail.com 2007-05-07 15:56 EST -------
(In reply to comment #13)
> Chris,
>
> Could you have a look at line 2071 in Bio/Cluster/cluster.c (Biopython release
> 1.43) and print out the two variables "total" and "previous"? (You may find
> that test_Cluster.py no longer hangs when you add the printf statement; at
> least that is what happened with the call to kmedoids). If total and previous
> have the same value, but total>=previous returns False, then that would explain
> why the call to kcluster hangs.
>

This did allow it to proceed up to test_distancematrix_kmedoids; however, it
once again reaches an infinite loop in this test. Additionally, the value for
"previous" reaches an enormous number and I suspect it's not supposed to. (See
the attached output.)

From bugzilla-daemon at portal.open-bio.org Mon May 7 19:06:19 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 May 2007 19:06:19 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705072306.l47N6JR3012976@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2268

------- Comment #16 from mdehoon at ims.u-tokyo.ac.jp 2007-05-07 19:06 EST -------
Thanks, Chris! Actually, this looks OK.
The kcluster routine runs the k-means algorithm 100 times starting from random
initial clusterings. On each run, total is initialized to DBL_MAX (the largest
number representable as a double). This is the huge number that is printed
(printf usually has trouble printing DBL_MAX nicely, so it may appear weird in
the output).

The same floating-point comparison that causes kcluster to hang also appears in
kmedoids, so it's no surprise that the code hangs there too. I'll write a patch
that avoids this floating-point comparison and post it here.

From bugzilla-daemon at portal.open-bio.org Wed May 9 09:48:11 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 May 2007 09:48:11 -0400
Subject: [Biopython-dev] [Bug 2289] New: LOCUS ss-cRNA => ERROR
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2289

           Summary: LOCUS ss-cRNA => ERROR
           Product: Biopython
           Version: 1.24
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: blocker
          Priority: P1
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: Daniel.Nicorici at gmail.com

When I am processing a GenBank file from NCBI I get this error:
=======================================================================
Traceback (most recent call last):
  File "F:\silvermine\tool\populator\ncbigenomic\source\python\do.py", line 26, in
    record = iterator.next()
  File "D:\Python25\lib\site-packages\Bio\GenBank\__init__.py", line 142, in next
    return self._parser.parse(self.handle)
  File "D:\Python25\lib\site-packages\Bio\GenBank\__init__.py", line 208, in parse
    self._scanner.feed(handle, self._consumer)
  File "D:\Python25\lib\site-packages\Bio\GenBank\Scanner.py", line 360, in feed
    self._feed_first_line(consumer, self.line)
  File "D:\Python25\lib\site-packages\Bio\GenBank\Scanner.py", line 782, in _feed_first_line
    'LOCUS line does not contain valid sequence type (DNA, RNA, ...):\n' + line
AssertionError: LOCUS line does not contain valid sequence type (DNA, RNA, ...):
LOCUS       NC_005236               1769 bp ss-cRNA     linear   VRL 26-FEB-2007
================================================================================

It seems that the error comes from the parser, which is not able to handle
ss-cRNA. If I replace ss-cRNA with ss-RNA then there is no error anymore.

Here is my python program which gives the error:
===========================================================
import glob
from Bio import GenBank

# the files which will be processed
path="G:\\Data\\NCBI\\genomic\\gbff\\temp\\complete*.genomic.gbff"

print "Starting..."

organism=[]
count_organism=[]
feature=[]
count_feature=[]
qualifier=[]
count_qualifier=[]

files = glob.glob(path)
for file in files:
    print ">>>>>>>>>>>>>>>>>>>>>>>>>> " + file + " <<<<<<<<<<<<<<<<<<<<<<<<<"
    parser = GenBank.RecordParser()
    #infile = open("complete1short.genomic.gbff")
    infile = open(file);
    iterator = GenBank.Iterator(infile, parser)
    record = iterator.next()
    while record is not None:
        print record.locus + " --- " + record.organism + " --- " + record.version
        # organism
        flag=0
        for b in range(len(organism)):
            if organism[b]==record.organism:
                count_organism[b]=count_organism[b]+1
                flag=1
                break
        if flag==0:
            organism.append(record.organism)
            count_organism.append(1)
        # features
        for a in range(len(record.features)):
            flag=0
            for b in range(len(feature)):
                if feature[b]==record.features[a].key:
                    count_feature[b]=count_feature[b]+1
                    flag=1
                    break
            if flag==0:
                feature.append(record.features[a].key)
                count_feature.append(1)
                #print "--" + record.features[i].key
            # qualifiers
            for c in range(len(record.features[a].qualifiers)):
                flag=0
                for b in range(len(qualifier)):
                    if qualifier[b]==record.features[a].qualifiers[c].key:
                        count_qualifier[b]=count_qualifier[b]+1
                        flag=1
                        break
                if flag==0:
                    qualifier.append(record.features[a].qualifiers[c].key)
                    count_qualifier.append(1)
                    #print "----" + record.features[i].qualifiers[j].key
        record=iterator.next()

print "===================ORGANISM========================"
for i in range(len(organism)):
    print organism[i] + "\t" + str(count_organism[i])
print "===================END_ORGANISM===================="
print "===================FEATURES========================"
for i in range(len(feature)):
    print feature[i] + "\t" + str(count_feature[i])
print "===================END_FEATURES===================="
print "===================QUALIFIERS========================"
for i in range(len(qualifier)):
    print qualifier[i] + "\t" + str(count_qualifier[i])
print "===================END_QUALIFIERS===================="
print "The End!!!"
x=raw_input("Press ENTER to continue...")
============================================================

From bugzilla-daemon at portal.open-bio.org Wed May 9 10:06:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 May 2007 10:06:32 -0400
Subject: [Biopython-dev] [Bug 2289] LOCUS ss-cRNA => ERROR
In-Reply-To:
Message-ID: <200705091406.l49E6WGi008294@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2289

biopython-bugzilla at maubp.freeserve.co.uk changed:

              What |Removed                      |Added
----------------------------------------------------------------------------
            Status |NEW                          |ASSIGNED

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-09 10:06 EST -------
Confirmed: the parser currently only accepts entries
'DNA','RNA','tRNA','mRNA','uRNA','snRNA','cDNA'.

Could you tell me where you got this GenBank file from? It would be helpful for
testing (and I may want to add a similar example to the test suite).

Thanks

From bugzilla-daemon at portal.open-bio.org Wed May 9 10:25:59 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 May 2007 10:25:59 -0400
Subject: [Biopython-dev] [Bug 2289] LOCUS ss-cRNA => ERROR
In-Reply-To:
Message-ID: <200705091425.l49EPxNf009285@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2289

------- Comment #2 from Daniel.Nicorici at gmail.com 2007-05-09 10:25 EST -------
Hello,

The entry ss-cRNA appears in the file:
ftp.ncbi.nih.gov/refseq/release/complete/complete72.genomic.gbff.gz

From bugzilla-daemon at portal.open-bio.org Wed May 9 10:48:30 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 May 2007 10:48:30 -0400
Subject: [Biopython-dev] [Bug 2289] Can't parse GenBank files with "ss-cRNA" in the LOCUS line
In-Reply-To:
Message-ID: <200705091448.l49EmU9D010377@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2289

biopython-bugzilla at maubp.freeserve.co.uk changed:

              What |Removed                      |Added
----------------------------------------------------------------------------
          Severity |blocker                      |normal
            Status |ASSIGNED                     |RESOLVED
        OS/Version |Windows XP                   |All
          Platform |PC                           |All
        Resolution |                             |FIXED
           Summary |LOCUS ss-cRNA => ERROR       |Can't parse GenBank files with "ss-cRNA" in the LOCUS line
           Version |1.24                         |Not Applicable

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-09 10:48 EST -------
See also Bug 2231.

With hindsight, checking against a known list of sequence types was too harsh.
It now just looks for the text "DNA" or "RNA" within this field of the LOCUS
line in GenBank files.

I've checked in a fix to CVS, and checked I can parse GenBank file NC_005236.

The simplest way to update your machine, Daniel, is to download and replace the
file D:\Python25\lib\site-packages\Bio\GenBank\Scanner.py with revision 1.11
from here:

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/GenBank/Scanner.py?cvsroot=biopython

There will be a slight time delay before the CVS web site updates itself - you
can of course get the file from CVS directly if you would rather.

Please let us know (on this bug) if that doesn't solve this problem.

From bugzilla-daemon at portal.open-bio.org Wed May 9 10:51:05 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 May 2007 10:51:05 -0400
Subject: [Biopython-dev] [Bug 2289] Can't parse GenBank files with "ss-cRNA" in the LOCUS line
In-Reply-To:
Message-ID: <200705091451.l49Ep5HC010511@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2289

biopython-bugzilla at maubp.freeserve.co.uk changed:

              What |Removed                      |Added
----------------------------------------------------------------------------
                CC |                             |biopython-bugzilla at maubp.freeserve.co.uk

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-09 10:51 EST -------
P.S. I have not tried the full file from here, as the FTP site was timing out.

ftp.ncbi.nih.gov/refseq/release/complete/complete72.genomic.gbff.gz (15 MB)

I just tried the single GenBank record for NC_005236.
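
[Editor's sketch] The relaxed check described in comment #3 (accept any
sequence-type field that contains the text "DNA" or "RNA") might look roughly
like the following. This is not the code from Scanner.py revision 1.11; the
function names are hypothetical, the field is extracted by a simple whitespace
split rather than by fixed columns, and the snippet is written for current
Python 3.

```python
def locus_sequence_type(locus_line):
    """Return the sequence-type field of a LOCUS line (assumed 5th token).

    Simplification: assumes the layout 'LOCUS name length bp type ...'.
    """
    fields = locus_line.split()
    if len(fields) < 5 or fields[0] != "LOCUS":
        raise ValueError("not a recognisable LOCUS line: %r" % locus_line)
    return fields[4]


def mentions_dna_or_rna(seq_type):
    """Relaxed test: the field only has to contain 'DNA' or 'RNA'."""
    return "DNA" in seq_type or "RNA" in seq_type


line = "LOCUS       NC_005236               1769 bp ss-cRNA     linear   VRL 26-FEB-2007"
seq_type = locus_sequence_type(line)
print(seq_type, mentions_dna_or_rna(seq_type))
```

A substring test accepts forms such as ss-cRNA without maintaining a list of
every known type, at the cost of also accepting unexpected strings that merely
contain "DNA" or "RNA".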

From bugzilla-daemon at portal.open-bio.org Wed May 9 10:57:06 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 May 2007 10:57:06 -0400
Subject: [Biopython-dev] [Bug 2289] Can't parse GenBank files with "ss-cRNA" in the LOCUS line
In-Reply-To:
Message-ID: <200705091457.l49Ev6L0010785@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2289

------- Comment #5 from Daniel.Nicorici at gmail.com 2007-05-09 10:57 EST -------
Here is the part of the file that generates the error:
=======================================================================
LOCUS       NC_005236               1769 bp ss-cRNA     linear   VRL 20-FEB-2007
DEFINITION  Seoul virus strain 80-39 segment S, complete sequence.
ACCESSION   NC_005236
VERSION     NC_005236.1  GI:38505529
PROJECT     GenomeProject:15027
KEYWORDS    .
SOURCE      Seoul virus
  ORGANISM  Seoul virus
            Viruses; ssRNA negative-strand viruses; Bunyaviridae; Hantavirus.
REFERENCE   1  (bases 1 to 1769)
  AUTHORS   Song,J.-W., Moon,J.Y., Baek,L.J. and Song,K.-J.
  TITLE     Genetic analysis of the full length of S segment of Seoul virus
            prototype, 80-39 strain
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 1769)
  CONSRTM   NCBI Genome Project
  TITLE     Direct Submission
  JOURNAL   Submitted (12-AUG-2004) National Center for Biotechnology
            Information, NIH, Bethesda, MD 20894, USA
REFERENCE   3  (bases 1 to 1769)
  AUTHORS   Song,J.-W., Moon,J.Y., Baek,L.J. and Song,K.-J.
  TITLE     Direct Submission
  JOURNAL   Submitted (09-APR-2003) Department of Microbiology, College of
            Medicine, Korea University, 5-ka, Anam-dong, Sungbuk-ku, Seoul
            136-705, Korea
COMMENT     PROVISIONAL REFSEQ: This record has not yet been subject to final
            NCBI review. The reference sequence was derived from AY273791.
            COMPLETENESS: full length.
FEATURES             Location/Qualifiers
     source          1..1769
                     /organism="Seoul virus"
                     /mol_type="viral cRNA"
                     /strain="80-39"
                     /isolation_source="Rattus norvegicus"
                     /db_xref="taxon:11608"
                     /segment="segment S"
                     /country="South Korea"
     gene            43..1332
                     /locus_tag="SEOVsSgp1"
                     /db_xref="GeneID:2943086"
     CDS             43..1332
                     /locus_tag="SEOVsSgp1"
                     /codon_start=1
                     /product="nucleocapsid protein"
                     /protein_id="NP_942556.1"
                     /db_xref="GI:38505530"
                     /db_xref="GeneID:2943086"
                     /translation="MATMEEIQREISAHEGQLVIARQKVKDAEKQYEKDPDDLNKRAL
                     HDRESVAASIQSKIDELKRQLADRIAAGKNIGQDRDPTGVEPGDHLKERSALSYGNTL
                     DLNSLDIDEPTGQTADWLTIIVYLTSFVVPIILKALYMLTTRGRQTSKDNKGMRIRFK
                     DDSSYEDVNGIRKPKHLYVSMPNAQSSMKAEEITPGRFRTAVCGLYPAQIKARNMVSP
                     VMSVVGFLALAKDWTSRIEEWLGAPCKFMAESPIAGSLSGNPVNRDYIRQRQGALAGM
                     EPKEFQALRQHSKDAGCTLVEHIESPSSIWVFAGAPDRCPPTCLFVGGMAELGAFFSI
                     LQDMRNTIMASKTVGTADEKLRKKSSFYQSYLRRTQSMGIQLDQRIIVMFMVAWGKEA
                     VDNFHLGDDMDPELRSLAQILIDQKVKEISNQEPMKL"
ORIGIN
        1 tagtagtaga ctccctaaag agctactcca ctaacaagag aaatggcaac tatggaggaa
       61 atccagagag aaatcagtgc tcacgagggg cagcttgtga tagcacgcca gaaggtcaag
      121 gatgcagaaa agcagtatga gaaggatcct gatgacttaa acaagagggc actgcatgat
      181 cgggagagtg tcgcagcttc aatacaatca aaaattgatg aactgaagcg ccaacttgcc
      241 gacaggattg cagcagggaa gaacatcggg caagaccggg atcctacagg ggtagagccg
      301 ggtgatcatc tcaaggaaag atcagcacta agctacggga atacactgga cctgaatagt
      361 cttgacattg atgaacctac aggacaaaca gctgattggc tgactataat tgtctatcta
      421 acatcattcg tggtcccgat catcttgaag gcactgtaca tgttaacaac aagaggtagg
      481 cagacttcaa aggacaacaa ggggatgagg atcagattca aggatgacag ctcatatgag
      541 gatgtcaatg ggatcagaaa gcctaaacat ctgtatgtgt caatgccaaa cgcccaatcc
      601 agtatgaagg ctgaagagat aacaccagga agattccgca ctgcagtatg tgggctatat
      661 cctgcacaga taaaggcaag gaatatggta agccctgtca tgagtgtagt tgggtttttg
      721 gcactagcaa aagactggac atctagaatt gaagaatggc ttggcgcacc ctgcaagttc
      781 atggcagagt ctcctattgc tgggagttta tctgggaatc ctgtgaatcg tgactatatc
      841 agacaaagac aaggtgcact tgcagggatg gagccaaagg aatttcaagc cctcaggcaa
      901 cattcaaagg atgctggatg tacactagtt gaacatattg agtcaccatc gtcaatatgg
      961 gtgtttgctg gggcccctga taggtgtcca ccaacatgct tgtttgttgg agggatggct
     1021 gagttaggtg ccttcttttc tatacttcag gatatgagga acacaatcat ggcttcaaaa
     1081 actgtgggca cagctgatga aaagcttcga aagaaatcat cattctatca atcatacctc
     1141 agacgcacac aatcaatggg aatacaactg gaccagagga taattgttat gtttatggtt
     1201 gcctggggaa aggaggcagt ggacaacttc catctcggtg atgacatgga tccagagctt
     1261 cgtagcctgg ctcagatctt gattgaccag aaagtgaagg aaatctcgaa ccaggagcct
     1321 atgaaattat aagcacataa atatgtaatc aatactaact ataggttaag aaatactaat
     1381 cattagttaa taagaataca gatttattga ataatcatat taaataatta ggtaagttaa
     1441 atattattta gttaagttag ctaattgatt tatatgatta tcacaattga atgtaatcat
     1501 aagcacaatc actgccatgt ataatcacgg gtatacgggt ggttttcata tggggaacag
     1561 ggtgggctta gggccaggtc accttaagtg accttttttt gtatatatgg atgtagattt
     1621 caattgatcg aatactaatc ctactgtcct cttttctttt cctttctcct tctttactaa
     1681 caacaacaaa ctacctcaca accttctacc tcaatatata ctacctcatt aagttgtttc
     1741 cttttgtctt tttagggagt ctactacta
//
========================================================================

From stephen at blackrim.net Wed May 9 11:58:16 2007
From: stephen at blackrim.net (Stephen A Smith)
Date: Wed, 09 May 2007 11:58:16 -0400
Subject: [Biopython-dev] [Off Topic] Google Group
Message-ID: <4641EF98.90504@blackrim.net>

Hi all,

Just letting you know there is a Google group open now for discussions of all
things programming and evolutionary biology. You can find it here:
http://groups.google.com/group/evo_code. Figured the people at bio* might be
interested.

Take care
Stephen Smith

--
Dept. Ecology and Evolutionary Biology
Yale University
http://www.blackrim.org

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GS dpu s+: a- C++++ UL++++ P--- L++++ E--- W+++ N-- o-- K++++ w---
O- M-- V- PS+++ PE-- Y++ PGP++ t-- 5 X++ R-- tv++ b++++ DI+ D++
G++ e+++ h--- r+++ y+++
------END GEEK CODE BLOCK------

From bugzilla-daemon at portal.open-bio.org Thu May 10 08:59:07 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 May 2007 08:59:07 -0400
Subject: [Biopython-dev] [Bug 2290] New: Not reading 1YVE.pdb
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2290

           Summary: Not reading 1YVE.pdb
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: proszek at gmail.com

biopython 1.42 fails to read the 1YVE.pdb file, although it reads test.pdb
created by:
awk '{if($1=="ATOM"){print}}' 1YVE.pdb

line 8610 is a HETATM line

Traceback below (where file=sys.argv[1]=1YVE.pdb)

WARNING: Chain J is discontinuous at line 8610.
Traceback (most recent call last):
  File "./wezly.py", line 122, in ?
    b=Protein(sys.argv[1])
  File "./wezly.py", line 15, in __init__
    self.struct=self.parser.get_structure('X',file)
  File "/usr/lib/python2.4/site-packages/Bio/PDB/PDBParser.py", line 66, in get_structure
    self._parse(file.readlines())
  File "/usr/lib/python2.4/site-packages/Bio/PDB/PDBParser.py", line 87, in _parse
    self.trailer=self._parse_coordinates(coords_trailer)
  File "/usr/lib/python2.4/site-packages/Bio/PDB/PDBParser.py", line 179, in _parse_coordinates
    structure_builder.init_residue(resname, hetero_flag, resseq, icode)
  File "/usr/lib/python2.4/site-packages/Bio/PDB/StructureBuilder.py", line 155, in init_residue
    self.chain.add(residue)
  File "/usr/lib/python2.4/site-packages/Bio/PDB/Entity.py", line 80, in add
    raise PDBConstructionException, "%s defined twice" % entity.get_full_id()
  File "/usr/lib/python2.4/site-packages/Bio/PDB/Entity.py", line 132, in get_full_id
    parent=self.get_parent()
  File "/usr/lib/python2.4/site-packages/Bio/PDB/Entity.py", line 102, in get_parent
    raise PDBException, 'No parent'
PDBException: No parent

From bugzilla-daemon at portal.open-bio.org Thu May 10 09:53:05 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 May 2007 09:53:05 -0400
Subject: [Biopython-dev] [Bug 2290] Not reading 1YVE.pdb
In-Reply-To:
Message-ID: <200705101353.l4ADr544030572@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2290

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-10 09:53 EST -------
Where did you get your 1YVE.pdb file from? Directly from the PDB?

Just as a remark, the "PDBException: No parent" is not the problem. The error
is further back, PDBConstructionException, "??? defined twice", and when
Bio.PDB tries to get the identity of the problem residue it falls over.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu May 10 10:06:21 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 May 2007 10:06:21 -0400 Subject: [Biopython-dev] [Bug 2290] Not reading 1YVE.pdb In-Reply-To: Message-ID: <200705101406.l4AE6LRW031886@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2290 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-10 10:06 EST ------- Which version of Biopython do you have? The "no parent bug" was fixed as bug 1936; make sure you have Biopython 1.43 or later. My installation of Biopython works but spits out a LOT of PDBConstructionException warnings about multiply defined water atoms (aka "Residue HOH"). Looking at the raw PDB file, there is a problem with multiply defined waters. As you can see below, the identifier jumps from 799 back to 1 (i.e. there are two waters with residue number 1).
...
HETATM16581 O HOH 793 36.450 15.564 -9.023 1.00 39.79 O
HETATM16582 O HOH 794 33.448 13.711 -11.019 1.00 40.42 O
HETATM16583 O HOH 796 28.414 11.908 -16.047 1.00 48.15 O
HETATM16584 O HOH 797 29.445 8.114 -11.059 1.00 55.49 O
HETATM16585 O HOH 799 28.383 5.173 -8.998 1.00 33.85 O
HETATM16586 O HOH 1 26.615 4.599 -6.718 1.00 24.95 O
HETATM16587 O HOH 2 23.353 4.948 -7.137 1.00 34.47 O
HETATM16588 O HOH 3 17.401 11.710 0.938 1.00 35.16 O
HETATM16589 O HOH 4 21.326 11.092 8.215 1.00 22.51 O
HETATM16590 O HOH 5 13.703 2.159 11.421 1.00 24.87 O
...
Are you happy for me to mark this as a duplicate of bug 1936? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
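[Editorial sketch, not part of the original thread] The "multiply defined waters" problem in bug 2290 can be checked without Bio.PDB by scanning the coordinate records for residue identifiers that reappear after other residues have intervened. The sketch below is only an illustration under stated assumptions: the function names (residue_key, find_repeated_residues, make_water) are hypothetical, and it assumes standard fixed-column ATOM/HETATM records (resName in columns 18-20, chainID in 22, resSeq in 23-26, iCode in 27). It is not how Bio.PDB itself detects the condition.

```python
def residue_key(line):
    """Return (resName, chainID, resSeq, iCode) from a fixed-column ATOM/HETATM line."""
    return (line[17:20], line[21], int(line[22:26]), line[26])


def find_repeated_residues(lines):
    """List residue keys that show up again after a different residue was seen."""
    seen = set()
    repeated = []
    previous = None
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        key = residue_key(line)
        # Consecutive lines with the same key are atoms of one residue, not a repeat.
        if key != previous and key in seen and key not in repeated:
            repeated.append(key)
        seen.add(key)
        previous = key
    return repeated


def make_water(serial, resseq, chain=" "):
    """Build a hypothetical fixed-column HETATM water record (coordinates are dummies)."""
    return "HETATM%5d  O   HOH %s%4d    %8.3f%8.3f%8.3f" % (
        serial, chain, resseq, 0.0, 0.0, 0.0)


# Waters numbered 1, 2, 3, 799 followed by a second 1 and 2, as in the report.
example = [make_water(i + 1, n) for i, n in enumerate([1, 2, 3, 799, 1, 2])]
for key in find_repeated_residues(example):
    print("residue defined more than once:", key)
```

Running such a check on a file before parsing would show which residue numbers need renumbering; whether renumbering is appropriate depends on the structure in question.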
From bugzilla-daemon at portal.open-bio.org Thu May 10 11:07:56 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 May 2007 11:07:56 -0400 Subject: [Biopython-dev] [Bug 2291] New: __init__.py missing in the Bio.PDB.mmCIF folder after the install Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2291 Summary: __init__.py missing in the Bio.PDB.mmCIF folder after the install Product: Biopython Version: Not Applicable Platform: Macintosh OS/Version: MacOS X Status: NEW Severity: normal Priority: P2 Component: Website AssignedTo: biopython-dev at biopython.org ReportedBy: jean.lechner at gmail.com When you install Biopython you must uncomment some lines in the setup.py file. But at the end of the installation the __init__.py file is not created in the Bio.PDB.mmCIF directory, so you cannot use MMCIFParser or MMCIF2Dict because Biopython cannot import MMCIFlex from Bio.PDB.mmCIF. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu May 10 11:08:48 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 May 2007 11:08:48 -0400 Subject: [Biopython-dev] [Bug 2291] __init__.py missing in the Bio.PDB.mmCIF folder after the install In-Reply-To: Message-ID: <200705101508.l4AF8mQf003465@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2291 jean.lechner at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |jean.lechner at gmail.com -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
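[Editorial sketch, not part of the original thread] Bug 2291 comes down to a package directory being installed without its __init__.py, which older Python versions need before a directory can be imported as a regular package (Python 3.3 and later can also treat such a directory as a namespace package, so the symptom differs there). A quick way to spot the condition after an install might look like the following; the function name dirs_missing_init is hypothetical, and the check only looks at .py files, so a directory holding nothing but a compiled extension would not be reported.

```python
import os


def dirs_missing_init(package_root):
    """List subdirectories (relative to package_root) with .py files but no __init__.py."""
    missing = []
    for dirpath, dirnames, filenames in os.walk(package_root):
        has_python = any(name.endswith(".py") for name in filenames)
        if has_python and "__init__.py" not in filenames:
            missing.append(os.path.relpath(dirpath, package_root))
    return sorted(missing)
```

Pointing this at the installed Bio directory would list any subpackage that setup.py copied without its __init__.py.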
From idoerg at gmail.com Thu May 10 13:18:23 2007 From: idoerg at gmail.com (Iddo Friedberg) Date: Thu, 10 May 2007 10:18:23 -0700 Subject: [Biopython-dev] Biopython talk at BOSC 2007? Message-ID: <464353DF.6030700@burnham.org> Anybody giving a talk on Biopython? We can get a 20-30 minute slot in Vienna, but someone has to show up and talk. Personally, I will actually be there for the ISMB SIGs, but as I am running my own conference, it will be a bit of a strain to talk at BOSC. However, the main reason I do not want to speak is that there are people much more deserving here. So if anyone plans to be at ISMB 2007 in any case, and wishes to represent Biopython with serpentine honor, contact Darin. Best, Iddo -------- Original Message -------- Subject: BOSC 2007 Second Call For Papers Date: Thu, 10 May 2007 12:17:41 -0400 From: darin.london at duke.edu To: biopython-owner at lists.open-bio.org The BOSC Organizing Committee are proud to announce BOSC 2007, occurring in Vienna, Austria on July 19th, 20th. The conference this year promises to be exciting, as the BOSC developers attempt to define and solve currently intractable problems in Bioinformatics. Please refer to the following website for complete information, and requests for submissions. Thank you, and we hope to see you in Vienna. http://open-bio.org/wiki/BOSC_2007 The BOSC organizing Committee Please pass this email on to anyone that would be interested. -- Iddo Friedberg, Ph.D. Burnham Institute for Medical Research 10901 N. Torrey Pines Rd. 
La Jolla, CA 92037, USA T: +1 858 646 3100 x3516 wengophone: idoerg http://iddo-friedberg.org http://2007.BioFunctionPrediction.org From bugzilla-daemon at portal.open-bio.org Sun May 13 16:30:10 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 13 May 2007 16:30:10 -0400 Subject: [Biopython-dev] [Bug 2292] New: Bio.PDBIO writes TER records without any required fields Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2292 Summary: Bio.PDBIO writes TER records without any required fields Product: Biopython Version: Not Applicable Platform: PC OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Other AssignedTo: biopython-dev at biopython.org ReportedBy: misiek at genesilico.pl Bio.PDBIO is happy to write TER records as "TER\n", which is inconsistent with PDB format specification. The PDB format requires that TER records have some fields similar to ATOM records: '''The TER record has the same residue name, chain identifier, sequence number and insertion code as the terminal residue. The serial number of the TER record is one number greater than the serial number of the ATOM/HETATM preceding the TER.''' [See http://www.wwpdb.org/documentation/format23/sect9.html#TER] It leads to problem with programs that require correct TER records (like multiple structural alignment program MUSTANG), and crash when they are not found. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
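[Editorial sketch, not part of the original thread] The TER requirement quoted in bug 2292 can be made concrete with a small formatter. This is an illustration only and is not the patch attached to the bug (attachment 652 is not reproduced in this archive). The function name format_ter is hypothetical, and the column layout is the one given in the PDB format description: record name in columns 1-6, serial in 7-11, resName in 18-20, chainID in 22, resSeq in 23-26, iCode in 27.

```python
def format_ter(last_serial, resname, chain_id, resseq, icode=" "):
    """Format a TER record for the residue that ends a chain.

    last_serial is the serial number of the final ATOM/HETATM record of the
    chain; the TER record uses that number plus one.
    """
    line = "TER   %5d      %3s %1s%4d%1s" % (
        last_serial + 1, resname, chain_id, resseq, icode)
    return line.ljust(80)  # pad to the usual 80-column record width


# Hypothetical terminal residue: GLY 123 of chain A, last atom serial 1000.
print(format_ter(1000, "GLY", "A", 123).rstrip())
```

A writer that tracks the last residue and atom serial it emitted for each chain has everything this function needs.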
From bugzilla-daemon at portal.open-bio.org Sun May 13 16:31:18 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 13 May 2007 16:31:18 -0400 Subject: [Biopython-dev] [Bug 2292] Bio.PDBIO writes TER records without any required fields In-Reply-To: Message-ID: <200705132031.l4DKVIP9008944@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2292 ------- Comment #1 from misiek at genesilico.pl 2007-05-13 16:31 EST ------- Created an attachment (id=652) --> (http://bugzilla.open-bio.org/attachment.cgi?id=652&action=view) Proposed patch to PDBIO.py This is a simple fix. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From idoerg at gmail.com Mon May 14 12:27:42 2007 From: idoerg at gmail.com (Iddo Friedberg) Date: Mon, 14 May 2007 09:27:42 -0700 Subject: [Biopython-dev] Subject: BOSC 2007 2nd Call For Papers. Message-ID: <46488DFE.3070908@burnham.org> The BOSC Organizing Committee are proud to announce BOSC 2007, occurring in Vienna, Austria on July 19th, 20th. The conference this year promises to be exciting, as the BOSC developers attempt to define and solve currently intractable problems in Bioinformatics. Please refer to the following website for complete information, and requests for submissions. Thank you, and we hope to see you in Vienna. http://open-bio.org/wiki/BOSC_2007 The BOSC organizing Committee Please pass this email on to anyone that would be interested. -- Iddo Friedberg, Ph.D. Burnham Institute for Medical Research 10901 N. Torrey Pines Rd. 
La Jolla, CA 92037, USA T: +1 858 646 3100 x3516 wengophone: idoerg http://iddo-friedberg.org http://2007.BioFunctionPrediction.org From idoerg at gmail.com Mon May 14 12:28:36 2007 From: idoerg at gmail.com (Iddo Friedberg) Date: Mon, 14 May 2007 09:28:36 -0700 Subject: [Biopython-dev] BOSC 2007 Abstract Submission Deadline Extension Message-ID: <46488E34.8000604@burnham.org> Subject: BOSC 2007 Abstract Submission Deadline Extension Due to technical difficulties in sending out the 2nd call for papers, the BOSC organizers are extending the deadline for abstract submissions to Monday May 21st. The announcement day will remain the same so that it remains before the Early Discount Date. http://open-bio.org/wiki/BOSC_2007 The BOSC organizing Committee Please pass this email on to anyone that would be interested. -- Iddo Friedberg, Ph.D. Burnham Institute for Medical Research 10901 N. Torrey Pines Rd. La Jolla, CA 92037, USA T: +1 858 646 3100 x3516 wengophone: idoerg http://iddo-friedberg.org http://2007.BioFunctionPrediction.org From bugzilla-daemon at portal.open-bio.org Mon May 14 18:18:47 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 14 May 2007 18:18:47 -0400 Subject: [Biopython-dev] [Bug 2290] Not reading 1YVE.pdb In-Reply-To: Message-ID: <200705142218.l4EMIlwD008110@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2290 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |DUPLICATE Version|Not Applicable |1.42 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-14 18:18 EST ------- *** This bug has been marked as a duplicate of bug 1936 *** -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython-dev at maubp.freeserve.co.uk Mon May 14 18:29:05 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Mon, 14 May 2007 23:29:05 +0100 Subject: [Biopython-dev] Bugzilla Version Numbers In-Reply-To: <46273FB4.4030805@maubp.freeserve.co.uk> References: <128a885f0704102204p2872f42fh685919bb8b4656c3@mail.gmail.com> <46273FB4.4030805@maubp.freeserve.co.uk> Message-ID: <4648E2B1.5040706@maubp.freeserve.co.uk> Peter wrote: > Chris Lasher wrote: >> Hi all, >> >> Does anybody active with Biopython have administrative capabilities >> for the project's Bugzilla tracker? The version numbers are a wee >> bit out of date. > > They are, aren't they! I asked on the list last month about this, > and updating the component fields too: > > http://lists.open-bio.org/pipermail/biopython-dev/2007-March/002652.html > > As no-one on the list has come forward, I guess one of us should get > in touch with the relevant Open Bio people, probably by emailing > "support" at the domain helpdesk.open-bio.org > > Who needs/wants bugzilla admin rights? I've been in touch with Jason Stajich and he has done some magic: Michiel and I can now creategroups, editclassifications, editcomponents, editkeywords. I think that's all we need? I have initially added 1.42 and 1.43 to the version field for Biopython in bugzilla. I would also propose we have a few new components, such as PDB, Nexus and SeqIO (or perhaps rather than SeqIO something more general like sequence parsing). Peter From jfeala at gmail.com Tue May 15 12:42:30 2007 From: jfeala at gmail.com (Jake Feala) Date: Tue, 15 May 2007 09:42:30 -0700 Subject: [Biopython-dev] interaction networks in biopython Message-ID: <12c863fe0705150942t108e3131jaf50821ef9ecf2da@mail.gmail.com> Hello Biopython people - With all the new research in genome-wide cellular interaction networks I was a little surprised not to see much support for these type of data in Biopython. 
I know that Bioperl has a networks package that looks like the kind of thing that I would love to also see in Python for all the obvious reasons. First - has this already been done and I missed it? All I could find were a few scattered and application-specific scripts across the web, plus the Pathway package in BioPython. If not, then would there be any interest in development along these lines? A while back I wrote a few scripts that parse interaction datasets, stick them into a MySQL database, and retrieve the interactions into a Network object that can be used to analyze the graph of nodes and links. I would be glad to update these to fit into the biopython framework, as it would be useful to my own research. One caveat is that I am an engineering PhD student and my programming skills are mostly self-taught beyond two Java courses, so I might need a little guidance in testing and preparing the code for distribution. I have only ever written code for my own personal research but I think my style is decent and I would love to get better. Any opinion or advice? Thanks -Jake Feala Bioengineering Dept. University of California, San Diego From edschofield at gmail.com Tue May 15 14:37:30 2007 From: edschofield at gmail.com (Ed Schofield) Date: Tue, 15 May 2007 19:37:30 +0100 Subject: [Biopython-dev] interaction networks in biopython In-Reply-To: <12c863fe0705150942t108e3131jaf50821ef9ecf2da@mail.gmail.com> References: <12c863fe0705150942t108e3131jaf50821ef9ecf2da@mail.gmail.com> Message-ID: <1b5a37350705151137t75ea7e07r6596ba1ce35a8716@mail.gmail.com> On 5/15/07, Jake Feala wrote: > Hello Biopython people - > > With all the new research in genome-wide cellular interaction > networks I was a little surprised not to see much support for these > type of data in Biopython. I know that Bioperl has a networks package > that looks like the kind of thing that I would love to also see in > Python for all the obvious reasons. 
> > First - has this already been done and I missed it? All I could find > were a few scattered and application-specific scripts across the web, > plus the Pathway package in BioPython. > > If not, then would there be any interest in development along these > lines? A while back I wrote a few scripts that parse interaction > datasets, stick them into a MySQL database, and retrieve the > interactions into a Network object that can be used to analyze the > graph of nodes and links. I would be glad to update these to fit into > the biopython framework, as it would be useful to my own research. > > One caveat is that I am an engineering PhD student and my programming > skills are mostly self-taught beyond two Java courses, so I might need > a little guidance in testing and preparing the code for distribution. > I have only ever written code for my own personal research but I think > my style is decent and I would love to get better. > > Any opinion or advice? This would interest me too; I'd be glad to have such functionality in BioPython. I can offer you some guidance on Python, packaging and testing, and (if you need it) use of external array packages. -- Ed From yair.benita at gmail.com Tue May 15 15:25:27 2007 From: yair.benita at gmail.com (Yair Benita) Date: Tue, 15 May 2007 15:25:27 -0400 Subject: [Biopython-dev] interaction networks in biopython In-Reply-To: <1b5a37350705151137t75ea7e07r6596ba1ce35a8716@mail.gmail.com> Message-ID: I would be happy to contribute to this too. Currently I have a Python script that uses HPRD to generate protein-protein interaction maps. I have different filtering methods to display only classes of proteins or only links to a specific KEGG pathway. It will need a bit of work before I can submit this to CVS. As for drawing the map, I am currently generating a dot file that can be converted to an image using GRAPHVIZ. If anyone wants to suggest anything else, please do.
Yair on 5/15/07 2:37 PM, Ed Schofield at edschofield at gmail.com wrote: > On 5/15/07, Jake Feala wrote: >> Hello Biopython people - >> >> With all the new research in genome-wide cellular interaction >> networks I was a little surprised not to see much support for these >> type of data in Biopython. I know that Bioperl has a networks package >> that looks like the kind of thing that I would love to also see in >> Python for all the obvious reasons. >> >> First - has this already been done and I missed it? All I could find >> were a few scattered and application-specific scripts across the web, >> plus the Pathway package in BioPython. >> >> If not, then would there be any interest in development along these >> lines? A while back I wrote a few scripts that parse interaction >> datasets, stick them into a MySQL database, and retrieve the >> interactions into a Network object that can be used to analyze the >> graph of nodes and links. I would be glad to update these to fit into >> the biopython framework, as it would be useful to my own research. >> >> One caveat is that I am an engineering PhD student and my programming >> skills are mostly self-taught beyond two Java courses, so I might need >> a little guidance in testing and preparing the code for distribution. >> I have only ever written code for my own personal research but I think >> my style is decent and I would love to get better. >> >> Any opinion or advice? > > This would interest me too; I'd be glad to have such functionality in > BioPython. I can offer you some guidance on Python, packaging and > testing, and (if you need it) use of external array packages. 
> > -- Ed > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From biopython-dev at maubp.freeserve.co.uk Tue May 15 15:19:23 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 15 May 2007 20:19:23 +0100 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> Message-ID: <464A07BB.8020206@maubp.freeserve.co.uk> Chris Lasher wrote: > Since no one else has volunteered, I'm taking up responsibility for > the transition. I got the ball moving by contacting "support at > open-bio.org" to get alert them of our interest and get any contacts > we'll need to make this happen. Also, if anybody on the list has any > information that would be helpful in this (e.g., who administers the > CVS repo) please feel free to send it along. Likewise, feel free to > raise any questions, concerns, and comments on the list. Did you get any information from the Open Bioinformatics Foundation guys about moving from CVS to subversion? Peter From lists.steve at arachnedesign.net Tue May 15 15:56:46 2007 From: lists.steve at arachnedesign.net (Steve Lianoglou) Date: Tue, 15 May 2007 15:56:46 -0400 Subject: [Biopython-dev] interaction networks in biopython In-Reply-To: References: Message-ID: Hi, On May 15, 2007, at 3:25 PM, Yair Benita wrote: > I would be happy to contribute to this too. > Currently I have a python script that uses HPRD to generate protein > protein > interaction maps. I have deferent filtering methods to display only > classes > of proteins or only links to a specific kegg pathway. It will need > a bit of > work before I can submit this to CVS. 
As for drawing the map, I am > currently > generating a dot file that can be converted to an image using > GRAPHVIZ. If > anyone wants to suggest anything else, please do. I've been using NetworkX[1] to play w/ networks/graphs interactively. You can display them if you have matplotlib installed, and can save the graphs to dot format as well. -steve [1] NetworkX: https://networkx.lanl.gov/wiki > > Yair > > > on 5/15/07 2:37 PM, Ed Schofield at edschofield at gmail.com wrote: > >> On 5/15/07, Jake Feala wrote: >>> Hello Biopython people - >>> >>> With all the new research in genome-wide cellular interaction >>> networks I was a little surprised not to see much support for these >>> type of data in Biopython. I know that Bioperl has a networks >>> package >>> that looks like the kind of thing that I would love to also see in >>> Python for all the obvious reasons. >>> >>> First - has this already been done and I missed it? All I could >>> find >>> were a few scattered and application-specific scripts across the >>> web, >>> plus the Pathway package in BioPython. >>> >>> If not, then would there be any interest in development along these >>> lines? A while back I wrote a few scripts that parse interaction >>> datasets, stick them into a MySQL database, and retrieve the >>> interactions into a Network object that can be used to analyze the >>> graph of nodes and links. I would be glad to update these to fit >>> into >>> the biopython framework, as it would be useful to my own research. >>> >>> One caveat is that I am an engineering PhD student and my >>> programming >>> skills are mostly self-taught beyond two Java courses, so I might >>> need >>> a little guidance in testing and preparing the code for >>> distribution. >>> I have only ever written code for my own personal research but I >>> think >>> my style is decent and I would love to get better. >>> >>> Any opinion or advice? >> >> This would interest me too; I'd be glad to have such functionality in >> BioPython. 
I can offer you some guidance on Python, packaging and >> testing, and (if you need it) use of external array packages. >> >> -- Ed >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From salish at picasso.ucsf.edu Tue May 15 16:36:05 2007 From: salish at picasso.ucsf.edu (Howard Salis) Date: Tue, 15 May 2007 13:36:05 -0700 Subject: [Biopython-dev] About a new GenBankWriter class with SeqIO interface Message-ID: <9fa7e98e0705151336va7e1c86la137c137883d6886@mail.gmail.com> Hello everyone, I started using Biopython in my research and I needed a way to write GenBank files from a SeqRecord (which was parsed from other GenBank/etc files). So I wrote something up. It uses the SeqIO interface and behaves like the fasta writer. The SeqIO.write(record, handle, "genbank") interface accepts "record" as either a SeqRecord generator with multiple records or a single record from SeqRecord. So record = SeqRecord or record = SeqRecord.next() both work. (I'm relatively new to Python, so please excuse any bad terminology or stylistic deficiencies). The changes are: a new file called GenBankWriter.py in Bio/GenBank. Small changes to the __init__.py of Bio/GenBank. Changes to the _feed_first_line function of Scanner.py of Bio/GenBank. I had to change the way Bio/GenBank/Scanner.py reads the Locus line of a GenBank file in order to handle missing data and newer molecule types (e.g. ss-RNA, ds-DNA, mt-DNA, etc). I also add/change a couple of lines in __init__.py to store whether a sequence was linear or circular and to store the string that encodes its molecule type (ss-RNA, etc).
The output of SeqIO.write(record,handle,"genbank") is functionally identical to a GenBank file from NCBI except for some spacing and word wrap issues. What is the best way to submit new code for review? Whom do I send it to and should I send only the modified files? I've included one of my test scripts below just to show how it works. (Does anyone suggest any changes in the interface?) Thank you. Sincerely, Howard Salis Postdoctoral Scholar UC San Francisco

#ASimpleTest.py
"""A vigorous exercise of the GenBankWriter class and the SeqIO interface."""

from Bio import SeqIO
from Bio import GenBank

working_dir = "E:\\Plasmids\\"

#Get some arbitrarily chosen GenBank files (these are relatively small ones)
gi_list = GenBank.search_for("EF470550 OR EF470551")
print gi_list
ncbi_dict = GenBank.NCBIDictionary("nucleotide", "genbank")

#Write the pair of strings to a single file.
handle = open(working_dir + "Source.gb","w")
for gi in gi_list:
    handle.write(str(ncbi_dict[gi]))
handle.close()

#Parse the Source file into a SeqRecord generator (two records)
handle = open(working_dir + "Source.gb","r")
records = SeqIO.parse(handle,"genbank")

#write many records to a single GenBank file
file = open(working_dir + "ManyRecords.gb","w")
SeqIO.write(records,file,"genbank")
file.close()
handle.close()

#----

#Parse the Source file into a SeqRecord generator (two records)
handle = open(working_dir +"Source.gb","r")
records = SeqIO.parse(handle,"genbank")

#Write individual records into their own GenBank file
counter=0
for record in records:
    counter+=1
    file = open(working_dir + "OneFile_" + str(counter) + ".gb","w")
    SeqIO.write(record,file,"genbank")
    file.close()
handle.close()

#Open them back up again, parse them, and write them to a single file
handle = open(working_dir + "ManyRecords_Out.gb","w")
for num in range(1,counter+1):
    print num
    file = open(working_dir +"OneFile_" + str(num) + ".gb","r")
    records = SeqIO.parse(file,"genbank")
    SeqIO.write(records,handle,"genbank")
    file.close()
handle.close()

#Compare the original GenBank file in Source.gb to the GenBankWriter'd one.
original = open(working_dir +"Source.gb","r")
newone = open(working_dir + "ManyRecords_Out.gb","r")
records_original = SeqIO.parse(original,"genbank")
records_newone = SeqIO.parse(newone,"genbank")
for (record_original,record_newone) in zip(records_original,records_newone):
    print str(record_original)
    print str(record_newone)
original.close()
newone.close()
print "Done"

From biopython-dev at maubp.freeserve.co.uk Tue May 15 16:59:55 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 15 May 2007 21:59:55 +0100 Subject: [Biopython-dev] About a new GenBankWriter class with SeqIO interface In-Reply-To: <9fa7e98e0705151336va7e1c86la137c137883d6886@mail.gmail.com> References: <9fa7e98e0705151336va7e1c86la137c137883d6886@mail.gmail.com> Message-ID: <464A1F4B.9020705@maubp.freeserve.co.uk> Howard Salis wrote: > Hello everyone, > > I started using Biopython in my research and I needed a way to > write GenBank files from a SeqRecord (which was parsed from other > GenBank/etc files). So I wrote something up. It uses the SeqIO > interface and behaves like the fasta writer. Sounds nice - it's something I've been thinking about doing myself, but I wanted to do both GenBank and EMBL, sharing the feature table writing code. Something else to keep in mind is writing any SeqRecord to a GenBank (or EMBL) file, even if it did not get created from a GenBank or EMBL file and is therefore lacking lots of annotation. > The changes are: a new file called GenBankWriter.py in Bio/GenBank. > Small changes to the __init__.py of Bio/GenBank. Changes to the > _feed_first_line function of Scanner.py of Bio/GenBank. > > I had to change the way Bio/GenBank/Scanner.py reads the Locus line of > a GenBank file in order to handle missing data and newer molecule > types (e.g. ss-RNA, ds-DNA, mt-DNA, etc).
That was recently fixed on Bug 2289 http://bugzilla.open-bio.org/show_bug.cgi?id=2289 > I also add/change a couple > of lines in __init__.py to store whether a sequence was linear or > circular and to store the string that encodes its molecule type > (ss-RNA, etc). I thought we already stored this information - but I'm not sure off hand. > The output of SeqIO.write(record,handle,"genbank") is > functionally identical to a GenBank file from NCBI except for some > spacing and word wrap issues. Good :) > What is the best way to submit new code for review? Whom do I send it > to and should I send only the modified files? You could email it directly to me, but it would be better to create a bug (an "enhancement") and then attached the changes to the bug. Edited versions of files will do, but patch files are best. You should use the unix "diff" command line tool to create a patch file. One way to do this on Windows is to install cygwin... > I've included one of my test scripts below just to show how it works. > (Does anyone suggest any changes in the interface?) Looking at the code, at first glance it looks like you are hooking into the existing Bio.SeqIO interface nicely. I look forward to seeing your code Howard. Peter From bugzilla-daemon at portal.open-bio.org Tue May 15 21:55:14 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 15 May 2007 21:55:14 -0400 Subject: [Biopython-dev] [Bug 2294] New: These patches allow one to write a GenBank file using the SeqIO interface Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2294 Summary: These patches allow one to write a GenBank file using the SeqIO interface Product: Biopython Version: 1.43 Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: howard.salis at gmail.com The SeqIO interface currently reads from, but does not write to the GenBank format. 
The GenBank format is widely used and is often chosen as the data storage format for many plasmid, genome, and other nucleotide editors. By giving Biopython the capability of writing annotated sequences to the GenBank format, one can use Biopython to read in raw sequences, analyze and annotate them, and then view them in a nucleotide visual editor. The following patches do exactly this and use the current SeqIO interface to do it. The following attached patches enable the command SeqIO.write(record,handle,"genbank"), where handle is an open, writable file-object and record is _either_ a SeqRecord generator or the result of one of its iterations. That is, if one did manyrecords = SeqIO.parse(handle,"genbank") or onerecord = manyrecords.next(), then one could pass either manyrecords or onerecord to SeqIO.write(). If a generator containing multiple records is passed, all records are written to a single GenBank file. If one record is passed, it is written to file. The file is not closed, though, and may be called multiple times to write additional records to file. The attached patches make small modifications to Bio/SeqIO/__init__.py and Bio/SeqIO/InsdcIO.py. The _feed_first_line function in Bio/GenBank/Scanner.py is altered to handle missing data (it uses a very Pythonic dictionary of test lambda functions to parse the meaning of words). Finally, a new file is created called Bio/GenBank/GenBankWriter.py. Questions, Comments, Suggestions, Criticisms, etc are welcome. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
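[Editorial sketch, not part of the original thread] The interface described in bug 2294, where the same write call takes either one record or an iterator of records and leaves the handle open for further calls, can be sketched as below. Everything here is hypothetical: Record is a stand-in for SeqRecord, write_ids only writes one identifier per line rather than GenBank text, and this is not the code from the attached patches.

```python
import io


class Record:
    """Hypothetical stand-in for a SeqRecord; only an id is needed here."""

    def __init__(self, id):
        self.id = id


def iter_records(record_or_records):
    """Wrap a single record in an iterator; pass iterables of records through."""
    if isinstance(record_or_records, Record):
        return iter([record_or_records])
    return iter(record_or_records)


def write_ids(record_or_records, handle):
    """Write one line per record and return the count; the handle stays open."""
    count = 0
    for record in iter_records(record_or_records):
        handle.write("%s\n" % record.id)
        count += 1
    return count


# The same open handle receives a single record, then a generator of records.
handle = io.StringIO()
write_ids(Record("EF470550"), handle)
write_ids((Record(name) for name in ["EF470551", "EF470552"]), handle)
print(handle.getvalue())
```

Testing the argument's type explicitly, rather than probing whether it can be iterated, avoids surprises if the record class itself ever supports iteration.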
From bugzilla-daemon at portal.open-bio.org Tue May 15 21:56:32 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 15 May 2007 21:56:32 -0400 Subject: [Biopython-dev] [Bug 2294] These patches allow one to write a GenBank file using the SeqIO interface In-Reply-To: Message-ID: <200705160156.l4G1uWRZ005077@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2294 howard.salis at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|biopython-dev at biopython.org |howard.salis at gmail.com Status|NEW |ASSIGNED ------- Comment #1 from howard.salis at gmail.com 2007-05-15 21:56 EST ------- Created an attachment (id=654) --> (http://bugzilla.open-bio.org/attachment.cgi?id=654&action=view) patch to Bio/GenBank/Scanner.py (alters _feed_first_line under GenBank class) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From salish at picasso.ucsf.edu Tue May 15 22:34:08 2007 From: salish at picasso.ucsf.edu (Howard Salis) Date: Tue, 15 May 2007 19:34:08 -0700 Subject: [Biopython-dev] About a new GenBankWriter class with SeqIO interface In-Reply-To: <464A1F4B.9020705@maubp.freeserve.co.uk> References: <9fa7e98e0705151336va7e1c86la137c137883d6886@mail.gmail.com> <464A1F4B.9020705@maubp.freeserve.co.uk> Message-ID: <9fa7e98e0705151934j1aeb2df7ja76e0ce315515d91@mail.gmail.com> On 5/15/07, Peter wrote: > Sounds nice - its something I've been thinking about doing myself, but I > wanted to do both both GenBank and EMBL, sharing the feature table > writing code. Yep, since EMBL and GenBank share the same feature format, I've separated the "foreword", feature table, and sequence write functions.
So if someone wants to write the EMBL writer, they just need to write the appropriate foreword. I think the sequence data is stored the same too? Is that correct? > Something else to keep in mind is writing any SeqRecord to a GenBank (or > EMBL) file, even if it did not get created from a GenBank or EMBL file > and is therefore lacking lots of annotation. Very true. The GenBankWriter.py will either leave these fields blank, leave out their keywords entirely if they are optional, or add something like or when it's necessary to have something there. > > I also add/change a couple > > of lines in __init__.py to store whether a sequence was linear or > > circular and to store the string that encodes its molecule type > > (ss-RNA, etc). > > I thought we already stored this information - but I'm not sure off hand. Well, there's the alphabet of the sequence (e.g. UnAmbiguousDNA()) that says whether it's DNA, RNA, peptide, etc, but even if I matched these up with strings, then the "ss-", "ds-", etc part would be missing. I just saved the exact wording of the sequence type (e.g. "ds-DNA", "ss-RNA", etc) to a dictionary key named self.data.annotations["sequence_type"] in the _FeatureConsumer class under GenBank. This is in addition to the alphabet of the sequence so it shouldn't conflict. > You could email it directly to me, but it would be better to create a > bug (an "enhancement") and then attach the changes to the bug. Edited > versions of files will do, but patch files are best. Ok, done! It's at http://bugzilla.open-bio.org/show_bug.cgi?id=2294 > I look forward to seeing your code Howard. > > Peter Thank you! And I hope to continue to contribute to Biopython. 
-Howard From biopython-dev at maubp.freeserve.co.uk Wed May 16 03:53:41 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Wed, 16 May 2007 08:53:41 +0100 Subject: [Biopython-dev] About a new GenBankWriter class with SeqIO interface In-Reply-To: <9fa7e98e0705151934j1aeb2df7ja76e0ce315515d91@mail.gmail.com> References: <9fa7e98e0705151336va7e1c86la137c137883d6886@mail.gmail.com> <464A1F4B.9020705@maubp.freeserve.co.uk> <9fa7e98e0705151934j1aeb2df7ja76e0ce315515d91@mail.gmail.com> Message-ID: <464AB885.80305@maubp.freeserve.co.uk> Howard Salis wrote: >> Sounds nice - its something I've been thinking about doing myself, >> but I wanted to do both both GenBank and EMBL, sharing the feature >> table writing code. > > Yep, since EMBL and GenBank share the same feature format, I've > separated the "foreword", feature table, and sequence write > functions. Using "foreword / features / sequence" avoids clashing with the terms "header" and "footer" used in Bio.SeqIO to mean parts of a multi-sequence file which do not belong to a specific record. Maybe I should update Bio/GenBank/Scanner.py to use similar terminology... > So if someone wants to write the EMBL writer, they just need to write > the appropriate foreword. There is also the issue of translation between EMBL/GenBank terminology, for example where someone has read in an EMBL file and wants to write it out as a GenBank file. For a simple example, the division class should probably map: {'PRI': 'MAM', 'BCT': 'PRO', 'UNA': 'UNC'} > I think the sequence data is stored the same too? Is that correct? Actually, the way the sequence is printed out is slightly different. >>> I also add/change a couple of lines in __init__.py to store >>> whether a sequence was linear or circular and to store the string >>> that encodes its molecule type (ss-RNA, etc). >> I thought we already stored this information - but I'm not sure off >> hand. > > Well, there's the alphabet of the sequence (e.g. 
UnAmbiguousDNA()) > that says whether it's DNA, RNA, peptide, etc, but even if I matched > these up with strings, then the "ss-", "ds-", etc part would be > missing. I just saved the exact wording of the sequence type (e.g. > "ds-DNA", "ss-RNA", etc) to a dictionary key named > self.data.annotations["sequence_type"] in the _FeatureConsumer class > under GenBank. This is in addition to the alphabet of the sequence so > it shouldn't conflict. That's probably a good idea. However, we would need to check what the EMBL equivalents are and convert them when writing GenBank files. Maybe we should just keep things simple and write one of RNA/DNA/Protein only? > Ok, done! It's at http://bugzilla.open-bio.org/show_bug.cgi?id=2294 I have made some more specific comments on the bug. In this email I have tried to stick to the broader picture. Peter From jfeala at gmail.com Wed May 16 13:25:37 2007 From: jfeala at gmail.com (Jake Feala) Date: Wed, 16 May 2007 10:25:37 -0700 Subject: [Biopython-dev] interaction networks in biopython In-Reply-To: References: Message-ID: <12c863fe0705161025p46b1ff6v8c6b1e5999b29244@mail.gmail.com> Thanks Ed and Yair, I'm really glad there's some interest in this! I'll get started on dusting off my code and adding more documentation. Steve - great suggestion. I had already looked at NetworkX and was already thinking about switching over to this as the back-end graph representation. Are there any issues that I should think about when creating these extra dependencies? Also, what is the next step in this process? Should we agree on an API and class hierarchy before we start dumping code on each other? Which aspects can we make compatible with other Biopython objects? (I was thinking maybe parsers for the interaction datasets and the SQL interface) -Jake On 5/15/07, Steve Lianoglou wrote: > Hi, > > On May 15, 2007, at 3:25 PM, Yair Benita wrote: > > > I would be happy to contribute to this too. 
> > Currently I have a python script that uses HPRD to generate protein > > protein > > interaction maps. I have deferent filtering methods to display only > > classes > > of proteins or only links to a specific kegg pathway. It will need > > a bit of > > work before I can submit this to CVS. As for drawing the map, I am > > currently > > generating a dot file that can be converted to an image using > > GRAPHVIZ. If > > anyone wants to suggest anything else, please do. > > I've been using NetworkX[1] to play w/ networks/graphs interactively. > You can display them if you have matplotlib installed, and can save > the graphs to dot format as well. > > -steve > > [1] NetworkX: https://networkx.lanl.gov/wiki > > > > > Yair > > > > > > on 5/15/07 2:37 PM, Ed Schofield at edschofield at gmail.com wrote: > > > >> On 5/15/07, Jake Feala wrote: > >>> Hello Biopython people - > >>> > >>> With all the new research in genome-wide cellular interaction > >>> networks I was a little surprised not to see much support for these > >>> type of data in Biopython. I know that Bioperl has a networks > >>> package > >>> that looks like the kind of thing that I would love to also see in > >>> Python for all the obvious reasons. > >>> > >>> First - has this already been done and I missed it? All I could > >>> find > >>> were a few scattered and application-specific scripts across the > >>> web, > >>> plus the Pathway package in BioPython. > >>> > >>> If not, then would there be any interest in development along these > >>> lines? A while back I wrote a few scripts that parse interaction > >>> datasets, stick them into a MySQL database, and retrieve the > >>> interactions into a Network object that can be used to analyze the > >>> graph of nodes and links. I would be glad to update these to fit > >>> into > >>> the biopython framework, as it would be useful to my own research. 
> >>> > >>> One caveat is that I am an engineering PhD student and my > >>> programming > >>> skills are mostly self-taught beyond two Java courses, so I might > >>> need > >>> a little guidance in testing and preparing the code for > >>> distribution. > >>> I have only ever written code for my own personal research but I > >>> think > >>> my style is decent and I would love to get better. > >>> > >>> Any opinion or advice? > >> > >> This would interest me too; I'd be glad to have such functionality in > >> BioPython. I can offer you some guidance on Python, packaging and > >> testing, and (if you need it) use of external array packages. > >> > >> -- Ed > >> _______________________________________________ > >> Biopython-dev mailing list > >> Biopython-dev at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > > > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > From jhackney at stanford.edu Wed May 16 14:10:38 2007 From: jhackney at stanford.edu (Jason A. Hackney) Date: Wed, 16 May 2007 11:10:38 -0700 Subject: [Biopython-dev] interaction networks in biopython In-Reply-To: <12c863fe0705161025p46b1ff6v8c6b1e5999b29244@mail.gmail.com> References: <12c863fe0705161025p46b1ff6v8c6b1e5999b29244@mail.gmail.com> Message-ID: Hi All, I'm also interested in an interaction network class for biopython. I'm willing to contribute to the effort with either code review or testing. Cheers, Jason Jason A. Hackney Postdoctoral Fellow Department of Microbiology and Immunology Stanford University e-mail: jhackney at stanford.edu lab phone: 650-724-3891 mobile: 650-283-6907 On May 16, 2007, at 10:25 AM, Jake Feala wrote: > Thanks Ed and Yair, I'm really glad there's some interest in this! > I'll get started on dusting off my code and adding more documentation. > > Steve - great suggestion. 
I had already seen at NetworkX and was > already thinking about switching over to this as the back-end graph > representation. Are there any issues that I should think about when > creating these extra dependencies? > > Also, what is the next step in this process? Should we agree on an > API and class hierarchy before we start dumping code on each other? > Which aspects can we make compatible with other Biopython objects? (I > was thinking maybe parsers for the interaction datasets and the SQL > interface) > > -Jake > > > On 5/15/07, Steve Lianoglou wrote: >> Hi, >> >> On May 15, 2007, at 3:25 PM, Yair Benita wrote: >> >>> I would be happy to contribute to this too. >>> Currently I have a python script that uses HPRD to generate protein >>> protein >>> interaction maps. I have deferent filtering methods to display only >>> classes >>> of proteins or only links to a specific kegg pathway. It will need >>> a bit of >>> work before I can submit this to CVS. As for drawing the map, I am >>> currently >>> generating a dot file that can be converted to an image using >>> GRAPHVIZ. If >>> anyone wants to suggest anything else, please do. >> >> I've been using NetworkX[1] to play w/ networks/graphs interactively. >> You can display them if you have matplotlib installed, and can save >> the graphs to dot format as well. >> >> -steve >> >> [1] NetworkX: https://networkx.lanl.gov/wiki >> >>> >>> Yair >>> >>> >>> on 5/15/07 2:37 PM, Ed Schofield at edschofield at gmail.com wrote: >>> >>>> On 5/15/07, Jake Feala wrote: >>>>> Hello Biopython people - >>>>> >>>>> With all the new research in genome-wide cellular interaction >>>>> networks I was a little surprised not to see much support for >>>>> these >>>>> type of data in Biopython. I know that Bioperl has a networks >>>>> package >>>>> that looks like the kind of thing that I would love to also see in >>>>> Python for all the obvious reasons. >>>>> >>>>> First - has this already been done and I missed it? 
All I could >>>>> find >>>>> were a few scattered and application-specific scripts across the >>>>> web, >>>>> plus the Pathway package in BioPython. >>>>> >>>>> If not, then would there be any interest in development along >>>>> these >>>>> lines? A while back I wrote a few scripts that parse interaction >>>>> datasets, stick them into a MySQL database, and retrieve the >>>>> interactions into a Network object that can be used to analyze the >>>>> graph of nodes and links. I would be glad to update these to fit >>>>> into >>>>> the biopython framework, as it would be useful to my own research. >>>>> >>>>> One caveat is that I am an engineering PhD student and my >>>>> programming >>>>> skills are mostly self-taught beyond two Java courses, so I might >>>>> need >>>>> a little guidance in testing and preparing the code for >>>>> distribution. >>>>> I have only ever written code for my own personal research but I >>>>> think >>>>> my style is decent and I would love to get better. >>>>> >>>>> Any opinion or advice? >>>> >>>> This would interest me too; I'd be glad to have such >>>> functionality in >>>> BioPython. I can offer you some guidance on Python, packaging and >>>> testing, and (if you need it) use of external array packages. 
>>>> >>>> -- Ed >>>> _______________________________________________ >>>> Biopython-dev mailing list >>>> Biopython-dev at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biopython-dev >>> >>> >>> _______________________________________________ >>> Biopython-dev mailing list >>> Biopython-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> >> > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From biopython-dev at maubp.freeserve.co.uk Wed May 16 18:05:44 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Wed, 16 May 2007 23:05:44 +0100 Subject: [Biopython-dev] [Bug 2294] These patches allow one to write a GenBank file using the SeqIO interface In-Reply-To: <200705161904.l4GJ4pue012542@portal.open-bio.org> References: <200705161904.l4GJ4pue012542@portal.open-bio.org> Message-ID: <464B8038.9020900@maubp.freeserve.co.uk> Hi Howard, I'm replying to the mailing list as you've raised a few more general issues on bug 2294. Peter wrote: >> In the writer class, for your write_file(self, records) method, you allow >> explicitly check for and allow "records" to be a single SeqRecord. Don't. Any >> such "helpfulness" should be done in Bio.SeqIO.write() only, and not the >> individual write_file. Otherwise we'll end up with a situation where some >> writers are "helpful" and others are not. Howard replied: > Currently, the SeqIO's write function is > > def write(sequences, handle, format): > ... > > I can add checks to see if "sequences" is (a) a generator, (b) a SeqRecord > object, or (c) something else. If (a), then call > writer_class(handle).write_file(sequences). If (b), then call > writer_class(handle).write_record(sequences). If (c), spit out an error (for > now). 
I've added a check in Bio.SeqIO.write() for the "sequences" argument being a SeqRecord (your case b), and if so it now raises a ValueError. This is better than whatever cryptic error would have happened. I agree that it might be "nicer" if Bio.SeqIO.write() also accepted a SeqRecord object as input and did the expected thing, but having a fixed simple API is more straightforward. For comparison, see the previous discussions on the mailing list about having the file argument accepting either a handle or a filename (it was agreed that we would accept handles only). > Ok, so the standard is very exact in what the LOCUS line should be. However, > I've found that many programs do not write Genbank files exactly according to > this standard! So we might want to make the Genbank parser a bit more forgiving > to small changes in the spacing of Locus line, especially since many programs > leave out keywords. Have you got some examples? I would be keen to add a few more test cases for reasonable GenBank variations. > As it stands now, the patch code can handle missing keywords in the LOCUS line. If it doesn't already, the existing column-based code can easily do this too. > For example, the code defines a pair of dictionaries with lambda functions as > their keys > > ... > > I know this looks crazy, but it works really well. Where else but Python can I > have a dictionary / hash / whatever with the key being a function! :) Play > around with the code and you'll see how it works. Crazy code is scary ;) I'll try and have a play with this at the weekend. Note that this issue (parsing the LOCUS line) is a bit tangential to writing GenBank records. >> It also looks like when you write the LOCUS line you are not following >> the column based definition > > I'll fix this. The writing of the Genbank file should follow the standard to > the exactitude. I agree completely - As a general principle we should be a little bit flexible on reading files, but very strict on output. 
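As a rough illustration of the "fixed simple API" point (an iterable of records is accepted, a lone record is rejected with a ValueError), here is a minimal stand-alone sketch. It is not the code committed to Bio.SeqIO: the Record class is a hypothetical stand-in for SeqRecord, and the tab-separated output is a placeholder, not GenBank format.

```python
import io


class Record:
    """Hypothetical stand-in for a SeqRecord; only id and seq are modelled."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


def write(sequences, handle):
    """Write every record from an iterable to handle and return the count.

    The output is a placeholder (one tab-separated line per record),
    not a real sequence file format.
    """
    if isinstance(sequences, Record):
        # Fail early with a clear message instead of a cryptic iteration error.
        raise ValueError("Expected a list or iterator of records, not a single record")
    count = 0
    for record in sequences:
        handle.write("%s\t%s\n" % (record.id, record.seq))
        count += 1
    return count


handle = io.StringIO()
written = write([Record("a", "ACGT"), Record("b", "GG")], handle)
print(written)
```

With this shape of API, a caller holding a single record wraps it in a list, e.g. write([record], handle), so every writer behind the function only ever has to handle iterables.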
Regards, Peter From chris.lasher at gmail.com Sat May 19 16:21:03 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Sat, 19 May 2007 16:21:03 -0400 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <464A07BB.8020206@maubp.freeserve.co.uk> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <464A07BB.8020206@maubp.freeserve.co.uk> Message-ID: <128a885f0705191321k32354ecdnafb9912443b9367f@mail.gmail.com> On 5/15/07, Peter wrote: > Did you get any information from the Open Bioinformatics Foundation guys > about moving from CVS to subversion? I didn't, with regards to public anonymous access to the Subversion repositories. I'm also on impromptu leave until this upcoming Monday, but we'll have this up and running by the end of the month. Chris From O.Doehring at cs.ucl.ac.uk Mon May 21 15:45:36 2007 From: O.Doehring at cs.ucl.ac.uk (O.Doehring at cs.ucl.ac.uk) Date: 21 May 2007 20:45:36 +0100 Subject: [Biopython-dev] Biopython to parse not only .pdb-files but also NACCESS .asa files Message-ID: Dear community, I am applying the following tool: 'Naccess V2.1.1 - Atomic Solvent Accessible Area Calculations ' to calculate two features which are not contained in standard .pdb-files. These two features are atomic accessiblity and van der Waal radius. As can be read in the readme file at http://wolf.bi.umist.ac.uk/naccess/nac_readme.html under 'example output files' and at the PDB-Format site at http://www.wwpdb.org/documentation/format23/sect9.html under 'Atom'. NACCESS does the following: 'The output format is PDB, with B-factors and occupancies removed, then atomic accessiblity in square Angstroms, followed by the assigned van der Waal radius.' Note that Occupancy gets replaced by atomic accessiblity and B-factor by the van der Waal radius. 
This 'new' .pdb-file has extension .asa. I chose a quite straightforward approach: I wanted to use Biopython as before, e.g. calling the B-Factor method but yielding the atomic accessibility instead. But Biopython seems to type-check the .asa-file and complains that the B-factor is not of type float. Is there a way to access the data of .asa-files programmatically via the Biopython library? The only other way then seems to be to write a parser for .asa-files and to figure out which atomic element in the .pdb-file corresponds to the respective one in the .asa-file and finally to retrieve the wanted values for atomic accessibility and van der Waals radius. Here are some more technical details. As an example I chose the '1DHR' protein:
------------------------------------------------------------------------------
def __init__(self, structure_id="1DHR", indices=[0]):
    # which residues are part of the patch
    self.indices = indices
    # If 1 (DEFAULT), the exceptions are caught, but some residues or atoms will be missing.
    # THESE EXCEPTIONS ARE DUE TO PROBLEMS IN THE PDB FILE!
    self.p = PDBParser(PERMISSIVE=1)
    # which protein to analyse
    self.structure_id = structure_id
    self.fileName = self.structure_id + '.asa'
    self.structure = self.p.get_structure(self.structure_id, self.fileName)
------------------------------------------------------------------------------
Error message:
Traceback (most recent call last):
  File "C:\Dokumente und Einstellungen\Renate Döhring\workspace\test\src\root\nested\compactness.py", line 249, in
    c = compact(indices=[0,1])
  File "C:\Dokumente und Einstellungen\Renate Döhring\workspace\test\src\root\nested\compactness.py", line 17, in __init__
    self.structure = self.p.get_structure(self.structure_id, self.fileName)
  File "C:\Python25\Lib\site-packages\Bio\PDB\PDBParser.py", line 65, in get_structure
    self._parse(file.readlines())
  File "C:\Python25\Lib\site-packages\Bio\PDB\PDBParser.py", line 85, in _parse
    self.trailer=self._parse_coordinates(coords_trailer)
  File "C:\Python25\Lib\site-packages\Bio\PDB\PDBParser.py", line 159, in _parse_coordinates
    bfactor=float(line[60:66])
ValueError: invalid literal for float(): 31 1.
------------------------------------------------------------------------------
I hope this question above was not discussed before but neither the search engine at http://search.open-bio.org/cgi-bin/mail-search.cgi works nor could I find anything useful via a Google search restricted to the archive using the 'site' attribute. What do you recommend for my situation? Many thanks! Yours, Orlando From edschofield at gmail.com Tue May 22 12:57:49 2007 From: edschofield at gmail.com (Ed Schofield) Date: Tue, 22 May 2007 17:57:49 +0100 Subject: [Biopython-dev] [BioPython] Biopython to parse not only .pdb-files but also NACCESS .asa files In-Reply-To: References: Message-ID: <1b5a37350705220957o24f6a436k89d60764729695da@mail.gmail.com> On 21 May 2007 20:45:36 +0100, O.Doehring at cs.ucl.ac.uk wrote: > > ValueError: invalid literal for float(): 31 1. > > ... > > What do you recommend for my situation? Many thanks! 
Is that a space between 31 and 1? There's your problem. My advice is to insert import pdb pdb.set_trace() at line 159 in PDBParser.py and check out why the columns in your data are misaligned with what PDBParser.py expects. A quick scan of nac_readme.html implies that perhaps you need the -f argument to give you the full output format? But if you need to write your own parser for .asa files, you could use _parse_coordinates(self, coords_trailer) as a template. -- Ed From dalke at dalkescientific.com Sat May 26 06:10:21 2007 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sat, 26 May 2007 12:10:21 +0200 Subject: [Biopython-dev] [Biopython-announce] is this supposed to be really slow? In-Reply-To: References: <20070525233151.GA4507@caltech.edu> Message-ID: (Move this from the -announce to the -dev list) Bryan Smith, replying to Titus Brown wrote: > i did see this constraint for only one request per 3 seconds, but did > not realize each time i went through my loop that this was a separate > request. > is there anything to do about this constraint? In your "search_for" call add delay=0. def search_for(search, reldate=None, mindate=None, maxdate=None, batchsize=100, delay=2, callback_fn=None, start_id=0, max_ids=None): """search_for(search[, reldate][, mindate][, maxdate] [, batchsize][, delay][, callback_fn][, start_id][, max_ids]) -> ids Search PubMed and return a list of the PMID's that match the criteria. search is the search string used to search the database. reldate is the number of dates prior to the current date to restrict the search. mindate and maxdate are the dates to restrict the search, e.g. 2002/01/01. batchsize specifies the number of ids to return at one time. By default, it is set to 10000, the maximum. delay is the number of seconds to wait between queries (default 2). callback_fn is an optional callback function that will be called as passed a PMID as results are retrieved. 
start_id specifies the index of the first id to retrieve and max_ids specifies the maximum number of id's to retrieve. In your Dictionary creation also add delay=0:

class Dictionary:
    def __init__(self, delay=5.0, parser=None):
        """Dictionary(delay=5.0, parser=None)

        Create a new Dictionary to access PubMed. parser is an optional parser (e.g. Medline.RecordParser) object to change the results into another form. If set to None, then the raw contents of the file will be returned. delay is the number of seconds to wait between each query.

>> I personally tend to just use the NCBI retrieval URLs directly, but >> that's kind of ugly. NCBI also watches those requests, and if you do too many you might get a warning or be blocked off, or so rumor has it. BTW, in your original code you can simplify

> for idx in range( len( termIds ) ):
>     pubDates[idx] = string.atoi( medlineDict[ termIds[ idx ]
> ].publication_date[ 0:4 ] )
>     idx = idx + 1

to

for idx, termId in enumerate(termIds):
    pubDates[idx] = int(medlineDict[termId].publication_date[:4])

Andrew dalke at dalkescientific.com From chris.lasher at gmail.com Thu May 31 00:30:38 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Thu, 31 May 2007 00:30:38 -0400 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <128a885f0705191321k32354ecdnafb9912443b9367f@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <464A07BB.8020206@maubp.freeserve.co.uk> <128a885f0705191321k32354ecdnafb9912443b9367f@mail.gmail.com> Message-ID: <128a885f0705302130t628794e7v681dc02058244913@mail.gmail.com> On 5/19/07, Chris Lasher wrote: > On 5/15/07, Peter wrote: > > Did you get any information from the Open Bioinformatics Foundation guys > > about moving from CVS to subversion? 
> > I didn't, with regards to public anonymous access to the Subversion > repositories. I'm also on impromptu leave until this upcoming Monday, > but we'll have this up and running by the end of the month. > > Chris > I'm obviously missing another target, and BOSC 2007 is fast approaching. I'm being held up by 4 files that are in the CVS repository that were foolishly committed with carriage returns (i.e., "\r") in the filenames. How that's possible, I have no clue, but I need to alter the data in the CVS repository so those filenames are correct, or otherwise completely removed, over the entire history of those files. Does anyone have any experience with the internals of CVS repositories? I definitely do not. Chris From biopython-dev at maubp.freeserve.co.uk Thu May 31 05:07:59 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Thu, 31 May 2007 10:07:59 +0100 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <128a885f0705302130t628794e7v681dc02058244913@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <464A07BB.8020206@maubp.freeserve.co.uk> <128a885f0705191321k32354ecdnafb9912443b9367f@mail.gmail.com> <128a885f0705302130t628794e7v681dc02058244913@mail.gmail.com> Message-ID: <465E906F.1080704@maubp.freeserve.co.uk> Chris Lasher wrote: > On 5/19/07, Chris Lasher wrote: >> On 5/15/07, Peter wrote: >>> Did you get any information from the Open Bioinformatics Foundation guys >>> about moving from CVS to subversion? >> I didn't, with regards to public anonymous access to the Subversion >> repositories. I'm also on impromptu leave until this upcoming Monday, >> but we'll have this up and running by the end of the month. >> >> Chris >> > > I'm obviously missing another target, and BOSC 2007 is fast > approaching. Are you going to BOSC 2007 Chris? 
> I'm being held up by 4 files that are in the CVS > repository that were foolishly committed with carriage returns (i.e., > "\r") in the filenames. How that's possible, I have no clue, but I > need to alter the data in the CVS repository so those filenames are > correct, or otherwise completely removed, over the entire history of > those files. Does anyone have any experience with the internals of CVS > repositories? I definitely do not. How strange! I have no experience with the internals of CVS so can't help you there. What are the four offending files? Maybe we could just purge them for the move to SVN. Also, I suspect (but have not checked this) that a few of the examples files in the unit tests have been checked in as binary files rather than text (due to some odd differences in new lines across platforms). Again, a CVS expert would probably be able to generate a list of all "binary" files in the repository fairly easily. Peter From bugzilla-daemon at portal.open-bio.org Thu May 31 09:14:31 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 31 May 2007 09:14:31 -0400 Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely In-Reply-To: Message-ID: <200705311314.l4VDEV2X031189@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2268 ------- Comment #17 from mdehoon at ims.u-tokyo.ac.jp 2007-05-31 09:14 EST ------- Created an attachment (id=661) --> (http://bugzilla.open-bio.org/attachment.cgi?id=661&action=view) Updated version of Bio/Cluster/cluster.c -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu May 31 09:15:17 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 31 May 2007 09:15:17 -0400 Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely In-Reply-To: Message-ID: <200705311315.l4VDFH6D031294@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2268 ------- Comment #18 from mdehoon at ims.u-tokyo.ac.jp 2007-05-31 09:15 EST ------- Created an attachment (id=662) --> (http://bugzilla.open-bio.org/attachment.cgi?id=662&action=view) Updated version of Bio/Cluster/clustermodule.c -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu May 31 09:17:31 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 31 May 2007 09:17:31 -0400 Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely In-Reply-To: Message-ID: <200705311317.l4VDHVB7031418@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2268 ------- Comment #19 from mdehoon at ims.u-tokyo.ac.jp 2007-05-31 09:17 EST ------- Could you try with the updated Bio/Cluster/cluster.c, Bio/Cluster/clustermodule.c (see attachments)? These should solve the problems with the Cluster unit test. If they work fine, I'll upload them to CVS. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From jfeala at gmail.com Thu May 31 19:52:36 2007 From: jfeala at gmail.com (Jake Feala) Date: Thu, 31 May 2007 16:52:36 -0700 Subject: [Biopython-dev] interaction networks in biopython In-Reply-To: References: <12c863fe0705161025p46b1ff6v8c6b1e5999b29244@mail.gmail.com> Message-ID: <12c863fe0705311652w44074269y2256aa127b90843b@mail.gmail.com> Hi Everybody - I've been thinking about the possible structure of a "BioNet" package, and here is what I think would be most useful:

InteractionRecord.py - a storage object for biological interactions, mirroring information stored in the PSI-MI (Proteomics Standards Initiative - Molecular Interaction) XML standard - unless someone knows a better one.

Network.py - network object inheriting a NetworkX graph class with additional methods for manipulating an InteractionRecord stored with each edge

InteractionIO - a submodule with parsers to read and write interactions to/from Cytoscape, PSI-MI, and other formats or online interaction databases

BioNetSQL - a submodule for storing interactions in, and querying, a local SQL database

I've started on the code, including parsers for Cytoscape, PSI-MI XML files, and GRID flat files. I haven't fixed up my SQL scripts yet because I want to rethink the database design. All the code is available at http://cmrg.ucsd.edu/JakeFeala#software Here is an example that worked fine for me:

from Network import *
f = open()
parser = GRIDIterator(f)
net = create_network()
net.load(parser)

Are there any suggestions, regarding (1) the standard for InteractionRecord, (2) methods for the Network object, (3) structure of the SQL database, (4) overall structure of the package? Also, does anyone want to contribute to any specific part (e.g. Yair can add his HPRD parser)? Thanks! -Jake On 5/16/07, Jason A. Hackney wrote: > Hi All, > > I'm also interested in an interaction network class for biopython. I'm > willing to contribute to the effort with either code review or testing. 
> > Cheers, > > Jason > > > > Jason A. Hackney > > Postdoctoral Fellow > Department of Microbiology and Immunology > Stanford University > > e-mail: jhackney at stanford.edu > lab phone: 650-724-3891 > mobile: 650-283-6907 > > > > > > On May 16, 2007, at 10:25 AM, Jake Feala wrote: > > Thanks Ed and Yair, I'm really glad there's some interest in this! > I'll get started on dusting off my code and adding more documentation. > > Steve - great suggestion. I had already looked at NetworkX and was > already thinking about switching over to this as the back-end graph > representation. Are there any issues that I should think about when > creating these extra dependencies? > > Also, what is the next step in this process? Should we agree on an > API and class hierarchy before we start dumping code on each other? > Which aspects can we make compatible with other Biopython objects? (I > was thinking maybe parsers for the interaction datasets and the SQL > interface) > > -Jake > > > On 5/15/07, Steve Lianoglou wrote: > Hi, > > On May 15, 2007, at 3:25 PM, Yair Benita wrote: > > > I would be happy to contribute to this too. > Currently I have a python script that uses HPRD to generate protein > protein > interaction maps. I have different filtering methods to display only > classes > of proteins or only links to a specific kegg pathway. It will need > a bit of > work before I can submit this to CVS. As for drawing the map, I am > currently > generating a dot file that can be converted to an image using > GRAPHVIZ. If > anyone wants to suggest anything else, please do. > > I've been using NetworkX[1] to play w/ networks/graphs interactively. > You can display them if you have matplotlib installed, and can save > the graphs to dot format as well. 
> > -steve > > [1] NetworkX: https://networkx.lanl.gov/wiki > > > > Yair > > > on 5/15/07 2:37 PM, Ed Schofield at edschofield at gmail.com wrote: > > > On 5/15/07, Jake Feala wrote: > Hello Biopython people - > > With all the new research in genome-wide cellular interaction > networks I was a little surprised not to see much support for these > type of data in Biopython. I know that Bioperl has a networks > package > that looks like the kind of thing that I would love to also see in > Python for all the obvious reasons. > > First - has this already been done and I missed it? All I could > find > were a few scattered and application-specific scripts across the > web, > plus the Pathway package in BioPython. > > If not, then would there be any interest in development along these > lines? A while back I wrote a few scripts that parse interaction > datasets, stick them into a MySQL database, and retrieve the > interactions into a Network object that can be used to analyze the > graph of nodes and links. I would be glad to update these to fit > into > the biopython framework, as it would be useful to my own research. > > One caveat is that I am an engineering PhD student and my > programming > skills are mostly self-taught beyond two Java courses, so I might > need > a little guidance in testing and preparing the code for > distribution. > I have only ever written code for my own personal research but I > think > my style is decent and I would love to get better. > > Any opinion or advice? > > This would interest me too; I'd be glad to have such functionality in > BioPython. I can offer you some guidance on Python, packaging and > testing, and (if you need it) use of external array packages. 
> > -- Ed > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > 
From bugzilla-daemon at portal.open-bio.org Tue May 1 18:48:35 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 May 2007 14:48:35 -0400 Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely In-Reply-To: Message-ID: <200705011848.l41ImZOB001726@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2268 ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-01 14:48 EST ------- Checking the version of Numeric may be worth while - I recall from the MMTK mailing list that some versions appeared to cause subtle bugs. In late 2005 Konrad Hinsen was suggesting MMTK users downgrade from version 24 to version 23, but I don't know if he ever pinned down what the problem was (or indeed, if there really was a problem). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue May 1 19:06:30 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 May 2007 15:06:30 -0400 Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely In-Reply-To: Message-ID: <200705011906.l41J6UED002525@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2268 ------- Comment #11 from chris.lasher at gmail.com 2007-05-01 15:06 EST ------- (In reply to comment #10) > Checking the version of Numeric may be worth while - I recall from the MMTK > mailing list that some versions appeared to cause subtle bugs. In late 2005 > Konrad Hinsen was suggesting MMTK users downgrade from version 24 to version > 23, but I don't know if he ever pinned down what the problem was (or indeed, if > there really was a problem). > On Dapper Drake, Edgy Eft and Feisty Fawn, the Numeric packages are 24.2. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue May 1 19:50:47 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 May 2007 15:50:47 -0400 Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely In-Reply-To: Message-ID: <200705011950.l41JolgE004634@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2268 ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-01 15:50 EST ------- For reference, on my 64bit Ubuntu Dapper Drake system (where test_Cluster.py works) I have the following packages installed: python 2.4.2-0ubuntu3 python-reportlab 1.20debian-3ubuntu1 python-numeric 24.2-1ubuntu2 python-egenix-mxtexttools 2.0.6ubuntu1-1ubuntu4 i.e. Numeric 24.2 does work with test_Cluster.py for me. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed May 2 18:44:01 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 2 May 2007 14:44:01 -0400 Subject: [Biopython-dev] [Bug 2285] Creating Bio.AlignIO to cope with alignments like Bio.SeqIO does sequences In-Reply-To: Message-ID: <200705021844.l42Ii154024905@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2285 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-02 14:44 EST ------- Created an attachment (id=643) --> (http://bugzilla.open-bio.org/attachment.cgi?id=643&action=view) ZIP file containing four python scripts to go in Bio/AlignIO/*.py There is a follow up patch to Bio/SeqIO/__init__.py to basically use Bio.AlignIO for reading/writing clustal, stockholm and phylip instead. The corresponding parsers under Bio/SeqIO/*.py would then be removed. I have not yet worked out what a Nexus file looks like when it holds more than one alignment (if in fact this is possible). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri May 4 09:20:31 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 4 May 2007 05:20:31 -0400 Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely In-Reply-To: Message-ID: <200705040920.l449KVW3015656@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2268 ------- Comment #13 from mdehoon at ims.u-tokyo.ac.jp 2007-05-04 05:20 EST ------- Chris, I found one Linux system on which test_Cluster.py hangs in the call to kmedoids instead of the call to kcluster. It turned out that this was due to a floating-point comparison in the kmedoids function. 
Since the same comparison occurs in the kcluster function, this may very well be the reason test_Cluster.py hangs on your platform in the call to kcluster. The comparison involves two floating-point variables which are bit-wise identical to each other. However, variable1 <= variable2 returns False. Could you have a look at line 2071 in Bio/Cluster/cluster.c (Biopython release 1.43) and print out the two variables "total" and "previous"? (You may find that test_Cluster.py no longer hangs when you add the printf statement; at least that is what happened with the call to kmedoids). If total and previous have the same value, but total>=previous returns False, then that would explain why the call to kcluster hangs. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython-dev at maubp.freeserve.co.uk Mon May 7 14:16:38 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Mon, 07 May 2007 15:16:38 +0100 Subject: [Biopython-dev] Unified alignment input/output, Bio.AlignIO? In-Reply-To: <463240F9.8010907@maubp.freeserve.co.uk> References: <463240F9.8010907@maubp.freeserve.co.uk> Message-ID: <463F34C6.90008@maubp.freeserve.co.uk> Peter wrote: > Following the release of Biopython 1.43 with Bio.SeqIO, I would like to > do a better job for multiple sequence alignment file formats - creating > a new module Bio.AlignIO > > While most multiple sequence alignment files usually contain a single > alignment (made up of multiple sequences), this is not the general case. > > In the PHYLIP suite, concatenated alignments in phylip format are > produced by the seqboot program for tasks like bootstrapping of a > phylogenetic tree. Currently SeqIO chokes on these! 
> > Another example is the output of some the EMBOSS programs can contain > many multiple sequences alignments, for example the water and needle > tools can produce many pairwise alignments. > > In such cases, being able to write code like the following seems to be > the logical extension of the Bio.SeqIO style we have agreed on: > > from Bio import AlignIO > for alignment in AlignIO.parse("many.phy", "phylip") : > print "Alignment with %i sequences of length %i" \ > % (len(alignment.get_all_seqs()), > alignment.get_alignment_length()) > ... > > i.e. The AlignIO.parse() function would be an iterator returning > alignment objects. Does this sound reasonable so far? I have pressed ahead with this, there is a version attached to bug 2285 http://bugzilla.open-bio.org/show_bug.cgi?id=2285 This handles reading and writing of clustal, phylip, stockholm/pfam. I have not yet converted the Bio.SeqIO Nexus parser. Also, I plan to add a parser for reading the EMBOSS alignment format. As a side effect, this will actually remove a lot of the Bio.SeqIO code as handling any alignment file can be delegated to Bio.AlignIO instead. Would anyone like to comment on the scheme? Peter From bugzilla-daemon at portal.open-bio.org Mon May 7 17:45:32 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 May 2007 13:45:32 -0400 Subject: [Biopython-dev] [Bug 2285] Creating Bio.AlignIO to cope with alignments like Bio.SeqIO does sequences In-Reply-To: Message-ID: <200705071745.l47HjWGl031779@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2285 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #643 is|0 |1 obsolete| | AssignedTo|biopython-dev at biopython.org |biopython- | |bugzilla at maubp.freeserve.co. 
| |uk Status|NEW |ASSIGNED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-07 13:45 EST ------- Created an attachment (id=646) --> (http://bugzilla.open-bio.org/attachment.cgi?id=646&action=view) ZIP file containing four python scripts to go in Bio/AlignIO/*.py Misc updates to previous version -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon May 7 19:42:15 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 May 2007 15:42:15 -0400 Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely In-Reply-To: Message-ID: <200705071942.l47JgFi3004609@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2268 ------- Comment #14 from chris.lasher at gmail.com 2007-05-07 15:42 EST ------- Created an attachment (id=648) --> (http://bugzilla.open-bio.org/attachment.cgi?id=648&action=view) modified_Cluster.c_output.txt This is output from Cluster.c modified with a printf statement prior to line 2071 for total and previous. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
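Comment #13 above traces the hang to an iteration that stops only when a floating-point comparison between "total" and "previous" succeeds. As a rough illustration only, here is a toy one-dimensional k-means in Python whose stopping rule avoids comparing floating-point totals: it stops when the cluster assignments stop changing, with an iteration cap as a safety net. This is not the code in Bio/Cluster/cluster.c and not the patch discussed in this bug; the function names `nearest` and `kmeans_1d` and the `max_iter` parameter are made up for the sketch.

```python
def nearest(value, centers):
    """Index of the center closest to value (first one wins on ties)."""
    return min(range(len(centers)), key=lambda k: abs(value - centers[k]))


def kmeans_1d(values, centers, max_iter=100):
    """Toy 1-D k-means (hypothetical helper, not the Bio.Cluster routine).

    Convergence is decided by comparing integer cluster assignments,
    never by comparing floating-point sums of distances.
    """
    centers = list(centers)      # work on a copy of the caller's list
    assignment = None
    iterations = 0
    while iterations < max_iter:  # hard cap guards against periodic loops
        iterations += 1
        new_assignment = [nearest(v, centers) for v in values]
        if new_assignment == assignment:
            break                 # nothing moved: converged
        assignment = new_assignment
        for k in range(len(centers)):
            members = [v for v, a in zip(values, assignment) if a == k]
            if members:           # leave an empty cluster's center alone
                centers[k] = sum(members) / len(members)
    return assignment, centers, iterations


assignment, centers, iterations = kmeans_1d(
    [1.0, 1.2, 0.8, 5.0, 5.2, 4.8], [0.0, 6.0])
```

Because the exit test compares lists of integers, it cannot be affected by how a compiler or FPU rounds intermediate floating-point values, and the cap bounds the run time even if assignments were to oscillate.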
From bugzilla-daemon at portal.open-bio.org Mon May 7 19:56:27 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 May 2007 15:56:27 -0400 Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely In-Reply-To: Message-ID: <200705071956.l47JuRcu005295@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2268 ------- Comment #15 from chris.lasher at gmail.com 2007-05-07 15:56 EST ------- (In reply to comment #13) > Chris, > > Could you have a look at line 2071 in Bio/Cluster/cluster.c (Biopython release > 1.43) and print out the two variables "total" and "previous"? (You may find > that test_Cluster.py no longer hangs when you add the printf statement; at > least that is what happened with the call to kmedoids). If total and previous > have the same value, but total>=previous returns False, then that would explain > why the call to kcluster hangs. > This did allow it to proceed up to test_distancematrix_kmedoids; however, it once again reaches an infinite loop in this test. Additionally, the value for "previous" reaches an enormous number and I suspect it's not supposed to. (See the attached output.) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon May 7 23:06:19 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 May 2007 19:06:19 -0400 Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely In-Reply-To: Message-ID: <200705072306.l47N6JR3012976@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2268 ------- Comment #16 from mdehoon at ims.u-tokyo.ac.jp 2007-05-07 19:06 EST ------- Thanks, Chris! Actually, this looks OK. 
The kcluster routine runs the k-means algorithm 100 times starting from random initial clusterings. On each run, total is initialized to DBL_MAX (the largest number representable as a double). This is the huge number that is printed (printf usually has problems to print DBL_MAX nicely, so it may appear weird in the output). The same floating-point comparison that causes kcluster to hang also appears in kmedoids, so it's no surprise that the code hangs there too. I'll write a patch that avoids this floating-point comparison and post it here. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed May 9 13:48:11 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 May 2007 09:48:11 -0400 Subject: [Biopython-dev] [Bug 2289] New: LOCUS ss-cRNA => ERROR Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2289 Summary: LOCUS ss-cRNA => ERROR Product: Biopython Version: 1.24 Platform: PC OS/Version: Windows XP Status: NEW Severity: blocker Priority: P1 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: Daniel.Nicorici at gmail.com When I am processing a GenBank file from NCBI I get this error: ======================================================================= Traceback (most recent call last): File "F:\silvermine\tool\populator\ncbigenomic\source\python\do.py", line 26, in record = iterator.next() File "D:\Python25\lib\site-packages\Bio\GenBank\__init__.py", line 142, in nex t return self._parser.parse(self.handle) File "D:\Python25\lib\site-packages\Bio\GenBank\__init__.py", line 208, in par se self._scanner.feed(handle, self._consumer) File "D:\Python25\lib\site-packages\Bio\GenBank\Scanner.py", line 360, in feed self._feed_first_line(consumer, self.line) File 
"D:\Python25\lib\site-packages\Bio\GenBank\Scanner.py", line 782, in _fee d_first_line 'LOCUS line does not contain valid sequence type (DNA, RNA, ...):\n' + line AssertionError: LOCUS line does not contain valid sequence type (DNA, RNA, ...): LOCUS NC_005236 1769 bp ss-cRNA linear VRL 26-FEB-2007 ================================================================================ It seems that the error comes from the parser who is not able to handle ss-cRNA. If I replace ss-cRNA with ss-RNA then is no error anymore. Here is my python program which gives the error: =========================================================== import glob from Bio import GenBank # the files which will be processed path="G:\\Data\\NCBI\\genomic\\gbff\\temp\\complete*.genomic.gbff" print "Starting..." organism=[] count_organism=[] feature=[] count_feature=[] qualifier=[] count_qualifier=[] files = glob.glob(path) for file in files: print ">>>>>>>>>>>>>>>>>>>>>>>>>> " + file + " <<<<<<<<<<<<<<<<<<<<<<<<<" parser = GenBank.RecordParser() #infile = open("complete1short.genomic.gbff") infile = open(file); iterator = GenBank.Iterator(infile, parser) record = iterator.next() while record is not None: print record.locus + " --- " + record.organism + " --- " + record.version # organism flag=0 for b in range(len(organism)): if organism[b]==record.organism: count_organism[b]=count_organism[b]+1 flag=1 break if flag==0: organism.append(record.organism) count_organism.append(1) # features for a in range(len(record.features)): flag=0 for b in range(len(feature)): if feature[b]==record.features[a].key: count_feature[b]=count_feature[b]+1 flag=1 break if flag==0: feature.append(record.features[a].key) count_feature.append(1) #print "--" + record.features[i].key # qualifiers for c in range(len(record.features[a].qualifiers)): flag=0 for b in range(len(qualifier)): if qualifier[b]==record.features[a].qualifiers[c].key: count_qualifier[b]=count_qualifier[b]+1 flag=1 break if flag==0: 
qualifier.append(record.features[a].qualifiers[c].key) count_qualifier.append(1) #print "----" + record.features[i].qualifiers[j].key record=iterator.next() print "===================ORGANISM========================" for i in range(len(organism)): print organism[i] + "\t" + str(count_organism[i]) print "===================END_ORGANISM====================" print "===================FEATURES========================" for i in range(len(feature)): print feature[i] + "\t" + str(count_feature[i]) print "===================END_FEATURES====================" print "===================QUALIFIERS========================" for i in range(len(qualifier)): print qualifier[i] + "\t" + str(count_qualifier[i]) print "===================END_QUALIFIERS====================" print "The End!!!" x=raw_input("Press ENTER to continue...") ============================================================ -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed May 9 14:06:32 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 May 2007 10:06:32 -0400 Subject: [Biopython-dev] [Bug 2289] LOCUS ss-cRNA => ERROR In-Reply-To: Message-ID: <200705091406.l49E6WGi008294@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2289 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-09 10:06 EST ------- Confirmed: the parser currently only accepts entries 'DNA','RNA','tRNA','mRNA','uRNA','snRNA','cDNA'. Could you tell me where you got this GenBank file from? It would be helpful for testing (and I may want to add a similar example to the test suite). 
Thanks -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed May 9 14:25:59 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 May 2007 10:25:59 -0400 Subject: [Biopython-dev] [Bug 2289] LOCUS ss-cRNA => ERROR In-Reply-To: Message-ID: <200705091425.l49EPxNf009285@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2289 ------- Comment #2 from Daniel.Nicorici at gmail.com 2007-05-09 10:25 EST ------- Hello, The entry ss-cRNA appears in the file: ftp.ncbi.nih.gov/refseq/release/complete/complete72.genomic.gbff.gz -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed May 9 14:48:30 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 May 2007 10:48:30 -0400 Subject: [Biopython-dev] [Bug 2289] Can't parse GenBank files with "ss-cRNA" in the LOCUS line In-Reply-To: Message-ID: <200705091448.l49EmU9D010377@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2289 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|blocker |normal Status|ASSIGNED |RESOLVED OS/Version|Windows XP |All Platform|PC |All Resolution| |FIXED Summary|LOCUS ss-cRNA => ERROR |Can't parse GenBank files | |with "ss-cRNA" in the LOCUS | |line Version|1.24 |Not Applicable ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-09 10:48 EST ------- See also Bug 2231. With hindsight checking against a known list of sequences types was too harsh. 
It now just looks for the text "DNA" or "RNA" within this field of the LOCUS line in GenBank files. I've checked in a fix to CVS, and checked I can parse GenBank file NC_005236. The simplest way to update your machine, Daniel, is to download and replace the file D:\Python25\lib\site-packages\Bio\GenBank\Scanner.py with revision 1.11 from here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/GenBank/Scanner.py?cvsroot=biopython There will be a slight time delay before the CVS web site updates itself - you can of course get the file from CVS directly if you would rather. Please let us know (on this bug) if that doesn't solve this problem. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed May 9 14:51:05 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 May 2007 10:51:05 -0400 Subject: [Biopython-dev] [Bug 2289] Can't parse GenBank files with "ss-cRNA" in the LOCUS line In-Reply-To: Message-ID: <200705091451.l49Ep5HC010511@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2289 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |biopython- | |bugzilla at maubp.freeserve.co. | |uk ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-09 10:51 EST ------- P.S. I have not tried the full file from here, as the FTP site was timing out. ftp.ncbi.nih.gov/refseq/release/complete/complete72.genomic.gbff.gz (15 MB) I just tried the single GenBank record for NC_005236 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
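Comment #3 above describes replacing a fixed list of accepted sequence types with a looser test for the text "DNA" or "RNA" in that field of the LOCUS line. The difference between the two checks can be sketched roughly as follows. This is not the code in Bio/GenBank/Scanner.py: the function names are hypothetical, and only the list of accepted types is taken from comment #1.

```python
# The fixed list quoted in comment #1; anything else was rejected.
OLD_ACCEPTED_TYPES = ["DNA", "RNA", "tRNA", "mRNA", "uRNA", "snRNA", "cDNA"]


def strict_type_check(residue_type):
    """Old-style check (hypothetical helper): exact match against a list."""
    return residue_type.strip() in OLD_ACCEPTED_TYPES


def relaxed_type_check(residue_type):
    """New-style check (hypothetical helper): the field only needs to
    mention DNA or RNA somewhere, so prefixes such as "ss-c" are fine."""
    field = residue_type.strip()
    return "DNA" in field or "RNA" in field


# The field value from the LOCUS line reported in this bug.
field = "ss-cRNA"
old_result = strict_type_check(field)
new_result = relaxed_type_check(field)
```

With the value from this bug report, the strict check rejects "ss-cRNA" while the relaxed one accepts it. The relaxed check will also let through other prefixed types without the list needing to be updated.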
From bugzilla-daemon at portal.open-bio.org Wed May 9 14:57:06 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 May 2007 10:57:06 -0400 Subject: [Biopython-dev] [Bug 2289] Can't parse GenBank files with "ss-cRNA" in the LOCUS line In-Reply-To: Message-ID: <200705091457.l49Ev6L0010785@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2289 ------- Comment #5 from Daniel.Nicorici at gmail.com 2007-05-09 10:57 EST ------- Here is the part of the file that generates the error: ======================================================================= LOCUS NC_005236 1769 bp ss-cRNA linear VRL 20-FEB-2007 DEFINITION Seoul virus strain 80-39 segment S, complete sequence. ACCESSION NC_005236 VERSION NC_005236.1 GI:38505529 PROJECT GenomeProject:15027 KEYWORDS . SOURCE Seoul virus ORGANISM Seoul virus Viruses; ssRNA negative-strand viruses; Bunyaviridae; Hantavirus. REFERENCE 1 (bases 1 to 1769) AUTHORS Song,J.-W., Moon,J.Y., Baek,L.J. and Song,K.-J. TITLE Genetic analysis of the full length of S segment of Seoul virus prototype, 80-39 strain JOURNAL Unpublished REFERENCE 2 (bases 1 to 1769) CONSRTM NCBI Genome Project TITLE Direct Submission JOURNAL Submitted (12-AUG-2004) National Center for Biotechnology Information, NIH, Bethesda, MD 20894, USA REFERENCE 3 (bases 1 to 1769) AUTHORS Song,J.-W., Moon,J.Y., Baek,L.J. and Song,K.-J. TITLE Direct Submission JOURNAL Submitted (09-APR-2003) Department of Microbiology, College of Medicine, Korea University, 5-ka, Anam-dong, Sungbuk-ku, Seoul 136-705, Korea COMMENT PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from AY273791. COMPLETENESS: full length. 
FEATURES Location/Qualifiers source 1..1769 /organism="Seoul virus" /mol_type="viral cRNA" /strain="80-39" /isolation_source="Rattus norvegicus" /db_xref="taxon:11608" /segment="segment S" /country="South Korea" gene 43..1332 /locus_tag="SEOVsSgp1" /db_xref="GeneID:2943086" CDS 43..1332 /locus_tag="SEOVsSgp1" /codon_start=1 /product="nucleocapsid protein" /protein_id="NP_942556.1" /db_xref="GI:38505530" /db_xref="GeneID:2943086" /translation="MATMEEIQREISAHEGQLVIARQKVKDAEKQYEKDPDDLNKRAL HDRESVAASIQSKIDELKRQLADRIAAGKNIGQDRDPTGVEPGDHLKERSALSYGNTL DLNSLDIDEPTGQTADWLTIIVYLTSFVVPIILKALYMLTTRGRQTSKDNKGMRIRFK DDSSYEDVNGIRKPKHLYVSMPNAQSSMKAEEITPGRFRTAVCGLYPAQIKARNMVSP VMSVVGFLALAKDWTSRIEEWLGAPCKFMAESPIAGSLSGNPVNRDYIRQRQGALAGM EPKEFQALRQHSKDAGCTLVEHIESPSSIWVFAGAPDRCPPTCLFVGGMAELGAFFSI LQDMRNTIMASKTVGTADEKLRKKSSFYQSYLRRTQSMGIQLDQRIIVMFMVAWGKEA VDNFHLGDDMDPELRSLAQILIDQKVKEISNQEPMKL" ORIGIN 1 tagtagtaga ctccctaaag agctactcca ctaacaagag aaatggcaac tatggaggaa 61 atccagagag aaatcagtgc tcacgagggg cagcttgtga tagcacgcca gaaggtcaag 121 gatgcagaaa agcagtatga gaaggatcct gatgacttaa acaagagggc actgcatgat 181 cgggagagtg tcgcagcttc aatacaatca aaaattgatg aactgaagcg ccaacttgcc 241 gacaggattg cagcagggaa gaacatcggg caagaccggg atcctacagg ggtagagccg 301 ggtgatcatc tcaaggaaag atcagcacta agctacggga atacactgga cctgaatagt 361 cttgacattg atgaacctac aggacaaaca gctgattggc tgactataat tgtctatcta 421 acatcattcg tggtcccgat catcttgaag gcactgtaca tgttaacaac aagaggtagg 481 cagacttcaa aggacaacaa ggggatgagg atcagattca aggatgacag ctcatatgag 541 gatgtcaatg ggatcagaaa gcctaaacat ctgtatgtgt caatgccaaa cgcccaatcc 601 agtatgaagg ctgaagagat aacaccagga agattccgca ctgcagtatg tgggctatat 661 cctgcacaga taaaggcaag gaatatggta agccctgtca tgagtgtagt tgggtttttg 721 gcactagcaa aagactggac atctagaatt gaagaatggc ttggcgcacc ctgcaagttc 781 atggcagagt ctcctattgc tgggagttta tctgggaatc ctgtgaatcg tgactatatc 841 agacaaagac aaggtgcact tgcagggatg gagccaaagg aatttcaagc cctcaggcaa 901 cattcaaagg atgctggatg tacactagtt gaacatattg agtcaccatc 
gtcaatatgg 961 gtgtttgctg gggcccctga taggtgtcca ccaacatgct tgtttgttgg agggatggct 1021 gagttaggtg ccttcttttc tatacttcag gatatgagga acacaatcat ggcttcaaaa 1081 actgtgggca cagctgatga aaagcttcga aagaaatcat cattctatca atcatacctc 1141 agacgcacac aatcaatggg aatacaactg gaccagagga taattgttat gtttatggtt 1201 gcctggggaa aggaggcagt ggacaacttc catctcggtg atgacatgga tccagagctt 1261 cgtagcctgg ctcagatctt gattgaccag aaagtgaagg aaatctcgaa ccaggagcct 1321 atgaaattat aagcacataa atatgtaatc aatactaact ataggttaag aaatactaat 1381 cattagttaa taagaataca gatttattga ataatcatat taaataatta ggtaagttaa 1441 atattattta gttaagttag ctaattgatt tatatgatta tcacaattga atgtaatcat 1501 aagcacaatc actgccatgt ataatcacgg gtatacgggt ggttttcata tggggaacag 1561 ggtgggctta gggccaggtc accttaagtg accttttttt gtatatatgg atgtagattt 1621 caattgatcg aatactaatc ctactgtcct cttttctttt cctttctcct tctttactaa 1681 caacaacaaa ctacctcaca accttctacc tcaatatata ctacctcatt aagttgtttc 1741 cttttgtctt tttagggagt ctactacta // ======================================================================== -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From stephen at blackrim.net Wed May 9 15:58:16 2007 From: stephen at blackrim.net (Stephen A Smith) Date: Wed, 09 May 2007 11:58:16 -0400 Subject: [Biopython-dev] [Off Topic] Google Group Message-ID: <4641EF98.90504@blackrim.net> Hi all, Just letting you know there is a google group open now for discussions of all thing programming and evolutionary biology. You can find it here http://groups.google.com/group/evo_code. Figured the people at bio* might be interested. Take care Stephen Smith -- Dept. 
Ecology and Evolutionary Biology Yale University http://www.blackrim.org -----BEGIN GEEK CODE BLOCK----- Version: 3.12 GS dpu s+: a- C++++ UL++++ P--- L++++ E--- W+++ N-- o-- K++++ w--- O- M-- V- PS+++ PE-- Y++ PGP++ t-- 5 X++ R-- tv++ b++++ DI+ D++ G++ e+++ h--- r+++ y+++ ------END GEEK CODE BLOCK------ From bugzilla-daemon at portal.open-bio.org Thu May 10 12:59:07 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 May 2007 08:59:07 -0400 Subject: [Biopython-dev] [Bug 2290] New: Not reading 1YVE.pdb Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2290 Summary: Not reading 1YVE.pdb Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: proszek at gmail.com biopython 1.42 fails to read 1YVE.pdb file, although it reads test.pdb created by: awk '{if($1=="ATOM"){print}}' 1YVE.pdb line 8610 is a HETATM line Traceback below (where file=sys.argv[1]=1YVE.pdb) WARNING: Chain J is discontinuous at line 8610. Traceback (most recent call last): File "./wezly.py", line 122, in ? 
b=Protein(sys.argv[1]) File "./wezly.py", line 15, in __init__ self.struct=self.parser.get_structure('X',file) File "/usr/lib/python2.4/site-packages/Bio/PDB/PDBParser.py", line 66, in get_structure self._parse(file.readlines()) File "/usr/lib/python2.4/site-packages/Bio/PDB/PDBParser.py", line 87, in _parse self.trailer=self._parse_coordinates(coords_trailer) File "/usr/lib/python2.4/site-packages/Bio/PDB/PDBParser.py", line 179, in _parse_coordinates structure_builder.init_residue(resname, hetero_flag, resseq, icode) File "/usr/lib/python2.4/site-packages/Bio/PDB/StructureBuilder.py", line 155, in init_residue self.chain.add(residue) File "/usr/lib/python2.4/site-packages/Bio/PDB/Entity.py", line 80, in add raise PDBConstructionException, "%s defined twice" % entity.get_full_id() File "/usr/lib/python2.4/site-packages/Bio/PDB/Entity.py", line 132, in get_full_id parent=self.get_parent() File "/usr/lib/python2.4/site-packages/Bio/PDB/Entity.py", line 102, in get_parent raise PDBException, 'No parent' PDBException: No parent -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu May 10 13:53:05 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 May 2007 09:53:05 -0400 Subject: [Biopython-dev] [Bug 2290] Not reading 1YVE.pdb In-Reply-To: Message-ID: <200705101353.l4ADr544030572@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2290 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-10 09:53 EST ------- Where did you get your 1YVE.pdb file from? Directly from the PDB? Just as a remark, the "PDBException: No parent" is not the problem. The error is further back, PDBConstructionException, "??? defined twice", and when Bio.PDB tries to get the identity of the problem residue it falls over. 
From bugzilla-daemon at portal.open-bio.org Thu May 10 14:06:21 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 May 2007 10:06:21 -0400 Subject: [Biopython-dev] [Bug 2290] Not reading 1YVE.pdb In-Reply-To: Message-ID: <200705101406.l4AE6LRW031886@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2290 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-10 10:06 EST ------- Which version of Biopython do you have? The "No parent" bug was fixed as bug 1936; make sure you have Biopython 1.43 or later. My installation of Biopython works but spits out a LOT of PDBConstructionException warnings about multiply defined water atoms (aka "Residue HOH"). Looking at the raw PDB file, there is a problem with multiply defined waters. As you can see below, the identifier jumps from 799 back to 1 (i.e. there are two waters with residue number 1).
...
HETATM16581 O HOH 793 36.450 15.564 -9.023 1.00 39.79 O
HETATM16582 O HOH 794 33.448 13.711 -11.019 1.00 40.42 O
HETATM16583 O HOH 796 28.414 11.908 -16.047 1.00 48.15 O
HETATM16584 O HOH 797 29.445 8.114 -11.059 1.00 55.49 O
HETATM16585 O HOH 799 28.383 5.173 -8.998 1.00 33.85 O
HETATM16586 O HOH 1 26.615 4.599 -6.718 1.00 24.95 O
HETATM16587 O HOH 2 23.353 4.948 -7.137 1.00 34.47 O
HETATM16588 O HOH 3 17.401 11.710 0.938 1.00 35.16 O
HETATM16589 O HOH 4 21.326 11.092 8.215 1.00 22.51 O
HETATM16590 O HOH 5 13.703 2.159 11.421 1.00 24.87 O
...
Are you happy for me to mark this as a duplicate of bug 1936?
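For anyone wanting to check a PDB file for this kind of problem before parsing it, here is a minimal sketch in plain Python. It does not use Bio.PDB at all; the names residue_key, find_redefined_residues and make_water are made up for illustration, and it assumes properly column-aligned ATOM/HETATM lines (chain ID in column 22, residue number in columns 23-26, insertion code in column 27). A residue is reported when its identifier reappears after a different residue has intervened, which is the "defined twice" situation described above.

```python
def residue_key(line):
    """Return (chain, resseq, icode) from a fixed-column ATOM/HETATM line."""
    return (line[21], int(line[22:26]), line[26])


def find_redefined_residues(lines):
    """List residue keys that show up again after another residue intervened."""
    seen = set()
    previous = None
    redefined = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        key = residue_key(line)
        # Consecutive lines with the same key are atoms of one residue;
        # a key that returns later is a second definition.
        if key != previous and key in seen:
            redefined.append(key)
        seen.add(key)
        previous = key
    return redefined


def make_water(serial, resseq, chain=" "):
    """Build a column-aligned HETATM line for a water (illustrative helper)."""
    return "HETATM%5d  O   HOH %s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, chain, resseq, 0.0, 0.0, 0.0, 1.0, 20.0)


# Waters numbered 1, 2, 3, 799 and then 1, 2 again, as in the excerpt above.
demo_lines = [make_water(i + 1, n) for i, n in enumerate([1, 2, 3, 799, 1, 2])]
redefined = find_redefined_residues(demo_lines)
print(redefined)
```

Running this on the real file would need the original column-aligned lines, not the whitespace-collapsed excerpt quoted in this message.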
From bugzilla-daemon at portal.open-bio.org Thu May 10 15:07:56 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 May 2007 11:07:56 -0400 Subject: [Biopython-dev] [Bug 2291] New: __init__.py missing in the Bio.PDB.mmCIF folder after the install Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2291 Summary: __init__.py missing in the Bio.PDB.mmCIF folder after the install Product: Biopython Version: Not Applicable Platform: Macintosh OS/Version: MacOS X Status: NEW Severity: normal Priority: P2 Component: Website AssignedTo: biopython-dev at biopython.org ReportedBy: jean.lechner at gmail.com When you install Biopython you must uncomment some lines in the setup.py file. But at the end of the installation the __init__.py file is not created in the Bio.PDB.mmCIF directory, so you cannot use MMCIFParser or MMCIF2Dict because Biopython cannot import MMCIFlex from Bio.PDB.mmCIF. From bugzilla-daemon at portal.open-bio.org Thu May 10 15:08:48 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 May 2007 11:08:48 -0400 Subject: [Biopython-dev] [Bug 2291] __init__.py missing in the Bio.PDB.mmCIF folder after the install In-Reply-To: Message-ID: <200705101508.l4AF8mQf003465@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2291 jean.lechner at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |jean.lechner at gmail.com
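A quick way to confirm this kind of packaging problem on an installed tree is to look for directories that contain modules but no __init__.py. The following is only a sketch: dirs_missing_init and MODULE_SUFFIXES are illustrative names, the list of suffixes is an assumption, and the demo builds a throwaway directory layout rather than inspecting a real Biopython install. Note that Python versions before 3.3 (including the 2.x interpreters in use here) cannot import from a directory without __init__.py.

```python
import os
import tempfile

# File suffixes treated as importable modules (an assumption for this sketch).
MODULE_SUFFIXES = (".py", ".so", ".pyd")


def dirs_missing_init(package_root):
    """Return directories under package_root holding modules but no __init__.py."""
    missing = []
    for dirpath, dirnames, filenames in os.walk(package_root):
        has_modules = any(name.endswith(MODULE_SUFFIXES) for name in filenames)
        if has_modules and "__init__.py" not in filenames:
            missing.append(dirpath)
    return sorted(missing)


# Demo: fake Bio/PDB/mmCIF tree where only mmCIF lacks its __init__.py.
demo_root = tempfile.mkdtemp()
bio_dir = os.path.join(demo_root, "Bio")
pdb_dir = os.path.join(bio_dir, "PDB")
mmcif_dir = os.path.join(pdb_dir, "mmCIF")
os.makedirs(mmcif_dir)
for package_dir in (bio_dir, pdb_dir):
    open(os.path.join(package_dir, "__init__.py"), "w").close()
open(os.path.join(mmcif_dir, "MMCIFlex.so"), "w").close()

missing = dirs_missing_init(bio_dir)
print(missing)
```

On a real system one would pass the installed Bio directory (for example under site-packages) instead of the temporary tree.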
From idoerg at gmail.com Thu May 10 17:18:23 2007 From: idoerg at gmail.com (Iddo Friedberg) Date: Thu, 10 May 2007 10:18:23 -0700 Subject: [Biopython-dev] Biopython talk at BOSC 2007? Message-ID: <464353DF.6030700@burnham.org> Anybody giving a talk on Biopython? We can get a 20-30 minute slot in Vienna, but someone has to show up and talk. Personally, I will actually be there for the ISMB SIGs, but as I am running my own conference, it will be a bit of a strain to talk at BOSC. However, the main reason I do not want to speak is that there are people much more deserving here. So if anyone plans to be at ISMB 2007 in any case, and wishes to represent Biopython with serpentine honor, contact Darin. Best, Iddo -------- Original Message -------- Subject: BOSC 2007 Second Call For Papers Date: Thu, 10 May 2007 12:17:41 -0400 From: darin.london at duke.edu To: biopython-owner at lists.open-bio.org The BOSC Organizing Committee are proud to announce BOSC 2007, occurring in Vienna, Austria on July 19th, 20th. The conference this year promises to be exciting, as the BOSC developers attempt to define and solve currently intractable problems in Bioinformatics. Please refer to the following website for complete information, and requests for submissions. Thank you, and we hope to see you in Vienna. http://open-bio.org/wiki/BOSC_2007 The BOSC organizing Committee Please pass this email on to anyone that would be interested. -- Iddo Friedberg, Ph.D. Burnham Institute for Medical Research 10901 N. Torrey Pines Rd. 
La Jolla, CA 92037, USA T: +1 858 646 3100 x3516 wengophone: idoerg http://iddo-friedberg.org http://2007.BioFunctionPrediction.org From bugzilla-daemon at portal.open-bio.org Sun May 13 20:30:10 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 13 May 2007 16:30:10 -0400 Subject: [Biopython-dev] [Bug 2292] New: Bio.PDBIO writes TER records without any required fields Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2292 Summary: Bio.PDBIO writes TER records without any required fields Product: Biopython Version: Not Applicable Platform: PC OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Other AssignedTo: biopython-dev at biopython.org ReportedBy: misiek at genesilico.pl Bio.PDBIO is happy to write TER records as "TER\n", which is inconsistent with the PDB format specification. The PDB format requires that TER records have some fields similar to ATOM records: '''The TER record has the same residue name, chain identifier, sequence number and insertion code as the terminal residue. The serial number of the TER record is one number greater than the serial number of the ATOM/HETATM preceding the TER.''' [See http://www.wwpdb.org/documentation/format23/sect9.html#TER] This causes problems with programs that require correct TER records (such as the multiple structural alignment program MUSTANG) and crash when they are not found.
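To make the requirement concrete, here is a small sketch of how a TER line with those fields could be formatted. This is not the attached patch and not Bio.PDB code; format_ter is a hypothetical helper, and the column positions (serial in columns 7-11, residue name in 18-20, chain ID in 22, residue number in 23-26, insertion code in 27) are taken from the PDB format description of the TER record.

```python
def format_ter(last_atom_serial, resname, chain_id, resseq, icode=" "):
    """Format a TER record that follows the atom with serial last_atom_serial.

    The TER serial is one greater than the preceding ATOM/HETATM serial, and
    the residue fields are copied from the terminal residue of the chain.
    """
    line = "TER   %5d      %3s %s%4d%s" % (
        last_atom_serial + 1, resname, chain_id, resseq, icode)
    # Pad to the fixed 80-column record width.
    return line.ljust(80)


ter_line = format_ter(1234, "LEU", "A", 154)
print(repr(ter_line.rstrip()))
```

A writer would call something like this once per chain, right after the last ATOM line of that chain, instead of emitting a bare "TER\n".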
From bugzilla-daemon at portal.open-bio.org Sun May 13 20:31:18 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 13 May 2007 16:31:18 -0400 Subject: [Biopython-dev] [Bug 2292] Bio.PDBIO writes TER records without any required fields In-Reply-To: Message-ID: <200705132031.l4DKVIP9008944@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2292 ------- Comment #1 from misiek at genesilico.pl 2007-05-13 16:31 EST ------- Created an attachment (id=652) --> (http://bugzilla.open-bio.org/attachment.cgi?id=652&action=view) Proposed patch to PDBIO.py This is a simple fix. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From idoerg at gmail.com Mon May 14 16:27:42 2007 From: idoerg at gmail.com (Iddo Friedberg) Date: Mon, 14 May 2007 09:27:42 -0700 Subject: [Biopython-dev] Subject: BOSC 2007 2nd Call For Papers. Message-ID: <46488DFE.3070908@burnham.org> The BOSC Organizing Committee are proud to announce BOSC 2007, occurring in Vienna, Austria on July 19th, 20th. The conference this year promises to be exciting, as the BOSC developers attempt to define and solve currently intractable problems in Bioinformatics. Please refer to the following website for complete information, and requests for submissions. Thank you, and we hope to see you in Vienna. http://open-bio.org/wiki/BOSC_2007 The BOSC organizing Committee Please pass this email on to anyone that would be interested. -- Iddo Friedberg, Ph.D. Burnham Institute for Medical Research 10901 N. Torrey Pines Rd. 
La Jolla, CA 92037, USA T: +1 858 646 3100 x3516 wengophone: idoerg http://iddo-friedberg.org http://2007.BioFunctionPrediction.org From idoerg at gmail.com Mon May 14 16:28:36 2007 From: idoerg at gmail.com (Iddo Friedberg) Date: Mon, 14 May 2007 09:28:36 -0700 Subject: [Biopython-dev] BOSC 2007 Abstract Submission Deadline Extension Message-ID: <46488E34.8000604@burnham.org> Subject: BOSC 2007 Abstract Submission Deadline Extension Due to technical difficulties in sending out the 2nd call for papers, the BOSC organizers are extending the deadline for abstract submissions to Monday May 21st. The announcement day will remain the same so that it remains before the Early Discount Date. http://open-bio.org/wiki/BOSC_2007 The BOSC organizing Committee Please pass this email on to anyone that would be interested. -- Iddo Friedberg, Ph.D. Burnham Institute for Medical Research 10901 N. Torrey Pines Rd. La Jolla, CA 92037, USA T: +1 858 646 3100 x3516 wengophone: idoerg http://iddo-friedberg.org http://2007.BioFunctionPrediction.org From bugzilla-daemon at portal.open-bio.org Mon May 14 22:18:47 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 14 May 2007 18:18:47 -0400 Subject: [Biopython-dev] [Bug 2290] Not reading 1YVE.pdb In-Reply-To: Message-ID: <200705142218.l4EMIlwD008110@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2290 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |DUPLICATE Version|Not Applicable |1.42 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-14 18:18 EST ------- *** This bug has been marked as a duplicate of bug 1936 *** -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython-dev at maubp.freeserve.co.uk Mon May 14 22:29:05 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Mon, 14 May 2007 23:29:05 +0100 Subject: [Biopython-dev] Bugzilla Version Numbers In-Reply-To: <46273FB4.4030805@maubp.freeserve.co.uk> References: <128a885f0704102204p2872f42fh685919bb8b4656c3@mail.gmail.com> <46273FB4.4030805@maubp.freeserve.co.uk> Message-ID: <4648E2B1.5040706@maubp.freeserve.co.uk> Peter wrote: > Chris Lasher wrote: >> Hi all, >> >> Does anybody active with Biopython have administrative capabilities >> for the project's Bugzilla tracker? The version numbers are a wee >> bit out of date. > > They are, aren't they! I asked on the list last month about this, > and updating the component fields too: > > http://lists.open-bio.org/pipermail/biopython-dev/2007-March/002652.html > > As no-one on the list has come forward, I guess one of us should get > in touch with the relevant Open Bio people, probably by emailing > "support" at the domain helpdesk.open-bio.org > > Who needs/wants bugzilla admin rights? I've been in touch with Jason Stajich and he has done some magic: Michiel and I can now creategroups, editclassifications, editcomponents, editkeywords. I think that's all we need? I have initially added 1.42 and 1.43 to the version field for Biopython in bugzilla. I would also propose we have a few new components, such as PDB, Nexus and SeqIO (or perhaps rather than SeqIO something more general like sequence parsing). Peter From jfeala at gmail.com Tue May 15 16:42:30 2007 From: jfeala at gmail.com (Jake Feala) Date: Tue, 15 May 2007 09:42:30 -0700 Subject: [Biopython-dev] interaction networks in biopython Message-ID: <12c863fe0705150942t108e3131jaf50821ef9ecf2da@mail.gmail.com> Hello Biopython people - With all the new research in genome-wide cellular interaction networks I was a little surprised not to see much support for these type of data in Biopython. 
I know that Bioperl has a networks package that looks like the kind of thing that I would love to also see in Python for all the obvious reasons. First - has this already been done and I missed it? All I could find were a few scattered and application-specific scripts across the web, plus the Pathway package in BioPython. If not, then would there be any interest in development along these lines? A while back I wrote a few scripts that parse interaction datasets, stick them into a MySQL database, and retrieve the interactions into a Network object that can be used to analyze the graph of nodes and links. I would be glad to update these to fit into the biopython framework, as it would be useful to my own research. One caveat is that I am an engineering PhD student and my programming skills are mostly self-taught beyond two Java courses, so I might need a little guidance in testing and preparing the code for distribution. I have only ever written code for my own personal research but I think my style is decent and I would love to get better. Any opinion or advice? Thanks -Jake Feala Bioengineering Dept. University of California, San Diego From edschofield at gmail.com Tue May 15 18:37:30 2007 From: edschofield at gmail.com (Ed Schofield) Date: Tue, 15 May 2007 19:37:30 +0100 Subject: [Biopython-dev] interaction networks in biopython In-Reply-To: <12c863fe0705150942t108e3131jaf50821ef9ecf2da@mail.gmail.com> References: <12c863fe0705150942t108e3131jaf50821ef9ecf2da@mail.gmail.com> Message-ID: <1b5a37350705151137t75ea7e07r6596ba1ce35a8716@mail.gmail.com> On 5/15/07, Jake Feala wrote: > Hello Biopython people - > > With all the new research in genome-wide cellular interaction > networks I was a little surprised not to see much support for these > type of data in Biopython. I know that Bioperl has a networks package > that looks like the kind of thing that I would love to also see in > Python for all the obvious reasons. 
> > First - has this already been done and I missed it? All I could find > were a few scattered and application-specific scripts across the web, > plus the Pathway package in BioPython. > > If not, then would there be any interest in development along these > lines? A while back I wrote a few scripts that parse interaction > datasets, stick them into a MySQL database, and retrieve the > interactions into a Network object that can be used to analyze the > graph of nodes and links. I would be glad to update these to fit into > the biopython framework, as it would be useful to my own research. > > One caveat is that I am an engineering PhD student and my programming > skills are mostly self-taught beyond two Java courses, so I might need > a little guidance in testing and preparing the code for distribution. > I have only ever written code for my own personal research but I think > my style is decent and I would love to get better. > > Any opinion or advice? This would interest me too; I'd be glad to have such functionality in BioPython. I can offer you some guidance on Python, packaging and testing, and (if you need it) use of external array packages. -- Ed From yair.benita at gmail.com Tue May 15 19:25:27 2007 From: yair.benita at gmail.com (Yair Benita) Date: Tue, 15 May 2007 15:25:27 -0400 Subject: [Biopython-dev] interaction networks in biopython In-Reply-To: <1b5a37350705151137t75ea7e07r6596ba1ce35a8716@mail.gmail.com> Message-ID: I would be happy to contribute to this too. Currently I have a Python script that uses HPRD to generate protein-protein interaction maps. I have different filtering methods to display only classes of proteins or only links to a specific KEGG pathway. It will need a bit of work before I can submit this to CVS. As for drawing the map, I am currently generating a dot file that can be converted to an image using Graphviz. If anyone wants to suggest anything else, please do.
Yair on 5/15/07 2:37 PM, Ed Schofield at edschofield at gmail.com wrote: > On 5/15/07, Jake Feala wrote: >> Hello Biopython people - >> >> With all the new research in genome-wide cellular interaction >> networks I was a little surprised not to see much support for these >> type of data in Biopython. I know that Bioperl has a networks package >> that looks like the kind of thing that I would love to also see in >> Python for all the obvious reasons. >> >> First - has this already been done and I missed it? All I could find >> were a few scattered and application-specific scripts across the web, >> plus the Pathway package in BioPython. >> >> If not, then would there be any interest in development along these >> lines? A while back I wrote a few scripts that parse interaction >> datasets, stick them into a MySQL database, and retrieve the >> interactions into a Network object that can be used to analyze the >> graph of nodes and links. I would be glad to update these to fit into >> the biopython framework, as it would be useful to my own research. >> >> One caveat is that I am an engineering PhD student and my programming >> skills are mostly self-taught beyond two Java courses, so I might need >> a little guidance in testing and preparing the code for distribution. >> I have only ever written code for my own personal research but I think >> my style is decent and I would love to get better. >> >> Any opinion or advice? > > This would interest me too; I'd be glad to have such functionality in > BioPython. I can offer you some guidance on Python, packaging and > testing, and (if you need it) use of external array packages. 
> > -- Ed > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From biopython-dev at maubp.freeserve.co.uk Tue May 15 19:19:23 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 15 May 2007 20:19:23 +0100 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> Message-ID: <464A07BB.8020206@maubp.freeserve.co.uk> Chris Lasher wrote: > Since no one else has volunteered, I'm taking up responsibility for > the transition. I got the ball moving by contacting "support at > open-bio.org" to get alert them of our interest and get any contacts > we'll need to make this happen. Also, if anybody on the list has any > information that would be helpful in this (e.g., who administers the > CVS repo) please feel free to send it along. Likewise, feel free to > raise any questions, concerns, and comments on the list. Did you get any information from the Open Bioinformatics Foundation guys about moving from CVS to subversion? Peter From lists.steve at arachnedesign.net Tue May 15 19:56:46 2007 From: lists.steve at arachnedesign.net (Steve Lianoglou) Date: Tue, 15 May 2007 15:56:46 -0400 Subject: [Biopython-dev] interaction networks in biopython In-Reply-To: References: Message-ID: Hi, On May 15, 2007, at 3:25 PM, Yair Benita wrote: > I would be happy to contribute to this too. > Currently I have a python script that uses HPRD to generate protein > protein > interaction maps. I have deferent filtering methods to display only > classes > of proteins or only links to a specific kegg pathway. It will need > a bit of > work before I can submit this to CVS. 
As for drawing the map, I am > currently > generating a dot file that can be converted to an image using > GRAPHVIZ. If > anyone wants to suggest anything else, please do. I've been using NetworkX[1] to play w/ networks/graphs interactively. You can display them if you have matplotlib installed, and can save the graphs to dot format as well. -steve [1] NetworkX: https://networkx.lanl.gov/wiki > > Yair > > > on 5/15/07 2:37 PM, Ed Schofield at edschofield at gmail.com wrote: > >> On 5/15/07, Jake Feala wrote: >>> Hello Biopython people - >>> >>> With all the new research in genome-wide cellular interaction >>> networks I was a little surprised not to see much support for these >>> type of data in Biopython. I know that Bioperl has a networks >>> package >>> that looks like the kind of thing that I would love to also see in >>> Python for all the obvious reasons. >>> >>> First - has this already been done and I missed it? All I could >>> find >>> were a few scattered and application-specific scripts across the >>> web, >>> plus the Pathway package in BioPython. >>> >>> If not, then would there be any interest in development along these >>> lines? A while back I wrote a few scripts that parse interaction >>> datasets, stick them into a MySQL database, and retrieve the >>> interactions into a Network object that can be used to analyze the >>> graph of nodes and links. I would be glad to update these to fit >>> into >>> the biopython framework, as it would be useful to my own research. >>> >>> One caveat is that I am an engineering PhD student and my >>> programming >>> skills are mostly self-taught beyond two Java courses, so I might >>> need >>> a little guidance in testing and preparing the code for >>> distribution. >>> I have only ever written code for my own personal research but I >>> think >>> my style is decent and I would love to get better. >>> >>> Any opinion or advice? >> >> This would interest me too; I'd be glad to have such functionality in >> BioPython. 
I can offer you some guidance on Python, packaging and >> testing, and (if you need it) use of external array packages. >> >> -- Ed From salish at picasso.ucsf.edu Tue May 15 20:36:05 2007 From: salish at picasso.ucsf.edu (Howard Salis) Date: Tue, 15 May 2007 13:36:05 -0700 Subject: [Biopython-dev] About a new GenBankWriter class with SeqIO interface Message-ID: <9fa7e98e0705151336va7e1c86la137c137883d6886@mail.gmail.com> Hello everyone, I started using Biopython in my research and I needed a way to write GenBank files from a SeqRecord (which was parsed from other GenBank/etc files). So I wrote something up. It uses the SeqIO interface and behaves like the fasta writer. The SeqIO.write(record, handle, "genbank") interface accepts "record" as either a SeqRecord generator with multiple records or a single record from SeqRecord. So record = SeqRecord or record = SeqRecord.next() both work. (I'm relatively new to Python, so please excuse any bad terminology or stylistic deficiencies). The changes are: a new file called GenBankWriter.py in Bio/GenBank; small changes to the __init__.py of Bio/GenBank; and changes to the _feed_first_line function of Scanner.py in Bio/GenBank. I had to change the way Bio/GenBank/Scanner.py reads the Locus line of a GenBank file in order to handle missing data and newer molecule types (e.g. ss-RNA, ds-DNA, mt-DNA, etc). I also added/changed a couple of lines in __init__.py to store whether a sequence was linear or circular and to store the string that encodes its molecule type (ss-RNA, etc).
The output of SeqIO.write(record,handle,"genbank") is functionally identical to a GenBank file from NCBI except for some spacing and word wrap issues. What is the best way to submit new code for review? Whom do I send it to and should I send only the modified files? I've included one of my test scripts below just to show how it works. (Does anyone suggest any changes in the interface?) Thank you. Sincerely, Howard Salis Postdoctoral Scholar UC San Francisco

#ASimpleTest.py
"""A vigorous exercise of the GenBankWriter class and the SeqIO interface."""
from Bio import SeqIO
from Bio import GenBank

working_dir = "E:\\Plasmids\\"

#Get some arbitrarily chosen GenBank files (these are relatively small ones)
gi_list = GenBank.search_for("EF470550 OR EF470551")
print gi_list
ncbi_dict = GenBank.NCBIDictionary("nucleotide", "genbank")

#Write the pair of strings to a single file.
handle = open(working_dir + "Source.gb","w")
for gi in gi_list:
    handle.write(str(ncbi_dict[gi]))
handle.close()

#Parse the Source file into a SeqRecord generator (two records)
handle = open(working_dir + "Source.gb","r")
records = SeqIO.parse(handle,"genbank")

#write many records to a single GenBank file
file = open(working_dir + "ManyRecords.gb","w")
SeqIO.write(records,file,"genbank")
file.close()
handle.close()

#----
#Parse the Source file into a SeqRecord generator (two records)
handle = open(working_dir +"Source.gb","r")
records = SeqIO.parse(handle,"genbank")

#Write individual records into their own GenBank file
counter=0
for record in records:
    counter+=1
    file = open(working_dir + "OneFile_" + str(counter) + ".gb","w")
    SeqIO.write(record,file,"genbank")
    file.close()
handle.close()

#Open them back up again, parse them, and write them to a single file
handle = open(working_dir + "ManyRecords_Out.gb","w")
for num in range(1,counter+1):
    print num
    file = open(working_dir +"OneFile_" + str(num) + ".gb","r")
    records = SeqIO.parse(file,"genbank")
    SeqIO.write(records,handle,"genbank")
    file.close()
handle.close()

#Compare the original GenBank file in Source.gb to the GenBankWriter'd one.
original = open(working_dir +"Source.gb","r")
newone = open(working_dir + "ManyRecords_Out.gb","r")
records_original = SeqIO.parse(original,"genbank")
records_newone = SeqIO.parse(newone,"genbank")
for (record_original,record_newone) in zip(records_original,records_newone):
    print str(record_original)
    print str(record_newone)
original.close()
newone.close()
print "Done"

From biopython-dev at maubp.freeserve.co.uk Tue May 15 20:59:55 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 15 May 2007 21:59:55 +0100 Subject: [Biopython-dev] About a new GenBankWriter class with SeqIO interface In-Reply-To: <9fa7e98e0705151336va7e1c86la137c137883d6886@mail.gmail.com> References: <9fa7e98e0705151336va7e1c86la137c137883d6886@mail.gmail.com> Message-ID: <464A1F4B.9020705@maubp.freeserve.co.uk> Howard Salis wrote: > Hello everyone, > > I started using Biopython in my research and I needed a way to > write GenBank files from a SeqRecord (which was parsed from other > GenBank/etc files). So I wrote something up. It uses the SeqIO > interface and behaves like the fasta writer. Sounds nice - it's something I've been thinking about doing myself, but I wanted to do both GenBank and EMBL, sharing the feature table writing code. Something else to keep in mind is writing any SeqRecord to a GenBank (or EMBL) file, even if it did not get created from a GenBank or EMBL file and is therefore lacking lots of annotation. > The changes are: a new file called GenBankWriter.py in Bio/GenBank. > Small changes to the __init__.py of Bio/GenBank. Changes to the > _feed_first_line function of Scanner.py of Bio/GenBank. > > I had to change the way Bio/GenBank/Scanner.py reads the Locus line of > a GenBank file in order to handle missing data and newer molecule > types (e.g. ss-RNA, ds-DNA, mt-DNA, etc).
That was recently fixed on Bug 2289 http://bugzilla.open-bio.org/show_bug.cgi?id=2289 > I also add/change a couple > of lines in __init__.py to store whether a sequence was linear or > circular and to store the string that encodes its molecule type > (ss-RNA, etc). I thought we already stored this information - but I'm not sure offhand. > The output of SeqIO.write(record,handle,"genbank") is > functionally identical to a GenBank file from NCBI except for some > spacing and word wrap issues. Good :) > What is the best way to submit new code for review? Whom do I send it > to and should I send only the modified files? You could email it directly to me, but it would be better to create a bug (an "enhancement") and then attach the changes to the bug. Edited versions of files will do, but patch files are best. You should use the unix "diff" command line tool to create a patch file. One way to do this on Windows is to install cygwin... > I've included one of my test scripts below just to show how it works. > (Does anyone suggest any changes in the interface?) Looking at the code, at first glance it looks like you are hooking into the existing Bio.SeqIO interface nicely. I look forward to seeing your code, Howard. Peter From bugzilla-daemon at portal.open-bio.org Wed May 16 01:55:14 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 15 May 2007 21:55:14 -0400 Subject: [Biopython-dev] [Bug 2294] New: These patches allow one to write a GenBank file using the SeqIO interface Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2294 Summary: These patches allow one to write a GenBank file using the SeqIO interface Product: Biopython Version: 1.43 Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: howard.salis at gmail.com The SeqIO interface currently reads from, but does not write to the GenBank format. 
The GenBank format is widely used and is often chosen as the data storage format for many plasmid, genome, and other nucleotide editors. By giving Biopython the capability of writing annotated sequences to the GenBank format, one can use Biopython to read in raw sequences, analyze and annotate them, and then view them in a nucleotide visual editor. The attached patches do exactly this and use the current SeqIO interface to do it. They enable the command SeqIO.write(record,handle,"genbank"), where handle is an open, writable file-object and record is _either_ a SeqRecord generator or the result of one of its iterations. That is, if one did manyrecords = SeqIO.parse(handle,"genbank") or onerecord = manyrecords.next(), then one could pass either manyrecords or onerecord to SeqIO.write(). If a generator containing multiple records is passed, all records are written to a single GenBank file. If one record is passed, it is written to the file. The file is not closed, though, so SeqIO.write() may be called multiple times to write additional records to the same file. The attached patches make small modifications to Bio/SeqIO/__init__.py and Bio/SeqIO/InsdcIO.py. The _feed_first_line function in Bio/GenBank/Scanner.py is altered to handle missing data (it uses a very Pythonic dictionary of test lambda functions to parse the meaning of words). Finally, a new file is created called Bio/GenBank/GenBankWriter.py. Questions, Comments, Suggestions, Criticisms, etc are welcome. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
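The dual behaviour described in this report (accept either one record or an iterator of records, and leave the handle open) can be sketched as follows. This is only an illustration under stated assumptions, not the attached patch: `Record`, `write_records`, and the tab-separated output are hypothetical stand-ins, and no Biopython call is used.

```python
import io


class Record(object):
    """Hypothetical stand-in for a SeqRecord: just an id and a sequence."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


def write_records(records, handle):
    """Write one Record, or any iterable of Records, to an open handle.

    Returns the number of records written. The handle is left open so
    the caller can write further records to the same file.
    """
    if isinstance(records, Record):
        # Wrap a single record so that one loop handles both cases.
        records = [records]
    count = 0
    for record in records:
        # Placeholder output line; this is NOT the GenBank format.
        handle.write("%s\t%s\n" % (record.id, record.seq))
        count += 1
    return count


handle = io.StringIO()
many = iter([Record("rec1", "ACGT"), Record("rec2", "GGCC")])
one = next(many)                       # the result of one iteration
n_one = write_records(one, handle)     # a single record
n_rest = write_records(many, handle)   # the rest of the iterator
```

Keeping the single-record check in one place, rather than in each format writer, is the design question this sketch is meant to make visible.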
From bugzilla-daemon at portal.open-bio.org Wed May 16 01:56:32 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 15 May 2007 21:56:32 -0400 Subject: [Biopython-dev] [Bug 2294] These patches allow one to write a GenBank file using the SeqIO interface In-Reply-To: Message-ID: <200705160156.l4G1uWRZ005077@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2294 howard.salis at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|biopython-dev at biopython.org |howard.salis at gmail.com Status|NEW |ASSIGNED ------- Comment #1 from howard.salis at gmail.com 2007-05-15 21:56 EST ------- Created an attachment (id=654) --> (http://bugzilla.open-bio.org/attachment.cgi?id=654&action=view) patch to Bio/GenBank/Scanner.py (alters _feed_first_line under GenBank class) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From salish at picasso.ucsf.edu Wed May 16 02:34:08 2007 From: salish at picasso.ucsf.edu (Howard Salis) Date: Tue, 15 May 2007 19:34:08 -0700 Subject: [Biopython-dev] About a new GenBankWriter class with SeqIO interface In-Reply-To: <464A1F4B.9020705@maubp.freeserve.co.uk> References: <9fa7e98e0705151336va7e1c86la137c137883d6886@mail.gmail.com> <464A1F4B.9020705@maubp.freeserve.co.uk> Message-ID: <9fa7e98e0705151934j1aeb2df7ja76e0ce315515d91@mail.gmail.com> On 5/15/07, Peter wrote: > Sounds nice - its something I've been thinking about doing myself, but I > wanted to do both both GenBank and EMBL, sharing the feature table > writing code. Yep, since EMBL and GenBank share the same feature format, I've separated the "foreword", feature table, and sequence write functions. 
So if someone wants to write the EMBL writer, they just need to write the appropriate foreword. I think the sequence data is stored the same too? Is that correct? > Something else to keep in mind is writing any SeqRecord to a GenBank (or > EMBL) file, even if it did not get created from a GenBank or EMBL file > and is therefore lacking lots of annotation. Very true. The GenBankWriter.py will either leave these fields blank, leave out their keywords entirely if they are optional, or add something like or when it's necessary to have something there. > > I also add/change a couple > > of lines in __init__.py to store whether a sequence was linear or > > circular and to store the string that encodes its molecule type > > (ss-RNA, etc). > > I thought we already stored this information - but I'm not sure off hand. Well, there's the alphabet of the sequence (e.g. UnAmbiguousDNA()) that says whether it's DNA, RNA, peptide, etc, but even if I matched these up with strings, the "ss-", "ds-", etc part would be missing. I just saved the exact wording of the sequence type (e.g. "ds-DNA", "ss-RNA", etc) to a dictionary key named self.data.annotations["sequence_type"] in the _FeatureConsumer class under GenBank. This is in addition to the alphabet of the sequence so it shouldn't conflict. > You could email it directly to me, but it would be better to create a > bug (an "enhancement") and then attached the changes to the bug. Edited > versions of files will do, but patch files are best. Ok, done! It's at http://bugzilla.open-bio.org/show_bug.cgi?id=2294 > I look forward to seeing your code Howard. > > Peter Thank you! And I hope to continue to contribute to Biopython. 
-Howard From biopython-dev at maubp.freeserve.co.uk Wed May 16 07:53:41 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Wed, 16 May 2007 08:53:41 +0100 Subject: [Biopython-dev] About a new GenBankWriter class with SeqIO interface In-Reply-To: <9fa7e98e0705151934j1aeb2df7ja76e0ce315515d91@mail.gmail.com> References: <9fa7e98e0705151336va7e1c86la137c137883d6886@mail.gmail.com> <464A1F4B.9020705@maubp.freeserve.co.uk> <9fa7e98e0705151934j1aeb2df7ja76e0ce315515d91@mail.gmail.com> Message-ID: <464AB885.80305@maubp.freeserve.co.uk> Howard Salis wrote: >> Sounds nice - its something I've been thinking about doing myself, >> but I wanted to do both both GenBank and EMBL, sharing the feature >> table writing code. > > Yep, since EMBL and GenBank share the same feature format, I've > separated the "foreword", feature table, and sequence write > functions. Using "foreword / features / sequence" avoids clashing with the terms "header" and "footer" used in Bio.SeqIO to mean parts of a multi-sequence file which do not belong to a specific record. Maybe I should update Bio/GenBank/Scanner.py to use similar terminology... > So if someone wants to write the EMBL writer, they just need to write > the appropriate foreword. There is also the issue of translation between EMBL/GenBank terminology, for example where someone has read in an EMBL file and wants to write it out as a GenBank file. For a simple example, the division class should probably map: {'PRI': 'MAM', 'BCT': 'PRO', 'UNA': 'UNC'} > I think the sequence data is stored the same too? Is that correct? Actually, the way the sequence is printed out is slightly different. >>> I also add/change a couple of lines in __init__.py to store >>> whether a sequence was linear or circular and to store the string >>> that encodes its molecule type (ss-RNA, etc). >> I thought we already stored this information - but I'm not sure off >> hand. > > Well, there's the alphabet of the sequence (e.g. 
UnAmbiguousDNA()) > that says whether it's DNA, RNA, peptide, etc, but even if I matched > these ups with strings, then the "ss-", "ds-", etc part would be > missing. I just saved the exact wording of the sequence type (e.g. > "ds-DNA", "ss-RNA", etc) to an dictionary key named > self.data.annotations["sequence_type"] in the _FeatureConsumer class > under GenBank. This is in addition to the alphabet of the sequence so > it shouldn't conflict. That's probably a good idea. However, we would need to check what the EMBL equivalents are and convert them when writing GenBank files. Maybe we should just keep things simple and write one of RNA/DNA/Protein only? > Ok, done! It's at http://bugzilla.open-bio.org/show_bug.cgi?id=2294 I have made some more specific comments on the bug. In this email I have tried to stick to the broader picture. Peter From jfeala at gmail.com Wed May 16 17:25:37 2007 From: jfeala at gmail.com (Jake Feala) Date: Wed, 16 May 2007 10:25:37 -0700 Subject: [Biopython-dev] interaction networks in biopython In-Reply-To: References: Message-ID: <12c863fe0705161025p46b1ff6v8c6b1e5999b29244@mail.gmail.com> Thanks Ed and Yair, I'm really glad there's some interest in this! I'll get started on dusting off my code and adding more documentation. Steve - great suggestion. I had already looked at NetworkX and was thinking about switching over to this as the back-end graph representation. Are there any issues that I should think about when creating these extra dependencies? Also, what is the next step in this process? Should we agree on an API and class hierarchy before we start dumping code on each other? Which aspects can we make compatible with other Biopython objects? (I was thinking maybe parsers for the interaction datasets and the SQL interface) -Jake On 5/15/07, Steve Lianoglou wrote: > Hi, > > On May 15, 2007, at 3:25 PM, Yair Benita wrote: > > > I would be happy to contribute to this too. 
> > Currently I have a python script that uses HPRD to generate protein > > protein > > interaction maps. I have deferent filtering methods to display only > > classes > > of proteins or only links to a specific kegg pathway. It will need > > a bit of > > work before I can submit this to CVS. As for drawing the map, I am > > currently > > generating a dot file that can be converted to an image using > > GRAPHVIZ. If > > anyone wants to suggest anything else, please do. > > I've been using NetworkX[1] to play w/ networks/graphs interactively. > You can display them if you have matplotlib installed, and can save > the graphs to dot format as well. > > -steve > > [1] NetworkX: https://networkx.lanl.gov/wiki > > > > > Yair > > > > > > on 5/15/07 2:37 PM, Ed Schofield at edschofield at gmail.com wrote: > > > >> On 5/15/07, Jake Feala wrote: > >>> Hello Biopython people - > >>> > >>> With all the new research in genome-wide cellular interaction > >>> networks I was a little surprised not to see much support for these > >>> type of data in Biopython. I know that Bioperl has a networks > >>> package > >>> that looks like the kind of thing that I would love to also see in > >>> Python for all the obvious reasons. > >>> > >>> First - has this already been done and I missed it? All I could > >>> find > >>> were a few scattered and application-specific scripts across the > >>> web, > >>> plus the Pathway package in BioPython. > >>> > >>> If not, then would there be any interest in development along these > >>> lines? A while back I wrote a few scripts that parse interaction > >>> datasets, stick them into a MySQL database, and retrieve the > >>> interactions into a Network object that can be used to analyze the > >>> graph of nodes and links. I would be glad to update these to fit > >>> into > >>> the biopython framework, as it would be useful to my own research. 
> >>> > >>> One caveat is that I am an engineering PhD student and my > >>> programming > >>> skills are mostly self-taught beyond two Java courses, so I might > >>> need > >>> a little guidance in testing and preparing the code for > >>> distribution. > >>> I have only ever written code for my own personal research but I > >>> think > >>> my style is decent and I would love to get better. > >>> > >>> Any opinion or advice? > >> > >> This would interest me too; I'd be glad to have such functionality in > >> BioPython. I can offer you some guidance on Python, packaging and > >> testing, and (if you need it) use of external array packages. > >> > >> -- Ed > >> _______________________________________________ > >> Biopython-dev mailing list > >> Biopython-dev at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > > > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > From jhackney at stanford.edu Wed May 16 18:10:38 2007 From: jhackney at stanford.edu (Jason A. Hackney) Date: Wed, 16 May 2007 11:10:38 -0700 Subject: [Biopython-dev] interaction networks in biopython In-Reply-To: <12c863fe0705161025p46b1ff6v8c6b1e5999b29244@mail.gmail.com> References: <12c863fe0705161025p46b1ff6v8c6b1e5999b29244@mail.gmail.com> Message-ID: Hi All, I'm also interested in an interaction network class for biopython. I'm willing to contribute to the effort with either code review or testing. Cheers, Jason Jason A. Hackney Postdoctoral Fellow Department of Microbiology and Immunology Stanford University e-mail: jhackney at stanford.edu lab phone: 650-724-3891 mobile: 650-283-6907 On May 16, 2007, at 10:25 AM, Jake Feala wrote: > Thanks Ed and Yair, I'm really glad there's some interest in this! > I'll get started on dusting off my code and adding more documentation. > > Steve - great suggestion. 
I had already seen at NetworkX and was > already thinking about switching over to this as the back-end graph > representation. Are there any issues that I should think about when > creating these extra dependencies? > > Also, what is the next step in this process? Should we agree on an > API and class hierarchy before we start dumping code on each other? > Which aspects can we make compatible with other Biopython objects? (I > was thinking maybe parsers for the interaction datasets and the SQL > interface) > > -Jake > > > On 5/15/07, Steve Lianoglou wrote: >> Hi, >> >> On May 15, 2007, at 3:25 PM, Yair Benita wrote: >> >>> I would be happy to contribute to this too. >>> Currently I have a python script that uses HPRD to generate protein >>> protein >>> interaction maps. I have deferent filtering methods to display only >>> classes >>> of proteins or only links to a specific kegg pathway. It will need >>> a bit of >>> work before I can submit this to CVS. As for drawing the map, I am >>> currently >>> generating a dot file that can be converted to an image using >>> GRAPHVIZ. If >>> anyone wants to suggest anything else, please do. >> >> I've been using NetworkX[1] to play w/ networks/graphs interactively. >> You can display them if you have matplotlib installed, and can save >> the graphs to dot format as well. >> >> -steve >> >> [1] NetworkX: https://networkx.lanl.gov/wiki >> >>> >>> Yair >>> >>> >>> on 5/15/07 2:37 PM, Ed Schofield at edschofield at gmail.com wrote: >>> >>>> On 5/15/07, Jake Feala wrote: >>>>> Hello Biopython people - >>>>> >>>>> With all the new research in genome-wide cellular interaction >>>>> networks I was a little surprised not to see much support for >>>>> these >>>>> type of data in Biopython. I know that Bioperl has a networks >>>>> package >>>>> that looks like the kind of thing that I would love to also see in >>>>> Python for all the obvious reasons. >>>>> >>>>> First - has this already been done and I missed it? 
All I could >>>>> find >>>>> were a few scattered and application-specific scripts across the >>>>> web, >>>>> plus the Pathway package in BioPython. >>>>> >>>>> If not, then would there be any interest in development along >>>>> these >>>>> lines? A while back I wrote a few scripts that parse interaction >>>>> datasets, stick them into a MySQL database, and retrieve the >>>>> interactions into a Network object that can be used to analyze the >>>>> graph of nodes and links. I would be glad to update these to fit >>>>> into >>>>> the biopython framework, as it would be useful to my own research. >>>>> >>>>> One caveat is that I am an engineering PhD student and my >>>>> programming >>>>> skills are mostly self-taught beyond two Java courses, so I might >>>>> need >>>>> a little guidance in testing and preparing the code for >>>>> distribution. >>>>> I have only ever written code for my own personal research but I >>>>> think >>>>> my style is decent and I would love to get better. >>>>> >>>>> Any opinion or advice? >>>> >>>> This would interest me too; I'd be glad to have such >>>> functionality in >>>> BioPython. I can offer you some guidance on Python, packaging and >>>> testing, and (if you need it) use of external array packages. 
>>>> >>>> -- Ed >>>> _______________________________________________ >>>> Biopython-dev mailing list >>>> Biopython-dev at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biopython-dev >>> >>> >>> _______________________________________________ >>> Biopython-dev mailing list >>> Biopython-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> >> > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From biopython-dev at maubp.freeserve.co.uk Wed May 16 22:05:44 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Wed, 16 May 2007 23:05:44 +0100 Subject: [Biopython-dev] [Bug 2294] These patches allow one to write a GenBank file using the SeqIO interface In-Reply-To: <200705161904.l4GJ4pue012542@portal.open-bio.org> References: <200705161904.l4GJ4pue012542@portal.open-bio.org> Message-ID: <464B8038.9020900@maubp.freeserve.co.uk> Hi Howard, I'm replying to the mailing list as you've raised a few more general issues on bug 2294. Peter wrote: >> In the writer class, for your write_file(self, records) method, you allow >> explicitly check for and allow "records" to be a single SeqRecord. Don't. Any >> such "helpfulness" should be done in Bio.SeqIO.write() only, and not the >> individual write_file. Otherwise we'll end up with a situation where some >> writers are "helpful" and others are not. Howard replied: > Currently, the SeqIO's write function is > > def write(sequences, handle, format): > ... > > I can add checks to see if "sequences" is (a) a generator, (b) a SeqRecord > object, or (c) something else. If (a), then call > writer_class(handle).write_file(sequences). If (b), then call > writer_class(handle).write_record(sequences). If (c), spit out an error (for > now). 
I've added a check in Bio.SeqIO.write() for the "sequences" argument being a SeqRecord (your case b), and if so it now raises a ValueError. This is better than whatever cryptic error would have happened. I agree that it might be "nicer" if Bio.SeqIO.write() would also accept a SeqRecord object as input and did the expected thing, but having a fixed simple API is more straightforward. For comparison, see the previous discussions on the mailing list about having the file argument accepting either a handle or a filename (it was agreed that we would accept handles only). > Ok, so the standard is very exact in what the LOCUS line should be. However, > I've found that many programs do not write Genbank files exactly according to > this standard! So we might want to make the Genbank parser a bit more forgiving > to small changes in the spacing of Locus line, especially since many programs > leave out keywords. Have you got some examples? I would be keen to add a few more test cases for reasonable GenBank variations. > As it stands now, the patch code can handle missing keywords in the LOCUS line. If it doesn't already, the existing column-based code can easily do this too. > For example, the code defines a pair of dictionaries with lambda functions as > their keys > > ... > > I know this looks crazy, but it works really well. Where else but Python can I > have a dictionary / hash / whatever with the key being a function! :) Play > around with the code and you'll see how it works. Crazy code is scary ;) I'll try and have a play with this at the weekend. Note that this issue (parsing the LOCUS line) is a bit tangential to writing GenBank records. >> It also looks like when you write the LOCUS line you are not following >> the column based definition > > I'll fix this. The writing of the Genbank file should follow the standard to > the exactitude. I agree completely - As a general principle we should be a little bit flexible on reading files, but very strict on output. 
Regards, Peter From chris.lasher at gmail.com Sat May 19 20:21:03 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Sat, 19 May 2007 16:21:03 -0400 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <464A07BB.8020206@maubp.freeserve.co.uk> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <464A07BB.8020206@maubp.freeserve.co.uk> Message-ID: <128a885f0705191321k32354ecdnafb9912443b9367f@mail.gmail.com> On 5/15/07, Peter wrote: > Did you get any information from the Open Bioinformatics Foundation guys > about moving from CVS to subversion? I didn't, with regards to public anonymous access to the Subversion repositories. I'm also on impromptu leave until this upcoming Monday, but we'll have this up and running by the end of the month. Chris From O.Doehring at cs.ucl.ac.uk Mon May 21 19:45:36 2007 From: O.Doehring at cs.ucl.ac.uk (O.Doehring at cs.ucl.ac.uk) Date: 21 May 2007 20:45:36 +0100 Subject: [Biopython-dev] Biopython to parse not only .pdb-files but also NACCESS .asa files Message-ID: Dear community, I am applying the following tool: 'Naccess V2.1.1 - Atomic Solvent Accessible Area Calculations ' to calculate two features which are not contained in standard .pdb-files. These two features are atomic accessibility and van der Waals radius. As described in the readme file at http://wolf.bi.umist.ac.uk/naccess/nac_readme.html under 'example output files' and on the PDB format site at http://www.wwpdb.org/documentation/format23/sect9.html under 'Atom', NACCESS does the following: 'The output format is PDB, with B-factors and occupancies removed, then atomic accessiblity in square Angstroms, followed by the assigned van der Waal radius.' Note that Occupancy gets replaced by atomic accessibility and B-factor by the van der Waals radius. 
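As a rough illustration of that layout, the two trailing values can be read from a single ATOM line without relying on the occupancy and B-factor column positions. This is a minimal sketch under assumptions: the coordinate block is taken to end at column 54 as in the PDB ATOM record, the two NACCESS values are taken to be whitespace-separated after it, `parse_asa_atom_line` is a hypothetical helper (not part of Biopython), and the sample line is made up rather than taken from a real .asa file.

```python
def parse_asa_atom_line(line):
    """Return (atom_name, accessibility, vdw_radius) for one ATOM line.

    Assumption: x, y and z occupy columns 31-54 as in the PDB format,
    and the accessibility and radius follow as whitespace-separated
    numbers. Check this against your own .asa files before relying on it.
    """
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM line: %r" % line)
    atom_name = line[12:16].strip()
    fields = line[54:].split()
    if len(fields) < 2:
        raise ValueError("expected two values after the coordinates: %r" % line)
    return atom_name, float(fields[0]), float(fields[1])


# Made-up example line, built with explicit widths so the columns are exact.
sample = "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%8.3f%6.2f" % (
    1, " N", "ALA", "A", 1, 1.0, 2.0, 3.0, 31.5, 1.65)
result = parse_asa_atom_line(sample)
```

Splitting on whitespace sidesteps the exact field widths, at the cost of failing if two values ever run together without a space.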
This 'new' .pdb-file has extension .asa. I chose a quite straightforward approach: I wanted to use Biopython as before, e.g. calling the B-factor method but getting the atomic accessibility instead. But Biopython seems to type-check the .asa-file and complains that the B-factor is not of type float. Is there a way to access the data of .asa-files programmatically via the Biopython library? The only other way then seems to be to write a parser for .asa-files, figure out which atom in the .pdb-file corresponds to the respective one in the .asa-file, and finally retrieve the wanted values for atomic accessibility and van der Waals radius. Here are some more technical details. As an example I chose the '1DHR' protein:

------------------------------------------------------------------------------
    def __init__(self,structure_id="1DHR",indices=[ 0]):
        # which residues are part of the patch
        self.indices = indices
        # If 1 (DEFAULT), the exceptions are caught, but some residues or atoms will be missing.
        # THESE EXCEPTIONS ARE DUE TO PROBLEMS IN THE PDB FILE!
        self.p=PDBParser(PERMISSIVE= 1)
        # which protein to analyse
        self.structure_id = structure_id
        self.fileName = self.structure_id + '.asa'
        self.structure = self.p.get_structure(self.structure_id, self.fileName)
------------------------------------------------------------------------------
Error message:

    Traceback (most recent call last):
      File "C:\Dokumente und Einstellungen\Renate Döhring\workspace\test\src\root\nested\compactness.py", line 249, in
        c = compact(indices=[0,1])
      File "C:\Dokumente und Einstellungen\Renate Döhring\workspace\test\src\root\nested\compactness.py", line 17, in __init__
        self.structure = self.p.get_structure(self.structure_id, self.fileName)
      File "C:\Python25\Lib\site-packages\Bio\PDB\PDBParser.py", line 65, in get_structure
        self._parse(file.readlines())
      File "C:\Python25\Lib\site-packages\Bio\PDB\PDBParser.py", line 85, in _parse
        self.trailer=self._parse_coordinates(coords_trailer)
      File "C:\Python25\Lib\site-packages\Bio\PDB\PDBParser.py", line 159, in _parse_coordinates
        bfactor=float(line[60:66])
    ValueError: invalid literal for float(): 31 1.
------------------------------------------------------------------------------

I hope this question has not been discussed before, but the search engine at http://search.open-bio.org/cgi-bin/mail-search.cgi does not work, and I could not find anything useful via a Google search restricted to the archive using the 'site' operator. What do you recommend for my situation? Many thanks! Yours, Orlando

From edschofield at gmail.com Tue May 22 16:57:49 2007 From: edschofield at gmail.com (Ed Schofield) Date: Tue, 22 May 2007 17:57:49 +0100 Subject: [Biopython-dev] [BioPython] Biopython to parse not only .pdb-files but also NACCESS .asa files In-Reply-To: References: Message-ID: <1b5a37350705220957o24f6a436k89d60764729695da@mail.gmail.com> On 21 May 2007 20:45:36 +0100, O.Doehring at cs.ucl.ac.uk wrote:
>
> ValueError: invalid literal for float(): 31 1.
>
> ...
>
> What do you recommend for my situation. Many thanks!
Is that a space between 31 and 1? There's your problem. My advice is to insert

    import pdb
    pdb.set_trace()

at line 159 in PDBParser.py and check out why the columns in your data are misaligned with what PDBParser.py expects. A quick scan of nac_readme.html implies that perhaps you need the -f argument to give you the full output format? But if you need to write your own parser for .asa files, you could use _parse_coordinates(self, coords_trailer) as a template. -- Ed

From dalke at dalkescientific.com Sat May 26 10:10:21 2007 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sat, 26 May 2007 12:10:21 +0200 Subject: [Biopython-dev] [Biopython-announce] is this supposed to be really slow? In-Reply-To: References: <20070525233151.GA4507@caltech.edu> Message-ID: (Moving this from the -announce to the -dev list) Bryan Smith, replying to Titus Brown wrote: > i did see this constraint for only one request per 3 seconds, but did > not realize each time i went through my loop that this was a separate > request. > is there anything to do about this constraint? In your "search_for" call add delay=0.

    def search_for(search, reldate=None, mindate=None, maxdate=None,
                   batchsize=100, delay=2, callback_fn=None,
                   start_id=0, max_ids=None):
        """search_for(search[, reldate][, mindate][, maxdate]
        [, batchsize][, delay][, callback_fn][, start_id][, max_ids]) -> ids

        Search PubMed and return a list of the PMID's that match the
        criteria. search is the search string used to search the
        database. reldate is the number of dates prior to the current
        date to restrict the search. mindate and maxdate are the dates to
        restrict the search, e.g. 2002/01/01. batchsize specifies the
        number of ids to return at one time. By default, it is set to
        10000, the maximum. delay is the number of seconds to wait
        between queries (default 2). callback_fn is an optional callback
        function that will be called as passed a PMID as results are
        retrieved. start_id specifies the index of the first id to
        retrieve and max_ids specifies the maximum number of id's to
        retrieve.

In your Dictionary creation also add delay=0:

    class Dictionary:
        def __init__(self, delay=5.0, parser=None):
            """Dictionary(delay=5.0, parser=None)

            Create a new Dictionary to access PubMed. parser is an optional
            parser (e.g. Medline.RecordParser) object to change the results
            into another form. If set to None, then the raw contents of the
            file will be returned. delay is the number of seconds to wait
            between each query.

>> I personally tend to just use the NCBI retrieval URLs directly, but
>> that's kind of ugly.

NCBI also watches those requests, and if you do too many you might get a warning or be blocked off, or so rumor has it. BTW, in your original code you can simplify

> for idx in range( len( termIds ) ):
>     pubDates[idx] = string.atoi( medlineDict[ termIds[ idx ]
> ].publication_date[ 0:4 ] )
>     idx = idx + 1

to

    for idx, termId in enumerate(termIds):
        pubDates[idx] = int(medlineDict[termId].publication_date[:4])

Andrew dalke at dalkescientific.com

From chris.lasher at gmail.com Thu May 31 04:30:38 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Thu, 31 May 2007 00:30:38 -0400 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <128a885f0705191321k32354ecdnafb9912443b9367f@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <464A07BB.8020206@maubp.freeserve.co.uk> <128a885f0705191321k32354ecdnafb9912443b9367f@mail.gmail.com> Message-ID: <128a885f0705302130t628794e7v681dc02058244913@mail.gmail.com> On 5/19/07, Chris Lasher wrote: > On 5/15/07, Peter wrote: > > Did you get any information from the Open Bioinformatics Foundation guys > > about moving from CVS to subversion? 
> > I didn't, with regards to public anonymous access to the Subversion > repositories. I'm also on impromptu leave until this upcoming Monday, > but we'll have this up and running by the end of the month. > > Chris > I'm obviously missing another target, and BOSC 2007 is fast approaching. I'm being held up by 4 files that are in the CVS repository that were foolishly committed with carriage returns (i.e., "\r") in the filenames. How that's possible, I have no clue, but I need to alter the data in the CVS repository so those filenames are correct, or otherwise completely removed, over the entire history of those files. Does anyone have any experience with the internals of CVS repositories? I definitely do not. Chris From biopython-dev at maubp.freeserve.co.uk Thu May 31 09:07:59 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Thu, 31 May 2007 10:07:59 +0100 Subject: [Biopython-dev] Subversion Repository In-Reply-To: <128a885f0705302130t628794e7v681dc02058244913@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <464A07BB.8020206@maubp.freeserve.co.uk> <128a885f0705191321k32354ecdnafb9912443b9367f@mail.gmail.com> <128a885f0705302130t628794e7v681dc02058244913@mail.gmail.com> Message-ID: <465E906F.1080704@maubp.freeserve.co.uk> Chris Lasher wrote: > On 5/19/07, Chris Lasher wrote: >> On 5/15/07, Peter wrote: >>> Did you get any information from the Open Bioinformatics Foundation guys >>> about moving from CVS to subversion? >> I didn't, with regards to public anonymous access to the Subversion >> repositories. I'm also on impromptu leave until this upcoming Monday, >> but we'll have this up and running by the end of the month. >> >> Chris >> > > I'm obviously missing another target, and BOSC 2007 is fast > approaching. Are you going to BOSC 2007 Chris? 

> I'm being held up by 4 files that are in the CVS
> repository that were foolishly committed with carriage returns (i.e.,
> "\r") in the filenames. How that's possible, I have no clue, but I
> need to alter the data in the CVS repository so those filenames are
> correct, or otherwise completely removed, over the entire history of
> those files. Does anyone have any experience with the internals of CVS
> repositories? I definitely do not.

How strange! I have no experience with the internals of CVS so can't
help you there.

What are the four offending files? Maybe we could just purge them for
the move to SVN.

Also, I suspect (but have not checked this) that a few of the example
files in the unit tests have been checked in as binary files rather
than text (due to some odd differences in new lines across platforms).
Again, a CVS expert would probably be able to generate a list of all
"binary" files in the repository fairly easily.

Peter

From bugzilla-daemon at portal.open-bio.org Thu May 31 13:14:31 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 31 May 2007 09:14:31 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705311314.l4VDEV2X031189@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2268

------- Comment #17 from mdehoon at ims.u-tokyo.ac.jp 2007-05-31 09:14 EST -------
Created an attachment (id=661)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=661&action=view)
Updated version of Bio/Cluster/cluster.c

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Thu May 31 13:15:17 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 31 May 2007 09:15:17 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705311315.l4VDFH6D031294@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2268

------- Comment #18 from mdehoon at ims.u-tokyo.ac.jp 2007-05-31 09:15 EST -------
Created an attachment (id=662)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=662&action=view)
Updated version of Bio/Cluster/clustermodule.c

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Thu May 31 13:17:31 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 31 May 2007 09:17:31 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705311317.l4VDHVB7031418@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2268

------- Comment #19 from mdehoon at ims.u-tokyo.ac.jp 2007-05-31 09:17 EST -------
Could you try with the updated Bio/Cluster/cluster.c,
Bio/Cluster/clustermodule.c (see attachments)? These should solve the
problems with the Cluster unit test. If they work fine, I'll upload
them to CVS.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
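
On the Subversion Repository thread above: the two checks Chris and
Peter ask about (filenames containing "\r", and files checked in as
binary) can be approximated without much knowledge of CVS internals,
because a CVS repository is a directory tree of RCS ",v" files. What
follows is only a sketch under stated assumptions. The function names
are made up for illustration, the script has to be pointed at a copy of
the repository root on the server, and it relies on RCS recording
binary (-kb) mode as "expand @b@;" in the file header, which is worth
confirming against a file known to have been added with -kb. It only
reports candidates and does not rename or rewrite anything.

```python
import os
import re

# Assumption: a file added with "cvs add -kb" carries "expand @b@;" in
# the admin header of its RCS ",v" file. Treat matches as candidates.
BINARY_FLAG = re.compile(rb"expand\s+@b@;")


def find_cr_filenames(repo_root):
    """Return paths under repo_root whose file or directory name contains a carriage return."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        for name in dirnames + filenames:
            if "\r" in name:
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


def find_binary_rcs_files(repo_root, header_bytes=65536):
    """Return ',v' files whose first header_bytes contain the binary flag."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        for name in filenames:
            if not name.endswith(",v"):
                continue
            path = os.path.join(dirpath, name)
            # Only the start of the file is read; the flag lives in the
            # header, before the revision texts.
            with open(path, "rb") as handle:
                header = handle.read(header_bytes)
            if BINARY_FLAG.search(header):
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    import sys

    root = sys.argv[1]
    for path in find_cr_filenames(root):
        print("carriage return in name: %r" % path)
    for path in find_binary_rcs_files(root):
        print("binary (-kb) candidate: %s" % path)
```

Printing the offending names with %r makes the "\r" visible. Running
this against a backup copy of the repository first seems prudent.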

From jfeala at gmail.com Thu May 31 23:52:36 2007
From: jfeala at gmail.com (Jake Feala)
Date: Thu, 31 May 2007 16:52:36 -0700
Subject: [Biopython-dev] interaction networks in biopython
In-Reply-To:
References: <12c863fe0705161025p46b1ff6v8c6b1e5999b29244@mail.gmail.com>
Message-ID: <12c863fe0705311652w44074269y2256aa127b90843b@mail.gmail.com>

Hi Everybody -

I've been thinking about the possible structure of a "BioNet" package,
and here is what I think would be most useful:

InteractionRecord.py - a storage object for biological interactions,
mirroring information stored in the PSI-MI (Proteomics Standards
Initiative - Molecular Interaction) XML standard - unless someone
knows a better one.

Network.py - network object inheriting a NetworkX graph class with
additional methods for manipulating an InteractionRecord stored with
each edge

InteractionIO - a submodule with parsers to read and write
interactions to/from Cytoscape, PSI-MI, and other formats or online
interaction databases

BioNetSQL - a submodule for storing and querying a local SQL database
of interactions

I've started on the code, including parsers for Cytoscape, PSI-MI XML
files, and GRID flat files. I haven't fixed up my SQL scripts yet
because I want to rethink the database design. All the code is
available at http://cmrg.ucsd.edu/JakeFeala#software

Here is an example that worked fine for me:

from Network import *
f = open()
parser = GRIDIterator(f)
net = create_network()
net.load(parser)

Are there any suggestions, regarding (1) the standard for
InteractionRecord, (2) methods for the Network object, (3) structure
of the SQL database, (4) overall structure of the package? Also, does
anyone want to contribute to any specific part (e.g. Yair can add his
HPRD parser)?

Thanks!
-Jake

On 5/16/07, Jason A. Hackney wrote:
> Hi All,
>
> I'm also interested in an interaction network class for biopython. I'm
> willing to contribute to the effort with either code review or testing.
>
> Cheers,
>
> Jason
>
> Jason A. Hackney
>
> Postdoctoral Fellow
> Department of Microbiology and Immunology
> Stanford University
>
> e-mail: jhackney at stanford.edu
> lab phone: 650-724-3891
> mobile: 650-283-6907
>
> On May 16, 2007, at 10:25 AM, Jake Feala wrote:
>
> Thanks Ed and Yair, I'm really glad there's some interest in this!
> I'll get started on dusting off my code and adding more documentation.
>
> Steve - great suggestion. I had already looked at NetworkX and was
> already thinking about switching over to this as the back-end graph
> representation. Are there any issues that I should think about when
> creating these extra dependencies?
>
> Also, what is the next step in this process? Should we agree on an
> API and class hierarchy before we start dumping code on each other?
> Which aspects can we make compatible with other Biopython objects? (I
> was thinking maybe parsers for the interaction datasets and the SQL
> interface)
>
> -Jake
>
> On 5/15/07, Steve Lianoglou wrote:
> Hi,
>
> On May 15, 2007, at 3:25 PM, Yair Benita wrote:
>
> I would be happy to contribute to this too.
> Currently I have a python script that uses HPRD to generate
> protein-protein interaction maps. I have different filtering methods
> to display only classes of proteins or only links to a specific KEGG
> pathway. It will need a bit of work before I can submit this to CVS.
> As for drawing the map, I am currently generating a dot file that can
> be converted to an image using GRAPHVIZ. If anyone wants to suggest
> anything else, please do.
>
> I've been using NetworkX[1] to play w/ networks/graphs interactively.
> You can display them if you have matplotlib installed, and can save
> the graphs to dot format as well.
>
> -steve
>
> [1] NetworkX: https://networkx.lanl.gov/wiki
>
> Yair
>
> on 5/15/07 2:37 PM, Ed Schofield at edschofield at gmail.com wrote:
>
> On 5/15/07, Jake Feala wrote:
> Hello Biopython people -
>
> With all the new research in genome-wide cellular interaction
> networks I was a little surprised not to see much support for these
> types of data in Biopython. I know that Bioperl has a networks package
> that looks like the kind of thing that I would love to also see in
> Python for all the obvious reasons.
>
> First - has this already been done and I missed it? All I could find
> were a few scattered and application-specific scripts across the web,
> plus the Pathway package in BioPython.
>
> If not, then would there be any interest in development along these
> lines? A while back I wrote a few scripts that parse interaction
> datasets, stick them into a MySQL database, and retrieve the
> interactions into a Network object that can be used to analyze the
> graph of nodes and links. I would be glad to update these to fit into
> the biopython framework, as it would be useful to my own research.
>
> One caveat is that I am an engineering PhD student and my programming
> skills are mostly self-taught beyond two Java courses, so I might need
> a little guidance in testing and preparing the code for distribution.
> I have only ever written code for my own personal research but I think
> my style is decent and I would love to get better.
>
> Any opinion or advice?
>
> This would interest me too; I'd be glad to have such functionality in
> BioPython. I can offer you some guidance on Python, packaging and
> testing, and (if you need it) use of external array packages.
>
> -- Ed
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>
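
To make the proposed split between InteractionRecord and Network a bit
more concrete, here is a rough sketch of how the two could fit
together. Everything in it is hypothetical: the field names are not
taken from PSI-MI, the method names other than load() are invented, and
a plain dict-based undirected graph stands in for the NetworkX graph
class that Jake plans to inherit from, so that the example runs with
only the standard library.

```python
from dataclasses import dataclass, field


@dataclass
class InteractionRecord:
    """Hypothetical record for one interaction; fields are illustrative only."""
    interactor_a: str
    interactor_b: str
    interaction_type: str = "unknown"
    source: str = ""
    annotations: dict = field(default_factory=dict)


class Network:
    """Undirected graph that keeps a list of InteractionRecords per edge.

    Stand-in for a class derived from a NetworkX graph (assumption).
    """

    def __init__(self):
        self._adjacency = {}  # node -> set of neighbouring nodes
        self._records = {}    # frozenset({a, b}) -> list of records

    def add_interaction(self, record):
        a, b = record.interactor_a, record.interactor_b
        self._adjacency.setdefault(a, set()).add(b)
        self._adjacency.setdefault(b, set()).add(a)
        # The edge key ignores direction, so (a, b) and (b, a) share records.
        self._records.setdefault(frozenset((a, b)), []).append(record)

    def load(self, records):
        """Add every record produced by an iterable, e.g. a parser."""
        for record in records:
            self.add_interaction(record)

    def neighbors(self, node):
        return sorted(self._adjacency.get(node, ()))

    def interactions(self, a, b):
        return list(self._records.get(frozenset((a, b)), []))

    def number_of_nodes(self):
        return len(self._adjacency)

    def number_of_edges(self):
        return len(self._records)


if __name__ == "__main__":
    net = Network()
    net.load([
        InteractionRecord("P1", "P2", "physical", "GRID"),
        InteractionRecord("P2", "P3", "genetic"),
    ])
    print(net.neighbors("P2"))
```

Keeping a list of records per edge, rather than a single record, leaves
room for the same pair of proteins being reported by several databases
or experiments.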