From bugzilla-daemon at portal.open-bio.org Tue May 1 08:01:49 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 May 2007 08:01:49 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705011201.l41C1nXg017300@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp 2007-05-01 08:01 EST -------
Chris,
I was not able to replicate this bug on any of the platforms I've tried so far
(Windows 32-bits, Mac OS X, Unix, Linux). However, since it does occur on your
system, I still feel that this is a true bug that should be fixed. Would you be
willing to compile and run some test cases on your platform to find the source
of this problem?
One possibility is that the k-means algorithm gets stuck in an infinite
(periodic) loop in which genes are assigned back and forth between clusters. I
thought that with the current implementation, that was no longer possible, but
maybe there is some case that I overlooked. Since the k-means algorithm starts
from a random initial state, maybe on your platform it starts from some unusual
initial state that doesn't appear on the other platforms, causing this bug to
appear on your platform only.
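To illustrate the suspected failure mode, here is a minimal 1-D k-means sketch in plain Python (not the Bio.Cluster C code). It remembers every assignment it has seen and stops as soon as one repeats. Because each update is a deterministic function of the current assignment, a repeat means either convergence (period 1) or a periodic back-and-forth loop, and the stopping test compares only integer labels, never floating-point totals. The name `kmeans_1d`, its parameters and the empty-cluster rule are illustrative assumptions.

```python
import random


def kmeans_1d(points, k, seed=None, max_iter=1000):
    """Cluster 1-D points into k groups; return (labels, iterations).

    Hypothetical helper for illustration only. It stops when an assignment
    repeats, which covers both convergence (the same assignment twice in a
    row) and a periodic loop (an older assignment reappears).
    """
    rng = random.Random(seed)
    # Random initial assignment, as k-means starts from a random state.
    labels = tuple(rng.randrange(k) for _ in points)
    seen = set()
    for iteration in range(1, max_iter + 1):
        if labels in seen:
            return list(labels), iteration
        seen.add(labels)
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids.append(sum(members) / len(members))
            else:
                # Hypothetical deterministic rule: re-seed an empty cluster
                # at a fixed data point, so the update depends only on labels.
                centroids.append(points[c % len(points)])
        # Reassign each point to its nearest centroid; min() breaks ties in
        # favour of the lowest cluster index.
        labels = tuple(
            min(range(k), key=lambda c: abs(p - centroids[c]))
            for p in points
        )
    raise RuntimeError("no repeated assignment within max_iter iterations")
```

Keeping the full history costs memory, but it detects a cycle of any period rather than only the "nothing changed" case.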
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue May 1 14:31:06 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 May 2007 14:31:06 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705011831.l41IV6ZU000918@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
------- Comment #9 from chris.lasher at gmail.com 2007-05-01 14:31 EST -------
I'd definitely be willing to run any tests. Just to note, I am not the one who
discovered this bug; I only filed it. Credit for discovering it
goes to Alex Lancaster, who reported it on April 11th to the
Biopython mailing list (see ). That was on a Fedora
Core installation, so this is not specific to 32-bit Ubuntu.
Could this involve the source of the Numeric and mxTextTools packages? I
installed Numeric Python and eGenix mxTextTools from the Ubuntu distribution
packages, rather than building either one from source. I can't
see why this would make a difference, but it is something to consider. Also,
there's a possibility that I don't have all the required software, but I did
not get any warnings when installing from CVS.
From bugzilla-daemon at portal.open-bio.org Tue May 1 14:48:35 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 May 2007 14:48:35 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705011848.l41ImZOB001726@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-01 14:48 EST -------
Checking the version of Numeric may be worthwhile - I recall from the MMTK
mailing list that some versions appeared to cause subtle bugs. In late 2005
Konrad Hinsen was suggesting MMTK users downgrade from version 24 to version
23, but I don't know if he ever pinned down what the problem was (or indeed, if
there really was a problem).
From bugzilla-daemon at portal.open-bio.org Tue May 1 15:06:30 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 May 2007 15:06:30 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705011906.l41J6UED002525@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
------- Comment #11 from chris.lasher at gmail.com 2007-05-01 15:06 EST -------
(In reply to comment #10)
> Checking the version of Numeric may be worth while - I recall from the MMTK
> mailing list that some versions appeared to cause subtle bugs. In late 2005
> Konrad Hinsen was suggesting MMTK users downgrade from version 24 to version
> 23, but I don't know if he ever pinned down what the problem was (or indeed, if
> there really was a problem).
>
On Dapper Drake, Edgy Eft and Feisty Fawn, the Numeric packages are 24.2.
From bugzilla-daemon at portal.open-bio.org Tue May 1 15:50:47 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 May 2007 15:50:47 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705011950.l41JolgE004634@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-01 15:50 EST -------
For reference, on my 64bit Ubuntu Dapper Drake system (where test_Cluster.py
works) I have the following packages installed:
python 2.4.2-0ubuntu3
python-reportlab 1.20debian-3ubuntu1
python-numeric 24.2-1ubuntu2
python-egenix-mxtexttools 2.0.6ubuntu1-1ubuntu4
i.e. Numeric 24.2 does work with test_Cluster.py for me.
From bugzilla-daemon at portal.open-bio.org Wed May 2 14:44:01 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 May 2007 14:44:01 -0400
Subject: [Biopython-dev] [Bug 2285] Creating Bio.AlignIO to cope with
alignments like Bio.SeqIO does sequences
In-Reply-To:
Message-ID: <200705021844.l42Ii154024905@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2285
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-02 14:44 EST -------
Created an attachment (id=643)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=643&action=view)
ZIP file containing four python scripts to go in Bio/AlignIO/*.py
There is a follow-up patch to Bio/SeqIO/__init__.py that makes it use
Bio.AlignIO for reading and writing clustal, stockholm and phylip instead. The
corresponding parsers under Bio/SeqIO/*.py would then be removed.
I have not yet worked out what a Nexus file looks like when it holds more than
one alignment (if in fact this is possible).
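The delegation idea can be sketched as a sequence-level iterator that simply flattens an alignment-level one. In this sketch an alignment is a plain list of (id, sequence) tuples, and `parse_alignments` is a hypothetical stand-in for a format-specific parser (blank-line separated blocks of "id sequence" lines); neither is the actual Bio.AlignIO or Bio.SeqIO API.

```python
from itertools import chain


def parse_alignments(lines):
    """Hypothetical alignment parser: yield one list of (id, seq) tuples
    per blank-line separated block of "id sequence" lines."""
    block = []
    for line in lines:
        line = line.strip()
        if not line:
            if block:
                yield block
                block = []
            continue
        name, seq = line.split()
        block.append((name, seq))
    if block:
        yield block


def parse_records(lines):
    """Sequence-level iterator: every record of every alignment, in file
    order, obtained by delegating to the alignment parser."""
    return chain.from_iterable(parse_alignments(lines))
```

With this split, a sequence-oriented reader only needs one alignment parser per format, which is the code reduction described above.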
From bugzilla-daemon at portal.open-bio.org Fri May 4 05:20:31 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 May 2007 05:20:31 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705040920.l449KVW3015656@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
------- Comment #13 from mdehoon at ims.u-tokyo.ac.jp 2007-05-04 05:20 EST -------
Chris,
I found one Linux system on which test_Cluster.py hangs in the call to kmedoids
instead of the call to kcluster. It turned out that this was due to a
floating-point comparison in the kmedoids function. Since the same comparison
occurs in the kcluster function, this may very well be the reason
test_Cluster.py hangs on your platform in the call to kcluster. The comparison
involves two floating-point variables which are bit-wise identical to each
other. However, variable1 <= variable2 returns False.
Could you have a look at line 2071 in Bio/Cluster/cluster.c (Biopython release
1.43) and print out the two variables "total" and "previous"? (You may find
that test_Cluster.py no longer hangs when you add the printf statement; at
least that is what happened with the call to kmedoids). If total and previous
have the same value, but total>=previous returns False, then that would explain
why the call to kcluster hangs.
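The symptom described above, two values that look identical but fail `total>=previous`, is typical of one copy of a result being kept at higher precision than the copy it is compared against. On x87 hardware, for example, a value can stay in an 80-bit register while the other copy has been rounded to a 64-bit double in memory. The Python sketch below is only an analogy for that situation: it treats a 64-bit float as the "register" copy and rounds the "stored" copy to 32 bits with `struct`, which is enough to make the same comparison fail.

```python
import struct


def round_to_float32(x):
    """Round a Python float (a C double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


total = 0.1                           # copy kept at full precision
previous = round_to_float32(total)    # same value after narrower storage

# With a short format such as %g (as a printf call might use), both values
# print the same, yet float32(0.1) is slightly larger than the double 0.1,
# so the comparison fails.
print("%g %g" % (total, previous))    # 0.1 0.1
print(total >= previous)              # False
```

One way to avoid depending on such comparisons is to base a loop's stopping rule on something exact, such as the integer cluster assignments, or to allow a small tolerance.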
From biopython-dev at maubp.freeserve.co.uk Mon May 7 10:16:38 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Mon, 07 May 2007 15:16:38 +0100
Subject: [Biopython-dev] Unified alignment input/output, Bio.AlignIO?
In-Reply-To: <463240F9.8010907@maubp.freeserve.co.uk>
References: <463240F9.8010907@maubp.freeserve.co.uk>
Message-ID: <463F34C6.90008@maubp.freeserve.co.uk>
Peter wrote:
> Following the release of Biopython 1.43 with Bio.SeqIO, I would like to
> do a better job for multiple sequence alignment file formats - creating
> a new module Bio.AlignIO
>
> While most multiple sequence alignment files usually contain a single
> alignment (made up of multiple sequences), this is not the general case.
>
> In the PHYLIP suite, concatenated alignments in phylip format are
> produced by the seqboot program for tasks like bootstrapping of a
> phylogenetic tree. Currently SeqIO chokes on these!
>
> Another example is the output of some of the EMBOSS programs, which can contain
> many multiple sequence alignments; for example, the water and needle
> tools can produce many pairwise alignments.
>
> In such cases, being able to write code like the following seems to be
> the logical extension of the Bio.SeqIO style we have agreed on:
>
> from Bio import AlignIO
> for alignment in AlignIO.parse("many.phy", "phylip") :
>     print "Alignment with %i sequences of length %i" \
>           % (len(alignment.get_all_seqs()),
>              alignment.get_alignment_length())
>     ...
>
> i.e. The AlignIO.parse() function would be an iterator returning
> alignment objects. Does this sound reasonable so far?
I have pressed ahead with this; there is a version attached to bug 2285:
http://bugzilla.open-bio.org/show_bug.cgi?id=2285
This handles reading and writing of clustal, phylip, stockholm/pfam. I
have not yet converted the Bio.SeqIO Nexus parser. Also, I plan to add a
parser for reading the EMBOSS alignment format.
As a side effect, this will actually remove a lot of the Bio.SeqIO code
as handling any alignment file can be delegated to Bio.AlignIO instead.
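For the seqboot case quoted above, the core of an iterator over concatenated alignments can be sketched as follows. It assumes the simplest sequential (non-interleaved) PHYLIP layout: a header line giving the number of taxa and the alignment length, then one line per taxon with a 10-character name field followed by the sequence. Interleaved files and sequences wrapped over several lines are not handled, and the function name is hypothetical; this is not the code attached to bug 2285.

```python
def parse_phylip_alignments(lines):
    """Yield one alignment per header block, as a list of (name, seq).

    Hypothetical sketch: sequential PHYLIP only, one sequence per line.
    """
    lines = iter(lines)
    for header in lines:
        if not header.strip():
            continue                      # skip blank lines between blocks
        n_taxa, length = (int(x) for x in header.split())
        alignment = []
        for _ in range(n_taxa):
            try:
                line = next(lines).rstrip("\n")
            except StopIteration:
                raise ValueError("file ended inside an alignment")
            name = line[:10].strip()      # classic 10-character name field
            seq = line[10:].replace(" ", "")
            if len(seq) != length:
                raise ValueError("%s: expected %i columns, got %i"
                                 % (name, length, len(seq)))
            alignment.append((name, seq))
        yield alignment
```

Because it is a generator, a file with hundreds of bootstrap replicates is read one alignment at a time, in the same spirit as the `AlignIO.parse()` loop in the quoted example.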
Would anyone like to comment on the scheme?
Peter
From bugzilla-daemon at portal.open-bio.org Mon May 7 13:45:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 May 2007 13:45:32 -0400
Subject: [Biopython-dev] [Bug 2285] Creating Bio.AlignIO to cope with
alignments like Bio.SeqIO does sequences
In-Reply-To:
Message-ID: <200705071745.l47HjWGl031779@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2285
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #643 is obsolete|0 |1
AssignedTo|biopython-dev at biopython.org |biopython-bugzilla at maubp.freeserve.co.uk
Status|NEW |ASSIGNED
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-07 13:45 EST -------
Created an attachment (id=646)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=646&action=view)
ZIP file containing four python scripts to go in Bio/AlignIO/*.py
Misc updates to previous version
From bugzilla-daemon at portal.open-bio.org Mon May 7 15:42:15 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 May 2007 15:42:15 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705071942.l47JgFi3004609@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
------- Comment #14 from chris.lasher at gmail.com 2007-05-07 15:42 EST -------
Created an attachment (id=648)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=648&action=view)
modified_Cluster.c_output.txt
This is the output from cluster.c, modified with a printf statement for total
and previous just before line 2071.
From bugzilla-daemon at portal.open-bio.org Mon May 7 15:56:27 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 May 2007 15:56:27 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705071956.l47JuRcu005295@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
------- Comment #15 from chris.lasher at gmail.com 2007-05-07 15:56 EST -------
(In reply to comment #13)
> Chris,
>
> Could you have a look at line 2071 in Bio/Cluster/cluster.c (Biopython release
> 1.43) and print out the two variables "total" and "previous"? (You may find
> that test_Cluster.py no longer hangs when you add the printf statement; at
> least that is what happened with the call to kmedoids). If total and previous
> have the same value, but total>=previous returns False, then that would explain
> why the call to kcluster hangs.
>
This did allow it to proceed up to test_distancematrix_kmedoids; however, it
once again reaches an infinite loop in this test. Additionally, the value for
"previous" reaches an enormous number, which I suspect is not supposed to happen.
(See the attached output.)
From bugzilla-daemon at portal.open-bio.org Mon May 7 19:06:19 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 May 2007 19:06:19 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705072306.l47N6JR3012976@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
------- Comment #16 from mdehoon at ims.u-tokyo.ac.jp 2007-05-07 19:06 EST -------
Thanks, Chris!
Actually, this looks OK.
The kcluster routine runs the k-means algorithm 100 times starting from random
initial clusterings. On each run, total is initialized to DBL_MAX (the largest
number representable as a double). This is the huge number that is printed
(printf often has trouble printing DBL_MAX nicely, so it may look odd in
the output).
The same floating-point comparison that causes kcluster to hang also appears in
kmedoids, so it's no surprise that the code hangs there too.
I'll write a patch that avoids this floating-point comparison and post it here.
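The multi-pass structure described above can be sketched in Python as follows. The running best error starts at the largest representable double (`sys.float_info.max`, the Python counterpart of C's `DBL_MAX`), so the first pass always counts as an improvement, and the best pass is kept. Each pass stops when the cluster assignment stops changing, so no loop depends on an exact floating-point comparison between two totals. This is an illustrative sketch under those assumptions, not the patch for cluster.c; `kmeans_pass`, `kcluster_sketch` and the empty-cluster rule are hypothetical.

```python
import random
import sys


def kmeans_pass(points, k, rng, max_iter=100):
    """One k-means pass on 1-D data from a random start; returns
    (labels, error). Stops when the integer labels stop changing."""
    labels = [rng.randrange(k) for _ in points]
    for _ in range(max_iter):
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Hypothetical rule: re-seed an empty cluster at a random point.
            centroids.append(sum(members) / len(members) if members
                             else rng.choice(points))
        new_labels = [min(range(k), key=lambda c: abs(p - centroids[c]))
                      for p in points]
        if new_labels == labels:          # exact, integer-only comparison
            break
        labels = new_labels
    # Within-cluster sum of squared distances for the final assignment.
    error = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            mean = sum(members) / len(members)
            error += sum((p - mean) ** 2 for p in members)
    return labels, error


def kcluster_sketch(points, k, npass=100, seed=None):
    """Run npass independent passes and keep the lowest-error result."""
    rng = random.Random(seed)
    best_labels, best_error = None, sys.float_info.max   # like DBL_MAX in C
    for _ in range(npass):
        labels, error = kmeans_pass(points, k, rng)
        if error < best_error:
            best_labels, best_error = labels, error
    return best_labels, best_error
```

The only floating-point comparison left (`error < best_error`) merely selects between finished passes; if it misbehaved, the result could differ slightly, but it cannot make the program hang.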
From bugzilla-daemon at portal.open-bio.org Wed May 9 09:48:11 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 May 2007 09:48:11 -0400
Subject: [Biopython-dev] [Bug 2289] New: LOCUS ss-cRNA => ERROR
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2289
Summary: LOCUS ss-cRNA => ERROR
Product: Biopython
Version: 1.24
Platform: PC
OS/Version: Windows XP
Status: NEW
Severity: blocker
Priority: P1
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: Daniel.Nicorici at gmail.com
When I process a GenBank file from NCBI, I get this error:
=======================================================================
Traceback (most recent call last):
  File "F:\silvermine\tool\populator\ncbigenomic\source\python\do.py", line 26, in
    record = iterator.next()
  File "D:\Python25\lib\site-packages\Bio\GenBank\__init__.py", line 142, in next
    return self._parser.parse(self.handle)
  File "D:\Python25\lib\site-packages\Bio\GenBank\__init__.py", line 208, in parse
    self._scanner.feed(handle, self._consumer)
  File "D:\Python25\lib\site-packages\Bio\GenBank\Scanner.py", line 360, in feed
    self._feed_first_line(consumer, self.line)
  File "D:\Python25\lib\site-packages\Bio\GenBank\Scanner.py", line 782, in _feed_first_line
    'LOCUS line does not contain valid sequence type (DNA, RNA, ...):\n' + line
AssertionError: LOCUS line does not contain valid sequence type (DNA, RNA, ...):
LOCUS       NC_005236               1769 bp ss-cRNA    linear   VRL 26-FEB-2007
================================================================================
It seems that the error comes from the parser, which is not able to handle
ss-cRNA. If I replace ss-cRNA with ss-RNA, then there is no error anymore.
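The workaround described above (replacing ss-cRNA with ss-RNA) can be applied on the fly rather than by editing the file. This hypothetical generator rewrites only LOCUS lines and pads the replacement with a space, so the later columns of the fixed-width LOCUS line do not shift. It is only a stopgap for a parser that rejects the molecule type; fixing the parser is the real solution.

```python
def patch_locus_lines(lines):
    """Yield lines unchanged, except that 'ss-cRNA' on a LOCUS line becomes
    'ss-RNA ' (same width, so the fixed-width columns stay aligned)."""
    for line in lines:
        if line.startswith("LOCUS"):
            line = line.replace("ss-cRNA", "ss-RNA ", 1)
        yield line
```

If the parser expects a file-like handle rather than an iterable of lines, the patched text can be wrapped first, e.g. `io.StringIO("".join(patch_locus_lines(open(filename))))`.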
Here is my python program which gives the error:
===========================================================
import glob
from Bio import GenBank
# the files which will be processed
path="G:\\Data\\NCBI\\genomic\\gbff\\temp\\complete*.genomic.gbff"
print "Starting..."
organism=[]
count_organism=[]
feature=[]
count_feature=[]
qualifier=[]
count_qualifier=[]
files = glob.glob(path)
for file in files:
    print ">>>>>>>>>>>>>>>>>>>>>>>>>> " + file + " <<<<<<<<<<<<<<<<<<<<<<<<<"
    parser = GenBank.RecordParser()
    #infile = open("complete1short.genomic.gbff")
    infile = open(file);
    iterator = GenBank.Iterator(infile, parser)
    record = iterator.next()
    while record is not None:
        print record.locus + " --- " + record.organism + " --- " + \
              record.version
        # organism
        flag=0
        for b in range(len(organism)):
            if organism[b]==record.organism:
                count_organism[b]=count_organism[b]+1
                flag=1
                break
        if flag==0:
            organism.append(record.organism)
            count_organism.append(1)
        # features
        for a in range(len(record.features)):
            flag=0
            for b in range(len(feature)):
                if feature[b]==record.features[a].key:
                    count_feature[b]=count_feature[b]+1
                    flag=1
                    break
            if flag==0:
                feature.append(record.features[a].key)
                count_feature.append(1)
            #print "--" + record.features[i].key
            # qualifiers
            for c in range(len(record.features[a].qualifiers)):
                flag=0
                for b in range(len(qualifier)):
                    if qualifier[b]==record.features[a].qualifiers[c].key:
                        count_qualifier[b]=count_qualifier[b]+1
                        flag=1
                        break
                if flag==0:
                    qualifier.append(record.features[a].qualifiers[c].key)
                    count_qualifier.append(1)
                #print "----" + record.features[i].qualifiers[j].key
        record=iterator.next()
print "===================ORGANISM========================"
for i in range(len(organism)):
    print organism[i] + "\t" + str(count_organism[i])
print "===================END_ORGANISM===================="
print "===================FEATURES========================"
for i in range(len(feature)):
    print feature[i] + "\t" + str(count_feature[i])
print "===================END_FEATURES===================="
print "===================QUALIFIERS========================"
for i in range(len(qualifier)):
    print qualifier[i] + "\t" + str(count_qualifier[i])
print "===================END_QUALIFIERS===================="
print "The End!!!"
x=raw_input("Press ENTER to continue...")
============================================================
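The counting in the script above (parallel lists plus a manual search loop for every name) can be written more compactly with `collections.Counter`. In this sketch plain namedtuples stand in for the GenBank record, feature and qualifier objects, keeping only the attributes the script uses (`organism`, `features`, `key`, `qualifiers`); they are placeholders, not Biopython classes.

```python
from collections import Counter, namedtuple

# Hypothetical stand-ins for the parsed objects used by the script.
Qualifier = namedtuple("Qualifier", "key")
Feature = namedtuple("Feature", "key qualifiers")
Record = namedtuple("Record", "organism features")


def tally(records):
    """Count organisms, feature keys and qualifier keys across records."""
    organisms, features, qualifiers = Counter(), Counter(), Counter()
    for record in records:
        organisms[record.organism] += 1
        for feature in record.features:
            features[feature.key] += 1
            for qualifier in feature.qualifiers:
                qualifiers[qualifier.key] += 1
    return organisms, features, qualifiers
```

On Python 3.7 and later a Counter keeps first-seen order when iterated, matching the order the original script prints in; `most_common()` gives the counts sorted from largest to smallest instead.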
From bugzilla-daemon at portal.open-bio.org Wed May 9 10:06:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 May 2007 10:06:32 -0400
Subject: [Biopython-dev] [Bug 2289] LOCUS ss-cRNA => ERROR
In-Reply-To:
Message-ID: <200705091406.l49E6WGi008294@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2289
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-09 10:06 EST -------
Confirmed: the parser currently only accepts entries
'DNA','RNA','tRNA','mRNA','uRNA','snRNA','cDNA'.
Could you tell me where you got this GenBank file from? It would be helpful
for testing (and I may want to add a similar example to the test suite).
Thanks
From bugzilla-daemon at portal.open-bio.org Wed May 9 10:25:59 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 May 2007 10:25:59 -0400
Subject: [Biopython-dev] [Bug 2289] LOCUS ss-cRNA => ERROR
In-Reply-To:
Message-ID: <200705091425.l49EPxNf009285@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2289
------- Comment #2 from Daniel.Nicorici at gmail.com 2007-05-09 10:25 EST -------
Hello,
The entry ss-cRNA appears in the file:
ftp.ncbi.nih.gov/refseq/release/complete/complete72.genomic.gbff.gz
From bugzilla-daemon at portal.open-bio.org Wed May 9 10:48:30 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 May 2007 10:48:30 -0400
Subject: [Biopython-dev] [Bug 2289] Can't parse GenBank files with "ss-cRNA"
in the LOCUS line
In-Reply-To:
Message-ID: <200705091448.l49EmU9D010377@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2289
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|blocker |normal
Status|ASSIGNED |RESOLVED
OS/Version|Windows XP |All
Platform|PC |All
Resolution| |FIXED
Summary|LOCUS ss-cRNA => ERROR |Can't parse GenBank files with "ss-cRNA" in the LOCUS line
Version|1.24 |Not Applicable
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-09 10:48 EST -------
See also Bug 2231. With hindsight, checking against a known list of sequence
types was too strict. It now just looks for the text "DNA" or "RNA" within this
field of the LOCUS line in GenBank files.
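The change can be illustrated with two small checks on the molecule-type field of the LOCUS line. Taking the fifth whitespace-separated token is a simplification made for this sketch (the real scanner works with the fixed-width layout), and stripping an ss-/ds-/ms- strand prefix before the old list lookup is an assumption about how the earlier check behaved. Only the accepted list and the new "contains DNA or RNA" rule come from this bug.

```python
# Accepted types quoted in Comment #1.
OLD_ALLOWED = ('DNA', 'RNA', 'tRNA', 'mRNA', 'uRNA', 'snRNA', 'cDNA')


def molecule_type(locus_line):
    """Return the molecule-type token, e.g. 'ss-cRNA' (whitespace split)."""
    return locus_line.split()[4]


def old_check(locus_line):
    """Strict check against a fixed list (strand prefix assumed stripped)."""
    mol = molecule_type(locus_line)
    for prefix in ("ss-", "ds-", "ms-"):
        if mol.startswith(prefix):
            mol = mol[len(prefix):]
    return mol in OLD_ALLOWED


def new_check(locus_line):
    """Lenient check: accept any type mentioning DNA or RNA."""
    mol = molecule_type(locus_line)
    return "DNA" in mol or "RNA" in mol
```

The lenient rule accepts ss-cRNA, and any similar type NCBI introduces later, at the cost of no longer catching typos in this field.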
I've checked in a fix to CVS, and confirmed that I can now parse GenBank file
NC_005236. Daniel, the simplest way to update your machine is to replace the
file D:\Python25\lib\site-packages\Bio\GenBank\Scanner.py with revision 1.11
from here:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/GenBank/Scanner.py?cvsroot=biopython
There may be a slight delay before the CVS website updates itself; you
can of course get the file from CVS directly if you prefer.
Please let us know (on this bug) if that doesn't solve this problem.
From bugzilla-daemon at portal.open-bio.org Wed May 9 10:51:05 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 May 2007 10:51:05 -0400
Subject: [Biopython-dev] [Bug 2289] Can't parse GenBank files with "ss-cRNA"
in the LOCUS line
In-Reply-To:
Message-ID: <200705091451.l49Ep5HC010511@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2289
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |biopython-bugzilla at maubp.freeserve.co.uk
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-09 10:51 EST -------
P.S. I have not tried the full file from here, as the FTP site was timing out.
ftp.ncbi.nih.gov/refseq/release/complete/complete72.genomic.gbff.gz (15 MB)
I just tried the single GenBank record for NC_005236
From bugzilla-daemon at portal.open-bio.org Wed May 9 10:57:06 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 May 2007 10:57:06 -0400
Subject: [Biopython-dev] [Bug 2289] Can't parse GenBank files with "ss-cRNA"
in the LOCUS line
In-Reply-To:
Message-ID: <200705091457.l49Ev6L0010785@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2289
------- Comment #5 from Daniel.Nicorici at gmail.com 2007-05-09 10:57 EST -------
Here is the part of the file that generates the error:
=======================================================================
LOCUS       NC_005236               1769 bp ss-cRNA    linear   VRL 20-FEB-2007
DEFINITION  Seoul virus strain 80-39 segment S, complete sequence.
ACCESSION   NC_005236
VERSION     NC_005236.1  GI:38505529
PROJECT     GenomeProject:15027
KEYWORDS    .
SOURCE      Seoul virus
  ORGANISM  Seoul virus
            Viruses; ssRNA negative-strand viruses; Bunyaviridae; Hantavirus.
REFERENCE   1  (bases 1 to 1769)
  AUTHORS   Song,J.-W., Moon,J.Y., Baek,L.J. and Song,K.-J.
  TITLE     Genetic analysis of the full length of S segment of Seoul virus
            prototype, 80-39 strain
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 1769)
  CONSRTM   NCBI Genome Project
  TITLE     Direct Submission
  JOURNAL   Submitted (12-AUG-2004) National Center for Biotechnology
            Information, NIH, Bethesda, MD 20894, USA
REFERENCE   3  (bases 1 to 1769)
  AUTHORS   Song,J.-W., Moon,J.Y., Baek,L.J. and Song,K.-J.
  TITLE     Direct Submission
  JOURNAL   Submitted (09-APR-2003) Department of Microbiology, College of
            Medicine, Korea University, 5-ka, Anam-dong, Sungbuk-ku, Seoul
            136-705, Korea
COMMENT     PROVISIONAL REFSEQ: This record has not yet been subject to final
            NCBI review. The reference sequence was derived from AY273791.
            COMPLETENESS: full length.
FEATURES             Location/Qualifiers
     source          1..1769
                     /organism="Seoul virus"
                     /mol_type="viral cRNA"
                     /strain="80-39"
                     /isolation_source="Rattus norvegicus"
                     /db_xref="taxon:11608"
                     /segment="segment S"
                     /country="South Korea"
     gene            43..1332
                     /locus_tag="SEOVsSgp1"
                     /db_xref="GeneID:2943086"
     CDS             43..1332
                     /locus_tag="SEOVsSgp1"
                     /codon_start=1
                     /product="nucleocapsid protein"
                     /protein_id="NP_942556.1"
                     /db_xref="GI:38505530"
                     /db_xref="GeneID:2943086"
                     /translation="MATMEEIQREISAHEGQLVIARQKVKDAEKQYEKDPDDLNKRAL
                     HDRESVAASIQSKIDELKRQLADRIAAGKNIGQDRDPTGVEPGDHLKERSALSYGNTL
                     DLNSLDIDEPTGQTADWLTIIVYLTSFVVPIILKALYMLTTRGRQTSKDNKGMRIRFK
                     DDSSYEDVNGIRKPKHLYVSMPNAQSSMKAEEITPGRFRTAVCGLYPAQIKARNMVSP
                     VMSVVGFLALAKDWTSRIEEWLGAPCKFMAESPIAGSLSGNPVNRDYIRQRQGALAGM
                     EPKEFQALRQHSKDAGCTLVEHIESPSSIWVFAGAPDRCPPTCLFVGGMAELGAFFSI
                     LQDMRNTIMASKTVGTADEKLRKKSSFYQSYLRRTQSMGIQLDQRIIVMFMVAWGKEA
                     VDNFHLGDDMDPELRSLAQILIDQKVKEISNQEPMKL"
ORIGIN
        1 tagtagtaga ctccctaaag agctactcca ctaacaagag aaatggcaac tatggaggaa
       61 atccagagag aaatcagtgc tcacgagggg cagcttgtga tagcacgcca gaaggtcaag
      121 gatgcagaaa agcagtatga gaaggatcct gatgacttaa acaagagggc actgcatgat
      181 cgggagagtg tcgcagcttc aatacaatca aaaattgatg aactgaagcg ccaacttgcc
      241 gacaggattg cagcagggaa gaacatcggg caagaccggg atcctacagg ggtagagccg
      301 ggtgatcatc tcaaggaaag atcagcacta agctacggga atacactgga cctgaatagt
      361 cttgacattg atgaacctac aggacaaaca gctgattggc tgactataat tgtctatcta
      421 acatcattcg tggtcccgat catcttgaag gcactgtaca tgttaacaac aagaggtagg
      481 cagacttcaa aggacaacaa ggggatgagg atcagattca aggatgacag ctcatatgag
      541 gatgtcaatg ggatcagaaa gcctaaacat ctgtatgtgt caatgccaaa cgcccaatcc
      601 agtatgaagg ctgaagagat aacaccagga agattccgca ctgcagtatg tgggctatat
      661 cctgcacaga taaaggcaag gaatatggta agccctgtca tgagtgtagt tgggtttttg
      721 gcactagcaa aagactggac atctagaatt gaagaatggc ttggcgcacc ctgcaagttc
      781 atggcagagt ctcctattgc tgggagttta tctgggaatc ctgtgaatcg tgactatatc
      841 agacaaagac aaggtgcact tgcagggatg gagccaaagg aatttcaagc cctcaggcaa
      901 cattcaaagg atgctggatg tacactagtt gaacatattg agtcaccatc gtcaatatgg
      961 gtgtttgctg gggcccctga taggtgtcca ccaacatgct tgtttgttgg agggatggct
     1021 gagttaggtg ccttcttttc tatacttcag gatatgagga acacaatcat ggcttcaaaa
     1081 actgtgggca cagctgatga aaagcttcga aagaaatcat cattctatca atcatacctc
     1141 agacgcacac aatcaatggg aatacaactg gaccagagga taattgttat gtttatggtt
     1201 gcctggggaa aggaggcagt ggacaacttc catctcggtg atgacatgga tccagagctt
     1261 cgtagcctgg ctcagatctt gattgaccag aaagtgaagg aaatctcgaa ccaggagcct
     1321 atgaaattat aagcacataa atatgtaatc aatactaact ataggttaag aaatactaat
     1381 cattagttaa taagaataca gatttattga ataatcatat taaataatta ggtaagttaa
     1441 atattattta gttaagttag ctaattgatt tatatgatta tcacaattga atgtaatcat
     1501 aagcacaatc actgccatgt ataatcacgg gtatacgggt ggttttcata tggggaacag
     1561 ggtgggctta gggccaggtc accttaagtg accttttttt gtatatatgg atgtagattt
     1621 caattgatcg aatactaatc ctactgtcct cttttctttt cctttctcct tctttactaa
     1681 caacaacaaa ctacctcaca accttctacc tcaatatata ctacctcatt aagttgtttc
     1741 cttttgtctt tttagggagt ctactacta
//
========================================================================
From stephen at blackrim.net Wed May 9 11:58:16 2007
From: stephen at blackrim.net (Stephen A Smith)
Date: Wed, 09 May 2007 11:58:16 -0400
Subject: [Biopython-dev] [Off Topic] Google Group
Message-ID: <4641EF98.90504@blackrim.net>
Hi all,
Just letting you know there is now a Google group open for discussions
of all things programming and evolutionary biology. You can find it here:
http://groups.google.com/group/evo_code.
Figured the people at bio* might be interested.
Take care
Stephen Smith
--
Dept. Ecology and Evolutionary Biology
Yale University
http://www.blackrim.org
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GS dpu s+: a- C++++ UL++++ P--- L++++ E--- W+++ N-- o-- K++++ w---
O- M-- V- PS+++ PE-- Y++ PGP++ t-- 5 X++ R-- tv++ b++++ DI+ D++
G++ e+++ h--- r+++ y+++
------END GEEK CODE BLOCK------
From bugzilla-daemon at portal.open-bio.org Thu May 10 08:59:07 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 May 2007 08:59:07 -0400
Subject: [Biopython-dev] [Bug 2290] New: Not reading 1YVE.pdb
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2290
Summary: Not reading 1YVE.pdb
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: proszek at gmail.com
Biopython 1.42 fails to read the 1YVE.pdb file, although it does read test.pdb,
created by:
awk '{if($1=="ATOM"){print}}' 1YVE.pdb
Line 8610 is a HETATM line.
Traceback below (where file=sys.argv[1]=1YVE.pdb):
WARNING: Chain J is discontinuous at line 8610.
Traceback (most recent call last):
  File "./wezly.py", line 122, in ?
    b=Protein(sys.argv[1])
  File "./wezly.py", line 15, in __init__
    self.struct=self.parser.get_structure('X',file)
  File "/usr/lib/python2.4/site-packages/Bio/PDB/PDBParser.py", line 66, in get_structure
    self._parse(file.readlines())
  File "/usr/lib/python2.4/site-packages/Bio/PDB/PDBParser.py", line 87, in _parse
    self.trailer=self._parse_coordinates(coords_trailer)
  File "/usr/lib/python2.4/site-packages/Bio/PDB/PDBParser.py", line 179, in _parse_coordinates
    structure_builder.init_residue(resname, hetero_flag, resseq, icode)
  File "/usr/lib/python2.4/site-packages/Bio/PDB/StructureBuilder.py", line 155, in init_residue
    self.chain.add(residue)
  File "/usr/lib/python2.4/site-packages/Bio/PDB/Entity.py", line 80, in add
    raise PDBConstructionException, "%s defined twice" % entity.get_full_id()
  File "/usr/lib/python2.4/site-packages/Bio/PDB/Entity.py", line 132, in get_full_id
    parent=self.get_parent()
  File "/usr/lib/python2.4/site-packages/Bio/PDB/Entity.py", line 102, in get_parent
    raise PDBException, 'No parent'
PDBException: No parent
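For anyone without awk, the reporter's workaround can be approximated in Python. This is only a sketch of the one-liner above (the function name is hypothetical): it keeps lines whose first whitespace-separated field is exactly "ATOM", which mirrors `$1=="ATOM"`. Note that it avoids the failure only by discarding every HETATM record, waters and ligands included.

```python
def atom_records_only(lines):
    """Keep only lines whose first whitespace-separated field is "ATOM".

    Mirrors awk '{if($1=="ATOM"){print}}': header lines and all HETATM
    records (waters, ligands) are dropped.
    """
    kept = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "ATOM":
            kept.append(line)
    return kept
```

Usage, assuming the files are in the current directory: `with open("1YVE.pdb") as src, open("test.pdb", "w") as out: out.writelines(atom_records_only(src))`.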
From bugzilla-daemon at portal.open-bio.org Thu May 10 09:53:05 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 May 2007 09:53:05 -0400
Subject: [Biopython-dev] [Bug 2290] Not reading 1YVE.pdb
In-Reply-To:
Message-ID: <200705101353.l4ADr544030572@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2290
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-10 09:53 EST -------
Where did you get your 1YVE.pdb file from? Directly from the PDB?
Just as a remark, the "PDBException: No parent" is not the real problem.
The error happens earlier: a PDBConstructionException ("??? defined twice") is
raised, and when Bio.PDB tries to get the identity of the problem residue for
that message, it falls over.
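The masking effect can be reproduced with a few toy classes. This is a minimal sketch that roughly mirrors the traceback above (an `add` method formats its error with `get_full_id()`, which needs a parent the new residue does not have yet); the classes are hypothetical and are not Bio.PDB's code. `add_safe` shows one possible way to avoid the masking, not necessarily how bug 1936 was fixed.

```python
class NoParentError(Exception):
    pass


class DefinedTwiceError(Exception):
    pass


class Residue:
    def __init__(self, resid):
        self.resid = resid
        self.parent = None  # not yet attached to a chain

    def get_parent(self):
        if self.parent is None:
            raise NoParentError("No parent")
        return self.parent

    def get_full_id(self):
        # Walks up to the parent, so it fails for a detached residue.
        return (self.get_parent().chain_id, self.resid)


class Chain:
    def __init__(self, chain_id):
        self.chain_id = chain_id
        self.children = {}

    def add(self, residue):
        if residue.resid in self.children:
            # The message is built before the exception is raised, so a
            # failure inside get_full_id() replaces the real error.
            raise DefinedTwiceError("%s defined twice" % (residue.get_full_id(),))
        residue.parent = self
        self.children[residue.resid] = residue

    def add_safe(self, residue):
        if residue.resid in self.children:
            # Build the message only from data known to be available.
            raise DefinedTwiceError(
                "%s defined twice" % ((self.chain_id, residue.resid),))
        residue.parent = self
        self.children[residue.resid] = residue
```

Adding the same water ID twice through `add` surfaces `NoParentError`, while `add_safe` reports the duplicate itself.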
From bugzilla-daemon at portal.open-bio.org Thu May 10 10:06:21 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 May 2007 10:06:21 -0400
Subject: [Biopython-dev] [Bug 2290] Not reading 1YVE.pdb
In-Reply-To:
Message-ID: <200705101406.l4AE6LRW031886@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2290
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-10 10:06 EST -------
Which version of Biopython do you have?
The "No parent" bug was fixed as bug 1936, so make sure you have Biopython 1.43
or later.
My installation of Biopython works but spits out a LOT of
PDBConstructionException warnings about multiply defined water atoms (aka
"Residue HOH").
Looking at the raw PDB file, there is a problem with multiply defined waters.
As you can see below, the identifier jumps from 799 back to 1 (i.e. there are
two waters with residue number 1).
...
HETATM16581 O HOH 793 36.450 15.564 -9.023 1.00 39.79 O
HETATM16582 O HOH 794 33.448 13.711 -11.019 1.00 40.42 O
HETATM16583 O HOH 796 28.414 11.908 -16.047 1.00 48.15 O
HETATM16584 O HOH 797 29.445 8.114 -11.059 1.00 55.49 O
HETATM16585 O HOH 799 28.383 5.173 -8.998 1.00 33.85 O
HETATM16586 O HOH 1 26.615 4.599 -6.718 1.00 24.95 O
HETATM16587 O HOH 2 23.353 4.948 -7.137 1.00 34.47 O
HETATM16588 O HOH 3 17.401 11.710 0.938 1.00 35.16 O
HETATM16589 O HOH 4 21.326 11.092 8.215 1.00 22.51 O
HETATM16590 O HOH 5 13.703 2.159 11.421 1.00 24.87 O
...
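To check other files for the same problem, a short scan can report residue IDs that reappear after a different residue has intervened. This is a hypothetical helper, not part of Bio.PDB; it assumes the fixed-width PDB column layout (residue name in columns 18-20, chain ID in 22, residue number in 23-26, insertion code in 27). Consecutive lines with the same ID are treated as atoms of one residue, so waters with hydrogens are not flagged.

```python
def find_repeated_residues(lines, resname="HOH"):
    """Return (chain, resseq, icode) keys that reappear for `resname`."""
    seen = set()
    repeated = []
    previous = None
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if line[17:20].strip() != resname:
            continue
        key = (line[21:22], int(line[22:26]), line[26:27])
        if key != previous:
            # A new residue starts here; it is a duplicate if seen before.
            if key in seen and key not in repeated:
                repeated.append(key)
            seen.add(key)
            previous = key
    return repeated
```

A numbering restart like the jump from 799 back to 1 shows up as a list of the reused IDs.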
Are you happy for me to mark this as a duplicate of bug 1936?
From bugzilla-daemon at portal.open-bio.org Thu May 10 11:07:56 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 May 2007 11:07:56 -0400
Subject: [Biopython-dev] [Bug 2291] New: __init__.py missing in the
Bio.PDB.mmCIF folder after the install
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2291
Summary: __init__.py missing in the Bio.PDB.mmCIF folder after
the install
Product: Biopython
Version: Not Applicable
Platform: Macintosh
OS/Version: MacOS X
Status: NEW
Severity: normal
Priority: P2
Component: Website
AssignedTo: biopython-dev at biopython.org
ReportedBy: jean.lechner at gmail.com
When you install Biopython, you must uncomment some lines in the setup.py file.
But at the end of the installation, the __init__.py file is not created in the
Bio.PDB.mmCIF directory, so you cannot use MMCIFParser or MMCIF2Dict because
Biopython cannot import MMCIFlex from Bio.PDB.mmCIF.
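Under Python 2, a directory is only importable as a regular package if it contains `__init__.py` (Python 3.3+ namespace packages relax this). The sketch below is a hypothetical check, not part of Biopython's setup.py: it walks an installed tree and lists directories that hold module files (`.py`, `.so`, `.pyd`) but no `__init__.py`. Where the Bio directory lives in site-packages is an assumption you would fill in.

```python
import os

MODULE_EXTENSIONS = (".py", ".so", ".pyd")


def dirs_missing_init(package_root):
    """List directories under package_root with module files but no __init__.py."""
    missing = []
    for dirpath, dirnames, filenames in os.walk(package_root):
        has_modules = any(name.endswith(MODULE_EXTENSIONS) for name in filenames)
        if has_modules and "__init__.py" not in filenames:
            missing.append(os.path.relpath(dirpath, package_root))
    return sorted(missing)
```

Running it on the parent of the installed Bio directory should list Bio/PDB/mmCIF if the file was not installed.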
From bugzilla-daemon at portal.open-bio.org Thu May 10 11:08:48 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 May 2007 11:08:48 -0400
Subject: [Biopython-dev] [Bug 2291] __init__.py missing in the Bio.PDB.mmCIF
folder after the install
In-Reply-To:
Message-ID: <200705101508.l4AF8mQf003465@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2291
jean.lechner at gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jean.lechner at gmail.com
From idoerg at gmail.com Thu May 10 13:18:23 2007
From: idoerg at gmail.com (Iddo Friedberg)
Date: Thu, 10 May 2007 10:18:23 -0700
Subject: [Biopython-dev] Biopython talk at BOSC 2007?
Message-ID: <464353DF.6030700@burnham.org>
Anybody giving a talk on Biopython? We can get a 20-30 minute slot in
Vienna, but someone has to show up and talk.
Personally, I will actually be there for the ISMB SIGs, but as I am
running my own conference, it will be a bit of a strain to talk at BOSC.
However, the main reason I do not want to speak is that there are people
much more deserving here. So if anyone plans to be at ISMB 2007 in any
case, and wishes to represent Biopython with serpentine honor, contact
Darin.
Best,
Iddo
-------- Original Message --------
Subject: BOSC 2007 Second Call For Papers
Date: Thu, 10 May 2007 12:17:41 -0400
From: darin.london at duke.edu
To: biopython-owner at lists.open-bio.org
The BOSC Organizing Committee are proud to announce BOSC 2007, occurring
in Vienna, Austria on July 19th and 20th. The conference this year
promises to be exciting, as the BOSC developers attempt to define and
solve currently intractable problems in Bioinformatics. Please refer to
the following website for complete information, and requests for
submissions. Thank you, and we hope to see you in Vienna.
http://open-bio.org/wiki/BOSC_2007
The BOSC organizing Committee
Please pass this email on to anyone that would be interested.
--
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037, USA
T: +1 858 646 3100 x3516
wengophone: idoerg
http://iddo-friedberg.org
http://2007.BioFunctionPrediction.org
From bugzilla-daemon at portal.open-bio.org Sun May 13 16:30:10 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 13 May 2007 16:30:10 -0400
Subject: [Biopython-dev] [Bug 2292] New: Bio.PDBIO writes TER records
without any required fields
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2292
Summary: Bio.PDBIO writes TER records without any required fields
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Other
AssignedTo: biopython-dev at biopython.org
ReportedBy: misiek at genesilico.pl
Bio.PDBIO is happy to write TER records as "TER\n", which is inconsistent with
the PDB format specification.
The PDB format requires that TER records have some fields similar to ATOM
records:
'''The TER record has the same residue name, chain identifier, sequence number
and insertion code as the terminal residue. The serial number of the TER record
is one number greater than the serial number of the ATOM/HETATM preceding the
TER.'''
[See http://www.wwpdb.org/documentation/format23/sect9.html#TER]
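The quoted rule maps onto fixed columns: record name in 1-6, serial number in 7-11, residue name in 18-20, chain ID in 22, residue sequence number in 23-26 and insertion code in 27. A minimal sketch of a formatter under that layout follows; it is not the attached patch (attachment 652), and the function name is hypothetical.

```python
def format_ter_record(last_serial, resname, chain_id, resseq, icode=" "):
    """Build a TER line for the terminal residue of a chain.

    The serial number is one greater than that of the preceding ATOM/HETATM
    record; the residue fields repeat those of the terminal residue.
    """
    return "TER   %5d      %3s %1s%4d%1s" % (
        last_serial + 1, resname, chain_id, resseq, icode)
```

For example, `format_ter_record(1234, "GLY", "A", 76)` gives a 27-character line with serial 1235 in columns 7-11 and "GLY A  76" in columns 18-26.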
This causes problems with programs that require correct TER records (such as
the multiple structural alignment program MUSTANG) and crash when they are not
found.
From bugzilla-daemon at portal.open-bio.org Sun May 13 16:31:18 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 13 May 2007 16:31:18 -0400
Subject: [Biopython-dev] [Bug 2292] Bio.PDBIO writes TER records without any
required fields
In-Reply-To:
Message-ID: <200705132031.l4DKVIP9008944@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2292
------- Comment #1 from misiek at genesilico.pl 2007-05-13 16:31 EST -------
Created an attachment (id=652)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=652&action=view)
Proposed patch to PDBIO.py
This is a simple fix.
From idoerg at gmail.com Mon May 14 12:27:42 2007
From: idoerg at gmail.com (Iddo Friedberg)
Date: Mon, 14 May 2007 09:27:42 -0700
Subject: [Biopython-dev] Subject: BOSC 2007 2nd Call For Papers.
Message-ID: <46488DFE.3070908@burnham.org>
The BOSC Organizing Committee are proud to announce BOSC 2007, occurring
in Vienna, Austria on July 19th and 20th. The conference this year
promises to be exciting, as the BOSC developers attempt to define and
solve currently intractable problems in Bioinformatics. Please refer to
the following website for complete information, and requests for
submissions. Thank you, and we hope to see you in Vienna.
http://open-bio.org/wiki/BOSC_2007
The BOSC organizing Committee
Please pass this email on to anyone that would be interested.
--
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037, USA
T: +1 858 646 3100 x3516
wengophone: idoerg
http://iddo-friedberg.org
http://2007.BioFunctionPrediction.org
From idoerg at gmail.com Mon May 14 12:28:36 2007
From: idoerg at gmail.com (Iddo Friedberg)
Date: Mon, 14 May 2007 09:28:36 -0700
Subject: [Biopython-dev] BOSC 2007 Abstract Submission Deadline Extension
Message-ID: <46488E34.8000604@burnham.org>
Subject: BOSC 2007 Abstract Submission Deadline Extension
Due to technical difficulties in sending out the 2nd call for papers,
the BOSC organizers are extending the deadline for abstract submissions
to Monday May 21st. The announcement day will remain the same so that
it remains before the Early Discount Date.
http://open-bio.org/wiki/BOSC_2007
The BOSC organizing Committee
Please pass this email on to anyone that would be interested.
--
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037, USA
T: +1 858 646 3100 x3516
wengophone: idoerg
http://iddo-friedberg.org
http://2007.BioFunctionPrediction.org
From bugzilla-daemon at portal.open-bio.org Mon May 14 18:18:47 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 14 May 2007 18:18:47 -0400
Subject: [Biopython-dev] [Bug 2290] Not reading 1YVE.pdb
In-Reply-To:
Message-ID: <200705142218.l4EMIlwD008110@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2290
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |DUPLICATE
Version|Not Applicable |1.42
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-14 18:18 EST -------
*** This bug has been marked as a duplicate of bug 1936 ***
From biopython-dev at maubp.freeserve.co.uk Mon May 14 18:29:05 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Mon, 14 May 2007 23:29:05 +0100
Subject: [Biopython-dev] Bugzilla Version Numbers
In-Reply-To: <46273FB4.4030805@maubp.freeserve.co.uk>
References: <128a885f0704102204p2872f42fh685919bb8b4656c3@mail.gmail.com>
<46273FB4.4030805@maubp.freeserve.co.uk>
Message-ID: <4648E2B1.5040706@maubp.freeserve.co.uk>
Peter wrote:
> Chris Lasher wrote:
>> Hi all,
>>
>> Does anybody active with Biopython have administrative capabilities
>> for the project's Bugzilla tracker? The version numbers are a wee
>> bit out of date.
>
> They are, aren't they! I asked on the list last month about this,
> and updating the component fields too:
>
> http://lists.open-bio.org/pipermail/biopython-dev/2007-March/002652.html
>
> As no-one on the list has come forward, I guess one of us should get
> in touch with the relevant Open Bio people, probably by emailing
> "support" at the domain helpdesk.open-bio.org
>
> Who needs/wants bugzilla admin rights?
I've been in touch with Jason Stajich and he has done some magic:
Michiel and I can now creategroups, editclassifications,
editcomponents, editkeywords. I think that's all we need?
I have initially added 1.42 and 1.43 to the version field for Biopython
in bugzilla.
I would also propose we have a few new components, such as PDB, Nexus
and SeqIO (or perhaps rather than SeqIO something more general like
sequence parsing).
Peter
From jfeala at gmail.com Tue May 15 12:42:30 2007
From: jfeala at gmail.com (Jake Feala)
Date: Tue, 15 May 2007 09:42:30 -0700
Subject: [Biopython-dev] interaction networks in biopython
Message-ID: <12c863fe0705150942t108e3131jaf50821ef9ecf2da@mail.gmail.com>
Hello Biopython people -
With all the new research in genome-wide cellular interaction
networks I was a little surprised not to see much support for this
type of data in Biopython. I know that Bioperl has a networks package
that looks like the kind of thing that I would love to also see in
Python for all the obvious reasons.
First - has this already been done and I missed it? All I could find
were a few scattered and application-specific scripts across the web,
plus the Pathway package in Biopython.
If not, then would there be any interest in development along these
lines? A while back I wrote a few scripts that parse interaction
datasets, stick them into a MySQL database, and retrieve the
interactions into a Network object that can be used to analyze the
graph of nodes and links. I would be glad to update these to fit into
the biopython framework, as it would be useful to my own research.
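To make the idea concrete, a Network object of this kind can start out as little more than an adjacency map. The sketch below is hypothetical (it is not Bio.Pathway or Bioperl's API, and it keeps everything in memory rather than in MySQL); it assumes a simple tab-separated "protein_a, protein_b" input format.

```python
from collections import defaultdict


class InteractionNetwork:
    """A minimal undirected interaction graph (hypothetical, not a Biopython API)."""

    def __init__(self):
        self._neighbours = defaultdict(set)

    def add_interaction(self, a, b):
        # Store each link in both directions so lookups are symmetric.
        self._neighbours[a].add(b)
        self._neighbours[b].add(a)

    def nodes(self):
        return sorted(self._neighbours)

    def degree(self, node):
        return len(self._neighbours.get(node, ()))

    def neighbours(self, node):
        return sorted(self._neighbours.get(node, ()))


def load_pairs(lines):
    """Parse tab-separated 'protein_a<TAB>protein_b' lines into a network."""
    net = InteractionNetwork()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        a, b = line.split("\t")[:2]
        net.add_interaction(a, b)
    return net
```

Graph measures such as degree then become simple lookups, and the storage backend could be swapped later without changing the interface.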
One caveat is that I am an engineering PhD student and my programming
skills are mostly self-taught beyond two Java courses, so I might need
a little guidance in testing and preparing the code for distribution.
I have only ever written code for my own personal research but I think
my style is decent and I would love to get better.
Any opinion or advice?
Thanks
-Jake Feala
Bioengineering Dept.
University of California, San Diego
From edschofield at gmail.com Tue May 15 14:37:30 2007
From: edschofield at gmail.com (Ed Schofield)
Date: Tue, 15 May 2007 19:37:30 +0100
Subject: [Biopython-dev] interaction networks in biopython
In-Reply-To: <12c863fe0705150942t108e3131jaf50821ef9ecf2da@mail.gmail.com>
References: <12c863fe0705150942t108e3131jaf50821ef9ecf2da@mail.gmail.com>
Message-ID: <1b5a37350705151137t75ea7e07r6596ba1ce35a8716@mail.gmail.com>
On 5/15/07, Jake Feala wrote:
> Hello Biopython people -
>
> With all the new research in genome-wide cellular interaction
> networks I was a little surprised not to see much support for these
> type of data in Biopython. I know that Bioperl has a networks package
> that looks like the kind of thing that I would love to also see in
> Python for all the obvious reasons.
>
> First - has this already been done and I missed it? All I could find
> were a few scattered and application-specific scripts across the web,
> plus the Pathway package in BioPython.
>
> If not, then would there be any interest in development along these
> lines? A while back I wrote a few scripts that parse interaction
> datasets, stick them into a MySQL database, and retrieve the
> interactions into a Network object that can be used to analyze the
> graph of nodes and links. I would be glad to update these to fit into
> the biopython framework, as it would be useful to my own research.
>
> One caveat is that I am an engineering PhD student and my programming
> skills are mostly self-taught beyond two Java courses, so I might need
> a little guidance in testing and preparing the code for distribution.
> I have only ever written code for my own personal research but I think
> my style is decent and I would love to get better.
>
> Any opinion or advice?
This would interest me too; I'd be glad to have such functionality in
BioPython. I can offer you some guidance on Python, packaging and
testing, and (if you need it) use of external array packages.
-- Ed
From yair.benita at gmail.com Tue May 15 15:25:27 2007
From: yair.benita at gmail.com (Yair Benita)
Date: Tue, 15 May 2007 15:25:27 -0400
Subject: [Biopython-dev] interaction networks in biopython
In-Reply-To: <1b5a37350705151137t75ea7e07r6596ba1ce35a8716@mail.gmail.com>
Message-ID:
I would be happy to contribute to this too.
Currently I have a Python script that uses HPRD to generate protein-protein
interaction maps. I have different filtering methods to display only classes
of proteins or only links to a specific KEGG pathway. It will need a bit of
work before I can submit this to CVS. As for drawing the map, I am currently
generating a dot file that can be converted to an image using Graphviz. If
anyone wants to suggest anything else, please do.
Yair
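Writing the dot file needs no extra libraries. A minimal sketch, assuming the interactions are already available as (protein_a, protein_b) pairs (the function name is hypothetical): it emits an undirected Graphviz graph and drops interactions reported in both directions.

```python
def to_dot(pairs, graph_name="interactions"):
    """Return Graphviz DOT text for an undirected graph of (a, b) pairs."""
    lines = ["graph %s {" % graph_name]
    seen = set()
    for a, b in pairs:
        edge = tuple(sorted((a, b)))
        if edge in seen:
            continue  # skip duplicate interactions reported in both directions
        seen.add(edge)
        lines.append('    "%s" -- "%s";' % edge)
    lines.append("}")
    return "\n".join(lines) + "\n"
```

The output can be saved and rendered with, for example, `dot -Tpng interactions.dot -o interactions.png`. Names containing double quotes would need escaping, which this sketch does not do.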
From biopython-dev at maubp.freeserve.co.uk Tue May 15 15:19:23 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 15 May 2007 20:19:23 +0100
Subject: [Biopython-dev] Subversion Repository
In-Reply-To: <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com>
References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu>
<128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com>
Message-ID: <464A07BB.8020206@maubp.freeserve.co.uk>
Chris Lasher wrote:
> Since no one else has volunteered, I'm taking up responsibility for
> the transition. I got the ball moving by contacting "support at
> open-bio.org" to alert them of our interest and get any contacts
> we'll need to make this happen. Also, if anybody on the list has any
> information that would be helpful in this (e.g., who administers the
> CVS repo) please feel free to send it along. Likewise, feel free to
> raise any questions, concerns, and comments on the list.
Did you get any information from the Open Bioinformatics Foundation guys
about moving from CVS to Subversion?
Peter
From lists.steve at arachnedesign.net Tue May 15 15:56:46 2007
From: lists.steve at arachnedesign.net (Steve Lianoglou)
Date: Tue, 15 May 2007 15:56:46 -0400
Subject: [Biopython-dev] interaction networks in biopython
In-Reply-To:
References:
Message-ID:
Hi,
On May 15, 2007, at 3:25 PM, Yair Benita wrote:
> I would be happy to contribute to this too.
> Currently I have a python script that uses HPRD to generate protein protein
> interaction maps. I have deferent filtering methods to display only classes
> of proteins or only links to a specific kegg pathway. It will need a bit of
> work before I can submit this to CVS. As for drawing the map, I am currently
> generating a dot file that can be converted to an image using GRAPHVIZ. If
> anyone wants to suggest anything else, please do.
I've been using NetworkX[1] to play with networks/graphs interactively.
You can display them if you have matplotlib installed, and can save
the graphs to dot format as well.
-steve
[1] NetworkX: https://networkx.lanl.gov/wiki
From salish at picasso.ucsf.edu Tue May 15 16:36:05 2007
From: salish at picasso.ucsf.edu (Howard Salis)
Date: Tue, 15 May 2007 13:36:05 -0700
Subject: [Biopython-dev] About a new GenBankWriter class with SeqIO interface
Message-ID: <9fa7e98e0705151336va7e1c86la137c137883d6886@mail.gmail.com>
Hello everyone,
I started using Biopython in my research and I needed a way to
write GenBank files from a SeqRecord (which was parsed from other
GenBank/etc files). So I wrote something up. It uses the SeqIO
interface and behaves like the FASTA writer.
The SeqIO.write(record, handle, "genbank") interface accepts "record"
as either a generator of SeqRecords (possibly containing multiple
records) or a single SeqRecord. So both records = SeqIO.parse(...) and
record = records.next() can be passed in. (I'm relatively new to Python,
so please excuse any bad terminology or stylistic deficiencies.)
The changes are: a new file called GenBankWriter.py in Bio/GenBank.
Small changes to the __init__.py of Bio/GenBank. Changes to the
_feed_first_line function of Scanner.py of Bio/GenBank.
I had to change the way Bio/GenBank/Scanner.py reads the Locus line of
a GenBank file in order to handle missing data and newer molecule
types (e.g. ss-RNA, ds-DNA, mt-DNA, etc). I also add/change a couple
of lines in __init__.py to store whether a sequence was linear or
circular and to store the string that encodes its molecule type
(ss-RNA, etc). The output of SeqIO.write(record,handle,"genbank") is
functionally identical to a GenBank file from NCBI except for some
spacing and word wrap issues.
What is the best way to submit new code for review? Whom do I send it
to and should I send only the modified files?
I've included one of my test scripts below just to show how it works.
(Does anyone suggest any changes in the interface?)
Thank you.
Sincerely,
Howard Salis
Postdoctoral Scholar
UC San Francisco
#ASimpleTest.py
"""A vigorous exercise of the GenBankWriter class and the SeqIO interface."""
from Bio import SeqIO
from Bio import GenBank

working_dir = "E:\\Plasmids\\"

#Get some arbitrarily chosen GenBank files (these are relatively small ones)
gi_list = GenBank.search_for("EF470550 OR EF470551")
print gi_list
ncbi_dict = GenBank.NCBIDictionary("nucleotide", "genbank")

#Write the pair of strings to a single file.
handle = open(working_dir + "Source.gb", "w")
for gi in gi_list:
    handle.write(str(ncbi_dict[gi]))
handle.close()

#Parse the Source file into a SeqRecord generator (two records)
handle = open(working_dir + "Source.gb", "r")
records = SeqIO.parse(handle, "genbank")
#Write many records to a single GenBank file
out_file = open(working_dir + "ManyRecords.gb", "w")
SeqIO.write(records, out_file, "genbank")
out_file.close()
handle.close()

#----
#Parse the Source file into a SeqRecord generator (two records)
handle = open(working_dir + "Source.gb", "r")
records = SeqIO.parse(handle, "genbank")
#Write individual records into their own GenBank file
counter = 0
for record in records:
    counter += 1
    out_file = open(working_dir + "OneFile_" + str(counter) + ".gb", "w")
    SeqIO.write(record, out_file, "genbank")
    out_file.close()
handle.close()

#Open them back up again, parse them, and write them to a single file
handle = open(working_dir + "ManyRecords_Out.gb", "w")
for num in range(1, counter + 1):
    print num
    in_file = open(working_dir + "OneFile_" + str(num) + ".gb", "r")
    records = SeqIO.parse(in_file, "genbank")
    SeqIO.write(records, handle, "genbank")
    in_file.close()
handle.close()

#Compare the original GenBank file in Source.gb to the GenBankWriter'd one.
original = open(working_dir + "Source.gb", "r")
newone = open(working_dir + "ManyRecords_Out.gb", "r")
records_original = SeqIO.parse(original, "genbank")
records_newone = SeqIO.parse(newone, "genbank")
for (record_original, record_newone) in zip(records_original, records_newone):
    print str(record_original)
    print str(record_newone)
original.close()
newone.close()
print "Done"
From biopython-dev at maubp.freeserve.co.uk Tue May 15 16:59:55 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 15 May 2007 21:59:55 +0100
Subject: [Biopython-dev] About a new GenBankWriter class with SeqIO
interface
In-Reply-To: <9fa7e98e0705151336va7e1c86la137c137883d6886@mail.gmail.com>
References: <9fa7e98e0705151336va7e1c86la137c137883d6886@mail.gmail.com>
Message-ID: <464A1F4B.9020705@maubp.freeserve.co.uk>
Howard Salis wrote:
> Hello everyone,
>
> I started using Biopython in my research and I needed a way to
> write GenBank files from a SeqRecord (which was parsed from other
> GenBank/etc files). So I wrote something up. It uses the SeqIO
> interface and behaves like the fasta writer.
Sounds nice - it's something I've been thinking about doing myself, but I
wanted to do both GenBank and EMBL, sharing the feature table
writing code.
Something else to keep in mind is writing any SeqRecord to a GenBank (or
EMBL) file, even if it did not get created from a GenBank or EMBL file
and is therefore lacking lots of annotation.
> The changes are: a new file called GenBankWriter.py in Bio/GenBank.
> Small changes to the __init__.py of Bio/GenBank. Changes to the
> _feed_first_line function of Scanner.py of Bio/GenBank.
>
> I had to change the way Bio/GenBank/Scanner.py reads the Locus line of
> a GenBank file in order to handle missing data and newer molecule
> types (e.g. ss-RNA, ds-DNA, mt-DNA, etc).
That was recently fixed in Bug 2289
http://bugzilla.open-bio.org/show_bug.cgi?id=2289
> I also add/change a couple
> of lines in __init__.py to store whether a sequence was linear or
> circular and to store the string that encodes its molecule type
> (ss-RNA, etc).
I thought we already stored this information - but I'm not sure off hand.
> The output of SeqIO.write(record,handle,"genbank") is
> functionally identical to a GenBank file from NCBI except for some
> spacing and word wrap issues.
Good :)
> What is the best way to submit new code for review? Whom do I send it
> to and should I send only the modified files?
You could email it directly to me, but it would be better to create a
bug (an "enhancement") and then attach the changes to the bug. Edited
versions of files will do, but patch files are best.
You should use the Unix "diff" command line tool to create a patch file.
One way to do this on Windows is to install Cygwin...
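If installing Cygwin is inconvenient, Python's standard difflib module can produce a unified diff in the same style as `diff -u`. This is a swapped-in alternative to the diff tool, sketched under the assumption that you have the original and edited file contents as strings; the function name is hypothetical.

```python
import difflib


def make_patch(old_text, new_text, old_name, new_name):
    """Return a unified diff (like `diff -u`) between two versions of a file."""
    diff = difflib.unified_diff(
        old_text.splitlines(True),  # keep line endings so the patch is exact
        new_text.splitlines(True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(diff)
```

Files that do not end with a newline produce slightly odd output here, since difflib does not emit the "No newline at end of file" marker.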
> I've included one of my test scripts below just to show how it works.
> (Does anyone suggest any changes in the interface?)
Looking at the code, at first glance it looks like you are hooking into
the existing Bio.SeqIO interface nicely.
I look forward to seeing your code Howard.
Peter
From bugzilla-daemon at portal.open-bio.org Tue May 15 21:55:14 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 May 2007 21:55:14 -0400
Subject: [Biopython-dev] [Bug 2294] New: These patches allow one to write a
GenBank file using the SeqIO interface
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2294
Summary: These patches allow one to write a GenBank file using
the SeqIO interface
Product: Biopython
Version: 1.43
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: howard.salis at gmail.com
The SeqIO interface currently reads from, but does not write to, the GenBank
format. The GenBank format is widely used and is often chosen as the data
storage format for many plasmid, genome, and other nucleotide editors. By
giving Biopython the capability of writing annotated sequences to the GenBank
format, one can use Biopython to read in raw sequences, analyze and annotate
them, and then view them in a nucleotide visual editor. The following patches
do exactly this and use the current SeqIO interface to do it.
The following attached patches enable the command
SeqIO.write(record,handle,"genbank"), where handle is an open, writable
file-object and record is _either_ a SeqRecord generator or the result of one
of its iterations. That is, if one did manyrecords =
SeqIO.parse(handle,"genbank") or onerecord = manyrecords.next(), then one could
pass either manyrecords or onerecord to SeqIO.write(). If a generator
containing multiple records is passed, all records are written to a single
GenBank file. If one record is passed, just that record is written. In both
cases the handle is left open, so SeqIO.write() may be called multiple times
to write additional records to the same file.
The attached patches make small modifications to Bio/SeqIO/__init__.py and
Bio/SeqIO/InsdcIO.py. The _feed_first_line function in Bio/GenBank/Scanner.py
is altered to handle missing data (it uses a very Pythonic dictionary of test
lambda functions to parse the meaning of words). Finally, a new file is created
called Bio/GenBank/GenBankWriter.py.
Questions, Comments, Suggestions, Criticisms, etc are welcome.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue May 15 21:56:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 May 2007 21:56:32 -0400
Subject: [Biopython-dev] [Bug 2294] These patches allow one to write a
GenBank file using the SeqIO interface
In-Reply-To:
Message-ID: <200705160156.l4G1uWRZ005077@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2294
howard.salis at gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
AssignedTo|biopython-dev at biopython.org |howard.salis at gmail.com
Status|NEW |ASSIGNED
------- Comment #1 from howard.salis at gmail.com 2007-05-15 21:56 EST -------
Created an attachment (id=654)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=654&action=view)
patch to Bio/GenBank/Scanner.py (alters _feed_first_line under GenBank class)
From salish at picasso.ucsf.edu Tue May 15 22:34:08 2007
From: salish at picasso.ucsf.edu (Howard Salis)
Date: Tue, 15 May 2007 19:34:08 -0700
Subject: [Biopython-dev] About a new GenBankWriter class with SeqIO
interface
In-Reply-To: <464A1F4B.9020705@maubp.freeserve.co.uk>
References: <9fa7e98e0705151336va7e1c86la137c137883d6886@mail.gmail.com>
<464A1F4B.9020705@maubp.freeserve.co.uk>
Message-ID: <9fa7e98e0705151934j1aeb2df7ja76e0ce315515d91@mail.gmail.com>
On 5/15/07, Peter wrote:
> Sounds nice - it's something I've been thinking about doing myself, but I
> wanted to do both GenBank and EMBL, sharing the feature table
> writing code.
Yep, since EMBL and GenBank share the same feature format, I've
separated the "foreword", feature table, and sequence write functions.
So if someone wants to write the EMBL writer, they just need to write
the appropriate foreword. I think the sequence data is stored the same
too? Is that correct?
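The split described here can be sketched as a template-method class: a shared `write_record` calls a format-specific foreword, then a feature table whose layout is common to both formats apart from the line prefix (feature key starting in column 6, location in column 22). All class names and the dict-based record are hypothetical stand-ins, not the code in the patch, and the forewords are deliberately incomplete (no LOCUS/ID lines and no sequence section).

```python
import io


class InsdcWriterSketch:
    """Hypothetical base class: one feature-table writer shared by both formats."""

    FEATURE_PREFIX = None  # set by each subclass

    def __init__(self, handle):
        self.handle = handle

    def write_record(self, record):
        self.write_foreword(record)
        self.write_features(record)
        self.handle.write("//\n")

    def write_foreword(self, record):
        raise NotImplementedError

    def write_features(self, record):
        # Same layout in both formats: the prefix fills columns 1-5 and the
        # key is padded to 16 characters, so the location starts in column 22.
        for key, location in record["features"]:
            self.handle.write(self.FEATURE_PREFIX + key.ljust(16) + location + "\n")


class GenBankWriterSketch(InsdcWriterSketch):
    FEATURE_PREFIX = " " * 5

    def write_foreword(self, record):
        self.handle.write("DEFINITION".ljust(12) + record["description"] + "\n")
        self.handle.write("FEATURES".ljust(21) + "Location/Qualifiers\n")


class EmblWriterSketch(InsdcWriterSketch):
    FEATURE_PREFIX = "FT   "

    def write_foreword(self, record):
        self.handle.write("DE   " + record["description"] + "\n")
        self.handle.write("FH   " + "Key".ljust(16) + "Location/Qualifiers\n")


if __name__ == "__main__":
    rec = {"description": "Example plasmid.", "features": [("gene", "1..10")]}
    for writer_class in (GenBankWriterSketch, EmblWriterSketch):
        out = io.StringIO()
        writer_class(out).write_record(rec)
        print(out.getvalue())
```

Adding an EMBL writer in this design means supplying another foreword (and, in a full version, its own sequence section), while the feature-table code is reused unchanged.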
> Something else to keep in mind is writing any SeqRecord to a GenBank (or
> EMBL) file, even if it did not get created from a GenBank or EMBL file
> and is therefore lacking lots of annotation.
Very true. The GenBankWriter.py will either leave these fields blank,
leave out their keywords entirely if they are optional, or add
something like or when it's necessary
to have something there.
> > I also add/change a couple
> > of lines in __init__.py to store whether a sequence was linear or
> > circular and to store the string that encodes its molecule type
> > (ss-RNA, etc).
>
> I thought we already stored this information - but I'm not sure off hand.
Well, there's the alphabet of the sequence (e.g. UnAmbiguousDNA())
that says whether it's DNA, RNA, peptide, etc, but even if I matched
these up with strings, the "ss-", "ds-", etc. part would be
missing. I just saved the exact wording of the sequence type (e.g.
"ds-DNA", "ss-RNA", etc.) to a dictionary key named
self.data.annotations["sequence_type"] in the _FeatureConsumer class
under GenBank. This is in addition to the alphabet of the sequence so
it shouldn't conflict.
> You could email it directly to me, but it would be better to create a
> bug (an "enhancement") and then attach the changes to the bug. Edited
> versions of files will do, but patch files are best.
Ok, done! It's at http://bugzilla.open-bio.org/show_bug.cgi?id=2294
> I look forward to seeing your code Howard.
>
> Peter
Thank you! And I hope to continue to contribute to Biopython.
-Howard
From biopython-dev at maubp.freeserve.co.uk Wed May 16 03:53:41 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 16 May 2007 08:53:41 +0100
Subject: [Biopython-dev] About a new GenBankWriter class with
SeqIO interface
In-Reply-To: <9fa7e98e0705151934j1aeb2df7ja76e0ce315515d91@mail.gmail.com>
References: <9fa7e98e0705151336va7e1c86la137c137883d6886@mail.gmail.com> <464A1F4B.9020705@maubp.freeserve.co.uk>
<9fa7e98e0705151934j1aeb2df7ja76e0ce315515d91@mail.gmail.com>
Message-ID: <464AB885.80305@maubp.freeserve.co.uk>
Howard Salis wrote:
>> Sounds nice - it's something I've been thinking about doing myself,
>> but I wanted to do both GenBank and EMBL, sharing the feature
>> table writing code.
>
> Yep, since EMBL and GenBank share the same feature format, I've
> separated the "foreword", feature table, and sequence write
> functions.
Using "foreword / features / sequence" avoids clashing with the terms
"header" and "footer" used in Bio.SeqIO to mean parts of a
multi-sequence file which do not belong to a specific record. Maybe I
should update Bio/GenBank/Scanner.py to use similar terminology...
> So if someone wants to write the EMBL writer, they just need to write
> the appropriate foreword.
There is also the issue of translation between EMBL/GenBank terminology,
for example where someone has read in an EMBL file and wants to write it
out as a GenBank file. For a simple example, the division class should
probably map: {'PRI': 'MAM', 'BCT': 'PRO', 'UNA': 'UNC'}
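Read literally, the keys in this dictionary (PRI, BCT, UNA) are GenBank division codes and the values (MAM, PRO, UNC) are EMBL ones, so converting an EMBL record to GenBank needs the inverted mapping. A minimal sketch, assuming that reading and passing any unlisted code through unchanged. The inversion is lossy: EMBL's MAM also covers non-primate mammals, which GenBank files under its own MAM division, so a real converter would need more information (for example the taxonomy) than this table provides. The names below are hypothetical.

```python
# GenBank division -> EMBL division, as listed in the email.
GENBANK_TO_EMBL_DIVISION = {"PRI": "MAM", "BCT": "PRO", "UNA": "UNC"}

# Inverted for EMBL -> GenBank. Lossy: EMBL "MAM" also holds non-primate
# mammals, which GenBank files under "MAM", so this table alone cannot
# choose correctly between "PRI" and "MAM".
EMBL_TO_GENBANK_DIVISION = {embl: gb for gb, embl in GENBANK_TO_EMBL_DIVISION.items()}


def convert_division(code, mapping):
    """Translate a three-letter division code, passing unknown codes through."""
    return mapping.get(code, code)


print(convert_division("PRO", EMBL_TO_GENBANK_DIVISION))  # BCT
print(convert_division("PLN", EMBL_TO_GENBANK_DIVISION))  # PLN (unchanged)
```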
> I think the sequence data is stored the same too? Is that correct?
Actually, the way the sequence is printed out is slightly different.
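A sketch of that difference, as the two layouts are usually described (check it against the GenBank release notes and the EMBL user manual before relying on it): both print lowercase residues in space-separated blocks of 10, six blocks per line, but a GenBank ORIGIN line starts with the position of its first residue right-justified in nine columns, whereas an EMBL SQ line is indented five spaces and ends with the running residue count right-justified to column 80. The function names are hypothetical, and the ORIGIN/SQ header lines and the closing `//` are left out.

```python
def genbank_sequence_lines(seq, per_line=60, block=10):
    """Yield ORIGIN-style lines: start position right-justified in nine
    columns, then space-separated blocks of lowercase residues."""
    seq = seq.lower()
    for start in range(0, len(seq), per_line):
        chunk = seq[start:start + per_line]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        yield str(start + 1).rjust(9) + " " + " ".join(blocks)


def embl_sequence_lines(seq, per_line=60, block=10):
    """Yield SQ-style lines: five spaces, the blocks padded to a fixed width,
    then the running residue count right-justified to end in column 80."""
    seq = seq.lower()
    width = per_line + per_line // block - 1  # 60 residues + 5 separators = 65
    for start in range(0, len(seq), per_line):
        chunk = seq[start:start + per_line]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        end = start + len(chunk)
        yield "     " + " ".join(blocks).ljust(width) + str(end).rjust(10)


if __name__ == "__main__":
    demo = "ACGT" * 20  # 80 residues: one full line and one partial line
    print("\n".join(genbank_sequence_lines(demo)))
    print("\n".join(embl_sequence_lines(demo)))
```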
>>> I also add/change a couple of lines in __init__.py to store
>>> whether a sequence was linear or circular and to store the string
>>> that encodes its molecule type (ss-RNA, etc).
>> I thought we already stored this information - but I'm not sure off
>> hand.
>
> Well, there's the alphabet of the sequence (e.g. UnAmbiguousDNA())
> that says whether it's DNA, RNA, peptide, etc, but even if I matched
> these up with strings, the "ss-", "ds-", etc. part would be
> missing. I just saved the exact wording of the sequence type (e.g.
> "ds-DNA", "ss-RNA", etc.) to a dictionary key named
> self.data.annotations["sequence_type"] in the _FeatureConsumer class
> under GenBank. This is in addition to the alphabet of the sequence so
> it shouldn't conflict.
That's probably a good idea. However, we would need to check what the
EMBL equivalents are and convert them when writing GenBank files. Maybe
we should just keep things simple and write one of RNA/DNA/Protein only?
> Ok, done! It's at http://bugzilla.open-bio.org/show_bug.cgi?id=2294
I have made some more specific comments on the bug. In this email I have
tried to stick to the broader picture.
Peter
From jfeala at gmail.com Wed May 16 13:25:37 2007
From: jfeala at gmail.com (Jake Feala)
Date: Wed, 16 May 2007 10:25:37 -0700
Subject: [Biopython-dev] interaction networks in biopython
In-Reply-To:
References:
Message-ID: <12c863fe0705161025p46b1ff6v8c6b1e5999b29244@mail.gmail.com>
Thanks Ed and Yair, I'm really glad there's some interest in this!
I'll get started on dusting off my code and adding more documentation.
Steve - great suggestion. I had already looked at NetworkX and was
already thinking about switching over to this as the back-end graph
representation. Are there any issues that I should think about when
creating these extra dependencies?
Also, what is the next step in this process? Should we agree on an
API and class hierarchy before we start dumping code on each other?
Which aspects can we make compatible with other Biopython objects? (I
was thinking maybe parsers for the interaction datasets and the SQL
interface)
-Jake
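To make the API question concrete, here is a minimal pure-Python sketch of an undirected interaction network that can also emit the Graphviz dot text Yair and Steve mention. Every class and method name is hypothetical; a NetworkX-backed version would swap the adjacency dictionary for a `networkx.Graph`, which (as Steve notes) can write dot files itself.

```python
from collections import defaultdict


class InteractionNetwork:
    """Hypothetical undirected network of pairwise interactions."""

    def __init__(self):
        self._neighbors = defaultdict(set)

    def add_interaction(self, a, b):
        self._neighbors[a].add(b)
        self._neighbors[b].add(a)

    def neighbors(self, node):
        # .get() avoids creating an empty entry for unknown nodes
        return sorted(self._neighbors.get(node, ()))

    def degree(self, node):
        return len(self._neighbors.get(node, ()))

    def edges(self):
        """Each interaction once, as a sorted (a, b) pair."""
        return sorted(
            {tuple(sorted((a, b))) for a, nbrs in self._neighbors.items() for b in nbrs}
        )

    def to_dot(self, name="interactions"):
        """Return the network as Graphviz dot text (an undirected graph)."""
        lines = ["graph %s {" % name]
        for a, b in self.edges():
            lines.append('  "%s" -- "%s";' % (a, b))
        lines.append("}")
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    net = InteractionNetwork()
    net.add_interaction("A", "B")
    net.add_interaction("C", "B")
    print(net.to_dot(), end="")
```

Parsers for particular interaction datasets could then target `add_interaction`, keeping storage and file formats separate.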
On 5/15/07, Steve Lianoglou wrote:
> Hi,
>
> On May 15, 2007, at 3:25 PM, Yair Benita wrote:
>
> > I would be happy to contribute to this too.
> > Currently I have a python script that uses HPRD to generate protein-protein
> > interaction maps. I have different filtering methods to display only
> > classes of proteins or only links to a specific KEGG pathway. It will need
> > a bit of work before I can submit this to CVS. As for drawing the map, I am
> > currently generating a dot file that can be converted to an image using
> > Graphviz. If anyone wants to suggest anything else, please do.
>
> I've been using NetworkX[1] to play w/ networks/graphs interactively.
> You can display them if you have matplotlib installed, and can save
> the graphs to dot format as well.
>
> -steve
>
> [1] NetworkX: https://networkx.lanl.gov/wiki
>
> >
> > Yair
> >
> >
> > on 5/15/07 2:37 PM, Ed Schofield at edschofield at gmail.com wrote:
> >
> >> On 5/15/07, Jake Feala wrote:
> >>> Hello Biopython people -
> >>>
> >>> With all the new research in genome-wide cellular interaction
> >>> networks I was a little surprised not to see much support for this
> >>> type of data in Biopython. I know that Bioperl has a networks package
> >>> that looks like the kind of thing that I would love to also see in
> >>> Python for all the obvious reasons.
> >>>
> >>> First - has this already been done and I missed it? All I could find
> >>> were a few scattered and application-specific scripts across the web,
> >>> plus the Pathway package in BioPython.
> >>>
> >>> If not, then would there be any interest in development along these
> >>> lines? A while back I wrote a few scripts that parse interaction
> >>> datasets, stick them into a MySQL database, and retrieve the
> >>> interactions into a Network object that can be used to analyze the
> >>> graph of nodes and links. I would be glad to update these to fit into
> >>> the Biopython framework, as it would be useful to my own research.
> >>>
> >>> One caveat is that I am an engineering PhD student and my programming
> >>> skills are mostly self-taught beyond two Java courses, so I might need
> >>> a little guidance in testing and preparing the code for distribution.
> >>> I have only ever written code for my own personal research but I think
> >>> my style is decent and I would love to get better.
> >>>
> >>> Any opinion or advice?
> >>
> >> This would interest me too; I'd be glad to have such functionality in
> >> BioPython. I can offer you some guidance on Python, packaging and
> >> testing, and (if you need it) use of external array packages.
> >>
> >> -- Ed
> >> _______________________________________________
> >> Biopython-dev mailing list
> >> Biopython-dev at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/biopython-dev
From jhackney at stanford.edu Wed May 16 14:10:38 2007
From: jhackney at stanford.edu (Jason A. Hackney)
Date: Wed, 16 May 2007 11:10:38 -0700
Subject: [Biopython-dev] interaction networks in biopython
In-Reply-To: <12c863fe0705161025p46b1ff6v8c6b1e5999b29244@mail.gmail.com>
References:
<12c863fe0705161025p46b1ff6v8c6b1e5999b29244@mail.gmail.com>
Message-ID:
Hi All,
I'm also interested in an interaction network class for biopython.
I'm willing to contribute to the effort with either code review or
testing.
Cheers,
Jason
Jason A. Hackney
Postdoctoral Fellow
Department of Microbiology and Immunology
Stanford University
e-mail: jhackney at stanford.edu
lab phone: 650-724-3891
mobile: 650-283-6907
From biopython-dev at maubp.freeserve.co.uk Wed May 16 18:05:44 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 16 May 2007 23:05:44 +0100
Subject: [Biopython-dev] [Bug 2294] These patches allow one to write a
GenBank file using the SeqIO interface
In-Reply-To: <200705161904.l4GJ4pue012542@portal.open-bio.org>
References: <200705161904.l4GJ4pue012542@portal.open-bio.org>
Message-ID: <464B8038.9020900@maubp.freeserve.co.uk>
Hi Howard,
I'm replying to the mailing list as you've raised a few more general
issues on bug 2294.
Peter wrote:
>> In the writer class, for your write_file(self, records) method, you
>> explicitly check for and allow "records" to be a single SeqRecord. Don't. Any
>> such "helpfulness" should be done in Bio.SeqIO.write() only, and not the
>> individual write_file. Otherwise we'll end up with a situation where some
>> writers are "helpful" and others are not.
Howard replied:
> Currently, the SeqIO's write function is
>
> def write(sequences, handle, format):
> ...
>
> I can add checks to see if "sequences" is (a) a generator, (b) a SeqRecord
> object, or (c) something else. If (a), then call
> writer_class(handle).write_file(sequences). If (b), then call
> writer_class(handle).write_record(sequences). If (c), spit out an error (for
> now).
I've added a check in Bio.SeqIO.write() for the "sequences" argument
being a SeqRecord (your case b), and if so it now raises a ValueError.
This is better than whatever cryptic error would have happened.
I agree that it might be "nicer" if Bio.SeqIO.write() would also accept
a SeqRecord object as input and did the expected thing, but having a
fixed simple API is more straightforward.
For comparison, see the previous discussions on the mailing list about
having the file argument accepting either a handle or a filename (it was
agreed that we would accept handles only).
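The check described above amounts to something like the following. This is a sketch, not the actual Bio.SeqIO source: `SeqRecord` here is a minimal stand-in so the example runs without Biopython, `_FORMAT_TO_WRITER` is a hypothetical registry, and the writer is a toy FASTA writer exposing the `write_file(records)` method discussed in this thread.

```python
import io


class SeqRecord:
    """Stand-in for Bio.SeqRecord.SeqRecord: just a sequence and an id."""

    def __init__(self, seq, id):
        self.seq = seq
        self.id = id


class FastaWriterSketch:
    """Toy writer with the write_file(records) method discussed above."""

    def __init__(self, handle):
        self.handle = handle

    def write_file(self, records):
        for record in records:
            self.handle.write(">%s\n%s\n" % (record.id, record.seq))


_FORMAT_TO_WRITER = {"fasta": FastaWriterSketch}  # hypothetical registry


def write(sequences, handle, format):
    """Write an iterable of SeqRecord objects to an already open handle."""
    if isinstance(sequences, SeqRecord):
        # Reject a lone record in one place, so individual writers never
        # need their own "helpful" special cases.
        raise ValueError("Use a SeqRecord list/iterator, not just a single SeqRecord")
    if format not in _FORMAT_TO_WRITER:
        raise ValueError("Unknown format %r" % format)
    _FORMAT_TO_WRITER[format](handle).write_file(sequences)


if __name__ == "__main__":
    out = io.StringIO()
    write([SeqRecord("ACGT", "seq1")], out, "fasta")  # a one-element list is fine
    print(out.getvalue(), end="")
```

A caller holding a single record simply wraps it in a list, which keeps every writer's `write_file` contract identical.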
> Ok, so the standard is very exact in what the LOCUS line should be. However,
> I've found that many programs do not write GenBank files exactly according to
> this standard! So we might want to make the GenBank parser a bit more forgiving
> of small changes in the spacing of the LOCUS line, especially since many
> programs leave out keywords.
Have you got some examples? I would be keen to add a few more test
cases for reasonable GenBank variations.
> As it stands now, the patch code can handle missing keywords in the LOCUS line.
If it doesn't already, the existing column-based code can easily do
this too.
> For example, the code defines a pair of dictionaries with lambda functions as
> their keys
>
> ...
>
> I know this looks crazy, but it works really well. Where else but Python can I
> have a dictionary / hash / whatever with the key being a function! :) Play
> around with the code and you'll see how it works.
Crazy code is scary ;) I'll try and have a play with this at the weekend.
Note that this issue (parsing the LOCUS line) is a bit tangential to
writing GenBank records.
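Howard's code is elided in the email, so the following is only a hypothetical reconstruction of the idea: a dictionary whose keys are predicate lambdas and whose values name the LOCUS field a matching word fills, so fields are recognised by content rather than column. It relies on dictionaries keeping insertion order (Python 3.7+), because the specific tests (e.g. molecule type "DNA") must run before the catch-all three-capital-letters division test. A list of (predicate, field) pairs would express the same thing without the unusual function keys.

```python
import re

# Predicate -> LOCUS field name. The first test a word passes wins, so
# "DNA" is classified as a molecule type before the generic division
# test (three capital letters) can claim it.
LOCUS_WORD_TESTS = {
    lambda w: w in ("linear", "circular"): "topology",
    lambda w: re.fullmatch(r"\d{2}-[A-Z]{3}-\d{4}", w) is not None: "date",
    lambda w: w.isdigit(): "size",
    lambda w: w in ("bp", "aa"): "units",
    lambda w: re.fullmatch(r"(ss-|ds-|ms-)?(NA|DNA|RNA|tRNA|rRNA|mRNA|uRNA)", w)
    is not None: "molecule_type",
    lambda w: len(w) == 3 and w.isalpha() and w.isupper(): "division",
}


def parse_locus_words(line):
    """Classify the words of a LOCUS line, tolerating missing or shifted
    fields. The first assignment to a field wins; unmatched words are skipped."""
    words = line.split()
    if not words or words[0] != "LOCUS":
        raise ValueError("Not a LOCUS line: %r" % line)
    fields = {}
    if len(words) > 1:
        fields["name"] = words[1]
    for word in words[2:]:
        for test, field in LOCUS_WORD_TESTS.items():
            if field not in fields and test(word):
                fields[field] = word
                break
    return fields
```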
>> It also looks like when you write the LOCUS line you are not following
>> the column based definition
>
> I'll fix this. The writing of the GenBank file should follow the standard
> exactly.
I agree completely. As a general principle we should be a little bit
flexible on reading files, but very strict on output.
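For the strict-output side, a sketch of writing the LOCUS line by column. The positions used (name in columns 13-28, length right-justified in 30-40, "bp" in 42-43, strandedness 45-47, molecule type 48-53, topology 56-63, division 65-67, date 69-79) are recalled from the GenBank release notes and should be checked against the current notes. The function name is hypothetical; it raises rather than letting a value overflow its column.

```python
def format_locus_line(name, length, mol_type, topology, division, date, strand=""):
    """Return a column-aligned GenBank LOCUS line (79 characters)."""
    if len(name) > 16:
        raise ValueError("Locus name %r is longer than 16 characters" % name)
    if len(str(length)) > 11:
        raise ValueError("Sequence length %r does not fit in 11 columns" % length)
    if strand not in ("", "ss-", "ds-", "ms-"):
        raise ValueError("Unexpected strandedness %r" % strand)
    if topology not in ("linear", "circular"):
        raise ValueError("Unexpected topology %r" % topology)
    if len(division) != 3 or len(date) != 11 or len(mol_type) > 6:
        raise ValueError("Division, date or molecule type has the wrong width")
    return (
        "LOCUS".ljust(12)  # columns 1-12
        + name.ljust(16)  # 13-28
        + " "  # 29
        + str(length).rjust(11)  # 30-40
        + " bp "  # 41-44
        + strand.ljust(3)  # 45-47
        + mol_type.ljust(6)  # 48-53
        + "  "  # 54-55
        + topology.ljust(8)  # 56-63
        + " "  # 64
        + division  # 65-67
        + " "  # 68
        + date  # 69-79
    )


if __name__ == "__main__":
    print(format_locus_line("SCU49845", 5028, "DNA", "linear", "PLN", "21-JUN-1999"))
```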
Regards,
Peter
From chris.lasher at gmail.com Sat May 19 16:21:03 2007
From: chris.lasher at gmail.com (Chris Lasher)
Date: Sat, 19 May 2007 16:21:03 -0400
Subject: [Biopython-dev] Subversion Repository
In-Reply-To: <464A07BB.8020206@maubp.freeserve.co.uk>
References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com>
<128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com>
<45F235B7.6000409@c2b2.columbia.edu>
<128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com>
<464A07BB.8020206@maubp.freeserve.co.uk>
Message-ID: <128a885f0705191321k32354ecdnafb9912443b9367f@mail.gmail.com>
On 5/15/07, Peter wrote:
> Did you get any information from the Open Bioinformatics Foundation guys
> about moving from CVS to subversion?
I didn't, with regards to public anonymous access to the Subversion
repositories. I'm also on impromptu leave until this upcoming Monday,
but we'll have this up and running by the end of the month.
Chris
From O.Doehring at cs.ucl.ac.uk Mon May 21 15:45:36 2007
From: O.Doehring at cs.ucl.ac.uk (O.Doehring at cs.ucl.ac.uk)
Date: 21 May 2007 20:45:36 +0100
Subject: [Biopython-dev] Biopython to parse not only .pdb-files but also
NACCESS .asa files
Message-ID:
Dear community,
I am applying the following tool: 'Naccess V2.1.1 - Atomic Solvent
Accessible Area Calculations' to calculate two features which are not
contained in standard .pdb-files: atomic accessibility and van der Waals
radius. The output is described in the readme file at
http://wolf.bi.umist.ac.uk/naccess/nac_readme.html under 'example output
files', and the standard PDB layout at
http://www.wwpdb.org/documentation/format23/sect9.html under 'Atom'.
NACCESS does the following: 'The output format is PDB, with B-factors and
occupancies removed, then atomic accessiblity in square Angstroms, followed
by the assigned van der Waal radius.' Note that occupancy gets replaced by
atomic accessibility and B-factor by the van der Waals radius. This 'new'
.pdb-file has the extension .asa.
I chose a quite straightforward approach: I wanted to use Biopython as
before, e.g. calling the B-factor method but getting the atomic
accessibility instead. But Biopython seems to type-check the .asa-file and
complains that the B-factor is not of type float.
Is there a way to access the data of .asa-files programmatically via the
Biopython library? The only alternative seems to be to write a parser for
.asa-files, figure out which atom in the .pdb-file corresponds to the
respective one in the .asa-file, and finally retrieve the wanted values for
atomic accessibility and van der Waals radius.
Here are some more technical details. As an example I chose the '1DHR'
protein:
------------------------------------------------------------------------------
from Bio.PDB.PDBParser import PDBParser

class compact:
    def __init__(self, structure_id="1DHR", indices=[0]):
        # which residues are part of the patch
        self.indices = indices
        # If PERMISSIVE is 1 (the default), exceptions are caught, but some
        # residues or atoms will be missing.
        # THESE EXCEPTIONS ARE DUE TO PROBLEMS IN THE PDB FILE!
        self.p = PDBParser(PERMISSIVE=1)
        # which protein to analyse
        self.structure_id = structure_id
        self.fileName = self.structure_id + '.asa'
        self.structure = self.p.get_structure(self.structure_id, self.fileName)
------------------------------------------------------------------------------
Error message:
Traceback (most recent call last):
  File "C:\Dokumente und Einstellungen\Renate Döhring\workspace\test\src\root\nested\compactness.py", line 249, in
    c = compact(indices=[0,1])
  File "C:\Dokumente und Einstellungen\Renate Döhring\workspace\test\src\root\nested\compactness.py", line 17, in __init__
    self.structure = self.p.get_structure(self.structure_id, self.fileName)
  File "C:\Python25\Lib\site-packages\Bio\PDB\PDBParser.py", line 65, in get_structure
    self._parse(file.readlines())
  File "C:\Python25\Lib\site-packages\Bio\PDB\PDBParser.py", line 85, in _parse
    self.trailer=self._parse_coordinates(coords_trailer)
  File "C:\Python25\Lib\site-packages\Bio\PDB\PDBParser.py", line 159, in _parse_coordinates
    bfactor=float(line[60:66])
ValueError: invalid literal for float(): 31 1.
------------------------------------------------------------------------------
I hope this question has not been discussed before, but the search
engine at http://search.open-bio.org/cgi-bin/mail-search.cgi does not work,
and I could not find anything useful via a Google search restricted to the
archive using the 'site:' operator.
What do you recommend for my situation? Many thanks!
Yours,
Orlando
From edschofield at gmail.com Tue May 22 12:57:49 2007
From: edschofield at gmail.com (Ed Schofield)
Date: Tue, 22 May 2007 17:57:49 +0100
Subject: [Biopython-dev] [BioPython] Biopython to parse not only
.pdb-files but also NACCESS .asa files
In-Reply-To:
References:
Message-ID: <1b5a37350705220957o24f6a436k89d60764729695da@mail.gmail.com>
On 21 May 2007 20:45:36 +0100, O.Doehring at cs.ucl.ac.uk
wrote:
>
> ValueError: invalid literal for float(): 31 1.
>
> ...
>
> What do you recommend for my situation. Many thanks!
Is that a space between 31 and 1? There's your problem. My advice is to insert
import pdb
pdb.set_trace()
at line 159 in PDBParser.py and check out why the columns in your data
are misaligned with what PDBParser.py expects. A quick scan of
nac_readme.html implies that perhaps you need the -f argument to give
you the full output format?
But if you need to write your own parser for .asa files, you could use
_parse_coordinates(self, coords_trailer) as a template.
-- Ed
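A minimal sketch of such a parser, under stated assumptions: each .asa line keeps the standard PDB ATOM/HETATM layout up to the end of the z coordinate (column 54), and the accessibility and radius that NACCESS appends after it are separated by whitespace. Reading those two values by splitting rather than by fixed columns sidesteps the kind of misalignment that broke `float(line[60:66])`, but would fail if a very large accessibility ran into the radius. The function names are hypothetical and not part of Bio.PDB; the (chain, residue number, insertion code, atom name) key is chosen so values can be matched against atoms from a structure parsed from the original .pdb file.

```python
def parse_asa_line(line):
    """Parse one ATOM/HETATM line of a NACCESS .asa file.

    Returns ((chain, resseq, icode, atom_name), accessibility, radius).
    Assumes PDB columns up to the end of the z coordinate (column 54)
    and two whitespace-separated numbers after it.
    """
    atom_name = line[12:16].strip()
    chain = line[21]
    resseq = int(line[22:26])
    icode = line[26]
    accessibility, radius = (float(value) for value in line[54:].split()[:2])
    return (chain, resseq, icode, atom_name), accessibility, radius


def parse_asa(lines):
    """Map (chain, resseq, icode, atom name) -> (accessibility, radius)."""
    data = {}
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            key, accessibility, radius = parse_asa_line(line)
            data[key] = (accessibility, radius)
    return data
```

Typical use would be `with open("1DHR.asa") as handle: values = parse_asa(handle)`, leaving PDBParser to read the unmodified .pdb file.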
From dalke at dalkescientific.com Sat May 26 06:10:21 2007
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat, 26 May 2007 12:10:21 +0200
Subject: [Biopython-dev] [Biopython-announce] is this supposed to be
really slow?
In-Reply-To:
References:
<20070525233151.GA4507@caltech.edu>
Message-ID:
(Moving this from the -announce list to the -dev list.)
Bryan Smith, replying to Titus Brown wrote:
> i did see this constraint for only one request per 3 seconds, but did
> not realize each time i went through my loop that this was a separate
> request.
> is there anything to do about this constraint?
In your search_for() call, add delay=0.
def search_for(search, reldate=None, mindate=None, maxdate=None,
               batchsize=100, delay=2, callback_fn=None,
               start_id=0, max_ids=None):
    """search_for(search[, reldate][, mindate][, maxdate]
    [, batchsize][, delay][, callback_fn][, start_id][, max_ids]) -> ids

    Search PubMed and return a list of the PMID's that match the
    criteria. search is the search string used to search the
    database. reldate is the number of days prior to the current
    date to restrict the search. mindate and maxdate are the dates to
    restrict the search, e.g. 2002/01/01. batchsize specifies the
    number of ids to return at one time. By default, it is set to
    10000, the maximum. delay is the number of seconds to wait
    between queries (default 2). callback_fn is an optional callback
    function that will be called and passed a PMID as results are
    retrieved. start_id specifies the index of the first id to
    retrieve and max_ids specifies the maximum number of id's to
    retrieve.
    """
In your Dictionary creation, also add delay=0.
class Dictionary:
    def __init__(self, delay=5.0, parser=None):
        """Dictionary(delay=5.0, parser=None)

        Create a new Dictionary to access PubMed. parser is an optional
        parser (e.g. Medline.RecordParser) object to change the results
        into another form. If set to None, then the raw contents of the
        file will be returned. delay is the number of seconds to wait
        between each query.
        """
>> I personally tend to just use the NCBI retrieval URLs directly, but
>> that's kind of ugly.
NCBI also watches those requests, and if you do too many
you might get a warning or be blocked off, or so rumor has it.
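Rather than setting delay=0, you can keep your own loop under the one-request-per-3-seconds guideline with a small throttle that sleeps only for whatever part of the interval has not already elapsed (time spent parsing between requests counts toward it). A generic sketch, not Bio.PubMed code: the `Throttle` name is hypothetical, and the clock and sleep functions are parameters only so the behaviour can be checked without real waiting.

```python
import time


class Throttle:
    """Make successive wait() calls return at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous permitted request

    def wait(self):
        now = self.clock()
        if self._last is not None:
            remaining = self._last + self.interval - now
            if remaining > 0:
                self.sleep(remaining)
                now += remaining
        self._last = now
```

Create one `Throttle(3.0)` before the loop and call its `wait()` immediately before each request.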
BTW, in your original code you can simplify
> for idx in range( len( termIds ) ):
>     pubDates[idx] = string.atoi( medlineDict[ termIds[ idx ] ].publication_date[ 0:4 ] )
>     idx = idx + 1
to
for idx, termId in enumerate(termIds):
    pubDates[idx] = int(medlineDict[termId].publication_date[:4])
Andrew
dalke at dalkescientific.com
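Both loops can be checked side by side with stand-in data. `Record` is a hypothetical substitute for the parsed Medline records, carrying only the `publication_date` attribute the loop reads; the ids are placeholders, and `pubDates` is assumed to be a preallocated list. Note that `idx = idx + 1` in the original is a no-op, since the `for` statement rebinds `idx` on every pass, and that `string.atoi` no longer exists in Python 3, so `int` is used in both forms.

```python
from collections import namedtuple

# Hypothetical stand-in for a parsed Medline record.
Record = namedtuple("Record", ["publication_date"])

termIds = ["111", "222", "333"]  # placeholder ids, not real PMIDs
medlineDict = {
    "111": Record("1999 Jan"),
    "222": Record("2003 Mar 15"),
    "333": Record("2007"),
}

# Original form (int() in place of Python 2's string.atoi).
pubDates_old = [None] * len(termIds)
for idx in range(len(termIds)):
    pubDates_old[idx] = int(medlineDict[termIds[idx]].publication_date[0:4])
    idx = idx + 1  # no effect: range() supplies the next idx anyway

# Simplified form from the reply.
pubDates = [None] * len(termIds)
for idx, termId in enumerate(termIds):
    pubDates[idx] = int(medlineDict[termId].publication_date[:4])

# Without preallocation, a list comprehension does the same job.
pubDates_new = [int(medlineDict[t].publication_date[:4]) for t in termIds]

print(pubDates)  # [1999, 2003, 2007]
```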
From chris.lasher at gmail.com Thu May 31 00:30:38 2007
From: chris.lasher at gmail.com (Chris Lasher)
Date: Thu, 31 May 2007 00:30:38 -0400
Subject: [Biopython-dev] Subversion Repository
In-Reply-To: <128a885f0705191321k32354ecdnafb9912443b9367f@mail.gmail.com>
References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com>
<128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com>
<45F235B7.6000409@c2b2.columbia.edu>
<128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com>
<464A07BB.8020206@maubp.freeserve.co.uk>
<128a885f0705191321k32354ecdnafb9912443b9367f@mail.gmail.com>
Message-ID: <128a885f0705302130t628794e7v681dc02058244913@mail.gmail.com>
On 5/19/07, Chris Lasher wrote:
> On 5/15/07, Peter wrote:
> > Did you get any information from the Open Bioinformatics Foundation guys
> > about moving from CVS to subversion?
>
> I didn't, with regards to public anonymous access to the Subversion
> repositories. I'm also on impromptu leave until this upcoming Monday,
> but we'll have this up and running by the end of the month.
>
> Chris
>
I'm obviously missing another target, and BOSC 2007 is fast
approaching. I'm being held up by 4 files that are in the CVS
repository that were foolishly committed with carriage returns (i.e.,
"\r") in the filenames. How that's possible, I have no clue, but I
need to alter the data in the CVS repository so those filenames are
correct, or otherwise completely removed, over the entire history of
those files. Does anyone have any experience with the internals of CVS
repositories? I definitely do not.
Chris
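To at least locate the offending files, a short script run over a copy of the repository on disk may help. It assumes direct filesystem access to the CVS repository, where each versioned file is stored as an RCS file ending in `,v` (files removed from the main line sit in `Attic/` subdirectories, which `os.walk` also visits). It only lists names containing a carriage return; renaming or purging them is left to whoever does the migration, ideally on a copy rather than the live repository. The helper name is hypothetical.

```python
import os


def names_with_carriage_returns(repo_root):
    """Return paths (relative to repo_root) of files or directories whose
    names contain a carriage return, walking the whole tree."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        for name in dirnames + filenames:
            if "\r" in name:
                hits.append(os.path.relpath(os.path.join(dirpath, name), repo_root))
    return sorted(hits)
```

Printing each result with `repr()` makes the embedded `\r` visible.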
From biopython-dev at maubp.freeserve.co.uk Thu May 31 05:07:59 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Thu, 31 May 2007 10:07:59 +0100
Subject: [Biopython-dev] Subversion Repository
In-Reply-To: <128a885f0705302130t628794e7v681dc02058244913@mail.gmail.com>
References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <464A07BB.8020206@maubp.freeserve.co.uk> <128a885f0705191321k32354ecdnafb9912443b9367f@mail.gmail.com>
<128a885f0705302130t628794e7v681dc02058244913@mail.gmail.com>
Message-ID: <465E906F.1080704@maubp.freeserve.co.uk>
Chris Lasher wrote:
> On 5/19/07, Chris Lasher wrote:
>> On 5/15/07, Peter wrote:
>>> Did you get any information from the Open Bioinformatics Foundation guys
>>> about moving from CVS to subversion?
>> I didn't, with regards to public anonymous access to the Subversion
>> repositories. I'm also on impromptu leave until this upcoming Monday,
>> but we'll have this up and running by the end of the month.
>>
>> Chris
>>
>
> I'm obviously missing another target, and BOSC 2007 is fast
> approaching.
Are you going to BOSC 2007, Chris?
> I'm being held up by 4 files that are in the CVS
> repository that were foolishly committed with carriage returns (i.e.,
> "\r") in the filenames. How that's possible, I have no clue, but I
> need to alter the data in the CVS repository so those filenames are
> correct, or otherwise completely removed, over the entire history of
> those files. Does anyone have any experience with the internals of CVS
> repositories? I definitely do not.
How strange! I have no experience with the internals of CVS so can't
help you there. What are the four offending files? Maybe we could just
purge them for the move to SVN.
Also, I suspect (but have not checked this) that a few of the example
files in the unit tests have been checked in as binary files rather than
text (due to some odd differences in newlines across platforms). Again,
a CVS expert would probably be able to generate a list of all "binary"
files in the repository fairly easily.
Peter
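One way to get that list, assuming filesystem access to the repository: a file marked binary (`cvs add -kb` or `cvs admin -kb`) records this in the admin header of its `,v` file as an `expand @b@;` entry, so scanning each header should find them; `cvs log` on a checkout also reports each file's "keyword substitution" mode if you prefer to stay within CVS. The helper name and the 4 KB header limit are illustrative choices (a file with very many tags could push the entry further down).

```python
import os
import re

# In the RCS admin section at the top of a ,v file, binary files carry an
# "expand @b@;" entry; the whitespace between the two words may vary.
_BINARY_EXPAND = re.compile(rb"^expand\s+@b@;", re.MULTILINE)


def binary_rcs_files(repo_root, header_bytes=4096):
    """Return relative paths of ,v files whose header marks them as binary.

    Only the first header_bytes of each file are read, since the admin
    section comes before any revision text.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        for name in filenames:
            if not name.endswith(",v"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                header = fh.read(header_bytes)
            if _BINARY_EXPAND.search(header):
                hits.append(os.path.relpath(path, repo_root))
    return sorted(hits)
```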
From bugzilla-daemon at portal.open-bio.org Thu May 31 09:14:31 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 31 May 2007 09:14:31 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705311314.l4VDEV2X031189@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
------- Comment #17 from mdehoon at ims.u-tokyo.ac.jp 2007-05-31 09:14 EST -------
Created an attachment (id=661)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=661&action=view)
Updated version of Bio/Cluster/cluster.c
From bugzilla-daemon at portal.open-bio.org Thu May 31 09:15:17 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 31 May 2007 09:15:17 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705311315.l4VDFH6D031294@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
------- Comment #18 from mdehoon at ims.u-tokyo.ac.jp 2007-05-31 09:15 EST -------
Created an attachment (id=662)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=662&action=view)
Updated version of Bio/Cluster/clustermodule.c
From bugzilla-daemon at portal.open-bio.org Thu May 31 09:17:31 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 31 May 2007 09:17:31 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705311317.l4VDHVB7031418@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
------- Comment #19 from mdehoon at ims.u-tokyo.ac.jp 2007-05-31 09:17 EST -------
Could you try with the updated Bio/Cluster/cluster.c,
Bio/Cluster/clustermodule.c (see attachments)? These should solve the problems
with the Cluster unit test. If they work fine, I'll upload them to CVS.
From jfeala at gmail.com Thu May 31 19:52:36 2007
From: jfeala at gmail.com (Jake Feala)
Date: Thu, 31 May 2007 16:52:36 -0700
Subject: [Biopython-dev] interaction networks in biopython
In-Reply-To:
References:
<12c863fe0705161025p46b1ff6v8c6b1e5999b29244@mail.gmail.com>
Message-ID: <12c863fe0705311652w44074269y2256aa127b90843b@mail.gmail.com>
Hi Everybody -
I've been thinking about the possible structure of a "BioNet" package,
and here is what I think would be most useful:
InteractionRecord.py - a storage object for biological interactions,
mirroring information stored in the PSI-MI (Proteomics Standards
Initiative - Molecular Interaction) XML standard - unless someone
knows a better one.
Network.py - network object inheriting a NetworkX graph class with
additional methods for manipulating an InteractionRecord stored with
each edge
InteractionIO - a submodule with parsers to read and write
interactions to/from Cytoscape, PSI-MI, and other formats or online
interaction databases
BioNetSQL - a submodule for storing and querying a local SQL
database of interactions
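The proposed split between an interaction record and a graph that carries one record per edge can be sketched without any dependencies. Below is a minimal sketch. The field names are hypothetical and only loosely modelled on PSI-MI concepts. The plain dict-of-dicts adjacency stands in for the NetworkX graph class Jake proposes to inherit from. With NetworkX, the record would instead be attached to each edge as edge data.

```python
from dataclasses import dataclass, field


@dataclass
class InteractionRecord:
    """Hypothetical minimal interaction record (not the PSI-MI schema itself)."""
    interactor_a: str
    interactor_b: str
    interaction_type: str = "unknown"
    detection_method: str = "unknown"
    pubmed_ids: list = field(default_factory=list)


class Network:
    """Undirected graph storing one InteractionRecord per edge."""

    def __init__(self):
        self._adj = {}  # node -> {neighbour: InteractionRecord}

    def add_interaction(self, record):
        a, b = record.interactor_a, record.interactor_b
        # Store the same record under both endpoints (undirected edge).
        self._adj.setdefault(a, {})[b] = record
        self._adj.setdefault(b, {})[a] = record

    def neighbours(self, node):
        return sorted(self._adj.get(node, {}))

    def record(self, a, b):
        return self._adj[a][b]

    def __len__(self):
        # Number of nodes.
        return len(self._adj)
```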
I've started on the code, including parsers for Cytoscape, PSI-MI XML
files, and GRID flat files. I haven't fixed up my SQL scripts yet
because I want to rethink the database design. All the code is
available at http://cmrg.ucsd.edu/JakeFeala#software
Here is an example that worked fine for me:
from Network import *
f = open(grid_file)  # grid_file: path to a GRID flat file (not named in the original)
parser = GRIDIterator(f)
net = create_network()
net.load(parser)
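The example follows an iterator pattern: a parser yields one interaction at a time, and `load` consumes it. Below is a self-contained sketch of that pattern. The column layout is an assumption, not the documented GRID format: one interaction per tab-separated line, interactor names in two configurable columns, and `#` lines treated as comments. Both function names are hypothetical.

```python
def iterate_tab_interactions(handle, col_a=0, col_b=1):
    """Yield (interactor_a, interactor_b, all_fields) from a tab-delimited file.

    Assumed layout (hypothetical): one interaction per line, interactor
    names in columns col_a and col_b, '#' lines are comments.
    """
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        yield fields[col_a], fields[col_b], fields


def load(network, interactions):
    """Add every (a, b, fields) tuple from an iterator to a dict-of-sets network."""
    for a, b, _fields in interactions:
        network.setdefault(a, set()).add(b)
        network.setdefault(b, set()).add(a)
    return network
```

Because the parser is a generator, a large interaction file is streamed into the network rather than read into memory first.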
Are there any suggestions regarding (1) the standard for
InteractionRecord, (2) methods for the Network object, (3) structure
of the SQL database, (4) overall structure of the package? Also, does
anyone want to contribute to any specific part (e.g. Yair can add his
HPRD parser)?
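As one possible starting point for question (3), here is a minimal two-table layout: one row per molecule, and one row per pairwise interaction with provenance. This is a sketch, not a proposed BioNetSQL schema. It uses the standard-library `sqlite3` module so that it is self-contained, whereas Jake's scripts use MySQL. All table, column and function names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE molecule (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE interaction (
    id         INTEGER PRIMARY KEY,
    molecule_a INTEGER NOT NULL REFERENCES molecule(id),
    molecule_b INTEGER NOT NULL REFERENCES molecule(id),
    method     TEXT,
    source_db  TEXT
);
CREATE INDEX interaction_a ON interaction(molecule_a);
CREATE INDEX interaction_b ON interaction(molecule_b);
"""


def molecule_id(conn, name):
    """Return the id for name, inserting the molecule if it is new."""
    conn.execute("INSERT OR IGNORE INTO molecule (name) VALUES (?)", (name,))
    return conn.execute("SELECT id FROM molecule WHERE name = ?", (name,)).fetchone()[0]


def add_interaction(conn, a, b, method=None, source_db=None):
    conn.execute(
        "INSERT INTO interaction (molecule_a, molecule_b, method, source_db) VALUES (?, ?, ?, ?)",
        (molecule_id(conn, a), molecule_id(conn, b), method, source_db),
    )


def partners(conn, name):
    """Names recorded as interacting with name, whichever column it appears in."""
    rows = conn.execute(
        """
        SELECT m2.name FROM interaction i
        JOIN molecule m1 ON m1.id = i.molecule_a
        JOIN molecule m2 ON m2.id = i.molecule_b
        WHERE m1.name = ?
        UNION
        SELECT m1.name FROM interaction i
        JOIN molecule m1 ON m1.id = i.molecule_a
        JOIN molecule m2 ON m2.id = i.molecule_b
        WHERE m2.name = ?
        ORDER BY 1
        """,
        (name, name),
    )
    return [row[0] for row in rows]
```

Storing each interaction once and querying both columns avoids duplicating undirected edges. The two indexes keep the neighbour lookup cheap from either side.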
Thanks!
-Jake
On 5/16/07, Jason A. Hackney wrote:
> Hi All,
>
> I'm also interested in an interaction network class for biopython. I'm
> willing to contribute to the effort with either code review or testing.
>
> Cheers,
>
> Jason
>
>
>
> Jason A. Hackney
>
> Postdoctoral Fellow
> Department of Microbiology and Immunology
> Stanford University
>
> e-mail: jhackney at stanford.edu
> lab phone: 650-724-3891
> mobile: 650-283-6907
>
>
>
>
>
> On May 16, 2007, at 10:25 AM, Jake Feala wrote:
>
> Thanks Ed and Yair, I'm really glad there's some interest in this!
> I'll get started on dusting off my code and adding more documentation.
>
> Steve - great suggestion. I had already looked at NetworkX and was
> already thinking about switching over to this as the back-end graph
> representation. Are there any issues that I should think about when
> creating these extra dependencies?
>
> Also, what is the next step in this process? Should we agree on an
> API and class hierarchy before we start dumping code on each other?
> Which aspects can we make compatible with other Biopython objects? (I
> was thinking maybe parsers for the interaction datasets and the SQL
> interface)
>
> -Jake
>
>
> On 5/15/07, Steve Lianoglou wrote:
> Hi,
>
> On May 15, 2007, at 3:25 PM, Yair Benita wrote:
>
>
> I would be happy to contribute to this too.
> Currently I have a python script that uses HPRD to generate protein-protein
> interaction maps. I have different filtering methods to display only classes
> of proteins or only links to a specific KEGG pathway. It will need a bit of
> work before I can submit this to CVS. As for drawing the map, I am currently
> generating a dot file that can be converted to an image using Graphviz. If
> anyone wants to suggest anything else, please do.
>
> I've been using NetworkX[1] to play w/ networks/graphs interactively.
> You can display them if you have matplotlib installed, and can save
> the graphs to dot format as well.
>
> -steve
>
> [1] NetworkX: https://networkx.lanl.gov/wiki
>
>
>
> Yair
>
>
> on 5/15/07 2:37 PM, Ed Schofield at edschofield at gmail.com wrote:
>
>
> On 5/15/07, Jake Feala wrote:
> Hello Biopython people -
>
> With all the new research in genome-wide cellular interaction
> networks, I was a little surprised not to see much support for this
> type of data in Biopython. I know that Bioperl has a networks package
> that looks like the kind of thing that I would love to also see in
> Python for all the obvious reasons.
>
> First - has this already been done and I missed it? All I could find
> were a few scattered and application-specific scripts across the web,
> plus the Pathway package in BioPython.
>
> If not, then would there be any interest in development along these
> lines? A while back I wrote a few scripts that parse interaction
> datasets, stick them into a MySQL database, and retrieve the
> interactions into a Network object that can be used to analyze the
> graph of nodes and links. I would be glad to update these to fit into
> the biopython framework, as it would be useful to my own research.
>
> One caveat is that I am an engineering PhD student and my programming
> skills are mostly self-taught beyond two Java courses, so I might need
> a little guidance in testing and preparing the code for distribution.
> I have only ever written code for my own personal research, but I think
> my style is decent and I would love to get better.
>
> Any opinion or advice?
>
> This would interest me too; I'd be glad to have such functionality in
> BioPython. I can offer you some guidance on Python, packaging and
> testing, and (if you need it) use of external array packages.
>
> -- Ed
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>
From bugzilla-daemon at portal.open-bio.org Tue May 1 18:31:06 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 May 2007 14:31:06 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705011831.l41IV6ZU000918@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
------- Comment #9 from chris.lasher at gmail.com 2007-05-01 14:31 EST -------
I'd definitely be willing to run any tests. Just to note, I am not the one who
discovered this bug, I was only the one who filed it. Credit for discovering it
goes to Alex Lancaster who sent in notification of this on April 11th to the
BioPython mailing list (see ). That was on a Fedora
Core installation, so this is not just specific to 32-bit Ubuntu.
Could this involve the source of the Numeric and mxTextTools packages? I
installed Numeric Python and eGenix mxTextTools from the Ubuntu distribution
packages, rather than from direct sources for both software packages. I can't
see why this would make a difference but it is something to consider. Also,
there's a possibility that I don't have all the required software, but I did
not get any warnings when installing from CVS.
From bugzilla-daemon at portal.open-bio.org Tue May 1 18:48:35 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 May 2007 14:48:35 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705011848.l41ImZOB001726@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-01 14:48 EST -------
Checking the version of Numeric may be worthwhile - I recall from the MMTK
mailing list that some versions appeared to cause subtle bugs. In late 2005
Konrad Hinsen was suggesting MMTK users downgrade from version 24 to version
23, but I don't know if he ever pinned down what the problem was (or indeed, if
there really was a problem).
From bugzilla-daemon at portal.open-bio.org Tue May 1 19:06:30 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 May 2007 15:06:30 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705011906.l41J6UED002525@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
------- Comment #11 from chris.lasher at gmail.com 2007-05-01 15:06 EST -------
(In reply to comment #10)
> Checking the version of Numeric may be worthwhile - I recall from the MMTK
> mailing list that some versions appeared to cause subtle bugs. In late 2005
> Konrad Hinsen was suggesting MMTK users downgrade from version 24 to version
> 23, but I don't know if he ever pinned down what the problem was (or indeed, if
> there really was a problem).
>
On Dapper Drake, Edgy Eft and Feisty Fawn, the Numeric packages are 24.2.
From bugzilla-daemon at portal.open-bio.org Tue May 1 19:50:47 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 May 2007 15:50:47 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705011950.l41JolgE004634@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-01 15:50 EST -------
For reference, on my 64bit Ubuntu Dapper Drake system (where test_Cluster.py
works) I have the following packages installed:
python 2.4.2-0ubuntu3
python-reportlab 1.20debian-3ubuntu1
python-numeric 24.2-1ubuntu2
python-egenix-mxtexttools 2.0.6ubuntu1-1ubuntu4
i.e. Numeric 24.2 does work with test_Cluster.py for me.
From bugzilla-daemon at portal.open-bio.org Wed May 2 18:44:01 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 May 2007 14:44:01 -0400
Subject: [Biopython-dev] [Bug 2285] Creating Bio.AlignIO to cope with
alignments like Bio.SeqIO does sequences
In-Reply-To:
Message-ID: <200705021844.l42Ii154024905@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2285
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-02 14:44 EST -------
Created an attachment (id=643)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=643&action=view)
ZIP file containing four python scripts to go in Bio/AlignIO/*.py
There is a follow-up patch to Bio/SeqIO/__init__.py to use
Bio.AlignIO for reading/writing clustal, stockholm and phylip instead. The
corresponding parsers under Bio/SeqIO/*.py would then be removed.
I have not yet worked out what a Nexus file looks like when it holds more than
one alignment (if in fact this is possible).
From bugzilla-daemon at portal.open-bio.org Fri May 4 09:20:31 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 May 2007 05:20:31 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705040920.l449KVW3015656@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
------- Comment #13 from mdehoon at ims.u-tokyo.ac.jp 2007-05-04 05:20 EST -------
Chris,
I found one Linux system on which test_Cluster.py hangs in the call to kmedoids
instead of the call to kcluster. It turned out that this was due to a
floating-point comparison in the kmedoids function. Since the same comparison
occurs in the kcluster function, this may very well be the reason
test_Cluster.py hangs on your platform in the call to kcluster. The comparison
involves two floating-point variables which are bit-wise identical to each
other. However, variable1 <= variable2 returns False.
Could you have a look at line 2071 in Bio/Cluster/cluster.c (Biopython release
1.43) and print out the two variables "total" and "previous"? (You may find
that test_Cluster.py no longer hangs when you add the printf statement; at
least that is what happened with the call to kmedoids). If total and previous
have the same value, but total>=previous returns False, then that would explain
why the call to kcluster hangs.
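The report above is consistent with a known issue on 32-bit x86: x87 registers hold extra precision, so a value kept in a register can differ from its stored copy. Adding a `printf` can force a store to memory, which would explain why the hang sometimes disappears. This explanation is offered as a hedged guess, not a diagnosis from the thread. The sketch below shows the general shape of a convergence test that does not depend on an exact `>=`. Python floats are plain doubles, so it cannot reproduce the bug. `stopped_improving` and `rel_tol` are hypothetical names, not code from cluster.c.

```python
import math


def stopped_improving(total, previous, rel_tol=1e-12):
    """Return True once total is no longer meaningfully smaller than previous.

    Values within rel_tol of each other count as equal, so two numbers that
    should compare equal cannot keep a loop running because of a tiny
    difference in how they were computed or stored.
    """
    return total > previous or math.isclose(total, previous, rel_tol=rel_tol)
```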
From biopython-dev at maubp.freeserve.co.uk Mon May 7 14:16:38 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Mon, 07 May 2007 15:16:38 +0100
Subject: [Biopython-dev] Unified alignment input/output, Bio.AlignIO?
In-Reply-To: <463240F9.8010907@maubp.freeserve.co.uk>
References: <463240F9.8010907@maubp.freeserve.co.uk>
Message-ID: <463F34C6.90008@maubp.freeserve.co.uk>
Peter wrote:
> Following the release of Biopython 1.43 with Bio.SeqIO, I would like to
> do a better job for multiple sequence alignment file formats - creating
> a new module Bio.AlignIO
>
> While most multiple sequence alignment files usually contain a single
> alignment (made up of multiple sequences), this is not the general case.
>
> In the PHYLIP suite, concatenated alignments in phylip format are
> produced by the seqboot program for tasks like bootstrapping of a
> phylogenetic tree. Currently SeqIO chokes on these!
>
> Another example is the output of some of the EMBOSS programs, which can
> contain many multiple sequence alignments; for example, the water and
> needle tools can produce many pairwise alignments.
>
> In such cases, being able to write code like the following seems to be
> the logical extension of the Bio.SeqIO style we have agreed on:
>
> from Bio import AlignIO
> for alignment in AlignIO.parse("many.phy", "phylip") :
>     print "Alignment with %i sequences of length %i" \
>           % (len(alignment.get_all_seqs()),
>              alignment.get_alignment_length())
>     ...
>
> i.e. The AlignIO.parse() function would be an iterator returning
> alignment objects. Does this sound reasonable so far?
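The "iterator returning alignment objects" idea can be illustrated on the concatenated PHYLIP files that seqboot produces. Below is a minimal generator sketch. It is not the Bio.AlignIO parser, and it returns plain lists of `(name, sequence)` pairs rather than alignment objects. It assumes the simplest layout: sequential (non-interleaved) data, each sequence on one line, the name in the first 10 characters, and blank lines allowed between alignments. The function name is hypothetical, and the code targets Python 3.

```python
def parse_concatenated_phylip(handle):
    """Yield each alignment in a concatenated PHYLIP file as a list of (name, sequence)."""
    lines = (line.rstrip("\r\n") for line in handle)
    for header in lines:
        if not header.strip():
            continue  # blank line between alignments
        n_seqs, length = (int(x) for x in header.split()[:2])
        alignment = []
        while len(alignment) < n_seqs:
            line = next(lines, None)
            if line is None:
                raise ValueError("file ended in the middle of an alignment")
            if not line.strip():
                continue
            name, seq = line[:10].strip(), line[10:].replace(" ", "")
            if len(seq) != length:
                raise ValueError("%s: expected %i columns, got %i" % (name, length, len(seq)))
            alignment.append((name, seq))
        yield alignment
```

Because each alignment is yielded as soon as it is complete, a seqboot file with hundreds of replicates is processed one alignment at a time.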
I have pressed ahead with this; there is a version attached to bug 2285:
http://bugzilla.open-bio.org/show_bug.cgi?id=2285
This handles reading and writing of clustal, phylip, stockholm/pfam. I
have not yet converted the Bio.SeqIO Nexus parser. Also, I plan to add a
parser for reading the EMBOSS alignment format.
As a side effect, this will actually remove a lot of the Bio.SeqIO code
as handling any alignment file can be delegated to Bio.AlignIO instead.
Would anyone like to comment on the scheme?
Peter
From bugzilla-daemon at portal.open-bio.org Mon May 7 17:45:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 May 2007 13:45:32 -0400
Subject: [Biopython-dev] [Bug 2285] Creating Bio.AlignIO to cope with
alignments like Bio.SeqIO does sequences
In-Reply-To:
Message-ID: <200705071745.l47HjWGl031779@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2285
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #643 is|0 |1
obsolete| |
AssignedTo|biopython-dev at biopython.org |biopython-
| |bugzilla at maubp.freeserve.co.
| |uk
Status|NEW |ASSIGNED
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-07 13:45 EST -------
Created an attachment (id=646)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=646&action=view)
ZIP file containing four python scripts to go in Bio/AlignIO/*.py
Misc updates to previous version
From bugzilla-daemon at portal.open-bio.org Mon May 7 19:42:15 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 May 2007 15:42:15 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705071942.l47JgFi3004609@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
------- Comment #14 from chris.lasher at gmail.com 2007-05-07 15:42 EST -------
Created an attachment (id=648)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=648&action=view)
modified_Cluster.c_output.txt
This is the output from cluster.c, modified with a printf statement just before
line 2071 that prints total and previous.
From bugzilla-daemon at portal.open-bio.org Mon May 7 19:56:27 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 May 2007 15:56:27 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705071956.l47JuRcu005295@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
------- Comment #15 from chris.lasher at gmail.com 2007-05-07 15:56 EST -------
(In reply to comment #13)
> Chris,
>
> Could you have a look at line 2071 in Bio/Cluster/cluster.c (Biopython release
> 1.43) and print out the two variables "total" and "previous"? (You may find
> that test_Cluster.py no longer hangs when you add the printf statement; at
> least that is what happened with the call to kmedoids). If total and previous
> have the same value, but total>=previous returns False, then that would explain
> why the call to kcluster hangs.
>
This did allow it to proceed up to test_distancematrix_kmedoids; however, it
once again reaches an infinite loop in this test. Additionally, the value for
"previous" reaches an enormous number, and I suspect it's not supposed to. (See
the attached output.)
From bugzilla-daemon at portal.open-bio.org Mon May 7 23:06:19 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 May 2007 19:06:19 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705072306.l47N6JR3012976@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
------- Comment #16 from mdehoon at ims.u-tokyo.ac.jp 2007-05-07 19:06 EST -------
Thanks, Chris!
Actually, this looks OK.
The kcluster routine runs the k-means algorithm 100 times starting from random
initial clusterings. On each run, total is initialized to DBL_MAX (the largest
number representable as a double). This is the huge number that is printed
(printf often has trouble printing DBL_MAX nicely, so it may appear weird in
the output).
The same floating-point comparison that causes kcluster to hang also appears in
kmedoids, so it's no surprise that the code hangs there too.
I'll write a patch that avoids this floating-point comparison and post it here.
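One way to make a k-means run terminate without any floating-point comparison is to stop when a cluster assignment repeats. A repeat covers both genuine convergence and the periodic back-and-forth case. The toy sketch below applies that idea to 1-D data inside a best-of-`npass` restart loop, with `sys.float_info.max` playing the role of DBL_MAX. It is an illustration of the idea, not the actual patch to cluster.c, and all names are hypothetical.

```python
import random
import sys


def _centres(data, assign, k):
    """Mean of each cluster; an empty cluster gets +inf so nothing is assigned to it."""
    sums, counts = [0.0] * k, [0] * k
    for x, a in zip(data, assign):
        sums[a] += x
        counts[a] += 1
    return [s / n if n else float("inf") for s, n in zip(sums, counts)]


def kmeans_1d(data, k, npass=100, max_iter=1000, seed=None):
    """Best of npass k-means runs on 1-D data; returns (assignment, error)."""
    if len(data) < k:
        raise ValueError("need at least k data points")
    rng = random.Random(seed)
    best_assign, best_error = None, sys.float_info.max  # analogous to DBL_MAX
    for _pass in range(npass):
        # Random initial clustering in which every cluster has at least one member.
        assign = [i % k for i in range(len(data))]
        rng.shuffle(assign)
        seen = set()
        for _step in range(max_iter):
            state = tuple(assign)
            if state in seen:
                break  # assignment repeated: converged, or stuck in a cycle
            seen.add(state)
            centres = _centres(data, assign, k)
            assign = [min(range(k), key=lambda c: (x - centres[c]) ** 2) for x in data]
        centres = _centres(data, assign, k)
        error = sum((x - centres[a]) ** 2 for x, a in zip(data, assign))
        if error < best_error:
            best_assign, best_error = list(assign), error
    return best_assign, best_error
```

Each run still has an iteration cap, and the only floating-point comparison left (`error < best_error`) only selects the best run. It never decides whether a loop continues.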
From bugzilla-daemon at portal.open-bio.org Wed May 9 13:48:11 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 May 2007 09:48:11 -0400
Subject: [Biopython-dev] [Bug 2289] New: LOCUS ss-cRNA => ERROR
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2289
Summary: LOCUS ss-cRNA => ERROR
Product: Biopython
Version: 1.24
Platform: PC
OS/Version: Windows XP
Status: NEW
Severity: blocker
Priority: P1
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: Daniel.Nicorici at gmail.com
When I am processing a GenBank file from NCBI I get this error:
=======================================================================
Traceback (most recent call last):
  File "F:\silvermine\tool\populator\ncbigenomic\source\python\do.py", line 26, in
    record = iterator.next()
  File "D:\Python25\lib\site-packages\Bio\GenBank\__init__.py", line 142, in next
    return self._parser.parse(self.handle)
  File "D:\Python25\lib\site-packages\Bio\GenBank\__init__.py", line 208, in parse
    self._scanner.feed(handle, self._consumer)
  File "D:\Python25\lib\site-packages\Bio\GenBank\Scanner.py", line 360, in feed
    self._feed_first_line(consumer, self.line)
  File "D:\Python25\lib\site-packages\Bio\GenBank\Scanner.py", line 782, in _feed_first_line
    'LOCUS line does not contain valid sequence type (DNA, RNA, ...):\n' + line
AssertionError: LOCUS line does not contain valid sequence type (DNA, RNA, ...):
LOCUS NC_005236 1769 bp ss-cRNA linear VRL 26-FEB-2007
================================================================================
It seems that the error comes from the parser, which is not able to handle
ss-cRNA. If I replace ss-cRNA with ss-RNA, there is no error any more.
Here is my python program which gives the error:
===========================================================
import glob
from Bio import GenBank
# the files which will be processed
path="G:\\Data\\NCBI\\genomic\\gbff\\temp\\complete*.genomic.gbff"
print "Starting..."
organism=[]
count_organism=[]
feature=[]
count_feature=[]
qualifier=[]
count_qualifier=[]
files = glob.glob(path)
for file in files:
    print ">>>>>>>>>>>>>>>>>>>>>>>>>> " + file + " <<<<<<<<<<<<<<<<<<<<<<<<<"
    parser = GenBank.RecordParser()
    #infile = open("complete1short.genomic.gbff")
    infile = open(file)
    iterator = GenBank.Iterator(infile, parser)
    record = iterator.next()
    while record is not None:
        print record.locus + " --- " + record.organism + " --- " + record.version
        # organism
        flag=0
        for b in range(len(organism)):
            if organism[b]==record.organism:
                count_organism[b]=count_organism[b]+1
                flag=1
                break
        if flag==0:
            organism.append(record.organism)
            count_organism.append(1)
        # features
        for a in range(len(record.features)):
            flag=0
            for b in range(len(feature)):
                if feature[b]==record.features[a].key:
                    count_feature[b]=count_feature[b]+1
                    flag=1
                    break
            if flag==0:
                feature.append(record.features[a].key)
                count_feature.append(1)
                #print "--" + record.features[i].key
            # qualifiers
            for c in range(len(record.features[a].qualifiers)):
                flag=0
                for b in range(len(qualifier)):
                    if qualifier[b]==record.features[a].qualifiers[c].key:
                        count_qualifier[b]=count_qualifier[b]+1
                        flag=1
                        break
                if flag==0:
                    qualifier.append(record.features[a].qualifiers[c].key)
                    count_qualifier.append(1)
                    #print "----" + record.features[i].qualifiers[j].key
        record=iterator.next()
print "===================ORGANISM========================"
for i in range(len(organism)):
    print organism[i] + "\t" + str(count_organism[i])
print "===================END_ORGANISM===================="
print "===================FEATURES========================"
for i in range(len(feature)):
    print feature[i] + "\t" + str(count_feature[i])
print "===================END_FEATURES===================="
print "===================QUALIFIERS========================"
for i in range(len(qualifier)):
    print qualifier[i] + "\t" + str(count_qualifier[i])
print "===================END_QUALIFIERS===================="
print "The End!!!"
x=raw_input("Press ENTER to continue...")
============================================================
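The counting loops in the script above, which use parallel lists, a flag and a linear search, can be expressed more compactly with `collections.Counter`. This does not affect the parser bug being reported. The sketch is written for Python 3, unlike the Python 2 script. So that it runs without Biopython, it uses hypothetical stand-in records that model only the attributes the script reads: `.organism`, `.features[].key` and `.features[].qualifiers[].key`.

```python
from collections import Counter, namedtuple

# Hypothetical stand-ins for the GenBank record objects used in the script.
Qualifier = namedtuple("Qualifier", "key")
Feature = namedtuple("Feature", "key qualifiers")
Record = namedtuple("Record", "organism features")


def tally(records):
    """Count organisms, feature keys and qualifier keys across records."""
    organisms, features, qualifiers = Counter(), Counter(), Counter()
    for record in records:
        organisms[record.organism] += 1
        for feature in record.features:
            features[feature.key] += 1
            for qualifier in feature.qualifiers:
                qualifiers[qualifier.key] += 1
    return organisms, features, qualifiers
```

`Counter.most_common()` then gives each table sorted by frequency, which replaces the index-based printing loops.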
From bugzilla-daemon at portal.open-bio.org Wed May 9 14:06:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 May 2007 10:06:32 -0400
Subject: [Biopython-dev] [Bug 2289] LOCUS ss-cRNA => ERROR
In-Reply-To:
Message-ID: <200705091406.l49E6WGi008294@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2289
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-09 10:06 EST -------
Confirmed: the parser currently only accepts entries
'DNA','RNA','tRNA','mRNA','uRNA','snRNA','cDNA'.
Could you tell me where you got this GenBank file from? It would be helpful
for testing (and I may want to add a similar example to the test suite).
Thanks
From bugzilla-daemon at portal.open-bio.org Wed May 9 14:25:59 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 May 2007 10:25:59 -0400
Subject: [Biopython-dev] [Bug 2289] LOCUS ss-cRNA => ERROR
In-Reply-To:
Message-ID: <200705091425.l49EPxNf009285@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2289
------- Comment #2 from Daniel.Nicorici at gmail.com 2007-05-09 10:25 EST -------
Hello,
The entry ss-cRNA appears in the file:
ftp.ncbi.nih.gov/refseq/release/complete/complete72.genomic.gbff.gz
From bugzilla-daemon at portal.open-bio.org Wed May 9 14:48:30 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 May 2007 10:48:30 -0400
Subject: [Biopython-dev] [Bug 2289] Can't parse GenBank files with "ss-cRNA"
in the LOCUS line
In-Reply-To:
Message-ID: <200705091448.l49EmU9D010377@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2289
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|blocker |normal
Status|ASSIGNED |RESOLVED
OS/Version|Windows XP |All
Platform|PC |All
Resolution| |FIXED
Summary|LOCUS ss-cRNA => ERROR |Can't parse GenBank files
| |with "ss-cRNA" in the LOCUS
| |line
Version|1.24 |Not Applicable
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-09 10:48 EST -------
See also Bug 2231. With hindsight, checking against a known list of sequence
types was too harsh. The parser now just looks for the text "DNA" or "RNA"
within this field of the LOCUS line in GenBank files.
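The relaxed rule described above can be illustrated in a few lines. This sketch is not the code in Bio/GenBank/Scanner.py, which works from the LOCUS line's column layout. For simplicity, it assumes whitespace-separated fields: `LOCUS`, name, length, `bp`, molecule type, and so on. The function name is hypothetical.

```python
def locus_has_nucleic_acid_type(locus_line):
    """Accept any LOCUS molecule-type field containing "DNA" or "RNA".

    Assumes whitespace-separated fields (LOCUS, name, length, 'bp', type, ...);
    real GenBank parsing should respect the LOCUS column layout.
    """
    fields = locus_line.split()
    if len(fields) < 5 or fields[0] != "LOCUS" or fields[3] != "bp":
        return False
    mol_type = fields[4]
    return "DNA" in mol_type or "RNA" in mol_type
```

With this rule, `ss-cRNA` is accepted alongside the previously listed types, and no new variant has to be added to a fixed list.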
I've checked in a fix to CVS, and checked I can parse GenBank file NC_005236
The simplest way to update your machine Daniel is to download and replace the
file D:\Python25\lib\site-packages\Bio\GenBank\Scanner.py with revision 1.11
from here:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/GenBank/Scanner.py?cvsroot=biopython
There will be a slight time delay before the CVS web site updates itself; you
can of course get the file from CVS directly if you would rather.
Please let us know (on this bug) if that doesn't solve this problem.
From bugzilla-daemon at portal.open-bio.org Wed May 9 14:51:05 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 May 2007 10:51:05 -0400
Subject: [Biopython-dev] [Bug 2289] Can't parse GenBank files with "ss-cRNA"
in the LOCUS line
In-Reply-To:
Message-ID: <200705091451.l49Ep5HC010511@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2289
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |biopython-
| |bugzilla at maubp.freeserve.co.
| |uk
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-09 10:51 EST -------
P.S. I have not tried the full file from here, as the FTP site was timing out.
ftp.ncbi.nih.gov/refseq/release/complete/complete72.genomic.gbff.gz (15 MB)
I just tried the single GenBank record for NC_005236
From bugzilla-daemon at portal.open-bio.org Wed May 9 14:57:06 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 May 2007 10:57:06 -0400
Subject: [Biopython-dev] [Bug 2289] Can't parse GenBank files with "ss-cRNA"
in the LOCUS line
In-Reply-To:
Message-ID: <200705091457.l49Ev6L0010785@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2289
------- Comment #5 from Daniel.Nicorici at gmail.com 2007-05-09 10:57 EST -------
Here is the part of the file that generates the error:
=======================================================================
LOCUS NC_005236 1769 bp ss-cRNA linear VRL 20-FEB-2007
DEFINITION Seoul virus strain 80-39 segment S, complete sequence.
ACCESSION NC_005236
VERSION NC_005236.1 GI:38505529
PROJECT GenomeProject:15027
KEYWORDS .
SOURCE Seoul virus
ORGANISM Seoul virus
Viruses; ssRNA negative-strand viruses; Bunyaviridae; Hantavirus.
REFERENCE 1 (bases 1 to 1769)
AUTHORS Song,J.-W., Moon,J.Y., Baek,L.J. and Song,K.-J.
TITLE Genetic analysis of the full length of S segment of Seoul virus
prototype, 80-39 strain
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 1769)
CONSRTM NCBI Genome Project
TITLE Direct Submission
JOURNAL Submitted (12-AUG-2004) National Center for Biotechnology
Information, NIH, Bethesda, MD 20894, USA
REFERENCE 3 (bases 1 to 1769)
AUTHORS Song,J.-W., Moon,J.Y., Baek,L.J. and Song,K.-J.
TITLE Direct Submission
JOURNAL Submitted (09-APR-2003) Department of Microbiology, College of
Medicine, Korea University, 5-ka, Anam-dong, Sungbuk-ku, Seoul
136-705, Korea
COMMENT PROVISIONAL REFSEQ: This record has not yet been subject to final
NCBI review. The reference sequence was derived from AY273791.
COMPLETENESS: full length.
FEATURES Location/Qualifiers
source 1..1769
/organism="Seoul virus"
/mol_type="viral cRNA"
/strain="80-39"
/isolation_source="Rattus norvegicus"
/db_xref="taxon:11608"
/segment="segment S"
/country="South Korea"
gene 43..1332
/locus_tag="SEOVsSgp1"
/db_xref="GeneID:2943086"
CDS 43..1332
/locus_tag="SEOVsSgp1"
/codon_start=1
/product="nucleocapsid protein"
/protein_id="NP_942556.1"
/db_xref="GI:38505530"
/db_xref="GeneID:2943086"
/translation="MATMEEIQREISAHEGQLVIARQKVKDAEKQYEKDPDDLNKRAL
HDRESVAASIQSKIDELKRQLADRIAAGKNIGQDRDPTGVEPGDHLKERSALSYGNTL
DLNSLDIDEPTGQTADWLTIIVYLTSFVVPIILKALYMLTTRGRQTSKDNKGMRIRFK
DDSSYEDVNGIRKPKHLYVSMPNAQSSMKAEEITPGRFRTAVCGLYPAQIKARNMVSP
VMSVVGFLALAKDWTSRIEEWLGAPCKFMAESPIAGSLSGNPVNRDYIRQRQGALAGM
EPKEFQALRQHSKDAGCTLVEHIESPSSIWVFAGAPDRCPPTCLFVGGMAELGAFFSI
LQDMRNTIMASKTVGTADEKLRKKSSFYQSYLRRTQSMGIQLDQRIIVMFMVAWGKEA
VDNFHLGDDMDPELRSLAQILIDQKVKEISNQEPMKL"
ORIGIN
1 tagtagtaga ctccctaaag agctactcca ctaacaagag aaatggcaac tatggaggaa
61 atccagagag aaatcagtgc tcacgagggg cagcttgtga tagcacgcca gaaggtcaag
121 gatgcagaaa agcagtatga gaaggatcct gatgacttaa acaagagggc actgcatgat
181 cgggagagtg tcgcagcttc aatacaatca aaaattgatg aactgaagcg ccaacttgcc
241 gacaggattg cagcagggaa gaacatcggg caagaccggg atcctacagg ggtagagccg
301 ggtgatcatc tcaaggaaag atcagcacta agctacggga atacactgga cctgaatagt
361 cttgacattg atgaacctac aggacaaaca gctgattggc tgactataat tgtctatcta
421 acatcattcg tggtcccgat catcttgaag gcactgtaca tgttaacaac aagaggtagg
481 cagacttcaa aggacaacaa ggggatgagg atcagattca aggatgacag ctcatatgag
541 gatgtcaatg ggatcagaaa gcctaaacat ctgtatgtgt caatgccaaa cgcccaatcc
601 agtatgaagg ctgaagagat aacaccagga agattccgca ctgcagtatg tgggctatat
661 cctgcacaga taaaggcaag gaatatggta agccctgtca tgagtgtagt tgggtttttg
721 gcactagcaa aagactggac atctagaatt gaagaatggc ttggcgcacc ctgcaagttc
781 atggcagagt ctcctattgc tgggagttta tctgggaatc ctgtgaatcg tgactatatc
841 agacaaagac aaggtgcact tgcagggatg gagccaaagg aatttcaagc cctcaggcaa
901 cattcaaagg atgctggatg tacactagtt gaacatattg agtcaccatc gtcaatatgg
961 gtgtttgctg gggcccctga taggtgtcca ccaacatgct tgtttgttgg agggatggct
1021 gagttaggtg ccttcttttc tatacttcag gatatgagga acacaatcat ggcttcaaaa
1081 actgtgggca cagctgatga aaagcttcga aagaaatcat cattctatca atcatacctc
1141 agacgcacac aatcaatggg aatacaactg gaccagagga taattgttat gtttatggtt
1201 gcctggggaa aggaggcagt ggacaacttc catctcggtg atgacatgga tccagagctt
1261 cgtagcctgg ctcagatctt gattgaccag aaagtgaagg aaatctcgaa ccaggagcct
1321 atgaaattat aagcacataa atatgtaatc aatactaact ataggttaag aaatactaat
1381 cattagttaa taagaataca gatttattga ataatcatat taaataatta ggtaagttaa
1441 atattattta gttaagttag ctaattgatt tatatgatta tcacaattga atgtaatcat
1501 aagcacaatc actgccatgt ataatcacgg gtatacgggt ggttttcata tggggaacag
1561 ggtgggctta gggccaggtc accttaagtg accttttttt gtatatatgg atgtagattt
1621 caattgatcg aatactaatc ctactgtcct cttttctttt cctttctcct tctttactaa
1681 caacaacaaa ctacctcaca accttctacc tcaatatata ctacctcatt aagttgtttc
1741 cttttgtctt tttagggagt ctactacta
//
========================================================================
From stephen at blackrim.net Wed May 9 15:58:16 2007
From: stephen at blackrim.net (Stephen A Smith)
Date: Wed, 09 May 2007 11:58:16 -0400
Subject: [Biopython-dev] [Off Topic] Google Group
Message-ID: <4641EF98.90504@blackrim.net>
Hi all,
Just letting you know there is a google group open now for discussions
of all things programming and evolutionary biology. You can find it here
http://groups.google.com/group/evo_code.
Figured the people at bio* might be interested.
Take care
Stephen Smith
--
Dept. Ecology and Evolutionary Biology
Yale University
http://www.blackrim.org
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GS dpu s+: a- C++++ UL++++ P--- L++++ E--- W+++ N-- o-- K++++ w---
O- M-- V- PS+++ PE-- Y++ PGP++ t-- 5 X++ R-- tv++ b++++ DI+ D++
G++ e+++ h--- r+++ y+++
------END GEEK CODE BLOCK------
From bugzilla-daemon at portal.open-bio.org Thu May 10 12:59:07 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 May 2007 08:59:07 -0400
Subject: [Biopython-dev] [Bug 2290] New: Not reading 1YVE.pdb
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2290
Summary: Not reading 1YVE.pdb
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: proszek at gmail.com
Biopython 1.42 fails to read the 1YVE.pdb file, although it reads test.pdb created
by:
awk '{if($1=="ATOM"){print}}' 1YVE.pdb
Line 8610 is a HETATM line.
Traceback below (where file=sys.argv[1]=1YVE.pdb)
WARNING: Chain J is discontinuous at line 8610.
Traceback (most recent call last):
File "./wezly.py", line 122, in ?
b=Protein(sys.argv[1])
File "./wezly.py", line 15, in __init__
self.struct=self.parser.get_structure('X',file)
File "/usr/lib/python2.4/site-packages/Bio/PDB/PDBParser.py", line 66, in
get_structure
self._parse(file.readlines())
File "/usr/lib/python2.4/site-packages/Bio/PDB/PDBParser.py", line 87, in
_parse
self.trailer=self._parse_coordinates(coords_trailer)
File "/usr/lib/python2.4/site-packages/Bio/PDB/PDBParser.py", line 179, in
_parse_coordinates
structure_builder.init_residue(resname, hetero_flag, resseq, icode)
File "/usr/lib/python2.4/site-packages/Bio/PDB/StructureBuilder.py", line
155, in init_residue
self.chain.add(residue)
File "/usr/lib/python2.4/site-packages/Bio/PDB/Entity.py", line 80, in add
raise PDBConstructionException, "%s defined twice" % entity.get_full_id()
File "/usr/lib/python2.4/site-packages/Bio/PDB/Entity.py", line 132, in
get_full_id
parent=self.get_parent()
File "/usr/lib/python2.4/site-packages/Bio/PDB/Entity.py", line 102, in
get_parent
raise PDBException, 'No parent'
PDBException: No parent
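The awk filter in this report keeps only ATOM records, so test.pdb contains no HETATM lines at all (including the waters discussed in the follow-up comments). A rough Python equivalent of that filter, with a hypothetical function name:

```python
def atom_lines_only(lines):
    """Keep lines whose first whitespace-separated field is exactly "ATOM".

    Mirrors awk '{if($1=="ATOM"){print}}': HETATM records (ligands and
    waters) are dropped, as is every other record type.
    """
    kept = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "ATOM":
            kept.append(line)
    return kept


# Illustrative input lines, not taken from 1YVE.pdb except the HETATM one.
sample = ["ATOM      1  N   MET A   1", "HETATM16581  O   HOH   793", "END"]
print(atom_lines_only(sample))  # ['ATOM      1  N   MET A   1']
```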
From bugzilla-daemon at portal.open-bio.org Thu May 10 13:53:05 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 May 2007 09:53:05 -0400
Subject: [Biopython-dev] [Bug 2290] Not reading 1YVE.pdb
In-Reply-To:
Message-ID: <200705101353.l4ADr544030572@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2290
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-10 09:53 EST -------
Where did you get your 1YVE.pdb file from? Directly from the PDB?
Just as a remark, the "PDBException: No parent" is not the problem.
The error is further back, PDBConstructionException, "??? defined twice", and
when Bio.PDB tries to get the identity of the problem residue it falls over.
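The masking described here can be reproduced with a few toy classes that follow the same code path as the traceback (`add` building its error message with `get_full_id`, which calls `get_parent`). These classes are illustrative stand-ins, not Bio.PDB's real implementation:

```python
class PDBException(Exception):
    """Toy stand-in for Bio.PDB's PDBException."""


class PDBConstructionException(Exception):
    """Toy stand-in for Bio.PDB's PDBConstructionException."""


class Residue:
    """Toy entity mimicking the code path in the traceback, not Bio.PDB itself."""

    def __init__(self, res_id):
        self.id = res_id
        self.parent = None  # not attached to a chain yet

    def get_full_id(self):
        # Like get_full_id -> get_parent in the traceback: fail without a parent.
        if self.parent is None:
            raise PDBException("No parent")
        return (self.parent, self.id)


def add(children, entity):
    """Add entity to a dict of children, refusing duplicate ids."""
    if entity.id in children:
        # Building this message calls get_full_id(), which raises first, so
        # the "defined twice" exception is never even created.
        raise PDBConstructionException("%s defined twice" % (entity.get_full_id(),))
    children[entity.id] = entity


chain = {}
add(chain, Residue(("W", 1, " ")))  # first water numbered 1
try:
    add(chain, Residue(("W", 1, " ")))  # a second water also numbered 1
except PDBException as err:
    print(type(err).__name__, err)  # PDBException No parent
```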
From bugzilla-daemon at portal.open-bio.org Thu May 10 14:06:21 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 May 2007 10:06:21 -0400
Subject: [Biopython-dev] [Bug 2290] Not reading 1YVE.pdb
In-Reply-To:
Message-ID: <200705101406.l4AE6LRW031886@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2290
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-10 10:06 EST -------
Which version of Biopython do you have?
The "no parent" bug was fixed as bug 1936; make sure you have Biopython 1.43 or
later.
My installation of Biopython works but spits out a LOT of
PDBConstructionException warnings about multiply defined water atoms (aka
"Residue HOH").
Looking at the raw PDB file, there is a problem with multiply defined waters.
As you can see below, the identifier jumps from 799 back to 1 (i.e. there are
two waters with residue number 1).
...
HETATM16581 O HOH 793 36.450 15.564 -9.023 1.00 39.79 O
HETATM16582 O HOH 794 33.448 13.711 -11.019 1.00 40.42 O
HETATM16583 O HOH 796 28.414 11.908 -16.047 1.00 48.15 O
HETATM16584 O HOH 797 29.445 8.114 -11.059 1.00 55.49 O
HETATM16585 O HOH 799 28.383 5.173 -8.998 1.00 33.85 O
HETATM16586 O HOH 1 26.615 4.599 -6.718 1.00 24.95 O
HETATM16587 O HOH 2 23.353 4.948 -7.137 1.00 34.47 O
HETATM16588 O HOH 3 17.401 11.710 0.938 1.00 35.16 O
HETATM16589 O HOH 4 21.326 11.092 8.215 1.00 22.51 O
HETATM16590 O HOH 5 13.703 2.159 11.421 1.00 24.87 O
...
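A small sketch of how such a numbering reset could be spotted automatically, using the HOH lines quoted above. The function name is hypothetical, and because the excerpt's whitespace is collapsed the residue number is taken as the fourth field; a real PDB file should be sliced by column instead:

```python
# The HOH records quoted above, with whitespace collapsed as in this archive.
EXCERPT = """\
HETATM16581 O HOH 793 36.450 15.564 -9.023 1.00 39.79 O
HETATM16582 O HOH 794 33.448 13.711 -11.019 1.00 40.42 O
HETATM16583 O HOH 796 28.414 11.908 -16.047 1.00 48.15 O
HETATM16584 O HOH 797 29.445 8.114 -11.059 1.00 55.49 O
HETATM16585 O HOH 799 28.383 5.173 -8.998 1.00 33.85 O
HETATM16586 O HOH 1 26.615 4.599 -6.718 1.00 24.95 O
HETATM16587 O HOH 2 23.353 4.948 -7.137 1.00 34.47 O
HETATM16588 O HOH 3 17.401 11.710 0.938 1.00 35.16 O
HETATM16589 O HOH 4 21.326 11.092 8.215 1.00 22.51 O
HETATM16590 O HOH 5 13.703 2.159 11.421 1.00 24.87 O"""


def numbering_resets(resseqs):
    """Return (index, previous, current) wherever a residue number goes backwards."""
    resets = []
    for i in range(1, len(resseqs)):
        if resseqs[i] < resseqs[i - 1]:
            resets.append((i, resseqs[i - 1], resseqs[i]))
    return resets


# In the collapsed excerpt the residue number is the 4th field. For a real PDB
# file, slice by column instead: resSeq is columns 23-26, i.e. int(line[22:26]).
numbers = [int(line.split()[3]) for line in EXCERPT.splitlines()]
print(numbering_resets(numbers))  # [(5, 799, 1)]
```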
Are you happy for me to mark this as a duplicate of bug 1936?
From bugzilla-daemon at portal.open-bio.org Thu May 10 15:07:56 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 May 2007 11:07:56 -0400
Subject: [Biopython-dev] [Bug 2291] New: __init__.py missing in the
Bio.PDB.mmCIF folder after the install
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2291
Summary: __init__.py missing in the Bio.PDB.mmCIF folder after
the install
Product: Biopython
Version: Not Applicable
Platform: Macintosh
OS/Version: MacOS X
Status: NEW
Severity: normal
Priority: P2
Component: Website
AssignedTo: biopython-dev at biopython.org
ReportedBy: jean.lechner at gmail.com
When you install Biopython you must uncomment some lines in the setup.py file,
but at the end of the installation the __init__.py file is not created in the
Bio.PDB.mmCIF directory.
So you cannot use MMCIFParser or MMCIF2Dict, because Biopython cannot import
MMCIFlex from Bio.PDB.mmCIF.
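In Python 2 a directory without `__init__.py` cannot be imported as a package, which matches the symptom above. A hedged sketch of a post-install check (the function name is hypothetical; treating `.py`, `.so` and `.pyd` files as "modules" is an assumption, chosen because MMCIFlex is a compiled extension):

```python
import os

MODULE_SUFFIXES = (".py", ".so", ".pyd")


def packages_missing_init(root):
    """List directories under root holding Python or extension modules but no __init__.py."""
    missing = []
    for dirpath, dirnames, filenames in os.walk(root):
        has_modules = any(name.endswith(MODULE_SUFFIXES) for name in filenames)
        if has_modules and "__init__.py" not in filenames:
            missing.append(dirpath)
    return sorted(missing)
```

Point it at the installed `Bio` directory inside your site-packages; an affected install would be expected to list the `Bio/PDB/mmCIF` directory.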
From bugzilla-daemon at portal.open-bio.org Thu May 10 15:08:48 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 May 2007 11:08:48 -0400
Subject: [Biopython-dev] [Bug 2291] __init__.py missing in the Bio.PDB.mmCIF
folder after the install
In-Reply-To:
Message-ID: <200705101508.l4AF8mQf003465@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2291
jean.lechner at gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jean.lechner at gmail.com
From idoerg at gmail.com Thu May 10 17:18:23 2007
From: idoerg at gmail.com (Iddo Friedberg)
Date: Thu, 10 May 2007 10:18:23 -0700
Subject: [Biopython-dev] Biopython talk at BOSC 2007?
Message-ID: <464353DF.6030700@burnham.org>
Anybody giving a talk on Biopython? We can get a 20-30 minute slot in
Vienna, but someone has to show up and talk.
Personally, I will actually be there for the ISMB SIGs, but as I am
running my own conference, it will be a bit of a strain to talk at BOSC.
However, the main reason I do not want to speak is that there are people
much more deserving here. So if anyone plans to be at ISMB 2007 in any
case, and wishes to represent Biopython with serpentine honor, contact
Darin.
Best,
Iddo
-------- Original Message --------
Subject: BOSC 2007 Second Call For Papers
Date: Thu, 10 May 2007 12:17:41 -0400
From: darin.london at duke.edu
To: biopython-owner at lists.open-bio.org
The BOSC Organizing Committee are proud to announce BOSC 2007, occurring
in Vienna, Austria on July 19th, 20th. The conference this year
promises to be exciting, as the BOSC developers attempt to define and
solve currently intractable problems in Bioinformatics. Please refer to
the following website for complete information, and requests for
submissions. Thank you, and we hope to see you in Vienna.
http://open-bio.org/wiki/BOSC_2007
The BOSC organizing Committee
Please pass this email on to anyone that would be interested.
--
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037, USA
T: +1 858 646 3100 x3516
wengophone: idoerg
http://iddo-friedberg.org
http://2007.BioFunctionPrediction.org
From bugzilla-daemon at portal.open-bio.org Sun May 13 20:30:10 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 13 May 2007 16:30:10 -0400
Subject: [Biopython-dev] [Bug 2292] New: Bio.PDBIO writes TER records
without any required fields
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2292
Summary: Bio.PDBIO writes TER records without any required fields
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Other
AssignedTo: biopython-dev at biopython.org
ReportedBy: misiek at genesilico.pl
Bio.PDBIO is happy to write TER records as "TER\n", which is inconsistent with
the PDB format specification.
The PDB format requires that TER records have some fields similar to ATOM
records:
'''The TER record has the same residue name, chain identifier, sequence number
and insertion code as the terminal residue. The serial number of the TER record
is one number greater than the serial number of the ATOM/HETATM preceding the
TER.'''
[See http://www.wwpdb.org/documentation/format23/sect9.html#TER]
This leads to problems with programs that require correct TER records (like
the multiple structural alignment program MUSTANG), which crash when these
fields are not found.
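Following the quoted specification, a TER line can be built from the terminal residue and the last atom serial. This is a minimal sketch, not the attached patch; `TER_FORMAT` and `format_ter` are hypothetical names, and the column positions are those of PDB format 2.3:

```python
# Column layout of a TER record (PDB format 2.3): record name in columns 1-6,
# serial 7-11, residue name 18-20, chain ID 22, residue number 23-26,
# insertion code 27.
TER_FORMAT = "TER   %5i      %3s %c%4i%c"


def format_ter(last_serial, resname, chain_id, resseq, icode=" "):
    """Return a TER line for the terminal residue of a chain.

    last_serial is the serial number of the preceding ATOM/HETATM record;
    the TER record takes the next number.
    """
    return TER_FORMAT % (last_serial + 1, resname, chain_id, resseq, icode)


# Illustrative values for a chain ending at residue LEU A 215.
print(repr(format_ter(1234, "LEU", "A", 215)))
```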
From bugzilla-daemon at portal.open-bio.org Sun May 13 20:31:18 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 13 May 2007 16:31:18 -0400
Subject: [Biopython-dev] [Bug 2292] Bio.PDBIO writes TER records without any
required fields
In-Reply-To:
Message-ID: <200705132031.l4DKVIP9008944@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2292
------- Comment #1 from misiek at genesilico.pl 2007-05-13 16:31 EST -------
Created an attachment (id=652)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=652&action=view)
Proposed patch to PDBIO.py
This is a simple fix.
From idoerg at gmail.com Mon May 14 16:27:42 2007
From: idoerg at gmail.com (Iddo Friedberg)
Date: Mon, 14 May 2007 09:27:42 -0700
Subject: [Biopython-dev] Subject: BOSC 2007 2nd Call For Papers.
Message-ID: <46488DFE.3070908@burnham.org>
The BOSC Organizing Committee are proud to announce BOSC 2007, occurring
in Vienna, Austria on July 19th, 20th. The conference this year
promises to be exciting, as the BOSC developers attempt to define and
solve currently intractable problems in Bioinformatics. Please refer to
the following website for complete information, and requests for
submissions. Thank you, and we hope to see you in Vienna.
http://open-bio.org/wiki/BOSC_2007
The BOSC organizing Committee
Please pass this email on to anyone that would be interested.
From idoerg at gmail.com Mon May 14 16:28:36 2007
From: idoerg at gmail.com (Iddo Friedberg)
Date: Mon, 14 May 2007 09:28:36 -0700
Subject: [Biopython-dev] BOSC 2007 Abstract Submission Deadline Extension
Message-ID: <46488E34.8000604@burnham.org>
Subject: BOSC 2007 Abstract Submission Deadline Extension
Due to technical difficulties in sending out the 2nd call for papers,
the BOSC organizers are extending the deadline for abstract submissions
to Monday May 21st. The announcement day will remain the same so that
it remains before the Early Discount Date.
http://open-bio.org/wiki/BOSC_2007
The BOSC organizing Committee
Please pass this email on to anyone that would be interested.
From bugzilla-daemon at portal.open-bio.org Mon May 14 22:18:47 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 14 May 2007 18:18:47 -0400
Subject: [Biopython-dev] [Bug 2290] Not reading 1YVE.pdb
In-Reply-To:
Message-ID: <200705142218.l4EMIlwD008110@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2290
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |DUPLICATE
Version|Not Applicable |1.42
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-05-14 18:18 EST -------
*** This bug has been marked as a duplicate of bug 1936 ***
From biopython-dev at maubp.freeserve.co.uk Mon May 14 22:29:05 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Mon, 14 May 2007 23:29:05 +0100
Subject: [Biopython-dev] Bugzilla Version Numbers
In-Reply-To: <46273FB4.4030805@maubp.freeserve.co.uk>
References: <128a885f0704102204p2872f42fh685919bb8b4656c3@mail.gmail.com>
<46273FB4.4030805@maubp.freeserve.co.uk>
Message-ID: <4648E2B1.5040706@maubp.freeserve.co.uk>
Peter wrote:
> Chris Lasher wrote:
>> Hi all,
>>
>> Does anybody active with Biopython have administrative capabilities
>> for the project's Bugzilla tracker? The version numbers are a wee
>> bit out of date.
>
> They are, aren't they! I asked on the list last month about this,
> and updating the component fields too:
>
> http://lists.open-bio.org/pipermail/biopython-dev/2007-March/002652.html
>
> As no-one on the list has come forward, I guess one of us should get
> in touch with the relevant Open Bio people, probably by emailing
> "support" at the domain helpdesk.open-bio.org
>
> Who needs/wants bugzilla admin rights?
I've been in touch with Jason Stajich and he has done some magic:
Michiel and I can now creategroups, editclassifications,
editcomponents, editkeywords. I think that's all we need?
I have initially added 1.42 and 1.43 to the version field for Biopython
in bugzilla.
I would also propose we have a few new components, such as PDB, Nexus
and SeqIO (or perhaps rather than SeqIO something more general like
sequence parsing).
Peter
From jfeala at gmail.com Tue May 15 16:42:30 2007
From: jfeala at gmail.com (Jake Feala)
Date: Tue, 15 May 2007 09:42:30 -0700
Subject: [Biopython-dev] interaction networks in biopython
Message-ID: <12c863fe0705150942t108e3131jaf50821ef9ecf2da@mail.gmail.com>
Hello Biopython people -
With all the new research in genome-wide cellular interaction
networks I was a little surprised not to see much support for these
types of data in Biopython. I know that Bioperl has a networks package
that looks like the kind of thing that I would love to also see in
Python for all the obvious reasons.
First - has this already been done and I missed it? All I could find
were a few scattered and application-specific scripts across the web,
plus the Pathway package in BioPython.
If not, then would there be any interest in development along these
lines? A while back I wrote a few scripts that parse interaction
datasets, stick them into a MySQL database, and retrieve the
interactions into a Network object that can be used to analyze the
graph of nodes and links. I would be glad to update these to fit into
the biopython framework, as it would be useful to my own research.
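As a rough idea of the kind of object described here, a minimal undirected network built from tab-separated interaction pairs might look like the sketch below. The class, method names and input format are all hypothetical; this is not Bioperl's API or an existing Biopython module:

```python
from collections import defaultdict


class InteractionNetwork:
    """Undirected interaction graph: nodes are molecule ids, links are interactions."""

    def __init__(self):
        self._neighbors = defaultdict(set)
        self._edges = set()

    def add_interaction(self, a, b):
        # Record the link in both directions; repeated pairs are stored once.
        self._neighbors[a].add(b)
        self._neighbors[b].add(a)
        self._edges.add(frozenset((a, b)))

    def nodes(self):
        return sorted(self._neighbors)

    def neighbors(self, node):
        return sorted(self._neighbors.get(node, ()))

    def degree(self, node):
        return len(self._neighbors.get(node, ()))

    def edge_count(self):
        return len(self._edges)


def parse_pairs(lines):
    """Build a network from "A<tab>B" lines, skipping blank lines and # comments."""
    network = InteractionNetwork()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        a, b = line.split("\t")[:2]
        network.add_interaction(a, b)
    return network


net = parse_pairs(["# toy data", "P1\tP2", "P2\tP3", "P2\tP1"])
print(net.nodes(), net.degree("P2"), net.edge_count())  # ['P1', 'P2', 'P3'] 2 2
```

Keeping both an adjacency map and an edge set makes degree and neighbor lookups cheap while still counting a reversed duplicate pair as one link.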
One caveat is that I am an engineering PhD student and my programming
skills are mostly self-taught beyond two Java courses, so I might need
a little guidance in testing and preparing the code for distribution.
I have only ever written code for my own personal research but I think
my style is decent and I would love to get better.
Any opinion or advice?
Thanks
-Jake Feala
Bioengineering Dept.
University of California, San Diego
From edschofield at gmail.com Tue May 15 18:37:30 2007
From: edschofield at gmail.com (Ed Schofield)
Date: Tue, 15 May 2007 19:37:30 +0100
Subject: [Biopython-dev] interaction networks in biopython
In-Reply-To: <12c863fe0705150942t108e3131jaf50821ef9ecf2da@mail.gmail.com>
References: <12c863fe0705150942t108e3131jaf50821ef9ecf2da@mail.gmail.com>
Message-ID: <1b5a37350705151137t75ea7e07r6596ba1ce35a8716@mail.gmail.com>
On 5/15/07, Jake Feala wrote:
> Hello Biopython people -
>
> With all the new research in genome-wide cellular interaction
> networks I was a little surprised not to see much support for these
> types of data in Biopython. I know that Bioperl has a networks package
> that looks like the kind of thing that I would love to also see in
> Python for all the obvious reasons.
>
> First - has this already been done and I missed it? All I could find
> were a few scattered and application-specific scripts across the web,
> plus the Pathway package in BioPython.
>
> If not, then would there be any interest in development along these
> lines? A while back I wrote a few scripts that parse interaction
> datasets, stick them into a MySQL database, and retrieve the
> interactions into a Network object that can be used to analyze the
> graph of nodes and links. I would be glad to update these to fit into
> the biopython framework, as it would be useful to my own research.
>
> One caveat is that I am an engineering PhD student and my programming
> skills are mostly self-taught beyond two Java courses, so I might need
> a little guidance in testing and preparing the code for distribution.
> I have only ever written code for my own personal research but I think
> my style is decent and I would love to get better.
>
> Any opinion or advice?
This would interest me too; I'd be glad to have such functionality in
BioPython. I can offer you some guidance on Python, packaging and
testing, and (if you need it) use of external array packages.
-- Ed
From yair.benita at gmail.com Tue May 15 19:25:27 2007
From: yair.benita at gmail.com (Yair Benita)
Date: Tue, 15 May 2007 15:25:27 -0400
Subject: [Biopython-dev] interaction networks in biopython
In-Reply-To: <1b5a37350705151137t75ea7e07r6596ba1ce35a8716@mail.gmail.com>
Message-ID:
I would be happy to contribute to this too.
Currently I have a python script that uses HPRD to generate protein protein
interaction maps. I have different filtering methods to display only classes
of proteins or only links to a specific kegg pathway. It will need a bit of
work before I can submit this to CVS. As for drawing the map, I am currently
generating a dot file that can be converted to an image using GRAPHVIZ. If
anyone wants to suggest anything else, please do.
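For reference, an undirected Graphviz DOT file for such a map needs only a `graph` block with `--` edges. A minimal sketch (function names are hypothetical, not Yair's script):

```python
def _quote(label):
    """Quote a node name for DOT, escaping embedded double quotes."""
    return '"%s"' % label.replace('"', '\\"')


def to_dot(edges, name="interactions"):
    """Return an undirected Graphviz DOT graph for an iterable of (a, b) pairs."""
    lines = ["graph %s {" % name]
    for a, b in edges:
        lines.append("    %s -- %s;" % (_quote(a), _quote(b)))
    lines.append("}")
    return "\n".join(lines) + "\n"


print(to_dot([("P1", "P2"), ("P2", "P3")]))
```

Saved as `interactions.dot`, the output can be rendered with Graphviz, e.g. `dot -Tpng interactions.dot -o interactions.png`.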
Yair
on 5/15/07 2:37 PM, Ed Schofield at edschofield at gmail.com wrote:
> On 5/15/07, Jake Feala wrote:
>> Hello Biopython people -
>>
>> With all the new research in genome-wide cellular interaction
>> networks I was a little surprised not to see much support for these
>> type of data in Biopython. I know that Bioperl has a networks package
>> that looks like the kind of thing that I would love to also see in
>> Python for all the obvious reasons.
>>
>> First - has this already been done and I missed it? All I could find
>> were a few scattered and application-specific scripts across the web,
>> plus the Pathway package in BioPython.
>>
>> If not, then would there be any interest in development along these
>> lines? A while back I wrote a few scripts that parse interaction
>> datasets, stick them into a MySQL database, and retrieve the
>> interactions into a Network object that can be used to analyze the
>> graph of nodes and links. I would be glad to update these to fit into
>> the biopython framework, as it would be useful to my own research.
>>
>> One caveat is that I am an engineering PhD student and my programming
>> skills are mostly self-taught beyond two Java courses, so I might need
>> a little guidance in testing and preparing the code for distribution.
>> I have only ever written code for my own personal research but I think
>> my style is decent and I would love to get better.
>>
>> Any opinion or advice?
>
> This would interest me too; I'd be glad to have such functionality in
> BioPython. I can offer you some guidance on Python, packaging and
> testing, and (if you need it) use of external array packages.
>
> -- Ed
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
From biopython-dev at maubp.freeserve.co.uk Tue May 15 19:19:23 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 15 May 2007 20:19:23 +0100
Subject: [Biopython-dev] Subversion Repository
In-Reply-To: <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com>
References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu>
<128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com>
Message-ID: <464A07BB.8020206@maubp.freeserve.co.uk>
Chris Lasher wrote:
> Since no one else has volunteered, I'm taking up responsibility for
> the transition. I got the ball moving by contacting "support at
> open-bio.org" to alert them to our interest and get any contacts
> we'll need to make this happen. Also, if anybody on the list has any
> information that would be helpful in this (e.g., who administers the
> CVS repo) please feel free to send it along. Likewise, feel free to
> raise any questions, concerns, and comments on the list.
Did you get any information from the Open Bioinformatics Foundation guys
about moving from CVS to subversion?
Peter
From lists.steve at arachnedesign.net Tue May 15 19:56:46 2007
From: lists.steve at arachnedesign.net (Steve Lianoglou)
Date: Tue, 15 May 2007 15:56:46 -0400
Subject: [Biopython-dev] interaction networks in biopython
In-Reply-To:
References:
Message-ID:
Hi,
On May 15, 2007, at 3:25 PM, Yair Benita wrote:
> I would be happy to contribute to this too.
> Currently I have a python script that uses HPRD to generate protein protein
> interaction maps. I have different filtering methods to display only classes
> of proteins or only links to a specific kegg pathway. It will need a bit of
> work before I can submit this to CVS. As for drawing the map, I am currently
> generating a dot file that can be converted to an image using GRAPHVIZ. If
> anyone wants to suggest anything else, please do.
I've been using NetworkX[1] to play w/ networks/graphs interactively.
You can display them if you have matplotlib installed, and can save
the graphs to dot format as well.
-steve
[1] NetworkX: https://networkx.lanl.gov/wiki
>
> Yair
>
>
> on 5/15/07 2:37 PM, Ed Schofield at edschofield at gmail.com wrote:
>
>> On 5/15/07, Jake Feala wrote:
>>> Hello Biopython people -
>>>
>>> With all the new research in genome-wide cellular interaction
>>> networks I was a little surprised not to see much support for these
>>> types of data in Biopython. I know that Bioperl has a networks package
>>> that looks like the kind of thing that I would love to also see in
>>> Python for all the obvious reasons.
>>>
>>> First - has this already been done and I missed it? All I could find
>>> were a few scattered and application-specific scripts across the web,
>>> plus the Pathway package in BioPython.
>>>
>>> If not, then would there be any interest in development along these
>>> lines? A while back I wrote a few scripts that parse interaction
>>> datasets, stick them into a MySQL database, and retrieve the
>>> interactions into a Network object that can be used to analyze the
>>> graph of nodes and links. I would be glad to update these to fit into
>>> the biopython framework, as it would be useful to my own research.
>>>
>>> One caveat is that I am an engineering PhD student and my programming
>>> skills are mostly self-taught beyond two Java courses, so I might need
>>> a little guidance in testing and preparing the code for distribution.
>>> I have only ever written code for my own personal research but I think
>>> my style is decent and I would love to get better.
>>>
>>> Any opinion or advice?
>>
>> This would interest me too; I'd be glad to have such functionality in
>> BioPython. I can offer you some guidance on Python, packaging and
>> testing, and (if you need it) use of external array packages.
>>
>> -- Ed
>> _______________________________________________
>> Biopython-dev mailing list
>> Biopython-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython-dev
From salish at picasso.ucsf.edu Tue May 15 20:36:05 2007
From: salish at picasso.ucsf.edu (Howard Salis)
Date: Tue, 15 May 2007 13:36:05 -0700
Subject: [Biopython-dev] About a new GenBankWriter class with SeqIO interface
Message-ID: <9fa7e98e0705151336va7e1c86la137c137883d6886@mail.gmail.com>
Hello everyone,
I started using Biopython in my research and I needed a way to
write GenBank files from a SeqRecord (which was parsed from other
GenBank/etc files). So I wrote something up. It uses the SeqIO
interface and behaves like the fasta writer.
The SeqIO.write(record, handle, "genbank") interface accepts "record"
as either a SeqRecord generator with multiple records or a single
SeqRecord. So passing either the generator returned by SeqIO.parse() or
a single record taken from it with .next() works. (I'm relatively new
to Python, so please excuse any bad terminology or stylistic deficiencies.)
The changes are: a new file called GenBankWriter.py in Bio/GenBank.
Small changes to the __init__.py of Bio/GenBank. Changes to the
_feed_first_line function of Scanner.py of Bio/GenBank.
I had to change the way Bio/GenBank/Scanner.py reads the Locus line of
a GenBank file in order to handle missing data and newer molecule
types (e.g. ss-RNA, ds-DNA, mt-DNA, etc). I also added/changed a couple
of lines in __init__.py to store whether a sequence was linear or
circular and to store the string that encodes its molecule type
(ss-RNA, etc). The output of SeqIO.write(record,handle,"genbank") is
functionally identical to a GenBank file from NCBI except for some
spacing and word wrap issues.
What is the best way to submit new code for review? Whom do I send it
to and should I send only the modified files?
I've included one of my test scripts below just to show how it works.
(Does anyone suggest any changes in the interface?)
Thank you.
Sincerely,
Howard Salis
Postdoctoral Scholar
UC San Francisco
#ASimpleTest.py
"""A vigorous exercise of the GenBankWriter class and the SeqIO interface."""
from Bio import SeqIO
from Bio import GenBank

working_dir = "E:\\Plasmids\\"

# Get some arbitrarily chosen GenBank files (these are relatively small ones)
gi_list = GenBank.search_for("EF470550 OR EF470551")
print gi_list
ncbi_dict = GenBank.NCBIDictionary("nucleotide", "genbank")

# Write the pair of strings to a single file.
handle = open(working_dir + "Source.gb", "w")
for gi in gi_list:
    handle.write(str(ncbi_dict[gi]))
handle.close()

# Parse the Source file into a SeqRecord generator (two records)
handle = open(working_dir + "Source.gb", "r")
records = SeqIO.parse(handle, "genbank")
# Write many records to a single GenBank file
out_handle = open(working_dir + "ManyRecords.gb", "w")
SeqIO.write(records, out_handle, "genbank")
out_handle.close()
handle.close()

# ----
# Parse the Source file into a SeqRecord generator (two records)
handle = open(working_dir + "Source.gb", "r")
records = SeqIO.parse(handle, "genbank")
# Write individual records into their own GenBank files
counter = 0
for record in records:
    counter += 1
    out_handle = open(working_dir + "OneFile_" + str(counter) + ".gb", "w")
    SeqIO.write(record, out_handle, "genbank")
    out_handle.close()
handle.close()

# Open them back up again, parse them, and write them to a single file
handle = open(working_dir + "ManyRecords_Out.gb", "w")
for num in range(1, counter + 1):
    print num
    in_handle = open(working_dir + "OneFile_" + str(num) + ".gb", "r")
    records = SeqIO.parse(in_handle, "genbank")
    SeqIO.write(records, handle, "genbank")
    in_handle.close()
handle.close()

# Compare the original GenBank file in Source.gb to the GenBankWriter'd one.
original = open(working_dir + "Source.gb", "r")
newone = open(working_dir + "ManyRecords_Out.gb", "r")
records_original = SeqIO.parse(original, "genbank")
records_newone = SeqIO.parse(newone, "genbank")
for (record_original, record_newone) in zip(records_original, records_newone):
    print str(record_original)
    print str(record_newone)
original.close()
newone.close()
print "Done"
From biopython-dev at maubp.freeserve.co.uk Tue May 15 20:59:55 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 15 May 2007 21:59:55 +0100
Subject: [Biopython-dev] About a new GenBankWriter class with SeqIO
interface
In-Reply-To: <9fa7e98e0705151336va7e1c86la137c137883d6886@mail.gmail.com>
References: <9fa7e98e0705151336va7e1c86la137c137883d6886@mail.gmail.com>
Message-ID: <464A1F4B.9020705@maubp.freeserve.co.uk>
Howard Salis wrote:
> Hello everyone,
>
> I started using Biopython in my research and I needed a way to
> write GenBank files from a SeqRecord (which was parsed from other
> GenBank/etc files). So I wrote something up. It uses the SeqIO
> interface and behaves like the fasta writer.
Sounds nice - it's something I've been thinking about doing myself, but I
wanted to do both GenBank and EMBL, sharing the feature table
writing code.
Something else to keep in mind is writing any SeqRecord to a GenBank (or
EMBL) file, even if it did not get created from a GenBank or EMBL file
and is therefore lacking lots of annotation.
> The changes are: a new file called GenBankWriter.py in Bio/GenBank.
> Small changes to the __init__.py of Bio/GenBank. Changes to the
> _feed_first_line function of Scanner.py of Bio/GenBank.
>
> I had to change the way Bio/GenBank/Scanner.py reads the Locus line of
> a GenBank file in order to handle missing data and newer molecule
> types (e.g. ss-RNA, ds-DNA, mt-DNA, etc).
That was recently fixed in Bug 2289
http://bugzilla.open-bio.org/show_bug.cgi?id=2289
> I also add/change a couple
> of lines in __init__.py to store whether a sequence was linear or
> circular and to store the string that encodes its molecule type
> (ss-RNA, etc).
I thought we already stored this information - but I'm not sure off hand.
> The output of SeqIO.write(record,handle,"genbank") is
> functionally identical to a GenBank file from NCBI except for some
> spacing and word wrap issues.
Good :)
> What is the best way to submit new code for review? Whom do I send it
> to and should I send only the modified files?
You could email it directly to me, but it would be better to create a
bug (an "enhancement") and then attach the changes to the bug. Edited
versions of files will do, but patch files are best.
You should use the Unix "diff" command-line tool to create a patch file.
One way to do this on Windows is to install Cygwin.
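As an aside for Windows users without Cygwin: Python's standard-library `difflib.unified_diff` produces the same unified format that `diff -u` writes, so a patch can also be made from Python itself. A minimal sketch; `make_patch` and `diff_text` are illustrative names, not part of Biopython:

```python
import difflib


def make_patch(original_path, modified_path):
    """Return a unified diff (one string) turning the original file into the modified one."""
    with open(original_path) as f:
        original = f.readlines()
    with open(modified_path) as f:
        modified = f.readlines()
    diff = difflib.unified_diff(
        original, modified, fromfile=original_path, tofile=modified_path
    )
    return "".join(diff)


def diff_text(old_text, new_text, old_name="a", new_name="b"):
    """Same idea as make_patch, but for in-memory strings."""
    diff = difflib.unified_diff(
        old_text.splitlines(True),  # keep line endings, as readlines() does
        new_text.splitlines(True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(diff)
```

For example, write the string returned by `make_patch("Scanner.py.orig", "Scanner.py")` to a `.patch` file and attach that to the bug. One caveat: unlike GNU diff, difflib does not emit a "\ No newline at end of file" marker, so both files should end with a newline.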
> I've included one of my test scripts below just to show how it works.
> (Does anyone suggest any changes in the interface?)
Looking at the code, at first glance it looks like you are hooking into
the existing Bio.SeqIO interface nicely.
I look forward to seeing your code Howard.
Peter
From bugzilla-daemon at portal.open-bio.org Wed May 16 01:55:14 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 May 2007 21:55:14 -0400
Subject: [Biopython-dev] [Bug 2294] New: These patches allow one to write a
GenBank file using the SeqIO interface
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2294
Summary: These patches allow one to write a GenBank file using
the SeqIO interface
Product: Biopython
Version: 1.43
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: howard.salis at gmail.com
The SeqIO interface currently reads from, but does not write to the GenBank
format. The GenBank format is widely used and is often chosen as the data
storage format for many plasmid, genome, and other nucleotide editors. By
giving Biopython the capability of writing annotated sequences to the GenBank
format, one can use Biopython to read in raw sequences, analyze and annotate
them, and then view them in a nucleotide visual editor. The following patches
do exactly this and use the current SeqIO interface to do it.
The following attached patches enable the command
SeqIO.write(record, handle, "genbank"), where handle is an open, writable
file object and record is _either_ a SeqRecord generator or one of the
SeqRecords it yields. That is, after manyrecords =
SeqIO.parse(handle, "genbank") and onerecord = manyrecords.next(), one could
pass either manyrecords or onerecord to SeqIO.write(). If a generator
containing multiple records is passed, all records are written to a single
GenBank file; if one record is passed, only that record is written. The
handle is not closed afterwards, so SeqIO.write() may be called multiple
times to write additional records to the same file.
The attached patches make small modifications to Bio/SeqIO/__init__.py and
Bio/SeqIO/InsdcIO.py. The _feed_first_line function in Bio/GenBank/Scanner.py
is altered to handle missing data (it uses a very Pythonic dictionary of test
lambda functions to parse the meaning of words). Finally, a new file is created
called Bio/GenBank/GenBankWriter.py.
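The "dictionary of test lambda functions" idea can be sketched as follows. This is not the code attached to the bug but an illustration under stated assumptions: the names `parse_locus_line` and `_LOCUS_TESTS` are hypothetical, and the accepted molecule types (NA, DNA, RNA, tRNA, rRNA, mRNA, uRNA, optionally with a two-letter prefix such as ss-, ds- or mt-) are an assumption. It uses an ordered list of (predicate, field) pairs rather than a dict, so the order in which the tests are tried is explicit:

```python
import re

# Assumed vocabularies for illustration only.
_MOLECULE = re.compile(r"^([a-z]{2}-)?(NA|DNA|RNA|tRNA|rRNA|mRNA|uRNA)$")
_DATE = re.compile(r"^\d{2}-[A-Z]{3}-\d{4}$")
_DIVISION = re.compile(r"^[A-Z]{3}$")

# Ordered (predicate, field) pairs: the first predicate that accepts a word
# decides which LOCUS field it fills. The molecule test runs before the
# division test so that "DNA" is not mistaken for a division code.
_LOCUS_TESTS = [
    (lambda w: w.isdigit(), "size"),
    (lambda w: w in ("bp", "aa"), "units"),
    (lambda w: _MOLECULE.match(w) is not None, "molecule_type"),
    (lambda w: w.lower() in ("linear", "circular"), "topology"),
    (lambda w: _DATE.match(w) is not None, "date"),
    (lambda w: _DIVISION.match(w) is not None, "division"),
]


def parse_locus_line(line):
    """Parse a LOCUS line by word content rather than by column.

    Fields missing from the line are simply absent from the returned dict.
    """
    words = line.split()
    if not words or words[0] != "LOCUS":
        raise ValueError("Not a LOCUS line: %r" % line)
    if len(words) < 2:
        raise ValueError("LOCUS line has no name: %r" % line)
    fields = {"name": words[1]}
    for word in words[2:]:
        for test, field in _LOCUS_TESTS:
            if field not in fields and test(word):
                fields[field] = word
                break
    return fields
```

Fields missing from a loosely written LOCUS line do not appear in the result; for instance `parse_locus_line("LOCUS pUC19 2686 bp ds-DNA circular")` has no "division" or "date" key. The trade-off is that word-based matching can misclassify an unusual token that the column-based definition would place unambiguously.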
Questions, Comments, Suggestions, Criticisms, etc are welcome.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Wed May 16 01:56:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 May 2007 21:56:32 -0400
Subject: [Biopython-dev] [Bug 2294] These patches allow one to write a
GenBank file using the SeqIO interface
In-Reply-To:
Message-ID: <200705160156.l4G1uWRZ005077@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2294
howard.salis at gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
AssignedTo|biopython-dev at biopython.org |howard.salis at gmail.com
Status|NEW |ASSIGNED
------- Comment #1 from howard.salis at gmail.com 2007-05-15 21:56 EST -------
Created an attachment (id=654)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=654&action=view)
patch to Bio/GenBank/Scanner.py (alters _feed_first_line under GenBank class)
From salish at picasso.ucsf.edu Wed May 16 02:34:08 2007
From: salish at picasso.ucsf.edu (Howard Salis)
Date: Tue, 15 May 2007 19:34:08 -0700
Subject: [Biopython-dev] About a new GenBankWriter class with SeqIO
interface
In-Reply-To: <464A1F4B.9020705@maubp.freeserve.co.uk>
References: <9fa7e98e0705151336va7e1c86la137c137883d6886@mail.gmail.com>
<464A1F4B.9020705@maubp.freeserve.co.uk>
Message-ID: <9fa7e98e0705151934j1aeb2df7ja76e0ce315515d91@mail.gmail.com>
On 5/15/07, Peter wrote:
> Sounds nice - its something I've been thinking about doing myself, but I
> wanted to do both both GenBank and EMBL, sharing the feature table
> writing code.
Yep, since EMBL and GenBank share the same feature format, I've
separated the "foreword", feature table, and sequence write functions.
So if someone wants to write the EMBL writer, they just need to write
the appropriate foreword. I think the sequence data is written out the
same way too. Is that correct?
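A minimal sketch of that split, assuming (as Howard says) that the two formats share the feature-table layout: feature key starting in column 6 and location or qualifiers starting in column 22, with only the 5-character line prefix differing (five blanks in GenBank, "FT   " in EMBL). All names here are hypothetical, records are plain dicts standing in for SeqRecord objects, qualifier values are always quoted, and long values are not wrapped at column 79 as a real writer would need to do:

```python
# Shared feature-table layout: 5-character prefix, key left-justified in
# columns 6-21, location (or /qualifier) starting in column 22.
GENBANK_FEATURE_PREFIX = " " * 5
EMBL_FEATURE_PREFIX = "FT   "


def format_feature(key, location, qualifiers, prefix):
    """Format one feature-table entry; qualifiers is a list of (name, value)."""
    lines = [prefix + key.ljust(16) + location]
    for name, value in qualifiers:
        lines.append(prefix + " " * 16 + '/%s="%s"' % (name, value))
    return "\n".join(lines) + "\n"


class InsdcWriterSketch(object):
    """Template writer: subclasses supply only the format-specific parts
    (the "foreword" and the sequence block) plus the feature-line prefix."""

    feature_prefix = None  # set by subclasses

    def __init__(self, handle):
        self.handle = handle

    def write_record(self, record):
        # record: dict with a "features" list of (key, location, qualifiers)
        self.write_foreword(record)
        for key, location, qualifiers in record.get("features", []):
            self.handle.write(
                format_feature(key, location, qualifiers, self.feature_prefix))
        self.write_sequence(record)
        self.handle.write("//\n")

    def write_foreword(self, record):
        raise NotImplementedError

    def write_sequence(self, record):
        raise NotImplementedError
```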
> Something else to keep in mind is writing any SeqRecord to a GenBank (or
> EMBL) file, even if it did not get created from a GenBank or EMBL file
> and is therefore lacking lots of annotation.
Very true. The GenBankWriter.py will either leave these fields blank,
leave out their keywords entirely if they are optional, or add
something like or when it's necessary
to have something there.
> > I also add/change a couple
> > of lines in __init__.py to store whether a sequence was linear or
> > circular and to store the string that encodes its molecule type
> > (ss-RNA, etc).
>
> I thought we already stored this information - but I'm not sure off hand.
Well, there's the alphabet of the sequence (e.g. UnAmbiguousDNA())
that says whether it's DNA, RNA, peptide, etc, but even if I matched
these up with strings, then the "ss-", "ds-", etc part would be
missing. I just saved the exact wording of the sequence type (e.g.
"ds-DNA", "ss-RNA", etc) to a dictionary key named
self.data.annotations["sequence_type"] in the _FeatureConsumer class
under GenBank. This is in addition to the alphabet of the sequence so
it shouldn't conflict.
> You could email it directly to me, but it would be better to create a
> bug (an "enhancement") and then attached the changes to the bug. Edited
> versions of files will do, but patch files are best.
Ok, done! It's at http://bugzilla.open-bio.org/show_bug.cgi?id=2294
> I look forward to seeing your code Howard.
>
> Peter
Thank you! And I hope to continue to contribute to Biopython.
-Howard
From biopython-dev at maubp.freeserve.co.uk Wed May 16 07:53:41 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 16 May 2007 08:53:41 +0100
Subject: [Biopython-dev] About a new GenBankWriter class with
SeqIO interface
In-Reply-To: <9fa7e98e0705151934j1aeb2df7ja76e0ce315515d91@mail.gmail.com>
References: <9fa7e98e0705151336va7e1c86la137c137883d6886@mail.gmail.com> <464A1F4B.9020705@maubp.freeserve.co.uk>
<9fa7e98e0705151934j1aeb2df7ja76e0ce315515d91@mail.gmail.com>
Message-ID: <464AB885.80305@maubp.freeserve.co.uk>
Howard Salis wrote:
>> Sounds nice - its something I've been thinking about doing myself,
>> but I wanted to do both both GenBank and EMBL, sharing the feature
>> table writing code.
>
> Yep, since EMBL and GenBank share the same feature format, I've
> separated the "foreword", feature table, and sequence write
> functions.
Using "foreword / features / sequence" avoids clashing with the terms
"header" and "footer" used in Bio.SeqIO to mean parts of a
multi-sequence file which do not belong to a specific record. Maybe I
should update Bio/GenBank/Scanner.py to use similar terminology...
> So if someone wants to write the EMBL writer, they just need to write
> the appropriate foreword.
There is also the issue of translation between EMBL/GenBank terminology,
for example where someone has read in an EMBL file and wants to write it
out as a GenBank file. For a simple example, the division class should
probably map: {'PRI': 'MAM', 'BCT': 'PRO', 'UNA': 'UNC'}
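A sketch of applying such a table. Peter's dictionary is keyed by GenBank division codes (PRI, BCT, UNA) with EMBL codes as values, so reading EMBL and writing GenBank uses the inverted table. Only his three examples are included, and passing unknown codes through unchanged is an assumption; a complete table would need both databases' full division lists, and some codes do not map one-to-one:

```python
# Peter's tentative examples: GenBank division code -> EMBL division code.
GENBANK_TO_EMBL_DIVISION = {"PRI": "MAM", "BCT": "PRO", "UNA": "UNC"}

# Reading EMBL and writing GenBank needs the table the other way round.
EMBL_TO_GENBANK_DIVISION = dict(
    (embl, genbank) for genbank, embl in GENBANK_TO_EMBL_DIVISION.items())


def embl_division_to_genbank(code):
    """Translate an EMBL division code; codes not in the table pass through."""
    return EMBL_TO_GENBANK_DIVISION.get(code, code)


def genbank_division_to_embl(code):
    """Translate a GenBank division code; codes not in the table pass through."""
    return GENBANK_TO_EMBL_DIVISION.get(code, code)
```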
> I think the sequence data is stored the same too? Is that correct?
Actually, the way the sequence is printed out is slightly different.
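To illustrate the difference, here is a sketch of the two sequence-block layouts as I understand them from the format documentation (worth checking against real files): GenBank puts the start position right-justified in columns 1-9 before each line of 60 bases, while EMBL indents the bases by five blanks and puts the end position right-justified in columns 73-80. The ORIGIN and SQ header lines are not shown, and the function names are hypothetical:

```python
def _groups_of_ten(chunk):
    """Split a sequence chunk into space-separated blocks of 10 letters."""
    return " ".join(chunk[i:i + 10] for i in range(0, len(chunk), 10))


def genbank_sequence_lines(seq):
    """ORIGIN-block lines: start position in columns 1-9, then up to 60
    lower-case letters in blocks of 10."""
    seq = seq.lower()
    return ["%9d %s" % (start + 1, _groups_of_ten(seq[start:start + 60]))
            for start in range(0, len(seq), 60)]


def embl_sequence_lines(seq):
    """SQ-block lines: 5 blanks, up to 60 letters in blocks of 10 padded to
    a fixed width, then the end position right-justified in columns 73-80."""
    seq = seq.lower()
    lines = []
    for start in range(0, len(seq), 60):
        chunk = seq[start:start + 60]
        lines.append("     %-65s  %8d" % (_groups_of_ten(chunk), start + len(chunk)))
    return lines
```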
>>> I also add/change a couple of lines in __init__.py to store
>>> whether a sequence was linear or circular and to store the string
>>> that encodes its molecule type (ss-RNA, etc).
>> I thought we already stored this information - but I'm not sure off
>> hand.
>
> Well, there's the alphabet of the sequence (e.g. UnAmbiguousDNA())
> that says whether it's DNA, RNA, peptide, etc, but even if I matched
> these ups with strings, then the "ss-", "ds-", etc part would be
> missing. I just saved the exact wording of the sequence type (e.g.
> "ds-DNA", "ss-RNA", etc) to an dictionary key named
> self.data.annotations["sequence_type"] in the _FeatureConsumer class
> under GenBank. This is in addition to the alphabet of the sequence so
> it shouldn't conflict.
That's probably a good idea. However, we would need to check what the
EMBL equivalents are and convert them when writing GenBank files. Maybe
we should just keep things simple and write one of RNA/DNA/Protein only?
> Ok, done! It's at http://bugzilla.open-bio.org/show_bug.cgi?id=2294
I have made some more specific comments on the bug. In this email I have
tried to stick to the broader picture.
Peter
From jfeala at gmail.com Wed May 16 17:25:37 2007
From: jfeala at gmail.com (Jake Feala)
Date: Wed, 16 May 2007 10:25:37 -0700
Subject: [Biopython-dev] interaction networks in biopython
In-Reply-To:
References:
Message-ID: <12c863fe0705161025p46b1ff6v8c6b1e5999b29244@mail.gmail.com>
Thanks Ed and Yair, I'm really glad there's some interest in this!
I'll get started on dusting off my code and adding more documentation.
Steve - great suggestion. I had already looked at NetworkX and was
already thinking about switching over to it as the back-end graph
representation. Are there any issues that I should think about when
creating these extra dependencies?
Also, what is the next step in this process? Should we agree on an
API and class hierarchy before we start dumping code on each other?
Which aspects can we make compatible with other Biopython objects? (I
was thinking maybe parsers for the interaction datasets and the SQL
interface)
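As a starting point for the API discussion, a minimal sketch of a Network object built from an interaction dataset, using only the standard library. The class and function names and the input format (one interaction per line, two tab-separated identifiers) are assumptions for illustration. NetworkX's Graph offers similar add_edge/neighbors/degree operations, so keeping the graph behind a small class like this would make a later switch to NetworkX as the back end easier; `to_dot` writes the Graphviz format that Yair and Steve mention:

```python
class InteractionNetwork(object):
    """Undirected interaction graph stored as an adjacency dict of sets."""

    def __init__(self):
        self._adjacency = {}

    def add_interaction(self, a, b):
        self._adjacency.setdefault(a, set()).add(b)
        self._adjacency.setdefault(b, set()).add(a)

    def neighbors(self, node):
        return sorted(self._adjacency.get(node, ()))

    def degree(self, node):
        return len(self._adjacency.get(node, ()))

    def edges(self):
        """Yield each interaction once, as a sorted (a, b) pair."""
        for a in sorted(self._adjacency):
            for b in sorted(self._adjacency[a]):
                if a <= b:
                    yield (a, b)

    def to_dot(self):
        """Render as a Graphviz dot undirected graph."""
        lines = ["graph interactions {"]
        for a, b in self.edges():
            lines.append('    "%s" -- "%s";' % (a, b))
        lines.append("}")
        return "\n".join(lines)


def parse_pairs(lines):
    """Build a network from lines of two tab-separated identifiers.

    Blank lines and lines starting with '#' are skipped; extra columns are
    ignored.
    """
    network = InteractionNetwork()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        a, b = line.split("\t")[:2]
        network.add_interaction(a, b)
    return network
```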
-Jake
On 5/15/07, Steve Lianoglou wrote:
> Hi,
>
> On May 15, 2007, at 3:25 PM, Yair Benita wrote:
>
> > I would be happy to contribute to this too.
> > Currently I have a python script that uses HPRD to generate
> > protein-protein interaction maps. I have different filtering methods
> > to display only classes of proteins or only links to a specific kegg
> > pathway. It will need a bit of work before I can submit this to CVS.
> > As for drawing the map, I am currently generating a dot file that
> > can be converted to an image using GRAPHVIZ. If anyone wants to
> > suggest anything else, please do.
>
> I've been using NetworkX[1] to play w/ networks/graphs interactively.
> You can display them if you have matplotlib installed, and can save
> the graphs to dot format as well.
>
> -steve
>
> [1] NetworkX: https://networkx.lanl.gov/wiki
>
> >
> > Yair
> >
> >
> > on 5/15/07 2:37 PM, Ed Schofield at edschofield at gmail.com wrote:
> >
> >> On 5/15/07, Jake Feala wrote:
> >>> Hello Biopython people -
> >>>
> >>> With all the new research in genome-wide cellular interaction
> >>> networks I was a little surprised not to see much support for these
> >>> type of data in Biopython. I know that Bioperl has a networks
> >>> package that looks like the kind of thing that I would love to also
> >>> see in Python for all the obvious reasons.
> >>>
> >>> First - has this already been done and I missed it? All I could
> >>> find were a few scattered and application-specific scripts across
> >>> the web, plus the Pathway package in BioPython.
> >>>
> >>> If not, then would there be any interest in development along these
> >>> lines? A while back I wrote a few scripts that parse interaction
> >>> datasets, stick them into a MySQL database, and retrieve the
> >>> interactions into a Network object that can be used to analyze the
> >>> graph of nodes and links. I would be glad to update these to fit
> >>> into the biopython framework, as it would be useful to my own
> >>> research.
> >>>
> >>> One caveat is that I am an engineering PhD student and my
> >>> programming skills are mostly self-taught beyond two Java courses,
> >>> so I might need a little guidance in testing and preparing the code
> >>> for distribution. I have only ever written code for my own personal
> >>> research but I think my style is decent and I would love to get
> >>> better.
> >>>
> >>> Any opinion or advice?
> >>
> >> This would interest me too; I'd be glad to have such functionality in
> >> BioPython. I can offer you some guidance on Python, packaging and
> >> testing, and (if you need it) use of external array packages.
> >>
> >> -- Ed
From jhackney at stanford.edu Wed May 16 18:10:38 2007
From: jhackney at stanford.edu (Jason A. Hackney)
Date: Wed, 16 May 2007 11:10:38 -0700
Subject: [Biopython-dev] interaction networks in biopython
In-Reply-To: <12c863fe0705161025p46b1ff6v8c6b1e5999b29244@mail.gmail.com>
References:
<12c863fe0705161025p46b1ff6v8c6b1e5999b29244@mail.gmail.com>
Message-ID:
Hi All,
I'm also interested in an interaction network class for biopython.
I'm willing to contribute to the effort with either code review or
testing.
Cheers,
Jason
Jason A. Hackney
Postdoctoral Fellow
Department of Microbiology and Immunology
Stanford University
e-mail: jhackney at stanford.edu
lab phone: 650-724-3891
mobile: 650-283-6907
On May 16, 2007, at 10:25 AM, Jake Feala wrote:
> Thanks Ed and Yair, I'm really glad there's some interest in this!
> I'll get started on dusting off my code and adding more documentation.
>
> Steve - great suggestion. I had already seen at NetworkX and was
> already thinking about switching over to this as the back-end graph
> representation. Are there any issues that I should think about when
> creating these extra dependencies?
>
> Also, what is the next step in this process? Should we agree on an
> API and class hierarchy before we start dumping code on each other?
> Which aspects can we make compatible with other Biopython objects? (I
> was thinking maybe parsers for the interaction datasets and the SQL
> interface)
>
> -Jake
From biopython-dev at maubp.freeserve.co.uk Wed May 16 22:05:44 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 16 May 2007 23:05:44 +0100
Subject: [Biopython-dev] [Bug 2294] These patches allow one to write a
GenBank file using the SeqIO interface
In-Reply-To: <200705161904.l4GJ4pue012542@portal.open-bio.org>
References: <200705161904.l4GJ4pue012542@portal.open-bio.org>
Message-ID: <464B8038.9020900@maubp.freeserve.co.uk>
Hi Howard,
I'm replying to the mailing list as you've raised a few more general
issues on bug 2294.
Peter wrote:
>> In the writer class, for your write_file(self, records) method, you
>> explicitly check for and allow "records" to be a single SeqRecord. Don't. Any
>> such "helpfulness" should be done in Bio.SeqIO.write() only, and not the
>> individual write_file. Otherwise we'll end up with a situation where some
>> writers are "helpful" and others are not.
Howard replied:
> Currently, the SeqIO's write function is
>
> def write(sequences, handle, format):
> ...
>
> I can add checks to see if "sequences" is (a) a generator, (b) a SeqRecord
> object, or (c) something else. If (a), then call
> writer_class(handle).write_file(sequences). If (b), then call
> writer_class(handle).write_record(sequences). If (c), spit out an error (for
> now).
I've added a check in Bio.SeqIO.write() for the "sequences" argument
being a SeqRecord (your case b), and if so it now raises a ValueError.
This is better than whatever cryptic error would have happened.
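A minimal sketch of the check described above. It uses a stand-in `SeqRecord` class and a toy FASTA writer so that it runs without Biopython; it is not the actual Bio.SeqIO code, and the error messages are illustrative. It also rejects a filename passed in place of a handle, since the list agreed to accept handles only:

```python
import io


class SeqRecord(object):
    """Minimal stand-in for Bio.SeqRecord.SeqRecord (id and seq only)."""

    def __init__(self, seq, id="<unknown id>"):
        self.seq = seq
        self.id = id


def _write_fasta(record, handle):
    # Toy writer: one header line, whole sequence on one line.
    handle.write(">%s\n%s\n" % (record.id, record.seq))


_WRITERS = {"fasta": _write_fasta}


def write(sequences, handle, format):
    """Write an iterable of SeqRecords to an open handle; return the count."""
    if isinstance(sequences, SeqRecord):
        raise ValueError("Use a list or iterator of SeqRecord objects, "
                         "e.g. [record], not a single SeqRecord")
    if isinstance(handle, str):
        raise TypeError("Need an open file handle, not a filename")
    try:
        writer = _WRITERS[format]
    except KeyError:
        raise ValueError("Unknown format %r" % format)
    count = 0
    for record in sequences:
        writer(record, handle)
        count += 1
    return count


if __name__ == "__main__":
    out = io.StringIO()
    write([SeqRecord("ACGT", id="r1")], out, "fasta")
    print(out.getvalue())
```

A caller holding a single record wraps it in a list, `write([record], handle, "fasta")`, which keeps every individual writer free of special cases.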
I agree that it might be "nicer" if Bio.SeqIO.write() would also accept
a SeqRecord object as input and did the expected thing, but having a
fixed simple API is more straightforward.
For comparison, see the previous discussions on the mailing list about
having the file argument accepting either a handle or a filename (it was
agreed that we would accept handles only).
> Ok, so the standard is very exact in what the LOCUS line should be. However,
> I've found that many programs do not write Genbank files exactly according to
> this standard! So we might want to make the Genbank parser a bit more forgiving
> to small changes in the spacing of Locus line, especially since many programs
> leave out keywords.
Have you got some examples? I would be keen to add a few more test
cases for reasonable GenBank variations.
> As it stands now, the patch code can handle missing keywords in the LOCUS line.
If it doesn't already, the existing column-based code can easily do
this too.
> For example, the code defines a pair of dictionaries with lambda functions as
> their keys
>
> ...
>
> I know this looks crazy, but it works really well. Where else but Python can I
> have a dictionary / hash / whatever with the key being a function! :) Play
> around with the code and you'll see how it works.
Crazy code is scary ;) I'll try and have a play with this at the weekend.
Note that this issue (parsing the LOCUS line) is a bit tangential to
writing GenBank records.
>> It also looks like when you write the LOCUS line you are not following
>> the column based definition
>
> I'll fix this. The writing of the Genbank file should follow the standard to
> the exactitude.
I agree completely - As a general principle we should be a little bit
flexible on reading files, but very strict on output.
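In that spirit, a sketch of writing the LOCUS line strictly by column. The column positions are those given for the LOCUS line in NCBI's GenBank release notes (gbrel.txt) as I understand them and should be checked against the current notes; `format_locus_line` is a hypothetical name. It raises ValueError rather than silently shifting later columns when a field is too wide:

```python
def format_locus_line(name, length, mol_type, topology, division, date,
                      strand=""):
    """Build a GenBank LOCUS line in fixed columns (1-based, inclusive):

    1-5 'LOCUS', 13-28 name, 30-40 length (right-justified), 42-43 'bp',
    45-47 strand ('ss-', 'ds-', 'ms-' or blank), 48-53 molecule type,
    56-63 topology, 65-67 division, 69-79 date (dd-MMM-yyyy).
    """
    length = str(length)
    for value, width, label in ((name, 16, "name"), (length, 11, "length"),
                                (strand, 3, "strand"),
                                (mol_type, 6, "molecule type"),
                                (topology, 8, "topology"),
                                (division, 3, "division"), (date, 11, "date")):
        if len(value) > width:
            raise ValueError("%s %r does not fit in %d columns"
                             % (label, value, width))
    return ("LOCUS" + " " * 7          # columns 1-12
            + name.ljust(16)           # 13-28
            + " "                      # 29
            + length.rjust(11)         # 30-40
            + " bp "                   # 41-44
            + strand.ljust(3)          # 45-47
            + mol_type.ljust(6)        # 48-53
            + "  "                     # 54-55
            + topology.ljust(8)        # 56-63
            + " "                      # 64
            + division.ljust(3)        # 65-67
            + " "                      # 68
            + date)                    # 69-79
```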
Regards,
Peter
From chris.lasher at gmail.com Sat May 19 20:21:03 2007
From: chris.lasher at gmail.com (Chris Lasher)
Date: Sat, 19 May 2007 16:21:03 -0400
Subject: [Biopython-dev] Subversion Repository
In-Reply-To: <464A07BB.8020206@maubp.freeserve.co.uk>
References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com>
<128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com>
<45F235B7.6000409@c2b2.columbia.edu>
<128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com>
<464A07BB.8020206@maubp.freeserve.co.uk>
Message-ID: <128a885f0705191321k32354ecdnafb9912443b9367f@mail.gmail.com>
On 5/15/07, Peter wrote:
> Did you get any information from the Open Bioinformatics Foundation guys
> about moving from CVS to subversion?
I didn't, with regard to public anonymous access to the Subversion
repositories. I'm also on impromptu leave until this upcoming Monday,
but we'll have this up and running by the end of the month.
Chris
From O.Doehring at cs.ucl.ac.uk Mon May 21 19:45:36 2007
From: O.Doehring at cs.ucl.ac.uk (O.Doehring at cs.ucl.ac.uk)
Date: 21 May 2007 20:45:36 +0100
Subject: [Biopython-dev] Biopython to parse not only .pdb-files but also
NACCESS .asa files
Message-ID:
Dear community,
I am applying the tool 'Naccess V2.1.1 - Atomic Solvent Accessible Area
Calculations' to calculate two features which are not contained in
standard .pdb files: atomic accessibility and van der Waals radius. This
is described in the readme file at
http://wolf.bi.umist.ac.uk/naccess/nac_readme.html under 'example output
files', and on the PDB format site at
http://www.wwpdb.org/documentation/format23/sect9.html under 'Atom'.
NACCESS does the following: 'The output format is PDB, with B-factors and
occupancies removed, then atomic accessiblity in square Angstroms, followed
by the assigned van der Waal radius.' Note that the occupancy is replaced by
the atomic accessibility and the B-factor by the van der Waals radius. This
'new' .pdb file has the extension .asa.
I chose a quite straightforward approach: I wanted to use Biopython as
before, e.g. calling the B-factor method but getting the atomic
accessibility instead. But Biopython seems to type-check the .asa file and
complains that the B-factor is not of type float.
Is there a way to access the data of .asa files programmatically via the
Biopython library? Otherwise, the only way seems to be to write a parser
for .asa files, figure out which atom in the .pdb file corresponds to
each one in the .asa file, and finally retrieve the wanted values for
atomic accessibility and van der Waals radius.
Here are some more technical details. As an example I chose the '1DHR'
protein:
------------------------------------------------------------------------------
from Bio.PDB.PDBParser import PDBParser


class compact:
    def __init__(self, structure_id="1DHR", indices=[0]):
        # which residues are part of the patch
        self.indices = indices
        # If 1 (DEFAULT), the exceptions are caught, but some residues or
        # atoms will be missing.
        # THESE EXCEPTIONS ARE DUE TO PROBLEMS IN THE PDB FILE!
        self.p = PDBParser(PERMISSIVE=1)
        # which protein to analyse
        self.structure_id = structure_id
        self.fileName = self.structure_id + '.asa'
        self.structure = self.p.get_structure(self.structure_id, self.fileName)
------------------------------------------------------------------------------
Error message:
Traceback (most recent call last):
  File "C:\Dokumente und Einstellungen\Renate Döhring\workspace\test\src\root\nested\compactness.py", line 249, in
    c = compact(indices=[0,1])
  File "C:\Dokumente und Einstellungen\Renate Döhring\workspace\test\src\root\nested\compactness.py", line 17, in __init__
    self.structure = self.p.get_structure(self.structure_id, self.fileName)
  File "C:\Python25\Lib\site-packages\Bio\PDB\PDBParser.py", line 65, in get_structure
    self._parse(file.readlines())
  File "C:\Python25\Lib\site-packages\Bio\PDB\PDBParser.py", line 85, in _parse
    self.trailer=self._parse_coordinates(coords_trailer)
  File "C:\Python25\Lib\site-packages\Bio\PDB\PDBParser.py", line 159, in _parse_coordinates
    bfactor=float(line[60:66])
ValueError: invalid literal for float(): 31 1.
------------------------------------------------------------------------------
I hope this question has not been discussed before; the search engine at
http://search.open-bio.org/cgi-bin/mail-search.cgi does not work, and I
could not find anything useful via a Google search restricted to the
archive using the 'site:' operator.
What do you recommend for my situation? Many thanks!
Yours,
Orlando
From edschofield at gmail.com Tue May 22 16:57:49 2007
From: edschofield at gmail.com (Ed Schofield)
Date: Tue, 22 May 2007 17:57:49 +0100
Subject: [Biopython-dev] [BioPython] Biopython to parse not only
.pdb-files but also NACCESS .asa files
In-Reply-To:
References:
Message-ID: <1b5a37350705220957o24f6a436k89d60764729695da@mail.gmail.com>
On 21 May 2007 20:45:36 +0100, O.Doehring at cs.ucl.ac.uk
wrote:
>
> ValueError: invalid literal for float(): 31 1.
>
> ...
>
> What do you recommend for my situation. Many thanks!
Is that a space between 31 and 1? There's your problem. My advice is to insert

    import pdb
    pdb.set_trace()

at line 159 in PDBParser.py and check out why the columns in your data
are misaligned with what PDBParser.py expects. A quick scan of
nac_readme.html implies that perhaps you need the -f argument to give
you the full output format?
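If you would rather scan the whole file than step through it with pdb, a small helper along these lines lists every ATOM/HETATM line whose columns 61-66 (the slice line[60:66] that the traceback shows PDBParser passing to float()) do not hold a number. The function name is made up for this sketch; it is not part of Bio.PDB.

```python
def find_bad_bfactor_lines(lines):
    """Return (line_number, field) pairs where line[60:66] is not a float.

    line[60:66] is the slice the traceback shows PDBParser reading as the
    B-factor of ATOM/HETATM records; other record types are skipped.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.startswith(("ATOM", "HETATM")):
            continue
        field = line[60:66]
        try:
            float(field)
        except ValueError:
            # Keep the raw text so the misalignment is visible.
            bad.append((number, field))
    return bad
```

For example, `find_bad_bfactor_lines(open("1DHR.asa"))` would list the offending lines of the file from the traceback.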
But if you need to write your own parser for .asa files, you could use
_parse_coordinates(self, coords_trailer) as a template.
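A minimal sketch of such a parser, written from scratch rather than copied from _parse_coordinates. It assumes the .asa file keeps the standard PDB columns for atom name, residue name, chain and residue number, with the accessibility in columns 55-62 and the van der Waals radius in columns 63-68. That layout is consistent with the "31 1." fragment in the traceback, but please check it against nac_readme.html before relying on it. AsaAtom and parse_asa are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AsaAtom:
    """One atom record from a NACCESS .asa file (hypothetical container)."""
    atom_name: str
    res_name: str
    chain: str
    res_seq: int
    accessibility: float  # solvent accessibility of the atom
    radius: float         # van der Waals radius used by NACCESS


def parse_asa(lines):
    """Yield AsaAtom objects for the ATOM/HETATM lines in an .asa file.

    ASSUMED column layout (0-based slices, check against nac_readme.html):
      atom name [12:16], residue name [17:20], chain [21],
      residue number [22:26], accessibility [54:62], radius [62:68].
    """
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip REMARK and any other non-atom records
        yield AsaAtom(
            atom_name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain=line[21].strip(),
            res_seq=int(line[22:26]),
            accessibility=float(line[54:62]),
            radius=float(line[62:68]),
        )
```

Usage would look like:

    with open("1DHR.asa") as handle:
        for atom in parse_asa(handle):
            print(atom.res_seq, atom.atom_name, atom.accessibility, atom.radius)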
-- Ed
From dalke at dalkescientific.com Sat May 26 10:10:21 2007
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat, 26 May 2007 12:10:21 +0200
Subject: [Biopython-dev] [Biopython-announce] is this supposed to be
really slow?
In-Reply-To:
References:
<20070525233151.GA4507@caltech.edu>
Message-ID:
(Move this from the -announce to the -dev list)
Bryan Smith, replying to Titus Brown wrote:
> i did see this constraint for only one request per 3 seconds, but did
> not realize each time i went through my loop that this was a separate
> request.
> is there anything to do about this constraint?
In your "search_for" call add delay=0.
    def search_for(search, reldate=None, mindate=None, maxdate=None,
                   batchsize=100, delay=2, callback_fn=None,
                   start_id=0, max_ids=None):
        """search_for(search[, reldate][, mindate][, maxdate]
        [, batchsize][, delay][, callback_fn][, start_id][, max_ids]) -> ids
        Search PubMed and return a list of the PMID's that match the
        criteria. search is the search string used to search the
        database. reldate is the number of dates prior to the current
        date to restrict the search. mindate and maxdate are the dates to
        restrict the search, e.g. 2002/01/01. batchsize specifies the
        number of ids to return at one time. By default, it is set to
        10000, the maximum. delay is the number of seconds to wait
        between queries (default 2). callback_fn is an optional callback
        function that will be called as passed a PMID as results are
        retrieved. start_id specifies the index of the first id to
        retrieve and max_ids specifies the maximum number of id's to
        retrieve.
In your Dictionary creation, also add delay=0.
    class Dictionary:
        def __init__(self, delay=5.0, parser=None):
            """Dictionary(delay=5.0, parser=None)
            Create a new Dictionary to access PubMed. parser is an optional
            parser (e.g. Medline.RecordParser) object to change the results
            into another form. If set to None, then the raw contents of the
            file will be returned. delay is the number of seconds to wait
            between each query.
>> I personally tend to just use the NCBI retrieval URLs directly, but
>> that's kind of ugly.
NCBI also watches those requests, and if you do too many
you might get a warning or be blocked off, or so rumor has it.
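Setting delay=0 removes the built-in pause. If you then loop over requests yourself, you may still want to space them out to stay within the one-request-per-3-seconds limit mentioned above. A minimal sketch of a throttle (the Throttle class is hypothetical and is not how Bio.PubMed implements its delay argument):

```python
import time


class Throttle:
    """Enforce a minimum interval, in seconds, between calls to wait()."""

    def __init__(self, delay):
        self.delay = delay
        self._last = None  # monotonic time of the previous wait(), if any

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            # Sleep only for whatever part of the interval is left.
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

In a loop it would be used as follows, where fetch() stands for whatever request function you call:

    throttle = Throttle(3.0)
    for term in terms:
        throttle.wait()
        fetch(term)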
BTW, in your original code you can simplify
> for idx in range( len( termIds ) ):
>     pubDates[idx] = string.atoi( medlineDict[ termIds[ idx ] ].publication_date[ 0:4 ] )
>     idx = idx + 1
to
    for idx, termId in enumerate(termIds):
        pubDates[idx] = int(medlineDict[termId].publication_date[:4])
Andrew
dalke at dalkescientific.com
From chris.lasher at gmail.com Thu May 31 04:30:38 2007
From: chris.lasher at gmail.com (Chris Lasher)
Date: Thu, 31 May 2007 00:30:38 -0400
Subject: [Biopython-dev] Subversion Repository
In-Reply-To: <128a885f0705191321k32354ecdnafb9912443b9367f@mail.gmail.com>
References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com>
<128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com>
<45F235B7.6000409@c2b2.columbia.edu>
<128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com>
<464A07BB.8020206@maubp.freeserve.co.uk>
<128a885f0705191321k32354ecdnafb9912443b9367f@mail.gmail.com>
Message-ID: <128a885f0705302130t628794e7v681dc02058244913@mail.gmail.com>
On 5/19/07, Chris Lasher wrote:
> On 5/15/07, Peter wrote:
> > Did you get any information from the Open Bioinformatics Foundation guys
> > about moving from CVS to subversion?
>
> I didn't, with regards to public anonymous access to the Subversion
> repositories. I'm also on impromptu leave until this upcoming Monday,
> but we'll have this up and running by the end of the month.
>
> Chris
>
I'm obviously missing another target, and BOSC 2007 is fast
approaching. I'm being held up by 4 files that are in the CVS
repository that were foolishly committed with carriage returns (i.e.,
"\r") in the filenames. How that's possible, I have no clue, but I
need to alter the data in the CVS repository so those filenames are
correct, or otherwise completely removed, over the entire history of
those files. Does anyone have any experience with the internals of CVS
repositories? I definitely do not.
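Since CVS keeps each file's entire history in a single RCS ",v" file under the repository root (removed files sit in "Attic" subdirectories), a first step is simply to list the offending names. A minimal sketch, assuming direct filesystem access to a copy of the repository; the function name is hypothetical, and it only reports names, it does not rename anything:

```python
import os


def find_names_with_cr(root):
    """Return every file or directory path under root whose name contains '\\r'."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if "\r" in name:
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Printing the results with repr() makes the embedded carriage returns visible.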
Chris
From biopython-dev at maubp.freeserve.co.uk Thu May 31 09:07:59 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Thu, 31 May 2007 10:07:59 +0100
Subject: [Biopython-dev] Subversion Repository
In-Reply-To: <128a885f0705302130t628794e7v681dc02058244913@mail.gmail.com>
References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <464A07BB.8020206@maubp.freeserve.co.uk> <128a885f0705191321k32354ecdnafb9912443b9367f@mail.gmail.com>
<128a885f0705302130t628794e7v681dc02058244913@mail.gmail.com>
Message-ID: <465E906F.1080704@maubp.freeserve.co.uk>
Chris Lasher wrote:
> On 5/19/07, Chris Lasher wrote:
>> On 5/15/07, Peter wrote:
>>> Did you get any information from the Open Bioinformatics Foundation guys
>>> about moving from CVS to subversion?
>> I didn't, with regards to public anonymous access to the Subversion
>> repositories. I'm also on impromptu leave until this upcoming Monday,
>> but we'll have this up and running by the end of the month.
>>
>> Chris
>>
>
> I'm obviously missing another target, and BOSC 2007 is fast
> approaching.
Are you going to BOSC 2007, Chris?
> I'm being held up by 4 files that are in the CVS
> repository that were foolishly committed with carriage returns (i.e.,
> "\r") in the filenames. How that's possible, I have no clue, but I
> need to alter the data in the CVS repository so those filenames are
> correct, or otherwise completely removed, over the entire history of
> those files. Does anyone have any experience with the internals of CVS
> repositories? I definitely do not.
How strange! I have no experience with the internals of CVS so can't
help you there. What are the four offending files? Maybe we could just
purge them for the move to SVN.
Also, I suspect (but have not checked this) that a few of the example
files in the unit tests have been checked in as binary files rather than
text (due to some odd differences in newlines across platforms). Again,
a CVS expert would probably be able to generate a list of all "binary"
files in the repository fairly easily.
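Files added with "cvs add -kb" should be marked in their RCS ",v" file by an "expand @b@;" entry in the admin header, so a rough list can be produced by scanning the repository directly. A sketch under that assumption (the function name is hypothetical); it reads each header only up to the "desc" keyword and ignores the rest of the file:

```python
import os
import re

# Assumed marker for binary (-kb) files in the RCS admin header.
EXPAND_BINARY = re.compile(rb"^expand\s+@b@;", re.MULTILINE)


def find_binary_rcs_files(repo_root):
    """Return paths of ,v files whose keyword-expansion mode is binary."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        for name in filenames:
            if not name.endswith(",v"):
                continue
            path = os.path.join(dirpath, name)
            header = []
            with open(path, "rb") as handle:
                for line in handle:
                    # Stop at "desc": the expand entry only appears before it.
                    if line.startswith(b"desc"):
                        break
                    header.append(line)
            if EXPAND_BINARY.search(b"".join(header)):
                hits.append(path)
    return sorted(hits)
```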
Peter
From bugzilla-daemon at portal.open-bio.org Thu May 31 13:14:31 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 31 May 2007 09:14:31 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705311314.l4VDEV2X031189@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
------- Comment #17 from mdehoon at ims.u-tokyo.ac.jp 2007-05-31 09:14 EST -------
Created an attachment (id=661)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=661&action=view)
Updated version of Bio/Cluster/cluster.c
From bugzilla-daemon at portal.open-bio.org Thu May 31 13:15:17 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 31 May 2007 09:15:17 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705311315.l4VDFH6D031294@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
------- Comment #18 from mdehoon at ims.u-tokyo.ac.jp 2007-05-31 09:15 EST -------
Created an attachment (id=662)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=662&action=view)
Updated version of Bio/Cluster/clustermodule.c
From bugzilla-daemon at portal.open-bio.org Thu May 31 13:17:31 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 31 May 2007 09:17:31 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200705311317.l4VDHVB7031418@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
------- Comment #19 from mdehoon at ims.u-tokyo.ac.jp 2007-05-31 09:17 EST -------
Could you try with the updated Bio/Cluster/cluster.c,
Bio/Cluster/clustermodule.c (see attachments)? These should solve the problems
with the Cluster unit test. If they work fine, I'll upload them to CVS.
From jfeala at gmail.com Thu May 31 23:52:36 2007
From: jfeala at gmail.com (Jake Feala)
Date: Thu, 31 May 2007 16:52:36 -0700
Subject: [Biopython-dev] interaction networks in biopython
In-Reply-To:
References:
<12c863fe0705161025p46b1ff6v8c6b1e5999b29244@mail.gmail.com>
Message-ID: <12c863fe0705311652w44074269y2256aa127b90843b@mail.gmail.com>
Hi Everybody -
I've been thinking about the possible structure of a "BioNet" package,
and here is what I think would be most useful:
InteractionRecord.py - a storage object for biological interactions,
mirroring information stored in the PSI-MI (Proteomics Standards
Initiative - Molecular Interaction) XML standard - unless someone
knows a better one.
Network.py - network object inheriting a NetworkX graph class with
additional methods for manipulating an InteractionRecord stored with
each edge
InteractionIO - a submodule with parsers to read and write
interactions to/from Cytoscape, PSI-MI, and other formats or online
interaction databases
BioNetSQL - a submodule for storing interactions in, and querying them
from, a local SQL database
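A minimal sketch of how the InteractionRecord/Network split might look, using only the standard library. The record fields are hypothetical placeholders (not the PSI-MI schema), and the dict-of-dicts adjacency stands in for the NetworkX graph class the proposal would inherit from. load() accepts any iterable of records, in the spirit of the net.load(parser) call in the example that follows.

```python
from dataclasses import dataclass, field


@dataclass
class InteractionRecord:
    """Hypothetical minimal record for one pairwise interaction."""
    interactor_a: str
    interactor_b: str
    interaction_type: str = "unknown"
    source: str = ""
    pubmed_ids: list = field(default_factory=list)


class Network:
    """Undirected graph that stores one InteractionRecord per edge."""

    def __init__(self):
        self.adj = {}  # node -> {neighbour: InteractionRecord}

    def add_interaction(self, record):
        a, b = record.interactor_a, record.interactor_b
        # Store the same record under both endpoints (undirected edge).
        self.adj.setdefault(a, {})[b] = record
        self.adj.setdefault(b, {})[a] = record

    def load(self, records):
        for record in records:
            self.add_interaction(record)

    def neighbors(self, node):
        return sorted(self.adj.get(node, {}))

    def record(self, a, b):
        return self.adj[a][b]
```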
I've started on the code, including parsers for Cytoscape, PSI-MI XML
files, and GRID flat files. I haven't fixed up my SQL scripts yet
because I want to rethink the database design. All the code is
available at http://cmrg.ucsd.edu/JakeFeala#software
Here is an example that worked fine for me:
    from Network import *
    f = open("my_grid_file.txt")  # hypothetical file name; not given in the original
    parser = GRIDIterator(f)
    net = create_network()
    net.load(parser)
Are there any suggestions regarding (1) the standard for
InteractionRecord, (2) methods for the Network object, (3) structure
of the SQL database, (4) overall structure of the package? Also, does
anyone want to contribute to any specific part (e.g. Yair can add his
HPRD parser)?
Thanks!
-Jake
On 5/16/07, Jason A. Hackney wrote:
> Hi All,
>
> I'm also interested in an interaction network class for biopython. I'm
> willing to contribute to the effort with either code review or testing.
>
> Cheers,
>
> Jason
>
>
>
> Jason A. Hackney
>
> Postdoctoral Fellow
> Department of Microbiology and Immunology
> Stanford University
>
> e-mail: jhackney at stanford.edu
> lab phone: 650-724-3891
> mobile: 650-283-6907
>
>
>
>
>
> On May 16, 2007, at 10:25 AM, Jake Feala wrote:
>
> Thanks Ed and Yair, I'm really glad there's some interest in this!
> I'll get started on dusting off my code and adding more documentation.
>
> Steve - great suggestion. I had already looked at NetworkX and was
> already thinking about switching over to this as the back-end graph
> representation. Are there any issues that I should think about when
> creating these extra dependencies?
>
> Also, what is the next step in this process? Should we agree on an
> API and class hierarchy before we start dumping code on each other?
> Which aspects can we make compatible with other Biopython objects? (I
> was thinking maybe parsers for the interaction datasets and the SQL
> interface)
>
> -Jake
>
>
> On 5/15/07, Steve Lianoglou wrote:
> Hi,
>
> On May 15, 2007, at 3:25 PM, Yair Benita wrote:
>
>
> I would be happy to contribute to this too.
> Currently I have a Python script that uses HPRD to generate protein-protein
> interaction maps. I have different filtering methods to display only
> classes of proteins or only links to a specific KEGG pathway. It will need
> a bit of work before I can submit this to CVS. As for drawing the map, I am
> currently generating a dot file that can be converted to an image using
> Graphviz. If anyone wants to suggest anything else, please do.
>
> I've been using NetworkX[1] to play w/ networks/graphs interactively.
> You can display them if you have matplotlib installed, and can save
> the graphs to dot format as well.
>
> -steve
>
> [1] NetworkX: https://networkx.lanl.gov/wiki
>
>
>
> Yair
>
>
> on 5/15/07 2:37 PM, Ed Schofield at edschofield at gmail.com wrote:
>
>
> On 5/15/07, Jake Feala wrote:
> Hello Biopython people -
>
> With all the new research in genome-wide cellular interaction
> networks, I was a little surprised not to see much support for this
> type of data in Biopython. I know that Bioperl has a networks package
> that looks like the kind of thing that I would love to also see in
> Python, for all the obvious reasons.
>
> First - has this already been done and I missed it? All I could find
> were a few scattered and application-specific scripts across the web,
> plus the Pathway package in BioPython.
>
> If not, then would there be any interest in development along these
> lines? A while back I wrote a few scripts that parse interaction
> datasets, stick them into a MySQL database, and retrieve the
> interactions into a Network object that can be used to analyze the
> graph of nodes and links. I would be glad to update these to fit into
> the Biopython framework, as it would be useful to my own research.
>
> One caveat is that I am an engineering PhD student and my programming
> skills are mostly self-taught beyond two Java courses, so I might need
> a little guidance in testing and preparing the code for distribution.
> I have only ever written code for my own personal research, but I think
> my style is decent and I would love to get better.
>
> Any opinion or advice?
>
> This would interest me too; I'd be glad to have such functionality in
> BioPython. I can offer you some guidance on Python, packaging and
> testing, and (if you need it) use of external array packages.
>
> -- Ed
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>
>
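For the Graphviz route Yair mentions, writing a dot file needs nothing beyond the standard library. A minimal sketch (to_dot is a hypothetical helper, not a NetworkX or Biopython function) that turns a list of interacting pairs into an undirected DOT graph:

```python
def _quote(node):
    """Quote a node name for DOT, escaping embedded double quotes."""
    return '"' + str(node).replace('"', '\\"') + '"'


def to_dot(edges, name="interactions"):
    """Return an undirected Graphviz DOT description of (node, node) pairs."""
    lines = ["graph %s {" % name]
    for a, b in edges:
        lines.append("    %s -- %s;" % (_quote(a), _quote(b)))
    lines.append("}")
    return "\n".join(lines) + "\n"
```

Writing the returned string to interactions.dot and running `dot -Tpng interactions.dot -o interactions.png` should then render the map.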