From biopython-dev at maubp.freeserve.co.uk Wed Aug 8 06:59:25 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Wed, 08 Aug 2007 11:59:25 +0100 Subject: [Biopython-dev] Subversion Repository (moving from CVS to SVN) In-Reply-To: <128a885f0706181224o609d4a28oe69cb12c5383d45b@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <464A07BB.8020206@maubp.freeserve.co.uk> <128a885f0705191321k32354ecdnafb9912443b9367f@mail.gmail.com> <128a885f0705302130t628794e7v681dc02058244913@mail.gmail.com> <465E906F.1080704@maubp.freeserve.co.uk> <128a885f0706181224o609d4a28oe69cb12c5383d45b@mail.gmail.com> Message-ID: <46B9A20D.4010303@maubp.freeserve.co.uk> Chris Lasher wrote: >>> I'm obviously missing another target, and BOSC 2007 is fast >>> approaching. >> >> Are you going to BOSC 2007 Chris? > > I wish I were going to BOSC, but unfortunately, I will not go. While at BOSC 2007 I had a chance to chat to Jason Stajich from BioPerl and the Open Bioinformatics Foundation (OBF, the nice guys who look after our servers). The BioPerl project is looking at moving from CVS to SVN, and assuming that all goes smoothly, moving Biopython over as well should be simple enough. Peter From mdehoon at c2b2.columbia.edu Tue Aug 7 22:57:23 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue, 7 Aug 2007 22:57:23 -0400 Subject: [Biopython-dev] Bio.Wise Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B601@mail2.exch.c2b2.columbia.edu> Hi everybody, Bio.Wise currently causes a deprecation warning when running the Biopython tests (using Biopython from CVS). 
This warning is caused by the deprecated Bio.SeqIO.FASTA:

    # In Bio.Wise.__init__.py:
    from Bio.SeqIO.FASTA import FastaReader, FastaWriter

The FastaReader, FastaWriter functions are used as follows:

    for filename, input_file in zip(pair, input_files):
        input_file.close()
        FastaWriter(file(input_file.name, "w")).write(FastaReader(file(filename)).next())

To me, it looks like all this does is to read one Fasta record from filename, and then store it in input_file. I was wondering why we go through the Fasta reader/writer instead of reading/writing the file contents directly, as in

    for filename, input_file in zip(pair, input_files):
        input_file.close()
        file(input_file.name, "w").write(file(filename).read())

On a related note, the input_file refers to a temporary file. To create this temporary file, Bio.Wise prefers to use NamedTemporaryFile in the poly module, instead of NamedTemporaryFile in the tempfile module:

    try:
        import poly
        _NamedTemporaryFile = poly.NamedTemporaryFile
    except ImportError:
        import tempfile
        try:
            _NamedTemporaryFile = tempfile.NamedTemporaryFile
        except AttributeError: # no NamedTemporaryFile on 2.2, stuck without it
            _NamedTemporaryFile = tempfile.TemporaryFile

The tempfile module is in the Python standard library, the poly module is not. Is using the poly module still relevant? I am asking since the current code in Bio.Wise does not seem to be handling temporary files correctly, and it'll be easier to fix it if we don't have to consider both poly.NamedTemporaryFile and tempfile.NamedTemporaryFile.

--Michiel.
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From mdehoon at c2b2.columbia.edu Wed Aug 8 20:28:31 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Wed, 8 Aug 2007 20:28:31 -0400 Subject: [Biopython-dev] Bio.Wise References: <6243BAA9F5E0D24DA41B27997D1FD14402B601@mail2.exch.c2b2.columbia.edu> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B602@mail2.exch.c2b2.columbia.edu> Sebastian Bassi wrote: > On 8/7/07, Michiel De Hoon wrote: > > I was wondering why we go through the Fasta reader/writer instead of > > reading/writing the file contents directly, as in > > for filename, input_file in zip(pair, input_files): > > input_file.close() > > file(input_file.name, "w").write(file(filename).read()) > > The old Fasta writer used to write a 70 column formated fasta file. > Your method (and I think also the new seq.io) write the fasta data as > a one big line. Peter, can we change the behavior of SeqIO.write so that it writes the fasta data in some fixed column format? For comparison, Bioperl appears to use a column width of 60 characters: http://www.bioperl.org/wiki/FASTA_sequence_format --Michiel. 
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From biopython-dev at maubp.freeserve.co.uk Thu Aug 9 04:10:22 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Thu, 09 Aug 2007 09:10:22 +0100 Subject: [Biopython-dev] Wrapping sequences in Fasta output In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B602@mail2.exch.c2b2.columbia.edu> References: <6243BAA9F5E0D24DA41B27997D1FD14402B601@mail2.exch.c2b2.columbia.edu> <6243BAA9F5E0D24DA41B27997D1FD14402B602@mail2.exch.c2b2.columbia.edu> Message-ID: <46BACBEE.20301@maubp.freeserve.co.uk> Michiel De Hoon wrote: > Sebastian Bassi wrote: >> On 8/7/07, Michiel De Hoon wrote: >>> I was wondering why we go through the Fasta reader/writer instead of >>> reading/writing the file contents directly, as in >>> for filename, input_file in zip(pair, input_files): >>> input_file.close() >>> file(input_file.name, "w").write(file(filename).read()) >> The old Fasta writer used to write a 70 column formated fasta file. >> Your method (and I think also the new seq.io) write the fasta data as >> a one big line. Maybe wise doesn't like its input as one long line? > Peter, can we change the behavior of SeqIO.write so that it writes the fasta > data in some fixed column format? For comparison, Bioperl appears to use a > column width of 60 characters: > > http://www.bioperl.org/wiki/FASTA_sequence_format > > --Michiel. That would be easy, and might improve compatibility with some tools which recommend the lines be at most 80 letters long. 60 does seem to be considered a default. My personal preference is with no line breaks, partly because I tend to work more with domain sequences (usually less than 100 characters). This also means that when viewing a sequence in a text editor I can simply halve the line number to get the record number. Any other views? Otherwise I'll change Bio.SeqIO to write FASTA files with a max sequence line length of 60. 
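[Editor's sketch] The wrapping behaviour under discussion (a default of 60 characters per sequence line, with the option of no wrapping at all) can be illustrated with the standard library alone. This is only a sketch under stated assumptions: format_fasta and its width parameter are hypothetical names chosen for illustration, and this is not the code in Bio/SeqIO/FastaIO.py.

```python
def format_fasta(title, sequence, width=60):
    """Return one FASTA record as text.

    Hypothetical helper, not part of Biopython. width is the maximum
    number of sequence letters per line; width=None writes the whole
    sequence on a single line (the old "no wrapping" behaviour).
    """
    if width is not None and width < 1:
        raise ValueError("width must be a positive integer or None")
    lines = [">" + title]
    if width is None:
        lines.append(sequence)
    else:
        # Cut the sequence into consecutive chunks of at most `width` letters.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A 130-letter sequence becomes lines of 60, 60 and 10 letters.
    print(format_fasta("example", "ACGT" * 32 + "AC"), end="")
```

Keeping the width as a keyword argument with a default leaves room for callers who prefer 70 or 80 columns, or one long line.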
Peter From bugzilla-daemon at portal.open-bio.org Thu Aug 9 06:58:44 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 9 Aug 2007 06:58:44 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200708091058.l79Awimb007337@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #26 from tiagoantao at gmail.com 2007-08-09 06:58 EST ------- Created an attachment (id=724) --> (http://bugzilla.open-bio.org/attachment.cgi?id=724&action=view) Documentation for the GenePop parser -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From sbassi at gmail.com Thu Aug 9 13:22:59 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Thu, 9 Aug 2007 14:22:59 -0300 Subject: [Biopython-dev] Wrapping sequences in Fasta output In-Reply-To: <46BACBEE.20301@maubp.freeserve.co.uk> References: <6243BAA9F5E0D24DA41B27997D1FD14402B601@mail2.exch.c2b2.columbia.edu> <6243BAA9F5E0D24DA41B27997D1FD14402B602@mail2.exch.c2b2.columbia.edu> <46BACBEE.20301@maubp.freeserve.co.uk> Message-ID: On 8/9/07, Peter wrote: .... > Any other views? Otherwise I'll change Bio.SeqIO to write FASTA files > with a max sequence line length of 60. Do you mean a default length of 60, but could be set to other length if desired (as before with the old fasta writer)? That is good to me. 
-- Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From bugzilla-daemon at portal.open-bio.org Sat Aug 11 01:35:57 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 11 Aug 2007 01:35:57 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200708110535.l7B5Zv63020979@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #27 from mdehoon at ims.u-tokyo.ac.jp 2007-08-11 01:35 EST ------- I have committed the documentation for the GenePop parser to CVS. Next time, please don't attach your patch to a bug report that is unrelated to GenePop. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Aug 11 02:15:03 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 11 Aug 2007 02:15:03 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200708110615.l7B6F3cq023236@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #28 from mdehoon at ims.u-tokyo.ac.jp 2007-08-11 02:15 EST ------- [Comment 22 from Peter] > I've started to write a test case for the code now in Bio/SeqUtils/CheckSum.py > and noticed that while crc64, gcg and seguid will cope with both strings and > Seq objects, crc32 will only cope with strings. > > Any objections to me fixing this like so: [Comment 24 from Michiel] > A better solution would be for Seq to inherit from str, instead of Seq having > str as a member. Then we don't have to modify crc32, and other code in > Biopython will also become simpler. 
[Comment 25 from Peter] > Changing the Seq object to be a subclass of string might be nice... > More importantly, wouldn't this dramatic change break a lot of > existing scripts? Probably something for the mailing list! OK, so I have committed your solution from comment #22 to CVS. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Aug 11 02:37:25 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 11 Aug 2007 02:37:25 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200708110637.l7B6bPhu024290@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #29 from mdehoon at ims.u-tokyo.ac.jp 2007-08-11 02:37 EST ------- I have committed the unit test by Peter (from comment #23) to CVS, with some slight modifications to remove the try/except at the end. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Aug 11 02:43:25 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 11 Aug 2007 02:43:25 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200708110643.l7B6hPRw024637@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #30 from mdehoon at ims.u-tokyo.ac.jp 2007-08-11 02:43 EST ------- [In reply to comment #21]: > Maybe it could be useful to add a 'GCG checksum' attribute to the > BioPython Seq object. 
Note that you can already do that without changing Biopython:

>>> from Bio.Seq import Seq
>>> s = Seq("ACGT")
>>> from Bio.SeqUtils import CheckSum
>>> s.crc32 = CheckSum.crc32(s)
>>>

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Sat Aug 11 02:45:36 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 11 Aug 2007 02:45:36 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200708110645.l7B6jaLh024750@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323

------- Comment #31 from mdehoon at ims.u-tokyo.ac.jp 2007-08-11 02:45 EST -------
(In reply to comment #14 from Sebastian)
> [Michiel:]
> > We should also add your example from #7 to the manual.
>
> I could add it to the wiki, but after the code is in its place in CVS so the
> sample would refer to the proper module.

Could you add your example to the Wiki or the manual? As far as I can tell, we are then ready to close this bug.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From mdehoon at c2b2.columbia.edu Sat Aug 11 02:47:32 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sat, 11 Aug 2007 15:47:32 +0900
Subject: [Biopython-dev] Bio.PopGen tests fail
Message-ID: <46BD5B84.9000302@c2b2.columbia.edu>

The current unit tests for Bio.PopGen fail, apparently due to the output files being missing. Tiago (or somebody else who has those files), could you add them?

--Michiel.
From bugzilla-daemon at portal.open-bio.org Sat Aug 11 03:27:39 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 11 Aug 2007 03:27:39 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200708110727.l7B7Rdcv026746@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323

------- Comment #32 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-11 03:27 EST -------
Regarding comment 26 and comment 27, I think Tiago meant to attach that file to bug 2170 - whoops.

Regarding comment 28, that sounds great Michiel.

I agree that the example in comment 7 would be a nice addition to the manual or wiki.

One final thing, I would like to rewrite the docstring comments for Bio/SeqUtils/CheckSum to follow PEP 257 more closely (which a lot of our code seems to do).

http://www.python.org/dev/peps/pep-0257/

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Sat Aug 11 07:42:03 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 11 Aug 2007 07:42:03 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200708111142.l7BBg3M4014540@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323

------- Comment #33 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-11 07:42 EST -------
In biopython/Bio/SeqUtils/CheckSum.py revision 1.3 I have tweaked the docstring comments to follow PEP 257 and existing Biopython usage more closely.

I also added my change from comment 22 (which Michiel said he did in comment 28, but wasn't in CVS yet).
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Aug 11 08:33:50 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 11 Aug 2007 08:33:50 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200708111233.l7BCXoKj017566@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #34 from mdehoon at ims.u-tokyo.ac.jp 2007-08-11 08:33 EST ------- > I also added my change from comment 22 (which Michiel said he did in comment > 28, but wasn't in CVS yet). Sometimes it takes several hours before a CVS commit actually shows up (I don't know why). If mine arrives later, it may overwrite your change. Let's see in a day or two if something funny shows up in CVS. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython-dev at maubp.freeserve.co.uk Sat Aug 11 08:33:53 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sat, 11 Aug 2007 13:33:53 +0100 Subject: [Biopython-dev] Reorganising the tutorial Message-ID: <46BDACB1.8040304@maubp.freeserve.co.uk> I'd like to expand the section of Chapter 2 on sequence input/output into an entire chapter (between the current Chapter 2 Quick Start and Chapter 3 BLAST). Any comments? Suggestions for examples? 
Peter From mdehoon at c2b2.columbia.edu Sat Aug 11 09:14:21 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sat, 11 Aug 2007 22:14:21 +0900 Subject: [Biopython-dev] Reorganising the tutorial In-Reply-To: <46BDACB1.8040304@maubp.freeserve.co.uk> References: <46BDACB1.8040304@maubp.freeserve.co.uk> Message-ID: <46BDB62D.2010209@c2b2.columbia.edu> Peter wrote: > I'd like to expand the section of Chapter 2 on sequence input/output > into an entire chapter (between the current Chapter 2 Quick Start and > Chapter 3 BLAST). > > Any comments? Suggestions for examples? By all means, go for it. I feel that sequence input/output is a big enough topic to deserve a topic of its own. --Michiel. From mdehoon at c2b2.columbia.edu Sat Aug 11 09:39:15 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sat, 11 Aug 2007 22:39:15 +0900 Subject: [Biopython-dev] Reorganising the tutorial In-Reply-To: <46BDB62D.2010209@c2b2.columbia.edu> References: <46BDACB1.8040304@maubp.freeserve.co.uk> <46BDB62D.2010209@c2b2.columbia.edu> Message-ID: <46BDBC03.5020501@c2b2.columbia.edu> Michiel de Hoon wrote: > Peter wrote: >> I'd like to expand the section of Chapter 2 on sequence input/output >> into an entire chapter (between the current Chapter 2 Quick Start and >> Chapter 3 BLAST). >> >> Any comments? Suggestions for examples? > > By all means, go for it. I feel that sequence input/output is a big > enough topic to deserve a topic of its own. This should be: > enough topic to deserve a chapter of its own. ^^^^^^^ --Michiel. 
From biopython-dev at maubp.freeserve.co.uk Sat Aug 11 16:48:01 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sat, 11 Aug 2007 21:48:01 +0100 Subject: [Biopython-dev] Wrapping sequences in Fasta output In-Reply-To: <46BACBEE.20301@maubp.freeserve.co.uk> References: <6243BAA9F5E0D24DA41B27997D1FD14402B601@mail2.exch.c2b2.columbia.edu> <6243BAA9F5E0D24DA41B27997D1FD14402B602@mail2.exch.c2b2.columbia.edu> <46BACBEE.20301@maubp.freeserve.co.uk> Message-ID: <46BE2081.2080408@maubp.freeserve.co.uk> Peter wrote: > Any other views? Otherwise I'll change Bio.SeqIO to write FASTA files > with a max sequence line length of 60. I've switched the default from no wrapping to 60 characters in Bio/SeqIO/FastaIO.py Peter From tiagoantao at gmail.com Sat Aug 11 17:18:03 2007 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Sat, 11 Aug 2007 22:18:03 +0100 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: <200708110535.l7B5Zv63020979@portal.open-bio.org> References: <200708110535.l7B5Zv63020979@portal.open-bio.org> Message-ID: <6d941f120708111418s7b6607ceo95bd2f3199024bae@mail.gmail.com> Sorry for this mistake. I don't know how it got there, the idea was to attach this to 2170 for Peter to review. Tiago On 8/11/07, bugzilla-daemon at portal.open-bio.org wrote: > http://bugzilla.open-bio.org/show_bug.cgi?id=2323 > > > > > > ------- Comment #27 from mdehoon at ims.u-tokyo.ac.jp 2007-08-11 01:35 EST ------- > I have committed the documentation for the GenePop parser to CVS. > Next time, please don't attach your patch to a bug report that is unrelated to > GenePop. > > > -- > Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are the assignee for the bug, or are watching the assignee. 
> _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- http://www.tiago.org/ps From mdehoon at c2b2.columbia.edu Sun Aug 12 10:18:25 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sun, 12 Aug 2007 23:18:25 +0900 Subject: [Biopython-dev] Bio.Wise In-Reply-To: References: <6243BAA9F5E0D24DA41B27997D1FD14402B601@mail2.exch.c2b2.columbia.edu> Message-ID: <46BF16B1.1010304@c2b2.columbia.edu> Thank you for your explanation. It makes sense now. I have uploaded a modified version of Bio.Wise.__init__.py to CVS. Please let me know if there are any changes you don't agree with. Thanks again, --Michiel. Michael Hoffman wrote: > [Michiel De Hoon] > >> # In Bio.Wise.__init__.py: >> from Bio.SeqIO.FASTA import FastaReader, FastaWriter >> >> The FastaReader, FastaWriter functions are used as follows: >> >> for filename, input_file in zip(pair, input_files): >> input_file.close() >> FastaWriter(file(input_file.name, >> "w")).write(FastaReader(file(filename)).next()) >> >> To me, it looks like all this does is to read one Fasta record from >> filename, >> and then store it in input_file. > > I believe this was done to smooth out troublesome Fasta files because > the Biopython parser was more versatile than that in > Wise2. Specifically, there was a maximum line length restriction in > Wise2. Piping this through a Biopython read/write pairing ensures that > all the lines are short enough to be read in. > >> I am asking since the current code in Bio.Wise does not seem to be >> handling temporary files correctly, and it'll be easier to fix it if >> we don't have to consider both poly.NamedTemporaryFile and >> tempfile.NamedTemporaryFile. > > If it makes things easier for you, please do it by all means. 
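[Editor's sketch] The two points from this thread, re-wrapping the first FASTA record so that every line is short enough for Wise2 and relying only on tempfile.NamedTemporaryFile from the standard library, can be combined as below. This is a sketch under stated assumptions, not the version of Bio.Wise committed to CVS: first_fasta_record and rewrap_to_tempfile are hypothetical names, the 60-column width is an assumed default, and the code targets current Python rather than the Python 2 of the original module.

```python
import os
import tempfile


def first_fasta_record(lines):
    """Return (title, sequence) for the first record in an iterable of lines."""
    title = None
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if title is not None:
                break  # a second record starts here; stop after the first
            title = line[1:]
        elif title is not None:
            chunks.append(line)
    if title is None:
        raise ValueError("no FASTA record found")
    return title, "".join(chunks)


def rewrap_to_tempfile(filename, width=60):
    """Write the first record of filename to a named temporary file.

    Sequence lines are at most `width` letters long. The caller is
    responsible for deleting the returned path (delete=False keeps the
    file on disk so that an external program can open it by name).
    """
    with open(filename) as source:
        title, sequence = first_fasta_record(source)
    handle = tempfile.NamedTemporaryFile(mode="w", suffix=".fasta", delete=False)
    with handle:
        handle.write(">" + title + "\n")
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")
    return handle.name
```

A caller would pass the returned path to the external program and then call os.remove() on it once the program has finished.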
From biopython-dev at maubp.freeserve.co.uk Mon Aug 13 15:04:03 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Mon, 13 Aug 2007 20:04:03 +0100 Subject: [Biopython-dev] Reorganising the tutorial In-Reply-To: <46BDB62D.2010209@c2b2.columbia.edu> References: <46BDACB1.8040304@maubp.freeserve.co.uk> <46BDB62D.2010209@c2b2.columbia.edu> Message-ID: <46C0AB23.10202@maubp.freeserve.co.uk> Michiel de Hoon wrote: > Peter wrote: >> I'd like to expand the section of Chapter 2 on sequence input/output >> into an entire chapter (between the current Chapter 2 Quick Start and >> Chapter 3 BLAST). >> >> Any comments? Suggestions for examples? > > By all means, go for it. I feel that sequence input/output is a big > enough topic to deserve a chapter of its own. I've made the SeqIO section into a new chapter, and added some examples of using it to write files. If anyone spots any typos, please let me know ;) I think we should update the SWISS-PROT and GENBANK sections of the Cookbook chapter to either mention Bio.SeqIO as an alternative to using Bio.SwissProt and Bio.GenBank directly for parsing the files. Any comments? Peter From biopython-dev at maubp.freeserve.co.uk Mon Aug 13 18:59:42 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Mon, 13 Aug 2007 23:59:42 +0100 Subject: [Biopython-dev] Which NCBI / Entrez module? Message-ID: <46C0E25E.6060007@maubp.freeserve.co.uk> I've just been updating the Tutorial to expand the SeqIO documentation into a full chapter, and one of the things it now covers is parsing a handle to an online databases. For the SwissProt example I was guided by the existing tutorial code and used Bio.WWW.ExPASy.get_sprot_raw() which works fine (but interestingly only fetches one record). I then added an example fetching GenBank records from the NCBI, based on the existing tutorial code which uses Bio.GenBank to do some searches and retrieve records by their GI number. 
I decided to use Bio.GenBank.download_many() with Bio.SeqIO.parse() in the new example - and this works nicely.

Now, looking over the code, the "online" parts of Bio.GenBank are using Bio.EUtils, a complex bit of code dated 2003 by Andrew Dalke.

There is another (older and much smaller) module Bio.WWW.NCBI dated 1999-2000 by Jeffrey Chang, which also offers an EUtils interface. This does make an appearance in the tutorial in the "Connecting with biological databases" section.

Bio.WWW.NCBI seems to just build Entrez URLs, and returns raw data as provided by the NCBI. Bio.EUtils says it also does this, and offers a higher level interface supporting history tracking and parsing of query results (in XML).

Is anyone here very familiar with either of these modules? Should we deprecate Bio.WWW.NCBI in favour of Bio.EUtils - or perhaps just update its documentation to recommend using that instead?

Peter

From mdehoon at c2b2.columbia.edu Tue Aug 14 06:40:53 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 14 Aug 2007 06:40:53 -0400
Subject: [Biopython-dev] Which NCBI / Entrez module?
References: <46C0E25E.6060007@maubp.freeserve.co.uk>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B603@mail2.exch.c2b2.columbia.edu>

Peter wrote:
> Is anyone here very familiar with either of these modules? Should we
> deprecate Bio.WWW.NCBI in favour of Bio.EUtils - or perhaps just update
> its documentation to recommend using that instead?

As Bio.EUtils is more advanced than Bio.WWW.NCBI, I'd be in favor of deprecating Bio.WWW.NCBI if there is sufficient documentation for Bio.EUtils. Currently though, we have some documentation for Bio.WWW.NCBI but little for Bio.EUtils.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/ms-tnef
Size: 2912 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/biopython-dev/attachments/20070814/593ff3bc/attachment.bin

From sbassi at gmail.com Tue Aug 14 15:32:14 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Tue, 14 Aug 2007 16:32:14 -0300
Subject: [Biopython-dev] Error in doc.
Message-ID:

I don't know if it is worthwhile to open a bug on bugzilla for this little mistake.

On http://www.bioinformatics.org/bradstuff/bp/tut/Tutorial002.html
There is a line:
>>> from Bio.Tools import Translate
The correct line should be:
>>> from Bio import Translate

I guess that this document may not be the latest official document, but it is under the bioinformatics.org domain and is returned by Google among the top search results.

Best,
SB.

--
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From biopython-dev at maubp.freeserve.co.uk Tue Aug 14 16:30:49 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 14 Aug 2007 21:30:49 +0100
Subject: [Biopython-dev] Error in doc.
In-Reply-To:
References:
Message-ID: <46C210F9.1050905@maubp.freeserve.co.uk>

Sebastian Bassi wrote:
> I don't know if it is worthwhile to open a bug on bugzilla for this little mistake.
>
> On http://www.bioinformatics.org/bradstuff/bp/tut/Tutorial002.html
> There is a line:
>>>> from Bio.Tools import Translate
> The correct line should be:
>>>> from Bio import Translate
>
> I guess that this document may not be the latest official document,
> but it is under the bioinformatics.org domain and is returned by Google
> among the top search results.

That document is an old version of the Biopython tutorial; the latest version is here and that line has been fixed:

http://biopython.org/DIST/docs/tutorial/Tutorial.html

I presume the page you found belongs to Brad Chapman - one of the core contributors to Biopython some years back.
Peter

From bugzilla-daemon at portal.open-bio.org Thu Aug 16 17:15:11 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 16 Aug 2007 17:15:11 -0400
Subject: [Biopython-dev] [Bug 2348] New: Slicing the Seq object (returns a string when use a stride)
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2348

           Summary: Slicing the Seq object (returns a string when use a stride)
           Product: Biopython
           Version: 1.43
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

I think this is a bug introduced by changes in how Python deals with slicing.

Currently we have the following in Bio/Seq.py:

    class Seq:
        ...
        def __getitem__(self, i):
            return self.data[i] # Seq API requirement
        def __getslice__(self, i, j): # Seq API requirement
            i = max(i, 0); j = max(j, 0)
            return Seq(self.data[i:j], self.alphabet)

Quoting: http://docs.python.org/ref/sequence-methods.html

> __getslice__
> Deprecated since release 2.0. Support slice objects as parameters
> to the
> __getitem__() method.

Here is an example of how the current code can fail on any Python 2.x version. These all work:

    from Bio.Seq import Seq
    x = Seq('ACTATCGTAGTACGGCT')
    assert isinstance(x[0], str)
    assert isinstance(x[1], str)
    assert isinstance(x[2], str)
    assert isinstance(x[-1], str)
    assert isinstance(x[1:5], Seq)
    assert isinstance(x[0:-1], Seq)
    assert isinstance(x[:], Seq)

But the following variants, which use a stride or an explicit slice object, will give a string because they are handled by our old-fashioned __getitem__ method:

    x[1:2:3]
    x[slice(1, 2)]
    x[slice(1, 2, 3)]
    x[slice(None)]
    x[slice(None, None)]
    x[slice(None, None, None)]
    x[::-1]
    x[slice(None, None, -1)]

The last two return a reversed string (rather than a reversed Seq).

I propose we remove the Seq object's __getslice__ method, and replace the __getitem__ method with a slice-aware version.
I'll prepare a patch...

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Thu Aug 16 17:33:51 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 16 Aug 2007 17:33:51 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To:
Message-ID: <200708162133.l7GLXp8k012237@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #721 is|0                           |1
           obsolete|                            |

------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-16 17:33 EST -------
Created an attachment (id=730)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=730&action=view)
Patch for Bio/Align/Generic.py to add __getitem__ method (v2)

Updated patch, two changes:
- requesting all/part of a single row returns a Seq, not an alignment object
- returns Seq objects for all/part of a single row or column (not strings)

Recap:

align[r,c] gives a single character as a string
align[r] gives a row as a SeqRecord
align[r,:] or align[r,c1:c2] gives all or part of a row as a Seq
align[:,c] or align[r1:r2,c] gives all or part of a column as a Seq

align[:] and align[:,:] give a copy of the alignment

Anything else gives a sub alignment, e.g.
align[0:2] or align[0:2,:] uses only row 0 and 1
align[:,1:3] uses only columns 1 and 2
align[0:2,1:3] uses only rows 0 & 1 and only cols 1 & 2

NOTE - I am deliberately not attempting to implement __setslice__ at this point.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
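[Editor's sketch] The indexing rules recapped in the bug 1944 comment above can be illustrated with a small stand-in class. This is a sketch under stated assumptions and not the attached patch: SketchAlignment is a hypothetical name, and plain Python strings stand in for the Seq and SeqRecord objects that the real Bio/Align/Generic.py patch is described as returning.

```python
class SketchAlignment:
    """Toy alignment: a list of equal-length strings, one per row.

    Hypothetical stand-in used only to show the dispatch on the index
    type; strings replace Seq/SeqRecord objects.
    """

    def __init__(self, rows):
        self.rows = list(rows)

    def __getitem__(self, index):
        if isinstance(index, int):
            # align[r]: a whole row
            return self.rows[index]
        if isinstance(index, slice):
            # align[r1:r2] or align[:]: a sub alignment (or a copy)
            return SketchAlignment(self.rows[index])
        if not (isinstance(index, tuple) and len(index) == 2):
            raise TypeError("use align[r], align[r1:r2] or align[r, c]")
        row_index, col_index = index
        if isinstance(row_index, int):
            # align[r, c] is one character; align[r, c1:c2] is part of a row
            return self.rows[row_index][col_index]
        if isinstance(col_index, int):
            # align[:, c] or align[r1:r2, c]: all or part of a column
            return "".join(row[col_index] for row in self.rows[row_index])
        # Both indices are slices: a sub alignment
        return SketchAlignment(row[col_index] for row in self.rows[row_index])
```

For example, with rows "ACGT", "AGGT" and "TCGA", align[:, 0] collects the first letter of every row, while align[0:2, 1:3] keeps only rows 0 and 1 and columns 1 and 2.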
From bugzilla-daemon at portal.open-bio.org Thu Aug 16 18:07:15 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 16 Aug 2007 18:07:15 -0400
Subject: [Biopython-dev] [Bug 2348] Slicing the Seq object (returns a string when use a stride)
In-Reply-To:
Message-ID: <200708162207.l7GM7FuS015736@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2348

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-16 18:07 EST -------
Created an attachment (id=731)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=731&action=view)
Patch to Bio/Seq.py to fix slicing with strides

This passes test_seq.py (and I think all the other unit tests) but I would like
someone who uses MutableSeq objects to double check that bit just in case.

The "mini self test" added to the end of Bio/Seq.py in the patch would probably
be better off being added to the test_seq.py unit test.

P.S. I found this issue during work on bug 1944

From bugzilla-daemon at portal.open-bio.org Fri Aug 17 00:56:28 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 17 Aug 2007 00:56:28 -0400
Subject: [Biopython-dev] [Bug 2348] Slicing the Seq object (returns a string when use a stride)
In-Reply-To:
Message-ID: <200708170456.l7H4uSDG019960@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2348

------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2007-08-17 00:56 EST -------
I tried this patch on MutableSeqs, and found no problems with it.
From bugzilla-daemon at portal.open-bio.org Fri Aug 17 10:38:50 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 17 Aug 2007 10:38:50 -0400
Subject: [Biopython-dev] [Bug 2348] Slicing the Seq object (returns a string when use a stride)
In-Reply-To:
Message-ID: <200708171438.l7HEco2h028751@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2348

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-17 10:38 EST -------
Are you happy for me to check this in Michiel (with the little test moved to
test_seq.py)?

From bugzilla-daemon at portal.open-bio.org Sat Aug 18 10:25:19 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 18 Aug 2007 10:25:19 -0400
Subject: [Biopython-dev] [Bug 2348] Slicing the Seq object (returns a string when use a stride)
In-Reply-To:
Message-ID: <200708181425.l7IEPJJC004760@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2348

------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp 2007-08-18 10:25 EST -------
> Are you happy for me to check this in Michiel
> (with the little test moved to test_seq.py)?

That is fine with me.
From bugzilla-daemon at portal.open-bio.org Sat Aug 18 12:35:35 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 18 Aug 2007 12:35:35 -0400
Subject: [Biopython-dev] [Bug 2348] Slicing the Seq object (returns a string when use a stride)
In-Reply-To:
Message-ID: <200708181635.l7IGZZbu015100@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2348

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-18 12:35 EST -------
Fixed in CVS,

Bio/Seq.py revision: 1.12
Tests/test_seq.py revision: 1.3
Tests/output/test_seq revision: 1.3

From bugzilla-daemon at portal.open-bio.org Sat Aug 18 16:19:36 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 18 Aug 2007 16:19:36 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200708182019.l7IKJaHg005402@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED

------- Comment #35 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-18 16:19 EST -------
I've marked this bug as closed since the code and the unit test have been
checked in.

I've put a little example on the wiki here,
http://www.biopython.org/wiki/SeqIO#Using_the_SEGUID_checksum

We should still add something similar to the tutorial (perhaps based on
Sebastian's comment 7 examples?)
From bugzilla-daemon at portal.open-bio.org Sun Aug 19 10:34:09 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 19 Aug 2007 10:34:09 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To:
Message-ID: <200708191434.l7JEY9PW016605@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944

------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2007-08-19 10:34 EST -------
Whereas I am largely happy with this patch, one thing keeps bothering me:

> align[r,:] or align[r,c1:c2] gives all or part of a row as a Seq
> align[:,c] or align[r1:r2,c] gives all or part of a column as a Seq

A Seq is a string with an alphabet attached. I think it is not advisable to
require that all sequences in an alignment have the same alphabet. For example,
one sequence may be IUPACUnambiguousDNA, another one IUPACAmbiguousDNA. Or, one
is IUPACProtein, and another one the generic Alphabet because the user did not
explicitly specify the alphabet when creating the Seq object. I don't see
anything fundamentally wrong with that.

So, if we cannot guarantee that all rows in the alignment have the same
alphabet, then we cannot really return a column of the alignment as a Seq -- we
won't know the appropriate alphabet. From this viewpoint, align[:,c] or
align[r1:r2,c] returning a string seems more natural, and then I'd expect
align[r,:] or align[r,c1:c2] also to return a string.
From bugzilla-daemon at portal.open-bio.org Sun Aug 19 12:01:00 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 19 Aug 2007 12:01:00 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To:
Message-ID: <200708191601.l7JG10Ho022418@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944

------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-19 12:01 EST -------
I thought returning Seq objects rather than strings might be contentious *grin*

From an ideological point of view, returning strings undermines the use of the
Seq object in the first place.

> A Seq is a string with an alphabet attached. I think it is not
> advisable to require that all sequences in an alignment have the
> same alphabet.

We don't have to require this. The alignment as a whole should have an alphabet
even if it is the lowest common denominator (like the generic single letter
alphabet).

It would be reasonable for the user to create a "generic protein" alignment
where some of the SeqRecords have a more precise alphabet such as IUPACProtein.

Or, someone might have a "generic nucleotide" alignment where some SeqRecords
are DNA and others RNA (this is a bit odd).

> For example, one sequence may be IUPACUnambiguousDNA, another one
> IUPACAmbiguousDNA.

That would be fine - in this case the user should construct their alignment
with any of IUPACAmbiguousDNA, generic DNA, generic nucleotide or even generic
single letter.

> Or, one is IUPACProtein, and an another one the generic Alphabet because
> the user did not explicitly specify the alphabet when creating the Seq object.

In this example, the only sensible choice of alphabet for the whole alignment
would be a generic one.

> I don't see anything fundamentally wrong with that.

Neither do I. It's nicer (and probably normal) to have all the sequences in the
same alignment with the same alphabet, but not essential.
> So, if we cannot guarantee that all rows in the alignment have the same
> alphabet, then we cannot really return a column of the alignment as a Seq
> -- we won't know the appropriate alphabet.

But we DO know an appropriate alphabet - whatever was specified for the entire
alignment (even if this is the generic single letter alphabet). So in the patch
I used that for any column or part column.

For any given row or part row, we can take the specific alphabet of the
associated SeqRecord (which may be more specific than the alphabet defined for
the whole alignment).

> From this viewpoint, align[:,c] or align[r1:r2,c] returning a string seems
> more natural, and then I'd expect align[r,:] or align[r,c1:c2] also to
> return a string.

You haven't convinced me.

Note that at the moment, when an alignment is created "by hand", you must
specify an alphabet (defaulting to the generic single letter alphabet would be
reasonable). The add_sequence() method currently only takes strings, so all the
SeqRecords will be created with the same alphabet as specified for the whole
alignment. I think the suggested append() method should accept SeqRecords,
provided their alphabet matches that of the alignment or is a subclass of the
alignment's alphabet.

This restriction can be bypassed by using the SeqIO.to_alignment() function, or
by otherwise assigning SeqRecords directly to the private alignment._records
list.
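The alphabet rule proposed above (an alignment-wide alphabet for columns, and an append() that only accepts records whose alphabet is the same as or more specific than it) can be sketched as follows. All class names here are hypothetical stand-ins defined for the example; they only mimic the idea of Biopython's alphabet hierarchy and are not its actual classes.

```python
class Alphabet(object):
    """Most generic alphabet."""

class SingleLetterAlphabet(Alphabet):
    pass

class ProteinAlphabet(SingleLetterAlphabet):
    pass

class IUPACProtein(ProteinAlphabet):
    pass

class DNAAlphabet(SingleLetterAlphabet):
    pass


class ToyRecord(object):
    def __init__(self, seq, alphabet):
        self.seq = seq            # plain string, for brevity
        self.alphabet = alphabet  # an alphabet instance


class ToyAlignment(object):
    def __init__(self, alphabet):
        self.alphabet = alphabet
        self.records = []

    def append(self, record):
        # Accept the record only if its alphabet is the alignment's alphabet
        # class or a subclass of it (that is, equally or more specific).
        if not isinstance(record.alphabet, type(self.alphabet)):
            raise ValueError("record alphabet is incompatible with the alignment")
        self.records.append(record)

    def column(self, c):
        # A column mixes rows, so it is tagged with the alignment-wide alphabet.
        return "".join(r.seq[c] for r in self.records), self.alphabet


alignment = ToyAlignment(ProteinAlphabet())
alignment.append(ToyRecord("MKT", IUPACProtein()))     # more specific: accepted
alignment.append(ToyRecord("MRT", ProteinAlphabet()))  # same alphabet: accepted
```

Under this rule a row keeps its own, possibly more precise, alphabet, while any column falls back to the alphabet declared for the alignment as a whole.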
From bugzilla-daemon at portal.open-bio.org Sun Aug 19 18:10:48 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 19 Aug 2007 18:10:48 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To:
Message-ID: <200708192210.l7JMAmco015359@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #730 is|0                           |1
           obsolete|                            |

------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-19 18:10 EST -------
Created an attachment (id=732)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=732&action=view)
Patch for Bio/Align/Generic.py to add __getitem__ method (v3)

Updated patch now that I have checked in a fix for bug 2348.

Added an explicit __iter__ method; this makes it very clear to anyone reading
the code how iteration over the rows as SeqRecord objects works, and is
probably a bit faster than having Python do this for us via __getitem__.

From bugzilla-daemon at portal.open-bio.org Thu Aug 23 10:07:18 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 23 Aug 2007 10:07:18 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To:
Message-ID: <200708231407.l7NE7ILq007046@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944

------- Comment #9 from mdehoon at ims.u-tokyo.ac.jp 2007-08-23 10:07 EST -------
> You haven't convinced me.

Then, let's use your current patch for now to address Marc's concerns in the
original post.
We can get back to the design of the Alignment class later, after rethinking
the Seq/SeqRecord classes.

From bugzilla-daemon at portal.open-bio.org Fri Aug 24 10:18:19 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 24 Aug 2007 10:18:19 -0400
Subject: [Biopython-dev] [Bug 2251] [PATCH] NumPy support for BioPython
In-Reply-To:
Message-ID: <200708241418.l7OEIJmD026547@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2251

------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-24 10:18 EST -------
Looking at Ed's patch, from the point of view of the pure python modules in
Biopython, if we currently have something like this trivial example:

import Numeric
m = Numeric.zeros([3,3], "f")

This becomes:

from Bio import numpy_wrapper
m = numpy_wrapper.zeros([3,3], "f")

where Bio.numpy_wrapper and other similar modules act as proxies for the
real Numeric, or numpy's backwards compatible numpy.oldnumeric, depending on
what was used when numpy_selector.c was compiled.

i.e. If Biopython was compiled with Numeric, then numpy_selector.c will tell
the wrapper classes to import Numeric etc.

If Biopython was compiled with NumPy support, then numpy_selector.c will tell
the wrapper classes to import numpy.oldnumeric etc.

This shouldn't matter for anyone compiling from source, but for Windows users I
guess we'll have to provide two versions of the installer during the transition
period from Numeric to NumPy. I imagine that Linux distributions will also have
to handle the switch at some point too...

Are the two C interfaces not binary compatible? Is it really not possible to
make the choice at run time?

Any progress on that 64bit problem?
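For the pure-Python side only, a run-time choice could in principle be a guarded import, sketched below. This is an assumption-laden illustration, not part of Ed's patch: first_importable is a hypothetical helper, it uses importlib (which postdates this thread), and it does nothing for the compiled extensions, which is the harder part of the question.

```python
import importlib


def first_importable(module_names):
    """Return the first module in module_names that imports cleanly."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # not installed here; try the next candidate
    raise ImportError("none of %s could be imported" % ", ".join(module_names))


# Hypothetical usage for the case discussed above. Either package may be
# absent on a given system, so the call is left commented out:
# numeric = first_importable(["numpy.oldnumeric", "Numeric"])
# m = numeric.zeros([3, 3], "f")
```

A wrapper module built this way would pick whichever array package is installed when it is first imported, rather than the one present when Biopython was built.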
From bugzilla-daemon at portal.open-bio.org Mon Aug 27 14:40:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 27 Aug 2007 14:40:32 -0400
Subject: [Biopython-dev] [Bug 2351] New: Make SeqRecord subclass Seq subclass string?
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2351

           Summary: Make SeqRecord subclass Seq subclass string?
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

We've started talking on the mailing list about making the SeqRecord class a
subclass of the Seq object, and making that a subclass of the Python string.

This bug is for holding patches - I suspect a lot of the discussion will
continue on the mailing lists rather than here. I have explicitly left the
"assign to" field pointing at the dev mailing list.

From bugzilla-daemon at portal.open-bio.org Mon Aug 27 18:12:53 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 27 Aug 2007 18:12:53 -0400
Subject: [Biopython-dev] [Bug 2351] Make SeqRecord subclass Seq subclass string?
In-Reply-To:
Message-ID: <200708272212.l7RMCrbZ009227@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2351

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-27 18:12 EST -------
Created an attachment (id=735)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=735&action=view)
Patch to Bio.Seq and Bio.SeqRecord

(1) Makes __str__ return the full sequence as a string for Seq, MutableSeq and
SeqRecord. I think this is essential for making the objects more
interchangeable, but left as-is could cause some confusion to beginners because
it is now a little bit harder to work out which type of object they are dealing
with. We may want to mention something like this in the tutorial:

print x.__class__

(2) Adds __iter__ to SeqRecord, which is passed to the Seq object, allowing
iteration over the sequence as single character strings. Arguably this should
be in a separate patch.

(3) Updates docstrings - e.g. the Seq and MutableSeq method .tostring() is
considered deprecated.

Still lots of things to discuss before we can implement the full subclassing
hierarchy, for example should SeqRecord slicing (__getitem__) return a Seq or a
SeqRecord? If a SeqRecord, then how should the annotation be handled?

From bugzilla-daemon at portal.open-bio.org Tue Aug 28 16:37:20 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 28 Aug 2007 16:37:20 -0400
Subject: [Biopython-dev] [Bug 2351] Make SeqRecord subclass Seq subclass string?
In-Reply-To:
Message-ID: <200708282037.l7SKbKaC019904@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2351

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #735 is|0                           |1
           obsolete|                            |

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-28 16:37 EST -------
Created an attachment (id=736)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=736&action=view)
Patch to Bio.Seq and Bio.SeqRecord (v2)

More controversial patch, which in addition to changes in comment 1 also:

(4) adds __len__ and count to SeqRecord (trivial)

(5) adds __getitem__, __add__, __radd__ methods to SeqRecord (which try to be
sensible with the meta-data)

From bugzilla-daemon at portal.open-bio.org Wed Aug 29 04:25:12 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 29 Aug 2007 04:25:12 -0400
Subject: [Biopython-dev] [Bug 2353] New: Problem parsing Swissprot (UniProt) files
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2353

           Summary: Problem parsing Swissprot (UniProt) files
           Product: Biopython
           Version: 1.43
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: ibdeno at gmail.com

I installed biopython-py24-1.43-1001 via fink on an iBook G4.

I have found that parsing a Uniprot database from the archaeon
M.thermoautotrophicum (downloaded from Integr8) using Bio.SwissProt produces
errors.
For example, the code (in a file called testing.py):

8<--------------------------------------------
# reading a SwissProt entry from a file

from Bio.SwissProt import SProt
from sys import *

handle = open(argv[1])
sp = SProt.Iterator(handle, SProt.RecordParser())
record = sp.next()
print record.entry_name
print record.sequence
--------------------------------------------------->8

run as:

python2.4 testing.py 27.M_thermoautotrophicum.dat

gives:

Traceback (most recent call last):
  File "testing.py", line 8, in ?
    record = sp.next()
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 172, in next
    return self._parser.parse(File.StringHandle(data))
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 296, in parse
    self._scanner.feed(handle, self._consumer)
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 338, in feed
    self._scan_record(uhandle, consumer)
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 343, in _scan_record
    fn(self, uhandle, consumer)
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 483, in _scan_sq
    self._scan_line('SQ', uhandle, consumer.sequence_header, exactly_one=1)
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 365, in _scan_line
    read_and_call(uhandle, event_fn, start=line_type)
  File "/sw/lib/python2.4/site-packages/Bio/ParserSupport.py", line 300, in read_and_call
    raise SyntaxError, errmsg
SyntaxError: Line does not start with 'SQ':
PE 3: Inferred from homology;

I have found that this is due to the presence in this file of lines starting
with "PE" (as in the example) or with "**". Once I eliminate these lines, there
is no problem.

In my opinion the parser should deal more elegantly with cases where the
records don't have a recognized start...

Cheers,

Miguel
From bugzilla-daemon at portal.open-bio.org Wed Aug 29 05:42:30 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 29 Aug 2007 05:42:30 -0400
Subject: [Biopython-dev] [Bug 2353] Problem parsing Swissprot (UniProt) files
In-Reply-To:
Message-ID: <200708290942.l7T9gU8x028892@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2353

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-29 05:42 EST -------
Thanks for your report and the diagnosis. That bug has actually already been
reported and fixed (just two weeks ago).

You'll need to update /sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py
with the latest version from CVS (or wait for the next release of Biopython,
hopefully later this year).

If you don't want to use CVS, then you can download the file from here:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/*checkout*/biopython/Bio/SwissProt/SProt.py?rev=HEAD&cvsroot=biopython&content-type=text/x-python

Make a backup of the old version, just in case ;)

Peter

*** This bug has been marked as a duplicate of bug 2340 ***
From bugzilla-daemon at portal.open-bio.org Wed Aug 29 05:42:38 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 29 Aug 2007 05:42:38 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current Swiss-Prot version 54.0
In-Reply-To:
Message-ID: <200708290942.l7T9gchg028914@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2340

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ibdeno at gmail.com

------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-29 05:42 EST -------
*** Bug 2353 has been marked as a duplicate of this bug. ***

From bugzilla-daemon at portal.open-bio.org Wed Aug 29 07:49:36 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 29 Aug 2007 07:49:36 -0400
Subject: [Biopython-dev] [Bug 2353] Problem parsing Swissprot (UniProt) files
In-Reply-To:
Message-ID: <200708291149.l7TBnaoB003020@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2353

------- Comment #2 from ibdeno at gmail.com 2007-08-29 07:49 EST -------
Created an attachment (id=739)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=739&action=view)
example entry giving the described error
From bugzilla-daemon at portal.open-bio.org Wed Aug 29 07:50:52 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 29 Aug 2007 07:50:52 -0400
Subject: [Biopython-dev] [Bug 2353] Problem parsing Swissprot (UniProt) files
In-Reply-To:
Message-ID: <200708291150.l7TBoqA8003174@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2353

------- Comment #3 from ibdeno at gmail.com 2007-08-29 07:50 EST -------
Dear Peter,

Thank you. I'm very sorry to have submitted a duplicate bug... I searched the
database with the "Line does not start with" present in the error message, but
I realize now I should have done a better search.

I have installed the file you recommended (and removed the SProt.pyc file)
and now I don't get the error on entries with "PE" records, but it still fails
with the same message on entries having records starting with "**". I'm
attaching one (asteriskexample.dat) as an example.

Thank you again!

Miguel

From bugzilla-daemon at portal.open-bio.org Wed Aug 29 08:10:26 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 29 Aug 2007 08:10:26 -0400
Subject: [Biopython-dev] [Bug 2353] Problem parsing Swissprot (UniProt) files
In-Reply-To:
Message-ID: <200708291210.l7TCAQev005528@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2353

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|DUPLICATE                   |

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-29 08:10 EST -------
You are right, it's only a partial duplicate of bug 2340 (the PE lines problem).
The ** lines are a new issue.
I've reopened this bug report.

Searching for ACDB_METTH on www.expasy.org gives a normal looking entry with no
** lines,

http://www.expasy.org/uniprot/O27745.txt
http://www.expasy.org/cgi-bin/get-sprot-entry?O27745

Where did you get the attached ACDB_METTH SwissProt file from?

From bugzilla-daemon at portal.open-bio.org Wed Aug 29 08:17:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 29 Aug 2007 08:17:32 -0400
Subject: [Biopython-dev] [Bug 2353] Problem parsing Swissprot (UniProt) files
In-Reply-To:
Message-ID: <200708291217.l7TCHWHa005909@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2353

------- Comment #5 from ibdeno at gmail.com 2007-08-29 08:17 EST -------
In fact I have the complete data from M.thermoautotrophicum downloaded from
Integr8:

ftp://ftp.ebi.ac.uk/pub/databases/integr8/uniprot/proteomes/27.M_thermoautotrophicum.dat.gz

I thought that this might be too particular to Integr8 and perhaps not worth
hard-wiring in the SProt.py code. However, couldn't the parser cope more
gracefully with a record having an unknown starter identifier? Perhaps giving a
message, but not stopping.

Cheers,

Miguel
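The "give a message, but don't stop" behaviour suggested above might look like the sketch below. It is a toy line scanner written for illustration, not the SProt.py scanner/consumer code: scan_record and KNOWN_LINE_TYPES are hypothetical names, the set of line codes is deliberately incomplete, and the sample record in the usage is invented apart from the entry name taken from this thread.

```python
import warnings

# Two-character line codes this toy scanner understands; "  " marks the
# sequence data lines that follow the SQ header. Deliberately incomplete.
KNOWN_LINE_TYPES = {"ID", "AC", "DE", "FT", "SQ", "  "}


def scan_record(lines):
    """Group the lines of one record by their two-character line code."""
    fields = {}
    for line in lines:
        code = line[:2]
        if code == "//":  # end-of-record marker
            break
        if code not in KNOWN_LINE_TYPES:
            # Report the surprise and carry on, rather than raising.
            warnings.warn("Skipping unrecognised line type %r" % code)
            continue
        # The data part of a line starts after the code and three spaces.
        fields.setdefault(code, []).append(line[5:].rstrip())
    return fields


# Invented sample record with a "**" line between the FT and SQ sections.
sample = [
    "ID   ACDB_METTH   Reviewed;",
    "FT   CHAIN   1   10",
    "**   internal annotation",
    "SQ   SEQUENCE   10 AA;",
    "     MKTAYIAKQR",
    "//",
]
fields = scan_record(sample)
```

Warning and skipping keeps one odd line from aborting a whole file, at the cost of silently dropping data if a new line type turns out to matter, so the warning text should name the code that was skipped.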
From bugzilla-daemon at portal.open-bio.org Thu Aug 30 04:41:02 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 30 Aug 2007 04:41:02 -0400
Subject: [Biopython-dev] [Bug 2353] Swissprot (UniProt) files with ** lines fail to parse
In-Reply-To:
Message-ID: <200708300841.l7U8f2ws031433@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2353

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Problem parsing Swissprot   |Swissprot (UniProt) files
                   |(UniProt) files             |with ** lines fail to parse

------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-30 04:41 EST -------
Changed the summary.

Based on a few of those files from the EBI, the ** lines appear to only be
found between the FT (feature table) and SQ (sequence) sections, if present.

It should be simple to update the parser to ignore these lines when present...

From bugzilla-daemon at portal.open-bio.org Thu Aug 30 04:43:11 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 30 Aug 2007 04:43:11 -0400
Subject: [Biopython-dev] [Bug 2340] Swissprot 54 release (UniProt) files with PE lines (protein evidence) fail
In-Reply-To:
Message-ID: <200708300843.l7U8hBTA031497@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2340

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |biopython-
                   |                            |bugzilla at maubp.freeserve.co.
                   |                            |uk
            Summary|SProt.py fails to parse the |Swissprot 54 release
                   |current Swiss-Prot version  |(UniProt) files with PE
                   |54.0                        |lines (protein evidence)
                   |                            |fail

------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-30 04:43 EST -------
Edited the bug summary to try and help anyone searching for this problem, e.g.
the partial duplicate Bug 2353

From bugzilla-daemon at portal.open-bio.org Thu Aug 30 10:03:01 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 30 Aug 2007 10:03:01 -0400
Subject: [Biopython-dev] [Bug 2174] FDist Support in BioPython
In-Reply-To:
Message-ID: <200708301403.l7UE31Aq016525@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2174

------- Comment #2 from tiagoantao at gmail.com 2007-08-30 10:03 EST -------
Created an attachment (id=740)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=740&action=view)
Diff to tutorial.tex with FDist documentation
From bugzilla-daemon at portal.open-bio.org Thu Aug 30 10:03:50 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 30 Aug 2007 10:03:50 -0400
Subject: [Biopython-dev] [Bug 2174] FDist Support in BioPython
In-Reply-To:
Message-ID: <200708301403.l7UE3oEd016668@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2174

tiagoantao at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         OS/Version|Linux                       |All
            Version|1.24                        |Not Applicable

From bugzilla-daemon at portal.open-bio.org Thu Aug 30 10:05:48 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 30 Aug 2007 10:05:48 -0400
Subject: [Biopython-dev] [Bug 2174] FDist Support in BioPython
In-Reply-To:
Message-ID: <200708301405.l7UE5m81016828@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2174

tiagoantao at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #532 is|0                           |1
           obsolete|                            |

------- Comment #3 from tiagoantao at gmail.com 2007-08-30 10:05 EST -------
(From update of attachment 532)
A much newer version is on CVS

From tiagoantao at gmail.com Thu Aug 30 10:20:32 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 30 Aug 2007 15:20:32 +0100
Subject: [Biopython-dev] Bio.PopGen status
Message-ID: <6d941f120708300720l41290267pdec912102899513f@mail.gmail.com>

Hi!
This is a short mail to update everyone on the effort to create Bio.PopGen. What is currently available doesn't yet deserve to be called a Population Genetics module per se. But I think we are getting there... So what is available? There is code, test code and documentation for working with GenePop files, a format which I suppose is reasonably widely used in population genetics (at least when not considering sequence based data). I am thinking of closing the related bug. There is code, test code and documentation (in this case, under review) to work with FDist. FDist is a moderately used selection detection application. The main purpose of this code is to serve as a "commit exercise" of moderate size before starting to commit more important stuff (so that I learn and make my mistakes on a less important component). 3 important parts follow: Statistics, Coalescent Simulations and HapMap. For these parts there is already code written... Statistics: Ralph Haygood sent me code to deal with sequence based data. I myself have code to deal with non-sequence based data. I will work on merging both code bases. Documentation and test code will follow. At this point I think we could say that we have a bare bones Bio.PopGen module. Coalescent Simulations: There is code already written (and published in a journal) to work with simcoal2. Most documentation is also written. At this point I would guess Bio.PopGen would compare rather favorably with BioPerl. HapMap: Part of the code is written, but more will have to be done. This is the current status of things as I see it from here... Comments, corrections, discussion would be most welcome... -- http://www.tiago.org/ps From tiagoantao at gmail.com Thu Aug 30 10:31:42 2007 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Thu, 30 Aug 2007 15:31:42 +0100 Subject: [Biopython-dev] Jython and sqlite Message-ID: <6d941f120708300731k78443be1x858513d0f56e07ca@mail.gmail.com> Hi!
Just a mail to ask about Jython and sqlite. About Jython: Is there any policy on Jython support? I am myself a Jython user, and in the code that I submitted to Bio.PopGen I tend to try to support Jython at least partially. I sometimes use older Python dialects (because Jython is still on 2.2) and avoid changing directories when calling external applications (the JVM doesn't support the concept of changing directories). The code that I have submitted (not the test code) supports Jython even when calling external applications. sqllite: On code that I intend to submit in the near future (HapMap related), I currently use the sqlite module. The major problem is, that module is Python 2.5 only. Therefore it requires a new version of Python (and probably won't be supported in Jython in the near future or ever). OTOH it is quite convenient: An embedded relational database (without the hassle of asking users to install a database server). Any ideas on this? Regards, Tiago -- http://www.tiago.org/ps From biopython-dev at maubp.freeserve.co.uk Thu Aug 30 10:51:17 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Thu, 30 Aug 2007 15:51:17 +0100 Subject: [Biopython-dev] Jython and sqlite In-Reply-To: <6d941f120708300731k78443be1x858513d0f56e07ca@mail.gmail.com> References: <6d941f120708300731k78443be1x858513d0f56e07ca@mail.gmail.com> Message-ID: <46D6D965.4060101@maubp.freeserve.co.uk> Tiago Ant?o wrote: > Hi! > > Just a mail to ask about Jython and sqlite. > > About Jython: Is there any policy on Jython support? I am myself a > Jython user, and in the code that I submitted to Bio.PopGen I tend to > try to support Jython at least partially. I'm not aware of anyone trying Biopython on Jython (or even other variants like Iron Python). Trying Bio.PopGen on Jython sounds sensible - have you tried the Biopython unit tests to see what happens? > I sometimes use older Python > dialects (because Jython is still on 2.2) ... 
Biopython has a stated dependence on Python 2.3 or later, so writing to Python 2.2 should be fine. I myself use both Python 2.3 and 2.4, but some of the new stuff in 2.5 like generator expressions may tempt me to update my machines. > ... and avoid changing > directories when calling external applications (the JVM doesn't > support the concept of changing directories). The code that I have > submitted (not the test code) supports Jython even when calling > external applications. Good - I would agree with you avoiding changing the current directory is wise (especially as we support multiple OS). Leave that up to the user. > sqllite: On code that I intend to submit in the near future (HapMap > related), I currently use the sqlite module. The major problem is, > that module is Python 2.5 only. Therefore it requires a new version of > Python (and probably won't be supported in Jython in the near future > or ever). OTOH it is quite convenient: An embedded relational database > (without the hassle of asking users to install a database server). Any > ideas on this? I haven't used sqllite before, but my initial reaction is wanting to use a Python 2.5 only module would be bad. Have you looked at the existing BioSQL code in Biopython... 
Peter From bugzilla-daemon at portal.open-bio.org Thu Aug 30 12:34:30 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 30 Aug 2007 12:34:30 -0400 Subject: [Biopython-dev] [Bug 2353] Swissprot (UniProt) files with ** lines fail to parse In-Reply-To: Message-ID: <200708301634.l7UGYUGr026431@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2353 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-30 12:34 EST ------- I think I have fixed this now, please try Bio/SwissProt/SProt.py revision 1.40 from CVS, or from the webpage in about an hours time: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SwissProt/SProt.py?cvsroot=biopython Please post back to let us know if that worked. Thanks. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Aug 30 14:19:49 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 30 Aug 2007 14:19:49 -0400 Subject: [Biopython-dev] [Bug 2230] GenBank __init__.py: _Scanner import In-Reply-To: Message-ID: <200708301819.l7UIJnVO031718@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2230 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |REOPENED Resolution|LATER | ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-30 14:19 EST ------- Was "RESOLVED LATER", re-opening in order to mark as fixed... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Aug 30 14:21:11 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 30 Aug 2007 14:21:11 -0400 Subject: [Biopython-dev] [Bug 2230] GenBank __init__.py: _Scanner import In-Reply-To: Message-ID: <200708301821.l7UILBEp031795@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2230 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |RESOLVED Resolution| |FIXED ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-30 14:21 EST ------- Fixed in Bio/GenBank/__init__.py revision 1.69 Relevant unit tests all pass: test_GenBank, test_GenBankFormat and test_SeqIO -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From tiagoantao at gmail.com Thu Aug 30 18:01:51 2007 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Thu, 30 Aug 2007 23:01:51 +0100 Subject: [Biopython-dev] Jython and sqlite In-Reply-To: <46D6D965.4060101@maubp.freeserve.co.uk> References: <6d941f120708300731k78443be1x858513d0f56e07ca@mail.gmail.com> <46D6D965.4060101@maubp.freeserve.co.uk> Message-ID: <6d941f120708301501l2c2f510ckc1bf0097daa27aef@mail.gmail.com> On 8/30/07, Peter wrote: > I'm not aware of anyone trying Biopython on Jython (or even other > variants like Iron Python). Trying Bio.PopGen on Jython sounds sensible > - have you tried the Biopython unit tests to see what happens? Nothing good. I personally use a customized version of Jython from CVS, based on 2.3. I tried with the supplied version on Kubuntu (Gutsy Gibbon) which is 2.1. And it doesn't even complete the import phase. 
I use Jython myself a bit (Most public software that I do runs inside Java Web Start), and, to be honest, it is still a bit behind CPython. My main question was to see if people would be stressed if I sometimes use old Python dialects because of the fact that I use part of the code inside Jython. Although I suppose that most of the code that I will be committing from now on will not be (even partially) developed inside Jython. > I haven't used sqllite before, but my initial reaction is wanting to use > a Python 2.5 only module would be bad. Have you looked at the existing > BioSQL code in Biopython... I had a brief look at BioSQL. It is not targeted at HapMap at all (I was not expecting it to be). I think I am going to subscribe to the BioSQL mailing list, throw the question about HapMap support and wait for answers, although my guess is that it probably won't happen. In that case probably the best solution would be something like anydbm. Any thoughts? Tiago -- http://www.tiago.org/ps From sbassi at gmail.com Thu Aug 30 18:25:53 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Thu, 30 Aug 2007 19:25:53 -0300 Subject: [Biopython-dev] Jython and sqlite In-Reply-To: <6d941f120708300731k78443be1x858513d0f56e07ca@mail.gmail.com> References: <6d941f120708300731k78443be1x858513d0f56e07ca@mail.gmail.com> Message-ID: On 8/30/07, Tiago Ant?o wrote: > sqllite: On code that I intend to submit in the near future (HapMap > related), I currently use the sqlite module. The major problem is, > that module is Python 2.5 only. Therefore it requires a new version of There is also a sqlite module for previous version of Python. So I guess you could check python version at the beginning of your code and then set the import properly. The code will just run with python >=2.5. For older version, it will require the standalone sqlite executable and pysqlite2 (available from http://pysqlite.org/). 
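The fallback described above could be sketched roughly as follows. Note that this tries the import directly instead of checking the Python version, which is a small deviation from the suggestion; the table and column names are made up for illustration and do not come from any Bio.PopGen code.

```python
# Prefer the sqlite3 module bundled with Python 2.5 and later, and fall back
# to the separately installed pysqlite2 package on older interpreters.
try:
    import sqlite3
except ImportError:
    from pysqlite2 import dbapi2 as sqlite3


def open_snp_store(path=":memory:"):
    """Open (or create) a small embedded database; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snp (rs_id TEXT PRIMARY KEY, position INTEGER)"
    )
    return conn


conn = open_snp_store()
conn.execute("INSERT INTO snp VALUES (?, ?)", ("rs0000001", 12345))
rows = conn.execute("SELECT rs_id, position FROM snp").fetchall()
```

Either import binds the same DB-API style interface to the name sqlite3, so the rest of a module would not need to know which one was found.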
This is what is happening now with reportlab and other external programs that are needed for some biopython modules. This will add an optional software requirement to Biopython, only when running with Python < 2.5 and when the user wants to run your module. -- Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From bugzilla-daemon at portal.open-bio.org Fri Aug 31 03:22:11 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 31 Aug 2007 03:22:11 -0400 Subject: [Biopython-dev] [Bug 2353] Swissprot (UniProt) files with ** lines fail to parse In-Reply-To: Message-ID: <200708310722.l7V7MBtL029808@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2353 ------- Comment #8 from ibdeno at gmail.com 2007-08-31 03:22 EST ------- Hi, sorry for the late answer. I downloaded SProt.py and installed it as before. The parser doesn't fail now on the "**" lines. Strangely enough, though, it seems to fail again on "PE" (protein evidence) lines. See an example: Traceback (most recent call last): File "./molsprot.py", line 196, in ?
main() File "./molsprot.py", line 144, in main for record in iterator: File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 172, in next return self._parser.parse(File.StringHandle(data)) File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 296, in parse self._scanner.feed(handle, self._consumer) File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 338, in feed self._scan_record(uhandle, consumer) File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 343, in _scan_record fn(self, uhandle, consumer) File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 495, in _scan_sq self._scan_line('SQ', uhandle, consumer.sequence_header, exactly_one=1) File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 365, in _scan_line read_and_call(uhandle, event_fn, start=line_type) File "/sw/lib/python2.4/site-packages/Bio/ParserSupport.py", line 300, in read_and_call raise SyntaxError, errmsg SyntaxError: Line does not start with 'SQ': PE 1: Evidence at protein level; I didn't use CVS, just downloaded SProt.py from the link you sent. Perhaps I should download other files? Thank you! Miguel -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Aug 31 03:30:55 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 31 Aug 2007 03:30:55 -0400 Subject: [Biopython-dev] [Bug 2353] Swissprot (UniProt) files with ** lines fail to parse In-Reply-To: Message-ID: <200708310730.l7V7Utub030464@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2353 ------- Comment #9 from ibdeno at gmail.com 2007-08-31 03:30 EST ------- Hi again. Adding to my previous post, I have identified the new culprit. 
They are not normal PE lines but rather look like this one: ** PROSITE; PS00591; GLYCOSYL_HYDROL_F10; FALSE_POS_1. That is, they have a "**" instead of the usual "DR" code. Cheers, Miguel -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Aug 31 06:09:06 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 31 Aug 2007 06:09:06 -0400 Subject: [Biopython-dev] [Bug 2353] Swissprot (UniProt) files with ** lines fail to parse In-Reply-To: Message-ID: <200708311009.l7VA9622005171@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2353 ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-31 06:09 EST ------- The Biopython Swiss-Prot parser expects the different line types in a certain order - my change meant it would allow a ** line between the FT and SQ lines (as in your attachment 739 for ACDB_METTH). I had assumed these ** lines would always be at the same position, but from your description, some entries have ** lines further up (near the DR line). Could you attach the entry causing trouble please? And if you can spot any other variations by eye, that would be good. I may need to make a more general fix to ignore ** lines in multiple places... but I would prefer minimal changes. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
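One way to picture the "more general fix" mentioned in comment #10 is a filter that drops the ** lines before the record reaches the order-sensitive scanner. This is only a sketch of the idea, not the change made in SProt.py; the function name skip_annotation_lines is hypothetical, and the sample record is a heavily abbreviated, made-up entry that reuses the ** and PE lines quoted in this thread.

```python
def skip_annotation_lines(lines, prefix="**"):
    """Yield every line except those that start with the given prefix."""
    for line in lines:
        if not line.startswith(prefix):
            yield line


# Abbreviated, illustrative record; the ID and DR lines are placeholders.
record = [
    "ID   EXAMPLE_ENTRY",
    "DR   PROSITE; PS00000; EXAMPLE; 1.",
    "**   PROSITE; PS00591; GLYCOSYL_HYDROL_F10; FALSE_POS_1.",
    "PE   1: Evidence at protein level;",
    "SQ   SEQUENCE ...",
    "//",
]
kept = list(skip_annotation_lines(record))
```

Filtering up front keeps the scanner's expected line order untouched, at the cost of discarding whatever the ** lines contain.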
From bugzilla-daemon at portal.open-bio.org Fri Aug 31 06:37:01 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 31 Aug 2007 06:37:01 -0400 Subject: [Biopython-dev] [Bug 2353] Swissprot (UniProt) files with ** lines fail to parse In-Reply-To: Message-ID: <200708311037.l7VAb1pp006296@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2353 ------- Comment #11 from ibdeno at gmail.com 2007-08-31 06:37 EST ------- Created an attachment (id=742) --> (http://bugzilla.open-bio.org/attachment.cgi?id=742&action=view) Entry with "**" line in different position Here goes attached an entry giving problems with a "**" line in a position different from the previous one. Cheers, Miguel -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Aug 31 06:38:49 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 31 Aug 2007 06:38:49 -0400 Subject: [Biopython-dev] [Bug 2353] Swissprot (UniProt) files with ** lines fail to parse In-Reply-To: Message-ID: <200708311038.l7VAcnXd006390@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2353 ------- Comment #12 from ibdeno at gmail.com 2007-08-31 06:38 EST ------- Created an attachment (id=743) --> (http://bugzilla.open-bio.org/attachment.cgi?id=743&action=view) context of "**" lines Hi again. This one is the output of: grep -C 2 "^\*\*" 27.M_thermoautotrophicum.dat > asterisklines So you can see the context where these lines appear (in case there are various possibilities) Miguel -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Aug 31 07:52:24 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 31 Aug 2007 07:52:24 -0400 Subject: [Biopython-dev] [Bug 2353] Swissprot (UniProt) files with ** lines fail to parse In-Reply-To: Message-ID: <200708311152.l7VBqOJS009597@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2353 ------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-31 07:52 EST ------- That grep output was a nice idea. It looks like while most cases are FT, **, SQ there are several variations to cover. Please try Bio/SwissProt/SProt.py revision 1.41, which should cope with the ** lines anywhere (except some positions within references). This is the only file changed, so you don't need to worry about updating anything else in Biopython. I have tried this on the entire files 27.M_thermoautotrophicum.dat and 121.T_whipplei_Twist.dat from ftp://ftp.ebi.ac.uk/pub/databases/integr8/uniprot/proteomes/ -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Aug 31 09:38:15 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 31 Aug 2007 09:38:15 -0400 Subject: [Biopython-dev] [Bug 2353] Swissprot (UniProt) files with ** lines fail to parse In-Reply-To: Message-ID: <200708311338.l7VDcFc8016806@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2353 ------- Comment #14 from ibdeno at gmail.com 2007-08-31 09:38 EST ------- Excellent! No errors now. Thank you very much for your help. Best, Miguel -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Aug 31 09:44:21 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 31 Aug 2007 09:44:21 -0400 Subject: [Biopython-dev] [Bug 2353] Swissprot (UniProt) files with ** lines fail to parse In-Reply-To: Message-ID: <200708311344.l7VDiLlY017155@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2353 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |RESOLVED Resolution| |FIXED ------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-31 09:44 EST ------- Great. I'm marking this bug as fixed - please reopen it if you manage to find any other ** files which break. Thanks for the report. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mdehoon at c2b2.columbia.edu Wed Aug 1 01:50:05 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue, 31 Jul 2007 21:50:05 -0400 Subject: [Biopython-dev] Improving the Alignment object References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl> <46AA2727.103@maubp.freeserve.co.uk> <46AAC408.2050703@c2b2.columbia.edu> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B5FD@mail2.exch.c2b2.columbia.edu> Peter wrote: > I'm not sure if requests for part of a single row or column like > [rrr, xxx:yyy:zzz] and [rrr:ppp:qqq, xxx] are best handled by returning > sub-alignments or as special cases (strings/Seq and Seq/SeqRecord > respectively?). Jan wrote: > For instance, the Alignment object should > support changing characters in the alignment without a need of copying > it (using aln[a,x] = "D"). 
Can it be done now with Alignment which is > a list of SeqRecord objects with sequences implemented as immutable Seq > objects ? > If we allow >>> aln[a,x] = "D" then we should also allow >>> aln[a,x:x+4] = "DEFG" >>> aln[a:a+5,x] = "KLMNO" and perhaps even >>> aln[a:a+5,x:x+3] = ["KLMNO","PQRST","UVWXY"] For consistency, I feel that then aln[a,x:y] and aln[a:b,x] should both return a string. --Michiel Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From biopython-dev at maubp.freeserve.co.uk Wed Aug 8 10:59:25 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Wed, 08 Aug 2007 11:59:25 +0100 Subject: [Biopython-dev] Subversion Repository (moving from CVS to SVN) In-Reply-To: <128a885f0706181224o609d4a28oe69cb12c5383d45b@mail.gmail.com> References: <128a885f0610092146y5a184ccfw31d433d228a9b05d@mail.gmail.com> <128a885f0703092006v51581253t143339abd3d9ad75@mail.gmail.com> <45F235B7.6000409@c2b2.columbia.edu> <128a885f0703180914t482ab33bid2c1eebdd9888fd@mail.gmail.com> <464A07BB.8020206@maubp.freeserve.co.uk> <128a885f0705191321k32354ecdnafb9912443b9367f@mail.gmail.com> <128a885f0705302130t628794e7v681dc02058244913@mail.gmail.com> <465E906F.1080704@maubp.freeserve.co.uk> <128a885f0706181224o609d4a28oe69cb12c5383d45b@mail.gmail.com> Message-ID: <46B9A20D.4010303@maubp.freeserve.co.uk> Chris Lasher wrote: >>> I'm obviously missing another target, and BOSC 2007 is fast >>> approaching. >> >> Are you going to BOSC 2007 Chris? > > I wish I were going to BOSC, but unfortunately, I will not go. While at BOSC 2007 I had a chance to chat to Jason Stajich from BioPerl and the Open Bioinformatics Foundation (OBF, the nice guys who look after our servers). The BioPerl project is looking at moving from CVS to SVN, and assuming that all goes smoothly, moving Biopython over as well should be simple enough. 
Peter

From mdehoon at c2b2.columbia.edu Wed Aug 8 02:57:23 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 7 Aug 2007 22:57:23 -0400
Subject: [Biopython-dev] Bio.Wise
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B601@mail2.exch.c2b2.columbia.edu>

Hi everybody,

Bio.Wise currently causes a deprecation warning when running the Biopython tests (using Biopython from CVS). This warning is caused by the deprecated Bio.SeqIO.FASTA:

    # In Bio.Wise.__init__.py:
    from Bio.SeqIO.FASTA import FastaReader, FastaWriter

The FastaReader, FastaWriter functions are used as follows:

    for filename, input_file in zip(pair, input_files):
        input_file.close()
        FastaWriter(file(input_file.name, "w")).write(FastaReader(file(filename)).next())

To me, it looks like all this does is to read one Fasta record from filename, and then store it in input_file. I was wondering why we go through the Fasta reader/writer instead of reading/writing the file contents directly, as in

    for filename, input_file in zip(pair, input_files):
        input_file.close()
        file(input_file.name, "w").write(file(filename).read())

On a related note, the input_file refers to a temporary file. To create this temporary file, Bio.Wise prefers to use NamedTemporaryFile in the poly module, instead of NamedTemporaryFile in the tempfile module:

    try:
        import poly
        _NamedTemporaryFile = poly.NamedTemporaryFile
    except ImportError:
        import tempfile
        try:
            _NamedTemporaryFile = tempfile.NamedTemporaryFile
        except AttributeError:
            # no NamedTemporaryFile on 2.2, stuck without it
            _NamedTemporaryFile = tempfile.TemporaryFile

The tempfile module is in the Python standard library, the poly module is not. Is using the poly module still relevant? I am asking since the current code in Bio.Wise does not seem to be handling temporary files correctly, and it'll be easier to fix it if we don't have to consider both poly.NamedTemporaryFile and tempfile.NamedTemporaryFile.

--Michiel.
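For reference, the direct-copy idea using only the standard library's tempfile module might look like the sketch below. It is written for a current Python, not the 2.3-era interpreters discussed in this thread, the helper name copy_first_fasta_record is hypothetical, and it is not the code in Bio.Wise. It copies only the first record, mirroring the single .next() call on the FastaReader, and it leaves the source line wrapping unchanged.

```python
import tempfile


def copy_first_fasta_record(source_path):
    """Copy the first FASTA record of source_path into a named temp file.

    Returns the path of the temporary file; the caller must delete it.
    """
    out = tempfile.NamedTemporaryFile(mode="w", suffix=".fasta", delete=False)
    try:
        with open(source_path) as handle:
            seen_header = False
            for line in handle:
                if line.startswith(">"):
                    if seen_header:
                        break  # a second record starts here, stop copying
                    seen_header = True
                if seen_header:
                    out.write(line)
    finally:
        out.close()
    return out.name
```

Passing delete=False keeps the file on disk after it is closed, so an external program can be handed the path; cleaning it up afterwards is then the caller's job.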
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From mdehoon at c2b2.columbia.edu Thu Aug 9 00:28:31 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Wed, 8 Aug 2007 20:28:31 -0400 Subject: [Biopython-dev] Bio.Wise References: <6243BAA9F5E0D24DA41B27997D1FD14402B601@mail2.exch.c2b2.columbia.edu> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B602@mail2.exch.c2b2.columbia.edu> Sebastian Bassi wrote: > On 8/7/07, Michiel De Hoon wrote: > > I was wondering why we go through the Fasta reader/writer instead of > > reading/writing the file contents directly, as in > > for filename, input_file in zip(pair, input_files): > > input_file.close() > > file(input_file.name, "w").write(file(filename).read()) > > The old Fasta writer used to write a 70 column formated fasta file. > Your method (and I think also the new seq.io) write the fasta data as > a one big line. Peter, can we change the behavior of SeqIO.write so that it writes the fasta data in some fixed column format? For comparison, Bioperl appears to use a column width of 60 characters: http://www.bioperl.org/wiki/FASTA_sequence_format --Michiel. 
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From biopython-dev at maubp.freeserve.co.uk Thu Aug 9 08:10:22 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Thu, 09 Aug 2007 09:10:22 +0100 Subject: [Biopython-dev] Wrapping sequences in Fasta output In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B602@mail2.exch.c2b2.columbia.edu> References: <6243BAA9F5E0D24DA41B27997D1FD14402B601@mail2.exch.c2b2.columbia.edu> <6243BAA9F5E0D24DA41B27997D1FD14402B602@mail2.exch.c2b2.columbia.edu> Message-ID: <46BACBEE.20301@maubp.freeserve.co.uk> Michiel De Hoon wrote: > Sebastian Bassi wrote: >> On 8/7/07, Michiel De Hoon wrote: >>> I was wondering why we go through the Fasta reader/writer instead of >>> reading/writing the file contents directly, as in >>> for filename, input_file in zip(pair, input_files): >>> input_file.close() >>> file(input_file.name, "w").write(file(filename).read()) >> The old Fasta writer used to write a 70 column formated fasta file. >> Your method (and I think also the new seq.io) write the fasta data as >> a one big line. Maybe wise doesn't like its input as one long line? > Peter, can we change the behavior of SeqIO.write so that it writes the fasta > data in some fixed column format? For comparison, Bioperl appears to use a > column width of 60 characters: > > http://www.bioperl.org/wiki/FASTA_sequence_format > > --Michiel. That would be easy, and might improve compatibility with some tools which recommend the lines be at most 80 letters long. 60 does seem to be considered a default. My personal preference is with no line breaks, partly because I tend to work more with domain sequences (usually less than 100 characters). This also means that when viewing a sequence in a text editor I can simply halve the line number to get the record number. Any other views? Otherwise I'll change Bio.SeqIO to write FASTA files with a max sequence line length of 60. 
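The wrapping proposed here could be sketched as below, with 60 as the default and the width left adjustable. This is an illustration only, not the Bio.SeqIO implementation; the function name format_fasta and its parameters are hypothetical.

```python
def format_fasta(title, sequence, width=60):
    """Return one FASTA record with the sequence wrapped at `width` columns."""
    if width < 1:
        raise ValueError("width must be a positive integer")
    lines = [">" + title]
    # Slice the sequence into consecutive chunks of at most `width` letters.
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


record = format_fasta("example", "ACGT" * 40)  # a 160 letter sequence
```

With the default width the 160 letter example comes out as sequence lines of 60, 60 and 40 letters after the title line.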
Peter From bugzilla-daemon at portal.open-bio.org Thu Aug 9 10:58:44 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 9 Aug 2007 06:58:44 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200708091058.l79Awimb007337@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #26 from tiagoantao at gmail.com 2007-08-09 06:58 EST ------- Created an attachment (id=724) --> (http://bugzilla.open-bio.org/attachment.cgi?id=724&action=view) Documentation for the GenePop parser -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From sbassi at gmail.com Thu Aug 9 17:22:59 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Thu, 9 Aug 2007 14:22:59 -0300 Subject: [Biopython-dev] Wrapping sequences in Fasta output In-Reply-To: <46BACBEE.20301@maubp.freeserve.co.uk> References: <6243BAA9F5E0D24DA41B27997D1FD14402B601@mail2.exch.c2b2.columbia.edu> <6243BAA9F5E0D24DA41B27997D1FD14402B602@mail2.exch.c2b2.columbia.edu> <46BACBEE.20301@maubp.freeserve.co.uk> Message-ID: On 8/9/07, Peter wrote: .... > Any other views? Otherwise I'll change Bio.SeqIO to write FASTA files > with a max sequence line length of 60. Do you mean a default length of 60, but could be set to other length if desired (as before with the old fasta writer)? That is good to me. 
-- Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From bugzilla-daemon at portal.open-bio.org Sat Aug 11 05:35:57 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 11 Aug 2007 01:35:57 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200708110535.l7B5Zv63020979@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #27 from mdehoon at ims.u-tokyo.ac.jp 2007-08-11 01:35 EST ------- I have committed the documentation for the GenePop parser to CVS. Next time, please don't attach your patch to a bug report that is unrelated to GenePop. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Aug 11 06:15:03 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 11 Aug 2007 02:15:03 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200708110615.l7B6F3cq023236@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #28 from mdehoon at ims.u-tokyo.ac.jp 2007-08-11 02:15 EST ------- [Comment 22 from Peter] > I've started to write a test case for the code now in Bio/SeqUtils/CheckSum.py > and noticed that while crc64, gcg and seguid will cope with both strings and > Seq objects, crc32 will only cope with strings. > > Any objections to me fixing this like so: [Comment 24 from Michiel] > A better solution would be for Seq to inherit from str, instead of Seq having > str as a member. Then we don't have to modify crc32, and other code in > Biopython will also become simpler. 
[Comment 25 from Peter] > Changing the Seq object to be a subclass of string might be nice... > More importantly, wouldn't this dramatic change break a lot of > existing scripts? Probably something for the mailing list! OK, so I have committed your solution from comment #22 to CVS. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Aug 11 06:37:25 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 11 Aug 2007 02:37:25 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200708110637.l7B6bPhu024290@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #29 from mdehoon at ims.u-tokyo.ac.jp 2007-08-11 02:37 EST ------- I have committed the unit test by Peter (from comment #23) to CVS, with some slight modifications to remove the try/except at the end. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Aug 11 06:43:25 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 11 Aug 2007 02:43:25 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200708110643.l7B6hPRw024637@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #30 from mdehoon at ims.u-tokyo.ac.jp 2007-08-11 02:43 EST ------- [In reply to comment #21]: > Maybe it could be useful to add a 'GCG checksum' attribute to the > BioPython Seq object. 
Note that you can already do that without changing Biopython: >>> from Bio.Seq import Seq >>> s = Seq("ACGT") >>> from Bio.SeqUtils import CheckSum >>> s.crc32 = CheckSum.crc32(s) >>> -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Aug 11 06:45:36 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 11 Aug 2007 02:45:36 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200708110645.l7B6jaLh024750@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #31 from mdehoon at ims.u-tokyo.ac.jp 2007-08-11 02:45 EST ------- (In reply to comment #14 from Sebastian) > [Michiel:] > > We should also add your example from #7 to the manual. > > I could add it to the wiki, but after the code is in its place in CVS so the > sample would refer to the proper module. Could you add your example to the Wiki or the manual? As far as I can tell, we are then ready to close this bug. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mdehoon at c2b2.columbia.edu Sat Aug 11 06:47:32 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sat, 11 Aug 2007 15:47:32 +0900 Subject: [Biopython-dev] Bio.PopGen tests fail Message-ID: <46BD5B84.9000302@c2b2.columbia.edu> The current unit tests for Bio.PopGen fail, apparently due to the output files being missing. Tiago (or somebody else who has those files), could you add them? --Michiel. 
From bugzilla-daemon at portal.open-bio.org Sat Aug 11 07:27:39 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 11 Aug 2007 03:27:39 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200708110727.l7B7Rdcv026746@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #32 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-11 03:27 EST ------- Regarding comment 26 and comment 27, I think Tiago meant to attach that file to bug 2170 - whoops. Regarding comment 28, that sounds great Michiel. I agree that the example in comment 7 would be a nice addition to the manual or wiki. One final thing, I would like to rewrite the docstring comments for Bio/SeqUtils/CheckSum to follow PEP 257 more closely (which a lot of our code seems to do). http://www.python.org/dev/peps/pep-0257/ -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Aug 11 11:42:03 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 11 Aug 2007 07:42:03 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200708111142.l7BBg3M4014540@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #33 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-11 07:42 EST ------- In biopython/Bio/SeqUtils/CheckSum.py revision 1.3 I have tweaked the docstring comments to follow PEP 257 and existing Biopython usage more closely. I also added my change from comment 22 (which Michiel said he did in comment 28, but wasn't in CVS yet).
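For illustration, the idea behind the comment 22 change (letting crc32 accept either a plain string or a Seq object) could be sketched roughly as follows. This is not the code committed to Bio/SeqUtils/CheckSum.py, which is not reproduced in this archive: MiniSeq and crc32_any are hypothetical names, and the sketch assumes a Seq-like object keeps its letters in a .data attribute. It is written for current Python rather than the Python 2.x of the thread.

```python
import zlib


class MiniSeq:
    """Hypothetical stand-in for a Seq-like object that stores its letters in .data."""

    def __init__(self, data):
        self.data = data


def crc32_any(seq):
    """Return the CRC32 checksum of a plain string or of a Seq-like object."""
    # Seq-like objects expose the sequence as .data; plain strings are used as-is.
    text = getattr(seq, "data", seq)
    # zlib.crc32 takes bytes on Python 3; the mask keeps the result unsigned.
    return zlib.crc32(text.encode("ascii")) & 0xFFFFFFFF
```

With this shape, callers never need to know which of the two types they hold; the checksum of MiniSeq("ACGT") equals the checksum of "ACGT".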
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Aug 11 12:33:50 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 11 Aug 2007 08:33:50 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200708111233.l7BCXoKj017566@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #34 from mdehoon at ims.u-tokyo.ac.jp 2007-08-11 08:33 EST ------- > I also added my change from comment 22 (which Michiel said he did in comment > 28, but wasn't in CVS yet). Sometimes it takes several hours before a CVS commit actually shows up (I don't know why). If mine arrives later, it may overwrite your change. Let's see in a day or two if something funny shows up in CVS. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython-dev at maubp.freeserve.co.uk Sat Aug 11 12:33:53 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sat, 11 Aug 2007 13:33:53 +0100 Subject: [Biopython-dev] Reorganising the tutorial Message-ID: <46BDACB1.8040304@maubp.freeserve.co.uk> I'd like to expand the section of Chapter 2 on sequence input/output into an entire chapter (between the current Chapter 2 Quick Start and Chapter 3 BLAST). Any comments? Suggestions for examples? 
Peter From mdehoon at c2b2.columbia.edu Sat Aug 11 13:14:21 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sat, 11 Aug 2007 22:14:21 +0900 Subject: [Biopython-dev] Reorganising the tutorial In-Reply-To: <46BDACB1.8040304@maubp.freeserve.co.uk> References: <46BDACB1.8040304@maubp.freeserve.co.uk> Message-ID: <46BDB62D.2010209@c2b2.columbia.edu> Peter wrote: > I'd like to expand the section of Chapter 2 on sequence input/output > into an entire chapter (between the current Chapter 2 Quick Start and > Chapter 3 BLAST). > > Any comments? Suggestions for examples? By all means, go for it. I feel that sequence input/output is a big enough topic to deserve a topic of its own. --Michiel. From mdehoon at c2b2.columbia.edu Sat Aug 11 13:39:15 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sat, 11 Aug 2007 22:39:15 +0900 Subject: [Biopython-dev] Reorganising the tutorial In-Reply-To: <46BDB62D.2010209@c2b2.columbia.edu> References: <46BDACB1.8040304@maubp.freeserve.co.uk> <46BDB62D.2010209@c2b2.columbia.edu> Message-ID: <46BDBC03.5020501@c2b2.columbia.edu> Michiel de Hoon wrote: > Peter wrote: >> I'd like to expand the section of Chapter 2 on sequence input/output >> into an entire chapter (between the current Chapter 2 Quick Start and >> Chapter 3 BLAST). >> >> Any comments? Suggestions for examples? > > By all means, go for it. I feel that sequence input/output is a big > enough topic to deserve a topic of its own. This should be: > enough topic to deserve a chapter of its own. ^^^^^^^ --Michiel. 
From biopython-dev at maubp.freeserve.co.uk Sat Aug 11 20:48:01 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sat, 11 Aug 2007 21:48:01 +0100 Subject: [Biopython-dev] Wrapping sequences in Fasta output In-Reply-To: <46BACBEE.20301@maubp.freeserve.co.uk> References: <6243BAA9F5E0D24DA41B27997D1FD14402B601@mail2.exch.c2b2.columbia.edu> <6243BAA9F5E0D24DA41B27997D1FD14402B602@mail2.exch.c2b2.columbia.edu> <46BACBEE.20301@maubp.freeserve.co.uk> Message-ID: <46BE2081.2080408@maubp.freeserve.co.uk> Peter wrote: > Any other views? Otherwise I'll change Bio.SeqIO to write FASTA files > with a max sequence line length of 60. I've switched the default from no wrapping to 60 characters in Bio/SeqIO/FastaIO.py Peter From tiagoantao at gmail.com Sat Aug 11 21:18:03 2007 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Sat, 11 Aug 2007 22:18:03 +0100 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: <200708110535.l7B5Zv63020979@portal.open-bio.org> References: <200708110535.l7B5Zv63020979@portal.open-bio.org> Message-ID: <6d941f120708111418s7b6607ceo95bd2f3199024bae@mail.gmail.com> Sorry for this mistake. I don't know how it got there, the idea was to attach this to 2170 for Peter to review. Tiago On 8/11/07, bugzilla-daemon at portal.open-bio.org wrote: > http://bugzilla.open-bio.org/show_bug.cgi?id=2323 > > > > > > ------- Comment #27 from mdehoon at ims.u-tokyo.ac.jp 2007-08-11 01:35 EST ------- > I have committed the documentation for the GenePop parser to CVS. > Next time, please don't attach your patch to a bug report that is unrelated to > GenePop. > > > -- > Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are the assignee for the bug, or are watching the assignee. 
> _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- http://www.tiago.org/ps From mdehoon at c2b2.columbia.edu Sun Aug 12 14:18:25 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sun, 12 Aug 2007 23:18:25 +0900 Subject: [Biopython-dev] Bio.Wise In-Reply-To: References: <6243BAA9F5E0D24DA41B27997D1FD14402B601@mail2.exch.c2b2.columbia.edu> Message-ID: <46BF16B1.1010304@c2b2.columbia.edu> Thank you for your explanation. It makes sense now. I have uploaded a modified version of Bio.Wise.__init__.py to CVS. Please let me know if there are any changes you don't agree with. Thanks again, --Michiel. Michael Hoffman wrote: > [Michiel De Hoon] > >> # In Bio.Wise.__init__.py: >> from Bio.SeqIO.FASTA import FastaReader, FastaWriter >> >> The FastaReader, FastaWriter functions are used as follows: >> >> for filename, input_file in zip(pair, input_files): >> input_file.close() >> FastaWriter(file(input_file.name, >> "w")).write(FastaReader(file(filename)).next()) >> >> To me, it looks like all this does is to read one Fasta record from >> filename, >> and then store it in input_file. > > I believe this was done to smooth out troublesome Fasta files because > the Biopython parser was more versatile than that in > Wise2. Specifically, there was a maximum line length restriction in > Wise2. Piping this through a Biopython read/write pairing ensures that > all the lines are short enough to be read in. > >> I am asking since the current code in Bio.Wise does not seem to be >> handling temporary files correctly, and it'll be easier to fix it if >> we don't have to consider both poly.NamedTemporaryFile and >> tempfile.NamedTemporaryFile. > > If it makes things easier for you, please do it by all means. 
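The reason given above for the read/write round trip (re-wrapping long sequence lines so that a parser with a line-length limit can read them) can be sketched in plain Python. This is an illustration under stated assumptions, not the Bio.Wise or Bio.SeqIO code: read_first_fasta and format_fasta are hypothetical helper names, and the width of 60 simply mirrors the default mentioned elsewhere in this archive.

```python
def read_first_fasta(lines):
    """Return (title, sequence) for the first record in an iterable of FASTA lines."""
    title = None
    chunks = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if title is not None:
                break  # a second record starts here; only the first is wanted
            title = line[1:]
        elif title is not None:
            chunks.append(line.strip())
    if title is None:
        raise ValueError("no FASTA record found")
    return title, "".join(chunks)


def format_fasta(title, sequence, width=60):
    """Format one record with sequence lines of at most `width` characters."""
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + title] + wrapped) + "\n"
```

A straight byte-for-byte copy of the file would keep any over-long lines, which is why the parse-then-write step is not redundant here.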
From biopython-dev at maubp.freeserve.co.uk Mon Aug 13 19:04:03 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Mon, 13 Aug 2007 20:04:03 +0100 Subject: [Biopython-dev] Reorganising the tutorial In-Reply-To: <46BDB62D.2010209@c2b2.columbia.edu> References: <46BDACB1.8040304@maubp.freeserve.co.uk> <46BDB62D.2010209@c2b2.columbia.edu> Message-ID: <46C0AB23.10202@maubp.freeserve.co.uk> Michiel de Hoon wrote: > Peter wrote: >> I'd like to expand the section of Chapter 2 on sequence input/output >> into an entire chapter (between the current Chapter 2 Quick Start and >> Chapter 3 BLAST). >> >> Any comments? Suggestions for examples? > > By all means, go for it. I feel that sequence input/output is a big > enough topic to deserve a chapter of its own. I've made the SeqIO section into a new chapter, and added some examples of using it to write files. If anyone spots any typos, please let me know ;) I think we should update the SWISS-PROT and GENBANK sections of the Cookbook chapter to mention Bio.SeqIO as an alternative to using Bio.SwissProt and Bio.GenBank directly for parsing the files. Any comments? Peter From biopython-dev at maubp.freeserve.co.uk Mon Aug 13 22:59:42 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Mon, 13 Aug 2007 23:59:42 +0100 Subject: [Biopython-dev] Which NCBI / Entrez module? Message-ID: <46C0E25E.6060007@maubp.freeserve.co.uk> I've just been updating the Tutorial to expand the SeqIO documentation into a full chapter, and one of the things it now covers is parsing a handle to an online database. For the SwissProt example I was guided by the existing tutorial code and used Bio.WWW.ExPASy.get_sprot_raw() which works fine (but interestingly only fetches one record). I then added an example fetching GenBank records from the NCBI, based on the existing tutorial code which uses Bio.GenBank to do some searches and retrieve records by their GI number.
I decided to use Bio.GenBank.download_many() with Bio.SeqIO.parse() in the new example - and this works nicely. Now, looking over the code, the "online" parts of Bio.GenBank are using Bio.EUtils, a complex bit of code dated 2003 by Andrew Dalke. There is another (older and much smaller) module Bio.WWW.NCBI dated 1999-2000 by Jeffrey Chang, which also offers an EUtils interface. This does make an appearance in the tutorial in the "Connecting with biological databases" section. Bio.WWW.NCBI seems to just build Entrez URLs, and returns raw data as provided by the NCBI. Bio.EUtils says it also does this, and offers a higher level interface supporting history tracking and parsing of query results (in XML). Is anyone here very familiar with either of these modules? Should we deprecate Bio.WWW.NCBI in favour of Bio.EUtils - or perhaps just update its documentation to recommend using that instead? Peter From mdehoon at c2b2.columbia.edu Tue Aug 14 10:40:53 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue, 14 Aug 2007 06:40:53 -0400 Subject: [Biopython-dev] Which NCBI / Entrez module? References: <46C0E25E.6060007@maubp.freeserve.co.uk> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B603@mail2.exch.c2b2.columbia.edu> Peter wrote: > Is anyone here very familiar with either of these modules? Should we > deprecate Bio.WWW.NCBI in favour of Bio.EUtils - or perhaps just update > its documentation to recommend using that instead? As Bio.EUtils is more advanced than Bio.WWW.NCBI, I'd be in favor of deprecating Bio.WWW.NCBI if there is sufficient documentation for Bio.EUtils. Currently though, we have some documentation for Bio.WWW.NCBI but little for Bio.EUtils. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -------------- next part -------------- A non-text attachment was scrubbed...
Name: winmail.dat Type: application/ms-tnef Size: 2912 bytes Desc: not available URL: From sbassi at gmail.com Tue Aug 14 19:32:14 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Tue, 14 Aug 2007 16:32:14 -0300 Subject: [Biopython-dev] Error in doc. Message-ID: I don't know if it is worthwhile to open a bug on bugzilla for this little mistake. On http://www.bioinformatics.org/bradstuff/bp/tut/Tutorial002.html There is a line: >>> from Bio.Tools import Translate The correct line should be: >>> from Bio import Translate I guess that this document may not be the latest official document, but it is under the bioinformatics.org domain and is returned by Google among the top search results. Best, SB. -- Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From biopython-dev at maubp.freeserve.co.uk Tue Aug 14 20:30:49 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 14 Aug 2007 21:30:49 +0100 Subject: [Biopython-dev] Error in doc. In-Reply-To: References: Message-ID: <46C210F9.1050905@maubp.freeserve.co.uk> Sebastian Bassi wrote: > I don't know if it is worthwhile to open a bug on bugzilla for this little mistake. > > On http://www.bioinformatics.org/bradstuff/bp/tut/Tutorial002.html > There is a line: >>>> from Bio.Tools import Translate > The correct line should be: >>>> from Bio import Translate > > I guess that this document may not be the latest official document, > but it is under the bioinformatics.org domain and is returned by Google among > the top search results. That document is an old version of the Biopython tutorial; the latest version is here, and that line has been fixed: http://biopython.org/DIST/docs/tutorial/Tutorial.html I presume the page you found belongs to Brad Chapman - one of the core contributors to Biopython some years back.
Peter From bugzilla-daemon at portal.open-bio.org Thu Aug 16 21:15:11 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 16 Aug 2007 17:15:11 -0400 Subject: [Biopython-dev] [Bug 2348] New: Slicing the Seq object (returns a string when use a stride) Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2348 Summary: Slicing the Seq object (returns a string when use a stride) Product: Biopython Version: 1.43 Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk I think this is a bug introduced by changes in how Python deals with slicing. Currently we have the following in Bio/Seq.py: class Seq: ... def __getitem__(self, i): return self.data[i] # Seq API requirement def __getslice__(self, i, j): # Seq API requirement i = max(i, 0); j = max(j, 0) return Seq(self.data[i:j], self.alphabet) Quoting: http://docs.python.org/ref/sequence-methods.html > __getslice__ > Deprecated since release 2.0. Support slice objects as parameters > to the > __getitem__() method. Here is an example of how the current code can fail on any Python 2.x version. These all work: from Bio.Seq import Seq x = Seq('ACTATCGTAGTACGGCT') assert isinstance(x[0], str) assert isinstance(x[1], str) assert isinstance(x[2], str) assert isinstance(x[-1], str) assert isinstance(x[1:5], Seq) assert isinstance(x[0:-1], Seq) assert isinstance(x[:], Seq) But the following variants, which use a stride or an explicit slice object, will give a string because they are handled by our old-fashioned __getitem__ method: x[1:2:3] x[slice(1, 2)] x[slice(1, 2, 3)] x[slice(None)] x[slice(None, None)] x[slice(None, None, None)] x[::-1] x[slice(None, None, -1)] The last two return a reversed string (rather than a reversed Seq). I propose we remove the Seq object's __getslice__ method, and replace the __getitem__ method with a slice-aware version.
I'll prepare a patch... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Aug 16 21:33:51 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 16 Aug 2007 17:33:51 -0400 Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more In-Reply-To: Message-ID: <200708162133.l7GLXp8k012237@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1944 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #721 is|0 |1 obsolete| | ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-16 17:33 EST ------- Created an attachment (id=730) --> (http://bugzilla.open-bio.org/attachment.cgi?id=730&action=view) Patch for Bio/Align/Generic.py to add __getitem__ method (v2) Updated patch, two changes: - requesting all/part of a single row returns a Seq, not an alignment object - returns Seq objects for all/part of a single row or column (not strings) Recap: align[r,c] gives a single character as a string align[r] gives a row as a SeqRecord align[r,:] or align[r,c1:c2] gives all or part of a row as a Seq align[:,c] or align[r1:r2,c] gives all or part of a column as a Seq align[:] and align[:,:] give a copy of the alignment Anything else gives a sub alignment, e.g. align[0:2] or align[0:2,:] uses only row 0 and 1 align[:,1:3] uses only columns 1 and 2 align[0:2,1:3] uses only rows 0 & 1 and only cols 1 & 2 NOTE - I am deliberately not attempting to implement __setslice__ at this point. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
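The indexing rules in the recap above can be sketched with a toy class. This is an illustration under stated assumptions, not the attached patch (which is not reproduced in this archive): MiniAlignment is a hypothetical name, rows are equal-length plain strings, and plain strings are returned where the patch is described as returning Seq or SeqRecord objects.

```python
class MiniAlignment:
    """Hypothetical toy alignment: rows are equal-length plain strings."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __getitem__(self, index):
        if isinstance(index, int):
            return self.rows[index]                  # align[r]: one row
        if isinstance(index, slice):
            return MiniAlignment(self.rows[index])   # align[r1:r2]: sub-alignment
        row_index, col_index = index                 # align[rows, cols]
        if isinstance(row_index, int):
            # A single character (int column) or part of a row (slice column).
            return self.rows[row_index][col_index]
        selected = self.rows[row_index]
        if isinstance(col_index, int):
            # All or part of a column, read top to bottom.
            return "".join(row[col_index] for row in selected)
        # Row slice plus column slice: a smaller alignment.
        return MiniAlignment(row[col_index] for row in selected)
```

The dispatch relies on Python passing align[r, c] to __getitem__ as a single tuple, so one method can cover every case in the recap.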
From bugzilla-daemon at portal.open-bio.org Thu Aug 16 22:07:15 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 16 Aug 2007 18:07:15 -0400 Subject: [Biopython-dev] [Bug 2348] Slicing the Seq object (returns a string when use a stride) In-Reply-To: Message-ID: <200708162207.l7GM7FuS015736@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2348 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-16 18:07 EST ------- Created an attachment (id=731) --> (http://bugzilla.open-bio.org/attachment.cgi?id=731&action=view) Patch to Bio/Seq.py to fix slicing with strides This passes test_seq.py (and I think all the other unit tests) but I would like someone who uses MutableSeq objects to double check that bit just in case. The "mini self test" added to the end of Bio/Seq.py in the patch would probably be better off being added to the test_seq.py unit test. P.S. I found this issue during work on bug 1944 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Aug 17 04:56:28 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 17 Aug 2007 00:56:28 -0400 Subject: [Biopython-dev] [Bug 2348] Slicing the Seq object (returns a string when use a stride) In-Reply-To: Message-ID: <200708170456.l7H4uSDG019960@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2348 ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2007-08-17 00:56 EST ------- I tried this patch on MutableSeqs, and found no problems with it. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
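For readers without access to the attachment, a slice-aware __getitem__ of the kind proposed in bug 2348 might look roughly like the sketch below. It is not the attached patch: SketchSeq is a hypothetical, simplified stand-in for the Seq class, with only the data and alphabet attributes that the bug report mentions, and it is written for current Python.

```python
class SketchSeq:
    """Hypothetical, simplified stand-in for a Seq class with data and alphabet."""

    def __init__(self, data, alphabet=None):
        self.data = data
        self.alphabet = alphabet

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Any slice, with or without a stride, keeps the type and alphabet.
            return SketchSeq(self.data[index], self.alphabet)
        # An integer index returns a single letter as a plain string.
        return self.data[index]
```

Because every slice form reaches __getitem__ as a slice object, x[1:5], x[::-1] and x[slice(None)] all take the same branch, and no separate __getslice__ is needed.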
From bugzilla-daemon at portal.open-bio.org Fri Aug 17 14:38:50 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 17 Aug 2007 10:38:50 -0400 Subject: [Biopython-dev] [Bug 2348] Slicing the Seq object (returns a string when use a stride) In-Reply-To: Message-ID: <200708171438.l7HEco2h028751@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2348 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-17 10:38 EST ------- Are you happy for me to check this in Michiel (with the little test moved to test_seq.py)? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Aug 18 14:25:19 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 18 Aug 2007 10:25:19 -0400 Subject: [Biopython-dev] [Bug 2348] Slicing the Seq object (returns a string when use a stride) In-Reply-To: Message-ID: <200708181425.l7IEPJJC004760@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2348 ------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp 2007-08-18 10:25 EST ------- > Are you happy for me to check this in Michiel > (with the little test moved to test_seq.py)? That is fine with me. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sat Aug 18 16:35:35 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 18 Aug 2007 12:35:35 -0400 Subject: [Biopython-dev] [Bug 2348] Slicing the Seq object (returns a string when use a stride) In-Reply-To: Message-ID: <200708181635.l7IGZZbu015100@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2348 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-18 12:35 EST ------- Fixed in CVS, Bio/Seq.py revision: 1.12 Tests/test_seq.py revision: 1.3 Tests/output/test_seq revision: 1.3 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Aug 18 20:19:36 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 18 Aug 2007 16:19:36 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200708182019.l7IKJaHg005402@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |RESOLVED Resolution| |FIXED ------- Comment #35 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-18 16:19 EST ------- I've marked this bug as closed since the code and the unit test have been checked in. I've put a little example on the wiki here, http://www.biopython.org/wiki/SeqIO#Using_the_SEGUID_checksum We should still add something similar to the tutorial (perhaps based on Sebastian's comment 7 examples?)
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Aug 19 14:34:09 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 19 Aug 2007 10:34:09 -0400 Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more In-Reply-To: Message-ID: <200708191434.l7JEY9PW016605@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1944 ------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2007-08-19 10:34 EST ------- Whereas I am largely happy with this patch, one thing keeps bothering me: > align[r,:] or align[r,c1:c2] gives all or part of a row as a Seq > align[:,c] or align[r1:r2,c] gives all or part of a column as a Seq A Seq is a string with an alphabet attached. I think it is not advisable to require that all sequences in an alignment have the same alphabet. For example, one sequence may be IUPACUnambiguousDNA, another one IUPACAmbiguousDNA. Or, one is IUPACProtein, and another one the generic Alphabet because the user did not explicitly specify the alphabet when creating the Seq object. I don't see anything fundamentally wrong with that. So, if we cannot guarantee that all rows in the alignment have the same alphabet, then we cannot really return a column of the alignment as a Seq -- we won't know the appropriate alphabet. From this viewpoint, align[:,c] or align[r1:r2,c] returning a string seems more natural, and then I'd expect align[r,:] or align[r,c1:c2] also to return a string. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sun Aug 19 16:01:00 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 19 Aug 2007 12:01:00 -0400 Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more In-Reply-To: Message-ID: <200708191601.l7JG10Ho022418@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1944 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-19 12:01 EST ------- I thought returning Seq objects rather than strings might be contentious *grin* From an ideological point of view, returning strings undermines the use of the Seq object in the first place. > A Seq is a string with an alphabet attached. I think it is not > advisable to require that all sequences in an alignment have the > same alphabet. We don't have to require this. The alignment as a whole should have an alphabet even if it is the lowest common denominator (like the generic single letter alphabet). It would be reasonable for the user to create a "generic protein" alignment where some of the SeqRecords have a more precise alphabet such as IUPACProtein. Or, someone might have a "generic nucleotide" alignment where some SeqRecords are DNA and others RNA (this is a bit odd). > For example, one sequence may be IUPACUnambiguousDNA, another one > IUPACAmbiguousDNA. That would be fine - In this case the user should construct their alignment with any of IUPACAmbiguousDNA, generic DNA, generic nucleotide or even generic single letter. > Or, one is IUPACProtein, and another one the generic Alphabet because > the user did not explicitly specify the alphabet when creating the Seq object. In this example, the only sensible choice of alphabet for the whole alignment would be a generic one. > I don't see anything fundamentally wrong with that. Neither do I. It's nicer (and probably normal) to have all the sequences in the same alignment with the same alphabet, but not essential.
> So, if we cannot guarantee that all rows in the alignment have the same > alphabet, then we cannot really return a column of the alignment as a Seq > -- we won't know the appropriate alphabet. But we DO know an appropriate alphabet - whatever was specified for the entire alignment (even if this is the generic single letter alphabet). So in the patch I used that for any column or part column. For any given row or part row, we can take the specific alphabet of the associated SeqRecord (which may be more specific than the alphabet defined for the whole alignment). > From this viewpoint, align[:,c] or align[r1:r2,c] returning a string seems > more natural, and then I'd expect align[r,:] or align[r,c1:c2] also to > return a string. You haven't convinced me. Note that at the moment, when an alignment is created "by hand", you must specify an alphabet (defaulting to the generic single letter alphabet would be reasonable). The add sequence method currently only takes strings, so all the SeqRecords will be created with the same alphabet as specified for the whole alignment. I think the suggested append() method should accept SeqRecords, provided their alphabet matches that of the alignment or is a subclass of the alignment's alphabet. This restriction can be bypassed by using the SeqIO.to_alignment() function or by otherwise assigning SeqRecords directly to the alignment._records private list. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
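The "lowest common denominator" alphabet idea from this exchange can be illustrated with a small sketch. Everything here is an assumption made for illustration: the classes are a toy hierarchy whose names merely echo the alphabets mentioned in the thread (nothing is imported from Biopython), and common_alphabet is a hypothetical helper, not an existing function. It is not how the patch picks an alphabet; the patch is described as using whatever alphabet was specified for the whole alignment.

```python
class Alphabet:
    """Toy generic alphabet (illustrative hierarchy only)."""


class SingleLetterAlphabet(Alphabet):
    pass


class ProteinAlphabet(SingleLetterAlphabet):
    pass


class IUPACProtein(ProteinAlphabet):
    pass


class NucleotideAlphabet(SingleLetterAlphabet):
    pass


class DNAAlphabet(NucleotideAlphabet):
    pass


class RNAAlphabet(NucleotideAlphabet):
    pass


def common_alphabet(alphabet_classes):
    """Return the most specific class that every given alphabet class derives from."""
    classes = list(alphabet_classes)
    if not classes:
        raise ValueError("at least one alphabet class is required")
    # Walk the first class's ancestry from most to least specific and stop at
    # the first ancestor shared by all the others.
    for candidate in classes[0].__mro__:
        if all(issubclass(cls, candidate) for cls in classes):
            return candidate
```

For a mixed DNA and RNA alignment this yields the generic nucleotide alphabet, matching the "generic nucleotide" case described above; the same issubclass test is what an append() check along the suggested lines would need.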
From bugzilla-daemon at portal.open-bio.org Sun Aug 19 22:10:48 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 19 Aug 2007 18:10:48 -0400 Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more In-Reply-To: Message-ID: <200708192210.l7JMAmco015359@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1944 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #730 is|0 |1 obsolete| | ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-19 18:10 EST ------- Created an attachment (id=732) --> (http://bugzilla.open-bio.org/attachment.cgi?id=732&action=view) Patch for Bio/Align/Generic.py to add __getitem__ method (v3) Updated patch now that I have checked in a fix for bug 2348. Added an explicit __iter__ method; this makes it very clear to anyone reading the code how iteration over the rows as SeqRecord objects works, and is probably a bit faster than having Python do this for us via __getitem__ -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Aug 23 14:07:18 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 23 Aug 2007 10:07:18 -0400 Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more In-Reply-To: Message-ID: <200708231407.l7NE7ILq007046@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1944 ------- Comment #9 from mdehoon at ims.u-tokyo.ac.jp 2007-08-23 10:07 EST ------- > You haven't convinced me. Then, let's use your current patch for now to address Marc's concerns in the original post.
We can get back to the design of the Alignment class later, after rethinking the Seq/SeqRecord classes.

From bugzilla-daemon at portal.open-bio.org Fri Aug 24 14:18:19 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 24 Aug 2007 10:18:19 -0400
Subject: [Biopython-dev] [Bug 2251] [PATCH] NumPy support for BioPython
In-Reply-To:
Message-ID: <200708241418.l7OEIJmD026547@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2251

------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-24 10:18 EST -------
Looking at Ed's patch, from the point of view of the pure Python modules in Biopython, if we currently have something like this trivial example:

    import Numeric
    m = Numeric.zeros([3,3], "f")

This becomes:

    from Bio import numpy_wrapper
    m = numpy_wrapper.zeros([3,3], "f")

where Bio.numpy_wrapper and other similar classes act as proxies for the real Numeric, or numpy's backwards compatible numpy.oldnumeric, depending on what was used when numpy_selector.c was compiled.

i.e. If Biopython was compiled with Numeric, then numpy_selector.c will tell the wrapper classes to import Numeric etc. If Biopython was compiled with NumPy support, then numpy_selector.c will tell the wrapper classes to import numpy.oldnumeric etc.

This shouldn't matter for anyone compiling from source, but for Windows users I guess we'll have to provide two versions of the installer during the transition period from Numeric to NumPy. I imagine that Linux distributions will also have to handle the switch at some point too...

Are the two C interfaces not binary compatible? Is it really not possible to make the choice at run time?

Any progress on that 64bit problem?
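On the run-time question: for the pure Python modules at least, one conceivable approach is to try each candidate module in turn and use the first that imports. The following is a minimal sketch only; first_importable is a hypothetical helper, not something taken from Ed's patch, it relies on importlib from a current Python standard library, and it does nothing for the compiled C extensions, which still bind to one C API when they are built.

```python
import importlib


def first_importable(names):
    """Return the first module in names that can be imported.

    Raises ImportError if none of them is available.
    """
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Not installed; try the next candidate.
            continue
    raise ImportError("None of these modules could be imported: %s"
                      % ", ".join(names))
```

A wrapper module could then do something like numeric = first_importable(["numpy.oldnumeric", "Numeric"]) and re-export numeric.zeros and friends, assuming at least one of the two is installed.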
From bugzilla-daemon at portal.open-bio.org Mon Aug 27 18:40:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 27 Aug 2007 14:40:32 -0400
Subject: [Biopython-dev] [Bug 2351] New: Make SeqRecord subclass Seq subclass string?
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2351

           Summary: Make SeqRecord subclass Seq subclass string?
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

We've started talking on the mailing list about making the SeqRecord class a subclass of the Seq object, and making that a subclass of the Python string.

This bug is for holding patches - I suspect a lot of the discussion will continue on the mailing lists rather than here. I have explicitly left the "assign to" field pointing at the dev mailing list.

From bugzilla-daemon at portal.open-bio.org Mon Aug 27 22:12:53 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 27 Aug 2007 18:12:53 -0400
Subject: [Biopython-dev] [Bug 2351] Make SeqRecord subclass Seq subclass string?
In-Reply-To:
Message-ID: <200708272212.l7RMCrbZ009227@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2351

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-27 18:12 EST -------
Created an attachment (id=735)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=735&action=view)
Patch to Bio.Seq and Bio.SeqRecord

(1) Makes __str__ return the full sequence as a string for Seq, Mutable and SeqRecord. I think this is essential for making the objects more interchangeable, but left as-is could cause some confusion to beginners because it is now a little bit harder to work out which type of object they are dealing with. We may want to mention something like this in the tutorial:

    print x.__class__

(2) Adds __iter__ to SeqRecord, which is passed to the Seq object, allowing iteration over the sequence as single character strings. Arguably this should be in a separate patch.

(3) Updates docstrings - e.g. Seq and Mutable method .tostring() is considered deprecated.

Still lots of things to discuss before we can implement the full subclassing hierarchy, for example should SeqRecord splicing (__getitem__) return a Seq or a SeqRecord? If a SeqRecord, then how should the annotation be handled?

From bugzilla-daemon at portal.open-bio.org Tue Aug 28 20:37:20 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 28 Aug 2007 16:37:20 -0400
Subject: [Biopython-dev] [Bug 2351] Make SeqRecord subclass Seq subclass string?
In-Reply-To:
Message-ID: <200708282037.l7SKbKaC019904@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2351

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #735 is|0                           |1
           obsolete|                            |

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-28 16:37 EST -------
Created an attachment (id=736)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=736&action=view)
Patch to Bio.Seq and Bio.SeqRecord (v2)

More controversial patch, which in addition to changes in comment 1 also:

(4) adds __len__ and count to SeqRecord (trivial)

(5) adds __getitem__, __add__, __radd__ methods to SeqRecord (which try and be sensible with the meta-data)

From bugzilla-daemon at portal.open-bio.org Wed Aug 29 08:25:12 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 29 Aug 2007 04:25:12 -0400
Subject: [Biopython-dev] [Bug 2353] New: Problem parsing Swissprot (UniProt) files
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2353

           Summary: Problem parsing Swissprot (UniProt) files
           Product: Biopython
           Version: 1.43
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: ibdeno at gmail.com

I installed biopython-py24-1.43-1001 via fink on an iBook G4. I have found that parsing a Uniprot database from the archaeon M.thermoautotrophicum (downloaded from Integr8) using Bio.SwissProt produces errors.
For example, the code (in a file called testing.py):

8<--------------------------------------------
# reading a SwissProt entry from a file

from Bio.SwissProt import SProt
from sys import *

handle = open(argv[1])
sp = SProt.Iterator(handle, SProt.RecordParser())
record = sp.next()
print record.entry_name
print record.sequence
--------------------------------------------------->8

run as:

python2.4 testing.py 27.M_thermoautotrophicum.dat

gives:

Traceback (most recent call last):
  File "testing.py", line 8, in ?
    record = sp.next()
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 172, in next
    return self._parser.parse(File.StringHandle(data))
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 296, in parse
    self._scanner.feed(handle, self._consumer)
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 338, in feed
    self._scan_record(uhandle, consumer)
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 343, in _scan_record
    fn(self, uhandle, consumer)
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 483, in _scan_sq
    self._scan_line('SQ', uhandle, consumer.sequence_header, exactly_one=1)
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 365, in _scan_line
    read_and_call(uhandle, event_fn, start=line_type)
  File "/sw/lib/python2.4/site-packages/Bio/ParserSupport.py", line 300, in read_and_call
    raise SyntaxError, errmsg
SyntaxError: Line does not start with 'SQ':
PE   3: Inferred from homology;

I have found that this is due to the presence in this file of lines starting with "PE" (as in the example) or with "**". Once I eliminate these lines, there is no problem.

In my opinion the parser should deal more elegantly with cases where the records don't have a recognized start...

Cheers,

Miguel
From bugzilla-daemon at portal.open-bio.org Wed Aug 29 09:42:30 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 29 Aug 2007 05:42:30 -0400
Subject: [Biopython-dev] [Bug 2353] Problem parsing Swissprot (UniProt) files
In-Reply-To:
Message-ID: <200708290942.l7T9gU8x028892@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2353

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-29 05:42 EST -------
Thanks for your report and the diagnosis.

That bug has actually already been reported and fixed (just two weeks ago).

You'll need to update /sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py with the latest version from CVS (or wait for the next release of Biopython, hopefully later this year).

If you don't want to use CVS, then you can download the file from here:

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/*checkout*/biopython/Bio/SwissProt/SProt.py?rev=HEAD&cvsroot=biopython&content-type=text/x-python

Make a backup of the old version, just in case ;)

Peter

*** This bug has been marked as a duplicate of bug 2340 ***
From bugzilla-daemon at portal.open-bio.org Wed Aug 29 09:42:38 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 29 Aug 2007 05:42:38 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current Swiss-Prot version 54.0
In-Reply-To:
Message-ID: <200708290942.l7T9gchg028914@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2340

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ibdeno at gmail.com

------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-29 05:42 EST -------
*** Bug 2353 has been marked as a duplicate of this bug. ***

From bugzilla-daemon at portal.open-bio.org Wed Aug 29 11:49:36 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 29 Aug 2007 07:49:36 -0400
Subject: [Biopython-dev] [Bug 2353] Problem parsing Swissprot (UniProt) files
In-Reply-To:
Message-ID: <200708291149.l7TBnaoB003020@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2353

------- Comment #2 from ibdeno at gmail.com 2007-08-29 07:49 EST -------
Created an attachment (id=739)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=739&action=view)
example entry giving the described error
From bugzilla-daemon at portal.open-bio.org Wed Aug 29 11:50:52 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 29 Aug 2007 07:50:52 -0400
Subject: [Biopython-dev] [Bug 2353] Problem parsing Swissprot (UniProt) files
In-Reply-To:
Message-ID: <200708291150.l7TBoqA8003174@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2353

------- Comment #3 from ibdeno at gmail.com 2007-08-29 07:50 EST -------
Dear Peter,

Thank you. I'm very sorry to have submitted a duplicated bug... I searched the database with the "Line does not start with" present in the error message, but I realize now I should have done a better search.

I have installed the file you recommended (and removed the SProt.pyc file) and now I don't get the error on entries with "PE" records, but it still fails with the same message on entries having records starting with "**". I'm attaching one (asteriskexample.dat) as an example.

Thank you again!

Miguel

From bugzilla-daemon at portal.open-bio.org Wed Aug 29 12:10:26 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 29 Aug 2007 08:10:26 -0400
Subject: [Biopython-dev] [Bug 2353] Problem parsing Swissprot (UniProt) files
In-Reply-To:
Message-ID: <200708291210.l7TCAQev005528@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2353

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|DUPLICATE                   |

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-29 08:10 EST -------
You are right, it's only a partial duplicate of bug 2340 (the PE lines problem). The ** lines are a new issue.
I've reopened this bug report.

Searching for ACDB_METTH on www.expasy.org gives a normal looking entry with no ** lines:

http://www.expasy.org/uniprot/O27745.txt
http://www.expasy.org/cgi-bin/get-sprot-entry?O27745

Where did you get the attached ACDB_METTH SwissProt file from?

From bugzilla-daemon at portal.open-bio.org Wed Aug 29 12:17:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 29 Aug 2007 08:17:32 -0400
Subject: [Biopython-dev] [Bug 2353] Problem parsing Swissprot (UniProt) files
In-Reply-To:
Message-ID: <200708291217.l7TCHWHa005909@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2353

------- Comment #5 from ibdeno at gmail.com 2007-08-29 08:17 EST -------
In fact I have the complete data from M.thermoautotrophicum downloaded from Integr8:

ftp://ftp.ebi.ac.uk/pub/databases/integr8/uniprot/proteomes/27.M_thermoautotrophicum.dat.gz

I thought that this might be too particular to Integr8 and perhaps not worth hard-wiring in the SProt.py code. However, couldn't the parser cope more gracefully with a record that has an unknown line identifier? Perhaps giving a message, but not stopping.

Cheers,

Miguel
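Miguel's suggestion (report an unrecognised line code, but carry on) might look something like the following in isolation. It is a standalone sketch, not how the Bio.SwissProt scanner is actually structured: KNOWN_CODES is a deliberately incomplete, illustrative set, split_known_lines is a made-up name, and the example record is invented.

```python
import warnings

# Illustrative subset of two-letter Swiss-Prot line codes; NOT a complete list.
KNOWN_CODES = frozenset(["ID", "AC", "DT", "DE", "GN", "OS", "OC", "OX",
                         "DR", "PE", "KW", "FT", "SQ", "//"])


def split_known_lines(lines):
    """Separate lines with a recognised line code from the rest.

    Returns (kept, skipped). A warning is issued for each skipped line,
    but processing carries on instead of raising an exception.
    """
    kept = []
    skipped = []
    for line in lines:
        code = line[:2]
        # Sequence data lines are indented, so their "code" is two spaces.
        if code in KNOWN_CODES or code == "  ":
            kept.append(line)
        else:
            warnings.warn("Skipping line with unrecognised code %r" % code)
            skipped.append(line)
    return kept, skipped


# Invented record, for illustration only.
record = [
    "ID   EXAMPLE_ENTRY\n",
    "PE   3: Inferred from homology;\n",
    "**   an internal annotation line\n",
    "SQ   SEQUENCE\n",
    "     MKVLA\n",
    "//\n",
]
kept, skipped = split_known_lines(record)
```

The trade-off is that a genuinely corrupt file would also be accepted with nothing more than warnings, which may be why a strict parser was chosen in the first place.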
From bugzilla-daemon at portal.open-bio.org Thu Aug 30 08:41:02 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 30 Aug 2007 04:41:02 -0400
Subject: [Biopython-dev] [Bug 2353] Swissprot (UniProt) files with ** lines fail to parse
In-Reply-To:
Message-ID: <200708300841.l7U8f2ws031433@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2353

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Problem parsing Swissprot   |Swissprot (UniProt) files
                   |(UniProt) files             |with ** lines fail to parse

------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-30 04:41 EST -------
Changed the summary.

Based on a few of those files from the EBI, the ** lines appear to only be found between the FT (feature table) and SQ (sequence) sections, if present.

It should be simple to update the parser to ignore these lines when present...

From bugzilla-daemon at portal.open-bio.org Thu Aug 30 08:43:11 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 30 Aug 2007 04:43:11 -0400
Subject: [Biopython-dev] [Bug 2340] Swissprot 54 release (UniProt) files with PE lines (protein evidence) fail
In-Reply-To:
Message-ID: <200708300843.l7U8hBTA031497@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2340

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |biopython-
                   |                            |bugzilla at maubp.freeserve.co.
                   |                            |uk
            Summary|SProt.py fails to parse the |Swissprot 54 release
                   |current Swiss-Prot version  |(UniProt) files with PE
                   |54.0                        |lines (protein evidence)
                   |                            |fail

------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-30 04:43 EST -------
Edited the bug summary to try and help anyone searching for this problem, e.g. the partial duplicate Bug 2353

From bugzilla-daemon at portal.open-bio.org Thu Aug 30 14:03:01 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 30 Aug 2007 10:03:01 -0400
Subject: [Biopython-dev] [Bug 2174] FDist Support in BioPython
In-Reply-To:
Message-ID: <200708301403.l7UE31Aq016525@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2174

------- Comment #2 from tiagoantao at gmail.com 2007-08-30 10:03 EST -------
Created an attachment (id=740)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=740&action=view)
Diff to tutorial.tex with FDist documentation
From bugzilla-daemon at portal.open-bio.org Thu Aug 30 14:03:50 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 30 Aug 2007 10:03:50 -0400
Subject: [Biopython-dev] [Bug 2174] FDist Support in BioPython
In-Reply-To:
Message-ID: <200708301403.l7UE3oEd016668@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2174

tiagoantao at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         OS/Version|Linux                       |All
            Version|1.24                        |Not Applicable

From bugzilla-daemon at portal.open-bio.org Thu Aug 30 14:05:48 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 30 Aug 2007 10:05:48 -0400
Subject: [Biopython-dev] [Bug 2174] FDist Support in BioPython
In-Reply-To:
Message-ID: <200708301405.l7UE5m81016828@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2174

tiagoantao at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #532 is|0                           |1
           obsolete|                            |

------- Comment #3 from tiagoantao at gmail.com 2007-08-30 10:05 EST -------
(From update of attachment 532)
A much newer version is on CVS

From tiagoantao at gmail.com Thu Aug 30 14:20:32 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 30 Aug 2007 15:20:32 +0100
Subject: [Biopython-dev] Bio.PopGen status
Message-ID: <6d941f120708300720l41290267pdec912102899513f@mail.gmail.com>

Hi!
This is a small mail to inform everyone of the effort to create a Bio.PopGen module. What is currently available doesn't yet deserve to be called a Population Genetics module per se. But I think we are getting there...

So what is available?

There is code, test code and documentation for working with GenePop files, a format which I suppose is reasonably widely used in population genetics (at least when not considering sequence based data). I am thinking of closing the related bug.

There is code, test code and documentation (in this case, under review) to work with FDist. FDist is a moderately used selection detection application. The main purpose of this code is to serve as a "commit exercise" of moderate dimension before starting to commit more important stuff (therefore learning and making mistakes with a less important component).

3 important parts follow: Statistics, Coalescent Simulations and HapMap. For these parts there is already code written...

Statistics: Ralph Haygood sent me code to deal with sequence based data. I myself have code to deal with non-sequence based data. I will work on merging both code bases. Documentation and test code will follow. At this point I think we could say that we have a bare bones Bio.PopGen module.

Coalescent Simulations: There is code already written (and published in a journal) to work with simcoal2. Most documentation is also written. At this point I would guess Bio.PopGen would compare rather favorably with BioPerl.

HapMap: Part of the code is written, but more will have to be done.

This is the current status of things as I see it from here... Comments, corrections, discussion would be most welcome...

--
http://www.tiago.org/ps

From tiagoantao at gmail.com Thu Aug 30 14:31:42 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 30 Aug 2007 15:31:42 +0100
Subject: [Biopython-dev] Jython and sqlite
Message-ID: <6d941f120708300731k78443be1x858513d0f56e07ca@mail.gmail.com>

Hi!
Just a mail to ask about Jython and sqlite.

About Jython: Is there any policy on Jython support? I am myself a Jython user, and in the code that I submitted to Bio.PopGen I tend to try to support Jython at least partially. I sometimes use older Python dialects (because Jython is still on 2.2) and avoid changing directories when calling external applications (the JVM doesn't support the concept of changing directories). The code that I have submitted (not the test code) supports Jython even when calling external applications.

sqlite: In code that I intend to submit in the near future (HapMap related), I currently use the sqlite module. The major problem is that this module is Python 2.5 only. Therefore it requires a new version of Python (and probably won't be supported in Jython in the near future or ever). OTOH it is quite convenient: an embedded relational database (without the hassle of asking users to install a database server). Any ideas on this?

Regards,
Tiago

--
http://www.tiago.org/ps

From biopython-dev at maubp.freeserve.co.uk Thu Aug 30 14:51:17 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Thu, 30 Aug 2007 15:51:17 +0100
Subject: [Biopython-dev] Jython and sqlite
In-Reply-To: <6d941f120708300731k78443be1x858513d0f56e07ca@mail.gmail.com>
References: <6d941f120708300731k78443be1x858513d0f56e07ca@mail.gmail.com>
Message-ID: <46D6D965.4060101@maubp.freeserve.co.uk>

Tiago Antão wrote:
> Hi!
>
> Just a mail to ask about Jython and sqlite.
>
> About Jython: Is there any policy on Jython support? I am myself a
> Jython user, and in the code that I submitted to Bio.PopGen I tend to
> try to support Jython at least partially.

I'm not aware of anyone trying Biopython on Jython (or even other variants like Iron Python). Trying Bio.PopGen on Jython sounds sensible - have you tried the Biopython unit tests to see what happens?

> I sometimes use older Python
> dialects (because Jython is still on 2.2) ...
Biopython has a stated dependence on Python 2.3 or later, so writing to Python 2.2 should be fine. I myself use both Python 2.3 and 2.4, but some of the new stuff in 2.5 like generator expressions may tempt me to update my machines.

> ... and avoid changing
> directories when calling external applications (the JVM doesn't
> support the concept of changing directories). The code that I have
> submitted (not the test code) supports Jython even when calling
> external applications.

Good - I would agree with you that avoiding changing the current directory is wise (especially as we support multiple OS). Leave that up to the user.

> sqlite: In code that I intend to submit in the near future (HapMap
> related), I currently use the sqlite module. The major problem is
> that this module is Python 2.5 only. Therefore it requires a new version of
> Python (and probably won't be supported in Jython in the near future
> or ever). OTOH it is quite convenient: an embedded relational database
> (without the hassle of asking users to install a database server). Any
> ideas on this?

I haven't used sqlite before, but my initial reaction is that wanting to use a Python 2.5 only module would be bad. Have you looked at the existing BioSQL code in Biopython...
Peter

From bugzilla-daemon at portal.open-bio.org Thu Aug 30 16:34:30 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 30 Aug 2007 12:34:30 -0400
Subject: [Biopython-dev] [Bug 2353] Swissprot (UniProt) files with ** lines fail to parse
In-Reply-To:
Message-ID: <200708301634.l7UGYUGr026431@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2353

------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-30 12:34 EST -------
I think I have fixed this now, please try Bio/SwissProt/SProt.py revision 1.40 from CVS, or from the webpage in about an hour's time:

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SwissProt/SProt.py?cvsroot=biopython

Please post back to let us know if that worked. Thanks.

Peter

From bugzilla-daemon at portal.open-bio.org Thu Aug 30 18:19:49 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 30 Aug 2007 14:19:49 -0400
Subject: [Biopython-dev] [Bug 2230] GenBank __init__.py: _Scanner import
In-Reply-To:
Message-ID: <200708301819.l7UIJnVO031718@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2230

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|LATER                       |

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-30 14:19 EST -------
Was "RESOLVED LATER", re-opening in order to mark as fixed...
From bugzilla-daemon at portal.open-bio.org Thu Aug 30 18:21:11 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 30 Aug 2007 14:21:11 -0400
Subject: [Biopython-dev] [Bug 2230] GenBank __init__.py: _Scanner import
In-Reply-To:
Message-ID: <200708301821.l7UILBEp031795@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2230

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-30 14:21 EST -------
Fixed in Bio/GenBank/__init__.py revision 1.69

Relevant unit tests all pass: test_GenBank, test_GenBankFormat and test_SeqIO

From tiagoantao at gmail.com Thu Aug 30 22:01:51 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 30 Aug 2007 23:01:51 +0100
Subject: [Biopython-dev] Jython and sqlite
In-Reply-To: <46D6D965.4060101@maubp.freeserve.co.uk>
References: <6d941f120708300731k78443be1x858513d0f56e07ca@mail.gmail.com> <46D6D965.4060101@maubp.freeserve.co.uk>
Message-ID: <6d941f120708301501l2c2f510ckc1bf0097daa27aef@mail.gmail.com>

On 8/30/07, Peter wrote:
> I'm not aware of anyone trying Biopython on Jython (or even other
> variants like Iron Python). Trying Bio.PopGen on Jython sounds sensible
> - have you tried the Biopython unit tests to see what happens?

Nothing good. I personally use a customized version of Jython from CVS, based on 2.3. I tried with the supplied version on Kubuntu (Gutsy Gibbon) which is 2.1. And it doesn't even complete the import phase.
I use Jython myself a bit (most public software that I write runs inside Java Web Start), and, to be honest, it is still a bit behind CPython. My main question was to see if people would be stressed if I sometimes use old Python dialects because of the fact that I use part of the code inside Jython. Although I suppose that most of the code that I will be committing from now on will not be (even partially) developed inside Jython.

> I haven't used sqlite before, but my initial reaction is that wanting to use
> a Python 2.5 only module would be bad. Have you looked at the existing
> BioSQL code in Biopython...

I had a brief look at BioSQL. It is not targeted at HapMap at all (I was not expecting it to be). I think I am going to subscribe to the BioSQL mailing list, raise the question about HapMap support and wait for answers, although my guess is that it probably won't happen. In that case probably the best solution would be something like anydbm. Any thoughts?

Tiago

--
http://www.tiago.org/ps

From sbassi at gmail.com Thu Aug 30 22:25:53 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Thu, 30 Aug 2007 19:25:53 -0300
Subject: [Biopython-dev] Jython and sqlite
In-Reply-To: <6d941f120708300731k78443be1x858513d0f56e07ca@mail.gmail.com>
References: <6d941f120708300731k78443be1x858513d0f56e07ca@mail.gmail.com>
Message-ID:

On 8/30/07, Tiago Antão wrote:
> sqlite: In code that I intend to submit in the near future (HapMap
> related), I currently use the sqlite module. The major problem is
> that this module is Python 2.5 only. Therefore it requires a new version of

There is also a sqlite module for previous versions of Python. So I guess you could check the Python version at the beginning of your code and then set the import properly. The code will just run with Python >= 2.5. For older versions, it will require the standalone sqlite executable and pysqlite2 (available from http://pysqlite.org/).
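The version-dependent import described here is often written as a try/except rather than an explicit version check. A minimal sketch, assuming the external package exposes its DB-API interface as pysqlite2.dbapi2 (as pysqlite 2.x does); the snp table, its columns and the rs_id value are invented purely for illustration and have nothing to do with the actual HapMap code:

```python
try:
    import sqlite3 as sqlite                 # standard library from Python 2.5 on
except ImportError:
    from pysqlite2 import dbapi2 as sqlite   # external pysqlite for older Pythons

# Embedded, in-memory database: no server to install or configure.
conn = sqlite.connect(":memory:")
cur = conn.cursor()
# Hypothetical table, for illustration only.
cur.execute("CREATE TABLE snp (rs_id TEXT, chromosome TEXT)")
cur.execute("INSERT INTO snp VALUES (?, ?)", ("rs0000001", "1"))
conn.commit()
cur.execute("SELECT chromosome FROM snp WHERE rs_id = ?", ("rs0000001",))
row = cur.fetchone()
conn.close()
```

Both modules follow the Python DB-API, so the rest of the code should not need to know which one was imported.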
This is what is happening now with reportlab and other external programs that are needed for some biopython modules. This will ad optional software requirement to Biopython, only when running with Python<=2.5 and when the user want to run your module. -- Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From bugzilla-daemon at portal.open-bio.org Fri Aug 31 07:22:11 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 31 Aug 2007 03:22:11 -0400 Subject: [Biopython-dev] [Bug 2353] Swissprot (UniProt) files with ** lines fail to parse In-Reply-To: Message-ID: <200708310722.l7V7MBtL029808@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2353 ------- Comment #8 from ibdeno at gmail.com 2007-08-31 03:22 EST ------- Hi, sorry for the late answer. I downloaded SProt.py and installed it as before. The parser doesn't fail now on the "**" lines. Strangely enough, though, it seems to fail again on "PE" (protein evidence) lines. See an example: Traceback (most recent call last): File "./molsprot.py", line 196, in ? 
    main()
  File "./molsprot.py", line 144, in main
    for record in iterator:
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 172, in next
    return self._parser.parse(File.StringHandle(data))
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 296, in parse
    self._scanner.feed(handle, self._consumer)
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 338, in feed
    self._scan_record(uhandle, consumer)
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 343, in _scan_record
    fn(self, uhandle, consumer)
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 495, in _scan_sq
    self._scan_line('SQ', uhandle, consumer.sequence_header, exactly_one=1)
  File "/sw/lib/python2.4/site-packages/Bio/SwissProt/SProt.py", line 365, in _scan_line
    read_and_call(uhandle, event_fn, start=line_type)
  File "/sw/lib/python2.4/site-packages/Bio/ParserSupport.py", line 300, in read_and_call
    raise SyntaxError, errmsg
SyntaxError: Line does not start with 'SQ': PE 1: Evidence at protein level;

I didn't use CVS, just downloaded SProt.py from the link you sent. Perhaps I should download other files?

Thank you!

Miguel

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Fri Aug 31 07:30:55 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 31 Aug 2007 03:30:55 -0400
Subject: [Biopython-dev] [Bug 2353] Swissprot (UniProt) files with ** lines fail to parse
In-Reply-To:
Message-ID: <200708310730.l7V7Utub030464@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2353

------- Comment #9 from ibdeno at gmail.com 2007-08-31 03:30 EST -------
Hi again.

Adding to my previous post, I have identified the new culprit.
They are not normal PE lines but rather look like this one:

** PROSITE; PS00591; GLYCOSYL_HYDROL_F10; FALSE_POS_1.

That is, they have a "**" instead of the usual "DR" code.

Cheers,

Miguel

From bugzilla-daemon at portal.open-bio.org Fri Aug 31 10:09:06 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 31 Aug 2007 06:09:06 -0400
Subject: [Biopython-dev] [Bug 2353] Swissprot (UniProt) files with ** lines fail to parse
In-Reply-To:
Message-ID: <200708311009.l7VA9622005171@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2353

------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-31 06:09 EST -------
The Biopython Swiss-Prot parser expects the different line types in a certain order. My change meant it would allow a ** line between the FT and SQ lines (as in your attachment 739 for ACDB_METTH).

I had assumed these ** lines would always be at the same position, but from your description, some entries have ** lines further up (near the DR line).

Could you attach the entry causing trouble please? And if you can spot any other variations by eye, that would be good.

I may need to make a more general fix to ignore ** lines in multiple places... but I would prefer minimal changes.
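As an illustration of what "ignoring ** lines in multiple places" could mean, here is a minimal sketch that filters such lines out of a record before any order-sensitive parsing. It is not the change made to SProt.py: the function name strip_private_lines is hypothetical, and of the sample lines only the "**" and "PE" lines come from this thread; the "SQ" line is a made-up placeholder.

```python
def strip_private_lines(lines, prefix="**"):
    """Hypothetical helper: drop every line that starts with the prefix.

    Filtering up front means a parser that expects line types in a fixed
    order never sees the extra lines, wherever they occur in the record.
    """
    return [line for line in lines if not line.startswith(prefix)]


# The "**" and "PE" lines are quoted from the bug report above; the "SQ"
# line is an invented placeholder, not real Swiss-Prot data.
record = [
    "** PROSITE; PS00591; GLYCOSYL_HYDROL_F10; FALSE_POS_1.",
    "PE 1: Evidence at protein level;",
    "SQ placeholder sequence header",
]
cleaned = strip_private_lines(record)
```

A pre-filter like this is simple, but it discards the ** content entirely; a fix inside the scanner can instead choose where such lines are tolerated.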
From bugzilla-daemon at portal.open-bio.org Fri Aug 31 10:37:01 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 31 Aug 2007 06:37:01 -0400
Subject: [Biopython-dev] [Bug 2353] Swissprot (UniProt) files with ** lines fail to parse
In-Reply-To:
Message-ID: <200708311037.l7VAb1pp006296@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2353

------- Comment #11 from ibdeno at gmail.com 2007-08-31 06:37 EST -------
Created an attachment (id=742)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=742&action=view)
Entry with "**" line in different position

Attached is an entry that causes problems, with a "**" line in a different position from the previous one.

Cheers,

Miguel

From bugzilla-daemon at portal.open-bio.org Fri Aug 31 10:38:49 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 31 Aug 2007 06:38:49 -0400
Subject: [Biopython-dev] [Bug 2353] Swissprot (UniProt) files with ** lines fail to parse
In-Reply-To:
Message-ID: <200708311038.l7VAcnXd006390@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2353

------- Comment #12 from ibdeno at gmail.com 2007-08-31 06:38 EST -------
Created an attachment (id=743)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=743&action=view)
context of "**" lines

Hi again.

This one is the output of:

grep -C 2 "^\*\*" 27.M_thermoautotrophicum.dat > asterisklines

So you can see the context where these lines appear (in case there are various possibilities).

Miguel
From bugzilla-daemon at portal.open-bio.org Fri Aug 31 11:52:24 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 31 Aug 2007 07:52:24 -0400
Subject: [Biopython-dev] [Bug 2353] Swissprot (UniProt) files with ** lines fail to parse
In-Reply-To:
Message-ID: <200708311152.l7VBqOJS009597@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2353

------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-31 07:52 EST -------
That grep output was a nice idea. It looks like while most cases are FT, **, SQ there are several variations to cover.

Please try Bio/SwissProt/SProt.py revision 1.41, which should cope with the ** lines anywhere (except some positions within references). This is the only file changed, so you don't need to worry about updating anything else in Biopython.

I have tried this on the entire files 27.M_thermoautotrophicum.dat and 121.T_whipplei_Twist.dat from ftp://ftp.ebi.ac.uk/pub/databases/integr8/uniprot/proteomes/

From bugzilla-daemon at portal.open-bio.org Fri Aug 31 13:38:15 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 31 Aug 2007 09:38:15 -0400
Subject: [Biopython-dev] [Bug 2353] Swissprot (UniProt) files with ** lines fail to parse
In-Reply-To:
Message-ID: <200708311338.l7VDcFc8016806@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2353

------- Comment #14 from ibdeno at gmail.com 2007-08-31 09:38 EST -------
Excellent! No errors now. Thank you very much for your help.

Best,

Miguel
From bugzilla-daemon at portal.open-bio.org Fri Aug 31 13:44:21 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 31 Aug 2007 09:44:21 -0400
Subject: [Biopython-dev] [Bug 2353] Swissprot (UniProt) files with ** lines fail to parse
In-Reply-To:
Message-ID: <200708311344.l7VDiLlY017155@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2353

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED

------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk 2007-08-31 09:44 EST -------
Great. I'm marking this bug as fixed - please reopen it if you manage to find any other ** files which break.

Thanks for the report.

Peter