From bugzilla-daemon at portal.open-bio.org Tue Oct 2 05:09:48 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 2 Oct 2007 05:09:48 -0400 Subject: [Biopython-dev] [Bug 2362] test_copen fails on Windows XP as tries os.fork() In-Reply-To: Message-ID: <200710020909.l9299moD015903@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2362 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2007-10-02 05:09 EST ------- I removed test_copen.py from CVS and deprecated the Bio.MultiProc code. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mdehoon at c2b2.columbia.edu Tue Oct 2 05:06:54 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue, 2 Oct 2007 05:06:54 -0400 Subject: [Biopython-dev] [BioPython] Bio.MultiProc References: <46E6A845.3030601@c2b2.columbia.edu> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu> Hi everybody, Since no users of Bio.MultiProc came forward, I deprecated it for the upcoming release. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: biopython-bounces at lists.open-bio.org on behalf of Michiel De Hoon Sent: Tue 9/11/2007 10:37 AM To: BioPython Developers List; biopython at biopython.org Subject: [BioPython] Bio.MultiProc Hi everybody, In preparation for the upcoming release, I was running the Biopython test suite and found that test_copen.py hangs on Cygwin. It doesn't fail, it just sits there forever. This may be related to the use of fork() instead of select() in Bio/MultiProc/copen.py. 
Anyway, while it is probably possible to fix this, I'd have to dig fairly deep into the code, and I am not sure if it is worth it. It looks like the copen functions are used only in Bio/config, which is needed for Bio.db. A description of the functionality of this module can be found in the tutorial section 4.7.2. Now, I don't remember users asking about this module on the mailing list. From the tutorial documentation, it seems to be a nice piece of code, but I doubt that it is being used often in practice. So I was wondering: 1) Is anybody on this list using this code? 2) If not, can I mark it as deprecated for the upcoming release? Hopefully, people who are using this code will notice, and let us know that they need it. --Michiel. _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From idoerg at gmail.com Tue Oct 2 12:00:41 2007 From: idoerg at gmail.com (Iddo Friedberg) Date: Tue, 2 Oct 2007 09:00:41 -0700 Subject: [Biopython-dev] [BioPython] Bio.MultiProc In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu> References: <46E6A845.3030601@c2b2.columbia.edu> <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu> Message-ID: Would it be possible to include the module, comment out the unworkable source code and print a deprecation warning when it is imported? That way we: 1) Don't have a clunky module BUT 2) we warn anyone who uses it (but didn't happen to read your post) that it is deprecated when they install a new biopython version AND 3) Leave an option of fixing and commenting the code back in (i.e. it is not lost forever). Also, is it possible to track down the original author? ./I On 10/2/07, Michiel De Hoon wrote: > > Hi everybody, > > Since no users of Bio.MultiProc came forward, I deprecated it for the > upcoming release. > > --Michiel. 
-- I. Friedberg "The only problem with troubleshooting is that sometimes trouble shoots back." 
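Iddo's suggestion of printing a deprecation warning when the module is imported maps onto Python's standard `warnings` module. The following is only a minimal sketch in present-day Python, not the code that was committed to CVS; the helper name `warn_deprecated` and the message wording are assumptions made for illustration.

```python
import warnings


def warn_deprecated(module_name, alternative=None):
    """Emit a DeprecationWarning for a module scheduled for removal.

    Hypothetical helper: the name and the message text are illustrative only.
    """
    message = f"{module_name} is deprecated and may be removed in a future release."
    if alternative:
        message += f" Please consider {alternative} instead."
    # stacklevel=2 attributes the warning to the code that called this helper,
    # rather than to the helper itself.
    warnings.warn(message, DeprecationWarning, stacklevel=2)
```

In a deprecated package, a call such as `warn_deprecated("Bio.MultiProc")` would sit at the top of its `__init__.py`, so that it fires on import while the rest of the code stays untouched. Note that current Python versions hide `DeprecationWarning` in most contexts by default; running with `python -W default` makes it visible.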
From biopython-dev at maubp.freeserve.co.uk Tue Oct 2 12:55:53 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 02 Oct 2007 17:55:53 +0100 Subject: [Biopython-dev] Bio.MultiProc / Bio.FormatIO In-Reply-To: References: <46E6A845.3030601@c2b2.columbia.edu> <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu> Message-ID: <47027819.1010207@maubp.freeserve.co.uk> Iddo Friedberg wrote: > Would it be possible to include the module, comment out the unworkable > source code and print a deprecation warning when it is imported? That is sort of what Michiel did - he's just added a deprecation warning, but not touched the code itself. This isn't an option for some of the more "integrated" bits of code like Bio.FormatIO which I suggested removing in Bug 2361 (see also my email to the main list on 19 September): http://bugzilla.open-bio.org/show_bug.cgi?id=2361#c27 Peter From rhaygood at duke.edu Tue Oct 2 19:59:43 2007 From: rhaygood at duke.edu (Ralph Haygood) Date: Tue, 2 Oct 2007 19:59:43 -0400 (EDT) Subject: [Biopython-dev] Statistics code In-Reply-To: <6d941f120709291328q6a9aae97kdcf489549cc9b3f0@mail.gmail.com> References: <6d941f120709291328q6a9aae97kdcf489549cc9b3f0@mail.gmail.com> Message-ID: Tiago, Sorry to be so long replying---I've been almost drowning in work. Use anything you find useful in my code. If you do write an article about it, I'd be glad to be a coauthor, not just in name but actually to help with writing the discussion of sequence statistics. There *is* a lot of stuff in my code, not all of it generally important. For example, few people will care about indel statistics, beyond counting them and maybe getting the frequency distribution of their lengths. The things most people will care about are K (the number of polymorphic sites), Watterson's theta, pi, Tajima's D, Fu and Li's D, Fay and Wu's H, F_ST, and McDonald--Kreitman testing. 
As for ambiguous nucleotides, my code handles them in one of two ways, at the programmer's option. By default, a site at which any sequence in the alignment contains an ambiguous nucleotide is ignored; for example,

ACRGTY
ACAGTC

is effectively equivalent to

ACGT
ACGT .

However, if the 'expand_diplotypes' option is specified when the Sample object is constructed, each sequence in the alignment is interpreted as a diplotype and converted into a pair of pseudo-haplotypes, two-fold ambiguous nucleotides (R, Y, W, S, M, and K) being interpreted as heterozygous; for example,

ACRGTY
ACAGTC

is effectively equivalent to

ACAGTC
ACGGTT
ACAGTC
ACAGTC .

In expand_diplotypes mode, sites containing three- or four-fold ambiguous nucleotides are still ignored. Also, you'll get a warning if you request a statistic that depends on correct SNP phasing, which most statistics don't. So far, I've found these two operating modes sufficient for my needs.

I think your plan sounds very reasonable, just adding sequence statistics at a pace that's comfortable for you. Any time you have questions, feel free to ask me, and I'll give you whatever benefit there is in my opinion and experience.

I'm happy for all this to happen on biopython-dev, so that other people (e.g., Alex Lancaster) can add to it. I'll leave it to the core developers to tell us if we're too noisy. (I'd recommend still sending messages to me with copies to biopython-dev, however, so that I don't accidentally miss them on biopython-dev, which I don't always read carefully.)

Ralph

On Sat, 29 Sep 2007, Tiago Antão wrote: > Hi Ralph, > > Hope all is good with you. I am now finally starting to commit > statistics code to Biopython. But before I go ahead I would like to > ask some advice to you (plus some extra comments): > > About code merging and authorship: > > I am finally looking to your code. There is really lots of stuff > there! Would it be OK with you if I merged your code with mine into > Bio.PopGen.Stats? 
Obviously the copyright/authorship for the module > would be co-shared as would any authorship of any article deriving > from it... > > About a strategy to advance: > > 1. I personally don't have any experience, really, with working with > sequence data (My background are SNPs, microsatellites/STRs, AFLPs and > that sort of stuff) > 2. Starting on Monday I am beginning a PhD which will require, part > time, sequence analysis > 3. What I mean from 1 and 2 is that I currently don't have maturity to > architect and design a good framework for sequence analysis but I will > gain it with time. > My plan is then to defer all sequence code until I feel I know what I > am doing (although I was still thinking in providing something like > BioPerl's facility of extracting all SNPs from sequences) > If this is OK with you I plan to start committing code the week > starting on this Monday, > > About request for insight: > > If you have any comments to offer on issues regarding representing > indels and ambiguous data (ie ambiguous nucleotides) they might be > useful, as I suppose that is the biggest issue that makes me afraid of > sequence code. > > > Finally: I would summarize our discussion here on biopython-dev (I am > not taking it there directly just because you might not want your code > on Biopython or might want it in other terms). > > Thanks, > Tiago > From mdehoon at c2b2.columbia.edu Tue Oct 2 20:18:59 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue, 2 Oct 2007 20:18:59 -0400 Subject: [Biopython-dev] [BioPython] Bio.MultiProc References: <46E6A845.3030601@c2b2.columbia.edu><6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62D@mail2.exch.c2b2.columbia.edu> > Would it be possible to include the module, comment out the unworkable > source code and print a deprecation warning when it is imported? That is what I did. > 3) Leave an option of fixing and commenting the code back in (i.e. 
it is not > lost forever). Even after removing the code in some future release, the code will not be lost forever. It can always be retrieved from CVS and from older Biopython releases. > Also, is it possible to track down the original author? That would be Jeff Chang. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From tiagoantao at gmail.com Wed Oct 3 06:14:33 2007 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 3 Oct 2007 11:14:33 +0100 Subject: [Biopython-dev] Coalescent code Message-ID: <6d941f120710030314g73e38aa4w8c3b473eeaa18cc9@mail.gmail.com> Hi, I had a plan of starting to commit statistical related code this weekend, but (contrary to my expectations) I am having requests for the coalescent code. As such, I am planning to commit the coalescent code instead. It is quite straightforward code, with only one issue that I would require advice: Some of the code (regarding modeling demographies) requires some templates (very small text files, circa 10 of around 700 bytes each) to go along. Where should I put the files in Biopython? 
Also, on installation those files have to be put somewhere... Tiago -- http://www.tiago.org/ps From biopython-dev at maubp.freeserve.co.uk Wed Oct 3 10:18:21 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Wed, 03 Oct 2007 15:18:21 +0100 Subject: [Biopython-dev] Coalescent code In-Reply-To: <6d941f120710030314g73e38aa4w8c3b473eeaa18cc9@mail.gmail.com> References: <6d941f120710030314g73e38aa4w8c3b473eeaa18cc9@mail.gmail.com> Message-ID: <4703A4AD.7030008@maubp.freeserve.co.uk> Tiago Antão wrote: > It is quite straightforward code, with only one issue that I would > require advice: Some of the code (regarding modeling demographies) > requires some templates (very small text files, circa 10 of around 700 > bytes each) to go along. Where should I put the files in Biopython? > Also, on installation those files have to be put somewhere... There is a similar precedent with Bio/EUtils/DTDs (where the data files are XML DTD files). I guess you could have the 10 plain text data files in with the python files (or under a subdirectory). Opinions? I should really refresh myself on current python packaging guidelines... Peter From tiagoantao at gmail.com Wed Oct 3 11:37:17 2007 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 3 Oct 2007 16:37:17 +0100 Subject: [Biopython-dev] Statistics code In-Reply-To: References: <6d941f120709291328q6a9aae97kdcf489549cc9b3f0@mail.gmail.com> Message-ID: <6d941f120710030837k1aa2d4ak7eca8e6e27e35fdd@mail.gmail.com> Ralph, Thanks for the detailed explanation. Because of a couple of requests I had, I am going to commit first the coalescent code, but after the coalescent code is in, I will pick this up. Tiago On 10/3/07, Ralph Haygood wrote: > Tiago, > > Sorry to be so long replying---I've been almost drowning in work. > > Use anything you find useful in my code. 
If you do write an article > about it, I'd be glad to be a coauthor, not just in name but actually > to help with writing the discussion of sequence statistics. > > There *is* a lot of stuff in my code, not all of it generally > important. For example, few people will care about indel statistics, > beyond counting them and maybe getting the frequency distribution of > their lengths. The things most people will care about are K (the > number of polymorphic sites), Watterson's theta, pi, Tajima's D, Fu > and Li's D, Fay and Wu's H, F_ST, and McDonald--Kreitman testing. > > As for ambiguous nucleotides, my code handles them in one of two ways, > at the programmer's option. By default, a site at which any sequence > in the alignment contains an ambiguous nucleotide is ignored; for > example, > > ACRGTY > ACAGTC > > is effectively equivalent to > > ACGT > ACGT . > > However, if the 'expand_diplotypes' option is specified when the > Sample object is constructed, each sequence in the alignment is > interpreted as a diplotype and converted into a pair of pseudo- > haplotypes, two-fold ambiguous nucleotides (R, Y, W, S, M, and K) > being interpreted as heterozygous; for example, > > ACRGTY > ACAGTC > > is effectively equivalent to > > ACAGTC > ACGGTT > ACAGTC > ACAGTC . > > In expand_diplotypes mode, sites containing three- or four-fold > ambiguous nucleotides are still ignored. Also, you'll get a warning > if you request a statistic that depends on correct SNP phasing, which > most statistics don't. So far, I've found these two operating modes > sufficient for my needs. > > I think your plan sounds very reasonable, just adding sequence > statistics at a pace that's comfortable for you. Any time you have > questions, feel free to ask me, and I'll give you whatever benefit > there is in my opinion and experience. > > I'm happy for all this to happen on biopython-dev, so that other > people (e.g., Alex Lancaster) can add to it. 
I'll leave it to the > core developers to tell us if we're too noisy. (I'd recommend still > sending messages to me with copies to biopython-dev, however, so that > I don't accidentally miss them on biopython-dev, which I don't always > read carefully.) > > Ralph > > On Sat, 29 Sep 2007, Tiago Antão wrote: > > > Hi Ralph, > > > > Hope all is good with you. I am now finally starting to commit > > statistics code to Biopython. But before I go ahead I would like to > > ask some advice to you (plus some extra comments): > > > > About code merging and authorship: > > > > I am finally looking to your code. There is really lots of stuff > > there! Would it be OK with you if I merged your code with mine into > > Bio.PopGen.Stats? Obviously the copyright/authorship for the module > > would be co-shared as would any authorship of any article deriving > > from it... > > > > About a strategy to advance: > > > > 1. I personally don't have any experience, really, with working with > > sequence data (My background are SNPs, microsatellites/STRs, AFLPs and > > that sort of stuff) > > 2. Starting on Monday I am beginning a PhD which will require, part > > time, sequence analysis > > 3. What I mean from 1 and 2 is that I currently don't have maturity to > > architect and design a good framework for sequence analysis but I will > > gain it with time. > > My plan is then to defer all sequence code until I feel I know what I > > am doing (although I was still thinking in providing something like > > BioPerl's facility of extracting all SNPs from sequences) > > If this is OK with you I plan to start committing code the week > > starting on this Monday, > > > > About request for insight: > > > > If you have any comments to offer on issues regarding representing > > indels and ambiguous data (ie ambiguous nucleotides) they might be > > useful, as I suppose that is the biggest issue that makes me afraid of > > sequence code. 
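On the question of representing ambiguous nucleotides, the two operating modes Ralph describes in his reply can be sketched as follows. This is not Ralph's code: the function names are hypothetical, sequences are assumed to be equal-length uppercase strings, gap characters are treated like any other non-ACGT symbol, and the assignment of alleles to the first or second pseudo-haplotype is arbitrary (which is why phase-dependent statistics would need a warning).

```python
UNAMBIGUOUS = frozenset("ACGT")

# IUPAC two-fold ambiguity codes and the two nucleotides each one stands for.
TWO_FOLD = {"R": "AG", "Y": "CT", "W": "AT", "S": "CG", "M": "AC", "K": "GT"}


def _rows_from_columns(columns, n_rows):
    """Turn a list of alignment columns back into a list of row strings."""
    if not columns:
        return [""] * n_rows
    return ["".join(row) for row in zip(*columns)]


def drop_ambiguous_sites(alignment):
    """Default mode: ignore every site where any sequence is ambiguous."""
    kept = [col for col in zip(*alignment)
            if all(base in UNAMBIGUOUS for base in col)]
    return _rows_from_columns(kept, len(alignment))


def expand_diplotypes(alignment):
    """Diplotype mode: split each sequence into two pseudo-haplotypes.

    Two-fold codes are read as heterozygous sites; sites with three- or
    four-fold codes (or any other symbol) are still ignored.
    """
    allowed = UNAMBIGUOUS | frozenset(TWO_FOLD)
    kept = [col for col in zip(*alignment)
            if all(base in allowed for base in col)]
    haplotypes = []
    for row in _rows_from_columns(kept, len(alignment)):
        # An unambiguous base contributes the same nucleotide to both copies.
        pairs = [TWO_FOLD.get(base, base * 2) for base in row]
        haplotypes.append("".join(pair[0] for pair in pairs))
        haplotypes.append("".join(pair[1] for pair in pairs))
    return haplotypes
```

With Ralph's example alignment `["ACRGTY", "ACAGTC"]`, the first function yields `["ACGT", "ACGT"]` and the second yields `["ACAGTC", "ACGGTT", "ACAGTC", "ACAGTC"]`, matching the equivalences given in his message.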
> > > > > > Finally: I would summarize our discussion here on biopython-dev (I am > > not taking it there directly just because you might not want your code > > on Biopython or might want it in other terms). > > > > Thanks, > > Tiago > > -- http://www.tiago.org/ps From tiagoantao at gmail.com Wed Oct 3 12:04:07 2007 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 3 Oct 2007 17:04:07 +0100 Subject: [Biopython-dev] Coalescent code In-Reply-To: <4703A4AD.7030008@maubp.freeserve.co.uk> References: <6d941f120710030314g73e38aa4w8c3b473eeaa18cc9@mail.gmail.com> <4703A4AD.7030008@maubp.freeserve.co.uk> Message-ID: <6d941f120710030904k70b098dcnbbc40bc3420ea831@mail.gmail.com> Hi On 10/3/07, Peter wrote: > There is a similar precedent with Bio/EUtils/DTDs (where the data files > are XML DTD files). I guess you could have the 10 plain text data files > in with the python files (or under a subdirectory). Opinions? In the mean time, I will start committing the code (I can easily accommodate the details of the places to put the files later, when there is a decision). Michiel, please, please don't include SimCoal code that I will be committing on the next public version. Regards, Tiago From mdehoon at c2b2.columbia.edu Wed Oct 3 20:39:47 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Wed, 3 Oct 2007 20:39:47 -0400 Subject: [Biopython-dev] Coalescent code References: <6d941f120710030314g73e38aa4w8c3b473eeaa18cc9@mail.gmail.com><4703A4AD.7030008@maubp.freeserve.co.uk> <6d941f120710030904k70b098dcnbbc40bc3420ea831@mail.gmail.com> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62E@mail2.exch.c2b2.columbia.edu> > Michiel, please, please don't include SimCoal code that I will be > committing on the next public version. To avoid confusion, please don't commit code to CVS that you don't want to be included in the next Biopython release. --Michiel. 
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From bugzilla-daemon at portal.open-bio.org Wed Oct 3 22:10:13 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 3 Oct 2007 22:10:13 -0400 Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with egenix mxTextTools 3.0 In-Reply-To: Message-ID: <200710040210.l942ADGF030763@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2361 ------- Comment #30 from mdehoon at ims.u-tokyo.ac.jp 2007-10-03 22:10 EST ------- Looking at the patch for Bio.FormatIO: ------------------------- #Would like to have just issued a deprecation warning, and removed this #module later. However, due to the FormatIO code in Bio/SeqRecord.py the #deprecation warning would be triggered whenever someone used the SeqRecord. raise ImportError, "Bio.FormatIO has been removed. 
Please try Bio.SeqIO instead" ------------------------- Since the patch for Bio/SeqRecord.py removes its dependence on Bio.FormatIO, is it still necessary to raise an ImportError instead of issuing a DeprecationWarning? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Oct 5 05:44:09 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 5 Oct 2007 05:44:09 -0400 Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with egenix mxTextTools 3.0 In-Reply-To: Message-ID: <200710050944.l959i9BX029760@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2361 ------- Comment #31 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-05 05:44 EST ------- In terms of typical usage, SeqRecord does not depend on FormatIO. However, from a code perspective, FormatIO and SeqRecord "depend" on each other. If we remove the FormatIO "hooks" from SeqRecord.py (so that SeqRecord does not depend on FormatIO), then FormatIO breaks. Rather than leaving in a broken module, I wanted to remove it. A DeprecationWarning doesn't seem right if FormatIO is removed, which is why I suggested an ImportError. We might be able instead to MOVE the FormatIO hooks out of SeqRecord and then issue a DeprecationWarning for FormatIO ... but it looks rather complicated, and probably means tackling the Bio.config code as well. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Oct 5 07:05:49 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 5 Oct 2007 07:05:49 -0400 Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with egenix mxTextTools 3.0 In-Reply-To: Message-ID: <200710051105.l95B5nXW001755@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2361 ------- Comment #32 from mdehoon at ims.u-tokyo.ac.jp 2007-10-05 07:05 EST ------- > If we remove the FormatIO "hooks" from SeqRecord.py (so that SeqRecord does not > depend on FormatIO), then FormatIO breaks. Rather than leaving in a broken > module, I wanted to remove it. A DeprecationWarning doesn't seem right if > FormatIO is removed, which is why I suggested an ImportError. OK, I see. As far as I'm concerned, your patch is fine then. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Oct 5 09:46:51 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 5 Oct 2007 09:46:51 -0400 Subject: [Biopython-dev] [Bug 2174] FDist Support in BioPython In-Reply-To: Message-ID: <200710051346.l95Dkpc2010074@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2174 tiagoantao at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #6 from tiagoantao at gmail.com 2007-10-05 09:46 EST ------- It is implemented, documented and with test code. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From tiagoantao at gmail.com Fri Oct 5 10:26:43 2007 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Fri, 5 Oct 2007 15:26:43 +0100 Subject: [Biopython-dev] Configuration files Message-ID: <6d941f120710050726s4ca53349h1b8d499650e5726a@mail.gmail.com> Hi, Is there any (Biopython standard) way to configure Biopython during runtime? When writing code sometimes I think it would be very convenient (especially to the programmer using Biopython) to abstract some configuration parameters away from the code. Things like the location of binaries, hosts, user names (and maybe passwords) of databases, timeout parameters, etc. These could be stored on a configuration file (or registry entry, or whatever) thus saving users to have to deal in the code with supplying these... Just an idea... Tiago -- http://www.tiago.org/ps From bugzilla-daemon at portal.open-bio.org Mon Oct 8 07:14:30 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 8 Oct 2007 07:14:30 -0400 Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with egenix mxTextTools 3.0 In-Reply-To: Message-ID: <200710081114.l98BEUZh019757@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2361 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #759 is|0 |1 obsolete| | ------- Comment #33 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-08 07:14 EST ------- (From update of attachment 759) Applied these changes to CVS. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython-dev at maubp.freeserve.co.uk Mon Oct 8 06:52:48 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Mon, 08 Oct 2007 11:52:48 +0100 Subject: [Biopython-dev] Configuration files In-Reply-To: <6d941f120710050726s4ca53349h1b8d499650e5726a@mail.gmail.com> References: <6d941f120710050726s4ca53349h1b8d499650e5726a@mail.gmail.com> Message-ID: <470A0C00.50505@maubp.freeserve.co.uk> Tiago Antão wrote: > Hi, > > Is there any (Biopython standard) way to configure Biopython during > runtime? When writing code sometimes I think it would be very > convenient (especially to the programmer using Biopython) to abstract > some configuration parameters away from the code. Things like the > location of binaries, hosts, user names (and maybe passwords) of > databases, timeout parameters, etc. These could be stored on a > configuration file (or registry entry, or whatever) thus saving users > to have to deal in the code with supplying these... > Just an idea... This sounds like a fairly general thing (i.e. for all of python) rather than being Biopython specific. For example, I find a lot of my scripts have a few if statements at the top setting locations of files and executables based on which user/machine I'm running on (I use both Windows and a couple of Linux boxes with different user names). e.g. Where are the blast executables, the blast databases, and my genome collection, ... 
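Tiago's idea of keeping binary locations, database hosts and timeouts outside the code can be tried with nothing but the standard library, which fits the remark that this is a general Python matter. The sketch below uses `configparser` from present-day Python; the file name `~/.biopython.ini`, the section names and the keys are all hypothetical, and none of them comes from Biopython itself.

```python
import configparser
import os

# Hypothetical defaults; every section and key name here is an assumption.
DEFAULTS = {
    "blast": {"executable": "blastall", "database_dir": "."},
    "network": {"timeout": "30"},
}


def load_settings(path=None):
    """Return settings: DEFAULTS first, then any values found in an INI file."""
    parser = configparser.ConfigParser()
    parser.read_dict(DEFAULTS)
    if path is None:
        # Hypothetical per-user location, chosen only for this example.
        path = os.path.expanduser("~/.biopython.ini")
    # ConfigParser.read() skips files that do not exist instead of raising,
    # so the defaults survive when no configuration file is present.
    parser.read(path)
    return parser
```

A script would then replace its machine-specific `if` statements with lookups such as `load_settings()["blast"]["executable"]` or `load_settings().getint("network", "timeout")`, with each machine carrying its own file. Storing passwords in such a file would need more care than this sketch takes.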
Peter From bugzilla-daemon at portal.open-bio.org Mon Oct 8 07:30:03 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 8 Oct 2007 07:30:03 -0400 Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with egenix mxTextTools 3.0 In-Reply-To: Message-ID: <200710081130.l98BU36u021016@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2361 ------- Comment #34 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-08 07:30 EST ------- Recap, most of the issues were resolved by switching Bio.Fasta from Martel to pure python. Additionally: test_Fasta - 'fixed' by deprecating the Mindy indexing functions test_KEGG - fixed by switching from Martel to pure python test_format_registry - 'fixed' by removing FormatIO test_geo - fixed by switching from Martel to pure python test_GenBankFormat - this entire test is for the little-used Martel GenBank expression, and this works with mxTextTools 2.0 but fails with mxTextTools 3.0 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mdehoon at c2b2.columbia.edu Tue Oct 9 00:34:28 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue, 9 Oct 2007 00:34:28 -0400 Subject: [Biopython-dev] Output of Biopython tests Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu> Hi everybody, With the help of several Biopython developers, especially Peter, the problems with Martel and the new mxTextTools release have now been solved (in the sense that all unit tests now succeed). So we're a lot closer to a new Biopython release. Thanks everybody! When I was running the Biopython tests, one thing bothered me though. All Biopython tests now have a corresponding output file that contains the output the test should generate if it runs correctly. 
For some tests, this makes perfect sense, particularly if the output is large. For others, on the other hand, having the test output explicitly in a file doesn't actually add much information. For example, the output for test_psw is

test_psw
test_AlignmentColumn_assertions (test_psw.TestPSW) ... ok
test_AlignmentColumn_full (test_psw.TestPSW) ... ok
test_AlignmentColumn_kinds (test_psw.TestPSW) ... ok
test_AlignmentColumn_repr (test_psw.TestPSW) ... ok
test_Alignment_assertions (test_psw.TestPSW) ... ok
test_Alignment_normal (test_psw.TestPSW) ... ok
test_ColumnUnit (test_psw.TestPSW) ... ok
Doctest: Bio.Wise.psw.parse_line ... ok

----------------------------------------------------------------------
Ran 8 tests in 0.002s

OK

For comparison, this is the test output if test_psw.py fails:

test_AlignmentColumn_assertions (__main__.TestPSW) ... ok
test_AlignmentColumn_full (__main__.TestPSW) ... ok
test_AlignmentColumn_kinds (__main__.TestPSW) ... FAIL
test_AlignmentColumn_repr (__main__.TestPSW) ... ok
test_Alignment_assertions (__main__.TestPSW) ... ok
test_Alignment_normal (__main__.TestPSW) ... ok
test_ColumnUnit (__main__.TestPSW) ... ok
Doctest: Bio.Wise.psw.parse_line ... ok

======================================================================
FAIL: test_AlignmentColumn_kinds (__main__.TestPSW)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_psw.py", line 47, in test_AlignmentColumn_kinds
    self.assertEqual(ac.kind, "some_funny_output_I_made_up_instead_of_INSERT")
AssertionError: 'INSERT' != 'some_funny_output_I_made_up_instead_of_INSERT'

----------------------------------------------------------------------
Ran 8 tests in 0.000s

The point is that for this test, having the output explicitly is not needed in order to identify the problem.

Now, for some tests having the output explicitly actually causes a problem.
I'm thinking about those unit tests that only run if some particular software is installed on the system (for example, SQL). In those cases, we need to distinguish failure due to missing software from a true failure (the former may not bother the user much if he's not interested in that particular part of Biopython). If a test cannot be run because of missing prerequisites, currently a unit test generates an ImportError, which is then caught inside run_tests. Hence, we get the following output when running the Biopython tests: test_BioSQL ... Skipping test because of import error: Skipping BioSQL tests -- enable tests in Tests/test_BioSQL.py ok When you look inside test_BioSQL.py, you'll see that the actual error is not an ImportError. In addition, if a true ImportError occurs during the test, the test will inadvertently be treated as skipped. My solution would be to skip tests inside test_BioSQL if the prerequisites are not met. However, in that case the test output no longer agrees with the expected test output, generating a failure message. I'd therefore like to suggest the following: 1) Keep the test output, but let each test_* script (instead of run_tests.py) be responsible of comparing the test output with the expected output. 2) If the expected output is trivial, simply use the assert statements to verify the test output instead of storing them in a file and reading them from there. Any objections? --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From mhobbs_of_lawson at bigpond.com Mon Oct 8 22:18:39 2007 From: mhobbs_of_lawson at bigpond.com (mhobbs_of_lawson) Date: Tue, 9 Oct 2007 12:18:39 +1000 Subject: [Biopython-dev] translate Message-ID: <5496247.1191896319102.JavaMail.root@web06sl> Hi, Please can someone tell me what is wrong here. I simply want to be able to translate ambiguous DNA which includes an 'NNN' triplet. 
Thanks,
Matthew

>>> from Bio import Seq
>>> from Bio.Alphabet import IUPAC
>>> from Bio import Translate
>>> s = "NNNTCAAAAAGGTGCATCTAGATG"
>>> dna = Seq.Seq(s, IUPAC.ambiguous_dna)
>>> trans = Translate.ambiguous_dna_by_id[1]
>>> print trans.translate(dna)
Traceback (most recent call last):
  File "", line 1, in
  File "/cygdrive/c/Python24/Lib/site-packages/Bio/Translate.py", line 20, in translate
    append(get(s[i:i+3], stop_symbol))
  File "/cygdrive/c/Python24/Lib/site-packages/Bio/Data/CodonTable.py", line 544, in get
    return self.__getitem__(codon)
  File "/cygdrive/c/Python24/Lib/site-packages/Bio/Data/CodonTable.py", line 577, in __getitem__
    raise TranslationError, codon # does not code
Bio.Data.CodonTable.TranslationError: NNN

From biopython-dev at maubp.freeserve.co.uk Tue Oct 9 07:54:29 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 09 Oct 2007 12:54:29 +0100
Subject: [Biopython-dev] translate
In-Reply-To: <5496247.1191896319102.JavaMail.root@web06sl>
References: <5496247.1191896319102.JavaMail.root@web06sl>
Message-ID: <470B6BF5.607@maubp.freeserve.co.uk>

mhobbs_of_lawson wrote:
> Hi,
>
> Please can someone tell me what is wrong here. I simply want to be able to translate ambiguous DNA which includes an 'NNN' triplet.

A very reasonable request. I assume you expect just an X for an NNN codon?

I have the general impression that some of Biopython's handling of ambiguous sequences isn't all wonderful... something I have started to tackle in bug 2366:

http://bugzilla.open-bio.org/show_bug.cgi?id=2366

Obviously sequence manipulation is a core bit of functionality - and I would like at least one other person to comment on that code before I risk committing it ;)

Translation of ambiguous codons would be next on my hit list... as right now it doesn't seem to do what I would expect at all.

In the short term, manually adding additional mappings to the forward table (a python dictionary) would probably "fix" your specific issue.
While we are on this topic, we use "*" for stop codons and "X" for an ambiguous amino acid - but is anyone aware of a character convention for something that might be either a stop codon or an amino acid? (other than just using "X" for this too)? Peter From biopython-dev at maubp.freeserve.co.uk Tue Oct 9 07:44:01 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 09 Oct 2007 12:44:01 +0100 Subject: [Biopython-dev] Output of Biopython tests In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu> References: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu> Message-ID: <470B6981.3020707@maubp.freeserve.co.uk> Michiel De Hoon wrote: > When I was running the Biopython tests, one thing bothered me though. > All Biopython tests now have a corresponding output file that > contains the output the test should generate if it runs correctly. > For some tests, this makes perfect sense, particularly if the output > is large. For others, on the other hand, having the test output > explicitly in a file doesn't actually add much information. Is this actually a problem? It gives us a simple unified test framework where developers can use whatever fancy test frameworks they want to. Personally I have tried to write simple scripts with meaningful output (plus often additional assertions). I think that because these are very simple, they can double as examples/documentation for the curious. My personal view is that some of the "fancy frameworks" used in some test cases are very intimidating to a beginner (and act as a barrier to taking the code and modifying it for their own use). > The point is that for this test, having the output explicitly is not > needed in order to identify the problem. True. I would have written that particular test to give some meaningful output; I find it makes it easier to start debugging why a test fails. > Now, for some tests having the output explicitly actually causes a > problem. 
I'm thinking about those unit tests that only run if some > particular software is installed on the system (for example, SQL). In > those cases, we need to distinguish failure due to missing software > from a true failure (the former may not bother the user much if he's > not interested in that particular part of Biopython). If a test > cannot be run because of missing prerequisites, currently a unit test > generates an ImportError, which is then caught inside run_tests. > ... > When you look inside test_BioSQL.py, you'll see that the actual error > is not an ImportError. In addition, if a true ImportError occurs > during the test, the test will inadvertently be treated as skipped. Perhaps we should introduce a MissingExternalDependency error instead, used for this specific case, and catch that in run_tests.py, while treating ImportError as a real error. As you say, if we have done some dramatic restructuring (such as removing a module) there could be some REAL ImportErrors which we might risk ignoring. > I'd therefore like to suggest the following: > 1) Keep the test output, but let each test_* script (instead of > run_tests.py) be responsible of comparing the test output with the > expected output. I'm not keen on that - it means duplication of code (or at least some common functionality to call) and makes writing simple tests that little bit harder. I like the fact that the more verbose test scripts can be run on their own as an example of what the module can do. > 2) If the expected output is trivial, simply use the assert > statements to verify the test output instead of storing them in a > file and reading them from there. By all means, test trivial output with assertions. I already do this within many of my "verbose" tests where I want to keep the console output reasonably short. 
Peter

From tiagoantao at gmail.com Tue Oct 9 10:27:18 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 9 Oct 2007 15:27:18 +0100
Subject: [Biopython-dev] Configuration files
In-Reply-To: <470A0C00.50505@maubp.freeserve.co.uk>
References: <6d941f120710050726s4ca53349h1b8d499650e5726a@mail.gmail.com> <470A0C00.50505@maubp.freeserve.co.uk>
Message-ID: <6d941f120710090727m787c08abn13665c662727446c@mail.gmail.com>

Would it be interesting to have something like

config = Bio.Config.getConfig()
fdist_path = config['PopGen.FDistDir']

Something that:
1. Would allow for a standard configuration mechanism (as opposed to having different styles for each module/author)
2. Would abstract away how the configuration is stored (registry, conf file, ...)

If there was an agreement on doing this (or something along these lines), I would volunteer the time to do it.

On 10/8/07, Peter wrote:
> Tiago Antão wrote:
> > Hi,
> >
> > Is there any (Biopython standard) way to configure Biopython during
> > runtime? When writing code sometimes I think it would be very
> > convenient (especially to the programmer using Biopython) to abstract
> > some configuration parameters away from the code. Things like the
> > location of binaries, hosts, user names (and maybe passwords) of
> > databases, timeout parameters, etc. These could be stored on a
> > configuration file (or registry entry, or whatever) thus saving users
> > to have to deal in the code with supplying these...
> > Just an idea...
>
> This sounds like a fairly general thing (i.e. for all of python) rather
> than being Biopython specific.
>
> For example, I find a lot of my scripts have a few if statements at the
> top setting locations of files and executables based on which
> user/machine I'm running on (I use both Windows and a couple of Linux
> boxes with different user names).
>
> e.g. Where are the blast executables, the blast databases, and my genome
> collection, ...
>
> Peter
>
>

--
http://www.tiago.org/ps

From mhobbs_of_lawson at bigpond.com Tue Oct 9 19:07:43 2007
From: mhobbs_of_lawson at bigpond.com (Matthew Hobbs)
Date: Wed, 10 Oct 2007 09:07:43 +1000
Subject: [Biopython-dev] translate
In-Reply-To: <470B6BF5.607@maubp.freeserve.co.uk>
References: <5496247.1191896319102.JavaMail.root@web06sl> <470B6BF5.607@maubp.freeserve.co.uk>
Message-ID: <470C09BF.8050906@bigpond.com>

Thanks Peter for your reply.

Peter wrote:
> mhobbs_of_lawson wrote:
>> Please can someone tell me what is wrong here. I simply want to be
>> able to translate ambiguous DNA which includes an 'NNN' triplet.
>
> A very reasonable request. I assume you expect just an X for an NNN codon?

yep

> In the short term, manually adding additional mappings to the forward
> table (a python dictionary) would probably "fix" your specific issue.

OK - so this works:

from Bio import Seq
from Bio.Alphabet import IUPAC
from Bio import Translate
s = "NNNTCAAAAAGGTGCATCTAGATG"
dna = Seq.Seq(s, IUPAC.ambiguous_dna)
trans = Translate.ambiguous_dna_by_id[1]
trans.table.forward_table.forward_table['NNN'] = 'X'
print trans.translate(dna)

> While we are on this topic, we use "*" for stop codons and "X" for an
> ambiguous amino acid - but is anyone aware of a character convention for
> something that might be either a stop codon or an amino acid? (other
> than just using "X" for this too)?
No I don't know Thanks, Matthew From mdehoon at c2b2.columbia.edu Thu Oct 11 06:31:59 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Thu, 11 Oct 2007 06:31:59 -0400 Subject: [Biopython-dev] Output of Biopython tests References: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu> <470B6981.3020707@maubp.freeserve.co.uk> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B636@mail2.exch.c2b2.columbia.edu> > Perhaps we should introduce a MissingExternalDependency error instead, > used for this specific case, and catch that in run_tests.py, while > treating ImportError as a real error. OK. I added a MissingExternalDependencyError exception to Bio/__init__.py, and modified BioSQL, Bio.GFF, and some test scripts accordingly. When MissingExternalDependencyError occurs in a test, a warning is printed but it is not counted as a failure. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From mdehoon at c2b2.columbia.edu Thu Oct 11 06:44:56 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Thu, 11 Oct 2007 06:44:56 -0400 Subject: [Biopython-dev] function enumerate in Bio/GFF/GenericTools.py; Bio/DocSQL.py Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B637@mail2.exch.c2b2.columbia.edu> Do we still need the function "enumerate" in Bio/GFF/GenericTools.py and Bio/DocSQL.py? AFAICT, this function does exactly the same as the Python built-in enumerate function. --Michiel. 
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From biopython-dev at maubp.freeserve.co.uk Thu Oct 11 16:44:46 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Thu, 11 Oct 2007 21:44:46 +0100 Subject: [Biopython-dev] Revised tutorial Message-ID: <470E8B3E.6080709@maubp.freeserve.co.uk> In anticipation of the next release, I've done some more work on the tutorial today -- in particular the section on the Seq object which I have turned into a new chapter. If anyone has the time to go over this soon that would be great. I'll be away tomorrow (Friday) but will probably have time to make any revisions needed at the weekend.
It's here in CVS: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/Tutorial.tex?cvsroot=biopython This is a LaTeX file which gets turned into the PDF and HTML versions of the tutorial using pdflatex and hevea. If you want to proofread but don't know anything about LaTeX then I can probably email you the PDF version for comment (half a megabyte). Peter From sbassi at gmail.com Thu Oct 11 18:48:39 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Thu, 11 Oct 2007 19:48:39 -0300 Subject: [Biopython-dev] Revised tutorial In-Reply-To: <470E8B3E.6080709@maubp.freeserve.co.uk> References: <470E8B3E.6080709@maubp.freeserve.co.uk> Message-ID: Hello, I can't resolve all the dependencies to install hevea so I can't generate the dvi from the tex file. Could you please send me by email the final PDF? Best, SB. -- Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6 Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From mdehoon at c2b2.columbia.edu Thu Oct 11 21:53:19 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Thu, 11 Oct 2007 21:53:19 -0400 Subject: [Biopython-dev] Output of Biopython tests References: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu> <470B6981.3020707@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B636@mail2.exch.c2b2.columbia.edu> <470E3E7E.1000301@maubp.freeserve.co.uk> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B638@mail2.exch.c2b2.columbia.edu> Peter wrote: > Michiel De Hoon wrote: > > OK. I added a MissingExternalDependencyError exception to Bio/__init__.py, > > and modified BioSQL, Bio.GFF, and some test scripts accordingly. When > > MissingExternalDependencyError occurs in a test, a warning is printed but it > > is not counted as a failure.
> > I might have defined the exception within the test framework rather than > Bio/__init__.py, but now that it's there we can start to use in things > like modules that wrap external tools. That is why I put it in Bio/__init__.py; Bio/GFF/__init__.py is already using this exception (outside of the testing framework). > I've updated Tests/requires_internet.py and Test/requires_wise.py to > match (I don't have wise on my machine which is why I noticed it still > threw an ImportError). Thanks! I missed those. > Is there anything I can do to help get things ready for the release of > Biopython 1.44? At some point, somebody will need to go through the documentation to check if everything documented there still works with the Biopython in CVS, and to remove sections in the documentation describing deprecated code. But it's probably better to wait until after we decide what to do with test_GenBankFormat. > If you do have time to give the patch on bug 2366 a check, I think it > would be worth including before the next release. > > http://bugzilla.open-bio.org/show_bug.cgi?id=2366 No time to check it. But I'd be happy to rely on your judgement and include it. --Michiel.
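The skip-versus-fail distinction discussed in the thread above can be sketched roughly as follows. This is an illustration only, not the actual run_tests.py code: the exception class is defined locally as a stand-in for the one added to Bio/__init__.py (assumed here to be a plain Exception subclass), and run_one_test plus the three sample test functions are hypothetical names.

```python
class MissingExternalDependencyError(Exception):
    """Local stand-in for the exception added to Bio/__init__.py (assumption)."""


def run_one_test(name, test_func):
    """Run a test callable and classify the outcome (hypothetical helper)."""
    try:
        test_func()
    except MissingExternalDependencyError as err:
        # Missing optional software: report it, but do not count a failure.
        return "skipped", "%s: %s" % (name, err)
    except ImportError as err:
        # A genuine ImportError (e.g. a removed module) is a real failure.
        return "failed", "%s: ImportError: %s" % (name, err)
    except AssertionError as err:
        return "failed", "%s: %s" % (name, err)
    return "ok", name


def needs_database():
    raise MissingExternalDependencyError("database driver not installed")


def broken_import():
    import module_that_does_not_exist_xyz  # hypothetical missing module


def passing():
    assert 1 + 1 == 2


results = [run_one_test(f.__name__, f)
           for f in (needs_database, broken_import, passing)]
for status, message in results:
    print(status, message)
```

With a dedicated exception, only the first case is reported as skipped; the ImportError is no longer silently treated as a missing prerequisite.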
From bugzilla-daemon at portal.open-bio.org Thu Oct 11 22:32:05 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 11 Oct 2007 22:32:05 -0400 Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with egenix mxTextTools 3.0 In-Reply-To: Message-ID: <200710120232.l9C2W5e9022504@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2361 ------- Comment #35 from mdehoon at ims.u-tokyo.ac.jp 2007-10-11 22:32 EST ------- > test_GenBankFormat - this entire test is for the little-used Martel GenBank > expression, and this works with mxTextTools 2.0 but fails with mxTextTools 3.0 If it's little-used, should we include it for the next release or can it be removed? If we remove the test, should we then also remove the corresponding module? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython-dev at maubp.freeserve.co.uk Thu Oct 11 16:37:52 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Thu, 11 Oct 2007 21:37:52 +0100 Subject: [Biopython-dev] Output of Biopython tests In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B636@mail2.exch.c2b2.columbia.edu> References: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu> <470B6981.3020707@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B636@mail2.exch.c2b2.columbia.edu> Message-ID: <470E89A0.1010502@maubp.freeserve.co.uk> Michiel De Hoon wrote: >> Perhaps we should introduce a MissingExternalDependency error instead, >> used for this specific case, and catch that in run_tests.py, while >> treating ImportError as a real error. > > OK. I added a MissingExternalDependencyError exception to Bio/__init__.py, > and modified BioSQL, Bio.GFF, and some test scripts accordingly. 
When > MissingExternalDependencyError occurs in a test, a warning is printed but it > is not counted as a failure. I might have defined the exception within the test framework rather than Bio/__init__.py, but now that it's there we can start to use it in things like modules that wrap external tools. I've updated Tests/requires_internet.py and Tests/requires_wise.py to match (I don't have wise on my machine which is why I noticed it still threw an ImportError). This means run_tests.py now runs without errors using CVS on my 64 bit Linux machine (bar the mxTextTools 3.0 issue with test_GenBankFormat.py, bug 2361). Is there anything I can do to help get things ready for the release of Biopython 1.44? If you do have time to give the patch on bug 2366 a check, I think it would be worth including before the next release. http://bugzilla.open-bio.org/show_bug.cgi?id=2366 Peter From fennan at gmail.com Mon Oct 15 05:48:45 2007 From: fennan at gmail.com (Fernando) Date: Mon, 15 Oct 2007 11:48:45 +0200 Subject: [Biopython-dev] Database into variables Message-ID: <7b13e61d0710150248v72a550d6h38e1467edf5073eb@mail.gmail.com> Hi everybody, I am thinking of including some algorithms that I work with into biopython. My first concern is that I'm using a local image of the Gene Ontology database to perform several operations. In order to avoid such database accesses I could precompute the information I need and load it once the module is called. How should I do it? Is there a guideline style to load external variables or something like that? Any other ideas/suggestions? Thanks From fennan at gmail.com Mon Oct 15 06:28:56 2007 From: fennan at gmail.com (Fernando) Date: Mon, 15 Oct 2007 12:28:56 +0200 Subject: [Biopython-dev] Precompute database information Message-ID: <7b13e61d0710150328l354bfb5eu1b76ed05024a65c4@mail.gmail.com> Hi everybody, I am thinking of including some algorithms that I work with into biopython.
My first concern is that I'm using a local image of the Gene Ontology database to perform several operations. In order to avoid such database accesses I could precompute the information I need and load it once the module is called. How should I do it? Is there a guideline style to load external variables or something like that? Any other ideas/suggestions? Thanks From bugzilla-daemon at portal.open-bio.org Mon Oct 15 07:11:26 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 15 Oct 2007 07:11:26 -0400 Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in (Reverse)complement functions in Bio.Seq In-Reply-To: Message-ID: <200710151111.l9FBBQOE012625@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2366 tiagoantao at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |tiagoantao at gmail.com ------- Comment #3 from tiagoantao at gmail.com 2007-10-15 07:11 EST ------- I had a look at the test code and tried to find which test case is changing the ambiguous_dna dict. I used this little script (putting it here as it might be useful for detecting these types of problems): for i in test_*py; do python run_tests.py $i; done It turns out that it is test_Nexus.py. Further inspection of the code seems to reveal that it is not the test case that pollutes the dictionary but the Nexus module itself. Maybe it makes sense to raise a bug on the Nexus module... Any comments on these findings? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
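The dictionary pollution reported in comment #3 above can be reproduced without Biopython. The sketch below is only an illustration of the general problem: AMBIGUOUS_DNA_COMPLEMENT is a small made-up stand-in for a shared module-level dictionary, the extra "?" symbol is an arbitrary example, and both functions are hypothetical. It does not show what Bio.Nexus actually does.

```python
# Stand-in for a shared, module-level lookup table (truncated, hypothetical).
AMBIGUOUS_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def add_symbol_in_place(table):
    """Mutate the caller's dict: every other user of the table now sees '?'."""
    table["?"] = "?"
    return table


def add_symbol_on_copy(table):
    """Extend a copy, leaving the shared table untouched."""
    extended = dict(table)
    extended["?"] = "?"
    return extended


before = dict(AMBIGUOUS_DNA_COMPLEMENT)

extended = add_symbol_on_copy(AMBIGUOUS_DNA_COMPLEMENT)
copy_is_safe = AMBIGUOUS_DNA_COMPLEMENT == before        # shared table unchanged

add_symbol_in_place(AMBIGUOUS_DNA_COMPLEMENT)
in_place_pollutes = AMBIGUOUS_DNA_COMPLEMENT != before   # shared table changed

print(copy_is_safe, in_place_pollutes)
```

Because the in-place change persists for the lifetime of the process, a test that runs later in the same interpreter can fail even though it passes when run on its own, which is why running each test script separately helps locate the culprit.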
From bugzilla-daemon at portal.open-bio.org Mon Oct 15 10:16:00 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 15 Oct 2007 10:16:00 -0400 Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in (Reverse)complement functions in Bio.Seq In-Reply-To: Message-ID: <200710151416.l9FEG01A023797@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2366 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-15 10:16 EST ------- Thanks for that Tiago, I guess we should file a bug on Bio.Nexus on the alphabet issue; It may be that it should create a copy or subclass of the ambiguous DNA alphabet in order to include "?" (I imagine that Nexus uses this rather than "N"), and see if it is using the Gapped() alphabet system or not. Did you have any comments on this patch for (reverse) complements? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From jflatow at northwestern.edu Mon Oct 15 20:08:13 2007 From: jflatow at northwestern.edu (Jared Flatow) Date: Mon, 15 Oct 2007 19:08:13 -0500 Subject: [Biopython-dev] Biopython status Message-ID: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> Hi all, I've just started using Biopython and I am wondering about the status of the group, since I've heard rumors that its dying. So far I have found the library very useful, if not at times frustrating, though I will admit I am fairly new to developing python as well. I have been hesitant to make changes to existing code, however I have found that in a few cases it has been by far the best way to accomplish what I need, and have only done so in cases where it seems to be the *right* thing to do. With that in mind, I have a few questions I was hoping you all could answer. 
First, how might I put these changes up for review in order to contribute back to the code base? The main changes have been to the AlignAce parser, since as it was it just ignored information contained in the alignace file regarding the motif instances (namely which input sequence they came from, where they started in the sequence, and what strand they were on). I have also needed to create a modified FASTA parser so that I can read things like quality score files. I would be happy to submit the changes to the group or an individual for inspection, but I would like to avoid having to maintain my own separate version of Biopython if possible. I am also wondering how it would be received if I did something like add a to_fasta method to SeqRecord instead of having to go through writing it to a file using a SeqIO when all I want is the string. Finally, are there plans to move to a subversion repository at any point? Thanks! Jared Flatow From sbassi at gmail.com Tue Oct 16 01:09:16 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Tue, 16 Oct 2007 02:09:16 -0300 Subject: [Biopython-dev] Biopython status In-Reply-To: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> Message-ID: On 10/15/07, Jared Flatow wrote: > I've just started using Biopython and I am wondering about the status > of the group, since I've heard rumors that its dying. So far I have You could subscribe to the rss feed of the CVS and you will see a lot of activity. 
The developers list and the bug tracking program (Bugzilla) are also pretty busy; that doesn't look like a dying group to me :) -- Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6 Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From mdehoon at c2b2.columbia.edu Tue Oct 16 01:37:14 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue, 16 Oct 2007 01:37:14 -0400 Subject: [Biopython-dev] Biopython status References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B639@mail2.exch.c2b2.columbia.edu> Hi Jared, > I've just started using Biopython and I am wondering about the status > of the group, since I've heard rumors that its dying. From looking at the activity on the Biopython mailing lists in recent months, it doesn't seem to be dying :-). > So far I have found the library very useful, if not at times frustrating, > though I will admit I am fairly new to developing python as well. One thing to keep in mind is that Biopython started about eight years ago, and some approaches that seemed like good ideas at that time may not seem so now. Nevertheless, I feel that Biopython is moving in the right direction in terms of ease-of-use. > First, how might I put these changes up for review in order > to contribute back to the code base? The main changes have been to > the AlignAce parser, since as it was it just ignored information > contained in the alignace file regarding the motif instances (namely > which input sequence they came from, where they started in the > sequence, and what strand they were on). In this case, it is a good idea to contact the current maintainer of Bio.AlignAce, either via the mailing list or directly. From the Biopython CVS, it seems that Bartek is currently the main maintainer of Bio.AlignAce, so it would be a good idea to discuss your changes with him.
> I have also needed to create a modified FASTA parser so that I > can read things like quality score files. At some point, Biopython had several (two or three?) Fasta parsers, two Fasta formats, etc. This is a situation we should definitely avoid. So if your modifications fit in well with the existing Fasta parser in Bio.SeqIO, it may very well be accepted into Biopython. Otherwise, it's better to leave it out. This is just my opinion though. > I am also wondering how it would be received if I did something like > add a to_fasta method to SeqRecord instead of having to go through > writing it to a file using a SeqIO when all I want is the string. This sounds like feature creep to me, so I would be against it. It's easy to add code to Biopython, it's much harder to remove stuff. Code bloat is a real problem in Biopython. > Finally, are there plans to move to a subversion repository at any > point? There were some plans at some point, but I don't know the current status. Best, --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: biopython-dev-bounces at lists.open-bio.org on behalf of Jared Flatow Sent: Mon 10/15/2007 8:08 PM To: biopython-dev at lists.open-bio.org Subject: [Biopython-dev] Biopython status Hi all, I've just started using Biopython and I am wondering about the status of the group, since I've heard rumors that its dying. So far I have found the library very useful, if not at times frustrating, though I will admit I am fairly new to developing python as well. I have been hesitant to make changes to existing code, however I have found that in a few cases it has been by far the best way to accomplish what I need, and have only done so in cases where it seems to be the *right* thing to do. With that in mind, I have a few questions I was hoping you all could answer. 
First, how might I put these changes up for review in order to contribute back to the code base? The main changes have been to the AlignAce parser, since as it was it just ignored information contained in the alignace file regarding the motif instances (namely which input sequence they came from, where they started in the sequence, and what strand they were on). I have also needed to create a modified FASTA parser so that I can read things like quality score files. I would be happy to submit the changes to the group or an individual for inspection, but I would like to avoid having to maintain my own separate version of Biopython if possible. I am also wondering how it would be received if I did something like add a to_fasta method to SeqRecord instead of having to go through writing it to a file using a SeqIO when all I want is the string. Finally, are there plans to move to a subversion repository at any point? Thanks! Jared Flatow _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 04:16:01 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 16 Oct 2007 09:16:01 +0100 Subject: [Biopython-dev] Biopython status In-Reply-To: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> Message-ID: <47147341.4020708@maubp.freeserve.co.uk> Jared Flatow wrote: > I have also needed to create a modified FASTA parser so that I can > read things like quality score files. Could you be a little more specific - what exactly do you mean by a quality score files (links and/or examples). 
It may be that this warrants setting up a new file format in Bio.SeqIO > I would be happy to submit the changes to the group or an individual > for inspection, but I would like to avoid having to maintain my own > separate version of Biopython if possible. As has already been said - please file some (enhancement) bugs and attach your patches, or raise specific issues for discussion on this mailing list. Depending on the nature of your changes, you might be able to achieve some of them by subclassing Biopython's objects - rather than literally maintaining your own branch of the project. > I am also wondering how it would be received if I did something like > add a to_fasta method to SeqRecord instead of having to go through > writing it to a file using a SeqIO when all I want is the string. Out of interest, why do you want to create a FASTA record as a string? Did you know you can write to a string using any Bio.SeqIO supported file format using StringIO? Perhaps we should spell this out more explicitly in the documentation, but a motivating example would help. I would suggest rather than adding a to_fasta method to the SeqRecord, simply write your own "seqrecord_to_string" function (or create a subclass of SeqRecord with this method). > Finally, are there plans to move to a subversion repository at any > point? It was raised a while ago, and our cunning plan was to let BioPerl try the move first. Once that has been proven, it should be fairly easy for the OBF guys to also move us over. I should email them to see how things stand... 
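The StringIO route mentioned above can be sketched without touching SeqRecord at all. This is only an illustration, not Biopython code: `write_fasta` and `record_to_string` are hypothetical stand-ins for any writer that takes a file-like handle, and the record is a plain (id, sequence) tuple rather than a SeqRecord.

```python
import io

def write_fasta(records, handle, width=60):
    """Hypothetical handle-based writer: one FASTA entry per (id, sequence) pair."""
    count = 0
    for identifier, sequence in records:
        handle.write(">%s\n" % identifier)
        # Wrap the sequence at a fixed line width.
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")
        count += 1
    return count

def record_to_string(record, writer=write_fasta):
    """Capture whatever the writer emits by handing it an in-memory handle."""
    handle = io.StringIO()
    writer([record], handle)
    return handle.getvalue()

fasta_text = record_to_string(("read1", "ACGT" * 20))
```

Because the helper only needs a function that accepts a handle, the same wrapper would work for any output format that has a handle-based writer, which is the point of going through StringIO rather than adding one method per format.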
Peter From bartek at rezolwenta.eu.org Tue Oct 16 05:11:01 2007 From: bartek at rezolwenta.eu.org (bartek wilczynski) Date: Tue, 16 Oct 2007 11:11:01 +0200 Subject: [Biopython-dev] Biopython status In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B639@mail2.exch.c2b2.columbia.edu> References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <6243BAA9F5E0D24DA41B27997D1FD14402B639@mail2.exch.c2b2.columbia.edu> Message-ID: <1192525861.4714802535dae@imp.rezolwenta.eu.org> Michiel De Hoon wrote: > > First, how might I put these changes up for review in order > > to contribute back to the code base? The main changes have been to > > the AlignAce parser, since as it was it just ignored information > > contained in the alignace file regarding the motif instances (namely > > which input sequence they came from, where they started in the > > sequence, and what strand they were on). > > In this case, it is a good idea to contact the current maintainer of > Bio.AlignAce, either via the mailing list or directly. From the Biopython > CVS, it seems that Bartek is currently the main maintainer of Bio.AlignAce, > so it would be a good idea to discuss with him. I'm not dying either ;). I'm the author of the Bio.AlignAce module and if you have any new code to contribute to it, I'll be glad to help you. The best way to do it would be to submit an enhancement bug report in bugzilla. If the changes are smaller, you can just send them (as a diff) to the list and I'll try to fit them to the current cvs version of Bio.AlignAce Bartek Wilczynski From bugzilla-daemon at portal.open-bio.org Tue Oct 16 05:55:37 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Oct 2007 05:55:37 -0400 Subject: [Biopython-dev] [Bug 2380] New: Bio.Nexus is adding "?" and "-" to Bio.Data.IUPACData.ambiguous_dna_values Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2380 Summary: Bio.Nexus is adding "?" 
and "-" to Bio.Data.IUPACData.ambiguous_dna_values Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: minor Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk This issue was raised in Bug 2366 where a unit test was found to be "polluting" ambiguous_dna_values, later identified as Bio.Nexus via test_Nexus.py Need to see if Bio.Nexus should be making a copy of this dict, or perhaps defining a subclass of the alphabet (using the Gapped() class maybe). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 16 05:56:37 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Oct 2007 05:56:37 -0400 Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in (Reverse)complement functions in Bio.Seq In-Reply-To: Message-ID: <200710160956.l9G9ub18007735@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2366 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 05:56 EST ------- Fix committed (after Michiel's OK on the mailing list), marking as fixed. 
Checking in Tests/test_seq.py;
/home/repository/biopython/biopython/Tests/test_seq.py,v <-- test_seq.py
new revision: 1.6; previous revision: 1.5
done
Checking in Tests/output/test_seq;
/home/repository/biopython/biopython/Tests/output/test_seq,v <-- test_seq
new revision: 1.6; previous revision: 1.5
done
Checking in Bio/Seq.py;
/home/repository/biopython/biopython/Bio/Seq.py,v <-- Seq.py
new revision: 1.17; previous revision: 1.16
done
I've filed Bug 2380 for the Nexus issue: Bio.Nexus is adding "?" and "-" to Bio.Data.IUPACData.ambiguous_dna_values From bugzilla-daemon at portal.open-bio.org Tue Oct 16 06:11:09 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Oct 2007 06:11:09 -0400 Subject: [Biopython-dev] [Bug 2381] New: translate and transcibe method for the the Seq object (in Bio.Seq) Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2381 Summary: translate and transcibe method for the the Seq object (in Bio.Seq) Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Biopython has translation and transcription modules (Bio/Translate.py and Bio/Transcribe.py) but I find them a little bit complicated to use. There are module level functions translate, transcribe, and back_transcribe in Bio/Seq.py which take either a string, a Seq object or a MutableSeq object. I would like to add similar methods to the Seq object (also defined in Bio/Seq.py) to make this functionality more accessible from a Seq object. NOTE: Python strings have a translate method of their own which is rather different.
Having the Seq translate method do a biological translation makes sense. From bugzilla-daemon at portal.open-bio.org Tue Oct 16 06:13:35 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Oct 2007 06:13:35 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200710161013.l9GADZtJ008751@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 biopython-bugzilla at maubp.freeserve.co.uk changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|translate and transcibe     |translate and transcibe
                   |method for the the Seq      |methods for the Seq object
                   |object (in Bio.Seq)         |(in Bio.Seq)
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 06:13 EST ------- fixed typo in the bug summary From bugzilla-daemon at portal.open-bio.org Tue Oct 16 06:26:44 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Oct 2007 06:26:44 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200710161026.l9GAQixw009268@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #2 from dalloliogm at gmail.com 2007-10-16 06:26 EST ------- I find it difficult to translate a sequence in all six reading frames with a single command. Currently I use something like this for the forward frames:
    for i in range(3):
        translate(seq[i:])
which is not very nice.
It would be nice to add a parameter to the translate function like in the EMBOSS application transeq (http://emboss.sourceforge.net/apps/cvs/emboss/apps/transeq.html), something like this:
    >>> a = Seq('CAGCTAGCT')
    >>> a.translate()
    [(translation of a in the frame 0)]
    >>> a.translate(1)
    [(translation of a in the frame 1)]
    >>> a.translate('F')
    [(translation of a in the 3 forward frames)]
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 06:46:47 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Oct 2007 06:46:47 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200710161046.l9GAklI6010391@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 06:46 EST ------- Doing a three/six frame translation is however fairly common, and perhaps warrants an "official" implementation in Bio.SeqUtils. My current inclination is to try to keep the Bio.Seq translation function as simple as possible. There are lots of possible options to worry about, and catering to them all could make the translate method rather daunting. Perhaps things like the frame (or even the starting nucleotide) could be done in Bio.Translate only. Another "special case" example I personally would like is an option to check that the first codon is a valid start codon for the specified codon table, and to translate it as methionine (M). Then there is the question of whether Bio.Translate's "translate_to_stop" functionality should be exposed in a Seq method. Note there is yet another (!)
translation function Bio.SeqUtils.translate() which is frame aware [personally I would mark a lot of this module as deprecated]. From jflatow at northwestern.edu Tue Oct 16 12:02:19 2007 From: jflatow at northwestern.edu (Jared Flatow) Date: Tue, 16 Oct 2007 11:02:19 -0500 Subject: [Biopython-dev] Biopython status In-Reply-To: <47147341.4020708@maubp.freeserve.co.uk> References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> Message-ID: <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> Please forgive me for ever doubting your health; it seems the group is very much alive! On Oct 16, 2007, at 3:16 AM, Peter wrote: > Jared Flatow wrote: >> I have also needed to create a modified FASTA parser so that I can >> read things like quality score files. > > Could you be a little more specific - what exactly do you mean by a > quality score files (links and/or examples). It may be that this > warrants setting up a new file format in Bio.SeqIO That is what I did. The quality score files I meant are simply FASTA-like records that indicate the quality of each base pair read from a sequencing machine, on a scale of something like 1 to 64. The values are tab separated and correspond to 'reads' in another FASTA file that contains the actual sequences read. This is the way the 454 GSFlex machines output their sequencing reads, so for every set of reads there will be a pair of files, 454Reads.fna and 454Reads.qual. The only difference between a parser that processes these qual files and one that processes the sequence files is that it shouldn't get rid of spaces, and the newlines should not be stripped but converted into spaces (when 454 writes a newline of scores they omit the space).
Essentially I have made a duplicate of FastaIO's iterator, named it something else, made these two small changes and put an entry for it in the SeqIO file.
16,17c16,17
< def GSQualIterator(handle, alphabet = single_letter_alphabet, title2ids = None) :
< """Generator function to iterate over GSFlex quality records (as SeqRecord objects).
---
> def FastaIterator(handle, alphabet = single_letter_alphabet, title2ids = None) :
> """Generator function to iterate over Fasta records (as SeqRecord objects).
54c54
< lines.append(line.rstrip()) # .replace(" ","")) leave off the replacing internal spaces so we can process qscore files (jf)
---
> lines.append(line.rstrip().replace(" ",""))
58c58
< yield SeqRecord(Seq(" ".join(lines), alphabet),
---
> yield SeqRecord(Seq("".join(lines), alphabet),
63a64,199
As you can see, a parser like this might be useful for other FASTA-like formats as well and is in no way specific to the GS quality files (it's just a space-preserving parser). If it were to be implemented in Biopython you might call it something else. > >> I would be happy to submit the changes to the group or an individual >> for inspection, but I would like to avoid having to maintain my own >> separate version of Biopython if possible. > > As has already been said - please file some (enhancement) bugs and > attach your patches, or raise specific issues for discussion on this > mailing list. > > Depending on the nature of your changes, you might be able to achieve > some of them by subclassing Biopython's objects - rather than > literally > maintaining your own branch of the project. > >> I am also wondering how it would be received if I did something like >> add a to_fasta method to SeqRecord instead of having to go >> through writing it to a file using a SeqIO when all I want is the >> string. > > Out of interest, why do you want to create a FASTA record as a string? I am serving the fasta from a database of sequences dynamically via a web server.
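For illustration, a frame-aware translation of the kind asked for here can be sketched in plain Python. This is a hedged sketch, not the Bio.Seq, Bio.Translate or Bio.SeqUtils implementation: the function names are hypothetical, only the standard genetic code is handled, and the input is assumed to be unambiguous upper-case DNA.

```python
# Standard genetic code in the usual compact TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(dna):
    """Reverse complement of an unambiguous upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))

def translate_frame(dna, frame=0):
    """Translate one frame; trailing bases that do not fill a codon are dropped."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)

def six_frame_translation(dna):
    """Return (three forward frames, three reverse-complement frames)."""
    forward = [translate_frame(dna, frame) for frame in range(3)]
    rc = reverse_complement(dna)
    reverse = [translate_frame(rc, frame) for frame in range(3)]
    return forward, reverse

forward, reverse = six_frame_translation("CAGCTAGCT")
```

Keeping the frame handling in a helper like this, rather than in the basic translate call, leaves the single-frame case as simple as it is today.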
> > Did you know you can write to a string using any Bio.SeqIO supported > file format using StringIO? Perhaps we should spell this out more > explicitly in the documentation, but a motivating example would help. This is what I do now, but it seems like a hack to me to go this route. To always have to write to a file feels strange, but I see that it would be messy to go OO since there are so many formats. However, giving preference to fasta over other formats by making it innate doesn't seem like such a terrible idea. I do have mixed feelings about 'bloating' the code which is why I asked, and you have convinced me that this is not quite appropriate given existing convention. However the idea would be to put the to_fasta or to_format method inside the SeqRecord, then to call it from the IO when needed to actually write to a file, but call it directly when all that is wanted is a string... > > I would suggest rather than adding a to_fasta method to the > SeqRecord, simply write your own "seqrecord_to_string" function (or > create a subclass of SeqRecord with this method). > I'll leave it alone for now until I can come up with a real proposal =) >> Finally, are there plans to move to a subversion repository at any >> point? > > It was raised a while ago, and our cunning plan was to let BioPerl try > the move first. Once that has been proven, it should be fairly > easy for > the OBF guys to also move us over. I should email them to see how > things stand... BioPerl seems to be the guinea pigs for everything. Leading the way on this might put a stop to those nasty rumors about Biopython. 
Best Regards, Jared From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 12:47:48 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 16 Oct 2007 17:47:48 +0100 Subject: [Biopython-dev] CVS to SVN In-Reply-To: <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> Message-ID: <4714EB34.8000207@maubp.freeserve.co.uk> Jared wrote: > Leading the way on this ... [CVS to SVN] I would say one reason why we aren't charging ahead with a move from CVS to subversion is only a few posters on this mailing list actively WANT to move to subversion, and no-one has really championed the move (yet). I'm sure if we as a group wanted to this, then the OBF would be happy to assist. After all, moving us rather than BioPerl as the first CVS/SVN migration should be easier as we have a smaller code base. Peter From jflatow at northwestern.edu Tue Oct 16 14:46:53 2007 From: jflatow at northwestern.edu (Jared Flatow) Date: Tue, 16 Oct 2007 13:46:53 -0500 Subject: [Biopython-dev] 454 GSFlex quality score files In-Reply-To: <4714EBC7.1040504@maubp.freeserve.co.uk> References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EBC7.1040504@maubp.freeserve.co.uk> Message-ID: <48D92CF4-04B5-42F9-92D2-3A2D9D2FE7E2@northwestern.edu> Hi Peter, >>>> I have also needed to create a modified FASTA parser so that I >>>> can read things like quality score files. >>> >>> Could you be a little more specific - what exactly do you mean by a >>> quality score files (links and/or examples). It may be that this >>> warrants setting up a new file format in Bio.SeqIO >> That is what I did. 
The quality score files I meant are simply >> FASTA- like records that indicate the quality of each base pair >> read from a sequencing machine, on a scale of something like 1 to >> 64. The values are tab separated and correspond to 'reads' in >> another FASTA file that contain the actual sequences read. This >> is the way the 454 GSFlex machines output their sequencing reads, >> so for every set of reads there will be a pair of 454Reads.fna, >> 454Reads.qual files. The only difference between a parser that >> processes these qual files and one that processes the sequence >> files is that it shouldn't get rid of spaces, and the newlines >> should not to be stripped but converted into spaces (when 454 >> writes a newline of scores they omit the space). Essentially I >> have made a duplicate of FastaIOs iterator, named it something >> else, made these two small changes and put an entry for it in the >> SeqIO file. > > Patches and emails don't do well together. Could you file an > enhancement bug, and then upload your code as an attachment? If > you have a few examples of matched pairs of FASTA files and quality > files which you can contribute that would be very helpful too. > Yes I'll get on that. > It looks like you are trying to construct a "sequence" of numerical > values (rather than a sequence of letters like nucleotides/amino > acids). As written I don't think it would work for element access/ > splicing etc. However, with some extra work I suppose we could > stretch the Seq object in this way - and define a new > "IntegerAlphabet". > > But on balance, I don't think "lists of quality values" should be > treated in the same way as sequences (and thus it doesn't seem to > belong in Bio.SeqIO). > I agree. > Alternatively you could regard the quality scores as sequence meta- > data or annotation. 
One idea would be to generate SeqRecord > objects containing dummy sequences of the correct length made up of > the ambiguous character "N", with the associated quality scores > held as a list of integers in the SeqRecord's annotation > dictionary. Then it would fit into the Bio.SeqIO framework [I was > thinking of something similar for PTT files, NCBI Protein tables, > where again we have annotation but not the actual sequence]. I agree, and this way is most flexible. > > Maybe there should just be a separate parser for GSFlex quality > records which returns iterator giving each record name with a list > of integers. A more elegant scheme would read in the pair of files > together (the FASTA file and the quality file) and generate nicely > annotated SeqRecords with the sequence and the quality. This isn't > really possible with the Bio.SeqIO framework. > Yes, at first I liked this idea best, but it puts some constraints on the way these things are read in. Like if it is to be an iterator, you must have a guarantee that these files contain exactly the same sequences in exactly the same order. This seems like it could potentially be fine for the GSFlex files, but I wonder if there might somewhere down the line be use for quality information about sequences in other cases. If I am not mistaken, some sources use upper/lower case letters now to indicate a bistable degree of confidence in a sequence letter. In any event, this seems like an unnecessary restriction. The way I do it now is I load the reads into a database, then update the database when I read in a quality score file. I think Biopython should have a simple way of implementing something similar which can solve both our metadata problems. In Bio.Fasta there are Parsers which really belong in Bio.SeqIO.FastaIO, if anywhere. How about Bio.Fasta becomes the more general Fasta reader, nothing to do with sequences. 
It can iterate over a FASTA file using the '>' as the record separator, creating Record objects, much like it does now, except without processing them at all or assuming they are sequences.
>Record.header
Record.data
Now Bio.SeqIO.FastaIO can use Bio.Fasta to iterate over the Record objects in a file and transform them into SeqRecord objects. If you like, you can provide it with a function header_todict, which takes a string (in this case Record.header) and returns a dictionary, which gets unpacked and passed to the SeqRecord initializer. Basically the Bio.SeqIO.FastaIO returns a generator that looks something like this:
(SeqRecord(seq=cleanup(record.data), **header_todict(record.header))
 for record in Bio.Fasta.parse(file))
I can also use the Bio.Fasta.parse function now to parse my quality files and add them as metadata:
# I create an initial SeqRecord dictionary using the Bio.SeqIO.FastaIO parser
seq_dict = SeqIO.to_dict(SeqIO.FastaIO.parse(seq_file, my_header_todict))
# Then I iterate over the sequences in the qual file and look them up in the seq_dict using the same header parsing function
# I passed to create my initial SeqRecords, setting the quality scores as I find them
for record in Bio.Fasta.parse(qual_file):
    seq_dict[my_header_todict(record.header)['id']].quality = my_qualitycleanup(record.data)
I hope that makes sense. The advantage to doing it this way is that I can reuse my header parsing function for both the sequence and the metadata, and I can do whatever I want with the fasta record data without writing a whole new parser. The SeqIO fasta parsing functions just make some default assumptions (like the data is a sequence). Let me know what you think.
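The proposal above can be sketched in plain Python as follows. Everything here is an assumption made for illustration: `parse_fasta`, `Record` and `header_to_id` are hypothetical names standing in for the proposed Bio.Fasta.parse and header_todict, and plain dictionaries stand in for SeqRecord objects.

```python
import io
from collections import namedtuple

# Hypothetical generic record: the raw header plus the untouched data lines.
Record = namedtuple("Record", ["header", "data"])

def parse_fasta(handle):
    """Yield Record(header, data); data is the list of raw lines, unprocessed."""
    header, lines = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield Record(header, lines)
            header, lines = line[1:], []
        elif header is not None and line:
            lines.append(line)
    if header is not None:
        yield Record(header, lines)

def header_to_id(header):
    """Hypothetical header parser: the first whitespace-separated word is the ID."""
    return header.split()[0]

seq_file = io.StringIO(">r1 length=8\nACGT\nACGT\n>r2 length=4\nGGCC\n")
qual_file = io.StringIO(">r1 length=8\n27 28 21 27\n27 27 28 22\n>r2 length=4\n3 27 27 27\n")

# Sequences: join the data lines with nothing in between.
reads = {header_to_id(r.header): {"seq": "".join(r.data)} for r in parse_fasta(seq_file)}

# Qualities: join the data lines with spaces, then split into integers.
for r in parse_fasta(qual_file):
    reads[header_to_id(r.header)]["quality"] = [int(q) for q in " ".join(r.data).split()]
```

The same header function is reused for both files, and the only format-specific step is how the data lines are joined, which is the separation of concerns being argued for.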
Jared From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 12:50:15 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 16 Oct 2007 17:50:15 +0100 Subject: [Biopython-dev] 454 GSFlex quality score files In-Reply-To: <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> Message-ID: <4714EBC7.1040504@maubp.freeserve.co.uk> Hi Jared, >>> I have also needed to create a modified FASTA parser so that I can >>> read things like quality score files. >> >> Could you be a little more specific - what exactly do you mean by a >> quality score files (links and/or examples). It may be that this >> warrants setting up a new file format in Bio.SeqIO > > That is what I did. The quality score files I meant are simply FASTA- > like records that indicate the quality of each base pair read from a > sequencing machine, on a scale of something like 1 to 64. The values > are tab separated and correspond to 'reads' in another FASTA file > that contain the actual sequences read. This is the way the 454 > GSFlex machines output their sequencing reads, so for every set of > reads there will be a pair of 454Reads.fna, 454Reads.qual files. The > only difference between a parser that processes these qual files and > one that processes the sequence files is that it shouldn't get rid of > spaces, and the newlines should not to be stripped but converted into > spaces (when 454 writes a newline of scores they omit the space). > Essentially I have made a duplicate of FastaIOs iterator, named it > something else, made these two small changes and put an entry for it > in the SeqIO file. Patches and emails don't do well together. Could you file an enhancement bug, and then upload your code as an attachment? 
If you have a few examples of matched pairs of FASTA files and quality files which you can contribute that would be very helpful too. It looks like you are trying to construct a "sequence" of numerical values (rather than a sequence of letters like nucleotides/amino acids). As written I don't think it would work for element access/splicing etc. However, with some extra work I suppose we could stretch the Seq object in this way - and define a new "IntegerAlphabet". But on balance, I don't think "lists of quality values" should be treated in the same way as sequences (and thus it doesn't seem to belong in Bio.SeqIO). Alternatively you could regard the quality scores as sequence meta-data or annotation. One idea would be to generate SeqRecord objects containing dummy sequences of the correct length made up of the ambiguous character "N", with the associated quality scores held as a list of integers in the SeqRecord's annotation dictionary. Then it would fit into the Bio.SeqIO framework [I was thinking of something similar for PTT files, NCBI Protein tables, where again we have annotation but not the actual sequence]. Maybe there should just be a separate parser for GSFlex quality records which returns iterator giving each record name with a list of integers. A more elegant scheme would read in the pair of files together (the FASTA file and the quality file) and generate nicely annotated SeqRecords with the sequence and the quality. This isn't really possible with the Bio.SeqIO framework. 
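The "read the pair of files together" scheme could look roughly like the sketch below. It is only an outline under stated assumptions: both files are assumed to hold the same records in the same order with non-empty headers, `iter_fasta_like` and `iter_paired_reads` are hypothetical names, and plain tuples are returned instead of annotated SeqRecord objects.

```python
import io

def iter_fasta_like(handle):
    """Yield (id, list_of_data_lines) for each '>' record in a FASTA-like handle."""
    identifier, lines = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, lines
            # Assumes the header is non-empty; the ID is its first word.
            identifier, lines = line[1:].split()[0], []
        elif line and identifier is not None:
            lines.append(line)
    if identifier is not None:
        yield identifier, lines

def iter_paired_reads(seq_handle, qual_handle):
    """Walk both files in step, yielding (id, sequence, list_of_scores)."""
    for (seq_id, seq_lines), (qual_id, qual_lines) in zip(
        iter_fasta_like(seq_handle), iter_fasta_like(qual_handle)
    ):
        if seq_id != qual_id:
            raise ValueError("record order differs: %s vs %s" % (seq_id, qual_id))
        sequence = "".join(seq_lines)
        scores = [int(s) for s in " ".join(qual_lines).split()]
        if len(scores) != len(sequence):
            raise ValueError("%s: %d bases but %d scores" % (seq_id, len(sequence), len(scores)))
        yield seq_id, sequence, scores

pairs = list(iter_paired_reads(
    io.StringIO(">r1 region=2\nACGT\nAC\n>r2 region=2\nGG\n"),
    io.StringIO(">r1 region=2\n27 28 21 27\n3 25\n>r2 region=2\n30 12\n"),
))
```

Note that zip stops at the shorter input, so a quality file with missing trailing records would go unnoticed in this sketch; a stricter version would check that both iterators are exhausted together.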
Peter

From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 15:33:54 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 16 Oct 2007 20:33:54 +0100
Subject: [Biopython-dev] 454 GSFlex quality score files
In-Reply-To: <48D92CF4-04B5-42F9-92D2-3A2D9D2FE7E2@northwestern.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EBC7.1040504@maubp.freeserve.co.uk> <48D92CF4-04B5-42F9-92D2-3A2D9D2FE7E2@northwestern.edu>
Message-ID: <47151222.1060502@maubp.freeserve.co.uk>

> In Bio.Fasta there are Parsers which really belong in
> Bio.SeqIO.FastaIO, if anywhere. How about Bio.Fasta becomes the more
> general Fasta reader, nothing to do with sequences. ...

In actual fact, the Bio.Fasta module predates Bio.SeqIO, and I was thinking in a few releases' time of suggesting its deprecation (but not just yet as for several years it was the best documented and most used parser in Biopython).

If we do decide to keep Bio.Fasta (or extend it), then perhaps Bio.SeqIO.FastaIO should become just a wrapper for Bio.Fasta.

I'm still digressing your ideas to turn Bio.Fasta into a generic parser that copes with sequences, qualities scores, or anything else.

Peter

From bugzilla-daemon at portal.open-bio.org Tue Oct 16 15:57:35 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 15:57:35 -0400
Subject: [Biopython-dev] [Bug 2382] New: Generic FASTA parser
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2382

Summary: Generic FASTA parser
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: jflatow at northwestern.edu

I would like to be able to read in and iterate over records in generic fasta files of the format:

>header
data
>header
data
...
This iterator should return Bio.Fasta.Record objects with the corresponding header and data fields. I suggest putting this inside the existing Bio.Fasta module and updating Bio.SeqIO.Fasta to use this iterator and transform the records returned into Bio.SeqRecord objects. This should make it easier to add metadata to SeqRecord objects parsed in from FASTA. Consider the following example for illustration. I have data from a genome sequencing machine that outputs pairs of files. One contains the sequence reads which look like this, the other contains estimates of the quality of each base call in the sequence. The sequence file might look something like this (only with hundreds of thousands more entries): >ERSGEES02IKV6B length=97 xy=3401_1361 region=2 run=R_runname CAATATAATTTCTCTTAAAATTATTCCCATGGCCAGGTGTGGTGGCTCACACCTGTAGTC CCGGCACTTTGGGAGGCCAAGGCACACAGGGGATAGG >ERSGEES02GGZDB length=142 xy=2536_2685 region=2 run= R_runname GGTCTCCAGTGCCCTGTCTCCCCATATTTCTGACACACCTTCTCACAGCCTGGCCCATCT TGCTGGGTCCCTCTTCTCCTCCCTTCCTGCTCCATTTGTCAACACTGCTGGGACATTAGA ATTCAGATCTCCCGGGTCACCG >ERSGEES02JQUCP length=113 xy=3879_0663 region=2 run= R_runname AAAGTGACTAAAGAATCAATTTACATTAATATTCTATGTGAACAGGCAAAATACTTACAA AGAAGTAGAGAAAATATGAATTCAGTACAGAATTCAGATCTCCCGGGTCACCG The corresponding quality score file might look something like this: >ERSGEES02IKV6B length=97 xy=3401_1361 region=2 run= R_runname 27 28 21 27 27 27 28 22 28 25 3 27 27 27 28 21 33 31 20 6 28 21 26 26 18 28 25 2 26 25 29 23 31 24 27 29 22 27 27 27 29 23 27 31 25 27 27 27 27 27 27 32 26 27 27 27 27 26 27 33 30 12 32 26 27 27 27 33 30 12 33 30 12 26 31 25 33 27 32 28 33 28 27 27 27 27 27 26 33 32 20 7 27 27 27 32 26 >ERSGEES02GGZDB length=142 xy=2536_2685 region=2 run= R_runname 28 9 26 24 27 27 20 26 18 25 27 32 29 10 26 26 27 18 25 32 30 17 1 25 27 22 32 30 12 27 27 22 26 25 27 23 25 28 21 32 27 27 27 25 26 27 26 25 27 20 26 26 19 28 25 3 25 27 22 27 19 24 24 24 32 29 11 24 34 31 17 23 23 30 23 27 25 30 23 27 33 31 17 27 20 28 
21 27 25 26 26 30 24 27 33 31 13 26 27 27 31 25 27 25 23 26 16 26 27 30 27 7 27 27 27 32 27 26 26 32 27 30 26 27 27 27 27 27 27 27 30 27 6 34 31 17 27 21 27 32 28 18 >ERSGEES02JQUCP length=113 xy=3879_0663 region=2 run= R_runname 29 26 5 25 27 24 27 27 27 30 27 7 26 27 19 25 26 31 26 34 32 16 20 27 26 32 27 32 28 27 25 26 18 27 25 27 26 26 24 27 31 25 27 27 31 26 26 34 32 23 11 26 22 27 32 26 27 26 32 30 11 26 31 24 27 27 25 23 27 27 33 30 19 4 17 26 25 26 31 27 30 26 27 26 22 26 18 24 27 26 32 26 32 28 27 27 25 27 25 24 25 31 28 10 34 31 15 27 21 27 28 21 27 I would like to be able to do the following: # create a function to parse the header line and return a dictionary def parse_gsflex_header(gs_header): parts = gs_record.description.split(' ') assert len(parts) == 5 xy = parts[2].split('=')[1].split('_') return {'letters': gs_record.seq.tostring(), 'name': parts[0], 'length': parts[1].split('=')[1], 'xpos': xy[0], 'ypos': xy[1], 'region': parts[3].split('=')[1], 'run': parts[4].split('=')[1]} # Bio.SeqIO.FastaIO wraps the Bio.Fasta parser, might look something like this class Fasta(): # or however its organized def data_toseq(data): # do some parsing of the data return Seq(...) def parse(file, header_todict): return (SeqRecord(seq=data_toseq(record.data), **header_todict(record.header)) for record in Bio.Fasta.parse(file)) # I create an initial SeqRecord dictionary using the Bio.SeqIO.FastaIO parser seq_dict = SeqIO.to_dict(SeqIO.FastaIO.parse(seq_file, parse_gsflex_header)) # Then I iterate over the sequences in the qual file and look them up in the seq_dict # setting the quality scores as I find them them for record in Bio.Fasta.parse(qual_file): seq_dict[my_header_todict(record.header)['id']].quality = my_qualitycleanup(record.data) This would work well for parsing all kinds of FASTA-like files and provides a simple mechanism for dealing with them record by record. 
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Oct 16 16:03:33 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 16:03:33 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162003.l9GK3XmF007588@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2382

------- Comment #1 from jflatow at northwestern.edu 2007-10-16 16:03 EST -------
My mistake, the parse_gsflex_header function should look something like this:

    import re

    def parse_gsflex_header(gs_header):
        # split the record id from the rest of the header at the first whitespace
        parts = re.split(r'[,|]?\s+', gs_header, maxsplit=1)
        assert len(parts) == 2
        return {'id': parts[0], 'description': gs_header}

    def my_qualitycleanup(data):
        # split() treats newlines as separators, so scores on either
        # side of a line break are not run together
        return [int(x) for x in data.split()]

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From jflatow at northwestern.edu Tue Oct 16 16:11:04 2007
From: jflatow at northwestern.edu (Jared Flatow)
Date: Tue, 16 Oct 2007 15:11:04 -0500
Subject: [Biopython-dev] 454 GSFlex quality score files
In-Reply-To: <47151222.1060502@maubp.freeserve.co.uk>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EBC7.1040504@maubp.freeserve.co.uk> <48D92CF4-04B5-42F9-92D2-3A2D9D2FE7E2@northwestern.edu> <47151222.1060502@maubp.freeserve.co.uk>
Message-ID: <156C46BF-1798-43D5-BA10-2A94FC63A3AB@northwestern.edu>

On Oct 16, 2007, at 2:33 PM, Peter wrote:

> > In Bio.Fasta there are Parsers which really belong in
> > Bio.SeqIO.FastaIO, if anywhere.
How about Bio.Fasta becomes the more > > general Fasta reader, nothing to do with sequences. ... > > In actual fact, the Bio.Fasta module predates Bio.SeqIO, and I was > thinking in a few releases time of suggesting its deprecation (but > not just yet as for several years it was the best documented and > most used parser in Biopython). > I see, it looks like its meant to be deprecated, I was just saying its actually doing SeqIO functionality. > If we do decided keep Bio.Fasta (or extend it), then perhaps > Bio.SeqIO.FastaIO should become just a wrapper for Bio.Fasta > > I'm still digressing your ideas to turn Bio.Fasta into a generic > parser that copes with sequences, qualities scores, or anything else. I'm not quite sure you're meaning of digressing, if you mean thinking it over, then great =) Otherwise I hope you'll seriously consider it anyway. Either way, I think I posted a more coherent message on bugzilla with some example data and motivation. jared From jflatow at northwestern.edu Tue Oct 16 16:14:16 2007 From: jflatow at northwestern.edu (Jared Flatow) Date: Tue, 16 Oct 2007 15:14:16 -0500 Subject: [Biopython-dev] CVS to SVN In-Reply-To: <4714EB34.8000207@maubp.freeserve.co.uk> References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EB34.8000207@maubp.freeserve.co.uk> Message-ID: <6DFB6FBB-CC55-41D1-8D35-4906E6B502CF@northwestern.edu> > I would say one reason why we aren't charging ahead with a move > from CVS to subversion is only a few posters on this mailing list > actively WANT to move to subversion, and no-one has really > championed the move (yet). Does that mean most developers don't WANT to move, or just that they don't ACTIVELY want to move? 
jared From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 16:42:18 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 16 Oct 2007 21:42:18 +0100 Subject: [Biopython-dev] 454 GSFlex quality score files In-Reply-To: <156C46BF-1798-43D5-BA10-2A94FC63A3AB@northwestern.edu> References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EBC7.1040504@maubp.freeserve.co.uk> <48D92CF4-04B5-42F9-92D2-3A2D9D2FE7E2@northwestern.edu> <47151222.1060502@maubp.freeserve.co.uk> <156C46BF-1798-43D5-BA10-2A94FC63A3AB@northwestern.edu> Message-ID: <4715222A.2070909@maubp.freeserve.co.uk> Jared Flatow wrote: > On Oct 16, 2007, at 2:33 PM, Peter wrote: > >>> In Bio.Fasta there are Parsers which really belong in >>> Bio.SeqIO.FastaIO, if anywhere. How about Bio.Fasta becomes the more >>> general Fasta reader, nothing to do with sequences. ... >> In actual fact, the Bio.Fasta module predates Bio.SeqIO, and I was >> thinking in a few releases time of suggesting its deprecation (but >> not just yet as for several years it was the best documented and >> most used parser in Biopython). > > I see, it looks like its meant to be deprecated, I was just saying > its actually doing SeqIO functionality. Well I'm currently just making a suggestion for the future, deprecating Bio.Fasta, we should still canvas opinion on the main mailing list before taking that action. >> If we do decided keep Bio.Fasta (or extend it), then perhaps >> Bio.SeqIO.FastaIO should become just a wrapper for Bio.Fasta >> >> I'm still digressing your ideas to turn Bio.Fasta into a generic >> parser that copes with sequences, qualities scores, or anything else. That was a typo, but you managed to guess my meaning. I meant to say: I'm still digesting [i.e. thinking about] your ideas to turn Bio.Fasta into a generic parser that copes with sequences, qualities scores, or anything else. 
> I'm not quite sure you're meaning of digressing, if you mean thinking > it over, then great =) Otherwise I hope you'll seriously consider it > anyway. Either way, I think I posted a more coherent message on > bugzilla with some example data and motivation. I'll take a look, Bug 2382 - Generic FASTA parser http://bugzilla.open-bio.org/show_bug.cgi?id=2382 Peter From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 17:01:29 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 16 Oct 2007 22:01:29 +0100 Subject: [Biopython-dev] CVS to SVN In-Reply-To: <6DFB6FBB-CC55-41D1-8D35-4906E6B502CF@northwestern.edu> References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EB34.8000207@maubp.freeserve.co.uk> <6DFB6FBB-CC55-41D1-8D35-4906E6B502CF@northwestern.edu> Message-ID: <471526A9.1010709@maubp.freeserve.co.uk> Jared Flatow wrote: >> I would say one reason why we aren't charging ahead with a move >> from CVS to subversion is only a few posters on this mailing list >> actively WANT to move to subversion, and no-one has really >> championed the move (yet). > > Does that mean most developers don't WANT to move, or just that they > don't ACTIVELY want to move? Going back over the archives, Chris Lasher was most vocal in supporting the move, and there were a few other positive voices. Speaking for myself, I have no strong desire either way, and I don't think Michiel objected either (except over the timing). Then as now, we are hoping to get the next release out "shortly", so after that would be a good time to make the switch. 
[I'm assuming we won't lose any revision history or comments, and that things like the web based ViewCVS and its RSS feed will still be available]

Peter

From bugzilla-daemon at portal.open-bio.org Tue Oct 16 17:02:03 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 17:02:03 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162102.l9GL23rr010250@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2382

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 17:02 EST -------
Are there any other "FASTA like" formats you know of, in addition to traditional sequence data and the 454 GSFlex quality score files?

We could do this using the old Scanner/Consumer model (see the pre-Martel parser, CVS revision 1.8 of Bio/Fasta/__init__.py for example).

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Fasta/__init__.py?rev=1.8&cvsroot=biopython&content-type=text/vnd.viewcvs-markup

The scanner would be the same for all formats, and would pass the data with whitespace (spaces, new lines etc) as is. We could then have one consumer for each supported FASTA variant:

_Scanner           Scans a FASTA-format stream.
_RecordConsumer    Consumes FASTA data to a Record object.
_SequenceConsumer  Consumes FASTA data to a Sequence object.
_QualityConsumer   (new) could build a list of integers for each record?

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
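
A minimal sketch of the Scanner/Consumer split suggested in Comment #2, written in present-day Python. It is not the historical Bio.Fasta code: the names scan, SequenceConsumer and QualityConsumer and their three-event interface are assumptions made for illustration. The point it shows is that one scanner can pass data lines through untouched while each consumer decides what the whitespace means.

```python
import io


class SequenceConsumer:
    """Hypothetical consumer: joins data lines and drops all whitespace."""

    def start_record(self, title):
        self.title = title
        self._lines = []

    def data(self, line):
        self._lines.append(line)

    def end_record(self):
        return self.title, "".join("".join(self._lines).split())


class QualityConsumer(SequenceConsumer):
    """Hypothetical consumer: same events, but line breaks act as separators."""

    def end_record(self):
        return self.title, [int(x) for x in "".join(self._lines).split()]


def scan(handle, consumer):
    """Format-agnostic scanner: data lines reach the consumer unmodified."""
    in_record = False
    for line in handle:
        if line.startswith(">"):
            if in_record:
                yield consumer.end_record()
            consumer.start_record(line[1:].strip())
            in_record = True
        elif in_record:
            consumer.data(line)
    if in_record:
        yield consumer.end_record()


# Made-up records in the style of the 454 .fna/.qual pair
fasta_text = ">read1 length=8\nACGT\nACGT\n"
qual_text = ">read1 length=8\n27 28 21 27\n3 27 27 33\n"

sequences = list(scan(io.StringIO(fasta_text), SequenceConsumer()))
qualities = list(scan(io.StringIO(qual_text), QualityConsumer()))
```

Supporting another FASTA-like variant would then mean adding one more consumer class and leaving scan alone.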
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 17:26:29 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 17:26:29 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162126.l9GLQT8O011239@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2382

------- Comment #3 from jflatow at northwestern.edu 2007-10-16 17:26 EST -------
On second thought, let me just rewrite all the code:

    import re

    # The Bio.Fasta parser
    class Fasta():  # or whatever
        @staticmethod
        def parse(file):
            # return an iterator over the file as Bio.Fasta.Records
            # for the records, trim newline from header, don't do anything to data
            raise NotImplementedError  # placeholder for the proposed iterator

    # The Bio.SeqIO.FastaIO wrapper for Bio.Fasta
    class FastaIO():  # or however it's organized
        @staticmethod
        def header_todict(header):
            parts = re.split(r'[,|]?\s+', header, maxsplit=1)
            assert len(parts) == 2
            return {'id': parts[0], 'description': header}

        @staticmethod
        def data_toseq(data, alphabet):
            return Seq(re.sub(r'\s+', '', data), alphabet)

        @staticmethod
        def parse(file, header_todict=None, alphabet=single_letter_alphabet):
            # default to the header parser defined above
            header_todict = header_todict or FastaIO.header_todict
            return (SeqRecord(seq=FastaIO.data_toseq(record.data, alphabet),
                              **header_todict(record.header))
                    for record in Bio.Fasta.parse(file))

    # Now to use these in my example I can do
    seq_dict = SeqIO.to_dict(SeqIO.FastaIO.parse(seq_file))
    for record in Bio.Fasta.parse(qual_file):
        id = Bio.SeqIO.FastaIO.header_todict(record.header)['id']
        seq_dict[id].quality = [int(x) for x in record.data.split()]

    # Suppose instead I have an alignment file, which looks like this:
    >contigname
    A A 10 64
    T T 9 64
    C C 9 64
    ...
    # and on, where the first column is a reference sequence, the second column is a consensus
    # sequence, the third column is the number of reads aligned, the fourth column is the combined
    # quality score

    # Now it's just as easy for me to parse this into an object
    class ContigAlign():
        def __init__(self, name, ref, consensus, numreads, qscore):
            self.name = name
            self.ref = ref
            self.consensus = consensus
            self.numreads = numreads
            self.qscore = qscore

    # I'll make a dictionary of my contigaligns
    d = {}
    for record in Bio.Fasta.parse(file):
        # transpose the rows of the record into its four columns
        rows = [line.split() for line in record.data.strip().split('\n')]
        (ref, consensus, numreads, qscore) = zip(*rows)
        d[record.header] = ContigAlign(record.header, ref, consensus, numreads, qscore)
    # maybe I would turn ref and consensus into Seqs, but you get the point

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Oct 16 17:38:45 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 17:38:45 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162138.l9GLcj29011655@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2382

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 17:38 EST -------
In comment 3, did you just make up this file format as an example?

>contigname
A A 10 64
T T 9 64
C C 9 64
...

with four columns: reference sequence, consensus, number of reads aligned, and combined quality score.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 17:58:38 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 17:58:38 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162158.l9GLwc68012343@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2382

------- Comment #5 from jflatow at northwestern.edu 2007-10-16 17:58 EST -------
Nope, they actually have a file format that looks like this:

Position  Consensus  Quality Score  Depth  Signal  StdDeviation
>contig00001  1
1   G  64  2  1.00  0.00
2   A  64  2  1.00  0.00
3   G  64  2  1.00  0.00
4   A  64  2  1.00  0.00
5   G  64  2  2.00  0.00
6   G  64  2  2.00  0.00
7   A  64  2  3.00  0.00
8   A  64  2  3.00  0.00
9   A  64  2  3.00  0.00
10  C  64  2  2.00  0.00
11  C  64  2  2.00  0.00
12  T  64  2  1.00  0.00
13  C  64  2  3.00  0.00
14  C  64  2  3.00  0.00
15  C  64  2  3.00  0.00
16  G  64  2  1.00  0.00
17  T  64  2  1.00  0.00
18  G  64  2  1.00  0.00
19  A  64  2  1.00  0.00
20  T  64  2  1.00  0.00
21  C  64  2  2.00  0.00
22  C  64  2  2.00  0.00

Note the file-wide header at the top of the page (a generic FASTA-like parser might skip to the first '>'), or we could get rid of that beforehand but it would be nice if it were smart.

Also, here is another sample FASTA-like file format they use for pair alignments:

>ERSGEES01EM5WC, 2..30 of 95 and ERSGEES01C1ZV2, 1..29 of 268 (29/29 ident)
  2 CGGTGACCCGGGAGATCTGAATTCCTGGT 30
  1 CGGTGACCCGGGAGATCTGAATTCCTGGT 29
>ERSGEES01EM5WC, 2..29 of 95 and ERSGEES01DMS5T, 1..28 of 259 (28/28 ident)
  2 CGGTGACCCGGGAGATCTGAATTCCTGG 29
  1 CGGTGACCCGGGAGATCTGAATTCCTGG 28
>ERSGEES01EM5WC, 29..2 of 95 and ERSGEES01D8GDV, 205..232 of 232 (28/28 ident)
 29 CCAGGAATTCAGATCTCCCGGGTCACCG 2
205 CCAGGAATTCAGATCTCCCGGGTCACCG 232

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
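
One way a reader could cope with the file-wide header mentioned in Comment #5, sketched in present-day Python. The function name parse_contig_table is hypothetical, and the sketch assumes the columns are tab-separated (the "Quality Score" column name contains a space, so whitespace splitting would not work); the sample text is a shortened, made-up stand-in for a real file.

```python
import io


def parse_contig_table(handle, sep="\t"):
    """Yield (contig_name, rows) from a '>'-delimited table with one file-wide header line."""
    columns = None
    name, rows = None, []
    for line in handle:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, rows
            name, rows = line[1:].split()[0], []
        elif name is None:
            # Anything before the first '>' is treated as the column header.
            columns = line.split(sep)
        else:
            fields = line.split(sep)
            # Fall back to plain lists when the file has no header line.
            rows.append(dict(zip(columns, fields)) if columns else fields)
    if name is not None:
        yield name, rows


sample = (
    "Position\tConsensus\tQuality Score\tDepth\tSignal\tStdDeviation\n"
    ">contig00001\t1\n"
    "1\tG\t64\t2\t1.00\t0.00\n"
    "2\tA\t64\t2\t1.00\t0.00\n"
)
contigs = dict(parse_contig_table(io.StringIO(sample)))
```

Values are left as strings here; converting Depth or Signal to numbers is a per-format decision that probably belongs in a wrapper rather than in the generic reader.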
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 18:09:06 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 18:09:06 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162209.l9GM96N5012764@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2382

------- Comment #6 from jflatow at northwestern.edu 2007-10-16 18:09 EST -------
The reference/consensus one was inspired by yet another format they have: there are 2 tools they provide, one for mapping to an existing sequence, the other for ab initio contig building. The mapping one has the extra reference column.

As you can see it might be hard to keep up with all these similar formats as part of Biopython (these are only from one source). Certainly the common ones should have wrappers but we should also be able to easily get the stream of records.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Oct 16 18:13:48 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 18:13:48 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162213.l9GMDmBM012914@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2382

------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 18:13 EST -------
Could you attach a few of these real files? Please include where they came from, i.e. the company whose software writes such output, and what they call each file format variant.

If you can get a matched set (i.e. all associated with the same few sequences) then even better.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 16 19:09:00 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Oct 2007 19:09:00 -0400 Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser In-Reply-To: Message-ID: <200710162309.l9GN90wg015092@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2382 ------- Comment #8 from jflatow at northwestern.edu 2007-10-16 19:08 EST ------- The files are very large, I assure you they are just longer versions of what I have supplied here though. The company is Roche Diagnostics. The initial reads/quality files are the output of the 454 GSFlex genome sequencing machines. They have two pieces of software: gsMapper and gsAssembler which output the contigs. Reads/Quality files from the machine are called: 454Reads.{fna,qual} gs* output: 454{All,Large}Contigs.{fna,qual} 454PairAlign.txt 454AlignmentInfo.tsv -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 16 20:10:45 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Oct 2007 20:10:45 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200710170010.l9H0AjYe018147@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp 2007-10-16 20:10 EST ------- > Note there is yet another (!) translation function Bio.SeqUtils.translate() > which is frame aware [personally I would mark a lot of this module as > deprecated]. 
Given the various translate functions we already have in Biopython, why do you want to add another one? Is there something the "translate" method can do that the "translate" function cannot? Since the "translate" function can take Seq objects as well as simple strings, I'd prefer the "translate" function over a "translate" method. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 12:49:18 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 16 Oct 2007 17:49:18 +0100 Subject: [Biopython-dev] SeqRecord to file format as string In-Reply-To: <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> Message-ID: <4714EB8E.3000700@maubp.freeserve.co.uk> >> Did you know you can write to a string using any Bio.SeqIO supported >> file format using StringIO? Perhaps we should spell this out more >> explicitly in the documentation, but a motivating example would help. > > This is what I do now, but it seems like a hack to me to go this > route. To always have to write to a file feels strange, but I see > that it would be messy to go OO since there are so many formats. > However, giving preference to fasta over other formats by making it > innate doesn't seem like such a terrible idea. I do have mixed > feelings about 'bloating' the code which is why I asked, and you have > convinced me that this is not quite appropriate given existing > convention. However the idea would be to put the to_fasta or > to_format method inside the SeqRecord, then to call it from the IO > when needed to actually write to a file, but call it directly when > all that is wanted is a string... Its debatable isn't it? 
I suspect that for most users, when they want a record in a particular file format it's for writing to a file. However, adding a to_format() method to a SeqRecord makes some sense (suitable for sequential file formats only). This would take a format name and return a string, by calling Bio.SeqIO with a StringIO object internally.

Peter

From bugzilla-daemon at portal.open-bio.org Tue Oct 16 22:17:28 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 22:17:28 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710170217.l9H2HSAx024040@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2382

------- Comment #9 from mdehoon at ims.u-tokyo.ac.jp 2007-10-16 22:17 EST -------
If all these special fasta files are coming from Roche Diagnostics, I'd suggest creating a separate module rather than trying to put this in Bio.SeqIO. Bio.SeqIO is one of the few modules in Biopython that is used by most users, so I'd like to keep it as clean as possible. To avoid confusion for users who just want to parse regular Fasta files, I think the module should not be called Bio.Fasta. In addition, I doubt we'd get much code reuse from a generic Bio.Fasta module beyond what is needed for the Roche files, since the only thing they have in common is that they use ">" to separate records.

With a separate module to handle the Roche files, my preferred usage would be something like this:

    from Bio import SeqIO, GSFlex  # Or whatever you'd like to call it
    seqrecords = SeqIO.parse(open("mysequences.fa"), "fasta")
    qualities = GSFlex.parse(open("myqualities.qual"), "quality")
    for seqrecord, quality in zip(seqrecords, qualities):
        seqrecord.quality = quality

Note that "quality" is currently not a field of the SeqRecord class, but with SeqRecord being a Python class, we can just add fields on the fly.
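
The zip-based pairing in Comment #9 relies on both files listing the reads in the same order. A small sketch of how that assumption could be checked while pairing, in present-day Python with plain tuples standing in for SeqRecord objects and quality records; pair_in_order and the sample data are hypothetical and not part of Biopython.

```python
def pair_in_order(seq_records, qual_records):
    """Pair (id, sequence) with (id, scores) records that arrive in the same order."""
    for (seq_id, seq), (qual_id, scores) in zip(seq_records, qual_records):
        if seq_id != qual_id:
            raise ValueError("record order differs: %r vs %r" % (seq_id, qual_id))
        if len(seq) != len(scores):
            raise ValueError("%s: %d bases but %d scores" % (seq_id, len(seq), len(scores)))
        yield seq_id, seq, scores


# Made-up reads and scores
reads = [("read1", "ACGT"), ("read2", "GG")]
quals = [("read1", [27, 28, 21, 27]), ("read2", [30, 12])]
paired = list(pair_in_order(reads, quals))
```

Because zip stops at the shorter input, a truncated quality file would go unnoticed here; comparing record counts afterwards, or keying the reads by id as in Comment #3, would catch that case.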
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From mdehoon at c2b2.columbia.edu Tue Oct 16 22:30:41 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 16 Oct 2007 22:30:41 -0400
Subject: [Biopython-dev] CVS to SVN
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EB34.8000207@maubp.freeserve.co.uk> <6DFB6FBB-CC55-41D1-8D35-4906E6B502CF@northwestern.edu> <471526A9.1010709@maubp.freeserve.co.uk>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B63B@mail2.exch.c2b2.columbia.edu>

> > Does that mean most developers don't WANT to move, or just that they
> > don't ACTIVELY want to move?
>
> Speaking for myself, I have no strong desire either way, and I don't
> think Michiel objected either (except over the timing).

I don't have an objection against SVN either now or later. If anyone wants to do the CVS->SVN conversion, just make sure to inform the developers when they should pause making commits to CVS during the actual move.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

-----Original Message-----
From: biopython-dev-bounces at lists.open-bio.org on behalf of Peter
Sent: Tue 10/16/2007 5:01 PM
To: Jared Flatow; biopython-dev at lists.open-bio.org
Subject: Re: [Biopython-dev] CVS to SVN

Jared Flatow wrote:
>> I would say one reason why we aren't charging ahead with a move
>> from CVS to subversion is only a few posters on this mailing list
>> actively WANT to move to subversion, and no-one has really
>> championed the move (yet).
>
> Does that mean most developers don't WANT to move, or just that they
> don't ACTIVELY want to move?
Going back over the archives, Chris Lasher was most vocal in supporting the move, and there were a few other positive voices. Speaking for myself, I have no strong desire either way, and I don't think Michiel objected either (except over the timing). Then as now, we are hoping to get the next release out "shortly", so after that would be a good time to make the switch. [I'm assuming we won't loose any revision history or comments, and that things like the web based ViewCVS and its RSS feed will still be available] Peter _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From mdehoon at c2b2.columbia.edu Tue Oct 16 22:45:34 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue, 16 Oct 2007 22:45:34 -0400 Subject: [Biopython-dev] SeqRecord to file format as string References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EB8E.3000700@maubp.freeserve.co.uk> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B63C@mail2.exch.c2b2.columbia.edu> How about the following: SeqIO.write(sequences, handle, format) returns the properly formatted string if handle==None. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: biopython-dev-bounces at lists.open-bio.org on behalf of Peter Sent: Tue 10/16/2007 12:49 PM To: Jared Flatow Cc: biopython-dev at lists.open-bio.org Subject: Re: [Biopython-dev] SeqRecord to file format as string >> Did you know you can write to a string using any Bio.SeqIO supported >> file format using StringIO? Perhaps we should spell this out more >> explicitly in the documentation, but a motivating example would help. > > This is what I do now, but it seems like a hack to me to go this > route. 
To always have to write to a file feels strange, but I see > that it would be messy to go OO since there are so many formats. > However, giving preference to fasta over other formats by making it > innate doesn't seem like such a terrible idea. I do have mixed > feelings about 'bloating' the code which is why I asked, and you have > convinced me that this is not quite appropriate given existing > convention. However the idea would be to put the to_fasta or > to_format method inside the SeqRecord, then to call it from the IO > when needed to actually write to a file, but call it directly when > all that is wanted is a string... Its debatable isn't it? I suspect that for most users, when they want a record in a particular file format its for writing to a file. However, adding a to_format() method to a SeqRecord some sense (suitable for sequential file formats only). This would take a format name and return a string, by calling Bio.SeqIO with a StringIO object internally. Peter _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From bugzilla-daemon at portal.open-bio.org Wed Oct 17 03:01:53 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 17 Oct 2007 03:01:53 -0400 Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in (Reverse)complement functions in Bio.Seq In-Reply-To: Message-ID: <200710170701.l9H71rML002584@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2366 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |REOPENED Resolution|FIXED | ------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2007-10-17 03:01 EST ------- The Biopython test currently fails: ====================================================================== FAIL: test_seq 
---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 151, in runTest self.runSafeTest() File "run_tests.py", line 188, in runSafeTest expected_handle) File "run_tests.py", line 289, in compare_output "\nOutput : "+`output_line` + "\nExpected: "+`expected_line` AssertionError: Output : "Seq('ACBDGHKMNSRUWVYX', Alphabet()) -> Seq('XRBWAYSNKMDCHVGU', Alphabet())\n" Expected: "Seq('ACBDGHKMNSRUWVYX', Alphabet()) -> Seq('XYVWARSNMKHCDBGU', Alphabet())\n" ---------------------------------------------------------------------- This is with a fresh checkout from CVS. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 17 04:01:12 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 17 Oct 2007 04:01:12 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200710170801.l9H81CVn005428@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-17 04:01 EST ------- > Given the various translate functions we already have in > Biopython, why do you want to add another one? Is there > something the "translate" method can do that the "translate" > function cannot? Since the "translate" function can take Seq > objects as well as simple strings, I'd prefer the "translate" > function over a "translate" method. Its a style thing, allowing more of an object orientated coding style. For comparison, look at the evolution of the string module in python which was phased out in favour of string object methods. 
In terms of capabilities/arguments, I think the Bio.Seq.translate() function and the suggested new Bio.Seq.Seq.translate() object method should be equivalent. In fact, I would have one call the other internally.

Currently, if you work with strings, you can use the following nice concise style:

    from Bio import Seq #The module
    my_str = "ACTGACCGTGC"
    print Seq.translate(my_str)

However, if you want to use Seq objects, this becomes rather a mess (in my opinion):

    from Bio import Seq #The module
    from Bio.Alphabet.IUPAC import unambiguous_dna
    my_seq = Seq.Seq("ACTGACCGTGC", unambiguous_dna)
    print Seq.translate(my_seq)

I would like to be able to do this, using an object method:

    from Bio.Seq import Seq
    from Bio.Alphabet.IUPAC import unambiguous_dna
    my_seq = Seq("ACTGACCGTGC", unambiguous_dna)
    print my_seq.translate()

Another bonus for people who think OO is that doing dir(my_seq) would list these useful methods. Right now the user has to know to go looking in the Bio.Seq module for a function.

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
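The "one calls the other internally" idea from comment 5 can be sketched without Biopython. This is only an illustration under stated assumptions: `MiniSeq` and this `translate` are hypothetical stand-ins, not the Bio.Seq code, and the sketch assumes the standard genetic code, unambiguous DNA letters, and that a trailing partial codon is ignored.

```python
# Hypothetical stand-ins for illustration; this is not Biopython code.
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(sequence):
    """Module-level function: accepts a plain string or any object with __str__."""
    text = str(sequence).upper()
    usable = len(text) - len(text) % 3  # assumption: drop a trailing partial codon
    # Ambiguous letters are not handled here and would raise KeyError.
    return "".join(CODON_TABLE[text[i:i + 3]] for i in range(0, usable, 3))


class MiniSeq:
    """Tiny sequence wrapper whose method simply delegates to the function."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        return self.data

    def translate(self):
        # The method and the function share one implementation.
        return MiniSeq(translate(self))


print(translate("ACTGACCGTGC"))            # function style, plain string
print(MiniSeq("ACTGACCGTGC").translate())  # method style, same result
```

Because the method only wraps the function, the two styles cannot drift apart, and `dir()` on the object would list `translate`.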
From bugzilla-daemon at portal.open-bio.org Wed Oct 17 04:06:51 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 17 Oct 2007 04:06:51 -0400 Subject: [Biopython-dev] [Bug 2382] Generic Roche or GSFlex "FASTA" parser In-Reply-To: Message-ID: <200710170806.l9H86ppn006217@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2382 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|Generic FASTA parser |Generic Roche or GSFlex | |"FASTA" parser ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-17 04:06 EST ------- Now that I'm clear where these files come from, I would agree with Michiel that a separate Bio.GSFlex or Bio.Roche module would make more sense. I've added these keywords to the bug summary (to help anyone searching in future). P.S. From Michiel's example, you could use the existing SeqRecord annotations dictionary if you wanted to avoid adding a new attribute to the objects on the fly. for seqrecord, quality in zip(seqrecords, qualities): #seqrecord.quality = quality #If you would rather use the existing dictionary: seqrecord.annotations["quality"] = quality -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Oct 17 04:46:41 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 17 Oct 2007 04:46:41 -0400 Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in (Reverse)complement functions in Bio.Seq In-Reply-To: Message-ID: <200710170846.l9H8kfYq008185@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2366 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-17 04:46 EST ------- Fixed, I think. I had some RNA/DNA with the U and T the wrong way round... and when I recently tweaked the alphabet detection in Bio/Seq.py this had an impact. The root issue is that we don't check the Alphabet's letters agree with the sequence when creating a Seq object (where the Alphabet supplied has an explicit list of letters). That would have caught the error in the test suite much earlier. Maybe I should file an enhancement bug on this issue. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 17 11:20:51 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 17 Oct 2007 11:20:51 -0400 Subject: [Biopython-dev] [Bug 2382] Generic Roche or GSFlex "FASTA" parser In-Reply-To: Message-ID: <200710171520.l9HFKpXj030514@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2382 ------- Comment #11 from jflatow at northwestern.edu 2007-10-17 11:20 EST ------- That sounds very reasonable to me to put all this stuff in a separate module. I would like to implement the design we have been discussing, and I will name it whatever you think is appropriate, but I would like to do it the way that seems *right* to me. I mean by that building off of streams of >header data ... 
since I think this pattern could eventually be used to actually clean up the rest of the FASTA stuff, not make it worse. I also believe there could potentially be other instances when a more general FASTA parser would be useful, even if we don't know of them yet. How does this sound? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 17 20:46:19 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 17 Oct 2007 20:46:19 -0400 Subject: [Biopython-dev] [Bug 2382] Generic Roche or GSFlex "FASTA" parser In-Reply-To: Message-ID: <200710180046.l9I0kJfN027373@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2382 ------- Comment #12 from mdehoon at ims.u-tokyo.ac.jp 2007-10-17 20:46 EST ------- > I also believe there could > potentially be other instances when a more general FASTA parser would be > useful, even if we don't know of them yet. How does this sound? To me, it sounds premature to create a general Fasta parser if we don't know other instances where it might be useful. (For comparison, note that Biopython's general parser framework described in section 5.3 of the tutorial is hardly used in recent Biopython parsers). By all means, keep the module short and simple. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
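For reference, the "stream of >header / data" pattern from comment 11 could look roughly like the sketch below. It is not the proposed module: `iter_fasta_like` is a hypothetical name, and the sample input assumes that a Roche quality file carries space-separated scores under the same ">" headers as the sequence file.

```python
import io


def iter_fasta_like(handle):
    """Yield (header, data_lines) for every '>' record found in handle.

    The data lines are returned uninterpreted, so a caller can treat them
    as sequence letters, quality scores, or anything else.
    """
    header = None
    data = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, data
            header = line[1:].strip()
            data = []
        elif header is not None and line:
            # Text before the first '>' line and blank lines are skipped.
            data.append(line)
    if header is not None:
        yield header, data


# Made-up sample input: one sequence-like record, one quality-like record.
example = io.StringIO(">read1\nACGT\nTTGA\n>read2\n40 38 12 7\n")
records = list(iter_fasta_like(example))
for name, lines in records:
    print(name, lines)
```

A format-specific module would then only need a small function that turns each `data_lines` list into a sequence string or a list of integers.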
From bugzilla-daemon at portal.open-bio.org Thu Oct 18 00:33:34 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 18 Oct 2007 00:33:34 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200710180433.l9I4XYeY004357@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2007-10-18 00:33 EST ------- If we add translate, transcribe methods to Seq objects, can we then deprecate Bio.Transcribe, Bio.Translate? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Oct 18 01:21:15 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 18 Oct 2007 01:21:15 -0400 Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with egenix mxTextTools 3.0 In-Reply-To: Message-ID: <200710180521.l9I5LFVS006209@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2361 ------- Comment #36 from mdehoon at ims.u-tokyo.ac.jp 2007-10-18 01:21 EST ------- Looking at the test_GenBankFormat failure again. This is the only test that fails with the Biopython currently in CVS, using mxTextTools 3.0. This test is the only test for Bio.expressions. If we remove test_GenBankFormat, we should deprecate Bio.expressions. Of all the Biopython tests, only test_format_registry depends on Bio.expressions. This test relies on the function _load_registries() in Bio/__init__.py. Skipping this function call in Bio/__init__.py affects no other Biopython test. So I'd like to suggest the following for the upcoming release: -) Remove test_GenBankFormat.py and test_format_registry.py -) Add DeprecationWarnings to Bio.expressions. 
-) Skip the call to _load_registries() in Bio/__init__.py We then have a working Biopython again with the recent mxTextTools, with minimal disruptive changes to the code. Any objections? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Oct 18 06:35:01 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 18 Oct 2007 06:35:01 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200710181035.l9IAZ1DH022693@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-18 06:35 EST -------
Michiel in comment 6 wrote:
> If we add translate, transcribe methods to Seq objects, can we then
> deprecate Bio.Transcribe, Bio.Translate?

Bio.Transcribe - Yes
====================
Bio.Transcribe is so trivial we could recreate that code in Bio.Seq and then deprecate Bio.Transcribe without losing any functionality:
- transcribe
- back_transcribe

Bio.Translate - Maybe
=====================
Initially I was just proposing to add:
- translate (including all stop codons)

I was simply going to have Bio.Seq call Bio.Translate to do the work. It would be nice to simplify Biopython by also deprecating Bio.Translate, but if we want to do this without loss of current functionality we need to consider including the following in Bio.Seq:
- translate_to_stop (all amino acids up to but excluding the first stop)
- back_translate (gives a single possible nucleotide sequence)

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
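The difference between translate and translate_to_stop listed in comment 7 can be shown with a small sketch. The two functions below are hypothetical stand-ins written for this illustration (standard genetic code, unambiguous DNA, trailing partial codon ignored); they are not the Bio.Translate implementation.

```python
# Hypothetical stand-ins for illustration; this is not Bio.Translate.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate every complete codon, keeping stop codons as '*'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def translate_to_stop(dna):
    """Return the amino acids up to, but excluding, the first stop codon."""
    return translate(dna).split("*", 1)[0]


dna = "ATGGCCTAAGGG"  # ATG GCC TAA GGG
print(translate(dna))          # stop codon kept, translation continues
print(translate_to_stop(dna))  # truncated before the first stop
```

Writing translate_to_stop as a thin wrapper keeps a single code path for the codon lookup, which is one way to avoid the duplication the thread is worried about.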
From biopython-dev at maubp.freeserve.co.uk Thu Oct 18 16:06:10 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Thu, 18 Oct 2007 21:06:10 +0100 Subject: [Biopython-dev] BioSQL documentation Message-ID: <4717BCB2.2070301@maubp.freeserve.co.uk> I was just having a look at: http://biopython.org/DIST/docs/biosql/python_biosql_basic.html The source for this HTML and PDF document lives here in the BioSQL CVS: biosql-schema/doc/biopython/python_biosql_basic.tex It would be nice to update the following section: > 3.3 Loading a GenBank file into the database > > ... > > Now we want to do the loading of the file into the database. The > Biopython implementation works by taking a standard Iterator object > that returns Biopython SeqFeature objects and then doing the loading. I think that should actually say "... that returns Biopython SeqRecord objects containing SeqFeature objects ..." > ... our GenBank file, which we can do with the following code: > >>>> from Bio import GenBank parser = GenBank.FeatureParser() >>>> iterator = GenBank.Iterator(open("cor6_6.gb"), parser) That can now be done with Bio.SeqIO which should be clearer, and hopefully make it easier to see how to extend this to SwissProt etc: from Bio import SeqIO iterator = SeqIO.parse(open("cor6_6.gb"), "genbank") I would do this myself, but I don't have a BioSQL database setup myself right now. It would also be nice if the documentation didn't skip the bit about setting up the database with the BioSQL schema, or at least had links to the relevant BioSQL documentation. 
Peter From bugzilla-daemon at portal.open-bio.org Thu Oct 18 22:15:01 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 18 Oct 2007 22:15:01 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200710190215.l9J2F1bo006275@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp 2007-10-18 22:15 EST ------- > It would be nice to simplify Biopython by also deprecating Bio.Translate, To avoid a plethora of translation functions, I would prefer that. > but if we want to do this without loss of current functionality we > need to consider including the following in Bio.Seq: > - translate_to_stop (all amino acids up to but excluding the first stop) Whether or not to stop translating at the first stop codon could be an argument to the translate method. As an alternative, it may be preferable to have a split() method that splits the sequences at the stop codons. Such a method could be applied to all protein sequences, not only those created by translate(). > - back_translate (gives a single possible nucleotide sequence) Does anybody actually use this function? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From salish at picasso.ucsf.edu Fri Oct 19 03:12:53 2007 From: salish at picasso.ucsf.edu (Howard Salis) Date: Fri, 19 Oct 2007 00:12:53 -0700 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: <200710190215.l9J2F1bo006275@portal.open-bio.org> References: <200710190215.l9J2F1bo006275@portal.open-bio.org> Message-ID: <9fa7e98e0710190012t52ceb9dbx564ba3720d4359f2@mail.gmail.com> Yes. 
Back-translating a sequence is important in codon optimization, searching for homologous proteins, etc. > > - back_translate (gives a single possible nucleotide sequence) > Does anybody actually use this function? > > > -- > Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are the assignee for the bug, or are watching the assignee. > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From bugzilla-daemon at portal.open-bio.org Fri Oct 19 08:38:59 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 19 Oct 2007 08:38:59 -0400 Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with egenix mxTextTools 3.0 In-Reply-To: Message-ID: <200710191238.l9JCcx4I001886@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2361 ------- Comment #37 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-19 08:38 EST ------- I would have suggested adding a mxTextTools version check to test_GenBankFormat.py and throwing the external dependency error if 3.0 is found. That would "solve" the problem test case, and after the next release we could begin the process of deprecating the modules you suggested. But I'm OK with your suggestion Michiel (comment 36), although it seems a little drastic. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
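The version check suggested in comment 37 might look something like the sketch below. Everything named here is an assumption made for illustration: `ExternalDependencyError` stands in for whatever exception the test runner treats as "skip this test", `require_mxtexttools_below_3` is a hypothetical helper, and how the version string is obtained from mxTextTools is deliberately left out.

```python
class ExternalDependencyError(Exception):
    """Stand-in for the exception a test runner would treat as 'skip'."""


def require_mxtexttools_below_3(version_string):
    """Raise if the given mxTextTools version string is 3.0 or later.

    The caller is assumed to pass a dotted version such as "2.0.6".
    """
    major = int(version_string.split(".")[0])
    if major >= 3:
        raise ExternalDependencyError(
            "this test is known to fail with mxTextTools %s" % version_string
        )


# Made-up version strings, used only to exercise both branches.
for version in ("2.0.6", "3.0.0"):
    try:
        require_mxtexttools_below_3(version)
        print(version, "-> run the test")
    except ExternalDependencyError as err:
        print(version, "-> skip:", err)
```

Placed at the top of a test module, such a check would turn a known failure into a skipped test, which is the less drastic alternative to removing the test outright.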
From biopython-dev at maubp.freeserve.co.uk Fri Oct 19 04:08:41 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Fri, 19 Oct 2007 09:08:41 +0100 Subject: [Biopython-dev] [Bug 2381] back-translate In-Reply-To: <9fa7e98e0710190012t52ceb9dbx564ba3720d4359f2@mail.gmail.com> References: <200710190215.l9J2F1bo006275@portal.open-bio.org> <9fa7e98e0710190012t52ceb9dbx564ba3720d4359f2@mail.gmail.com> Message-ID: <47186609.3090408@maubp.freeserve.co.uk>
Howard Salis wrote:
> Yes. Back-translating a sequence is important in codon optimization,
> searching for homologous proteins, etc.

Unlike forward translation, transcription, back-transcription, complements and reverse complements, back-translation is not a one-to-one mapping.

In your examples, would you want to know:
- all possible back translations (as unambiguous nucleotides)
- all possible back translations (as ambiguous nucleotides)
- a possible back translation (using ambiguous nucleotides)
- a possible back translation (using unambiguous nucleotides)

For example, back translating a Tyr => UAC or UAU => UAY (nice and clear - we can represent this perfectly with a single ambiguous codon). On the other hand, Arg => AGA, AGG, CGA, CGC, CGG, CGU => AGR or CGN

Oh, and would you expect DNA or RNA back?

Peter

From salish at picasso.ucsf.edu Fri Oct 19 12:31:49 2007 From: salish at picasso.ucsf.edu (Howard Salis) Date: Fri, 19 Oct 2007 09:31:49 -0700 Subject: [Biopython-dev] [Bug 2381] back-translate In-Reply-To: <47186609.3090408@maubp.freeserve.co.uk> References: <200710190215.l9J2F1bo006275@portal.open-bio.org> <9fa7e98e0710190012t52ceb9dbx564ba3720d4359f2@mail.gmail.com> <47186609.3090408@maubp.freeserve.co.uk> Message-ID: <9fa7e98e0710190931q3b589488p55b8863cc0e38380@mail.gmail.com> Yes, I know it's a one-to-many mapping. But that's why it's nice to have a handy subroutine for doing it. For codon optimization, all possible back translations with unambiguous nucleotides would be best.
Then, one evaluates some objective function over all possible sequences to find an optimal one. Optimality depends on the application, but eliminating restriction sites, avoiding certain repetitive or transposon sequences, etc. is very common. For searching for homologous proteins, it would be best to have the back-translate function produce something that could be fed into an alignment program or regular expression. Then, one could align a database of sequences with your back-translated protein to determine which sequence is most similar to your protein. Basically, this is what BlastP does (you might want to look up its algorithm to determine a good way of doing this, if you wish to reproduce it in Biopython or, otherwise, rely on the NCBI webserver). What does the current back-translate function output? -Howard On 10/19/07, Peter wrote: > Howard Salis wrote: > > Yes. Back-translating a sequence is important in codon optimization, > > searching for homologous proteins, etc. > > Unlike forward translation, transcription, back-transcription, > complements and reverse complements, back-translation is not a > one-to-one mapping. > > In your examples, would you want to know: > - all possible back translations (as unambiguous nucleotides) > - all possible back translations (as ambiguous nucleotides) > - a possible back translation (using ambiguous nucleotides) > - a possible back translation (using unambiguous nucleotides) > > For example, back translating a Tyr => UAC or UAU => UAY (nice and > clear - we can represent this perfectly with a single ambiguous codon). > On the other hand, Arg => AGA, AGG, CGA, CGC, CGG, CGU => AGR or CGN > > Oh, and would you expect DNA or RNA back?
> > Peter > > From bugzilla-daemon at portal.open-bio.org Mon Oct 22 05:07:05 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 22 Oct 2007 05:07:05 -0400 Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in (Reverse)complement functions in Bio.Seq In-Reply-To: Message-ID: <200710220907.l9M975hw013729@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2366 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |RESOLVED Resolution| |FIXED ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-22 05:07 EST ------- Marking as fixed -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython-dev at maubp.freeserve.co.uk Mon Oct 22 08:30:59 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Mon, 22 Oct 2007 13:30:59 +0100 Subject: [Biopython-dev] [Bug 2381] back-translate In-Reply-To: <9fa7e98e0710190931q3b589488p55b8863cc0e38380@mail.gmail.com> References: <200710190215.l9J2F1bo006275@portal.open-bio.org> <9fa7e98e0710190012t52ceb9dbx564ba3720d4359f2@mail.gmail.com> <47186609.3090408@maubp.freeserve.co.uk> <9fa7e98e0710190931q3b589488p55b8863cc0e38380@mail.gmail.com> Message-ID: <471C9803.8050709@maubp.freeserve.co.uk> Howard Salis wrote: > > What does the current back-translate function output? 
>
Short example,

>>> from Bio import Translate
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet.IUPAC import unambiguous_dna
>>> my_dna = Seq("GCCGCATGCATAGATAGATAG", unambiguous_dna)
>>> my_prot = Translate.unambiguous_dna_by_id[11].translate(my_dna)
>>> my_prot
Seq('AACIDR*', HasStopCodon(IUPACProtein(), '*'))
>>> Translate.unambiguous_dna_by_id[11].back_translate(my_prot)
Seq('GCTGCTTGTATTGATCGTTAA', IUPACUnambiguousDNA())
>>> my_dna
Seq('GCCGCATGCATAGATAGATAG', IUPACUnambiguousDNA())

i.e. The current back_translate picks one possible back translation.

Peter

From bugzilla-daemon at portal.open-bio.org Mon Oct 22 12:52:12 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 22 Oct 2007 12:52:12 -0400 Subject: [Biopython-dev] [Bug 2386] New: Bio.Seq.Seq and MutableSeq count() method only works for single residues Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2386

Summary: Bio.Seq.Seq and MutableSeq count() method only works for single residues
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: minor
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

Logging this bug to report an issue raised on the mailing list by Jimmy Musselwhite. The count methods of the Seq and MutableSeq objects only work for single residues (e.g. "G"). Zero is returned when asked to count a multicharacter string, "GG" for example. For compatibility with strings, my_seq.count("GG") should work as expected, returning the same value as my_seq.tostring().count("GG"). Doing this for the Seq object is trivial. Adding support for the MutableSeq could be done via the tostring() method but there might be a more elegant solution with less overhead.
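As a rough illustration of the string-compatible behaviour asked for here, the sketch below counts non-overlapping occurrences directly on a mutable sequence of single characters, without building an intermediate string. It assumes the MutableSeq data can be treated like a list of one-letter strings; `count_subsequence` is a hypothetical helper, not the fix that went into Biopython.

```python
def count_subsequence(residues, sub):
    """Count non-overlapping occurrences of sub, scanning left to right.

    residues is any indexable sequence of single characters (a list is used
    below as a stand-in for MutableSeq data). For a non-empty sub this gives
    the same answer as "".join(residues).count(sub).
    """
    sub = str(sub)
    if not sub:
        raise ValueError("empty search string is not handled in this sketch")
    width = len(sub)
    count = 0
    i = 0
    while i <= len(residues) - width:
        if all(residues[i + k] == sub[k] for k in range(width)):
            count += 1
            i += width  # skip past the match so occurrences never overlap
        else:
            i += 1
    return count


data = list("GGGCGG")  # stand-in for a mutable sequence
print(count_subsequence(data, "G"))   # single residue
print(count_subsequence(data, "GG"))  # multi-character search
```

Whether this is actually cheaper than the tostring() route would need measuring; the built-in string search is implemented in C, so the simple conversion may well be faster in practice.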
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mdehoon at c2b2.columbia.edu Tue Oct 23 20:46:58 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue, 23 Oct 2007 20:46:58 -0400 Subject: [Biopython-dev] Removing deprecated functionality Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B640@mail2.exch.c2b2.columbia.edu> Hi everybody, We now have a fixed Biopython in CVS that works with both the old and the new mxTextTools. I am planning to create the new Biopython release later this week. Bio.Kabat and the blast and blasturl functions in Bio.Blast.NCBIWWW were deprecated in previous Biopython. The two functions in Bio.Blast.NCBIWWW have been deprecated in favor of qblast starting with Biopython 1.40b (February 2005). I am planning to remove this functionality for release 1.44 -- let us know if this would cause you some problems. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From bugzilla-daemon at portal.open-bio.org Thu Oct 25 12:58:19 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 25 Oct 2007 12:58:19 -0400 Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with egenix mxTextTools 3.0 In-Reply-To: Message-ID: <200710251658.l9PGwJgB007432@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2361 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #38 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-25 12:58 EST ------- Marking as fixed, Michiel made the changes outlined in comment 36 in CVS. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Oct 25 13:02:50 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 25 Oct 2007 13:02:50 -0400 Subject: [Biopython-dev] [Bug 2374] Updated Bio.lcc code. In-Reply-To: Message-ID: <200710251702.l9PH2oC8008104@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2374 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|Uppdated lcc code. |Updated Bio.lcc code. ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-25 13:02 EST ------- Sebastian - any feedback on my above comment? P.S. Corrected spelling in bug title. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Oct 25 13:22:46 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 25 Oct 2007 13:22:46 -0400 Subject: [Biopython-dev] [Bug 2374] Updated Bio.lcc code. In-Reply-To: Message-ID: <200710251722.l9PHMkFm009816@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2374 ------- Comment #4 from sbassi at gmail.com 2007-10-25 13:22 EST ------- (In reply to comment #3) > Sebastian - any feedback on my above comment? > > P.S. Corrected spelling in bug title. > I do agree with most of your comments, but I can't implement them right now because of my current workload. LCC stands for Local Composition Complexity (see here http://mrw.interscience.wiley.com/emrw/9780470015902/els/article/a0005260/current/abstract) Please move it to Bio/SeqUtils/. 
I don't know the values for ambiguous nucleotides; I will check this for the next version. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Oct 25 14:01:50 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 25 Oct 2007 14:01:50 -0400 Subject: [Biopython-dev] [Bug 2374] Updated Bio.lcc code. In-Reply-To: Message-ID: <200710251801.l9PI1oRF012742@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2374 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-25 14:01 EST ------- I've checked this in as Bio/SeqUtils/lcc.py (and deprecated Bio/lcc.py which had a slightly different API since you dropped the start/end options in lcc_mult). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Oct 25 14:25:49 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 25 Oct 2007 14:25:49 -0400 Subject: [Biopython-dev] [Bug 2374] Updated Bio.lcc code. In-Reply-To: Message-ID: <200710251825.l9PIPnEG015022@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2374 ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-25 14:25 EST ------- Also updated Bio/SeqUtils/lcc.py to cope with Seq and MutableSeq objects in addition to strings.
Plus added a unit test, test_SeqUtils.py which covers both Bio.SeqUtils.lcc and Bio.SeqUtils.CheckSum and replaces my older test_CheckSum.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Oct 25 18:03:15 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 25 Oct 2007 18:03:15 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200710252203.l9PM3For029293@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-25 18:03 EST ------- Created an attachment (id=795) --> (http://bugzilla.open-bio.org/attachment.cgi?id=795&action=view) Rough patch to add methods to Bio.Seq Part of this patch is to add ambiguous_generic_by_id and ambiguous_generic_by_name entries to Bio.Data.CodonTable, variants of the unambiguous generic_by_id and generic_by_name tables. Using this lets us properly translate ambiguous sequences. This patch does NOT tackle back_translate, or have special treatment of start/stop codons, in the Seq methods. This patch does NOT mark Bio.Translate or Bio.Transcribe as deprecated. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mdehoon at c2b2.columbia.edu Thu Oct 25 22:30:56 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Thu, 25 Oct 2007 22:30:56 -0400 Subject: [Biopython-dev] CVS freeze Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B643@mail2.exch.c2b2.columbia.edu> Hi everybody, With the fixed Biopython now in CVS, I'm planning to make the new Biopython release tomorrow (Saturday). 
I'd therefore like to ask you not to make commits to CVS starting from 0:01 GMT on Saturday, until the new release is out. If you make any commits before that time, please double-check that all the Biopython tests still run. Also, if you have some important patches for which you need more time, please let us know. Thanks! --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From bsouthey at gmail.com Fri Oct 26 11:12:14 2007 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 26 Oct 2007 10:12:14 -0500 Subject: [Biopython-dev] CVS freeze In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B643@mail2.exch.c2b2.columbia.edu> References: <6243BAA9F5E0D24DA41B27997D1FD14402B643@mail2.exch.c2b2.columbia.edu> Message-ID: Hi, Just in case you are not aware of it, UniProt is going to make a substantial change to the DE line in SwissProt/TrEMBL format 'Not before: 01-Feb-2008'. See http://www.expasy.org/sprot/relnotes/sp_soon.html#DE Bruce On 10/25/07, Michiel De Hoon wrote: > Hi everybody, > > With the fixed Biopython now in CVS, I'm planning to make the new Biopython > release tomorrow (Saturday). I'd therefore like to ask you not to make > commits to CVS starting from 0:01 GMT on Saturday, until the new release is > out. If you make any commits before that time, please double-check that all > the Biopython tests still run. Also, if you have some important patches for > which you need more time, please let us know. > > Thanks! > > --Michiel. 
> > Michiel de Hoon > Center for Computational Biology and Bioinformatics > Columbia University > 1150 St Nicholas Avenue > New York, NY 10032 > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From biopython-dev at maubp.freeserve.co.uk Fri Oct 26 11:24:57 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Oct 2007 16:24:57 +0100 Subject: [Biopython-dev] DE line in SwissProt/TrEMBL format In-Reply-To: References: <6243BAA9F5E0D24DA41B27997D1FD14402B643@mail2.exch.c2b2.columbia.edu> Message-ID: <472206C9.6060407@maubp.freeserve.co.uk> Bruce Southey wrote: > Hi, > Just in case you are not aware of it, UniProt is going to make a > substantial change to the DE line in SwissProt/TrEMBL format 'Not > before: 01-Feb-2008'. See > http://www.expasy.org/sprot/relnotes/sp_soon.html#DE > > Bruce Thanks for the heads up. I don't think we need to worry about that for the planned release. We should still be able to parse the new files, its just the new structured description will be stored as a single concatenated string in Biopython. Peter From mdehoon at c2b2.columbia.edu Fri Oct 26 23:12:46 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Fri, 26 Oct 2007 23:12:46 -0400 Subject: [Biopython-dev] CVS freeze References: <6243BAA9F5E0D24DA41B27997D1FD14402B643@mail2.exch.c2b2.columbia.edu> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B644@mail2.exch.c2b2.columbia.edu> Thanks for letting us know. I think that it is OK though to make the release now, as we'll probably have another release before the date the SwissProt/TrEMBL format changes. --Michiel. 
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: Bruce Southey [mailto:bsouthey at gmail.com] Sent: Fri 10/26/2007 11:12 AM To: Michiel De Hoon Cc: biopython-dev at biopython.org Subject: Re: [Biopython-dev] CVS freeze Hi, Just in case you are not aware of it, UniProt is going to make a substantial change to the DE line in SwissProt/TrEMBL format 'Not before: 01-Feb-2008'. See http://www.expasy.org/sprot/relnotes/sp_soon.html#DE Bruce On 10/25/07, Michiel De Hoon wrote: > Hi everybody, > > With the fixed Biopython now in CVS, I'm planning to make the new Biopython > release tomorrow (Saturday). I'd therefore like to ask you not to make > commits to CVS starting from 0:01 GMT on Saturday, until the new release is > out. If you make any commits before that time, please double-check that all > the Biopython tests still run. Also, if you have some important patches for > which you need more time, please let us know. > > Thanks! > > --Michiel. > > Michiel de Hoon > Center for Computational Biology and Bioinformatics > Columbia University > 1150 St Nicholas Avenue > New York, NY 10032 > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From mdehoon at c2b2.columbia.edu Sun Oct 28 02:32:40 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Sun, 28 Oct 2007 02:32:40 -0400 Subject: [Biopython-dev] Biopython release 1.44 ready Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> Hi everybody, Biopython release 1.44 is now available for download from the Biopython website at http://biopython.org. 
This release includes lots of code improvements and fixes in the Blast interface and parsers, sequence input/output, the SwissProt parser, the clustering routines, as well as a brand new module for population genetics. For reasons of compatibility, some radical changes were necessary in some parts of the code; please let us know if you find some functionality missing. My thanks to all code contributors who made this new release possible. --Michiel on behalf of the Biopython developers Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From mdehoon at c2b2.columbia.edu Sun Oct 28 02:35:12 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Sun, 28 Oct 2007 02:35:12 -0400 Subject: [Biopython-dev] Non-ASCII character in PopGen documentation Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B646@mail2.exch.c2b2.columbia.edu> While making the 1.44 release, I noticed that a non-ASCII character in a formula in the PopGen documentation was causing problems for Hevea. As I couldn't guess what the formula should be, I replaced this formula by a placeholder (see CVS). Can somebody have a look at this and try to fix it? Thanks! --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From biopython-dev at maubp.freeserve.co.uk Sun Oct 28 05:43:56 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sun, 28 Oct 2007 09:43:56 +0000 Subject: [Biopython-dev] Biopython release 1.44 ready In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> Message-ID: <472459DC.3050907@maubp.freeserve.co.uk> Michiel De Hoon wrote: > Hi everybody, > > Biopython release 1.44 is now available for download from the Biopython > website at http://biopython.org. Grand.
Thank you for dedicating your weekend to this Michiel. A couple of questions, the main Wiki page is locked and needs updating to mention the new release. Who has access? Secondly, I see you have updated the open-bio news feed, http://news.open-bio.org/ What about http://biopython.org/news/ - which appears not to have been used at all since it was started. Perhaps we can just have a filtered view of http://news.open-bio.org/ here? Related to this, on the wiki News page perhaps we can just show the last few items from http://news.open-bio.org/index.rdf (even though some of them are for other Bio* projects). Peter From tiagoantao at gmail.com Sun Oct 28 14:24:55 2007 From: tiagoantao at gmail.com (Tiago Antao) Date: Sun, 28 Oct 2007 18:24:55 +0000 Subject: [Biopython-dev] Non-ASCII character in PopGen documentation In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B646@mail2.exch.c2b2.columbia.edu> References: <6243BAA9F5E0D24DA41B27997D1FD14402B646@mail2.exch.c2b2.columbia.edu> Message-ID: <4724D3F7.40105@gmail.com> I currently have a different version of Tutorial.tex here (I have other things already written in preparation for future versions). I don't know how the non-ascii character got there. The formula is: \[ N_{m} = \frac{1 - F_{st}}{4F_{st}} \] Michiel De Hoon wrote: > While making the 1.44 release, I noticed that a non-ascii character in a > formula in the PopGen documentation was causing problems for Hevea. As I > couldn't guess what the formula should be, I replaced this formula by a > placeholder (see CVS). Can somebody have a look at this and try to fix it? > > Thanks! > > --Michiel. 
> > Michiel de Hoon > Center for Computational Biology and Bioinformatics > Columbia University > 1150 St Nicholas Avenue > New York, NY 10032 > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- tiagoantao at gmail.com http://tiago.org/ps From tiagoantao at gmail.com Sun Oct 28 16:54:06 2007 From: tiagoantao at gmail.com (Tiago Antao) Date: Sun, 28 Oct 2007 20:54:06 +0000 Subject: [Biopython-dev] Biopython release 1.44 ready In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> Message-ID: <4724F6EE.50805@gmail.com> Hi, Michiel De Hoon wrote: > This release includes lots of code improvements and fixes in the Blast > interface and parsers, sequence input/output, the SwissProt parser, the > clustering routines, as well as a brand new module for population genetics. > For reasons of compatibility, some radical changes were necessary in some > parts of the code; please let us know if you find some functionality missing. Is it OK to resume uploading non stable code to CVS? I have a few things that I would like to add to the population genetics module (coalescent simulation), but that still needs polishing (mainly documenting and test code). Tiago -- tiagoantao at gmail.com http://tiago.org/ps From biopython-dev at maubp.freeserve.co.uk Sun Oct 28 16:55:42 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sun, 28 Oct 2007 20:55:42 +0000 Subject: [Biopython-dev] Biopython release 1.44 ready In-Reply-To: <4724F6EE.50805@gmail.com> References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> <4724F6EE.50805@gmail.com> Message-ID: <4724F74E.5010801@maubp.freeserve.co.uk> Tiago Antao wrote: > Is it OK to resume uploading non stable code to CVS? 
I have a few things > that I would like to add to the population genetics module (coalescent > simulation), but that still needs polishing (mainly documenting and test > code). Wait and see what Michiel says. However, perhaps you should hold off a few more days - in case there are any teething problems with Biopython 1.44 that would warrant making another release ASAP. Peter From biopython-dev at maubp.freeserve.co.uk Sun Oct 28 15:59:41 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sun, 28 Oct 2007 19:59:41 +0000 Subject: [Biopython-dev] mxTextTools optional? In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> Message-ID: <4724EA2D.3020609@maubp.freeserve.co.uk> Michiel De Hoon wrote: > Hi everybody, > > Biopython release 1.44 is now available for download from the Biopython > website at http://biopython.org. Given some of the changes (deprecations) made in this release, perhaps we can now change setup.py so that mxTextTools is merely suggested, but not required (the same way we treat reportlab and Numeric). Any comments? Peter From mdehoon at c2b2.columbia.edu Sun Oct 28 21:12:48 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Sun, 28 Oct 2007 21:12:48 -0400 Subject: [Biopython-dev] Biopython release 1.44 ready References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> <472459DC.3050907@maubp.freeserve.co.uk> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B647@mail2.exch.c2b2.columbia.edu> > Michiel De Hoon wrote: > > Hi everybody, > > > > Biopython release 1.44 is now available for download from the Biopython > > website at http://biopython.org. > > Grand. Thank you for dedicating your weekend to this Michiel. > > A couple of questions, the main Wiki page is locked and needs updating > to mention the new release. Who has access? Apparently I do. I updated this page now. --Michiel.
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From mdehoon at c2b2.columbia.edu Sun Oct 28 21:57:18 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Sun, 28 Oct 2007 21:57:18 -0400 Subject: [Biopython-dev] Biopython release 1.44 ready References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> <4724F6EE.50805@gmail.com> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B648@mail2.exch.c2b2.columbia.edu> > Is it OK to resume uploading non stable code to CVS? I have a few things > that I would like to add to the population genetics module (coalescent > simulation), but that still needs polishing (mainly documenting and test > code). Sure, go ahead. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From mdehoon at c2b2.columbia.edu Sun Oct 28 22:01:16 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Sun, 28 Oct 2007 22:01:16 -0400 Subject: [Biopython-dev] Biopython release 1.44 ready References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> <4724F6EE.50805@gmail.com> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B649@mail2.exch.c2b2.columbia.edu> On second thought, I agree with Peter's suggestion of waiting a few days to see if any disasters show up with the 1.44 release. Sorry! --Michiel.
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: Tiago Antao [mailto:tiagoantao at gmail.com] Sent: Sun 10/28/2007 4:54 PM To: Michiel De Hoon Cc: biopython-dev at biopython.org Subject: Re: [Biopython-dev] Biopython release 1.44 ready Hi, Michiel De Hoon wrote: > This release includes lots of code improvements and fixes in the Blast > interface and parsers, sequence input/output, the SwissProt parser, the > clustering routines, as well as a brand new module for population genetics. > For reasons of compatibility, some radical changes were necessary in some > parts of the code; please let us know if you find some functionality missing. Is it OK to resume uploading non stable code to CVS? I have a few things that I would like to add to the population genetics module (coalescent simulation), but that still needs polishing (mainly documenting and test code). Tiago -- tiagoantao at gmail.com http://tiago.org/ps From mdehoon at c2b2.columbia.edu Sun Oct 28 22:02:12 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Sun, 28 Oct 2007 22:02:12 -0400 Subject: [Biopython-dev] mxTextTools optional? References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> <4724EA2D.3020609@maubp.freeserve.co.uk> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B64A@mail2.exch.c2b2.columbia.edu> The fewer software packages Biopython requires, the better, to keep things simple for users, not to mention developers. For a future release, we should check if the modules that still rely on mxTextTools can be replaced by pure-Python code. --Michiel. 
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: biopython-dev-bounces at lists.open-bio.org on behalf of Peter Sent: Sun 10/28/2007 3:59 PM To: biopython-dev at lists.open-bio.org Subject: Re: [Biopython-dev] mxTextTools optional? Michiel De Hoon wrote: > Hi everybody, > > Biopython release 1.44 is now available for download from the Biopython > website at http://biopython.org. Given some of the changes (deprecations) made in this release, perhaps we can now change setup.py so that mxTextTools is merely suggested, but not required (the same way we treat reportlab and Numeric). Any comments? Peter _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From bugzilla-daemon at portal.open-bio.org Mon Oct 29 13:21:03 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 29 Oct 2007 13:21:03 -0400 Subject: [Biopython-dev] [Bug 2390] New: Error importing Swiss Prot in BioSQL Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2390 Summary: Error importing Swiss Prot in BioSQL Product: Biopython Version: Not Applicable Platform: Macintosh OS/Version: MacOS X Status: NEW Severity: normal Priority: P2 Component: BioSQL AssignedTo: biopython-dev at biopython.org ReportedBy: Biosql at hotmail.com Hello, I already submitted this problem in the mailing list, where I can't import the SwissProt flat file in BioSQL, even with the new version (1.44) of Biopython.
Here's the script I'm using:

from BioSQL import BioSeqDatabase
from Bio.SwissProt import SProt

server = BioSeqDatabase.open_database(driver = 'MySQLdb', user = '', passwd = '', host = 'localhost', db = 'bioseqdb')
s_parser = SProt.SequenceParser()
s_iterator = SProt.Iterator(open('path to/uniprot_sprot.dat', 'r'), s_parser)
db = server.new_database('Swiss')
db.load(s_iterator)

And here's the error:

Traceback (most recent call last):
  File '', line 1, in
  File '/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py', line 414, in load
    db_loader.load_seqrecord(cur_record)
  File '/sw/lib/python2.5/site-packages/BioSQL/Loader.py', line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File '/sw/lib/python2.5/site-packages/BioSQL/Loader.py', line 250, in _load_bioentry_table
    version))
  File '/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py', line 277, in execute
    self.cursor.execute(sql, args or ())
  File '/sw/lib/python2.5/site-packages/MySQLdb/cursors.py', line 151, in execute
    query = query % db.literal(args)
TypeError: not all arguments converted during string formatting

Thanks for the help!

Jonathan

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
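The final TypeError in the traceback above is the message Python's % string-formatting operator gives when it is handed more values than the format string has placeholders. The following is a minimal sketch of that behaviour only, independent of BioSQL and MySQLdb; the SQL text, the values, and the helper name try_format are made up for illustration and are not taken from the BioSQL source, and it does not identify which side of the call is at fault in the report.

```python
# Hypothetical SQL template, for illustration only; not taken from BioSQL.
sql = "INSERT INTO bioentry (name, accession) VALUES (%s, %s)"


def try_format(template, args):
    """Return the formatted string, or the TypeError text if formatting fails."""
    try:
        return template % args
    except TypeError as exc:
        return "TypeError: %s" % exc


# Two placeholders, two values: formatting succeeds.
ok = try_format(sql, ("'P12345'", "'ABC_HUMAN'"))

# Two placeholders, three values: Python refuses to drop the extra value.
too_many = try_format(sql, ("'P12345'", "'ABC_HUMAN'", "1"))

print(ok)
print(too_many)
```

The second call reproduces the wording seen in the report, which suggests comparing the number of placeholders in the SQL with the number of arguments passed alongside it.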
From bugzilla-daemon at portal.open-bio.org Mon Oct 29 13:23:54 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 29 Oct 2007 13:23:54 -0400 Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL In-Reply-To: Message-ID: <200710291723.l9THNsun017818@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2390 ------- Comment #1 from Biosql at hotmail.com 2007-10-29 13:23 EST ------- Created an attachment (id=799) --> (http://bugzilla.open-bio.org/attachment.cgi?id=799&action=view) Sample of Swiss Prot flat file -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Oct 29 15:19:01 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 29 Oct 2007 15:19:01 -0400 Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL In-Reply-To: Message-ID: <200710291919.l9TJJ1O2026999@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2390 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |biopython- | |bugzilla at maubp.freeserve.co. | |uk ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-29 15:19 EST ------- I'm trying to narrow down the problem: * Have you tried different input SwissProt files? * Have you tried a GenBank file (using the GenBank parser)? * Did you check the username/password as suggested on the mailing list (empty strings look wrong to me)? Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython-dev at maubp.freeserve.co.uk Mon Oct 29 15:58:45 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Mon, 29 Oct 2007 19:58:45 +0000 Subject: [Biopython-dev] BioRegistry, Bio.db Message-ID: <320fb6e00710291258v533fed71u490ec1aadff3359c@mail.gmail.com> While looking over the Tutorial this evening (and making some sequence related updates), I noticed that the section "BioRegistry - automatically finding sequence sources" (in the Cook Book chapter) doesn't work anymore. I believe that Bio.db is set up by the complicated and un-commented code in Bio/__init__.py by calling Bio/config/DBRegistry.py - this was commented out for Biopython 1.44. Does anyone use this module? I've never really looked at it in depth, but it looks interesting and perhaps worth saving. Note if we do want to resurrect it, it needs a unit test. At first glance, the only Martel dependency here is for recognising error conditions and giving nice messages instead. If that's all it is used for, then perhaps we can switch to regular expressions instead. Peter From biopython-dev at maubp.freeserve.co.uk Mon Oct 29 17:39:50 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Mon, 29 Oct 2007 21:39:50 +0000 Subject: [Biopython-dev] Removing deprecated functionality In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B640@mail2.exch.c2b2.columbia.edu> References: <6243BAA9F5E0D24DA41B27997D1FD14402B640@mail2.exch.c2b2.columbia.edu> Message-ID: <320fb6e00710291439t6f636964i9681e2b0c90e6c96@mail.gmail.com> On 10/24/07, Michiel De Hoon wrote: > Bio.Kabat and ,,, were deprecated in previous Biopython. .... > I am planning to remove this functionality for release 1.44 I see you removed the files Bio/Kabat/*.py for Biopython 1.44, but is it OK if we remove the now empty directory as well?
Peter From mdehoon at c2b2.columbia.edu Mon Oct 29 21:06:38 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Mon, 29 Oct 2007 21:06:38 -0400 Subject: [Biopython-dev] Removing Bio.Kabat References: <320fb6e00710291438x3f7d7d57t77b06e4b2221c470@mail.gmail.com> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B64E@mail2.exch.c2b2.columbia.edu> As far as I know, it is not possible to remove a directory in CVS. See http://www.thathost.com/wincvs-howto/cvsdoc/cvs_7.html#SEC69 I believe that it is possible to remove a directory by hand from the CVS source tree, but it is not the official way to do it. Hopefully, we can remove directories once we're using SVN. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: Peter Cock [mailto:p.j.a.cock at googlemail.com] Sent: Mon 10/29/2007 5:38 PM To: Michiel De Hoon Cc: biopython-dev at biopython.org Subject: Re: [Biopython-dev] Removing Bio.Kabat On 10/24/07, Michiel De Hoon wrote: > Bio.Kabat and ,,, were deprecated in previous Biopython. .... > I am planning to remove this functionality for release 1.44 I see you removed the files Bio/Kabat/*.py for Biopython 1.44, but is it OK if we remove the now empty directory as well? 
Peter From bugzilla-daemon at portal.open-bio.org Tue Oct 30 08:25:20 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 30 Oct 2007 08:25:20 -0400 Subject: [Biopython-dev] [Bug 2386] Bio.Seq.Seq and MutableSeq count() method only works for single residues In-Reply-To: Message-ID: <200710301225.l9UCPKjo026963@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2386 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-30 08:25 EST ------- Created an attachment (id=800) --> (http://bugzilla.open-bio.org/attachment.cgi?id=800&action=view) Patch to Bio/Seq.py count methods Lets the Seq and MutableSeq count methods take either a single letter or a multiple letter argument, which can be strings, Seq objects or MutableSeq objects. Adds doc-strings Includes a trivial mini-test which would be used in the Seq unit test instead. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From chris.lasher at gmail.com Tue Oct 30 10:17:29 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Tue, 30 Oct 2007 10:17:29 -0400 Subject: [Biopython-dev] Biopython SVN Transition OK'd Message-ID: <128a885f0710300717p7d91a4adjfaddc9c496974e67@mail.gmail.com> Hi all, Biopython just got the okay from OpenBio to transition from CVS to Subversion--a good step in the right direction (though I've recently started transitioning from SVN to Bazaar VCS). All we have to do is come up with a date when the CVS repository can be locked down and taken offline. Also, I need to know what is needed from me in terms of helping all the devs migrate to SVN. 
I produced a screencast series on Subversion at http://showmedo.com/videos/series?name=bfNi2X3Xg and there is a transition guide at http://svnbook.red-bean.com/en/1.4/svn.forcvs.html

Would providing links to these on the wiki be sufficient? What further information would you like to know? Subversion is not a radical departure from CVS and many of the commands are a one-to-one mapping. The biggest difference is that commits occur for the whole repository, not on a per-file basis, and directories are tracked as well.

Let's get a discussion on this and set a date soon.

Chris

From bugzilla-daemon at portal.open-bio.org Tue Oct 30 10:25:01 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 10:25:01 -0400
Subject: [Biopython-dev] [Bug 2386] Bio.Seq.Seq and MutableSeq count() method only works for single residues
In-Reply-To:
Message-ID: <200710301425.l9UEP19U002945@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2386

------- Comment #2 from dalloliogm at gmail.com 2007-10-30 10:25 EST -------
The new code is good, but please consider implementing case-insensitive searches:

>>> Seq('AACCCCaa').count('a')
... 2
>>> Seq('AACCCCaa').count('a', 'i')
... 4

They could be useful in many cases, because sometimes one has to deal with mixed-case sequences. I think the easiest way to implement this would be by using regular expressions.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
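The regular-expression approach suggested in the comment above could look roughly like this. It is only a sketch on plain Python strings standing in for Seq objects; the function name count_ignore_case is hypothetical and is not part of Bio.Seq or of the attached patch.

```python
import re


def count_ignore_case(sequence, sub):
    """Count non-overlapping occurrences of sub in sequence, ignoring case.

    Both arguments are plain strings here; re.escape keeps any special
    characters in sub from being treated as regex syntax.
    """
    pattern = re.compile(re.escape(sub), re.IGNORECASE)
    return len(pattern.findall(sequence))


single = count_ignore_case("AACCCCaa", "a")
codon = count_ignore_case("atgCCATGccAtg", "ATG")
print(single)
print(codon)
```

Like the string count method, findall counts non-overlapping matches, so the two approaches agree on what is being counted.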
From bugzilla-daemon at portal.open-bio.org Tue Oct 30 14:02:49 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 30 Oct 2007 14:02:49 -0400 Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL In-Reply-To: Message-ID: <200710301802.l9UI2n1J020073@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2390 ------- Comment #3 from Biosql at hotmail.com 2007-10-30 14:02 EST ------- (In reply to comment #2) > I'm trying to narrow down the problem: > * Have you tried different input SwissProt files? > * Have you tried a GenBank file (using the GenBank parser)? > * Did you check the username/password as suggested on the mailing list (empty > strings look wrong to me)? > > Peter > I'm sorry Peter, the reply you sent me on the mailing list was cut in half and I didn't see the rest of your message until I read it directly on the mailing list. I tried to parse cor6_6.gb with the GenBank parser and I'm getting the same result; sorry I didn't try this before.
I also tried what you suggested with the SeqIO module, using cor6_6.gb and also a SwissProt file, and I'm still getting the same TypeError, which is:

Traceback (most recent call last):
  File "DB_Gen.py", line 25, in
    db.load(iterator)
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 414, in load
    db_loader.load_seqrecord(cur_record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 250, in _load_bioentry_table
    version))
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 277, in execute
    self.cursor.execute(sql, args or ())
  File "build/bdist.macosx-10.4-ppc/egg/MySQLdb/cursors.py", line 151, in execute
TypeError: not all arguments converted during string formatting

It seems to me that the problem could be with the MySQLdb module, but I don't understand why, since I'm using the latest release, 1.2.2c1, and I've also tried the stable 1.2.2 release. Am I right?

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Oct 30 15:06:38 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 15:06:38 -0400
Subject: [Biopython-dev] [Bug 2386] Bio.Seq.Seq and MutableSeq count() method only works for single residues
In-Reply-To:
Message-ID: <200710301906.l9UJ6cDZ023596@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2386

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-30 15:06 EST -------
I really don't want to make the Seq count method different from the Python string count method.

Speaking of which, the string uses count(sub [, start[, end]]) to allow searching with an optional start and further optional end index.
I should probably add that.

In the case of single letter searches, my_seq.count("A") + my_seq.count("a") is a simple enough way of doing things. Counting case-insensitive variants of a longer sub-sequence like "ATG" wouldn't be so easy. It would be nice if the Python re library would work directly on Seq objects (without having to explicitly turn them into strings first).

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Oct 30 15:06:52 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 15:06:52 -0400
Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL
In-Reply-To:
Message-ID: <200710301906.l9UJ6q7l023634@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2390

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-30 15:06 EST -------
Thanks for that. It looks like we can *probably* rule out a problem in the sequence parsing.

Unfortunately I haven't used BioSQL myself (yet), and don't have a system set up here that I can try this on.

It appears (just from reading the stack trace) that there is some mismatch between the SQL query (which I assume contains Python % placeholders) and the list of arguments (to go in these placeholders).

If you fancy trying to investigate this further yourself, I would start by adding a break point on BioSQL/BioSeqDatabase.py line 277 to check what the contents of the sql and args variables are.

Or, just add some print statements just before line 277:

    self.cursor.execute(sql, args or ())

I hope someone else on the mailing list will have some suggestions...
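As a rough illustration of the print-statement check suggested above, one could compare how many %s placeholders the SQL contains with how many arguments accompany it. This is a sketch under stated assumptions: the helper name placeholder_report and the sample query are hypothetical, nothing here is part of BioSQL or MySQLdb, and the placeholder count is deliberately crude (it would miscount a literal "%%s").

```python
def placeholder_report(sql, args):
    """Return (number of %s placeholders in sql, number of args supplied).

    Crude by design: it only counts the literal text "%s".
    """
    args = tuple(args or ())
    return sql.count("%s"), len(args)


# Hypothetical query and arguments, standing in for the real sql/args values
# that would be inspected just before the cursor.execute call.
example_sql = "INSERT INTO bioentry (name, accession) VALUES (%s, %s)"
example_args = ("ABC_HUMAN", "P12345", 1)

report = placeholder_report(example_sql, example_args)
print("placeholders=%d args=%d" % report)
```

If the two numbers differ for the failing statement, that would point at how the query and its arguments are assembled rather than at the sequence parser.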
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 30 15:22:30 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 30 Oct 2007 15:22:30 -0400 Subject: [Biopython-dev] [Bug 2386] Bio.Seq.Seq and MutableSeq count() method only works for single residues In-Reply-To: Message-ID: <200710301922.l9UJMUoM024725@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2386 ------- Comment #4 from howard.salis at gmail.com 2007-10-30 15:22 EST ------- How about the upper and lower methods for Seq classes? Then, one could do my_seq.upper().count("ATG") Would that work well? -Howard (In reply to comment #3) > I really don't want to make the Seq count method different to the python string > count method. > > Speaking of which, the string uses count(sub [, start[, end]]) to allow > searching with a optional start and further optional end index. I should > probably add that. > > In the case of single letter searches, my_seq.count("A") + my_seq.count("a") is > a simple enough way of doing things. Counting case insensistive variants of a > longer sub-sequence like "ATG" wouldn't be so easy. I would be nice if the > python re library would work directly on Seq objects (without having to > explicitly turn them into strings first). > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
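For reference, the string behaviour discussed in this thread can be checked on a plain Python str. The sketch below does not use Bio.Seq at all, and the example sequence is made up purely for illustration.

```python
import re

# Made-up mixed-case sequence, held as a plain str (not a Bio.Seq.Seq object).
seq = "ATGaatgATG"

# str.count is case sensitive, so single letters need one call per case.
count_a_any_case = seq.count("A") + seq.count("a")

# The optional start and end arguments restrict the search to seq[start:end].
atg_from_3 = seq.count("ATG", 3)
atg_before_9 = seq.count("ATG", 0, 9)

# Case-insensitive count of a longer sub-sequence, done two ways.
# Both approaches count non-overlapping matches only.
atg_upper = seq.upper().count("ATG")
atg_regex = len(re.findall("ATG", seq, re.IGNORECASE))
```

Both case-insensitive approaches need an ordinary string, which is why string-like upper() and count() methods on Seq would make the upper().count() idiom usable directly.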
From biopython-dev at maubp.freeserve.co.uk Tue Oct 30 15:30:29 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 30 Oct 2007 19:30:29 +0000 Subject: [Biopython-dev] Biopython SVN Transition In-Reply-To: <128a885f0710300717p7d91a4adjfaddc9c496974e67@mail.gmail.com> References: <128a885f0710300717p7d91a4adjfaddc9c496974e67@mail.gmail.com> Message-ID: <47278655.8090300@maubp.freeserve.co.uk> Chris Lasher wrote: > Hi all, > > Biopython just got the okay from OpenBio to transition from CVS to > Subversion--a good step in the right direction (though I've recently > started transitioning from SVN to Bazaar VCS). All we have to do is > come up with a date when the CVS repository can be locked down and > taken offline. I was wondering if anyone would start suggesting moving to git or something else ;) Michiel - are you expecting any complications from CVS to SVN regarding the build process? Another thought; will existing developer accounts "just work" on the SVN system? Also do you (Chris) have CVS access, and if not do you need or want it? > Also, I need to know what is needed from me in terms of helping all > the devs migrate to SVN. I produced a screencast series on Subversion > at http://showmedo.com/videos/series?name=bfNi2X3Xg and there is a > transition guide at http://svnbook.red-bean.com/en/1.4/svn.forcvs.html Sadly that didn't play with gnash 0.8, and I don't have Adobe's Flash plugin working on my 64bit Ubuntu. I'll have to check that out on Windows later in the week :) If you are able to field any queries on the mailing list, that would probably be fine. > Would providing links to these on the wiki be sufficient? If you could look after that aspect of the wiki, that would be great. > What further information would you like to know? Subversion is not a > radical departure from CVS and many of the commands are a one-to-one > mapping. 
The biggest difference is commits occur for the whole > repository, not on a per-file basis, and directories are tracked, as > well. The fact the CVS and SVN are relatively similar is probably one reason why no-one has raised any real objections to the move. > Let's get a discussion on this and set a date soon. In terms of timing, how long do you/the OBF guys expect the transfer to take? And would they prefer to do this over a weekend or mid week? Barring any problems with Biopython 1.44 which would force us to do another release in the very short term, I guess in the next fortnight is reasonable (especially if we only expect a couple of days downtime). Of course, I personally want to start working on the Seq objects and alignments - and Tiago wants to get back to his Population Genetics module. Peter P.S. Would you or any of the people doing the transition be able to sort out bug 2363? http://bugzilla.open-bio.org/show_bug.cgi?id=2363 From bugzilla-daemon at portal.open-bio.org Tue Oct 30 15:33:40 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 30 Oct 2007 15:33:40 -0400 Subject: [Biopython-dev] [Bug 2386] Bio.Seq.Seq and MutableSeq count() method only works for single residues In-Reply-To: Message-ID: <200710301933.l9UJXedO025330@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2386 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-30 15:33 EST ------- Adding .upper() and .lower() methods is on my mental todo list, just a bit lower down the my priorities than the .count() method (this bug) and biological methods covered on bug 2381. One of us should file an enhancement bug for .upper() and .lower() I agree they are needed to make the Seq object more string like. However the implementation is non-trivial due to the alphabet object (which may define a case sensitive list of expected letters). 
And yes, once these methods are supported then doing my_seq.upper().count("ATG") would work fine. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 30 15:45:35 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 30 Oct 2007 15:45:35 -0400 Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string, even subclass string? In-Reply-To: Message-ID: <200710301945.l9UJjZlQ026374@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2351 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|Make SeqRecord subclass Seq |Make Seq more like a string, |subclass string? |even subclass string? ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-30 15:45 EST ------- I modified the title to focus on the Seq object. See also bug 2386 (about the count method), and bug 2381 (about biological methods). (In reply to comment #4) > (In reply to comment #3) > > It does not add any .short() method to give a truncated representation > > string like the current str() method gives. > > Why not? This new method should not cause any compatibility problem Mainly because I'm not convinced that we need a .short() method, and its harder to remove things at a later date (as people may be using them). Surely my_seq[:50] or depending on the context, str(my_seq[:50]), is enough? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Oct 30 18:32:12 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 18:32:12 -0400
Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL
In-Reply-To:
Message-ID: <200710302232.l9UMWCb3004960@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2390

------- Comment #5 from Biosql at hotmail.com 2007-10-30 18:32 EST -------
It seems that a %s is missing at line 243 in Loader.py, since there are only
6 %s in the SQL query, but 7 arguments are being fed in for the loading of
bioentry.

So I added a %s and the loading is fine, but another problem arises after this:

Traceback (most recent call last):
  File "DB_Gen.py", line 25, in
    db.load(iterator)
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 415, in load
    db_loader.load_seqrecord(cur_record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 253, in _load_bioentry_table
    bioentry_id = self.adaptor.last_id('bioentry')
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 148, in last_id
    return self.dbutils.last_id(self.cursor, table)
  File "/sw/lib/python2.5/site-packages/BioSQL/DBUtils.py", line 35, in last_id
    return cursor.insert_id()
AttributeError: 'Cursor' object has no attribute 'insert_id'

I'm going to check it tomorrow.

Jonathan
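The mismatch reported in this bug (6 placeholders, 7 arguments) can be reproduced with ordinary Python % formatting, without a database. This is only a sketch: the SQL statement, table name and argument values below are hypothetical and are not the statement in BioSQL/Loader.py, and count_placeholders and check_args are helper names invented for this example.

```python
def count_placeholders(sql):
    """Count '%s' placeholders in a format-style SQL string.

    Naive on purpose: a literal '%%s' in the SQL would be miscounted.
    """
    return sql.count("%s")


def check_args(sql, args):
    """Return True if the number of placeholders matches the number of args."""
    return count_placeholders(sql) == len(args)


# Hypothetical statement with 7 columns but only 6 placeholders.
sql = ("INSERT INTO example_table (a, b, c, d, e, f, g) "
       "VALUES (%s, %s, %s, %s, %s, %s)")
args = (1, 2, "name", "accession", "identifier", "description", 0)

# Plain % formatting fails in the same way as the traceback in this bug.
try:
    sql % args
    message = None
except TypeError as exc:
    message = str(exc)
```

A check like check_args(sql, args), or simply printing sql and args just before the execute call, should show which side of the mismatch is wrong.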
From biopython-dev at maubp.freeserve.co.uk Tue Oct 30 18:36:43 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 30 Oct 2007 22:36:43 +0000
Subject: [Biopython-dev] BioRegistry, Bio.db
In-Reply-To: <320fb6e00710291258v533fed71u490ec1aadff3359c@mail.gmail.com>
References: <320fb6e00710291258v533fed71u490ec1aadff3359c@mail.gmail.com>
Message-ID: <4727B1FB.2020803@maubp.freeserve.co.uk>

Peter wrote:
> While looking over the Tutorial this evening (and making some sequence
> related updates), I noticed that the section "BioRegistry -
> automatically finding sequence sources" (in the Cook Book chapter)
> doesn't work any more.

Does anyone here use this? Should I ask on the main list?

> I believe that Bio.db is setup by the complicated and un-commented
> code in Bio/__init__.py by calling Bio/config/DBRegistry.py - this was
> commented out for Biopython 1.44

Confirmed. After uncommenting the call to _load_registries() in
Bio/__init__.py the example in the tutorial using Bio.db works. Note you do
get a DeprecationWarning about the concurrent behaviour provided by
Bio.MultiProc, but I have not explored any further.

Thoughts?

Peter

From mdehoon at c2b2.columbia.edu Tue Oct 30 21:05:22 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 30 Oct 2007 21:05:22 -0400
Subject: [Biopython-dev] Biopython SVN Transition
References: <128a885f0710300717p7d91a4adjfaddc9c496974e67@mail.gmail.com> <47278655.8090300@maubp.freeserve.co.uk>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B64F@mail2.exch.c2b2.columbia.edu>

> Michiel - are you expecting any complications from CVS to SVN regarding
> the build process?

For the build process, we are not doing anything very complicated with CVS, so
I doubt that there will be any major problems when we start using SVN.

--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

From bugzilla-daemon at portal.open-bio.org Tue Oct 30 21:30:20 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 21:30:20 -0400
Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string, even subclass string?
In-Reply-To:
Message-ID: <200710310130.l9V1UKEN014287@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2351

------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2007-10-30 21:30 EST -------
First, let's think about what a Seq object should look like, before getting
into implementation details.

In my opinion, a Seq object is essentially a string, but with some added
functionality that is useful in biological contexts. Currently, this is limited
to specifying an alphabet.
Personally, I never used such an alphabet, so in practice I prefer using a simple string instead of a Seq object. However, if we extend its functionality, I think a Seq class can be useful enough to warrant its existence in Biopython. In short, to my mind a Seq object should have the following properties: 1) A Seq object is basically a string, so it should behave as if it were subclassed from string. 2) As a result, functions that have a sequence as an argument, but don't need the added features of a Seq object, should work with strings as well as Seq objects. 3) The sequence should be mutable, so that we won't need a separate MutableSeq class. This also implies that a Seq class cannot subclass from string, since strings are not mutable. 4) Currently, Seq objects have an associated alphabet; SeqRecord objects have annotations, dbxrefs, a description, features, id, and name. I think a new Seq object should have both, so that we can avoid having both a Seq and a SeqRecord class. Of course, some or all of these fields can remain None. 5) A Seq class should have methods that one expects from a sequence class, in particular complement(), reverse_complement(), perhaps a modified count() that can ignore case. With respect to 3), we'd probably have to write such a Seq class in C. The end result would be a Seq class that actually has some benefit to the user, without requiring its use when a string suffices, and avoids having three classes (Seq, MutableSeq, SeqRecord) for essentially the same thing. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From chris.lasher at gmail.com Wed Oct 31 01:55:03 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Wed, 31 Oct 2007 01:55:03 -0400 Subject: [Biopython-dev] Biopython SVN Transition In-Reply-To: <47278655.8090300@maubp.freeserve.co.uk> References: <128a885f0710300717p7d91a4adjfaddc9c496974e67@mail.gmail.com> <47278655.8090300@maubp.freeserve.co.uk> Message-ID: <128a885f0710302255y4c34ac8axa48f48b253d5854a@mail.gmail.com> On 10/30/07, Peter wrote: > I was wondering if anyone would start suggesting moving to git or > something else ;) I tried Git and didn't like it. Bazaar suits me much better, and it even has support for SVN repositories with bzr-svn. Git is not truly cross-platform. It performs terribly on Windows. This left me looking at Mercurial (Hg) and Bazaar (bzr). I liked the direction that Bazaar was moving in and their emphasis on testing with real unit/regression tests. For those interested, you can see some of the "literature" I read through on my del.icio.us page: http://del.icio.us/gotgenes/dscm > Another thought; will existing developer accounts "just work" on the SVN > system? Also do you (Chris) have CVS access, and if not do you need or > want it? The existing developer accounts will "just work" because they're going to do SVN over SSH. I have SSH access on the machine and CVS access as well. Thanks for checking. > > Also, I need to know what is needed from me in terms of helping all > > the devs migrate to SVN. I produced a screencast series on Subversion > > at http://showmedo.com/videos/series?name=bfNi2X3Xg and there is a > > transition guide at http://svnbook.red-bean.com/en/1.4/svn.forcvs.html > > Sadly that didn't play with gnash 0.8, and I don't have Adobe's Flash > plugin working on my 64bit Ubuntu. I'll have to check that out on > Windows later in the week :) Bummer! Does nspluginwrapper not work? > If you are able to field any queries on the mailing list, that would > probably be fine. I'd be happy to do that. 
Should this page be renamed to SVN to be in the same line as tho CVS page? > > Would providing links to these on the wiki be sufficient? > > If you could look after that aspect of the wiki, that would be great. At some point I had started this: http://biopython.org/wiki/Subversion_migration > > What further information would you like to know? Subversion is not a > > radical departure from CVS and many of the commands are a one-to-one > > mapping. The biggest difference is commits occur for the whole > > repository, not on a per-file basis, and directories are tracked, as > > well. > > The fact the CVS and SVN are relatively similar is probably one reason > why no-one has raised any real objections to the move. > > > Let's get a discussion on this and set a date soon. > > In terms of timing, how long do you/the OBF guys expect the transfer to > take? And would they prefer to do this over a weekend or mid week? Not sure, let me ask Jason Stajich. > Barring any problems with Biopython 1.44 which would force us to do > another release in the very short term, I guess in the next fortnight is > reasonable (especially if we only expect a couple of days downtime). I think we could expect less than a full day downtime. > Of course, I personally want to start working on the Seq objects and > alignments - and Tiago wants to get back to his Population Genetics module. By all means, continue using CVS until I get a firm date for the Biopython Devs. Even if you have uncommitted changes when the CVS server goes down, you can simply copy the files to your checked out copy of the SVN repository and continue as is. > P.S. Would you or any of the people doing the transition be able to sort > out bug 2363? > http://bugzilla.open-bio.org/show_bug.cgi?id=2363 That's a very good question. I wonder if cvs2svn is capable of picking up those errors in commits and choose the proper format. 
I had trouble getting a hold of an expert who could tell me how to identify files committed as binary files, and how to change that to text (or vice versa). I should send an email to the Subversion mailing list, perhaps, or the CVS list if it's still active. I'll also check to see if Jason knows. From bugzilla-daemon at portal.open-bio.org Wed Oct 31 05:54:24 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 31 Oct 2007 05:54:24 -0400 Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string, even subclass string? In-Reply-To: Message-ID: <200710310954.l9V9sOw7014572@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2351 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-31 05:54 EST ------- > In short, to my mind a Seq object should have the following properties: > 1) A Seq object is basically a string, so it should behave as if it were > subclassed from string. I agree, where possible the Seq object should act like a string. In particular str(my_seq) should give the full string. > 2) As a result, functions that have a sequence as an argument, but don't > need the added features of a Seq object, should work with strings as well > as Seq objects. Again, I agree. I've doubled checked this works for some of the recently updated SeqUtils functionality. I would hope we get this "for free" once the Seq object itself becomes more string like. > 3) The sequence should be mutable, so that we won't need a separate > MutableSeq class. This also implies that a Seq class cannot subclass from > string, since strings are not mutable. Why? Python strings are not mutable, and this isn't usually a problem. Personally, I have never needed a mutable sequence and have only ever used them in test cases. Having the basic Seq non-mutable means we can leverage existing string functionality and optimizations. 
Also, writing a new mutable sequence in C seems like a bit of a maintenance
load in the long term (and may complicate the cross platform build process).
Surely we can get good enough performance via the array of characters route
currently used?

On a related note: the fact that the current MutableSeq methods like
reverse_complement() work in-situ rather than returning a new object makes
switching between the Seq and MutableSeq fiddly.

> 4) Currently, Seq objects have an associated alphabet; SeqRecord objects
> [also] have annotations, dbxrefs, a description, features, id, and name.
> I think a new Seq object should have both, so that we can avoid having both
> a Seq and a SeqRecord class. Of course, some or all of these fields can
> remain None.

I don't really see the benefit over the current scheme. I'm happy with the
division between Seq and SeqRecord, but we could go for SeqRecord being a more
annotated subclass of the Seq class. This would be similar to Bioperl's Seq,
PrimarySeq, or RichSeq objects.

Something I do want to add is slicing for SeqRecords, which would return a new
SeqRecord with sensible name/id/description. I think for this to really be
useful we need to add "per residue annotation", such as lists or strings of
information the same length as the sequence (e.g. predicted secondary
structure, or sequencing quality scores) which would also get sliced when
slicing a SeqRecord.

> 5) A Seq class should have methods that one expects from a sequence class,
> in particular complement(), reverse_complement(), perhaps a modified count()
> that can ignore case.

Usually mixed case sequences are used for a reason, and the user may need both
case sensitive counts and case insensitive counts. I would keep .count() case
sensitive like a real string, and suggest .upper().count() as a simple
workaround for case insensitive counts.
Plus the Seq object should have methods for forward and back transcription and translation, see Bug 2381 A more drastic change we could consider is getting rid of the alphabet as an explicit property, and having ProteinSeq, NucleotideSeq, DnaSeq and RnaSeq (decorator/sub)classes which would have only the relevant biological sequence methods. We would lose the expected "letters" feature of the alphabet, but I don't think this is really helpful at the moment because the Seq class does not enforce it. Otherwise I would advocate when creating a Seq object (or editing a MutableSeq object) the new letters should be screened against self.alphabet.letters (if present). On balance I favour making gradual changes which don't change the current scheme (Seq with Alphabet property; SeqRecord with Seq property). Anything more drastic might best be pursued on a new branch which could become Biopython 2.0 P.S. We should try not to implicitly assume that the elements in a sequence are single letters? What about when working with protein structures which contain modified amino acids (with defined three letter codes) which do not map back to single letters. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
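To make the "gradual changes" option discussed above more concrete, here is a minimal sketch of an immutable, string-like sequence wrapper. It is a hypothetical illustration only: SketchSeq is not Bio.Seq.Seq, the letters argument merely stands in for the alphabet's expected letters, and none of these names come from Biopython.

```python
class SketchSeq(object):
    """Hypothetical immutable, string-like sequence (not Bio.Seq.Seq)."""

    def __init__(self, data, letters=None):
        # Screen the new letters against the expected ones, if any are given.
        if letters is not None:
            unexpected = set(data) - set(letters)
            if unexpected:
                raise ValueError(
                    "Unexpected letters: %s" % "".join(sorted(unexpected)))
        self._data = data
        self._letters = letters

    def __str__(self):
        # str(my_seq) gives the full sequence, never a truncated form.
        return self._data

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        # Slices return a new SketchSeq; a single index returns one letter.
        if isinstance(index, slice):
            return SketchSeq(self._data[index], self._letters)
        return self._data[index]

    def count(self, sub, start=0, end=None):
        # Case sensitive, with the same optional arguments as str.count.
        if end is None:
            end = len(self._data)
        return self._data.count(str(sub), start, end)

    def upper(self):
        # The expected letters have to be upper-cased as well.
        letters = None if self._letters is None else self._letters.upper()
        return SketchSeq(self._data.upper(), letters)


my_seq = SketchSeq("ATGaatgATG", letters="ACGTacgt")
```

With this sketch, my_seq.count("ATG") stays case sensitive while my_seq.upper().count("ATG") gives the case insensitive count, and str(my_seq[:50]) gives a short form without a separate .short() method.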
From bugzilla-daemon at portal.open-bio.org Tue Oct 2 09:09:48 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 2 Oct 2007 05:09:48 -0400 Subject: [Biopython-dev] [Bug 2362] test_copen fails on Windows XP as tries os.fork() In-Reply-To: Message-ID: <200710020909.l9299moD015903@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2362 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2007-10-02 05:09 EST ------- I removed test_copen.py from CVS and deprecated the Bio.MultiProc code. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mdehoon at c2b2.columbia.edu Tue Oct 2 09:06:54 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue, 2 Oct 2007 05:06:54 -0400 Subject: [Biopython-dev] [BioPython] Bio.MultiProc References: <46E6A845.3030601@c2b2.columbia.edu> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu> Hi everybody, Since no users of Bio.MultiProc came forward, I deprecated it for the upcoming release. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: biopython-bounces at lists.open-bio.org on behalf of Michiel De Hoon Sent: Tue 9/11/2007 10:37 AM To: BioPython Developers List; biopython at biopython.org Subject: [BioPython] Bio.MultiProc Hi everybody, In preparation for the upcoming release, I was running the Biopython test suite and found that test_copen.py hangs on Cygwin. It doesn't fail, it just sits there forever. This may be related to the use of fork() instead of select() in Bio/MultiProc/copen.py. 
Anyway, while it is probably possible to fix this, I'd have to dig fairly deep into the code, and I am not sure if it is worth it. It looks like the copen functions are used only in Bio/config, which is needed for Bio.db. A description of the functionality of thia module can be found in the tutorial section 4.7.2. Now, I don't remember users asking about this module on the mailing list. From the tutorial documentation, it seems to be a nice piece of code, but I doubt that it is being used often in practice. So I was wondering: 1) Is anybody on this list using this code? 2) If not, can I mark it as deprecated for the upcoming release? Hopefully, people who are using this code will notice, and let us know that they need it. --Michiel. _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From idoerg at gmail.com Tue Oct 2 16:00:41 2007 From: idoerg at gmail.com (Iddo Friedberg) Date: Tue, 2 Oct 2007 09:00:41 -0700 Subject: [Biopython-dev] [BioPython] Bio.MultiProc In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu> References: <46E6A845.3030601@c2b2.columbia.edu> <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu> Message-ID: Would it be possible to include the module, comment out the unworkable source code and print a deprecation warning when it is imported? That was we: 1) Don't have a clunky module BUT 2) we warn anyone who uses it (but didn't happen to read your post) that it is deprecated when they install a new biopython version AND 3) Leave an option of fixing and commenting the code back in (i.e. it is not lost forever). Also, is it possible to track down the original author? ./I On 10/2/07, Michiel De Hoon wrote: > > Hi everybody, > > Since no users of Bio.MultiProc came forward, I deprecated it for the > upcoming release. > > --Michiel. 
> > Michiel de Hoon > Center for Computational Biology and Bioinformatics > Columbia University > 1150 St Nicholas Avenue > New York, NY 10032 > > > > -----Original Message----- > From: biopython-bounces at lists.open-bio.org on behalf of Michiel De Hoon > Sent: Tue 9/11/2007 10:37 AM > To: BioPython Developers List; biopython at biopython.org > Subject: [BioPython] Bio.MultiProc > > Hi everybody, > > In preparation for the upcoming release, I was running the Biopython > test suite and found that test_copen.py hangs on Cygwin. It doesn't > fail, it just sits there forever. This may be related to the use of > fork() instead of select() in Bio/MultiProc/copen.py. Anyway, while it > is probably possible to fix this, I'd have to dig fairly deep into the > code, and I am not sure if it is worth it. It looks like the copen > functions are used only in Bio/config, which is needed for Bio.db. A > description of the functionality of thia module can be found in the > tutorial section 4.7.2. > > Now, I don't remember users asking about this module on the mailing > list. From the tutorial documentation, it seems to be a nice piece of > code, but I doubt that it is being used often in practice. > > So I was wondering: > 1) Is anybody on this list using this code? > 2) If not, can I mark it as deprecated for the upcoming release? > Hopefully, people who are using this code will notice, and let us know > that they need it. > > --Michiel. > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- I. Friedberg "The only problem with troubleshooting is that sometimes trouble shoots back." 
From biopython-dev at maubp.freeserve.co.uk Tue Oct 2 16:55:53 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 02 Oct 2007 17:55:53 +0100 Subject: [Biopython-dev] Bio.MultiProc / Bio.FormatIO In-Reply-To: References: <46E6A845.3030601@c2b2.columbia.edu> <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu> Message-ID: <47027819.1010207@maubp.freeserve.co.uk> Iddo Friedberg wrote: > Would it be possible to include the module, comment out the unworkable > source code and print a deprecation warning when it is imported? That is sort of what Michiel did - he's just added a deprecation warning, but not touched the code itself. This isn't an option for some of the more "integrated" bits of code like Bio.FormatIO which I suggested removing in Bug 2361 (see also my email to the main list on 19 September): http://bugzilla.open-bio.org/show_bug.cgi?id=2361#c27 Peter From rhaygood at duke.edu Tue Oct 2 23:59:43 2007 From: rhaygood at duke.edu (Ralph Haygood) Date: Tue, 2 Oct 2007 19:59:43 -0400 (EDT) Subject: [Biopython-dev] Statistics code In-Reply-To: <6d941f120709291328q6a9aae97kdcf489549cc9b3f0@mail.gmail.com> References: <6d941f120709291328q6a9aae97kdcf489549cc9b3f0@mail.gmail.com> Message-ID: Tiago, Sorry to be so long replying---I've been almost drowning in work. Use anything you find useful in my code. If you do write an article about it, I'd be glad to be a coauthor, not just in name but actually to help with writing the discussion of sequence statistics. There *is* a lot of stuff in my code, not all of it generally important. For example, few people will care about indel statistics, beyond counting them and maybe getting the frequency distribution of their lengths. The things most people will care about are K (the number of polymorphic sites), Watterson's theta, pi, Tajima's D, Fu and Li's D, Fay and Wu's H, F_ST, and McDonald--Kreitman testing. 
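As a rough illustration of three of the statistics listed above (K, Watterson's theta and pi), here is a short sketch using their standard definitions. It is not code from either author's module; the function names and the sample alignment are invented for this example, and it assumes equal-length sequences, at least two of them, with no gaps or ambiguous nucleotides.

```python
from itertools import combinations


def segregating_sites(seqs):
    """K: number of alignment columns with more than one distinct letter."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Watterson's estimator: K divided by a_n = sum of 1/i for i = 1..n-1."""
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n


def nucleotide_diversity(seqs):
    """pi: mean number of differences over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(1 for a, b in zip(s, t) if a != b) for s, t in pairs)
    return total / float(len(pairs))


# Made-up alignment of three sequences, four sites each.
sample = ["ACGT", "ACGA", "TCGA"]
```

The values returned are per locus rather than per site; dividing by the alignment length would give per-site estimates.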
As for ambiguous nucleotides, my code handles them in one of two ways, at the
programmer's option. By default, a site at which any sequence in the alignment
contains an ambiguous nucleotide is ignored; for example,

    ACRGTY
    ACAGTC

is effectively equivalent to

    ACGT
    ACGT

However, if the 'expand_diplotypes' option is specified when the Sample object
is constructed, each sequence in the alignment is interpreted as a diplotype
and converted into a pair of pseudo-haplotypes, two-fold ambiguous nucleotides
(R, Y, W, S, M, and K) being interpreted as heterozygous; for example,

    ACRGTY
    ACAGTC

is effectively equivalent to

    ACAGTC
    ACGGTT
    ACAGTC
    ACAGTC

In expand_diplotypes mode, sites containing three- or four-fold ambiguous
nucleotides are still ignored. Also, you'll get a warning if you request a
statistic that depends on correct SNP phasing, which most statistics don't. So
far, I've found these two operating modes sufficient for my needs.

I think your plan sounds very reasonable, just adding sequence statistics at a
pace that's comfortable for you. Any time you have questions, feel free to ask
me, and I'll give you whatever benefit there is in my opinion and experience.
I'm happy for all this to happen on biopython-dev, so that other people (e.g.,
Alex Lancaster) can add to it. I'll leave it to the core developers to tell us
if we're too noisy. (I'd recommend still sending messages to me with copies to
biopython-dev, however, so that I don't accidentally miss them on
biopython-dev, which I don't always read carefully.)

Ralph

On Sat, 29 Sep 2007, Tiago Antão wrote:

> Hi Ralph,
>
> Hope all is good with you. I am now finally starting to commit
> statistics code to Biopython. But before I go ahead I would like to
> ask some advice to you (plus some extra comments):
>
> About code merging and authorship:
>
> I am finally looking to your code. There is really lots of stuff
> there! Would it be OK with you if I merged your code with mine into
> Bio.PopGen.Stats?
> Obviously the copyright/authorship for the module
> would be co-shared as would any authorship of any article deriving
> from it...
>
> About a strategy to advance:
>
> 1. I personally don't have any experience, really, with working with
> sequence data (My background are SNPs, microsatellites/STRs, AFLPs and
> that sort of stuff)
> 2. Starting on Monday I am beginning a PhD which will require, part
> time, sequence analysis
> 3. What I mean from 1 and 2 is that I currently don't have maturity to
> architect and design a good framework for sequence analysis but I will
> gain it with time.
> My plan is then to defer all sequence code until I fell I know what I
> am doing (although I was still thinking in providing something like
> BioPerl's facility of extracting all SNPs from sequences)
> If this is OK with you I plan to start committing code the week
> starting on this Monday,
>
> About request for insight:
>
> If you have any comments to offer on issues regarding representing
> indels and ambiguous data (ie ambiguous nucleotides) they might be
> useful, as I suppose that is the biggest issue that makes me afraid of
> sequence code.
>
>
> Finally: I would summarize our discussion here on biopython-dev (I am
> not taking it there directly just because you might not want your code
> on Biopython or might want it in other terms).
>
> Thanks,
> Tiago
>

From mdehoon at c2b2.columbia.edu Wed Oct 3 00:18:59 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 2 Oct 2007 20:18:59 -0400
Subject: [Biopython-dev] [BioPython] Bio.MultiProc
References: <46E6A845.3030601@c2b2.columbia.edu><6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62D@mail2.exch.c2b2.columbia.edu>

> Would it be possible to include the module, comment out the unworkable
> source code and print a deprecation warning when it is imported?

That is what I did.

> 3) Leave an option of fixing and commenting the code back in (i.e. it is not
> lost forever).

Even after removing the code in some future release, the code will not be
lost forever. It can always be retrieved from CVS and from older Biopython
releases.

> Also, is it possible to track down the original author?

That would be Jeff Chang.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

From tiagoantao at gmail.com Wed Oct 3 10:14:33 2007
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 3 Oct 2007 11:14:33 +0100
Subject: [Biopython-dev] Coalescent code
Message-ID: <6d941f120710030314g73e38aa4w8c3b473eeaa18cc9@mail.gmail.com>

Hi,

I had a plan of starting to commit statistical related code this
weekend, but (contrary to my expectations) I am having requests for
the coalescent code. As such, I am planning to commit the coalescent
code instead.

It is quite straightforward code, with only one issue that I would
require advice: Some of the code (regarding modeling demographies)
requires some templates (very small text files, circa 10 of around 700
bytes each) to go along. Where should I put the files in Biopython?
Also, on installation those files have to be put somewhere...

Tiago
--
http://www.tiago.org/ps

From biopython-dev at maubp.freeserve.co.uk Wed Oct 3 14:18:21 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 03 Oct 2007 15:18:21 +0100
Subject: [Biopython-dev] Coalescent code
In-Reply-To: <6d941f120710030314g73e38aa4w8c3b473eeaa18cc9@mail.gmail.com>
References: <6d941f120710030314g73e38aa4w8c3b473eeaa18cc9@mail.gmail.com>
Message-ID: <4703A4AD.7030008@maubp.freeserve.co.uk>

Tiago Antão wrote:
> It is quite straightforward code, with only one issue that I would
> require advice: Some of the code (regarding modeling demographies)
> requires some templates (very small text files, circa 10 of around 700
> bytes each) to go along. Where should I put the files in Biopython?
> Also, on installation those files have to be put somewhere...

There is a similar precedent with Bio/EUtils/DTDs (where the data files
are XML DTD files). I guess you could have the 10 plain text data files
in with the python files (or under a subdirectory). Opinions?

I should really refresh myself on current python packaging guidelines...

Peter

From tiagoantao at gmail.com Wed Oct 3 15:37:17 2007
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 3 Oct 2007 16:37:17 +0100
Subject: [Biopython-dev] Statistics code
In-Reply-To:
References: <6d941f120709291328q6a9aae97kdcf489549cc9b3f0@mail.gmail.com>
Message-ID: <6d941f120710030837k1aa2d4ak7eca8e6e27e35fdd@mail.gmail.com>

Ralph,

Thanks for the detailed explanation. Because of a couple of requests I
had, I am going to commit first the coalescent code, but after the
coalescent code is in, I will pick this up.

Tiago

--
http://www.tiago.org/ps

From tiagoantao at gmail.com Wed Oct 3 16:04:07 2007
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 3 Oct 2007 17:04:07 +0100
Subject: [Biopython-dev] Coalescent code
In-Reply-To: <4703A4AD.7030008@maubp.freeserve.co.uk>
References: <6d941f120710030314g73e38aa4w8c3b473eeaa18cc9@mail.gmail.com> <4703A4AD.7030008@maubp.freeserve.co.uk>
Message-ID: <6d941f120710030904k70b098dcnbbc40bc3420ea831@mail.gmail.com>

Hi

On 10/3/07, Peter wrote:
> There is a similar precedent with Bio/EUtils/DTDs (where the data files
> are XML DTD files). I guess you could have the 10 plain text data files
> in with the python files (or under a subdirectory). Opinions?

In the mean time, I will start committing the code (I can easily
accommodate the details of the places to put the files later, when
there is a decision).

Michiel, please, please don't include SimCoal code that I will be
committing on the next public version.

Regards,
Tiago

From mdehoon at c2b2.columbia.edu Thu Oct 4 00:39:47 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Wed, 3 Oct 2007 20:39:47 -0400
Subject: [Biopython-dev] Coalescent code
References: <6d941f120710030314g73e38aa4w8c3b473eeaa18cc9@mail.gmail.com><4703A4AD.7030008@maubp.freeserve.co.uk> <6d941f120710030904k70b098dcnbbc40bc3420ea831@mail.gmail.com>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62E@mail2.exch.c2b2.columbia.edu>

> Michiel, please, please don't include SimCoal code that I will be
> committing on the next public version.

To avoid confusion, please don't commit code to CVS that you don't want to
be included in the next Biopython release.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

From bugzilla-daemon at portal.open-bio.org Thu Oct 4 02:10:13 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 3 Oct 2007 22:10:13 -0400
Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with egenix mxTextTools 3.0
In-Reply-To:
Message-ID: <200710040210.l942ADGF030763@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2361

------- Comment #30 from mdehoon at ims.u-tokyo.ac.jp 2007-10-03 22:10 EST -------
Looking at the patch for Bio.FormatIO:

-------------------------
#Would like to have just issued a deprecation warning, and removed this
#module later. However, due to the FormatIO code in Bio/SeqRecord.py the
#deprecation warning would be triggered whenever someone used the SeqRecord.
raise ImportError, "Bio.FormatIO has been removed. Please try Bio.SeqIO instead"
-------------------------

Since the patch for Bio/SeqRecord.py removes its dependence on Bio.FormatIO,
is it still necessary to raise an ImportError instead of issuing a
DeprecationWarning?

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Fri Oct 5 09:44:09 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 5 Oct 2007 05:44:09 -0400
Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with egenix mxTextTools 3.0
In-Reply-To:
Message-ID: <200710050944.l959i9BX029760@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2361

------- Comment #31 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-05 05:44 EST -------
In terms of typical usage, SeqRecord does not depend on FormatIO.

However, from a code perspective, FormatIO and SeqRecord "depend" on each
other.

If we remove the FormatIO "hooks" from SeqRecord.py (so that SeqRecord does
not depend on FormatIO), then FormatIO breaks. Rather than leaving in a
broken module, I wanted to remove it. A DeprecationWarning doesn't seem
right if FormatIO is removed, which is why I suggested an ImportError.

We might be able instead to MOVE the FormatIO hooks out of SeqRecord and
then issue a DeprecationWarning for FormatIO ... but it looks rather
complicated, and probably means tackling the Bio.config code as well.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
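
The two behaviours weighed in comments #30 and #31 (warn on import but keep
working, versus refuse to import at all) can be sketched roughly as below.
This is only an illustration under stated assumptions: it is written for
Python 3 rather than the Python 2 of the patch, the two function names are
hypothetical stand-ins for code that would normally sit at the top of a
module, and the wording of the deprecation message is made up. Only the
ImportError text is taken from the patch.

```python
import warnings


def load_deprecated_module():
    """Hypothetical stand-in for a deprecated module: it warns, but still works."""
    warnings.warn(
        "Bio.MultiProc is deprecated (illustrative message, not the real one)",
        DeprecationWarning,
        stacklevel=2,
    )
    return "still usable"


def load_removed_module():
    """Hypothetical stand-in for a removed module: importing it fails outright."""
    raise ImportError("Bio.FormatIO has been removed. Please try Bio.SeqIO instead")


if __name__ == "__main__":
    # Record warnings so the DeprecationWarning is visible even if filtered.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        print(load_deprecated_module())
        print(caught[0].category.__name__)
    try:
        load_removed_module()
    except ImportError as err:
        print("ImportError:", err)
```

The practical difference is that a caller of the first function keeps
running and merely sees a warning, while a caller of the second must change
their code before anything runs.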

From bugzilla-daemon at portal.open-bio.org Fri Oct 5 11:05:49 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 5 Oct 2007 07:05:49 -0400
Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with egenix mxTextTools 3.0
In-Reply-To:
Message-ID: <200710051105.l95B5nXW001755@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2361

------- Comment #32 from mdehoon at ims.u-tokyo.ac.jp 2007-10-05 07:05 EST -------
> If we remove the FormatIO "hooks" from SeqRecord.py (so that SeqRecord does not
> depend on FormatIO), then FormatIO breaks. Rather than leaving in a broken
> module, I wanted to remove it. A DeprecationWarning doesn't seem right if
> FormatIO is removed, which is why I suggested an ImportError.

OK, I see. As far as I'm concerned, your patch is fine then.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Fri Oct 5 13:46:51 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 5 Oct 2007 09:46:51 -0400
Subject: [Biopython-dev] [Bug 2174] FDist Support in BioPython
In-Reply-To:
Message-ID: <200710051346.l95Dkpc2010074@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2174

tiagoantao at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

------- Comment #6 from tiagoantao at gmail.com 2007-10-05 09:46 EST -------
It is implemented, documented and with test code.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From tiagoantao at gmail.com Fri Oct 5 14:26:43 2007
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Fri, 5 Oct 2007 15:26:43 +0100
Subject: [Biopython-dev] Configuration files
Message-ID: <6d941f120710050726s4ca53349h1b8d499650e5726a@mail.gmail.com>

Hi,

Is there any (Biopython standard) way to configure Biopython during
runtime? When writing code sometimes I think it would be very
convenient (especially to the programmer using Biopython) to abstract
some configuration parameters away from the code. Things like the
location of binaries, hosts, user names (and maybe passwords) of
databases, timeout parameters, etc. These could be stored in a
configuration file (or registry entry, or whatever), thus saving users
from having to supply them in their code...
Just an idea...

Tiago
--
http://www.tiago.org/ps

From bugzilla-daemon at portal.open-bio.org Mon Oct 8 11:14:30 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 8 Oct 2007 07:14:30 -0400
Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with egenix mxTextTools 3.0
In-Reply-To:
Message-ID: <200710081114.l98BEUZh019757@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2361

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #759 is|0                           |1
           obsolete|                            |

------- Comment #33 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-08 07:14 EST -------
(From update of attachment 759)
Applied these changes to CVS.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython-dev at maubp.freeserve.co.uk Mon Oct 8 10:52:48 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Mon, 08 Oct 2007 11:52:48 +0100
Subject: [Biopython-dev] Configuration files
In-Reply-To: <6d941f120710050726s4ca53349h1b8d499650e5726a@mail.gmail.com>
References: <6d941f120710050726s4ca53349h1b8d499650e5726a@mail.gmail.com>
Message-ID: <470A0C00.50505@maubp.freeserve.co.uk>

Tiago Antão wrote:
> Hi,
>
> Is there any (Biopython standard) way to configure Biopython during
> runtime? When writing code sometimes I think it would be very
> convenient (especially to the programmer using Biopython) to abstract
> some configuration parameters away from the code. Things like the
> location of binaries, hosts, user names (and maybe passwords) of
> databases, timeout parameters, etc. These could be stored on a
> configuration file (or registry entry, or whatever) thus saving users
> to have to deal in the code with supplying these...
> Just an idea...

This sounds like a fairly general thing (i.e. for all of python) rather
than being Biopython specific.

For example, I find a lot of my scripts have a few if statements at the
top setting locations of files and executables based on which
user/machine I'm running on (I use both Windows and a couple of Linux
boxes with different user names).

e.g. Where are the blast executables, the blast databases, and my genome
collection, ...
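
As a rough illustration of the idea in this thread (moving per-machine
paths out of scripts and into a file), the sketch below reads an INI-style
file with Python 3's standard configparser module and exposes the values
under dotted keys such as PopGen.FDistDir. Everything here is an
assumption made for the example: the function name get_config, the
default file name ~/.biopythonrc and the section/key layout are
hypothetical and are not an existing Biopython API.

```python
import configparser
import os


def get_config(path=None):
    """Return settings from an INI-style file as a flat dict of 'Section.key' entries.

    The default location is a hypothetical per-user file; a missing file
    simply yields an empty dictionary.
    """
    if path is None:
        path = os.path.expanduser("~/.biopythonrc")  # hypothetical file name
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the case of keys such as FDistDir
    parser.read(path)  # silently skips files that do not exist
    settings = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            settings["%s.%s" % (section, key)] = value
    return settings
```

With a file containing a [PopGen] section and a line FDistDir = /opt/fdist,
a script would look the path up as get_config()["PopGen.FDistDir"] instead
of branching on the user or machine name.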

Peter

From bugzilla-daemon at portal.open-bio.org Mon Oct 8 11:30:03 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 8 Oct 2007 07:30:03 -0400
Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with egenix mxTextTools 3.0
In-Reply-To:
Message-ID: <200710081130.l98BU36u021016@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2361

------- Comment #34 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-08 07:30 EST -------
Recap, most of the issues were resolved by switching Bio.Fasta from Martel to
pure python. Additionally:

test_Fasta - 'fixed' by deprecating the Mindy indexing functions
test_KEGG - fixed by switching from Martel to pure python
test_format_registry - 'fixed' by removing FormatIO
test_geo - fixed by switching from Martel to pure python
test_GenBankFormat - this entire test is for the little-used Martel GenBank
expression, and this works with mxTextTools 2.0 but fails with mxTextTools 3.0

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From mdehoon at c2b2.columbia.edu Tue Oct 9 04:34:28 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 9 Oct 2007 00:34:28 -0400
Subject: [Biopython-dev] Output of Biopython tests
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu>

Hi everybody,

With the help of several Biopython developers, especially Peter, the
problems with Martel and the new mxTextTools release have now been solved
(in the sense that all unit tests now succeed). So we're a lot closer to a
new Biopython release. Thanks everybody!

When I was running the Biopython tests, one thing bothered me though. All
Biopython tests now have a corresponding output file that contains the
output the test should generate if it runs correctly. For some tests, this
makes perfect sense, particularly if the output is large. For others, on
the other hand, having the test output explicitly in a file doesn't
actually add much information. For example, the output for test_psw is

test_psw
test_AlignmentColumn_assertions (test_psw.TestPSW) ... ok
test_AlignmentColumn_full (test_psw.TestPSW) ... ok
test_AlignmentColumn_kinds (test_psw.TestPSW) ... ok
test_AlignmentColumn_repr (test_psw.TestPSW) ... ok
test_Alignment_assertions (test_psw.TestPSW) ... ok
test_Alignment_normal (test_psw.TestPSW) ... ok
test_ColumnUnit (test_psw.TestPSW) ... ok
Doctest: Bio.Wise.psw.parse_line ... ok

----------------------------------------------------------------------
Ran 8 tests in 0.002s

OK

For comparison, this is the test output if test_psw.py fails:

test_AlignmentColumn_assertions (__main__.TestPSW) ... ok
test_AlignmentColumn_full (__main__.TestPSW) ... ok
test_AlignmentColumn_kinds (__main__.TestPSW) ... FAIL
test_AlignmentColumn_repr (__main__.TestPSW) ... ok
test_Alignment_assertions (__main__.TestPSW) ... ok
test_Alignment_normal (__main__.TestPSW) ... ok
test_ColumnUnit (__main__.TestPSW) ... ok
Doctest: Bio.Wise.psw.parse_line ... ok

======================================================================
FAIL: test_AlignmentColumn_kinds (__main__.TestPSW)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_psw.py", line 47, in test_AlignmentColumn_kinds
    self.assertEqual(ac.kind, "some_funny_output_I_made_up_instead_of_INSERT")
AssertionError: 'INSERT' != 'some_funny_output_I_made_up_instead_of_INSERT'

----------------------------------------------------------------------
Ran 8 tests in 0.000s

The point is that for this test, having the output explicitly is not needed
in order to identify the problem.

Now, for some tests having the output explicitly actually causes a problem.
I'm thinking about those unit tests that only run if some particular
software is installed on the system (for example, SQL). In those cases, we
need to distinguish failure due to missing software from a true failure
(the former may not bother the user much if he's not interested in that
particular part of Biopython). If a test cannot be run because of missing
prerequisites, currently a unit test generates an ImportError, which is
then caught inside run_tests. Hence, we get the following output when
running the Biopython tests:

test_BioSQL ... Skipping test because of import error: Skipping BioSQL tests -- enable tests in Tests/test_BioSQL.py
ok

When you look inside test_BioSQL.py, you'll see that the actual error is
not an ImportError. In addition, if a true ImportError occurs during the
test, the test will inadvertently be treated as skipped.

My solution would be to skip tests inside test_BioSQL if the prerequisites
are not met. However, in that case the test output no longer agrees with
the expected test output, generating a failure message.

I'd therefore like to suggest the following:
1) Keep the test output, but let each test_* script (instead of
run_tests.py) be responsible for comparing the test output with the
expected output.
2) If the expected output is trivial, simply use the assert statements to
verify the test output instead of storing them in a file and reading them
from there.

Any objections?

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

From mhobbs_of_lawson at bigpond.com Tue Oct 9 02:18:39 2007
From: mhobbs_of_lawson at bigpond.com (mhobbs_of_lawson)
Date: Tue, 9 Oct 2007 12:18:39 +1000
Subject: [Biopython-dev] translate
Message-ID: <5496247.1191896319102.JavaMail.root@web06sl>

Hi,

Please can someone tell me what is wrong here. I simply want to be able to
translate ambiguous DNA which includes an 'NNN' triplet.

Thanks,
Matthew

>>> from Bio import Seq
>>> from Bio.Alphabet import IUPAC
>>> from Bio import Translate
>>> s = "NNNTCAAAAAGGTGCATCTAGATG"
>>> dna = Seq.Seq(s, IUPAC.ambiguous_dna)
>>> trans = Translate.ambiguous_dna_by_id[1]
>>> print trans.translate(dna)
Traceback (most recent call last):
  File "", line 1, in
  File "/cygdrive/c/Python24/Lib/site-packages/Bio/Translate.py", line 20, in translate
    append(get(s[i:i+3], stop_symbol))
  File "/cygdrive/c/Python24/Lib/site-packages/Bio/Data/CodonTable.py", line 544, in get
    return self.__getitem__(codon)
  File "/cygdrive/c/Python24/Lib/site-packages/Bio/Data/CodonTable.py", line 577, in __getitem__
    raise TranslationError, codon # does not code
Bio.Data.CodonTable.TranslationError: NNN

From biopython-dev at maubp.freeserve.co.uk Tue Oct 9 11:54:29 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 09 Oct 2007 12:54:29 +0100
Subject: [Biopython-dev] translate
In-Reply-To: <5496247.1191896319102.JavaMail.root@web06sl>
References: <5496247.1191896319102.JavaMail.root@web06sl>
Message-ID: <470B6BF5.607@maubp.freeserve.co.uk>

mhobbs_of_lawson wrote:
> Hi,
>
> Please can someone tell me what is wrong here. I simply want to be able to translate ambiguous DNA which includes an 'NNN' triplet.

A very reasonable request. I assume you expect just an X for an NNN codon?

I have the general impression that some of Biopython's handling of
ambiguous sequences isn't all wonderful... something I have started to
tackle in bug 2356:

http://bugzilla.open-bio.org/show_bug.cgi?id=2366

Obviously sequence manipulation is a core bit of functionality - and I
would like at least one other person to comment on that code before I
risk committing it ;)

Translation of ambiguous codons would be next on my hit list... as right
now it doesn't seem to do what I would expect at all.

In the short term, manually adding additional mappings to the forward
table (a python dictionary) would probably "fix" your specific issue.
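
To make the "add a mapping to the forward table" workaround concrete, here
is a minimal, self-contained sketch in Python 3. It deliberately does not
touch Biopython's own CodonTable objects: forward_table below is an
ordinary dictionary built for this example from the standard genetic code
(with stop codons stored as "*"), and the translate function is a
hypothetical helper, so this shows the idea rather than the actual fix.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Plain dictionary standing in for a forward table, e.g. {"TTT": "F", ...}.
forward_table = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# The manual workaround: map the fully ambiguous codon to "X".
forward_table["NNN"] = "X"


def translate(dna, table=forward_table):
    """Translate codon by codon; an unmapped codon raises KeyError."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(table[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    print(translate("NNNTCAAAAAGGTGCATCTAGATG"))  # XSKRCI*M
```

Partially ambiguous codons (for example "TCN") would still fail here; each
one needs its own entry, which is why this is only a short-term measure.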

While we are on this topic, we use "*" for stop codons and "X" for an
ambiguous amino acid - but is anyone aware of a character convention for
something that might be either a stop codon or an amino acid? (other
than just using "X" for this too)?

Peter

From biopython-dev at maubp.freeserve.co.uk Tue Oct 9 11:44:01 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 09 Oct 2007 12:44:01 +0100
Subject: [Biopython-dev] Output of Biopython tests
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu>
Message-ID: <470B6981.3020707@maubp.freeserve.co.uk>

Michiel De Hoon wrote:
> When I was running the Biopython tests, one thing bothered me though.
> All Biopython tests now have a corresponding output file that
> contains the output the test should generate if it runs correctly.
> For some tests, this makes perfect sense, particularly if the output
> is large. For others, on the other hand, having the test output
> explicitly in a file doesn't actually add much information.

Is this actually a problem? It gives us a simple unified test framework
where developers can use whatever fancy test frameworks they want to.

Personally I have tried to write simple scripts with meaningful output
(plus often additional assertions). I think that because these are very
simple, they can double as examples/documentation for the curious.

My personal view is that some of the "fancy frameworks" used in some
test cases are very intimidating to a beginner (and act as a barrier to
taking the code and modifying it for their own use).

> The point is that for this test, having the output explicitly is not
> needed in order to identify the problem.

True. I would have written that particular test to give some meaningful
output; I find it makes it easier to start debugging why a test fails.

> Now, for some tests having the output explicitly actually causes a
> problem. I'm thinking about those unit tests that only run if some
> particular software is installed on the system (for example, SQL). In
> those cases, we need to distinguish failure due to missing software
> from a true failure (the former may not bother the user much if he's
> not interested in that particular part of Biopython). If a test
> cannot be run because of missing prerequisites, currently a unit test
> generates an ImportError, which is then caught inside run_tests.
> ...
> When you look inside test_BioSQL.py, you'll see that the actual error
> is not an ImportError. In addition, if a true ImportError occurs
> during the test, the test will inadvertently be treated as skipped.

Perhaps we should introduce a MissingExternalDependency error instead,
used for this specific case, and catch that in run_tests.py, while
treating ImportError as a real error.

As you say, if we have done some dramatic restructuring (such as
removing a module) there could be some REAL ImportErrors which we might
risk ignoring.

> I'd therefore like to suggest the following:
> 1) Keep the test output, but let each test_* script (instead of
> run_tests.py) be responsible of comparing the test output with the
> expected output.

I'm not keen on that - it means duplication of code (or at least some
common functionality to call) and makes writing simple tests that little
bit harder.

I like the fact that the more verbose test scripts can be run on their
own as an example of what the module can do.

> 2) If the expected output is trivial, simply use the assert
> statements to verify the test output instead of storing them in a
> file and reading them from there.

By all means, test trivial output with assertions. I already do this
within many of my "verbose" tests where I want to keep the console
output reasonably short.
Peter

From tiagoantao at gmail.com Tue Oct 9 14:27:18 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 9 Oct 2007 15:27:18 +0100
Subject: [Biopython-dev] Configuration files
In-Reply-To: <470A0C00.50505@maubp.freeserve.co.uk>
References: <6d941f120710050726s4ca53349h1b8d499650e5726a@mail.gmail.com> <470A0C00.50505@maubp.freeserve.co.uk>
Message-ID: <6d941f120710090727m787c08abn13665c662727446c@mail.gmail.com>

Would it be interesting to have something like

config = Bio.Config.getConfig()
fdist_path = config['PopGen.FDistDir']

Something that:
1. Would allow for a standard configuration mechanism (as opposed to having different styles for each module/author)
2. Would abstract away how the configuration is stored (registry, conf file, ...)

If there was an agreement on doing this (or something along these lines), I would volunteer the time to do it.

On 10/8/07, Peter wrote:
> Tiago Antão wrote:
> > Hi,
> >
> > Is there any (Biopython standard) way to configure Biopython during
> > runtime? When writing code sometimes I think it would be very
> > convenient (especially to the programmer using Biopython) to abstract
> > some configuration parameters away from the code. Things like the
> > location of binaries, hosts, user names (and maybe passwords) of
> > databases, timeout parameters, etc. These could be stored on a
> > configuration file (or registry entry, or whatever) thus saving users
> > to have to deal in the code with supplying these...
> > Just an idea...
>
> This sounds like a fairly general thing (i.e. for all of python) rather
> than being Biopython specific.
>
> For example, I find a lot of my scripts have a few if statements at the
> top setting locations of files and executables based on which
> user/machine I'm running on (I use both Windows and a couple of Linux
> boxes with different user names).
>
> e.g. Where are the blast executables, the blast databases, and my genome
> collection, ...
> > Peter > > -- http://www.tiago.org/ps From mhobbs_of_lawson at bigpond.com Tue Oct 9 23:07:43 2007 From: mhobbs_of_lawson at bigpond.com (Matthew Hobbs) Date: Wed, 10 Oct 2007 09:07:43 +1000 Subject: [Biopython-dev] translate In-Reply-To: <470B6BF5.607@maubp.freeserve.co.uk> References: <5496247.1191896319102.JavaMail.root@web06sl> <470B6BF5.607@maubp.freeserve.co.uk> Message-ID: <470C09BF.8050906@bigpond.com> Thanks Peter for your reply. Peter wrote: > mhobbs_of_lawson wrote: >> Please can someone tell me what is wrong here. I simply want to be >> able to translate ambiguous DNA which includes an 'NNN' triplet. > > A very reasonable request. I assume you expect just an X for an NNN codon? yep > In the short term, manually adding additional mappings to the forward > table (a python dictionary) would probably "fix" your specific issue. OK - so this works: from Bio import Seq from Bio.Alphabet import IUPAC from Bio import Translate s = "NNNTCAAAAAGGTGCATCTAGATG" dna = Seq.Seq(s, IUPAC.ambiguous_dna) trans = Translate.ambiguous_dna_by_id[1] trans.table.forward_table.forward_table['NNN'] = 'X' print trans.translate(dna) > While we are on this topic, we use "*" for stop codons and "X" for an > ambiguous amino acid - but is anyone aware of a character convention for > something that might be either a stop codon or an amino acid? (other > than just using "X" for this too)? 
No I don't know Thanks, Matthew From mdehoon at c2b2.columbia.edu Thu Oct 11 10:31:59 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Thu, 11 Oct 2007 06:31:59 -0400 Subject: [Biopython-dev] Output of Biopython tests References: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu> <470B6981.3020707@maubp.freeserve.co.uk> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B636@mail2.exch.c2b2.columbia.edu> > Perhaps we should introduce a MissingExternalDependency error instead, > used for this specific case, and catch that in run_tests.py, while > treating ImportError as a real error. OK. I added a MissingExternalDependencyError exception to Bio/__init__.py, and modified BioSQL, Bio.GFF, and some test scripts accordingly. When MissingExternalDependencyError occurs in a test, a warning is printed but it is not counted as a failure. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From mdehoon at c2b2.columbia.edu Thu Oct 11 10:44:56 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Thu, 11 Oct 2007 06:44:56 -0400 Subject: [Biopython-dev] function enumerate in Bio/GFF/GenericTools.py; Bio/DocSQL.py Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B637@mail2.exch.c2b2.columbia.edu> Do we still need the function "enumerate" in Bio/GFF/GenericTools.py and Bio/DocSQL.py? AFAICT, this function does exactly the same as the Python built-in enumerate function. --Michiel. 
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From biopython-dev at maubp.freeserve.co.uk Thu Oct 11 20:44:46 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Thu, 11 Oct 2007 21:44:46 +0100 Subject: [Biopython-dev] Revised tutorial Message-ID: <470E8B3E.6080709@maubp.freeserve.co.uk> In anticipation of the next release, I've done some more work on the tutorial today -- in particular the section on the Seq object which I have turned into a new chapter. If anyone has the time to go over this soon that would be great. I'll be away tomorrow (Friday) but will probably have time to make any revisions needed at the weekend.
Its here in CVS: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/Tutorial.tex?cvsroot=biopython This is a LaTeX file which gets turned into the PDF and HTML versions of the tutorial using pdflatex and hevea. If you want to proof read but don't know anything about LaTeX then I can probably email you the PDF version for comment (half a megabyte). Peter From sbassi at gmail.com Thu Oct 11 22:48:39 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Thu, 11 Oct 2007 19:48:39 -0300 Subject: [Biopython-dev] Revised tutorial In-Reply-To: <470E8B3E.6080709@maubp.freeserve.co.uk> References: <470E8B3E.6080709@maubp.freeserve.co.uk> Message-ID: Hello, I can't resolve all the dependencies to install hevea so I can't generate the dvi from the tex file. Could you please send me by email the final PDF? Best, SB. -- Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6 Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From mdehoon at c2b2.columbia.edu Fri Oct 12 01:53:19 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Thu, 11 Oct 2007 21:53:19 -0400 Subject: [Biopython-dev] Output of Biopython tests References: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu> <470B6981.3020707@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B636@mail2.exch.c2b2.columbia.edu> <470E3E7E.1000301@maubp.freeserve.co.uk> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B638@mail2.exch.c2b2.columbia.edu> Peter wrote: > Michiel De Hoon wrote: > > OK. I added a MissingExternalDependencyError exception to Bio/__init__.py, > > and modified BioSQL, Bio.GFF, and some test scripts accordingly. When > > MissingExternalDependencyError occurs in a test, a warning is printed but it > > is not counted as a failure. 
> > I might have defined the exception within the test framework rather than > Bio/__init__.py, but now that it's there we can start to use in things > like modules that wrap external tools. That is why I put it in Bio/__init__.py; Bio/GFF/__init__.py is already using this exception (outside of the testing framework). > I've updated Tests/requires_internet.py and Test/requires_wise.py to > match (I don't have wise on my machine which is why I noticed it still > threw an ImportError). Thanks! I missed those. > Is there anything I can do to help get things ready for the release of > Biopython 1.44? At some point, somebody will need to go through the documentation to check if everything documented there still works with the Biopython in CVS, and to remove sections in the documentation describing deprecated code. But it's probably better to wait until after we decide what to do with test_GenBankFormat. > If you do have time to give the patch on bug 2366 a check, I think it > would be worth including before the next release. > > http://bugzilla.open-bio.org/show_bug.cgi?id=2366 No time to check it. But I'd be happy to rely on your judgement and include it. --Michiel.
From bugzilla-daemon at portal.open-bio.org Fri Oct 12 02:32:05 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 11 Oct 2007 22:32:05 -0400 Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with egenix mxTextTools 3.0 In-Reply-To: Message-ID: <200710120232.l9C2W5e9022504@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2361 ------- Comment #35 from mdehoon at ims.u-tokyo.ac.jp 2007-10-11 22:32 EST ------- > test_GenBankFormat - this entire test is for the little-used Martel GenBank > expression, and this works with mxTextTools 2.0 but fails with mxTextTools 3.0 If it's little-used, should we include it for the next release or can it be removed? If we remove the test, should we then also remove the corresponding module? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython-dev at maubp.freeserve.co.uk Thu Oct 11 20:37:52 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Thu, 11 Oct 2007 21:37:52 +0100 Subject: [Biopython-dev] Output of Biopython tests In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B636@mail2.exch.c2b2.columbia.edu> References: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu> <470B6981.3020707@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B636@mail2.exch.c2b2.columbia.edu> Message-ID: <470E89A0.1010502@maubp.freeserve.co.uk> Michiel De Hoon wrote: >> Perhaps we should introduce a MissingExternalDependency error instead, >> used for this specific case, and catch that in run_tests.py, while >> treating ImportError as a real error. > > OK. I added a MissingExternalDependencyError exception to Bio/__init__.py, > and modified BioSQL, Bio.GFF, and some test scripts accordingly. 
When
> MissingExternalDependencyError occurs in a test, a warning is printed but it
> is not counted as a failure.

I might have defined the exception within the test framework rather than Bio/__init__.py, but now that it's there we can start to use it in things like modules that wrap external tools.

I've updated Tests/requires_internet.py and Tests/requires_wise.py to match (I don't have wise on my machine which is why I noticed it still threw an ImportError). This means run_tests.py now runs without errors using CVS on my 64 bit Linux machine (bar the mxTextTools 3.0 issue with test_GenBankFormat.py, bug 2361).

Is there anything I can do to help get things ready for the release of Biopython 1.44?

If you do have time to give the patch on bug 2366 a check, I think it would be worth including before the next release.

http://bugzilla.open-bio.org/show_bug.cgi?id=2366

Peter

From fennan at gmail.com Mon Oct 15 09:48:45 2007
From: fennan at gmail.com (Fernando)
Date: Mon, 15 Oct 2007 11:48:45 +0200
Subject: [Biopython-dev] Database into variables
Message-ID: <7b13e61d0710150248v72a550d6h38e1467edf5073eb@mail.gmail.com>

Hi everybody,

I am thinking of including some algorithms that I work with into biopython. My first concern is that I'm using a local image of the Gene Ontology database to perform several operations. In order to avoid such database accesses I could precompute the information I need and load it once the module is called. How should I do it? Is there a guideline style to load external variables or something like that? Any other ideas/suggestions?

Thanks

From fennan at gmail.com Mon Oct 15 10:28:56 2007
From: fennan at gmail.com (Fernando)
Date: Mon, 15 Oct 2007 12:28:56 +0200
Subject: [Biopython-dev] Precompute database information
Message-ID: <7b13e61d0710150328l354bfb5eu1b76ed05024a65c4@mail.gmail.com>

Hi everybody,

I am thinking of including some algorithms that I work with into biopython.
My first concern is that I'm using a local image of the Gene Ontology database to perform several operations. In order to avoid such database accesses I could precompute the information I need and load it once the module is called. How should I do it? Is there a guideline style to load external variables or something like that? Any other ideas/suggestions?

Thanks

From bugzilla-daemon at portal.open-bio.org Mon Oct 15 11:11:26 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 15 Oct 2007 07:11:26 -0400
Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in (Reverse)complement functions in Bio.Seq
In-Reply-To:
Message-ID: <200710151111.l9FBBQOE012625@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2366

tiagoantao at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tiagoantao at gmail.com

------- Comment #3 from tiagoantao at gmail.com 2007-10-15 07:11 EST -------
I had a look at the test code and tried to find which test case is changing the ambiguous_dna dict.

I used this little script (putting it here as it might be useful for detecting these types of problems):

for i in test_*py; do python run_tests.py $i; done

It turns out that it is test_Nexus.py. Further inspection of the code seems to reveal that it is not the test case that pollutes the dictionary but the Nexus module itself. Maybe it makes sense to raise a bug on the Nexus module...

Any comments on these findings?

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
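The kind of pollution described in comment #3 can be reproduced and detected with a small snapshot comparison. The sketch below is an assumption-laden illustration: the dictionary is a trimmed stand-in for a shared module-level table, the values given to "?" and "-" are made up, and none of the function names exist in Bio.Nexus.

```python
# Trimmed, hypothetical stand-in for a shared module-level table such as
# Bio.Data.IUPACData.ambiguous_dna_values.
ambiguous_dna_values = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}


def polluting_setup():
    """Adds symbols to the shared dict in place (the kind of problem reported)."""
    ambiguous_dna_values["?"] = "ACGT"  # made-up value
    ambiguous_dna_values["-"] = ""      # made-up value
    return ambiguous_dna_values


def copying_setup():
    """Extends a private copy and leaves the shared dict alone."""
    values = dict(ambiguous_dna_values)
    values["?"] = "ACGT"
    values["-"] = ""
    return values


def changed_keys(func):
    """Return the keys that calling func() added to or changed in the shared dict."""
    before = dict(ambiguous_dna_values)
    func()
    after = ambiguous_dna_values
    return sorted(k for k in after if k not in before or before[k] != after[k])


clean = changed_keys(copying_setup)
dirty = changed_keys(polluting_setup)
print(clean)  # []
print(dirty)  # ['-', '?']
```

Running such a check around each test module would point to the offender without having to run the tests one at a time.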
From bugzilla-daemon at portal.open-bio.org Mon Oct 15 14:16:00 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 15 Oct 2007 10:16:00 -0400 Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in (Reverse)complement functions in Bio.Seq In-Reply-To: Message-ID: <200710151416.l9FEG01A023797@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2366 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-15 10:16 EST ------- Thanks for that Tiago, I guess we should file a bug on Bio.Nexus on the alphabet issue; It may be that it should create a copy or subclass of the ambiguous DNA alphabet in order to include "?" (I imagine that Nexus uses this rather than "N"), and see if it is using the Gapped() alphabet system or not. Did you have any comments on this patch for (reverse) complements? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From jflatow at northwestern.edu Tue Oct 16 00:08:13 2007 From: jflatow at northwestern.edu (Jared Flatow) Date: Mon, 15 Oct 2007 19:08:13 -0500 Subject: [Biopython-dev] Biopython status Message-ID: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> Hi all, I've just started using Biopython and I am wondering about the status of the group, since I've heard rumors that its dying. So far I have found the library very useful, if not at times frustrating, though I will admit I am fairly new to developing python as well. I have been hesitant to make changes to existing code, however I have found that in a few cases it has been by far the best way to accomplish what I need, and have only done so in cases where it seems to be the *right* thing to do. With that in mind, I have a few questions I was hoping you all could answer. 
First, how might I put these changes up for review in order to contribute back to the code base? The main changes have been to the AlignAce parser, since as it was it just ignored information contained in the alignace file regarding the motif instances (namely which input sequence they came from, where they started in the sequence, and what strand they were on). I have also needed to create a modified FASTA parser so that I can read things like quality score files. I would be happy to submit the changes to the group or an individual for inspection, but I would like to avoid having to maintain my own separate version of Biopython if possible. I am also wondering how it would be received if I did something like add a to_fasta method to SeqRecord instead of having to go through writing it to a file using a SeqIO when all I want is the string. Finally, are there plans to move to a subversion repository at any point? Thanks! Jared Flatow From sbassi at gmail.com Tue Oct 16 05:09:16 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Tue, 16 Oct 2007 02:09:16 -0300 Subject: [Biopython-dev] Biopython status In-Reply-To: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> Message-ID: On 10/15/07, Jared Flatow wrote: > I've just started using Biopython and I am wondering about the status > of the group, since I've heard rumors that its dying. So far I have You could subscribe to the rss feed of the CVS and you will see a lot of activity. 
The developers list and the bug tracking program (bugzilla) are also pretty busy; that doesn't look like a dying group to me :)

--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From mdehoon at c2b2.columbia.edu Tue Oct 16 05:37:14 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 16 Oct 2007 01:37:14 -0400
Subject: [Biopython-dev] Biopython status
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B639@mail2.exch.c2b2.columbia.edu>

Hi Jared,

> I've just started using Biopython and I am wondering about the status
> of the group, since I've heard rumors that its dying.

From looking at the activity on the Biopython mailing lists in recent months, it doesn't seem to be dying :-).

> So far I have found the library very useful, if not at times frustrating,
> though I will admit I am fairly new to developing python as well.

One thing to keep in mind is that Biopython started about eight years ago, and some approaches that seemed to be a good idea at that time may not seem to be so now. Nevertheless, I feel that Biopython is moving in the right direction in terms of ease-of-use.

> First, how might I put these changes up for review in order
> to contribute back to the code base? The main changes have been to
> the AlignAce parser, since as it was it just ignored information
> contained in the alignace file regarding the motif instances (namely
> which input sequence they came from, where they started in the
> sequence, and what strand they were on).

In this case, it is a good idea to contact the current maintainer of Bio.AlignAce, either via the mailing list or directly. From the Biopython CVS, it seems that Bartek is currently the main maintainer of Bio.AlignAce, so it would be a good idea to discuss it with him.
> I have also needed to create a modified FASTA parser so that I > can read things like quality score files. At some point, Biopython had several (two or three?) Fasta parsers, two Fasta formats, etc. This is a situation we should definitely avoid. So if your modifications fit in well with the existing Fasta parser in Bio.SeqIO, it may very well be accepted into Biopython. Otherwise, it's better to leave it out. This is just my opinion though. > I am also wondering how it would be received if I did something like > add a to_fasta method to SeqRecord instead of having to go through > writing it to a file using a SeqIO when all I want is the string. This sounds like feature creep to me, so I would be against it. It's easy to add code to Biopython, it's much harder to remove stuff. Code bloat is a real problem in Biopython. > Finally, are there plans to move to a subversion repository at any > point? There were some plans at some point, but I don't know the current status. Best, --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: biopython-dev-bounces at lists.open-bio.org on behalf of Jared Flatow Sent: Mon 10/15/2007 8:08 PM To: biopython-dev at lists.open-bio.org Subject: [Biopython-dev] Biopython status Hi all, I've just started using Biopython and I am wondering about the status of the group, since I've heard rumors that its dying. So far I have found the library very useful, if not at times frustrating, though I will admit I am fairly new to developing python as well. I have been hesitant to make changes to existing code, however I have found that in a few cases it has been by far the best way to accomplish what I need, and have only done so in cases where it seems to be the *right* thing to do. With that in mind, I have a few questions I was hoping you all could answer. 
First, how might I put these changes up for review in order to contribute back to the code base? The main changes have been to the AlignAce parser, since as it was it just ignored information contained in the alignace file regarding the motif instances (namely which input sequence they came from, where they started in the sequence, and what strand they were on). I have also needed to create a modified FASTA parser so that I can read things like quality score files. I would be happy to submit the changes to the group or an individual for inspection, but I would like to avoid having to maintain my own separate version of Biopython if possible. I am also wondering how it would be received if I did something like add a to_fasta method to SeqRecord instead of having to go through writing it to a file using a SeqIO when all I want is the string. Finally, are there plans to move to a subversion repository at any point? Thanks! Jared Flatow _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 08:16:01 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 16 Oct 2007 09:16:01 +0100 Subject: [Biopython-dev] Biopython status In-Reply-To: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> Message-ID: <47147341.4020708@maubp.freeserve.co.uk> Jared Flatow wrote: > I have also needed to create a modified FASTA parser so that I can > read things like quality score files. Could you be a little more specific - what exactly do you mean by a quality score files (links and/or examples). 
It may be that this warrants setting up a new file format in Bio.SeqIO > I would be happy to submit the changes to the group or an individual > for inspection, but I would like to avoid having to maintain my own > separate version of Biopython if possible. As has already been said - please file some (enhancement) bugs and attach your patches, or raise specific issues for discussion on this mailing list. Depending on the nature of your changes, you might be able to achieve some of them by subclassing Biopython's objects - rather than literally maintaining your own branch of the project. > I am also wondering how it would be received if I did something like > add a to_fasta method to SeqRecord instead of having to go through > writing it to a file using a SeqIO when all I want is the string. Out of interest, why do you want to create a FASTA record as a string? Did you know you can write to a string using any Bio.SeqIO supported file format using StringIO? Perhaps we should spell this out more explicitly in the documentation, but a motivating example would help. I would suggest rather than adding a to_fasta method to the SeqRecord, simply write your own "seqrecord_to_string" function (or create a subclass of SeqRecord with this method). > Finally, are there plans to move to a subversion repository at any > point? It was raised a while ago, and our cunning plan was to let BioPerl try the move first. Once that has been proven, it should be fairly easy for the OBF guys to also move us over. I should email them to see how things stand... 
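The StringIO suggestion made earlier in this message can be sketched as follows. This is a minimal illustration, assuming a writer that accepts any file-like handle: write_fasta() is a hypothetical stand-in written for this example, not Bio.SeqIO, and on current Python the class lives in the io module rather than the old StringIO module.

```python
from io import StringIO


def write_fasta(records, handle):
    """Write (identifier, sequence) pairs to any file-like handle.

    Hypothetical stand-in for a handle-based writer; not Biopython's code.
    """
    count = 0
    for identifier, sequence in records:
        handle.write(">%s\n" % identifier)
        # Wrap the sequence at 60 characters per line.
        for start in range(0, len(sequence), 60):
            handle.write(sequence[start:start + 60] + "\n")
        count += 1
    return count


# An in-memory handle collects the output instead of a file on disk.
handle = StringIO()
write_fasta([("example", "NNNTCAAAAAGGTGCATCTAGATG")], handle)
fasta_text = handle.getvalue()
print(fasta_text)
```

Because the writer only needs an object with a write() method, the same function serves both real files and in-memory strings.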
Peter From bartek at rezolwenta.eu.org Tue Oct 16 09:11:01 2007 From: bartek at rezolwenta.eu.org (bartek wilczynski) Date: Tue, 16 Oct 2007 11:11:01 +0200 Subject: [Biopython-dev] Biopython status In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B639@mail2.exch.c2b2.columbia.edu> References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <6243BAA9F5E0D24DA41B27997D1FD14402B639@mail2.exch.c2b2.columbia.edu> Message-ID: <1192525861.4714802535dae@imp.rezolwenta.eu.org> Michiel De Hoon wrote: > > First, how might I put these changes up for review in order > > to contribute back to the code base? The main changes have been to > > the AlignAce parser, since as it was it just ignored information > > contained in the alignace file regarding the motif instances (namely > > which input sequence they came from, where they started in the > > sequence, and what strand they were on). > > In this case, it is a good idea to contact the current maintainer of > Bio.AlignAce, either via the mailing list or directly. From the Biopython > CVS, it seems that Bartek is currently the main maintainer of Bio.AlignAce, > so it would be a good idea to discuss with him. I'm not dying either ;). I'm the author of the Bio.AlignAce module and if you have any new code to contribute to it, I'll be glad to help you. The best way to do it would be to submit an enhancement bug report in bugzilla. If the changes are smaller, you can just send them (as a diff) to the list and I'll try to fit them to the current cvs version of Bio.AlignAce Bartek Wilczynski From bugzilla-daemon at portal.open-bio.org Tue Oct 16 09:55:37 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Oct 2007 05:55:37 -0400 Subject: [Biopython-dev] [Bug 2380] New: Bio.Nexus is adding "?" and "-" to Bio.Data.IUPACData.ambiguous_dna_values Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2380 Summary: Bio.Nexus is adding "?" 
and "-" to Bio.Data.IUPACData.ambiguous_dna_values Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: minor Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk This issue was raised in Bug 2366 where a unit test was found to be "polluting" ambiguous_dna_values, later identified as Bio.Nexus via test_Nexus.py Need to see if Bio.Nexus should be making a copy of this dict, or perhaps defining a subclass of the alphabet (using the Gapped() class maybe). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 16 09:56:37 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Oct 2007 05:56:37 -0400 Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in (Reverse)complement functions in Bio.Seq In-Reply-To: Message-ID: <200710160956.l9G9ub18007735@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2366 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 05:56 EST ------- Fix committed (after Michiel's OK on the mailing list), marking as fixed. 
Checking in Tests/test_seq.py;
/home/repository/biopython/biopython/Tests/test_seq.py,v <-- test_seq.py
new revision: 1.6; previous revision: 1.5
done
Checking in Tests/output/test_seq;
/home/repository/biopython/biopython/Tests/output/test_seq,v <-- test_seq
new revision: 1.6; previous revision: 1.5
done
Checking in Bio/Seq.py;
/home/repository/biopython/biopython/Bio/Seq.py,v <-- Seq.py
new revision: 1.17; previous revision: 1.16
done

I've filed Bug 2380 for the Nexus issue: Bio.Nexus is adding "?" and "-" to Bio.Data.IUPACData.ambiguous_dna_values

From bugzilla-daemon at portal.open-bio.org Tue Oct 16 10:11:09 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 06:11:09 -0400
Subject: [Biopython-dev] [Bug 2381] New: translate and transcibe method for the the Seq object (in Bio.Seq)
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2381

Summary: translate and transcibe method for the the Seq object (in Bio.Seq)
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

Biopython has translation and transcription modules (Bio/Translate.py and Bio/Transcribe.py) but I find them a little bit complicated to use. There are module level functions translate, transcribe, and back_transcribe in Bio/Seq.py which take either a string, a Seq object or a MutableSeq object. I would like to add similar methods to the Seq object (also defined in Bio/Seq.py) to make this functionality more accessible from a Seq object.

NOTE: Python strings have a translate method of their own which is rather different.
Having the Seq translate method doing a biological translation makes sense.

From bugzilla-daemon at portal.open-bio.org Tue Oct 16 10:13:35 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 06:13:35 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200710161013.l9GADZtJ008751@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|translate and transcibe     |translate and transcibe
                   |method for the the Seq      |methods for the Seq object
                   |object (in Bio.Seq)         |(in Bio.Seq)

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 06:13 EST -------
fixed typo in the bug summary

From bugzilla-daemon at portal.open-bio.org Tue Oct 16 10:26:44 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 06:26:44 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200710161026.l9GAQixw009268@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381

------- Comment #2 from dalloliogm at gmail.com 2007-10-16 06:26 EST -------
I find it difficult to translate a sequence in the 6 reading frames with a single command. At the moment I use something like this (shown here for the three forward frames):

    for i in range(3):
        translate(seq[i:])

which is not very nice.
It would be nice to add a parameter to the translate function like in the emboss application transeq (http://emboss.sourceforge.net/apps/cvs/emboss/apps/transeq.html), something like this:

    >>> a = Seq('CAGCTAGCT')
    >>> a.translate()
    [(translation of a in the frame 0)]
    >>> a.translate(1)
    [(translation of a in the frame 1)]
    >>> a.translate('F')
    [(translation of a in the 3 forward frames)]

From bugzilla-daemon at portal.open-bio.org Tue Oct 16 10:46:47 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 06:46:47 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200710161046.l9GAklI6010391@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 06:46 EST -------
Doing a three/six frame translation is however fairly common, and perhaps warrants an "official" implementation in Bio.SeqUtils.

My current inclination is to try to keep the Bio.Seq translation function as simple as possible. There are lots of possible options to worry about... catering to them all could make the translate method rather daunting. Perhaps things like the frame (or even the starting nucleotide) could be done in Bio.Translate only.

Another "special case" example I personally would like is an option to check that the first codon is a valid start codon for the specified codon table, and to translate it as methionine (M).

Then there is the question of whether Bio.Translate's "translate_to_stop" functionality should be exposed in a Seq method.

Note there is yet another (!)
translation function Bio.SeqUtils.translate() which is frame aware [personally I would mark a lot of this module as deprecated]. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From jflatow at northwestern.edu Tue Oct 16 16:02:19 2007 From: jflatow at northwestern.edu (Jared Flatow) Date: Tue, 16 Oct 2007 11:02:19 -0500 Subject: [Biopython-dev] Biopython status In-Reply-To: <47147341.4020708@maubp.freeserve.co.uk> References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> Message-ID: <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> Please forgive me for ever doubting your health, it seems the group is very much alive! On Oct 16, 2007, at 3:16 AM, Peter wrote: > Jared Flatow wrote: >> I have also needed to create a modified FASTA parser so that I can >> read things like quality score files. > > Could you be a little more specific - what exactly do you mean by a > quality score files (links and/or examples). It may be that this > warrants setting up a new file format in Bio.SeqIO That is what I did. The quality score files I meant are simply FASTA- like records that indicate the quality of each base pair read from a sequencing machine, on a scale of something like 1 to 64. The values are tab separated and correspond to 'reads' in another FASTA file that contain the actual sequences read. This is the way the 454 GSFlex machines output their sequencing reads, so for every set of reads there will be a pair of 454Reads.fna, 454Reads.qual files. The only difference between a parser that processes these qual files and one that processes the sequence files is that it shouldn't get rid of spaces, and the newlines should not to be stripped but converted into spaces (when 454 writes a newline of scores they omit the space). 
Essentially I have made a duplicate of FastaIO's iterator, named it something else, made these two small changes and put an entry for it in the SeqIO file.

16,17c16,17
< def GSQualIterator(handle, alphabet = single_letter_alphabet, title2ids = None) :
< """Generator function to iterate over GSFlex quality records (as SeqRecord objects).
---
> def FastaIterator(handle, alphabet = single_letter_alphabet, title2ids = None) :
> """Generator function to iterate over Fasta records (as SeqRecord objects).
54c54
< lines.append(line.rstrip()) # .replace(" ","")) leave off the replacing internal spaces so we can process qscore files (jf)
---
> lines.append(line.rstrip().replace(" ",""))
58c58
< yield SeqRecord(Seq(" ".join(lines), alphabet),
---
> yield SeqRecord(Seq("".join(lines), alphabet),
63a64,199

As you can see a parser like this might be useful for other FASTA-like formats as well and is in no way specific to the GS quality files (it's just a space preserving parser). If it were to be implemented in Biopython you might call it something else.

>
>> I would be happy to submit the changes to the group or an individual
>> for inspection, but I would like to avoid having to maintain my own
>> separate version of Biopython if possible.
>
> As has already been said - please file some (enhancement) bugs and
> attach your patches, or raise specific issues for discussion on this
> mailing list.
>
> Depending on the nature of your changes, you might be able to achieve
> some of them by subclassing Biopython's objects - rather than
> literally
> maintaining your own branch of the project.
>
>> I am also wondering how it would be received if I did something like
>> add a to_fasta method to SeqRecord instead of having to go
>> through writing it to a file using a SeqIO when all I want is the
>> string.
>
> Out of interest, why do you want to create a FASTA record as a string?

I am serving the fasta from a database of sequences dynamically via a web server.
> > Did you know you can write to a string using any Bio.SeqIO supported > file format using StringIO? Perhaps we should spell this out more > explicitly in the documentation, but a motivating example would help. This is what I do now, but it seems like a hack to me to go this route. To always have to write to a file feels strange, but I see that it would be messy to go OO since there are so many formats. However, giving preference to fasta over other formats by making it innate doesn't seem like such a terrible idea. I do have mixed feelings about 'bloating' the code which is why I asked, and you have convinced me that this is not quite appropriate given existing convention. However the idea would be to put the to_fasta or to_format method inside the SeqRecord, then to call it from the IO when needed to actually write to a file, but call it directly when all that is wanted is a string... > > I would suggest rather than adding a to_fasta method to the > SeqRecord, simply write your own "seqrecord_to_string" function (or > create a subclass of SeqRecord with this method). > I'll leave it alone for now until I can come up with a real proposal =) >> Finally, are there plans to move to a subversion repository at any >> point? > > It was raised a while ago, and our cunning plan was to let BioPerl try > the move first. Once that has been proven, it should be fairly > easy for > the OBF guys to also move us over. I should email them to see how > things stand... BioPerl seems to be the guinea pigs for everything. Leading the way on this might put a stop to those nasty rumors about Biopython. 
Best Regards, Jared From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 16:47:48 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 16 Oct 2007 17:47:48 +0100 Subject: [Biopython-dev] CVS to SVN In-Reply-To: <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> Message-ID: <4714EB34.8000207@maubp.freeserve.co.uk> Jared wrote: > Leading the way on this ... [CVS to SVN] I would say one reason why we aren't charging ahead with a move from CVS to subversion is only a few posters on this mailing list actively WANT to move to subversion, and no-one has really championed the move (yet). I'm sure if we as a group wanted to this, then the OBF would be happy to assist. After all, moving us rather than BioPerl as the first CVS/SVN migration should be easier as we have a smaller code base. Peter From jflatow at northwestern.edu Tue Oct 16 18:46:53 2007 From: jflatow at northwestern.edu (Jared Flatow) Date: Tue, 16 Oct 2007 13:46:53 -0500 Subject: [Biopython-dev] 454 GSFlex quality score files In-Reply-To: <4714EBC7.1040504@maubp.freeserve.co.uk> References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EBC7.1040504@maubp.freeserve.co.uk> Message-ID: <48D92CF4-04B5-42F9-92D2-3A2D9D2FE7E2@northwestern.edu> Hi Peter, >>>> I have also needed to create a modified FASTA parser so that I >>>> can read things like quality score files. >>> >>> Could you be a little more specific - what exactly do you mean by a >>> quality score files (links and/or examples). It may be that this >>> warrants setting up a new file format in Bio.SeqIO >> That is what I did. 
The quality score files I meant are simply >> FASTA- like records that indicate the quality of each base pair >> read from a sequencing machine, on a scale of something like 1 to >> 64. The values are tab separated and correspond to 'reads' in >> another FASTA file that contain the actual sequences read. This >> is the way the 454 GSFlex machines output their sequencing reads, >> so for every set of reads there will be a pair of 454Reads.fna, >> 454Reads.qual files. The only difference between a parser that >> processes these qual files and one that processes the sequence >> files is that it shouldn't get rid of spaces, and the newlines >> should not to be stripped but converted into spaces (when 454 >> writes a newline of scores they omit the space). Essentially I >> have made a duplicate of FastaIOs iterator, named it something >> else, made these two small changes and put an entry for it in the >> SeqIO file. > > Patches and emails don't do well together. Could you file an > enhancement bug, and then upload your code as an attachment? If > you have a few examples of matched pairs of FASTA files and quality > files which you can contribute that would be very helpful too. > Yes I'll get on that. > It looks like you are trying to construct a "sequence" of numerical > values (rather than a sequence of letters like nucleotides/amino > acids). As written I don't think it would work for element access/ > splicing etc. However, with some extra work I suppose we could > stretch the Seq object in this way - and define a new > "IntegerAlphabet". > > But on balance, I don't think "lists of quality values" should be > treated in the same way as sequences (and thus it doesn't seem to > belong in Bio.SeqIO). > I agree. > Alternatively you could regard the quality scores as sequence meta- > data or annotation. 
One idea would be to generate SeqRecord > objects containing dummy sequences of the correct length made up of > the ambiguous character "N", with the associated quality scores > held as a list of integers in the SeqRecord's annotation > dictionary. Then it would fit into the Bio.SeqIO framework [I was > thinking of something similar for PTT files, NCBI Protein tables, > where again we have annotation but not the actual sequence]. I agree, and this way is most flexible. > > Maybe there should just be a separate parser for GSFlex quality > records which returns iterator giving each record name with a list > of integers. A more elegant scheme would read in the pair of files > together (the FASTA file and the quality file) and generate nicely > annotated SeqRecords with the sequence and the quality. This isn't > really possible with the Bio.SeqIO framework. > Yes, at first I liked this idea best, but it puts some constraints on the way these things are read in. Like if it is to be an iterator, you must have a guarantee that these files contain exactly the same sequences in exactly the same order. This seems like it could potentially be fine for the GSFlex files, but I wonder if there might somewhere down the line be use for quality information about sequences in other cases. If I am not mistaken, some sources use upper/lower case letters now to indicate a bistable degree of confidence in a sequence letter. In any event, this seems like an unnecessary restriction. The way I do it now is I load the reads into a database, then update the database when I read in a quality score file. I think Biopython should have a simple way of implementing something similar which can solve both our metadata problems. In Bio.Fasta there are Parsers which really belong in Bio.SeqIO.FastaIO, if anywhere. How about Bio.Fasta becomes the more general Fasta reader, nothing to do with sequences. 
It can iterate over a FASTA file using the '>' as the record separator, creating Record objects, much like it does now, except without processing them at all or assuming they are sequences.

    >Record.header
    Record.data

Now Bio.SeqIO.FastaIO can use Bio.Fasta to iterate over the Record objects in a file and transform them into SeqRecord objects. If you like, you can provide it with a function header_todict, which takes a string (in this case Record.header) and returns a dictionary, which gets unpacked and passed to the SeqRecord initializer. Basically the Bio.SeqIO.FastaIO returns a generator that looks something like this:

    (SeqRecord(seq=cleanup(record.data), **header_todict(record.header))
     for record in Bio.Fasta.parse(file))

I can also use the Bio.Fasta.parse function now to parse my quality files and add them as metadata:

    # I create an initial SeqRecord dictionary using the Bio.SeqIO.FastaIO parser
    seq_dict = SeqIO.to_dict(SeqIO.FastaIO.parse(seq_file, my_header_todict))

    # Then I iterate over the sequences in the qual file and look them up in the
    # seq_dict using the same header parsing function I passed to create my
    # initial SeqRecords, setting the quality scores as I find them
    for record in Bio.Fasta.parse(qual_file):
        seq_dict[my_header_todict(record.header)['id']].quality = my_qualitycleanup(record.data)

I hope that makes sense. The advantage to doing it this way is that I can reuse my header parsing function for both the sequence and the metadata, and I can do whatever I want with the fasta record data without writing a whole new parser. The SeqIO fasta parsing functions just make some default assumptions (like the data is a sequence).

Let me know what you think.
Jared From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 16:50:15 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 16 Oct 2007 17:50:15 +0100 Subject: [Biopython-dev] 454 GSFlex quality score files In-Reply-To: <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> Message-ID: <4714EBC7.1040504@maubp.freeserve.co.uk> Hi Jared, >>> I have also needed to create a modified FASTA parser so that I can >>> read things like quality score files. >> >> Could you be a little more specific - what exactly do you mean by a >> quality score files (links and/or examples). It may be that this >> warrants setting up a new file format in Bio.SeqIO > > That is what I did. The quality score files I meant are simply FASTA- > like records that indicate the quality of each base pair read from a > sequencing machine, on a scale of something like 1 to 64. The values > are tab separated and correspond to 'reads' in another FASTA file > that contain the actual sequences read. This is the way the 454 > GSFlex machines output their sequencing reads, so for every set of > reads there will be a pair of 454Reads.fna, 454Reads.qual files. The > only difference between a parser that processes these qual files and > one that processes the sequence files is that it shouldn't get rid of > spaces, and the newlines should not to be stripped but converted into > spaces (when 454 writes a newline of scores they omit the space). > Essentially I have made a duplicate of FastaIOs iterator, named it > something else, made these two small changes and put an entry for it > in the SeqIO file. Patches and emails don't do well together. Could you file an enhancement bug, and then upload your code as an attachment? 
If you have a few examples of matched pairs of FASTA files and quality files which you can contribute that would be very helpful too. It looks like you are trying to construct a "sequence" of numerical values (rather than a sequence of letters like nucleotides/amino acids). As written I don't think it would work for element access/splicing etc. However, with some extra work I suppose we could stretch the Seq object in this way - and define a new "IntegerAlphabet". But on balance, I don't think "lists of quality values" should be treated in the same way as sequences (and thus it doesn't seem to belong in Bio.SeqIO). Alternatively you could regard the quality scores as sequence meta-data or annotation. One idea would be to generate SeqRecord objects containing dummy sequences of the correct length made up of the ambiguous character "N", with the associated quality scores held as a list of integers in the SeqRecord's annotation dictionary. Then it would fit into the Bio.SeqIO framework [I was thinking of something similar for PTT files, NCBI Protein tables, where again we have annotation but not the actual sequence]. Maybe there should just be a separate parser for GSFlex quality records which returns iterator giving each record name with a list of integers. A more elegant scheme would read in the pair of files together (the FASTA file and the quality file) and generate nicely annotated SeqRecords with the sequence and the quality. This isn't really possible with the Bio.SeqIO framework. 
Peter From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 19:33:54 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 16 Oct 2007 20:33:54 +0100 Subject: [Biopython-dev] 454 GSFlex quality score files In-Reply-To: <48D92CF4-04B5-42F9-92D2-3A2D9D2FE7E2@northwestern.edu> References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EBC7.1040504@maubp.freeserve.co.uk> <48D92CF4-04B5-42F9-92D2-3A2D9D2FE7E2@northwestern.edu> Message-ID: <47151222.1060502@maubp.freeserve.co.uk> > In Bio.Fasta there are Parsers which really belong in > Bio.SeqIO.FastaIO, if anywhere. How about Bio.Fasta becomes the more > general Fasta reader, nothing to do with sequences. ... In actual fact, the Bio.Fasta module predates Bio.SeqIO, and I was thinking in a few releases time of suggesting its deprecation (but not just yet as for several years it was the best documented and most used parser in Biopython). If we do decided keep Bio.Fasta (or extend it), then perhaps Bio.SeqIO.FastaIO should become just a wrapper for Bio.Fasta I'm still digressing your ideas to turn Bio.Fasta into a generic parser that copes with sequences, qualities scores, or anything else. Peter From bugzilla-daemon at portal.open-bio.org Tue Oct 16 19:57:35 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Oct 2007 15:57:35 -0400 Subject: [Biopython-dev] [Bug 2382] New: Generic FASTA parser Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2382 Summary: Generic FASTA parser Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: jflatow at northwestern.edu I would like to be able read in and iterate over records in generic fasta files of the format: >header data >header data ... 
This iterator should return Bio.Fasta.Record objects with the corresponding header and data fields. I suggest putting this inside the existing Bio.Fasta module and updating Bio.SeqIO.FastaIO to use this iterator and transform the records returned into Bio.SeqRecord objects. This should make it easier to add metadata to SeqRecord objects parsed in from FASTA.

Consider the following example for illustration. I have data from a genome sequencing machine that outputs pairs of files. One contains the sequence reads, the other contains estimates of the quality of each base call in the sequence. The sequence file might look something like this (only with hundreds of thousands more entries):

>ERSGEES02IKV6B length=97 xy=3401_1361 region=2 run=R_runname
CAATATAATTTCTCTTAAAATTATTCCCATGGCCAGGTGTGGTGGCTCACACCTGTAGTC
CCGGCACTTTGGGAGGCCAAGGCACACAGGGGATAGG
>ERSGEES02GGZDB length=142 xy=2536_2685 region=2 run= R_runname
GGTCTCCAGTGCCCTGTCTCCCCATATTTCTGACACACCTTCTCACAGCCTGGCCCATCT
TGCTGGGTCCCTCTTCTCCTCCCTTCCTGCTCCATTTGTCAACACTGCTGGGACATTAGA
ATTCAGATCTCCCGGGTCACCG
>ERSGEES02JQUCP length=113 xy=3879_0663 region=2 run= R_runname
AAAGTGACTAAAGAATCAATTTACATTAATATTCTATGTGAACAGGCAAAATACTTACAA
AGAAGTAGAGAAAATATGAATTCAGTACAGAATTCAGATCTCCCGGGTCACCG

The corresponding quality score file might look something like this:

>ERSGEES02IKV6B length=97 xy=3401_1361 region=2 run= R_runname
27 28 21 27 27 27 28 22 28 25 3 27 27 27 28 21 33 31 20 6 28 21 26 26 18 28 25 2 26 25 29 23 31 24 27 29 22 27 27 27 29 23 27 31 25 27 27 27 27 27 27 32 26 27 27 27 27 26 27 33 30 12 32 26 27 27 27 33 30 12 33 30 12 26 31 25 33 27 32 28 33 28 27 27 27 27 27 26 33 32 20 7 27 27 27 32 26
>ERSGEES02GGZDB length=142 xy=2536_2685 region=2 run= R_runname
28 9 26 24 27 27 20 26 18 25 27 32 29 10 26 26 27 18 25 32 30 17 1 25 27 22 32 30 12 27 27 22 26 25 27 23 25 28 21 32 27 27 27 25 26 27 26 25 27 20 26 26 19 28 25 3 25 27 22 27 19 24 24 24 32 29 11 24 34 31 17 23 23 30 23 27 25 30 23 27 33 31 17 27 20 28 21 27 25 26 26 30 24 27 33 31 13 26 27 27 31 25 27 25 23 26 16 26 27 30 27 7 27 27 27 32 27 26 26 32 27 30 26 27 27 27 27 27 27 27 30 27 6 34 31 17 27 21 27 32 28 18
>ERSGEES02JQUCP length=113 xy=3879_0663 region=2 run= R_runname
29 26 5 25 27 24 27 27 27 30 27 7 26 27 19 25 26 31 26 34 32 16 20 27 26 32 27 32 28 27 25 26 18 27 25 27 26 26 24 27 31 25 27 27 31 26 26 34 32 23 11 26 22 27 32 26 27 26 32 30 11 26 31 24 27 27 25 23 27 27 33 30 19 4 17 26 25 26 31 27 30 26 27 26 22 26 18 24 27 26 32 26 32 28 27 27 25 27 25 24 25 31 28 10 34 31 15 27 21 27 28 21 27

I would like to be able to do the following:

    # create a function to parse the header line and return a dictionary
    def parse_gsflex_header(gs_header):
        parts = gs_record.description.split(' ')
        assert len(parts) == 5
        xy = parts[2].split('=')[1].split('_')
        return {'letters': gs_record.seq.tostring(),
                'name': parts[0],
                'length': parts[1].split('=')[1],
                'xpos': xy[0],
                'ypos': xy[1],
                'region': parts[3].split('=')[1],
                'run': parts[4].split('=')[1]}

    # Bio.SeqIO.FastaIO wraps the Bio.Fasta parser, might look something like this
    class Fasta(): # or however it's organized
        def data_toseq(data):
            # do some parsing of the data
            return Seq(...)
        def parse(file, header_todict):
            return (SeqRecord(seq=data_toseq(record.data), **header_todict(record.header))
                    for record in Bio.Fasta.parse(file))

    # I create an initial SeqRecord dictionary using the Bio.SeqIO.FastaIO parser
    seq_dict = SeqIO.to_dict(SeqIO.FastaIO.parse(seq_file, parse_gsflex_header))

    # Then I iterate over the sequences in the qual file and look them up in the seq_dict
    # setting the quality scores as I find them
    for record in Bio.Fasta.parse(qual_file):
        seq_dict[my_header_todict(record.header)['id']].quality = my_qualitycleanup(record.data)

This would work well for parsing all kinds of FASTA-like files and provides a simple mechanism for dealing with them record by record.
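
The proposal above can be tried out without touching Biopython at all. What follows is only a minimal standard-library sketch of the idea: `Record`, `parse_fasta`, `header_to_dict`, `clean_sequence` and `clean_quality` are hypothetical names chosen for illustration, not existing Bio.Fasta or Bio.SeqIO functions, and plain dictionaries stand in for SeqRecord objects. The two in-memory files are shortened, made-up stand-ins for a matched .fna/.qual pair.

```python
import io
from collections import namedtuple

# Hypothetical stand-in for the proposed Bio.Fasta.Record: raw header, raw data.
Record = namedtuple("Record", ["header", "data"])


def parse_fasta(handle):
    """Yield one Record per '>' entry, leaving the data lines unprocessed."""
    header, lines = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield Record(header, "\n".join(lines))
            header, lines = line[1:], []
        elif header is not None:
            lines.append(line)
    if header is not None:
        yield Record(header, "\n".join(lines))


def header_to_dict(header):
    """Split a header into the record id (first word) and the full description."""
    return {"id": header.split(None, 1)[0], "description": header}


def clean_sequence(data):
    """Sequence data: drop all whitespace, including the line breaks."""
    return "".join(data.split())


def clean_quality(data):
    """Quality data: split on any whitespace, so a line break separates scores too."""
    return [int(score) for score in data.split()]


seq_file = io.StringIO(">read1 length=8\nCAAT\nATAA\n>read2 length=4\nGGTC\n")
qual_file = io.StringIO(
    ">read1 length=8\n27 28 21 27\n27 27 28 22\n>read2 length=4\n28 9 26 24\n"
)

# Build a dictionary of reads keyed on id, then attach scores from the second file.
reads = {}
for record in parse_fasta(seq_file):
    fields = header_to_dict(record.header)
    reads[fields["id"]] = {
        "description": fields["description"],
        "seq": clean_sequence(record.data),
        "quality": None,
    }
for record in parse_fasta(qual_file):
    reads[header_to_dict(record.header)["id"]]["quality"] = clean_quality(record.data)

for read_id in sorted(reads):
    print(read_id, reads[read_id]["seq"], reads[read_id]["quality"])
```

Because the lookup is by id, the two files do not need to list the reads in the same order, which is the restriction that a lockstep iterator over both files would impose.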

From bugzilla-daemon at portal.open-bio.org Tue Oct 16 20:03:33 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 16:03:33 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162003.l9GK3XmF007588@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2382

------- Comment #1 from jflatow at northwestern.edu 2007-10-16 16:03 EST -------
My mistake, the parse_gsflex_header function should look something like this:

    import re

    def parse_gsflex_header(gs_header):
        # split the header into the record id and the rest of the line
        parts = re.split(r'[,|]?\s+', gs_header, maxsplit=1)
        assert len(parts) == 2
        return {'id': parts[0], 'description': gs_header}

    def my_qualitycleanup(data):
        # split on any whitespace so scores separated only by a newline stay distinct
        return [int(x) for x in data.split()]

From jflatow at northwestern.edu Tue Oct 16 20:11:04 2007
From: jflatow at northwestern.edu (Jared Flatow)
Date: Tue, 16 Oct 2007 15:11:04 -0500
Subject: [Biopython-dev] 454 GSFlex quality score files
In-Reply-To: <47151222.1060502@maubp.freeserve.co.uk>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EBC7.1040504@maubp.freeserve.co.uk> <48D92CF4-04B5-42F9-92D2-3A2D9D2FE7E2@northwestern.edu> <47151222.1060502@maubp.freeserve.co.uk>
Message-ID: <156C46BF-1798-43D5-BA10-2A94FC63A3AB@northwestern.edu>

On Oct 16, 2007, at 2:33 PM, Peter wrote:

> > In Bio.Fasta there are Parsers which really belong in
> > Bio.SeqIO.FastaIO, if anywhere.
How about Bio.Fasta becomes the more
> > general Fasta reader, nothing to do with sequences. ...
>
> In actual fact, the Bio.Fasta module predates Bio.SeqIO, and I was
> thinking in a few releases time of suggesting its deprecation (but
> not just yet as for several years it was the best documented and
> most used parser in Biopython).
>

I see, it looks like it's meant to be deprecated, I was just saying it's actually doing SeqIO functionality.

> If we do decided keep Bio.Fasta (or extend it), then perhaps
> Bio.SeqIO.FastaIO should become just a wrapper for Bio.Fasta
>
> I'm still digressing your ideas to turn Bio.Fasta into a generic
> parser that copes with sequences, qualities scores, or anything else.

I'm not quite sure of your meaning of digressing; if you mean thinking it over, then great =) Otherwise I hope you'll seriously consider it anyway. Either way, I think I posted a more coherent message on bugzilla with some example data and motivation.

jared

From jflatow at northwestern.edu Tue Oct 16 20:14:16 2007
From: jflatow at northwestern.edu (Jared Flatow)
Date: Tue, 16 Oct 2007 15:14:16 -0500
Subject: [Biopython-dev] CVS to SVN
In-Reply-To: <4714EB34.8000207@maubp.freeserve.co.uk>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EB34.8000207@maubp.freeserve.co.uk>
Message-ID: <6DFB6FBB-CC55-41D1-8D35-4906E6B502CF@northwestern.edu>

> I would say one reason why we aren't charging ahead with a move
> from CVS to subversion is only a few posters on this mailing list
> actively WANT to move to subversion, and no-one has really
> championed the move (yet).

Does that mean most developers don't WANT to move, or just that they don't ACTIVELY want to move?
jared From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 20:42:18 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 16 Oct 2007 21:42:18 +0100 Subject: [Biopython-dev] 454 GSFlex quality score files In-Reply-To: <156C46BF-1798-43D5-BA10-2A94FC63A3AB@northwestern.edu> References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EBC7.1040504@maubp.freeserve.co.uk> <48D92CF4-04B5-42F9-92D2-3A2D9D2FE7E2@northwestern.edu> <47151222.1060502@maubp.freeserve.co.uk> <156C46BF-1798-43D5-BA10-2A94FC63A3AB@northwestern.edu> Message-ID: <4715222A.2070909@maubp.freeserve.co.uk> Jared Flatow wrote: > On Oct 16, 2007, at 2:33 PM, Peter wrote: > >>> In Bio.Fasta there are Parsers which really belong in >>> Bio.SeqIO.FastaIO, if anywhere. How about Bio.Fasta becomes the more >>> general Fasta reader, nothing to do with sequences. ... >> In actual fact, the Bio.Fasta module predates Bio.SeqIO, and I was >> thinking in a few releases' time of suggesting its deprecation (but >> not just yet as for several years it was the best documented and >> most used parser in Biopython). > > I see, it looks like it's meant to be deprecated, I was just saying > it's actually doing SeqIO functionality. Well, I'm currently just making a suggestion for the future; before deprecating Bio.Fasta we should still canvass opinion on the main mailing list. >> If we do decide to keep Bio.Fasta (or extend it), then perhaps >> Bio.SeqIO.FastaIO should become just a wrapper for Bio.Fasta >> >> I'm still digressing your ideas to turn Bio.Fasta into a generic >> parser that copes with sequences, quality scores, or anything else. That was a typo, but you managed to guess my meaning. I meant to say: I'm still digesting [i.e. thinking about] your ideas to turn Bio.Fasta into a generic parser that copes with sequences, quality scores, or anything else.
> I'm not quite sure of your meaning of digressing, if you mean thinking > it over, then great =) Otherwise I hope you'll seriously consider it > anyway. Either way, I think I posted a more coherent message on > bugzilla with some example data and motivation. I'll take a look, Bug 2382 - Generic FASTA parser http://bugzilla.open-bio.org/show_bug.cgi?id=2382 Peter From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 21:01:29 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 16 Oct 2007 22:01:29 +0100 Subject: [Biopython-dev] CVS to SVN In-Reply-To: <6DFB6FBB-CC55-41D1-8D35-4906E6B502CF@northwestern.edu> References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EB34.8000207@maubp.freeserve.co.uk> <6DFB6FBB-CC55-41D1-8D35-4906E6B502CF@northwestern.edu> Message-ID: <471526A9.1010709@maubp.freeserve.co.uk> Jared Flatow wrote: >> I would say one reason why we aren't charging ahead with a move >> from CVS to subversion is only a few posters on this mailing list >> actively WANT to move to subversion, and no-one has really >> championed the move (yet). > > Does that mean most developers don't WANT to move, or just that they > don't ACTIVELY want to move? Going back over the archives, Chris Lasher was most vocal in supporting the move, and there were a few other positive voices. Speaking for myself, I have no strong desire either way, and I don't think Michiel objected either (except over the timing). Then as now, we are hoping to get the next release out "shortly", so after that would be a good time to make the switch.
[I'm assuming we won't lose any revision history or comments, and that things like the web-based ViewCVS and its RSS feed will still be available] Peter From bugzilla-daemon at portal.open-bio.org Tue Oct 16 21:02:03 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Oct 2007 17:02:03 -0400 Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser In-Reply-To: Message-ID: <200710162102.l9GL23rr010250@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2382 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 17:02 EST ------- Are there any other "FASTA like" formats you know of, in addition to traditional sequence data and the 454 GSFlex quality score files? We could do this using the old Scanner/Consumer model (see the pre-Martel parser, CVS revision 1.8 of Bio/Fasta/__init__.py for example). http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Fasta/__init__.py?rev=1.8&cvsroot=biopython&content-type=text/vnd.viewcvs-markup The scanner would be the same for all formats, and would pass the data with whitespace (spaces, new lines etc) as is. We could then have one consumer for each supported FASTA variant:

_Scanner           Scans a FASTA-format stream.
_RecordConsumer    Consumes FASTA data to a Record object.
_SequenceConsumer  Consumes FASTA data to a Sequence object.
_QualityConsumer   (new) could build a list of integers for each record?

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
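
To make the Scanner/Consumer idea in comment 2 concrete, here is a minimal sketch in plain Python 3. It is not the Bio.Fasta revision 1.8 code: the class names `FastaLikeScanner`, `SequenceConsumer` and `QualityConsumer` and the `start_record`/`data`/`end_record` callbacks are hypothetical, chosen only to show one scanner feeding different consumers.

```python
import io


class FastaLikeScanner:
    """Hypothetical scanner: emits events for any '>'-delimited stream."""

    def feed(self, handle, consumer):
        in_record = False
        for line in handle:
            if line.startswith(">"):
                if in_record:
                    consumer.end_record()
                consumer.start_record(line[1:].rstrip("\n"))
                in_record = True
            elif in_record:
                consumer.data(line)  # whitespace is passed through as is
        if in_record:
            consumer.end_record()


class _BaseConsumer:
    """Collects the raw text of each record, then converts it once."""

    def __init__(self):
        self.records = []

    def start_record(self, title):
        self._title = title
        self._chunks = []

    def data(self, text):
        self._chunks.append(text)

    def end_record(self):
        self.records.append((self._title, self.convert("".join(self._chunks))))


class SequenceConsumer(_BaseConsumer):
    def convert(self, text):
        return "".join(text.split())  # drop all whitespace


class QualityConsumer(_BaseConsumer):
    def convert(self, text):
        return [int(x) for x in text.split()]  # one integer per score


quality = QualityConsumer()
FastaLikeScanner().feed(io.StringIO(">read1\n40 38 12\n7 9\n"), quality)
print(quality.records)  # [('read1', [40, 38, 12, 7, 9])]
```

Only the `convert` step differs between variants here, which is the property the comment is relying on.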
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 21:26:29 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Oct 2007 17:26:29 -0400 Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser In-Reply-To: Message-ID: <200710162126.l9GLQT8O011239@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2382 ------- Comment #3 from jflatow at northwestern.edu 2007-10-16 17:26 EST -------
On second thought, let me just rewrite all the code:

import re

# The Bio.Fasta parser
class Fasta():  # or whatever
    @staticmethod
    def parse(file):
        # return an iterator over the file as Bio.Fasta.Records
        # for the records, trim newline from header, don't do anything to data
        pass

# The Bio.SeqIO.FastaIO wrapper for Bio.Fasta
class FastaIO():  # or however its organized
    @staticmethod
    def header_todict(header):
        parts = re.split(r'[,|]?\s+', header, maxsplit=1)
        assert len(parts) == 2
        return {'id': parts[0], 'description': header}

    @staticmethod
    def data_toseq(data, alphabet):
        return Seq(re.sub(r'\s+', '', data), alphabet)

    @staticmethod
    def parse(file, header_todict=None, alphabet=single_letter_alphabet):
        if header_todict is None:
            header_todict = FastaIO.header_todict
        return (SeqRecord(seq=FastaIO.data_toseq(record.data, alphabet),
                          **header_todict(record.header))
                for record in Bio.Fasta.parse(file))

# Now to use these in my example I can do
seq_dict = SeqIO.to_dict(SeqIO.FastaIO.parse(seq_file))
for record in Bio.Fasta.parse(qual_file):
    id = Bio.SeqIO.FastaIO.header_todict(record.header)['id']
    seq_dict[id].quality = [int(x) for x in record.data.split()]

# Suppose instead I have an alignment file, which looks like this:
>contigname
A A 10 64
T T 9 64
C C 9 64
...
# and on, where the first column is a reference sequence, the second column is a consensus
# sequence, the third column is the number of reads aligned, the fourth column is the combined
# quality score

# Now its just as easy for me to parse this into an object
class ContigAlign():
    def __init__(self, name, ref, consensus, numreads, qscore):
        self.name = name
        self.ref = ref
        self.consensus = consensus
        self.numreads = numreads
        self.qscore = qscore

# ill make a dictionary of my contigaligns
d = {}
for record in Bio.Fasta.parse(file):
    # split each row into its columns, then transpose rows into columns
    rows = [line.split() for line in record.data.split('\n') if line.strip()]
    (ref, consensus, numreads, qscore) = zip(*rows)
    d[record.header] = ContigAlign(record.header, ref, consensus, numreads, qscore)
# maybe i would turn ref and consensus into Seqs, but you get the point

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 21:38:45 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Oct 2007 17:38:45 -0400 Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser In-Reply-To: Message-ID: <200710162138.l9GLcj29011655@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2382 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 17:38 EST -------
In comment 3, did you just make up this file format as an example?

>contigname
A A 10 64
T T 9 64
C C 9 64
...

with four columns: reference sequence, consensus, number of reads aligned, and combined quality score.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
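
The record stream proposed in comment 3 can be tried without Biopython. The following is a rough, self-contained Python 3 sketch, assuming a hypothetical `parse_fasta_like()` generator and `Record` tuple (neither is an existing Bio.Fasta name); it leaves the data untouched and only afterwards transposes the four-column example into columns.

```python
import io
from collections import namedtuple

Record = namedtuple("Record", ["header", "data"])  # hypothetical container


def parse_fasta_like(handle):
    """Yield Record(header, data) for each '>' block; data is left untouched.

    Anything before the first '>' (such as a file-wide header) is skipped.
    """
    header, lines = None, []
    for line in handle:
        if line.startswith(">"):
            if header is not None:
                yield Record(header, "".join(lines))
            header, lines = line[1:].rstrip("\r\n"), []
        elif header is not None:
            lines.append(line)
    if header is not None:
        yield Record(header, "".join(lines))


def columns(data):
    """Transpose whitespace-separated rows into per-column tuples."""
    rows = [row.split() for row in data.splitlines() if row.strip()]
    return list(zip(*rows))


example = io.StringIO(">contigname\nA A 10 64\nT T 9 64\nC C 9 64\n")
for record in parse_fasta_like(example):
    ref, consensus, numreads, qscore = columns(record.data)
    print(record.header, "".join(ref), [int(n) for n in numreads])
    # prints: contigname ATC [10, 9, 9]
```

Keeping the generator ignorant of what the data means is the design point under discussion: each wrapper decides how to interpret `record.data`.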
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 21:58:38 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Oct 2007 17:58:38 -0400 Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser In-Reply-To: Message-ID: <200710162158.l9GLwc68012343@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2382 ------- Comment #5 from jflatow at northwestern.edu 2007-10-16 17:58 EST -------
Nope, they actually have a file format that looks like this:

Position  Consensus  Quality Score  Depth  Signal  StdDeviation
>contig00001 1
1   G  64  2  1.00  0.00
2   A  64  2  1.00  0.00
3   G  64  2  1.00  0.00
4   A  64  2  1.00  0.00
5   G  64  2  2.00  0.00
6   G  64  2  2.00  0.00
7   A  64  2  3.00  0.00
8   A  64  2  3.00  0.00
9   A  64  2  3.00  0.00
10  C  64  2  2.00  0.00
11  C  64  2  2.00  0.00
12  T  64  2  1.00  0.00
13  C  64  2  3.00  0.00
14  C  64  2  3.00  0.00
15  C  64  2  3.00  0.00
16  G  64  2  1.00  0.00
17  T  64  2  1.00  0.00
18  G  64  2  1.00  0.00
19  A  64  2  1.00  0.00
20  T  64  2  1.00  0.00
21  C  64  2  2.00  0.00
22  C  64  2  2.00  0.00

Note the file-wide header at the top of the page (a generic FASTA-like parser might skip to the first '>'), or we could get rid of that beforehand but it would be nice if it were smart. Also, here is another sample FASTA-like file format they use for pair alignments:

>ERSGEES01EM5WC, 2..30 of 95 and ERSGEES01C1ZV2, 1..29 of 268 (29/29 ident)
2 CGGTGACCCGGGAGATCTGAATTCCTGGT 30
1 CGGTGACCCGGGAGATCTGAATTCCTGGT 29
>ERSGEES01EM5WC, 2..29 of 95 and ERSGEES01DMS5T, 1..28 of 259 (28/28 ident)
2 CGGTGACCCGGGAGATCTGAATTCCTGG 29
1 CGGTGACCCGGGAGATCTGAATTCCTGG 28
>ERSGEES01EM5WC, 29..2 of 95 and ERSGEES01D8GDV, 205..232 of 232 (28/28 ident)
29 CCAGGAATTCAGATCTCCCGGGTCACCG 2
205 CCAGGAATTCAGATCTCCCGGGTCACCG 232

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 22:09:06 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Oct 2007 18:09:06 -0400 Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser In-Reply-To: Message-ID: <200710162209.l9GM96N5012764@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2382 ------- Comment #6 from jflatow at northwestern.edu 2007-10-16 18:09 EST ------- The reference/consensus one was inspired by yet another format they have: there are 2 tools they provide, one for mapping to an existing sequence, the other for ab initio contig building. The mapping one has the extra reference column. As you can see it might be hard to keep up with all these similar formats as part of Biopython (these are only from one source). Certainly the common ones should have wrappers but we should also be able to easily get the stream of records. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 16 22:13:48 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Oct 2007 18:13:48 -0400 Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser In-Reply-To: Message-ID: <200710162213.l9GMDmBM012914@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2382 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 18:13 EST ------- Could you attach a few of these real files? Please include where they came from, i.e. the company whose software writes such output, and what they call each file format variant. If you can get a matched set (i.e. all associated with the same few sequences) then even better.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 16 23:09:00 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Oct 2007 19:09:00 -0400 Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser In-Reply-To: Message-ID: <200710162309.l9GN90wg015092@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2382 ------- Comment #8 from jflatow at northwestern.edu 2007-10-16 19:08 EST ------- The files are very large, I assure you they are just longer versions of what I have supplied here though. The company is Roche Diagnostics. The initial reads/quality files are the output of the 454 GSFlex genome sequencing machines. They have two pieces of software: gsMapper and gsAssembler which output the contigs. Reads/Quality files from the machine are called: 454Reads.{fna,qual} gs* output: 454{All,Large}Contigs.{fna,qual} 454PairAlign.txt 454AlignmentInfo.tsv -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 17 00:10:45 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Oct 2007 20:10:45 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200710170010.l9H0AjYe018147@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp 2007-10-16 20:10 EST ------- > Note there is yet another (!) translation function Bio.SeqUtils.translate() > which is frame aware [personally I would mark a lot of this module as > deprecated]. 
Given the various translate functions we already have in Biopython, why do you want to add another one? Is there something the "translate" method can do that the "translate" function cannot? Since the "translate" function can take Seq objects as well as simple strings, I'd prefer the "translate" function over a "translate" method. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 16:49:18 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 16 Oct 2007 17:49:18 +0100 Subject: [Biopython-dev] SeqRecord to file format as string In-Reply-To: <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> Message-ID: <4714EB8E.3000700@maubp.freeserve.co.uk> >> Did you know you can write to a string using any Bio.SeqIO supported >> file format using StringIO? Perhaps we should spell this out more >> explicitly in the documentation, but a motivating example would help. > > This is what I do now, but it seems like a hack to me to go this > route. To always have to write to a file feels strange, but I see > that it would be messy to go OO since there are so many formats. > However, giving preference to fasta over other formats by making it > innate doesn't seem like such a terrible idea. I do have mixed > feelings about 'bloating' the code which is why I asked, and you have > convinced me that this is not quite appropriate given existing > convention. However the idea would be to put the to_fasta or > to_format method inside the SeqRecord, then to call it from the IO > when needed to actually write to a file, but call it directly when > all that is wanted is a string... Its debatable isn't it? 
I suspect that for most users, when they want a record in a particular file format it's for writing to a file. However, adding a to_format() method to a SeqRecord makes some sense (suitable for sequential file formats only). This would take a format name and return a string, by calling Bio.SeqIO with a StringIO object internally. Peter From bugzilla-daemon at portal.open-bio.org Wed Oct 17 02:17:28 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Oct 2007 22:17:28 -0400 Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser In-Reply-To: Message-ID: <200710170217.l9H2HSAx024040@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2382 ------- Comment #9 from mdehoon at ims.u-tokyo.ac.jp 2007-10-16 22:17 EST ------- If all these special fasta files are coming from Roche Diagnostics, I'd suggest creating a separate module rather than trying to put this in Bio.SeqIO. Bio.SeqIO is one of the few modules in Biopython that is used by most users, so I'd like to keep it as clean as possible. To avoid confusion for users who just want to parse regular Fasta files, I think the module should not be called Bio.Fasta. In addition, I doubt we'd get much code reuse from a generic Bio.Fasta module beyond what is needed for the Roche files, since the only thing they have in common is that they use ">" to separate records. With a separate module to handle the Roche files, my preferred usage would be something like this:

from Bio import SeqIO, GSFlex # Or whatever you'd like to call it
seqrecords = SeqIO.parse(open("mysequences.fa"), "fasta")
qualities = GSFlex.parse(open("myqualities.qual"), "quality")
for seqrecord, quality in zip(seqrecords, qualities):
    seqrecord.quality = quality

Note that "quality" is currently not a field of the SeqRecord class, but with SeqRecord being a Python class, we can just add fields on the fly.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mdehoon at c2b2.columbia.edu Wed Oct 17 02:30:41 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue, 16 Oct 2007 22:30:41 -0400 Subject: [Biopython-dev] CVS to SVN References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EB34.8000207@maubp.freeserve.co.uk> <6DFB6FBB-CC55-41D1-8D35-4906E6B502CF@northwestern.edu> <471526A9.1010709@maubp.freeserve.co.uk> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B63B@mail2.exch.c2b2.columbia.edu> > > Does that mean most developers don't WANT to move, or just that they > > don't ACTIVELY want to move? > > Speaking for myself, I have no strong desire either way, and I don't > think Michiel objected either (except over the timing). I don't have an objection against SVN either now or later. If anyone wants to do the CVS->SVN conversion, just make sure to inform the developers when they should pause making commits to CVS during the actual move. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032
From mdehoon at c2b2.columbia.edu Wed Oct 17 02:45:34 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue, 16 Oct 2007 22:45:34 -0400 Subject: [Biopython-dev] SeqRecord to file format as string References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EB8E.3000700@maubp.freeserve.co.uk> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B63C@mail2.exch.c2b2.columbia.edu> How about the following: SeqIO.write(sequences, handle, format) returns the properly formatted string if handle==None. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032
From bugzilla-daemon at portal.open-bio.org Wed Oct 17 07:01:53 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 17 Oct 2007 03:01:53 -0400 Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in (Reverse)complement functions in Bio.Seq In-Reply-To: Message-ID: <200710170701.l9H71rML002584@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2366 mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |

------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2007-10-17 03:01 EST -------
The Biopython test currently fails:

======================================================================
FAIL: test_seq
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 151, in runTest
    self.runSafeTest()
  File "run_tests.py", line 188, in runSafeTest
    expected_handle)
  File "run_tests.py", line 289, in compare_output
    "\nOutput : "+`output_line` + "\nExpected: "+`expected_line`
AssertionError:
Output : "Seq('ACBDGHKMNSRUWVYX', Alphabet()) -> Seq('XRBWAYSNKMDCHVGU', Alphabet())\n"
Expected: "Seq('ACBDGHKMNSRUWVYX', Alphabet()) -> Seq('XYVWARSNMKHCDBGU', Alphabet())\n"
----------------------------------------------------------------------

This is with a fresh checkout from CVS.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Wed Oct 17 08:01:12 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 17 Oct 2007 04:01:12 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200710170801.l9H81CVn005428@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-17 04:01 EST ------- > Given the various translate functions we already have in > Biopython, why do you want to add another one? Is there > something the "translate" method can do that the "translate" > function cannot? Since the "translate" function can take Seq > objects as well as simple strings, I'd prefer the "translate" > function over a "translate" method. It's a style thing, allowing more of an object orientated coding style. For comparison, look at the evolution of the string module in python which was phased out in favour of string object methods.
In terms of capabilities/arguments, I think the Bio.Seq.translate() function and the suggested new Bio.Seq.Seq.translate() object method should be equivalent. In fact, I would have one call the other internally. Currently, if you work with strings, you can use the following nice concise style:

from Bio import Seq #The module
my_str = "ACTGACCGTGC"
print Seq.translate(my_str)

However, if you want to use Seq objects, this becomes rather a mess (in my opinion):

from Bio import Seq #The module
from Bio.Alphabet.IUPAC import unambiguous_dna
my_seq = Seq.Seq("ACTGACCGTGC", unambiguous_dna)
print Seq.translate(my_seq)

I would like to be able to do this, using an object method:

from Bio.Seq import Seq
from Bio.Alphabet.IUPAC import unambiguous_dna
my_seq = Seq("ACTGACCGTGC", unambiguous_dna)
print my_seq.translate()

Another bonus for people who think OO, is doing dir(my_seq) would list these useful methods. Right now the user has to know to go looking in the Bio.Seq module for a function.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
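
The "one calls the other internally" arrangement from comment 5 can be illustrated with a toy, Biopython-free sketch in Python 3. The `Seq` class below is only a stand-in for Bio.Seq.Seq (no alphabets), the codon table is the standard genetic code, and silently ignoring an incomplete trailing codon is an assumption of this sketch, not a statement about Biopython's behaviour.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product("TCAG", repeat=3), AMINO)}


def translate(sequence):
    """Module-level function: accepts a plain string or a Seq-like object."""
    text = str(sequence).upper().replace("U", "T")
    usable = len(text) - len(text) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[text[i:i + 3]] for i in range(0, usable, 3))


class Seq:
    """Toy stand-in for Bio.Seq.Seq, just enough to show the delegation."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        return self.data

    def translate(self):
        # The method simply calls the function, so the two stay equivalent.
        return Seq(translate(self))


print(translate("ACTGACCGTGC"))         # TDR
print(Seq("ACTGACCGTGC").translate())   # TDR
```

With this layout `dir(Seq("..."))` lists `translate`, which is the discoverability benefit mentioned above, while existing callers of the function are unaffected.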
From bugzilla-daemon at portal.open-bio.org Wed Oct 17 08:06:51 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 17 Oct 2007 04:06:51 -0400 Subject: [Biopython-dev] [Bug 2382] Generic Roche or GSFlex "FASTA" parser In-Reply-To: Message-ID: <200710170806.l9H86ppn006217@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2382 biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Generic FASTA parser        |Generic Roche or GSFlex
                   |                            |"FASTA" parser

------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-17 04:06 EST -------
Now that I'm clear where these files come from, I would agree with Michiel that a separate Bio.GSFlex or Bio.Roche module would make more sense. I've added these keywords to the bug summary (to help anyone searching in future). P.S. From Michiel's example, you could use the existing SeqRecord annotations dictionary if you wanted to avoid adding a new attribute to the objects on the fly.

for seqrecord, quality in zip(seqrecords, qualities):
    #seqrecord.quality = quality
    #If you would rather use the existing dictionary:
    seqrecord.annotations["quality"] = quality

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
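
One caveat with pairing records by `zip` is that it stops silently at the shorter input. The sketch below shows the annotations-dictionary variant with two sanity checks added; the checks are an addition of this sketch, not something proposed in the thread, and `Record` and `attach_qualities` are hypothetical names standing in for SeqRecord and for whatever a Bio.GSFlex module might offer.

```python
class Record:
    """Toy stand-in for SeqRecord: only the attributes used here."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq
        self.annotations = {}


def attach_qualities(records, qualities):
    """Store each quality list in record.annotations["quality"]."""
    records, qualities = list(records), list(qualities)
    if len(records) != len(qualities):
        raise ValueError("record and quality counts differ")
    for record, quality in zip(records, qualities):
        if len(quality) != len(record.seq):
            raise ValueError("quality length mismatch for %s" % record.id)
        record.annotations["quality"] = quality
    return records


reads = attach_qualities([Record("read1", "ACG")], [[40, 38, 12]])
print(reads[0].annotations["quality"])  # [40, 38, 12]
```

Materialising both inputs as lists keeps the sketch short; for the very large files mentioned in comment 8 a streaming comparison would be preferable.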
From bugzilla-daemon at portal.open-bio.org Wed Oct 17 08:46:41 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 17 Oct 2007 04:46:41 -0400 Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in (Reverse)complement functions in Bio.Seq In-Reply-To: Message-ID: <200710170846.l9H8kfYq008185@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2366 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-17 04:46 EST ------- Fixed, I think. I had some RNA/DNA with the U and T the wrong way round... and when I recently tweaked the alphabet detection in Bio/Seq.py this had an impact. The root issue is that we don't check the Alphabet's letters agree with the sequence when creating a Seq object (where the Alphabet supplied has an explicit list of letters). That would have caught the error in the test suite much earlier. Maybe I should file an enhancement bug on this issue. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Oct 17 15:20:51 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 17 Oct 2007 11:20:51 -0400 Subject: [Biopython-dev] [Bug 2382] Generic Roche or GSFlex "FASTA" parser In-Reply-To: Message-ID: <200710171520.l9HFKpXj030514@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2382 ------- Comment #11 from jflatow at northwestern.edu 2007-10-17 11:20 EST ------- That sounds very reasonable to me to put all this stuff in a separate module. I would like to implement the design we have been discussing, and I will name it whatever you think is appropriate, but I would like to do it the way that seems *right* to me. I mean by that building off of streams of >header data ... 
since I think this pattern could eventually be used to actually clean up the rest of the FASTA stuff, not make it worse. I also believe there could potentially be other instances when a more general FASTA parser would be useful, even if we don't know of them yet. How does this sound? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Oct 18 00:46:19 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 17 Oct 2007 20:46:19 -0400 Subject: [Biopython-dev] [Bug 2382] Generic Roche or GSFlex "FASTA" parser In-Reply-To: Message-ID: <200710180046.l9I0kJfN027373@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2382 ------- Comment #12 from mdehoon at ims.u-tokyo.ac.jp 2007-10-17 20:46 EST ------- > I also believe there could > potentially be other instances when a more general FASTA parser would be > useful, even if we don't know of them yet. How does this sound? To me, it sounds premature to create a general Fasta parser if we don't know other instances where it might be useful. (For comparison, note that Biopython's general parser framework described in section 5.3 of the tutorial is hardly used in recent Biopython parsers). By all means, keep the module short and simple. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Oct 18 04:33:34 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 18 Oct 2007 00:33:34 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200710180433.l9I4XYeY004357@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2007-10-18 00:33 EST ------- If we add translate, transcribe methods to Seq objects, can we then deprecate Bio.Transcribe, Bio.Translate? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Oct 18 05:21:15 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 18 Oct 2007 01:21:15 -0400 Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with egenix mxTextTools 3.0 In-Reply-To: Message-ID: <200710180521.l9I5LFVS006209@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2361 ------- Comment #36 from mdehoon at ims.u-tokyo.ac.jp 2007-10-18 01:21 EST ------- Looking at the test_GenBankFormat failure again. This is the only test that fails with the Biopython currently in CVS, using mxTextTools 3.0. This test is the only test for Bio.expressions. If we remove test_GenBankFormat, we should deprecate Bio.expressions. Of all the Biopython tests, only test_format_registry depends on Bio.expressions. This test relies on the function _load_registries() in Bio/__init__.py. Skipping this function call in Bio/__init__.py affects no other Biopython test. So I'd like to suggest the following for the upcoming release: -) Remove test_GenBankFormat.py and test_format_registry.py -) Add DeprecationWarnings to Bio.expressions. 
-) Skip the call to _load_registries() in Bio/__init__.py We then have a working Biopython again with the recent mxTextTools, with minimal disruptive changes to the code. Any objections? From bugzilla-daemon at portal.open-bio.org Thu Oct 18 10:35:01 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 18 Oct 2007 06:35:01 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200710181035.l9IAZ1DH022693@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-18 06:35 EST ------- Michiel in comment 6 wrote: > If we add translate, transcribe methods to Seq objects, can we then > deprecate Bio.Transcribe, Bio.Translate? Bio.Transcribe - Yes ==================== Bio.Transcribe is so trivial we could recreate that code in Bio.Seq and then deprecate Bio.Transcribe without losing any functionality: - transcribe - back_transcribe Bio.Translate - Maybe ===================== Initially I was just proposing to add: - translate (including all stop codons) I was simply going to have Bio.Seq call Bio.Translate to do the work. It would be nice to simplify Biopython by also deprecating Bio.Translate, but if we want to do this without loss of current functionality we need to consider including the following in Bio.Seq: - translate_to_stop (all amino acids up to but excluding the first stop) - back_translate (gives a single possible nucleotide sequence)
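To illustrate why comment 7 calls Bio.Transcribe trivial: on plain strings, transcription and back-transcription are single character substitutions. The sketch below is a standalone illustration on Python strings, assuming the input is the coding strand. It is not the Bio.Seq or Bio.Transcribe implementation, the function names only mirror the proposed method names, and alphabets are ignored entirely.

```python
def transcribe(dna):
    """Return the RNA version of a coding-strand DNA string (T -> U).

    Illustration only: works on plain strings and keeps the case of each letter.
    """
    return dna.replace("T", "U").replace("t", "u")


def back_transcribe(rna):
    """Return the DNA version of an RNA string (U -> T)."""
    return rna.replace("U", "T").replace("u", "t")


if __name__ == "__main__":
    dna = "GCCGCATGCATAGATAGATAG"
    rna = transcribe(dna)
    print(rna)
    # Back-transcription undoes transcription for unambiguous sequences.
    print(back_transcribe(rna) == dna)
```

Translation is a different matter, since it needs a codon table and a decision about stop codons, which is why the thread treats Bio.Translate separately.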
From biopython-dev at maubp.freeserve.co.uk Thu Oct 18 20:06:10 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Thu, 18 Oct 2007 21:06:10 +0100 Subject: [Biopython-dev] BioSQL documentation Message-ID: <4717BCB2.2070301@maubp.freeserve.co.uk> I was just having a look at: http://biopython.org/DIST/docs/biosql/python_biosql_basic.html The source for this HTML and PDF document lives here in the BioSQL CVS: biosql-schema/doc/biopython/python_biosql_basic.tex It would be nice to update the following section: > 3.3 Loading a GenBank file into the database > > ... > > Now we want to do the loading of the file into the database. The > Biopython implementation works by taking a standard Iterator object > that returns Biopython SeqFeature objects and then doing the loading. I think that should actually say "... that returns Biopython SeqRecord objects containing SeqFeature objects ..." > ... our GenBank file, which we can do with the following code: > >>>> from Bio import GenBank parser = GenBank.FeatureParser() >>>> iterator = GenBank.Iterator(open("cor6_6.gb"), parser) That can now be done with Bio.SeqIO which should be clearer, and hopefully make it easier to see how to extend this to SwissProt etc: from Bio import SeqIO iterator = SeqIO.parse(open("cor6_6.gb"), "genbank") I would do this myself, but I don't have a BioSQL database setup myself right now. It would also be nice if the documentation didn't skip the bit about setting up the database with the BioSQL schema, or at least had links to the relevant BioSQL documentation. 
Peter From bugzilla-daemon at portal.open-bio.org Fri Oct 19 02:15:01 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 18 Oct 2007 22:15:01 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200710190215.l9J2F1bo006275@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp 2007-10-18 22:15 EST ------- > It would be nice to simplify Biopython by also deprecating Bio.Translate, To avoid a plethora of translation functions, I would prefer that. > but if we want to do this without loss of current functionality we > need to consider including the following in Bio.Seq: > - translate_to_stop (all amino acids up to but excluding the first stop) Whether or not to stop translating at the first stop codon could be an argument to the translate method. As an alternative, it may be preferable to have a split() method that splits the sequences at the stop codons. Such a method could be applied to all protein sequences, not only those created by translate(). > - back_translate (gives a single possible nucleotide sequence) Does anybody actually use this function? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From salish at picasso.ucsf.edu Fri Oct 19 07:12:53 2007 From: salish at picasso.ucsf.edu (Howard Salis) Date: Fri, 19 Oct 2007 00:12:53 -0700 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: <200710190215.l9J2F1bo006275@portal.open-bio.org> References: <200710190215.l9J2F1bo006275@portal.open-bio.org> Message-ID: <9fa7e98e0710190012t52ceb9dbx564ba3720d4359f2@mail.gmail.com> Yes. 
Back-translating a sequence is important in codon optimization, searching for homologous proteins, etc. > > - back_translate (gives a single possible nucleotide sequence) > Does anybody actually use this function? From bugzilla-daemon at portal.open-bio.org Fri Oct 19 12:38:59 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 19 Oct 2007 08:38:59 -0400 Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with egenix mxTextTools 3.0 In-Reply-To: Message-ID: <200710191238.l9JCcx4I001886@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2361 ------- Comment #37 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-19 08:38 EST ------- I would have suggested adding a mxTextTools version check to test_GenBankFormat.py and throwing the external dependency error if 3.0 is found. That would "solve" the problem test case, and after the next release we could begin the process of deprecating the modules you suggested. But I'm OK with your suggestion Michiel (comment 36), although it seems a little drastic.
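The version check suggested in comment 37 could look roughly like the sketch below. Everything here is an assumption made for illustration: the exception class is a local stand-in for whatever error the test suite uses to skip a test, `check_mxtexttools` is a hypothetical helper, and the version string is passed in by the caller rather than read from the installed mxTextTools package. Treating every major version from 3 upwards as unsupported is also an assumption; the comment only mentions 3.0.

```python
class MissingExternalDependencyError(Exception):
    """Local stand-in for the error a test raises when it should be skipped."""


def major_version(version_string):
    """Return the leading integer of a dotted version string such as '3.0.0'."""
    return int(version_string.split(".")[0])


def check_mxtexttools(version_string):
    """Raise if the given mxTextTools version is assumed to be unsupported.

    Hypothetical helper: the caller supplies the version string.
    """
    if major_version(version_string) >= 3:
        raise MissingExternalDependencyError(
            "mxTextTools %s is not supported by this test" % version_string
        )


if __name__ == "__main__":
    for version in ("2.0.6", "3.0.0"):
        try:
            check_mxtexttools(version)
            print(version, "ok")
        except MissingExternalDependencyError as err:
            print(version, "skipped:", err)
```

A test module would call such a check before importing anything that depends on the affected parser, so that the test is reported as skipped rather than failed.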
From biopython-dev at maubp.freeserve.co.uk Fri Oct 19 08:08:41 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Fri, 19 Oct 2007 09:08:41 +0100 Subject: [Biopython-dev] [Bug 2381] back-translate In-Reply-To: <9fa7e98e0710190012t52ceb9dbx564ba3720d4359f2@mail.gmail.com> References: <200710190215.l9J2F1bo006275@portal.open-bio.org> <9fa7e98e0710190012t52ceb9dbx564ba3720d4359f2@mail.gmail.com> Message-ID: <47186609.3090408@maubp.freeserve.co.uk> Howard Salis wrote: > Yes. Back-translating a sequence is important in codon optimization, > searching for homologous proteins, etc. Unlike forward translation, transcription, back-transcription, complements and reverse complements, back-translation is not a one-to-one mapping. In your examples, would you want to know all: - all possible back translations (as unambigous nucleotides) - all possible back translations (as ambigous nucleotides) - a possible back translation (using ambiguous nucleotides) - a possible back translation (using un-ambiguous nucleotides) For example, back translating an Tyr => UAC or UAU => UAW (nice and clear - we can represent this perfectly with a single ambiguous codon). On the other hand, Arg => AGA, AGG, CGA, CGC, CGG, CGU => AGR or CGN Oh, and would you expect DNA or RNA back? Peter From salish at picasso.ucsf.edu Fri Oct 19 16:31:49 2007 From: salish at picasso.ucsf.edu (Howard Salis) Date: Fri, 19 Oct 2007 09:31:49 -0700 Subject: [Biopython-dev] [Bug 2381] back-translate In-Reply-To: <47186609.3090408@maubp.freeserve.co.uk> References: <200710190215.l9J2F1bo006275@portal.open-bio.org> <9fa7e98e0710190012t52ceb9dbx564ba3720d4359f2@mail.gmail.com> <47186609.3090408@maubp.freeserve.co.uk> Message-ID: <9fa7e98e0710190931q3b589488p55b8863cc0e38380@mail.gmail.com> Yes, I know it's a one-to-many mapping. But that's why it's nice to have a handy subroutine for doing it. For codon optimization, all possible back translations with unambiguous nucleotides would be best. 
Then, one evaluates some objective function over all possible sequences to find an optimal one. Optimality depends on the application, but eliminating restriction sites, avoiding certain repetitive or transposon sequences, etc is very common. For searching for homologous proteins, it would be best to have the back-translate function produce something that could be fed into an alignment program or regexp expression. Then, one could align a database of sequences with your back-translated protein to determine which sequence is most similar to your protein. Basically, this is what BlastP does (you might want to look up its algorithm to determine a good way of doing this, if you wish to reproduce it in Biopython or, otherwise, rely on the NCBI webserver). What does the current back-translate function output? -Howard On 10/19/07, Peter wrote: > Howard Salis wrote: > > Yes. Back-translating a sequence is important in codon optimization, > > searching for homologous proteins, etc. > > Unlike forward translation, transcription, back-transcription, > complements and reverse complements, back-translation is not a > one-to-one mapping. > > In your examples, would you want to know all: > - all possible back translations (as unambigous nucleotides) > - all possible back translations (as ambigous nucleotides) > - a possible back translation (using ambiguous nucleotides) > - a possible back translation (using un-ambiguous nucleotides) > > For example, back translating an Tyr => UAC or UAU => UAW (nice and > clear - we can represent this perfectly with a single ambiguous codon). > On the other hand, Arg => AGA, AGG, CGA, CGC, CGG, CGU => AGR or CGN > > Oh, and would you expect DNA or RNA back? 
> > Peter > > From bugzilla-daemon at portal.open-bio.org Mon Oct 22 09:07:05 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 22 Oct 2007 05:07:05 -0400 Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in (Reverse)complement functions in Bio.Seq In-Reply-To: Message-ID: <200710220907.l9M975hw013729@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2366 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |RESOLVED Resolution| |FIXED ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-22 05:07 EST ------- Marking as fixed -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython-dev at maubp.freeserve.co.uk Mon Oct 22 12:30:59 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Mon, 22 Oct 2007 13:30:59 +0100 Subject: [Biopython-dev] [Bug 2381] back-translate In-Reply-To: <9fa7e98e0710190931q3b589488p55b8863cc0e38380@mail.gmail.com> References: <200710190215.l9J2F1bo006275@portal.open-bio.org> <9fa7e98e0710190012t52ceb9dbx564ba3720d4359f2@mail.gmail.com> <47186609.3090408@maubp.freeserve.co.uk> <9fa7e98e0710190931q3b589488p55b8863cc0e38380@mail.gmail.com> Message-ID: <471C9803.8050709@maubp.freeserve.co.uk> Howard Salis wrote: > > What does the current back-translate function output? 
> Short example, >>> from Bio import Translate >>> from Bio.Seq import Seq >>> from Bio.Alphabet.IUPAC import unambiguous_dna >>> my_dna = Seq("GCCGCATGCATAGATAGATAG", unambiguous_dna) >>> my_prot = Translate.unambiguous_dna_by_id[11].translate(my_dna) >>> my_prot Seq('AACIDR*', HasStopCodon(IUPACProtein(), '*')) >>> Translate.unambiguous_dna_by_id[11].back_translate(my_prot) Seq('GCTGCTTGTATTGATCGTTAA', IUPACUnambiguousDNA()) >>> my_dna Seq('GCCGCATGCATAGATAGATAG', IUPACUnambiguousDNA()) i.e. The current back_translate picks one possible back translation. Peter From bugzilla-daemon at portal.open-bio.org Mon Oct 22 16:52:12 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 22 Oct 2007 12:52:12 -0400 Subject: [Biopython-dev] [Bug 2386] New: Bio.Seq.Seq and MutableSeq count() method only works for single residues Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2386 Summary: Bio.Seq.Seq and MutableSeq count() method only works for single residues Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: minor Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Logging this bug to report an issue raised on the mailing list by Jimmy Musselwhite. The Seq object and MutableSeq objects' count methods only works for single residues (e.g. "G"). Zero is returned when asked to count a multicharacter string, "GG" for example. For compatibility with strings, my_seq.count("GG") should work as expected, returning the same value as my_seq.tostring().count("GG") Doing this for the Seq object is trivial. Adding support for the MutableSeq could be done via the tostring() method but there might be a more elegant solution with less overhead. 
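The behaviour requested in this report can be sketched as follows. `SimpleSeq` and `SimpleMutableSeq` are hypothetical stand-ins written for this illustration, not the real Bio.Seq classes, and the list-based storage in the mutable class is an assumption. The point is only that delegating to the string's own count method gives multi-character support with the same non-overlapping semantics as Python strings.

```python
class SimpleSeq:
    """Hypothetical immutable sequence wrapping a Python string."""

    def __init__(self, data):
        self.data = data

    def tostring(self):
        return self.data

    def count(self, sub):
        # str.count handles substrings of any length (non-overlapping matches).
        return self.data.count(sub)


class SimpleMutableSeq:
    """Hypothetical mutable sequence storing one residue per list element."""

    def __init__(self, data):
        self.data = list(data)

    def tostring(self):
        return "".join(self.data)

    def count(self, sub):
        if len(sub) == 1:
            # Single residue: count list elements without building a string.
            return self.data.count(sub)
        # Longer substrings: fall back to the string representation.
        return self.tostring().count(sub)


if __name__ == "__main__":
    print(SimpleSeq("GGGACGG").count("GG"))
    print(SimpleMutableSeq("GGGACGG").count("GG"))
    print(SimpleMutableSeq("GGGACGG").count("G"))
```

The single-residue shortcut in the mutable class is one way to limit the overhead mentioned in the report; whether it is worth the extra branch would need measuring.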
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mdehoon at c2b2.columbia.edu Wed Oct 24 00:46:58 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue, 23 Oct 2007 20:46:58 -0400 Subject: [Biopython-dev] Removing deprecated functionality Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B640@mail2.exch.c2b2.columbia.edu> Hi everybody, We now have a fixed Biopython in CVS that works with both the old and the new mxTextTools. I am planning to create the new Biopython release later this week. Bio.Kabat and the blast and blasturl functions in Bio.Blast.NCBIWWW were deprecated in previous Biopython. The two functions in Bio.Blast.NCBIWWW have been deprecated in favor of qblast starting with Biopython 1.40b (February 2005). I am planning to remove this functionality for release 1.44 -- let us know if this would cause you some problems. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From bugzilla-daemon at portal.open-bio.org Thu Oct 25 16:58:19 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 25 Oct 2007 12:58:19 -0400 Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with egenix mxTextTools 3.0 In-Reply-To: Message-ID: <200710251658.l9PGwJgB007432@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2361 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #38 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-25 12:58 EST ------- Marking as fixed, Michiel made the changes outlined in comment 36 in CVS. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Oct 25 17:02:50 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 25 Oct 2007 13:02:50 -0400 Subject: [Biopython-dev] [Bug 2374] Updated Bio.lcc code. In-Reply-To: Message-ID: <200710251702.l9PH2oC8008104@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2374 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|Uppdated lcc code. |Updated Bio.lcc code. ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-25 13:02 EST ------- Sebastian - any feedback on my above comment? P.S. Corrected spelling in bug title. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Oct 25 17:22:46 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 25 Oct 2007 13:22:46 -0400 Subject: [Biopython-dev] [Bug 2374] Updated Bio.lcc code. In-Reply-To: Message-ID: <200710251722.l9PHMkFm009816@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2374 ------- Comment #4 from sbassi at gmail.com 2007-10-25 13:22 EST ------- (In reply to comment #3) > Sebastian - any feedback on my above comment? > > P.S. Corrected spelling in bug title. > I do agree with most of your comments, but I can't implement them right now because of my current workload. LCC stands for Local Composition Complexity (see here http://mrw.interscience.wiley.com/emrw/9780470015902/els/article/a0005260/current/abstract) Please move it to Bio/SeqUtils/. 
I don't know the values for ambiguous nucleotides, I would check this for the next version. From bugzilla-daemon at portal.open-bio.org Thu Oct 25 18:01:50 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 25 Oct 2007 14:01:50 -0400 Subject: [Biopython-dev] [Bug 2374] Updated Bio.lcc code. In-Reply-To: Message-ID: <200710251801.l9PI1oRF012742@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2374 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-25 14:01 EST ------- I've checked this in as Bio/SeqUtils/lcc.py (and deprecated Bio/lcc.py which had a slightly different API since you dropped the start/end options in lcc_mult). From bugzilla-daemon at portal.open-bio.org Thu Oct 25 18:25:49 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 25 Oct 2007 14:25:49 -0400 Subject: [Biopython-dev] [Bug 2374] Updated Bio.lcc code. In-Reply-To: Message-ID: <200710251825.l9PIPnEG015022@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2374 ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-25 14:25 EST ------- Also updated Bio/SeqUtils/lcc.py to cope with Seq and MutableSeq objects in addition to strings.
Plus added a unit test, test_SeqUtils.py which covers both Bio.SeqUtils.lcc and Bio.SeqUtils.CheckSum and replaces my older test_CheckSum.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Oct 25 22:03:15 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 25 Oct 2007 18:03:15 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200710252203.l9PM3For029293@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-25 18:03 EST ------- Created an attachment (id=795) --> (http://bugzilla.open-bio.org/attachment.cgi?id=795&action=view) Rough patch to add methods to Bio.Seq Part of this patch is to add ambiguous_generic_by_id and ambiguous_generic_by_name entries to Bio.Data.CodonTable, variants of the unambiguous generic_by_id and generic_by_name tables. Using this lets us properly translate ambiguous sequences. This patch does NOT tackle back_translate, or have special treatment of start/stop codons, in the Seq methods. This patch does NOT mark Bio.Translate or Bio.Transcribe as deprecated. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mdehoon at c2b2.columbia.edu Fri Oct 26 02:30:56 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Thu, 25 Oct 2007 22:30:56 -0400 Subject: [Biopython-dev] CVS freeze Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B643@mail2.exch.c2b2.columbia.edu> Hi everybody, With the fixed Biopython now in CVS, I'm planning to make the new Biopython release tomorrow (Saturday). 
I'd therefore like to ask you not to make commits to CVS starting from 0:01 GMT on Saturday, until the new release is out. If you make any commits before that time, please double-check that all the Biopython tests still run. Also, if you have some important patches for which you need more time, please let us know. Thanks! --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From bsouthey at gmail.com Fri Oct 26 15:12:14 2007 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 26 Oct 2007 10:12:14 -0500 Subject: [Biopython-dev] CVS freeze In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B643@mail2.exch.c2b2.columbia.edu> References: <6243BAA9F5E0D24DA41B27997D1FD14402B643@mail2.exch.c2b2.columbia.edu> Message-ID: Hi, Just in case you are not aware of it, UniProt is going to make a substantial change to the DE line in SwissProt/TrEMBL format 'Not before: 01-Feb-2008'. See http://www.expasy.org/sprot/relnotes/sp_soon.html#DE Bruce On 10/25/07, Michiel De Hoon wrote: > Hi everybody, > > With the fixed Biopython now in CVS, I'm planning to make the new Biopython > release tomorrow (Saturday). I'd therefore like to ask you not to make > commits to CVS starting from 0:01 GMT on Saturday, until the new release is > out. If you make any commits before that time, please double-check that all > the Biopython tests still run. Also, if you have some important patches for > which you need more time, please let us know. > > Thanks! > > --Michiel. 
> > Michiel de Hoon > Center for Computational Biology and Bioinformatics > Columbia University > 1150 St Nicholas Avenue > New York, NY 10032 > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From biopython-dev at maubp.freeserve.co.uk Fri Oct 26 15:24:57 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Oct 2007 16:24:57 +0100 Subject: [Biopython-dev] DE line in SwissProt/TrEMBL format In-Reply-To: References: <6243BAA9F5E0D24DA41B27997D1FD14402B643@mail2.exch.c2b2.columbia.edu> Message-ID: <472206C9.6060407@maubp.freeserve.co.uk> Bruce Southey wrote: > Hi, > Just in case you are not aware of it, UniProt is going to make a > substantial change to the DE line in SwissProt/TrEMBL format 'Not > before: 01-Feb-2008'. See > http://www.expasy.org/sprot/relnotes/sp_soon.html#DE > > Bruce Thanks for the heads up. I don't think we need to worry about that for the planned release. We should still be able to parse the new files, its just the new structured description will be stored as a single concatenated string in Biopython. Peter From mdehoon at c2b2.columbia.edu Sat Oct 27 03:12:46 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Fri, 26 Oct 2007 23:12:46 -0400 Subject: [Biopython-dev] CVS freeze References: <6243BAA9F5E0D24DA41B27997D1FD14402B643@mail2.exch.c2b2.columbia.edu> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B644@mail2.exch.c2b2.columbia.edu> Thanks for letting us know. I think that it is OK though to make the release now, as we'll probably have another release before the date the SwissProt/TrEMBL format changes. --Michiel. 
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: Bruce Southey [mailto:bsouthey at gmail.com] Sent: Fri 10/26/2007 11:12 AM To: Michiel De Hoon Cc: biopython-dev at biopython.org Subject: Re: [Biopython-dev] CVS freeze Hi, Just in case you are not aware of it, UniProt is going to make a substantial change to the DE line in SwissProt/TrEMBL format 'Not before: 01-Feb-2008'. See http://www.expasy.org/sprot/relnotes/sp_soon.html#DE Bruce On 10/25/07, Michiel De Hoon wrote: > Hi everybody, > > With the fixed Biopython now in CVS, I'm planning to make the new Biopython > release tomorrow (Saturday). I'd therefore like to ask you not to make > commits to CVS starting from 0:01 GMT on Saturday, until the new release is > out. If you make any commits before that time, please double-check that all > the Biopython tests still run. Also, if you have some important patches for > which you need more time, please let us know. > > Thanks! > > --Michiel. > > Michiel de Hoon > Center for Computational Biology and Bioinformatics > Columbia University > 1150 St Nicholas Avenue > New York, NY 10032 > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From mdehoon at c2b2.columbia.edu Sun Oct 28 06:32:40 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Sun, 28 Oct 2007 02:32:40 -0400 Subject: [Biopython-dev] Biopython release 1.44 ready Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> Hi everybody, Biopython release 1.44 is now available for download from the Biopython website at http://biopython.org. 
This release includes lots of code improvements and fixes in the Blast interface and parsers, sequence input/output, the SwissProt parser, the clustering routines, as well as a brand new module for population genetics. For reasons of compatibility, some radical changes were necessary in some parts of the code; please let us know if you find some functionality missing. My thanks to all code contributors who made this new release possible. --Michiel on behalf of the Biopython developers Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From mdehoon at c2b2.columbia.edu Sun Oct 28 06:35:12 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Sun, 28 Oct 2007 02:35:12 -0400 Subject: [Biopython-dev] Non-ASCII character in PopGen documentation Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B646@mail2.exch.c2b2.columbia.edu> While making the 1.44 release, I noticed that a non-ascii character in a formula in the PopGen documentation was causing problems for Hevea. As I couldn't guess what the formula should be, I replaced this formula by a placeholder (see CVS). Can somebody have a look at this and try to fix it? Thanks! --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From biopython-dev at maubp.freeserve.co.uk Sun Oct 28 09:43:56 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sun, 28 Oct 2007 09:43:56 +0000 Subject: [Biopython-dev] Biopython release 1.44 ready In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> Message-ID: <472459DC.3050907@maubp.freeserve.co.uk> Michiel De Hoon wrote: > Hi everybody, > > Biopython release 1.44 is now available for download from the Biopython > website at http://biopython.org. Grand.
Thank you for dedicating your weekend to this Michiel. A couple of questions, the main Wiki page is locked and needs updating to mention the new release. Who has access? Secondly, I see you have updated the open-bio news feed, http://news.open-bio.org/ What about http://biopython.org/news/ - which appears not to have been used at all since it was started. Perhaps we can just have a filtered view of http://news.open-bio.org/ here? Related to this, on the wiki News page perhaps we can just show the last few items from http://news.open-bio.org/index.rdf (even though some of them are for other Bio* projects). Peter From tiagoantao at gmail.com Sun Oct 28 18:24:55 2007 From: tiagoantao at gmail.com (Tiago Antao) Date: Sun, 28 Oct 2007 18:24:55 +0000 Subject: [Biopython-dev] Non-ASCII character in PopGen documentation In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B646@mail2.exch.c2b2.columbia.edu> References: <6243BAA9F5E0D24DA41B27997D1FD14402B646@mail2.exch.c2b2.columbia.edu> Message-ID: <4724D3F7.40105@gmail.com> I currently have a different version of Tutorial.tex here (I have other things already written in preparation for future versions). I don't know how the non-ascii character got there. The formula is: \[ N_{m} = \frac{1 - F_{st}}{4F_{st}} \] Michiel De Hoon wrote: > While making the 1.44 release, I noticed that a non-ascii character in a > formula in the PopGen documentation was causing problems for Hevea. As I > couldn't guess what the formula should be, I replaced this formula by a > placeholder (see CVS). Can somebody have a look at this and try to fix it? > > Thanks! > > --Michiel. 
> > Michiel de Hoon > Center for Computational Biology and Bioinformatics > Columbia University > 1150 St Nicholas Avenue > New York, NY 10032 > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- tiagoantao at gmail.com http://tiago.org/ps From tiagoantao at gmail.com Sun Oct 28 20:54:06 2007 From: tiagoantao at gmail.com (Tiago Antao) Date: Sun, 28 Oct 2007 20:54:06 +0000 Subject: [Biopython-dev] Biopython release 1.44 ready In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> Message-ID: <4724F6EE.50805@gmail.com> Hi, Michiel De Hoon wrote: > This release includes lots of code improvements and fixes in the Blast > interface and parsers, sequence input/output, the SwissProt parser, the > clustering routines, as well as a brand new module for population genetics. > For reasons of compatibility, some radical changes were necessary in some > parts of the code; please let us know if you find some functionality missing. Is it OK to resume uploading non stable code to CVS? I have a few things that I would like to add to the population genetics module (coalescent simulation), but that still needs polishing (mainly documenting and test code). Tiago -- tiagoantao at gmail.com http://tiago.org/ps From biopython-dev at maubp.freeserve.co.uk Sun Oct 28 20:55:42 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sun, 28 Oct 2007 20:55:42 +0000 Subject: [Biopython-dev] Biopython release 1.44 ready In-Reply-To: <4724F6EE.50805@gmail.com> References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> <4724F6EE.50805@gmail.com> Message-ID: <4724F74E.5010801@maubp.freeserve.co.uk> Tiago Antao wrote: > Is it OK to resume uploading non stable code to CVS? 
I have a few things > that I would like to add to the population genetics module (coalescent > simulation), but that still needs polishing (mainly documenting and test > code). Wait and see what Michiel says. However, perhaps you should hold off a few more days - in case there are any teething problems with Biopython 1.44 that would warrant making another release ASAP. Peter From biopython-dev at maubp.freeserve.co.uk Sun Oct 28 19:59:41 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sun, 28 Oct 2007 19:59:41 +0000 Subject: [Biopython-dev] mxTextTools optional? In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> Message-ID: <4724EA2D.3020609@maubp.freeserve.co.uk> Michiel De Hoon wrote: > Hi everybody, > > Biopython release 1.44 is now available for download from the Biopython > website at http://biopython.org. Given some of the changes (deprecations) made in this release, perhaps we can now change setup.py so that mxTextTools is merely suggested, but not required (the same way we treat reportlab and Numeric). Any comments? Peter From mdehoon at c2b2.columbia.edu Mon Oct 29 01:12:48 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Sun, 28 Oct 2007 21:12:48 -0400 Subject: [Biopython-dev] Biopython release 1.44 ready References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> <472459DC.3050907@maubp.freeserve.co.uk> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B647@mail2.exch.c2b2.columbia.edu> > Michiel De Hoon wrote: > > Hi everybody, > > > > Biopython release 1.44 is now available for download from the Biopython > > website at http://biopython.org. > > Grand. Thank you for dedicating your weekend to this Michiel. > > A couple of questions, the main Wiki page is locked and needs updating > to mention the new release. Who has access? Apparently I do. I updated this page now. --Michiel.
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From mdehoon at c2b2.columbia.edu Mon Oct 29 01:57:18 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Sun, 28 Oct 2007 21:57:18 -0400 Subject: [Biopython-dev] Biopython release 1.44 ready References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> <4724F6EE.50805@gmail.com> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B648@mail2.exch.c2b2.columbia.edu> > Is it OK to resume uploading non stable code to CVS? I have a few things > that I would like to add to the population genetics module (coalescent > simulation), but that still needs polishing (mainly documenting and test > code). Sure, go ahead. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From mdehoon at c2b2.columbia.edu Mon Oct 29 02:01:16 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Sun, 28 Oct 2007 22:01:16 -0400 Subject: [Biopython-dev] Biopython release 1.44 ready References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> <4724F6EE.50805@gmail.com> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B649@mail2.exch.c2b2.columbia.edu> On second thought, I agree with Peter's suggestion of waiting a few days to see if any disasters show up with the 1.44 release. Sorry! --Michiel.
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: Tiago Antao [mailto:tiagoantao at gmail.com] Sent: Sun 10/28/2007 4:54 PM To: Michiel De Hoon Cc: biopython-dev at biopython.org Subject: Re: [Biopython-dev] Biopython release 1.44 ready Hi, Michiel De Hoon wrote: > This release includes lots of code improvements and fixes in the Blast > interface and parsers, sequence input/output, the SwissProt parser, the > clustering routines, as well as a brand new module for population genetics. > For reasons of compatibility, some radical changes were necessary in some > parts of the code; please let us know if you find some functionality missing. Is it OK to resume uploading non stable code to CVS? I have a few things that I would like to add to the population genetics module (coalescent simulation), but that still needs polishing (mainly documenting and test code). Tiago -- tiagoantao at gmail.com http://tiago.org/ps From mdehoon at c2b2.columbia.edu Mon Oct 29 02:02:12 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Sun, 28 Oct 2007 22:02:12 -0400 Subject: [Biopython-dev] mxTextTools optional? References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> <4724EA2D.3020609@maubp.freeserve.co.uk> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B64A@mail2.exch.c2b2.columbia.edu> The fewer software packages Biopython requires, the better, to keep things simple for users, not to mention developers. For a future release, we should check if the modules that still rely on mxTextTools can be replaced by pure-Python code. --Michiel. 
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: biopython-dev-bounces at lists.open-bio.org on behalf of Peter Sent: Sun 10/28/2007 3:59 PM To: biopython-dev at lists.open-bio.org Subject: Re: [Biopython-dev] mxTextTools optional? Michiel De Hoon wrote: > Hi everybody, > > Biopython release 1.44 is now available for download from the Biopython > website at http://biopython.org. Given some of the changes (deprecations) made in this release, perhaps we can now change setup.py so that mxTextTools is merely suggested, but not required (the same way we treat reportlab and Numeric). Any comments? Peter _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From bugzilla-daemon at portal.open-bio.org Mon Oct 29 17:21:03 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 29 Oct 2007 13:21:03 -0400 Subject: [Biopython-dev] [Bug 2390] New: Error importing Swiss Prot in BioSQL Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2390 Summary: Error importing Swiss Prot in BioSQL Product: Biopython Version: Not Applicable Platform: Macintosh OS/Version: MacOS X Status: NEW Severity: normal Priority: P2 Component: BioSQL AssignedTo: biopython-dev at biopython.org ReportedBy: Biosql at hotmail.com Hello, I already submitted this problem in the mailing list, where I can't import the SwissProt flat file in BioSQL, even with the new version (1.44) of Biopython.
Here's the script I'm using:

from BioSQL import BioSeqDatabase
from Bio.SwissProt import SProt

server = BioSeqDatabase.open_database(driver = 'MySQLdb', user = '', passwd = '', host = 'localhost', db = 'bioseqdb')
s_parser = SProt.SequenceParser()
s_iterator = SProt.Iterator(open('path to/uniprot_sprot.dat', 'r'), s_parser)
db = server.new_database('Swiss')
db.load(s_iterator)

And here's the error:

Traceback (most recent call last):
  File '', line 1, in
  File '/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py', line 414, in load
    db_loader.load_seqrecord(cur_record)
  File '/sw/lib/python2.5/site-packages/BioSQL/Loader.py', line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File '/sw/lib/python2.5/site-packages/BioSQL/Loader.py', line 250, in _load_bioentry_table
    version))
  File '/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py', line 277, in execute
    self.cursor.execute(sql, args or ())
  File '/sw/lib/python2.5/site-packages/MySQLdb/cursors.py', line 151, in execute
    query = query % db.literal(args)
TypeError: not all arguments converted during string formatting

Thanks for the help ! Jonathan -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Oct 29 17:23:54 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 29 Oct 2007 13:23:54 -0400 Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL In-Reply-To: Message-ID: <200710291723.l9THNsun017818@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2390 ------- Comment #1 from Biosql at hotmail.com 2007-10-29 13:23 EST ------- Created an attachment (id=799) --> (http://bugzilla.open-bio.org/attachment.cgi?id=799&action=view) Sample of Swiss Prot flat file -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Oct 29 19:19:01 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 29 Oct 2007 15:19:01 -0400 Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL In-Reply-To: Message-ID: <200710291919.l9TJJ1O2026999@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2390 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |biopython- | |bugzilla at maubp.freeserve.co. | |uk ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-29 15:19 EST ------- I'm trying to narrow down the problem: * Have you tried different input SwissProt files? * Have you tried a GenBank file (using the GenBank parser)? * Did you check the username/password as suggested on the mailing list (empty strings look wrong to me)? Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython-dev at maubp.freeserve.co.uk Mon Oct 29 19:58:45 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Mon, 29 Oct 2007 19:58:45 +0000 Subject: [Biopython-dev] BioRegistry, Bio.db Message-ID: <320fb6e00710291258v533fed71u490ec1aadff3359c@mail.gmail.com> While looking over the Tutorial this evening (and making some sequence related updates), I noticed that the section "BioRegistry - automatically finding sequence sources" (in the Cook Book chapter) doesn't work anymore. I believe that Bio.db is setup by the complicated and un-commented code in Bio/__init__.py by calling Bio/config/DBRegistry.py - this was commented out for Biopython 1.44. Does anyone use this module? I've never really looked at it in depth, but it looks interesting and perhaps worth saving. Note if we do want to resurrect it, it needs a unit test. At first glance, the only Martel dependency here is for recognising error conditions and giving nice messages instead. If that's all it is used for, then perhaps we can switch to regular expressions instead. Peter From biopython-dev at maubp.freeserve.co.uk Mon Oct 29 21:39:50 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Mon, 29 Oct 2007 21:39:50 +0000 Subject: [Biopython-dev] Removing deprecated functionality In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B640@mail2.exch.c2b2.columbia.edu> References: <6243BAA9F5E0D24DA41B27997D1FD14402B640@mail2.exch.c2b2.columbia.edu> Message-ID: <320fb6e00710291439t6f636964i9681e2b0c90e6c96@mail.gmail.com> On 10/24/07, Michiel De Hoon wrote: > Bio.Kabat and ,,, were deprecated in previous Biopython. .... > I am planning to remove this functionality for release 1.44 I see you removed the files Bio/Kabat/*.py for Biopython 1.44, but is it OK if we remove the now empty directory as well?
Peter From mdehoon at c2b2.columbia.edu Tue Oct 30 01:06:38 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Mon, 29 Oct 2007 21:06:38 -0400 Subject: [Biopython-dev] Removing Bio.Kabat References: <320fb6e00710291438x3f7d7d57t77b06e4b2221c470@mail.gmail.com> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B64E@mail2.exch.c2b2.columbia.edu> As far as I know, it is not possible to remove a directory in CVS. See http://www.thathost.com/wincvs-howto/cvsdoc/cvs_7.html#SEC69 I believe that it is possible to remove a directory by hand from the CVS source tree, but it is not the official way to do it. Hopefully, we can remove directories once we're using SVN. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: Peter Cock [mailto:p.j.a.cock at googlemail.com] Sent: Mon 10/29/2007 5:38 PM To: Michiel De Hoon Cc: biopython-dev at biopython.org Subject: Re: [Biopython-dev] Removing Bio.Kabat On 10/24/07, Michiel De Hoon wrote: > Bio.Kabat and ,,, were deprecated in previous Biopython. .... > I am planning to remove this functionality for release 1.44 I see you removed the files Bio/Kabat/*.py for Biopython 1.44, but is it OK if we remove the now empty directory as well? 
Peter From bugzilla-daemon at portal.open-bio.org Tue Oct 30 12:25:20 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 30 Oct 2007 08:25:20 -0400 Subject: [Biopython-dev] [Bug 2386] Bio.Seq.Seq and MutableSeq count() method only works for single residues In-Reply-To: Message-ID: <200710301225.l9UCPKjo026963@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2386 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-30 08:25 EST ------- Created an attachment (id=800) --> (http://bugzilla.open-bio.org/attachment.cgi?id=800&action=view) Patch to Bio/Seq.py count methods Lets the Seq and MutableSeq count methods take either a single letter or a multiple letter argument, which can be strings, Seq objects or MutableSeq objects. Adds doc-strings Includes a trivial mini-test which would be used in the Seq unit test instead. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From chris.lasher at gmail.com Tue Oct 30 14:17:29 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Tue, 30 Oct 2007 10:17:29 -0400 Subject: [Biopython-dev] Biopython SVN Transition OK'd Message-ID: <128a885f0710300717p7d91a4adjfaddc9c496974e67@mail.gmail.com> Hi all, Biopython just got the okay from OpenBio to transition from CVS to Subversion--a good step in the right direction (though I've recently started transitioning from SVN to Bazaar VCS). All we have to do is come up with a date when the CVS repository can be locked down and taken offline. Also, I need to know what is needed from me in terms of helping all the devs migrate to SVN. 
I produced a screencast series on Subversion at http://showmedo.com/videos/series?name=bfNi2X3Xg and there is a transition guide at http://svnbook.red-bean.com/en/1.4/svn.forcvs.html Would providing links to these on the wiki be sufficient? What further information would you like to know? Subversion is not a radical departure from CVS and many of the commands are a one-to-one mapping. The biggest difference is commits occur for the whole repository, not on a per-file basis, and directories are tracked, as well. Let's get a discussion on this and set a date soon. Chris From bugzilla-daemon at portal.open-bio.org Tue Oct 30 14:25:01 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 30 Oct 2007 10:25:01 -0400 Subject: [Biopython-dev] [Bug 2386] Bio.Seq.Seq and MutableSeq count() method only works for single residues In-Reply-To: Message-ID: <200710301425.l9UEP19U002945@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2386 ------- Comment #2 from dalloliogm at gmail.com 2007-10-30 10:25 EST ------- The new code is good, but please consider about implementing case-insensitive searches: >>> Seq('AACCCCaa').count('a') ... 2 >>> Seq('AACCCCaa').count('a', 'i') ... 4 they could be useful in many cases, because sometimes one has to deal mixed-case sequences. I think the easiest way to implement this would be by using regular expressions.. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
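A minimal sketch of the case-insensitive count proposed in comment #2 above, assuming the sequence is first turned into a plain Python string. The helper name count_ignore_case is hypothetical and is not part of Bio.Seq; like the string count method it counts non-overlapping matches.

```python
import re


def count_ignore_case(sequence, sub):
    """Count non-overlapping occurrences of sub in sequence, ignoring case.

    Hypothetical helper for illustration only; it works on anything that
    str() turns into the sequence letters.
    """
    # re.escape() keeps letters in sub from being read as regex syntax.
    pattern = re.compile(re.escape(str(sub)), re.IGNORECASE)
    return len(pattern.findall(str(sequence)))


print(count_ignore_case("AACCCCaa", "a"))     # 4
print(count_ignore_case("atgATGAtg", "ATG"))  # 3
```

For plain strings the same result could also be had with sequence.upper().count(sub.upper()); the regular expression version is shown only because that is the approach the comment suggests.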
From bugzilla-daemon at portal.open-bio.org Tue Oct 30 18:02:49 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 30 Oct 2007 14:02:49 -0400 Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL In-Reply-To: Message-ID: <200710301802.l9UI2n1J020073@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2390 ------- Comment #3 from Biosql at hotmail.com 2007-10-30 14:02 EST ------- (In reply to comment #2) > I'm trying to narrow down the problem: > * Have you tried different input SwissProt files? > * Have you tried a GenBank file (using the GenBank parser)? > * Did you check the username/password as suggested on the mailing list (empty > strings look wrong to me)? > > Peter > I'm sorry Peter, the reply you sent me on the mailing list was cut in half and I didn't see the rest of your message until I read it directly on the mailing list. I tried to parse the cor6_6.gb with the Genbank parser and I'm getting the same result, sorry I didn't try this before.
I also tried what you suggested with the SeqIO module with the cor6_6.gb and also a SwissProt file and I'm still getting the same TypeError, which is:

Traceback (most recent call last):
  File "DB_Gen.py", line 25, in
    db.load(iterator)
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 414, in load
    db_loader.load_seqrecord(cur_record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 250, in _load_bioentry_table
    version))
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 277, in execute
    self.cursor.execute(sql, args or ())
  File "build/bdist.macosx-10.4-ppc/egg/MySQLdb/cursors.py", line 151, in execute
TypeError: not all arguments converted during string formatting

It seems to me that the problem could be with the MySQLdb module, but I don't understand since I'm using the latest release 1.2.2c1, but I've also tried it with the stable 1.2.2 release. Am I right ? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 30 19:06:38 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 30 Oct 2007 15:06:38 -0400 Subject: [Biopython-dev] [Bug 2386] Bio.Seq.Seq and MutableSeq count() method only works for single residues In-Reply-To: Message-ID: <200710301906.l9UJ6cDZ023596@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2386 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-30 15:06 EST ------- I really don't want to make the Seq count method different to the python string count method. Speaking of which, the string uses count(sub [, start[, end]]) to allow searching with an optional start and further optional end index.
I should probably add that. In the case of single letter searches, my_seq.count("A") + my_seq.count("a") is a simple enough way of doing things. Counting case insensitive variants of a longer sub-sequence like "ATG" wouldn't be so easy. It would be nice if the python re library would work directly on Seq objects (without having to explicitly turn them into strings first). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 30 19:06:52 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 30 Oct 2007 15:06:52 -0400 Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL In-Reply-To: Message-ID: <200710301906.l9UJ6q7l023634@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2390 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-30 15:06 EST ------- Thanks for that. It looks like we can *probably* rule out a problem in the sequence parsing. Unfortunately I personally haven't used BioSQL myself (yet), and don't have a system setup here I can try this on. It appears (just from reading the stack error) that there is some mis-match between the SQL query (which I assume contains python % placeholders) and the list of arguments (to go in these placeholders). If you fancy trying to investigate this further yourself, I would start by adding a break point on BioSQL/BioSeqDatabase.py line 277 to check what the contents of the sql and args variables are. Or, just add some print statements just before line 277: self.cursor.execute(sql, args or ()) I hope someone else on the mailing list will have some suggestions...
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 30 19:22:30 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 30 Oct 2007 15:22:30 -0400 Subject: [Biopython-dev] [Bug 2386] Bio.Seq.Seq and MutableSeq count() method only works for single residues In-Reply-To: Message-ID: <200710301922.l9UJMUoM024725@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2386 ------- Comment #4 from howard.salis at gmail.com 2007-10-30 15:22 EST ------- How about the upper and lower methods for Seq classes? Then, one could do my_seq.upper().count("ATG") Would that work well? -Howard (In reply to comment #3) > I really don't want to make the Seq count method different to the python string > count method. > > Speaking of which, the string uses count(sub [, start[, end]]) to allow > searching with a optional start and further optional end index. I should > probably add that. > > In the case of single letter searches, my_seq.count("A") + my_seq.count("a") is > a simple enough way of doing things. Counting case insensistive variants of a > longer sub-sequence like "ATG" wouldn't be so easy. I would be nice if the > python re library would work directly on Seq objects (without having to > explicitly turn them into strings first). > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython-dev at maubp.freeserve.co.uk Tue Oct 30 19:30:29 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 30 Oct 2007 19:30:29 +0000 Subject: [Biopython-dev] Biopython SVN Transition In-Reply-To: <128a885f0710300717p7d91a4adjfaddc9c496974e67@mail.gmail.com> References: <128a885f0710300717p7d91a4adjfaddc9c496974e67@mail.gmail.com> Message-ID: <47278655.8090300@maubp.freeserve.co.uk> Chris Lasher wrote: > Hi all, > > Biopython just got the okay from OpenBio to transition from CVS to > Subversion--a good step in the right direction (though I've recently > started transitioning from SVN to Bazaar VCS). All we have to do is > come up with a date when the CVS repository can be locked down and > taken offline. I was wondering if anyone would start suggesting moving to git or something else ;) Michiel - are you expecting any complications from CVS to SVN regarding the build process? Another thought; will existing developer accounts "just work" on the SVN system? Also do you (Chris) have CVS access, and if not do you need or want it? > Also, I need to know what is needed from me in terms of helping all > the devs migrate to SVN. I produced a screencast series on Subversion > at http://showmedo.com/videos/series?name=bfNi2X3Xg and there is a > transition guide at http://svnbook.red-bean.com/en/1.4/svn.forcvs.html Sadly that didn't play with gnash 0.8, and I don't have Adobe's Flash plugin working on my 64bit Ubuntu. I'll have to check that out on Windows later in the week :) If you are able to field any queries on the mailing list, that would probably be fine. > Would providing links to these on the wiki be sufficient? If you could look after that aspect of the wiki, that would be great. > What further information would you like to know? Subversion is not a > radical departure from CVS and many of the commands are a one-to-one > mapping. 
The biggest difference is commits occur for the whole > repository, not on a per-file basis, and directories are tracked, as > well. The fact that CVS and SVN are relatively similar is probably one reason why no-one has raised any real objections to the move. > Let's get a discussion on this and set a date soon. In terms of timing, how long do you/the OBF guys expect the transfer to take? And would they prefer to do this over a weekend or mid week? Barring any problems with Biopython 1.44 which would force us to do another release in the very short term, I guess in the next fortnight is reasonable (especially if we only expect a couple of days downtime). Of course, I personally want to start working on the Seq objects and alignments - and Tiago wants to get back to his Population Genetics module. Peter P.S. Would you or any of the people doing the transition be able to sort out bug 2363? http://bugzilla.open-bio.org/show_bug.cgi?id=2363 From bugzilla-daemon at portal.open-bio.org Tue Oct 30 19:33:40 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 30 Oct 2007 15:33:40 -0400 Subject: [Biopython-dev] [Bug 2386] Bio.Seq.Seq and MutableSeq count() method only works for single residues In-Reply-To: Message-ID: <200710301933.l9UJXedO025330@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2386 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-30 15:33 EST ------- Adding .upper() and .lower() methods is on my mental todo list, just a bit lower down my priorities than the .count() method (this bug) and biological methods covered on bug 2381. One of us should file an enhancement bug for .upper() and .lower(). I agree they are needed to make the Seq object more string like. However the implementation is non-trivial due to the alphabet object (which may define a case sensitive list of expected letters).
And yes, once these methods are supported then doing my_seq.upper().count("ATG") would work fine. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Oct 30 19:45:35 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 30 Oct 2007 15:45:35 -0400 Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string, even subclass string? In-Reply-To: Message-ID: <200710301945.l9UJjZlQ026374@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2351 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|Make SeqRecord subclass Seq |Make Seq more like a string, |subclass string? |even subclass string? ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-30 15:45 EST ------- I modified the title to focus on the Seq object. See also bug 2386 (about the count method), and bug 2381 (about biological methods). (In reply to comment #4) > (In reply to comment #3) > > It does not add any .short() method to give a truncated representation > > string like the current str() method gives. > > Why not? This new method should not cause any compatibility problem Mainly because I'm not convinced that we need a .short() method, and it's harder to remove things at a later date (as people may be using them). Surely my_seq[:50] or depending on the context, str(my_seq[:50]), is enough? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Oct 30 22:32:12 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 30 Oct 2007 18:32:12 -0400 Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL In-Reply-To: Message-ID: <200710302232.l9UMWCb3004960@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2390 ------- Comment #5 from Biosql at hotmail.com 2007-10-30 18:32 EST ------- It seems that a %s is missing at line 243 in Loader.py, since there are only 6 %s in the sql query, but 7 arguments are being fed for the loading of bioentry. So I added a %s and the loading is fine, but another problem arises after this:

Traceback (most recent call last):
  File "DB_Gen.py", line 25, in
    db.load(iterator)
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 415, in load
    db_loader.load_seqrecord(cur_record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 253, in _load_bioentry_table
    bioentry_id = self.adaptor.last_id('bioentry')
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 148, in last_id
    return self.dbutils.last_id(self.cursor, table)
  File "/sw/lib/python2.5/site-packages/BioSQL/DBUtils.py", line 35, in last_id
    return cursor.insert_id()
AttributeError: 'Cursor' object has no attribute 'insert_id'

I'm gonna check it tomorrow. Jonathan -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
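The mismatch described in comment #5 can be reproduced without a database, since the earlier traceback shows MySQLdb building the statement with plain Python string formatting (query % db.literal(args)). The following is only a sketch: the SQL text, table name and argument values are made up for illustration and are not the real statement from BioSQL/Loader.py, and format_error is a hypothetical helper.

```python
def format_error(sql, args):
    """Return the TypeError message raised by sql % args, or None if it formats cleanly."""
    try:
        sql % args
    except TypeError as err:
        return str(err)
    return None


# Hypothetical values and statements, NOT the real query from BioSQL/Loader.py.
args = (1, 2, "name", "accession", "identifier", "description", 0)     # 7 values
sql_six = "INSERT INTO example VALUES (%s, %s, %s, %s, %s, %s)"        # 6 placeholders
sql_seven = "INSERT INTO example VALUES (%s, %s, %s, %s, %s, %s, %s)"  # 7 placeholders

for sql in (sql_six, sql_seven):
    # Comparing the two counts is the same check the print statements
    # suggested in comment #4 would allow by eye.
    print(sql.count("%s"), "placeholders,", len(args), "arguments ->",
          format_error(sql, args))
```

With six placeholders and seven arguments Python reports "not all arguments converted during string formatting", the same message as in the bug report; with seven placeholders the formatting succeeds.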
From biopython-dev at maubp.freeserve.co.uk Tue Oct 30 22:36:43 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 30 Oct 2007 22:36:43 +0000 Subject: [Biopython-dev] BioRegistry, Bio.db In-Reply-To: <320fb6e00710291258v533fed71u490ec1aadff3359c@mail.gmail.com> References: <320fb6e00710291258v533fed71u490ec1aadff3359c@mail.gmail.com> Message-ID: <4727B1FB.2020803@maubp.freeserve.co.uk> Peter wrote: > While looking over the Tutorial this evening (and making some sequence > related updates), I noticed that the section "BioRegistry - > automatically finding sequence sources" (in the Cook Book chapter) > doesn't work any more. Does anyone here use this? Should I ask on the main list? > I believe that Bio.db is setup by the complicated and un-commented > code in Bio/__init__.py by calling Bio/config/DBRegistry.py - this was > commented out for Biopython 1.44 Confirmed. After uncommenting the call to _load_registries() in Bio/__init__.py the example in the tutorial using Bio.db works. Note you do get a DeprecationWarning about the concurrent behaviour provided by Bio.MultiProc, but I have not explored any further. Thoughts? Peter From mdehoon at c2b2.columbia.edu Wed Oct 31 01:05:22 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue, 30 Oct 2007 21:05:22 -0400 Subject: [Biopython-dev] Biopython SVN Transition References: <128a885f0710300717p7d91a4adjfaddc9c496974e67@mail.gmail.com> <47278655.8090300@maubp.freeserve.co.uk> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B64F@mail2.exch.c2b2.columbia.edu> > Michiel - are you expecting any complications from CVS to SVN regarding > the build process? For the build process, we are not doing anything very complicated with CVS, so I doubt that there will be any major problems when we start using SVN. --Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

From bugzilla-daemon at portal.open-bio.org Wed Oct 31 01:30:20 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 21:30:20 -0400
Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string, even subclass string?
In-Reply-To:
Message-ID: <200710310130.l9V1UKEN014287@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2351

------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2007-10-30 21:30 EST -------
First, let's think about what a Seq object should look like, before getting into implementation details.

In my opinion, a Seq object is essentially a string, but with some added functionality that is useful in biological contexts. Currently, this is limited to specifying an alphabet. Personally, I never used such an alphabet, so in practice I prefer using a simple string instead of a Seq object.
However, if we extend its functionality, I think a Seq class can be useful enough to warrant its existence in Biopython.

In short, to my mind a Seq object should have the following properties:

1) A Seq object is basically a string, so it should behave as if it were subclassed from string.

2) As a result, functions that have a sequence as an argument, but don't need the added features of a Seq object, should work with strings as well as Seq objects.

3) The sequence should be mutable, so that we won't need a separate MutableSeq class. This also implies that a Seq class cannot subclass from string, since strings are not mutable.

4) Currently, Seq objects have an associated alphabet; SeqRecord objects have annotations, dbxrefs, a description, features, id, and name. I think a new Seq object should have both, so that we can avoid having both a Seq and a SeqRecord class. Of course, some or all of these fields can remain None.

5) A Seq class should have methods that one expects from a sequence class, in particular complement(), reverse_complement(), perhaps a modified count() that can ignore case.

With respect to 3), we'd probably have to write such a Seq class in C.

The end result would be a Seq class that actually has some benefit to the user, without requiring its use when a string suffices, and avoids having three classes (Seq, MutableSeq, SeqRecord) for essentially the same thing.

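Properties 1), 2) and 5) from Comment #6 can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions: SimpleSeq and gc_content are hypothetical names, not Biopython's Seq class or any existing Bio.SeqUtils function, the class is immutable (so it does not address property 3), it carries no annotation (property 4), and only the unambiguous DNA letters are complemented.

```python
# Translation table for unambiguous DNA letters only (an assumption of this
# sketch; any other character is left unchanged).
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


class SimpleSeq:
    """Hypothetical string-like sequence class (not Biopython's Seq)."""

    def __init__(self, data):
        # Accept a plain string or anything whose str() is the sequence.
        self._data = str(data)

    def __str__(self):
        return self._data

    def __repr__(self):
        return "SimpleSeq(%r)" % self._data

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        # Slices give a new SimpleSeq; integer indexing gives a single letter.
        result = self._data[index]
        return SimpleSeq(result) if isinstance(index, slice) else result

    def complement(self):
        return SimpleSeq(self._data.translate(_DNA_COMPLEMENT))

    def reverse_complement(self):
        return SimpleSeq(self._data.translate(_DNA_COMPLEMENT)[::-1])


def gc_content(seq):
    """Fraction of G and C letters; accepts a plain string or a SimpleSeq."""
    text = str(seq).upper()
    if not text:
        return 0.0
    return (text.count("G") + text.count("C")) / len(text)
```

Because gc_content() only relies on str(seq), the same function serves callers who pass plain strings and callers who pass sequence objects, which is the behaviour property 2) asks for.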
From chris.lasher at gmail.com Wed Oct 31 05:55:03 2007
From: chris.lasher at gmail.com (Chris Lasher)
Date: Wed, 31 Oct 2007 01:55:03 -0400
Subject: [Biopython-dev] Biopython SVN Transition
In-Reply-To: <47278655.8090300@maubp.freeserve.co.uk>
References: <128a885f0710300717p7d91a4adjfaddc9c496974e67@mail.gmail.com> <47278655.8090300@maubp.freeserve.co.uk>
Message-ID: <128a885f0710302255y4c34ac8axa48f48b253d5854a@mail.gmail.com>

On 10/30/07, Peter wrote:
> I was wondering if anyone would start suggesting moving to git or
> something else ;)

I tried Git and didn't like it. Bazaar suits me much better, and it even has support for SVN repositories with bzr-svn. Git is not truly cross-platform. It performs terribly on Windows. This left me looking at Mercurial (Hg) and Bazaar (bzr). I liked the direction that Bazaar was moving in and their emphasis on testing with real unit/regression tests. For those interested, you can see some of the "literature" I read through on my del.icio.us page: http://del.icio.us/gotgenes/dscm

> Another thought; will existing developer accounts "just work" on the SVN
> system? Also do you (Chris) have CVS access, and if not do you need or
> want it?

The existing developer accounts will "just work" because they're going to do SVN over SSH. I have SSH access on the machine and CVS access as well. Thanks for checking.

> > Also, I need to know what is needed from me in terms of helping all
> > the devs migrate to SVN. I produced a screencast series on Subversion
> > at http://showmedo.com/videos/series?name=bfNi2X3Xg and there is a
> > transition guide at http://svnbook.red-bean.com/en/1.4/svn.forcvs.html
>
> Sadly that didn't play with gnash 0.8, and I don't have Adobe's Flash
> plugin working on my 64bit Ubuntu. I'll have to check that out on
> Windows later in the week :)

Bummer! Does nspluginwrapper not work?

> If you are able to field any queries on the mailing list, that would
> probably be fine.

I'd be happy to do that.
Should this page be renamed to SVN to be in line with the CVS page?

> > Would providing links to these on the wiki be sufficient?
>
> If you could look after that aspect of the wiki, that would be great.

At some point I had started this: http://biopython.org/wiki/Subversion_migration

> > What further information would you like to know? Subversion is not a
> > radical departure from CVS and many of the commands are a one-to-one
> > mapping. The biggest difference is commits occur for the whole
> > repository, not on a per-file basis, and directories are tracked, as
> > well.
>
> The fact the CVS and SVN are relatively similar is probably one reason
> why no-one has raised any real objections to the move.
>
> > Let's get a discussion on this and set a date soon.
>
> In terms of timing, how long do you/the OBF guys expect the transfer to
> take? And would they prefer to do this over a weekend or mid week?

Not sure, let me ask Jason Stajich.

> Barring any problems with Biopython 1.44 which would force us to do
> another release in the very short term, I guess in the next fortnight is
> reasonable (especially if we only expect a couple of days downtime).

I think we could expect less than a full day of downtime.

> Of course, I personally want to start working on the Seq objects and
> alignments - and Tiago wants to get back to his Population Genetics module.

By all means, continue using CVS until I get a firm date for the Biopython Devs. Even if you have uncommitted changes when the CVS server goes down, you can simply copy the files to your checked out copy of the SVN repository and continue as is.

> P.S. Would you or any of the people doing the transition be able to sort
> out bug 2363?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2363

That's a very good question. I wonder if cvs2svn is capable of picking up those errors in commits and choosing the proper format.
I had trouble getting hold of an expert who could tell me how to identify files committed as binary files, and how to change that to text (or vice versa). I should send an email to the Subversion mailing list, perhaps, or the CVS list if it's still active. I'll also check to see if Jason knows.

From bugzilla-daemon at portal.open-bio.org Wed Oct 31 09:54:24 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 31 Oct 2007 05:54:24 -0400
Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string, even subclass string?
In-Reply-To:
Message-ID: <200710310954.l9V9sOw7014572@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2351

------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-31 05:54 EST -------
> In short, to my mind a Seq object should have the following properties:
> 1) A Seq object is basically a string, so it should behave as if it were
> subclassed from string.

I agree, where possible the Seq object should act like a string. In particular str(my_seq) should give the full string.

> 2) As a result, functions that have a sequence as an argument, but don't
> need the added features of a Seq object, should work with strings as well
> as Seq objects.

Again, I agree. I've double checked this works for some of the recently updated SeqUtils functionality. I would hope we get this "for free" once the Seq object itself becomes more string like.

> 3) The sequence should be mutable, so that we won't need a separate
> MutableSeq class. This also implies that a Seq class cannot subclass from
> string, since strings are not mutable.

Why? Python strings are not mutable, and this isn't usually a problem. Personally, I have never needed a mutable sequence and have only ever used them in test cases. Having the basic Seq non-mutable means we can leverage existing string functionality and optimizations.
Also, writing a new mutable sequence in C seems like a bit of a maintenance load in the long term (and may complicate the cross-platform build process). Surely we can get good enough performance via the array of characters route currently used?

On a related note: the fact that the current MutableSeq methods like reverse_complement() work in situ rather than returning a new object makes switching between the Seq and MutableSeq fiddly.

> 4) Currently, Seq objects have an associated alphabet; SeqRecord objects
> [also] have annotations, dbxrefs, a description, features, id, and name.
> I think a new Seq object should have both, so that we can avoid having both
> a Seq and a SeqRecord class. Of course, some or all of these fields can
> remain None.

I don't really see the benefit over the current scheme. I'm happy with the division between Seq and SeqRecord, but we could go for SeqRecord being a more annotated subclass of the Seq class. This would be similar to Bioperl's Seq, PrimarySeq, or RichSeq objects.

Something I do want to add is slicing for SeqRecords, which would return a new SeqRecord with sensible name/id/description. I think for this to really be useful we need to add "per residue annotation", such as lists or strings of information the same length as the sequence (e.g. predicted secondary structure, or sequencing quality scores) which would also get sliced when slicing a SeqRecord.

> 5) A Seq class should have methods that one expects from a sequence class,
> in particular complement(), reverse_complement(), perhaps a modified count()
> that can ignore case.

Usually mixed case sequences are used for a reason, and the user may need both case sensitive counts and case insensitive counts. I would keep .count() case sensitive like a real string, and suggest .upper().count() as a simple workaround for case-insensitive counts.
Plus the Seq object should have methods for forward and back transcription and translation, see Bug 2381.

A more drastic change we could consider is getting rid of the alphabet as an explicit property, and having ProteinSeq, NucleotideSeq, DnaSeq and RnaSeq (decorator/sub)classes which would have only the relevant biological sequence methods. We would lose the expected "letters" feature of the alphabet, but I don't think this is really helpful at the moment because the Seq class does not enforce it. Otherwise I would advocate that when creating a Seq object (or editing a MutableSeq object) the new letters should be screened against self.alphabet.letters (if present).

On balance I favour making gradual changes which don't change the current scheme (Seq with Alphabet property; SeqRecord with Seq property). Anything more drastic might best be pursued on a new branch which could become Biopython 2.0.

P.S. We should try not to implicitly assume that the elements in a sequence are single letters. What about when working with protein structures which contain modified amino acids (with defined three letter codes) which do not map back to single letters?

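Two of the suggestions in Comment #7 are easy to show with plain strings: case-insensitive counting via .upper().count(), and screening new letters against an alphabet's letters when one is defined. This is a minimal sketch with hypothetical helper names (count_letter, validate_letters); the letters string "ACGT" is illustrative and is not taken from any Biopython alphabet class.

```python
def count_letter(seq, letter, ignore_case=False):
    """Count occurrences of letter in seq (a string or string-like object).

    Case sensitive by default, like str.count(); with ignore_case=True both
    the sequence and the letter are upper-cased first.
    """
    text = str(seq)
    if ignore_case:
        return text.upper().count(letter.upper())
    return text.count(letter)


def validate_letters(seq, letters):
    """Raise ValueError if seq uses characters outside the allowed letters.

    letters=None means no letters are defined, so nothing is checked.
    """
    if letters is None:
        return
    unexpected = sorted(set(str(seq)) - set(letters))
    if unexpected:
        raise ValueError("unexpected letters: %s" % "".join(unexpected))


mixed_case = "ACGTacgtNN"
exact_count = count_letter(mixed_case, "A")                      # only the upper case A
folded_count = count_letter(mixed_case, "A", ignore_case=True)   # A and a together
```

Keeping the default case sensitive preserves string-like behaviour, while the explicit flag makes the case-folding choice visible at the call site.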