From bugzilla-daemon at portal.open-bio.org Tue Oct 2 05:09:48 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 2 Oct 2007 05:09:48 -0400
Subject: [Biopython-dev] [Bug 2362] test_copen fails on Windows XP as tries
os.fork()
In-Reply-To:
Message-ID: <200710020909.l9299moD015903@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2362
mdehoon at ims.u-tokyo.ac.jp changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2007-10-02 05:09 EST -------
I removed test_copen.py from CVS and deprecated the Bio.MultiProc code.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From mdehoon at c2b2.columbia.edu Tue Oct 2 05:06:54 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 2 Oct 2007 05:06:54 -0400
Subject: [Biopython-dev] [BioPython] Bio.MultiProc
References: <46E6A845.3030601@c2b2.columbia.edu>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
Hi everybody,
Since no users of Bio.MultiProc came forward, I deprecated it for the
upcoming release.
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
-----Original Message-----
From: biopython-bounces at lists.open-bio.org on behalf of Michiel De Hoon
Sent: Tue 9/11/2007 10:37 AM
To: BioPython Developers List; biopython at biopython.org
Subject: [BioPython] Bio.MultiProc
Hi everybody,
In preparation for the upcoming release, I was running the Biopython
test suite and found that test_copen.py hangs on Cygwin. It doesn't
fail, it just sits there forever. This may be related to the use of
fork() instead of select() in Bio/MultiProc/copen.py. Anyway, while it
is probably possible to fix this, I'd have to dig fairly deep into the
code, and I am not sure if it is worth it. It looks like the copen
functions are used only in Bio/config, which is needed for Bio.db. A
description of the functionality of this module can be found in the
tutorial section 4.7.2.
Now, I don't remember users asking about this module on the mailing
list. From the tutorial documentation, it seems to be a nice piece of
code, but I doubt that it is being used often in practice.
So I was wondering:
1) Is anybody on this list using this code?
2) If not, can I mark it as deprecated for the upcoming release?
Hopefully, people who are using this code will notice, and let us know
that they need it.
--Michiel.
_______________________________________________
BioPython mailing list - BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython
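Had test_copen.py been kept, one option would have been to skip fork-based tests on platforms without os.fork() (native Windows) instead of letting them fail. The sketch below is hypothetical: run_in_child and ChildProcessTest are illustrative names, not Bio.MultiProc code. It runs a function in a forked child, collects its (small) output through a pipe, and uses unittest.skipUnless to skip the test where os.fork is missing. It would not, by itself, explain or fix the Cygwin hang.

```python
import os
import unittest

# os.fork() exists on POSIX systems (and Cygwin) but not on native Windows.
HAS_FORK = hasattr(os, "fork")


def run_in_child(func):
    """Run func() in a forked child process and return the bytes it writes back."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: send the result through the pipe, then exit without cleanup.
        os.close(read_fd)
        status = 1
        try:
            os.write(write_fd, func())
            status = 0
        finally:
            os._exit(status)
    # Parent: read until the child closes its end of the pipe.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)


@unittest.skipUnless(HAS_FORK, "os.fork() is not available on this platform")
class ChildProcessTest(unittest.TestCase):
    def test_round_trip(self):
        self.assertEqual(run_in_child(lambda: b"hello"), b"hello")


if __name__ == "__main__":
    unittest.main()
```

On Windows the test is reported as skipped rather than failed, so the rest of the suite still runs.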
From idoerg at gmail.com Tue Oct 2 12:00:41 2007
From: idoerg at gmail.com (Iddo Friedberg)
Date: Tue, 2 Oct 2007 09:00:41 -0700
Subject: [Biopython-dev] [BioPython] Bio.MultiProc
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
References: <46E6A845.3030601@c2b2.columbia.edu>
<6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
Message-ID:
Would it be possible to include the module, comment out the unworkable
source code and print a deprecation warning when it is imported? That way
we:
1) Don't have a clunky module BUT
2) we warn anyone who uses it (but didn't happen to read your post) that it
is deprecated when they install a new biopython version AND
3) Leave an option of fixing and commenting the code back in (i.e. it is not
lost forever).
Also, is it possible to track down the original author?
./I
--
I. Friedberg
"The only problem with troubleshooting is that
sometimes trouble shoots back."
From biopython-dev at maubp.freeserve.co.uk Tue Oct 2 12:55:53 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 02 Oct 2007 17:55:53 +0100
Subject: [Biopython-dev] Bio.MultiProc / Bio.FormatIO
In-Reply-To:
References: <46E6A845.3030601@c2b2.columbia.edu> <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
Message-ID: <47027819.1010207@maubp.freeserve.co.uk>
Iddo Friedberg wrote:
> Would it be possible to include the module, comment out the unworkable
> source code and print a deprecation warning when it is imported?
That is sort of what Michiel did - he's just added a deprecation
warning, but not touched the code itself.
This isn't an option for some of the more "integrated" bits of code like
Bio.FormatIO which I suggested removing in Bug 2361 (see also my email
to the main list on 19 September):
http://bugzilla.open-bio.org/show_bug.cgi?id=2361#c27
Peter
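The two options discussed here can be sketched in present-day Python: a module that keeps its code but emits a DeprecationWarning when imported (what was done for Bio.MultiProc), and a stub whose import fails outright with a pointer to the replacement (what Peter proposed for Bio.FormatIO). The module names old_module, removed_module and new_module are hypothetical; the demo writes both modules to a temporary directory and imports them.

```python
import importlib
import sys
import tempfile
import warnings
from pathlib import Path

# Option 1: keep the code, but warn every importer that it is deprecated.
DEPRECATED_SOURCE = '''
import warnings
warnings.warn("old_module is deprecated and will be removed in a future release.",
              DeprecationWarning, stacklevel=2)

def still_works():
    return "old behaviour"
'''

# Option 2: the code is gone, so importing fails and names the replacement.
REMOVED_SOURCE = '''
raise ImportError("removed_module has been removed. Please try new_module instead")
'''


def demo():
    """Import both hypothetical modules and report what a caller would see."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "old_module.py").write_text(DEPRECATED_SOURCE)
        Path(tmp, "removed_module.py").write_text(REMOVED_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            with warnings.catch_warnings(record=True) as caught:
                warnings.simplefilter("always")
                old_module = importlib.import_module("old_module")
            behaviour = old_module.still_works()
            try:
                importlib.import_module("removed_module")
                error = None
            except ImportError as exc:
                error = str(exc)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("old_module", None)
    return [w.category for w in caught], behaviour, error


if __name__ == "__main__":
    print(demo())
```

The trade-off is the one described in the thread: a warning keeps old scripts running, while an ImportError avoids shipping a module that no longer works.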
From rhaygood at duke.edu Tue Oct 2 19:59:43 2007
From: rhaygood at duke.edu (Ralph Haygood)
Date: Tue, 2 Oct 2007 19:59:43 -0400 (EDT)
Subject: [Biopython-dev] Statistics code
In-Reply-To: <6d941f120709291328q6a9aae97kdcf489549cc9b3f0@mail.gmail.com>
References: <6d941f120709291328q6a9aae97kdcf489549cc9b3f0@mail.gmail.com>
Message-ID:
Tiago,
Sorry to be so long replying---I've been almost drowning in work.
Use anything you find useful in my code. If you do write an article
about it, I'd be glad to be a coauthor, not just in name but actually
to help with writing the discussion of sequence statistics.
There *is* a lot of stuff in my code, not all of it generally
important. For example, few people will care about indel statistics,
beyond counting them and maybe getting the frequency distribution of
their lengths. The things most people will care about are K (the
number of polymorphic sites), Watterson's theta, pi, Tajima's D, Fu
and Li's D, Fay and Wu's H, F_ST, and McDonald--Kreitman testing.
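For readers less familiar with these statistics, here is a minimal sketch of three of them (K, Watterson's theta and pi) for a list of aligned sequences. It is not Ralph's code or a Biopython API; the function names are hypothetical, and it assumes equal-length sequences with no ambiguity codes or gaps. All values are per locus rather than per site.

```python
from itertools import combinations


def segregating_sites(seqs):
    """K: the number of alignment columns that are not identical in every sequence."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Watterson's theta: K / a_n, where a_n = sum of 1/i for i = 1 .. n-1 (n >= 2)."""
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n


def nucleotide_diversity(seqs):
    """pi: the mean number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return differences / len(pairs)


if __name__ == "__main__":
    sample = ["ACGT", "ACGA", "TCGA"]
    print(segregating_sites(sample), watterson_theta(sample), nucleotide_diversity(sample))
```

For the three sequences in the example, K = 2 and both theta and pi equal 4/3.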
As for ambiguous nucleotides, my code handles them in one of two ways,
at the programmer's option. By default, a site at which any sequence
in the alignment contains an ambiguous nucleotide is ignored; for
example,
ACRGTY
ACAGTC
is effectively equivalent to
ACGT
ACGT .
However, if the 'expand_diplotypes' option is specified when the
Sample object is constructed, each sequence in the alignment is
interpreted as a diplotype and converted into a pair of pseudo-
haplotypes, two-fold ambiguous nucleotides (R, Y, W, S, M, and K)
being interpreted as heterozygous; for example,
ACRGTY
ACAGTC
is effectively equivalent to
ACAGTC
ACGGTT
ACAGTC
ACAGTC .
In expand_diplotypes mode, sites containing three- or four-fold
ambiguous nucleotides are still ignored. Also, you'll get a warning
if you request a statistic that depends on correct SNP phasing, which
most statistics don't. So far, I've found these two operating modes
sufficient for my needs.
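The two modes can be sketched as follows. This is a hypothetical illustration, not Ralph's implementation: drop_ambiguous_sites and expand_diplotypes are made-up names, sequences are assumed to be uppercase and aligned, and any symbol outside ACGT and the six two-fold codes (including gaps) is treated like a three- or four-fold code, so its column is dropped. In expand mode the first base of each two-fold code always goes to the first pseudo-haplotype, which is an arbitrary phase; that is why phase-dependent statistics cannot be trusted in this mode.

```python
# Hypothetical sketch of the two ambiguity-handling modes; not Ralph's code.
TWO_FOLD = {"R": "AG", "Y": "CT", "W": "AT", "S": "CG", "M": "AC", "K": "GT"}
BASES = set("ACGT")


def _usable_columns(seqs, allowed):
    """Indices of alignment columns where every sequence uses an allowed symbol."""
    return [i for i in range(len(seqs[0])) if all(s[i] in allowed for s in seqs)]


def drop_ambiguous_sites(seqs):
    """Default mode: ignore every site where any sequence has an ambiguous symbol."""
    keep = _usable_columns(seqs, BASES)
    return ["".join(s[i] for i in keep) for s in seqs]


def expand_diplotypes(seqs):
    """Turn each diplotype into two pseudo-haplotypes.

    Two-fold codes are read as heterozygous sites; columns containing any other
    non-ACGT symbol (three- or four-fold codes, gaps) are still ignored.
    """
    keep = _usable_columns(seqs, BASES | set(TWO_FOLD))
    haplotypes = []
    for s in seqs:
        # Unambiguous bases are homozygous, e.g. "A" -> "AA".
        pairs = [TWO_FOLD.get(s[i], s[i] * 2) for i in keep]
        haplotypes.append("".join(p[0] for p in pairs))
        haplotypes.append("".join(p[1] for p in pairs))
    return haplotypes
```

Applied to Ralph's example, drop_ambiguous_sites(["ACRGTY", "ACAGTC"]) returns ["ACGT", "ACGT"] and expand_diplotypes returns ["ACAGTC", "ACGGTT", "ACAGTC", "ACAGTC"], matching his two cases.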
I think your plan sounds very reasonable, just adding sequence
statistics at a pace that's comfortable for you. Any time you have
questions, feel free to ask me, and I'll give you whatever benefit
there is in my opinion and experience.
I'm happy for all this to happen on biopython-dev, so that other
people (e.g., Alex Lancaster) can add to it. I'll leave it to the
core developers to tell us if we're too noisy. (I'd recommend still
sending messages to me with copies to biopython-dev, however, so that
I don't accidentally miss them on biopython-dev, which I don't always
read carefully.)
Ralph
On Sat, 29 Sep 2007, Tiago Antão wrote:
> Hi Ralph,
>
> Hope all is good with you. I am now finally starting to commit
> statistics code to Biopython. But before I go ahead I would like to
> ask you for some advice (plus some extra comments):
>
> About code merging and authorship:
>
> I am finally looking at your code. There is really a lot of stuff
> there! Would it be OK with you if I merged your code with mine into
> Bio.PopGen.Stats? Obviously the copyright/authorship for the module
> would be co-shared as would any authorship of any article deriving
> from it...
>
> About a strategy to advance:
>
> 1. I personally don't have any experience, really, with working with
> sequence data (my background is SNPs, microsatellites/STRs, AFLPs and
> that sort of stuff)
> 2. Starting on Monday I am beginning a PhD which will require, part
> time, sequence analysis
> 3. What I mean by 1 and 2 is that I currently don't have the maturity to
> architect and design a good framework for sequence analysis but I will
> gain it with time.
> My plan is then to defer all sequence code until I feel I know what I
> am doing (although I was still thinking of providing something like
> BioPerl's facility of extracting all SNPs from sequences)
> If this is OK with you, I plan to start committing code in the week
> starting this Monday.
>
> About request for insight:
>
> If you have any comments to offer on issues regarding representing
> indels and ambiguous data (i.e. ambiguous nucleotides), they might be
> useful, as I suppose that is the biggest issue that makes me afraid of
> sequence code.
>
>
> Finally: I would summarize our discussion here on biopython-dev (I am
> not taking it there directly just because you might not want your code
> on Biopython or might want it in other terms).
>
> Thanks,
> Tiago
>
From mdehoon at c2b2.columbia.edu Tue Oct 2 20:18:59 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 2 Oct 2007 20:18:59 -0400
Subject: [Biopython-dev] [BioPython] Bio.MultiProc
References: <46E6A845.3030601@c2b2.columbia.edu><6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62D@mail2.exch.c2b2.columbia.edu>
> Would it be possible to include the module, comment out the unworkable
> source code and print a deprecation warning when it is imported?
That is what I did.
> 3) Leave an option of fixing and commenting the code back in (i.e. it is
> not lost forever).
Even after removing the code in some future release, the code will not be
lost forever. It can always be retrieved from CVS and from older Biopython
releases.
> Also, is it possible to track down the original author?
That would be Jeff Chang.
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From tiagoantao at gmail.com Wed Oct 3 06:14:33 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 3 Oct 2007 11:14:33 +0100
Subject: [Biopython-dev] Coalescent code
Message-ID: <6d941f120710030314g73e38aa4w8c3b473eeaa18cc9@mail.gmail.com>
Hi,
I had planned to start committing statistics-related code this
weekend, but (contrary to my expectations) I have had requests for
the coalescent code. As such, I am planning to commit the coalescent
code instead.
It is quite straightforward code, with only one issue on which I would
like advice: some of the code (regarding modeling demographies)
requires some templates (very small text files, circa 10 of around 700
bytes each) to go along. Where should I put the files in Biopython?
Also, on installation those files have to be put somewhere...
Tiago
--
http://www.tiago.org/ps
From biopython-dev at maubp.freeserve.co.uk Wed Oct 3 10:18:21 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 03 Oct 2007 15:18:21 +0100
Subject: [Biopython-dev] Coalescent code
In-Reply-To: <6d941f120710030314g73e38aa4w8c3b473eeaa18cc9@mail.gmail.com>
References: <6d941f120710030314g73e38aa4w8c3b473eeaa18cc9@mail.gmail.com>
Message-ID: <4703A4AD.7030008@maubp.freeserve.co.uk>
Tiago Antão wrote:
> It is quite straightforward code, with only one issue on which I would
> like advice: some of the code (regarding modeling demographies)
> requires some templates (very small text files, circa 10 of around 700
> bytes each) to go along. Where should I put the files in Biopython?
> Also, on installation those files have to be put somewhere...
There is a similar precedent with Bio/EUtils/DTDs (where the data files
are XML DTD files). I guess you could have the 10 plain text data files
in with the python files (or under a subdirectory). Opinions?
I should really refresh myself on current python packaging guidelines...
Peter
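A sketch of two ways to find such data files at run time. template_path locates a file in a subdirectory next to a module's .py file (the "in with the python files, or under a subdirectory" layout suggested above); read_template uses importlib.resources (Python 3.9+), which also works when a package is imported from a zip file. The subdirectory name data and both function names are hypothetical. For installation, setuptools has to be told to copy the files, for example with package_data={"Bio.PopGen.SimCoal": ["data/*"]} in setup(); that package name is likewise only an assumption.

```python
import os
from importlib import resources


def template_path(module_file, name, subdir="data"):
    """Path of a data file kept in a subdirectory beside the module's .py file."""
    return os.path.join(os.path.dirname(os.path.abspath(module_file)), subdir, name)


def read_template(package, name, subdir="data"):
    """Read a text data file shipped inside an installed package (Python 3.9+)."""
    location = resources.files(package)
    if subdir:
        location = location / subdir
    return (location / name).read_text(encoding="utf-8")


if __name__ == "__main__":
    # Inside a module one would typically call template_path(__file__, "some_template.par").
    print(template_path(__file__, "some_template.par"))
```

The first approach is the simplest but assumes the package lives on a real filesystem; the second leaves that decision to the import system.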
From tiagoantao at gmail.com Wed Oct 3 11:37:17 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 3 Oct 2007 16:37:17 +0100
Subject: [Biopython-dev] Statistics code
In-Reply-To:
References: <6d941f120709291328q6a9aae97kdcf489549cc9b3f0@mail.gmail.com>
Message-ID: <6d941f120710030837k1aa2d4ak7eca8e6e27e35fdd@mail.gmail.com>
Ralph,
Thanks for the detailed explanation. Because of a couple of requests I
had, I am going to commit first the coalescent code, but after the
coalescent code is in, I will pick this up.
Tiago
--
http://www.tiago.org/ps
From tiagoantao at gmail.com Wed Oct 3 12:04:07 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 3 Oct 2007 17:04:07 +0100
Subject: [Biopython-dev] Coalescent code
In-Reply-To: <4703A4AD.7030008@maubp.freeserve.co.uk>
References: <6d941f120710030314g73e38aa4w8c3b473eeaa18cc9@mail.gmail.com>
<4703A4AD.7030008@maubp.freeserve.co.uk>
Message-ID: <6d941f120710030904k70b098dcnbbc40bc3420ea831@mail.gmail.com>
Hi
On 10/3/07, Peter wrote:
> There is a similar precedent with Bio/EUtils/DTDs (where the data files
> are XML DTD files). I guess you could have the 10 plain text data files
> in with the python files (or under a subdirectory). Opinions?
In the meantime, I will start committing the code (I can easily
accommodate the details of where to put the files later, when
there is a decision).
Michiel, please, please don't include the SimCoal code that I will be
committing in the next public version.
Regards,
Tiago
From mdehoon at c2b2.columbia.edu Wed Oct 3 20:39:47 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Wed, 3 Oct 2007 20:39:47 -0400
Subject: [Biopython-dev] Coalescent code
References: <6d941f120710030314g73e38aa4w8c3b473eeaa18cc9@mail.gmail.com><4703A4AD.7030008@maubp.freeserve.co.uk>
<6d941f120710030904k70b098dcnbbc40bc3420ea831@mail.gmail.com>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62E@mail2.exch.c2b2.columbia.edu>
> Michiel, please, please don't include the SimCoal code that I will be
> committing in the next public version.
To avoid confusion, please don't commit code to CVS that you don't want to be
included in the next Biopython release.
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From bugzilla-daemon at portal.open-bio.org Wed Oct 3 22:10:13 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 3 Oct 2007 22:10:13 -0400
Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with
egenix mxTextTools 3.0
In-Reply-To:
Message-ID: <200710040210.l942ADGF030763@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2361
------- Comment #30 from mdehoon at ims.u-tokyo.ac.jp 2007-10-03 22:10 EST -------
Looking at the patch for Bio.FormatIO:
-------------------------
#Would like to have just issued a deprecation warning, and removed this
#module later. However, due to the FormatIO code in Bio/SeqRecord.py the
#deprecation warning would be triggered whenever someone used the SeqRecord.
raise ImportError, "Bio.FormatIO has been removed. Please try Bio.SeqIO instead"
-------------------------
Since the patch for Bio/SeqRecord.py removes its dependence on Bio.FormatIO, is
it still necessary to raise an ImportError instead of issuing a
DeprecationWarning?
From bugzilla-daemon at portal.open-bio.org Fri Oct 5 05:44:09 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 5 Oct 2007 05:44:09 -0400
Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with
egenix mxTextTools 3.0
In-Reply-To:
Message-ID: <200710050944.l959i9BX029760@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2361
------- Comment #31 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-05 05:44 EST -------
In terms of typical usage, SeqRecord does not depend on FormatIO.
However, from a code perspective, FormatIO and SeqRecord "depend" on each
other.
If we remove the FormatIO "hooks" from SeqRecord.py (so that SeqRecord does not
depend on FormatIO), then FormatIO breaks. Rather than leaving in a broken
module, I wanted to remove it. A DeprecationWarning doesn't seem right if
FormatIO is removed, which is why I suggested an ImportError.
We might be able instead to MOVE the FormatIO hooks out of SeqRecord and then
issue a DeprecationWarning for FormatIO ... but it looks rather complicated,
and probably means tackling the Bio.config code as well.
From bugzilla-daemon at portal.open-bio.org Fri Oct 5 07:05:49 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 5 Oct 2007 07:05:49 -0400
Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with
egenix mxTextTools 3.0
In-Reply-To:
Message-ID: <200710051105.l95B5nXW001755@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2361
------- Comment #32 from mdehoon at ims.u-tokyo.ac.jp 2007-10-05 07:05 EST -------
> If we remove the FormatIO "hooks" from SeqRecord.py (so that SeqRecord does not
> depend on FormatIO), then FormatIO breaks. Rather than leaving in a broken
> module, I wanted to remove it. A DeprecationWarning doesn't seem right if
> FormatIO is removed, which is why I suggested an ImportError.
OK, I see. As far as I'm concerned, your patch is fine then.
From bugzilla-daemon at portal.open-bio.org Fri Oct 5 09:46:51 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 5 Oct 2007 09:46:51 -0400
Subject: [Biopython-dev] [Bug 2174] FDist Support in BioPython
In-Reply-To:
Message-ID: <200710051346.l95Dkpc2010074@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2174
tiagoantao at gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
------- Comment #6 from tiagoantao at gmail.com 2007-10-05 09:46 EST -------
It is implemented, documented and with test code.
From tiagoantao at gmail.com Fri Oct 5 10:26:43 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 5 Oct 2007 15:26:43 +0100
Subject: [Biopython-dev] Configuration files
Message-ID: <6d941f120710050726s4ca53349h1b8d499650e5726a@mail.gmail.com>
Hi,
Is there any (Biopython standard) way to configure Biopython during
runtime? When writing code sometimes I think it would be very
convenient (especially to the programmer using Biopython) to abstract
some configuration parameters away from the code. Things like the
location of binaries, hosts, user names (and maybe passwords) of
databases, timeout parameters, etc. These could be stored on a
configuration file (or registry entry, or whatever), thus saving users
from having to supply these in their code...
Just an idea...
Tiago
--
http://www.tiago.org/ps
From bugzilla-daemon at portal.open-bio.org Mon Oct 8 07:14:30 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 8 Oct 2007 07:14:30 -0400
Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with
egenix mxTextTools 3.0
In-Reply-To:
Message-ID: <200710081114.l98BEUZh019757@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2361
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #759 is obsolete|0 |1
------- Comment #33 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-08 07:14 EST -------
(From update of attachment 759)
Applied these changes to CVS.
From biopython-dev at maubp.freeserve.co.uk Mon Oct 8 06:52:48 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Mon, 08 Oct 2007 11:52:48 +0100
Subject: [Biopython-dev] Configuration files
In-Reply-To: <6d941f120710050726s4ca53349h1b8d499650e5726a@mail.gmail.com>
References: <6d941f120710050726s4ca53349h1b8d499650e5726a@mail.gmail.com>
Message-ID: <470A0C00.50505@maubp.freeserve.co.uk>
Tiago Antão wrote:
> Hi,
>
> Is there any (Biopython standard) way to configure Biopython during
> runtime? When writing code sometimes I think it would be very
> convenient (especially to the programmer using Biopython) to abstract
> some configuration parameters away from the code. Things like the
> location of binaries, hosts, user names (and maybe passwords) of
> databases, timeout parameters, etc. These could be stored on a
> configuration file (or registry entry, or whatever), thus saving users
> from having to supply these in their code...
> Just an idea...
This sounds like a fairly general thing (i.e. for all of python) rather
than being Biopython specific.
For example, I find a lot of my scripts have a few if statements at the
top setting locations of files and executables based on which
user/machine I'm running on (I use both Windows and a couple of Linux
boxes with different user names).
e.g. Where are the blast executables, the blast databases, and my genome
collection, ...
Peter
From bugzilla-daemon at portal.open-bio.org Mon Oct 8 07:30:03 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 8 Oct 2007 07:30:03 -0400
Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with
egenix mxTextTools 3.0
In-Reply-To:
Message-ID: <200710081130.l98BU36u021016@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2361
------- Comment #34 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-08 07:30 EST -------
To recap, most of the issues were resolved by switching Bio.Fasta from Martel to
pure Python. Additionally:
test_Fasta - 'fixed' by deprecating the Mindy indexing functions
test_KEGG - fixed by switching from Martel to pure python
test_format_registry - 'fixed' by removing FormatIO
test_geo - fixed by switching from Martel to pure python
test_GenBankFormat - this entire test is for the little-used Martel GenBank
expression, and this works with mxTextTools 2.0 but fails with mxTextTools 3.0
From mdehoon at c2b2.columbia.edu Tue Oct 9 00:34:28 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 9 Oct 2007 00:34:28 -0400
Subject: [Biopython-dev] Output of Biopython tests
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu>
Hi everybody,
With the help of several Biopython developers, especially Peter, the problems
with Martel and the new mxTextTools release have now been solved (in the
sense that all unit tests now succeed). So we're a lot closer to a new
Biopython release. Thanks everybody!
When I was running the Biopython tests, one thing bothered me though. All
Biopython tests now have a corresponding output file that contains the output
the test should generate if it runs correctly. For some tests, this makes
perfect sense, particularly if the output is large. For others, on the other
hand, having the test output explicitly in a file doesn't actually add much
information. For example, the output for test_psw is
test_psw
test_AlignmentColumn_assertions (test_psw.TestPSW) ... ok
test_AlignmentColumn_full (test_psw.TestPSW) ... ok
test_AlignmentColumn_kinds (test_psw.TestPSW) ... ok
test_AlignmentColumn_repr (test_psw.TestPSW) ... ok
test_Alignment_assertions (test_psw.TestPSW) ... ok
test_Alignment_normal (test_psw.TestPSW) ... ok
test_ColumnUnit (test_psw.TestPSW) ... ok
Doctest: Bio.Wise.psw.parse_line ... ok
----------------------------------------------------------------------
Ran 8 tests in 0.002s
OK
For comparison, this is the test output if test_psw.py fails:
test_AlignmentColumn_assertions (__main__.TestPSW) ... ok
test_AlignmentColumn_full (__main__.TestPSW) ... ok
test_AlignmentColumn_kinds (__main__.TestPSW) ... FAIL
test_AlignmentColumn_repr (__main__.TestPSW) ... ok
test_Alignment_assertions (__main__.TestPSW) ... ok
test_Alignment_normal (__main__.TestPSW) ... ok
test_ColumnUnit (__main__.TestPSW) ... ok
Doctest: Bio.Wise.psw.parse_line ... ok
======================================================================
FAIL: test_AlignmentColumn_kinds (__main__.TestPSW)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test_psw.py", line 47, in test_AlignmentColumn_kinds
self.assertEqual(ac.kind,
"some_funny_output_I_made_up_instead_of_INSERT")
AssertionError: 'INSERT' != 'some_funny_output_I_made_up_instead_of_INSERT'
----------------------------------------------------------------------
Ran 8 tests in 0.000s
The point is that for this test, having the output explicitly is not needed
in order to identify the problem.
Now, for some tests having the output explicitly actually causes a problem.
I'm thinking about those unit tests that only run if some particular software
is installed on the system (for example, SQL). In those cases, we need to
distinguish failure due to missing software from a true failure (the former
may not bother the user much if he's not interested in that particular part
of Biopython). If a test cannot be run because of missing prerequisites,
currently a unit test generates an ImportError, which is then caught inside
run_tests. Hence, we get the following output when running the Biopython
tests:
test_BioSQL ... Skipping test because of import error: Skipping BioSQL tests -- enable tests in Tests/test_BioSQL.py
ok
When you look inside test_BioSQL.py, you'll see that the actual error is not
an ImportError. In addition, if a true ImportError occurs during the test,
the test will inadvertently be treated as skipped.
My solution would be to skip tests inside test_BioSQL if the prerequisites
are not met. However, in that case the test output no longer agrees with the
expected test output, generating a failure message.
I'd therefore like to suggest the following:
1) Keep the test output, but let each test_* script (instead of run_tests.py)
be responsible for comparing the test output with the expected output.
2) If the expected output is trivial, simply use the assert statements to
verify the test output instead of storing them in a file and reading them
from there.
Any objections?
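To make the two proposals concrete, here is a minimal sketch of a test script that checks its own output (proposal 1) and uses a plain assertion for a trivial result (proposal 2). It uses the modern standard-library `unittest` and `contextlib.redirect_stdout`; the functions `column_kind` and `report` are hypothetical stand-ins, not Biopython code.

```python
import io
import unittest
from contextlib import redirect_stdout


def column_kind(a, b):
    """Hypothetical stand-in for the code under test."""
    if a == "-":
        return "INSERT"
    if b == "-":
        return "DELETE"
    return "MATCH"


def report(columns):
    # Verbose, human-readable output, as in the existing test scripts.
    for a, b in columns:
        print("%s%s %s" % (a, b, column_kind(a, b)))


# Proposal 1: the expected output lives in the test script itself,
# instead of in a separate file compared by run_tests.py.
EXPECTED_REPORT = "AA MATCH\n-C INSERT\n"


class TestColumns(unittest.TestCase):
    def test_report_matches_expected_output(self):
        buffer = io.StringIO()
        with redirect_stdout(buffer):
            report([("A", "A"), ("-", "C")])
        self.assertEqual(buffer.getvalue(), EXPECTED_REPORT)

    def test_trivial_result_uses_assertion(self):
        # Proposal 2: a trivial result is checked directly, no output file.
        self.assertEqual(column_kind("G", "-"), "DELETE")
```

Run it with `python -m unittest` on the file; the script can still print its verbose output when used as an example, while the comparison no longer depends on the central runner.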
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From mhobbs_of_lawson at bigpond.com Mon Oct 8 22:18:39 2007
From: mhobbs_of_lawson at bigpond.com (mhobbs_of_lawson)
Date: Tue, 9 Oct 2007 12:18:39 +1000
Subject: [Biopython-dev] translate
Message-ID: <5496247.1191896319102.JavaMail.root@web06sl>
Hi,
Please can someone tell me what is wrong here. I simply want to be able to translate ambiguous DNA which includes an 'NNN' triplet.
Thanks,
Matthew
>>> from Bio import Seq
>>> from Bio.Alphabet import IUPAC
>>> from Bio import Translate
>>> s = "NNNTCAAAAAGGTGCATCTAGATG"
>>> dna = Seq.Seq(s, IUPAC.ambiguous_dna)
>>> trans = Translate.ambiguous_dna_by_id[1]
>>> print trans.translate(dna)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/cygdrive/c/Python24/Lib/site-packages/Bio/Translate.py", line 20, in translate
append(get(s[i:i+3], stop_symbol))
File "/cygdrive/c/Python24/Lib/site-packages/Bio/Data/CodonTable.py", line 544, in get
return self.__getitem__(codon)
File "/cygdrive/c/Python24/Lib/site-packages/Bio/Data/CodonTable.py", line 577, in __getitem__
raise TranslationError, codon # does not code
Bio.Data.CodonTable.TranslationError: NNN
From biopython-dev at maubp.freeserve.co.uk Tue Oct 9 07:54:29 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 09 Oct 2007 12:54:29 +0100
Subject: [Biopython-dev] translate
In-Reply-To: <5496247.1191896319102.JavaMail.root@web06sl>
References: <5496247.1191896319102.JavaMail.root@web06sl>
Message-ID: <470B6BF5.607@maubp.freeserve.co.uk>
mhobbs_of_lawson wrote:
> Hi,
>
> Please can someone tell me what is wrong here. I simply want to be able to translate ambiguous DNA which includes an 'NNN' triplet.
A very reasonable request. I assume you expect just an X for an NNN codon?
I have the general impression that some of Biopython's handling of
ambiguous sequences isn't all wonderful... something I have started to
tackle in bug 2366:
http://bugzilla.open-bio.org/show_bug.cgi?id=2366
Obviously sequence manipulation is a core bit of functionality - and I
would like at least one other person to comment on that code before I
risk committing it ;)
Translation of ambiguous codons would be next on my hit list... as right
now it doesn't seem to do what I would expect at all.
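One way ambiguous codons could be handled is sketched below in pure Python, assuming the standard genetic code (NCBI table 1) and the IUPAC nucleotide ambiguity codes. The names `translate_codon`, `translate` and `IUPAC_DNA` are made up for illustration and are not Biopython's API. Each ambiguous codon is expanded into every concrete codon it could stand for; if they all encode the same residue (e.g. TCN is always Ser) that residue is used, otherwise "X".

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes and the bases each one can stand for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon, table=STANDARD_TABLE):
    """Translate one codon that may contain ambiguity codes.

    If every concrete codon it could stand for gives the same residue,
    return that residue; otherwise return "X". Here "X" is also used
    when the choice is between an amino acid and a stop (e.g. TAN).
    """
    options = {
        table["".join(bases)]
        for bases in product(*(IUPAC_DNA[base] for base in codon.upper()))
    }
    return options.pop() if len(options) == 1 else "X"


def translate(seq, table=STANDARD_TABLE):
    # Translate whole codons only; a trailing partial codon is ignored.
    return "".join(
        translate_codon(seq[i:i + 3], table)
        for i in range(0, len(seq) - len(seq) % 3, 3)
    )


print(translate("NNNTCAAAAAGGTGCATCTAGATG"))  # XSKRCI*M
```

Expanding every ambiguous codon is cheap (at most 64 lookups per codon), and it gives "*" for codons such as TAR, where every possibility is a stop.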
In the short term, manually adding additional mappings to the forward
table (a python dictionary) would probably "fix" your specific issue.
While we are on this topic, we use "*" for stop codons and "X" for an
ambiguous amino acid - but is anyone aware of a character convention for
something that might be either a stop codon or an amino acid? (other
than just using "X" for this too)?
Peter
From biopython-dev at maubp.freeserve.co.uk Tue Oct 9 07:44:01 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 09 Oct 2007 12:44:01 +0100
Subject: [Biopython-dev] Output of Biopython tests
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu>
Message-ID: <470B6981.3020707@maubp.freeserve.co.uk>
Michiel De Hoon wrote:
> When I was running the Biopython tests, one thing bothered me though.
> All Biopython tests now have a corresponding output file that
> contains the output the test should generate if it runs correctly.
> For some tests, this makes perfect sense, particularly if the output
> is large. For others, on the other hand, having the test output
> explicitly in a file doesn't actually add much information.
Is this actually a problem? It gives us a simple unified test framework
where developers can use whatever fancy test frameworks they want to.
Personally I have tried to write simple scripts with meaningful output
(plus often additional assertions). I think that because these are very
simple, they can double as examples/documentation for the curious.
My personal view is that some of the "fancy frameworks" used in some
test cases are very intimidating to a beginner (and act as a barrier to
taking the code and modifying it for their own use).
> The point is that for this test, having the output explicitly is not
> needed in order to identify the problem.
True. I would have written that particular test to give some meaningful
output; I find it makes it easier to start debugging why a test fails.
> Now, for some tests having the output explicitly actually causes a
> problem. I'm thinking about those unit tests that only run if some
> particular software is installed on the system (for example, SQL). In
> those cases, we need to distinguish failure due to missing software
> from a true failure (the former may not bother the user much if he's
> not interested in that particular part of Biopython). If a test
> cannot be run because of missing prerequisites, currently a unit test
> generates an ImportError, which is then caught inside run_tests.
> ...
> When you look inside test_BioSQL.py, you'll see that the actual error
> is not an ImportError. In addition, if a true ImportError occurs
> during the test, the test will inadvertently be treated as skipped.
Perhaps we should introduce a MissingExternalDependency error instead,
used for this specific case, and catch that in run_tests.py, while
treating ImportError as a real error.
As you say, if we have done some dramatic restructuring (such as
removing a module) there could be some REAL ImportErrors which we might
risk ignoring.
> I'd therefore like to suggest the following:
> 1) Keep the test output, but let each test_* script (instead of
> run_tests.py) be responsible for comparing the test output with the
> expected output.
I'm not keen on that - it means duplication of code (or at least some
common functionality to call) and makes writing simple tests that little
bit harder. I like the fact that the more verbose test scripts can be
run on their own as an example of what the module can do.
> 2) If the expected output is trivial, simply use the assert
> statements to verify the test output instead of storing them in a
> file and reading them from there.
By all means, test trivial output with assertions. I already do this
within many of my "verbose" tests where I want to keep the console
output reasonably short.
Peter
From tiagoantao at gmail.com Tue Oct 9 10:27:18 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 9 Oct 2007 15:27:18 +0100
Subject: [Biopython-dev] Configuration files
In-Reply-To: <470A0C00.50505@maubp.freeserve.co.uk>
References: <6d941f120710050726s4ca53349h1b8d499650e5726a@mail.gmail.com>
<470A0C00.50505@maubp.freeserve.co.uk>
Message-ID: <6d941f120710090727m787c08abn13665c662727446c@mail.gmail.com>
Would it be interesting to have something like
config = Bio.Config.getConfig()
fdist_path = config['PopGen.FDistDir']
Something that:
1. Would allow for a standard configuration mechanism (as opposed to
having different styles for each module/author)
2. Would abstract away how the configuration is stored (registry, conf
file, ...)
If there was an agreement on doing this (or something along these
lines), I would volunteer the time to do it.
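A minimal sketch of what such a mechanism could look like, assuming an INI-style file read with the standard-library `configparser`. The file location `~/.biopython.ini`, the `Config` class and `get_config` are hypothetical (there is no `Bio.Config` module); a key such as `PopGen.FDistDir` is read as option `FDistDir` in section `[PopGen]`. Swapping in a registry or another backend would only change the class internals, which is the abstraction point 2 asks for.

```python
import configparser
import os

# Hypothetical default location; Biopython does not define such a file.
DEFAULT_CONFIG_PATH = os.path.expanduser("~/.biopython.ini")


class Config:
    """Read-only view of an INI file, addressed with 'Section.option' keys."""

    def __init__(self, path=DEFAULT_CONFIG_PATH):
        self._parser = configparser.ConfigParser()
        self._parser.optionxform = str  # keep option names case-sensitive
        self._parser.read(path)  # a missing file simply yields no settings

    def __getitem__(self, key):
        section, _, option = key.partition(".")
        try:
            return self._parser.get(section, option)
        except (configparser.NoSectionError, configparser.NoOptionError):
            raise KeyError(key) from None

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default


def get_config(path=DEFAULT_CONFIG_PATH):
    return Config(path)
```

With a file containing `[PopGen]` and `FDistDir = /opt/fdist`, `get_config()["PopGen.FDistDir"]` would return `"/opt/fdist"`.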
On 10/8/07, Peter wrote:
> Tiago Antão wrote:
> > Hi,
> >
> > Is there any (Biopython standard) way to configure Biopython during
> > runtime? When writing code sometimes I think it would be very
> > convenient (especially to the programmer using Biopython) to abstract
> > some configuration parameters away from the code. Things like the
> > location of binaries, hosts, user names (and maybe passwords) of
> > databases, timeout parameters, etc. These could be stored on a
> > configuration file (or registry entry, or whatever), thus saving users
> > from having to supply these in the code...
> > Just an idea...
>
> This sounds like a fairly general thing (i.e. for all of python) rather
> than being Biopython specific.
>
> For example, I find a lot of my scripts have a few if statements at the
> top setting locations of files and executables based on which
> user/machine I'm running on (I use both Windows and a couple of Linux
> boxes with different user names).
>
> e.g. Where are the blast executables, the blast databases, and my genome
> collection, ...
>
> Peter
>
>
--
http://www.tiago.org/ps
From mhobbs_of_lawson at bigpond.com Tue Oct 9 19:07:43 2007
From: mhobbs_of_lawson at bigpond.com (Matthew Hobbs)
Date: Wed, 10 Oct 2007 09:07:43 +1000
Subject: [Biopython-dev] translate
In-Reply-To: <470B6BF5.607@maubp.freeserve.co.uk>
References: <5496247.1191896319102.JavaMail.root@web06sl>
<470B6BF5.607@maubp.freeserve.co.uk>
Message-ID: <470C09BF.8050906@bigpond.com>
Thanks Peter for your reply.
Peter wrote:
> mhobbs_of_lawson wrote:
>> Please can someone tell me what is wrong here. I simply want to be
>> able to translate ambiguous DNA which includes an 'NNN' triplet.
>
> A very reasonable request. I assume you expect just an X for an NNN codon?
yep
> In the short term, manually adding additional mappings to the forward
> table (a python dictionary) would probably "fix" your specific issue.
OK - so this works:
from Bio import Seq
from Bio.Alphabet import IUPAC
from Bio import Translate
s = "NNNTCAAAAAGGTGCATCTAGATG"
dna = Seq.Seq(s, IUPAC.ambiguous_dna)
trans = Translate.ambiguous_dna_by_id[1]
trans.table.forward_table.forward_table['NNN'] = 'X'
print trans.translate(dna)
> While we are on this topic, we use "*" for stop codons and "X" for an
> ambiguous amino acid - but is anyone aware of a character convention for
> something that might be either a stop codon or an amino acid? (other
> than just using "X" for this too)?
No I don't know
Thanks,
Matthew
From mdehoon at c2b2.columbia.edu Thu Oct 11 06:31:59 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Thu, 11 Oct 2007 06:31:59 -0400
Subject: [Biopython-dev] Output of Biopython tests
References: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu>
<470B6981.3020707@maubp.freeserve.co.uk>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B636@mail2.exch.c2b2.columbia.edu>
> Perhaps we should introduce a MissingExternalDependency error instead,
> used for this specific case, and catch that in run_tests.py, while
> treating ImportError as a real error.
OK. I added a MissingExternalDependencyError exception to Bio/__init__.py,
and modified BioSQL, Bio.GFF, and some test scripts accordingly. When
MissingExternalDependencyError occurs in a test, a warning is printed but it
is not counted as a failure.
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From mdehoon at c2b2.columbia.edu Thu Oct 11 06:44:56 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Thu, 11 Oct 2007 06:44:56 -0400
Subject: [Biopython-dev] function enumerate in Bio/GFF/GenericTools.py;
Bio/DocSQL.py
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B637@mail2.exch.c2b2.columbia.edu>
Do we still need the function "enumerate" in Bio/GFF/GenericTools.py and
Bio/DocSQL.py?
AFAICT, this function does exactly the same as the Python built-in enumerate
function.
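For comparison, a hand-rolled helper of this kind typically looks like the sketch below. This is a hypothetical illustration, not the code from Bio/GFF/GenericTools.py or Bio/DocSQL.py; it yields the same (index, item) pairs as the built-in enumerate(), which has been available since Python 2.3.

```python
def enumerate_items(sequence):
    """Hypothetical hand-rolled helper yielding (index, item) pairs."""
    index = 0
    for item in sequence:
        yield index, item
        index += 1


features = ["gene", "CDS", "exon"]
# Both produce [(0, 'gene'), (1, 'CDS'), (2, 'exon')].
assert list(enumerate_items(features)) == list(enumerate(features))
```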
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From biopython-dev at maubp.freeserve.co.uk Thu Oct 11 16:44:46 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Thu, 11 Oct 2007 21:44:46 +0100
Subject: [Biopython-dev] Revised tutorial
Message-ID: <470E8B3E.6080709@maubp.freeserve.co.uk>
In anticipation of the next release, I've done some more work on the
tutorial today -- in particular the section on the Seq object which I
have turned into a new chapter.
If anyone has the time to go over this soon that would be great. I'll be
away tomorrow (Friday) but will probably have time to make any revisions
needed at the weekend.
It's here in CVS:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/Tutorial.tex?cvsroot=biopython
This is a LaTeX file which gets turned into the PDF and HTML versions of
the tutorial using pdflatex and hevea. If you want to proof read but
don't know anything about LaTeX then I can probably email you the PDF
version for comment (half a megabyte).
Peter
From sbassi at gmail.com Thu Oct 11 18:48:39 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Thu, 11 Oct 2007 19:48:39 -0300
Subject: [Biopython-dev] Revised tutorial
In-Reply-To: <470E8B3E.6080709@maubp.freeserve.co.uk>
References: <470E8B3E.6080709@maubp.freeserve.co.uk>
Message-ID:
Hello,
I can't resolve all the dependencies to install hevea so I can't
generate the dvi from the tex file. Could you please send me by email
the final PDF?
Best,
SB.
--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318
From mdehoon at c2b2.columbia.edu Thu Oct 11 21:53:19 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Thu, 11 Oct 2007 21:53:19 -0400
Subject: [Biopython-dev] Output of Biopython tests
References: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu> <470B6981.3020707@maubp.freeserve.co.uk>
<6243BAA9F5E0D24DA41B27997D1FD14402B636@mail2.exch.c2b2.columbia.edu>
<470E3E7E.1000301@maubp.freeserve.co.uk>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B638@mail2.exch.c2b2.columbia.edu>
Peter wrote:
> Michiel De Hoon wrote:
> > OK. I added a MissingExternalDependencyError exception to Bio/__init__.py,
> > and modified BioSQL, Bio.GFF, and some test scripts accordingly. When
> > MissingExternalDependencyError occurs in a test, a warning is printed but it
> > is not counted as a failure.
>
> I might have defined the exception within the test framework rather than
> Bio/__init__.py, but now that it's there we can start to use it in things
> like modules that wrap external tools.
That is why I put it in Bio/__init__.py; Bio/GFF/__init__.py is already using
this exception (outside of the testing framework).
> I've updated Tests/requires_internet.py and Tests/requires_wise.py to
> match (I don't have wise on my machine which is why I noticed it still
> threw an ImportError).
Thanks! I missed those.
> Is there anything I can do to help get things ready for the release of
> Biopython 1.44?
At some point, somebody will need to go through the documentation to check if
everything documented there still works with the Biopython in CVS, and to
remove sections in the documentation describing deprecated code. But it's
probably better to wait until after we decide what to do with
test_GenBankFormat.
> If you do have time to give the patch on bug 2366 a check, I think it
> would be worth including before the next release.
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2366
No time to check it. But I'd be happy to rely on your judgement and include
it.
--Michiel.
From bugzilla-daemon at portal.open-bio.org Thu Oct 11 22:32:05 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 11 Oct 2007 22:32:05 -0400
Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with
egenix mxTextTools 3.0
In-Reply-To:
Message-ID: <200710120232.l9C2W5e9022504@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2361
------- Comment #35 from mdehoon at ims.u-tokyo.ac.jp 2007-10-11 22:32 EST -------
> test_GenBankFormat - this entire test is for the little-used Martel GenBank
> expression, and this works with mxTextTools 2.0 but fails with mxTextTools 3.0
If it's little-used, should we include it for the next release or can it be
removed? If we remove the test, should we then also remove the corresponding
module?
From biopython-dev at maubp.freeserve.co.uk Thu Oct 11 16:37:52 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Thu, 11 Oct 2007 21:37:52 +0100
Subject: [Biopython-dev] Output of Biopython tests
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B636@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu> <470B6981.3020707@maubp.freeserve.co.uk>
<6243BAA9F5E0D24DA41B27997D1FD14402B636@mail2.exch.c2b2.columbia.edu>
Message-ID: <470E89A0.1010502@maubp.freeserve.co.uk>
Michiel De Hoon wrote:
>> Perhaps we should introduce a MissingExternalDependency error instead,
>> used for this specific case, and catch that in run_tests.py, while
>> treating ImportError as a real error.
>
> OK. I added a MissingExternalDependencyError exception to Bio/__init__.py,
> and modified BioSQL, Bio.GFF, and some test scripts accordingly. When
> MissingExternalDependencyError occurs in a test, a warning is printed but it
> is not counted as a failure.
I might have defined the exception within the test framework rather than
Bio/__init__.py, but now that it's there we can start to use it in things
like modules that wrap external tools.
I've updated Tests/requires_internet.py and Tests/requires_wise.py to
match (I don't have wise on my machine which is why I noticed it still
threw an ImportError).
This means run_tests.py now runs without errors using CVS on my 64 bit
Linux machine (bar the mxTextTools 3.0 issue with test_GenBankFormat.py,
bug 2361).
Is there anything I can do to help get things ready for the release of
Biopython 1.44?
If you do have time to give the patch on bug 2366 a check, I think it
would be worth including before the next release.
http://bugzilla.open-bio.org/show_bug.cgi?id=2366
Peter
From fennan at gmail.com Mon Oct 15 05:48:45 2007
From: fennan at gmail.com (Fernando)
Date: Mon, 15 Oct 2007 11:48:45 +0200
Subject: [Biopython-dev] Database into variables
Message-ID: <7b13e61d0710150248v72a550d6h38e1467edf5073eb@mail.gmail.com>
Hi everybody,
I am thinking of including some algorithms that I work with in Biopython.
My first concern is that I'm using a local image of the Gene Ontology
database to perform several operations. In order to avoid such database
accesses, I could precompute the information I need and load it when the
module is imported. How should I do it? Is there a style guideline for loading
external variables or something like that? Any other ideas/suggestions?
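One possible approach, sketched here under the assumption that the precomputed data fits in memory and comes from a trusted source: build the lookup table offline, store it as a pickle shipped alongside the module, and load it lazily on first use with `functools.lru_cache` so later calls reuse the same object. The function names and the `TERM_*` placeholder data are hypothetical. Shipping a generated `.py` module containing the data is an alternative that avoids pickle altogether.

```python
import functools
import pickle


def save_precomputed(data, path):
    """Run once, offline: store the precomputed lookup table as a pickle."""
    with open(path, "wb") as handle:
        pickle.dump(data, handle, protocol=pickle.HIGHEST_PROTOCOL)


@functools.lru_cache(maxsize=None)
def load_precomputed(path):
    """Load the table on first use; later calls return the cached object.

    Only unpickle files you trust: pickle can execute arbitrary code.
    """
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

A module would then call, for example, `load_precomputed(os.path.join(os.path.dirname(__file__), "go_data.pickle"))` (hypothetical file name) inside the functions that need the data, so importing the module stays fast.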
Thanks
From fennan at gmail.com Mon Oct 15 06:28:56 2007
From: fennan at gmail.com (Fernando)
Date: Mon, 15 Oct 2007 12:28:56 +0200
Subject: [Biopython-dev] Precompute database information
Message-ID: <7b13e61d0710150328l354bfb5eu1b76ed05024a65c4@mail.gmail.com>
Hi everybody,
I am thinking of including some algorithms that I work with in Biopython.
My first concern is that I'm using a local image of the Gene Ontology
database to perform several operations. In order to avoid such database
accesses, I could precompute the information I need and load it when the
module is imported. How should I do it? Is there a style guideline for loading
external variables or something like that? Any other ideas/suggestions?
Thanks
From bugzilla-daemon at portal.open-bio.org Mon Oct 15 07:11:26 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 15 Oct 2007 07:11:26 -0400
Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in
(Reverse)complement functions in Bio.Seq
In-Reply-To:
Message-ID: <200710151111.l9FBBQOE012625@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2366
tiagoantao at gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |tiagoantao at gmail.com
------- Comment #3 from tiagoantao at gmail.com 2007-10-15 07:11 EST -------
I had a look at the test code and tried to find which test case is changing the
ambiguous_dna dict.
I used this little script (putting it here as it might be useful for detecting
these types of problems):
for i in test_*py; do
python run_tests.py $i;
done
It turns out that it is test_Nexus.py. Further inspection of the code suggests
that it is not the test case that pollutes the dictionary but the Nexus
module itself.
Maybe it makes sense to raise a bug on the Nexus module... Any comments on
these findings?
From bugzilla-daemon at portal.open-bio.org Mon Oct 15 10:16:00 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 15 Oct 2007 10:16:00 -0400
Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in
(Reverse)complement functions in Bio.Seq
In-Reply-To:
Message-ID: <200710151416.l9FEG01A023797@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2366
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-15 10:16 EST -------
Thanks for that Tiago,
I guess we should file a bug against Bio.Nexus on the alphabet issue; it may be that
it should create a copy or subclass of the ambiguous DNA alphabet in order to
include "?" (I imagine that Nexus uses this rather than "N"), and see if it is
using the Gapped() alphabet system or not.
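The "create a copy" option amounts to extending a private copy rather than the shared dictionary. The sketch below assumes, as suggested above, that Nexus uses "?" for an unknown base and that it should mean the same as "N"; the table and `nexus_values` are hypothetical stand-ins, not the Bio.Nexus code.

```python
# Hypothetical shared table, as imported from a common module.
AMBIGUOUS_DNA_VALUES = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}


def nexus_values(base_values=AMBIGUOUS_DNA_VALUES):
    """Return a private copy of the table extended with Nexus's '?' symbol."""
    values = dict(base_values)  # shallow copy: the shared dict is untouched
    values["?"] = values["N"]   # assumption: '?' means the same as 'N'
    return values
```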
Did you have any comments on this patch for (reverse) complements?
From jflatow at northwestern.edu Mon Oct 15 20:08:13 2007
From: jflatow at northwestern.edu (Jared Flatow)
Date: Mon, 15 Oct 2007 19:08:13 -0500
Subject: [Biopython-dev] Biopython status
Message-ID: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
Hi all,
I've just started using Biopython and I am wondering about the status
of the group, since I've heard rumors that it's dying. So far I have
found the library very useful, if not at times frustrating, though I
will admit I am fairly new to developing python as well. I have been
hesitant to make changes to existing code, however I have found that
in a few cases it has been by far the best way to accomplish what I
need, and have only done so in cases where it seems to be the *right*
thing to do.
With that in mind, I have a few questions I was hoping you all could
answer. First, how might I put these changes up for review in order
to contribute back to the code base? The main changes have been to
the AlignAce parser, since as it was it just ignored information
contained in the alignace file regarding the motif instances (namely
which input sequence they came from, where they started in the
sequence, and what strand they were on). I have also needed to create
a modified FASTA parser so that I can read things like quality score
files. I would be happy to submit the changes to the group or an
individual for inspection, but I would like to avoid having to
maintain my own separate version of Biopython if possible.
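For reference, a FASTA-style quality file parser can be quite small. The sketch below assumes a '>' header line followed by whitespace-separated integer scores, possibly spread over several lines; `parse_fasta_quality` is a hypothetical name, not an existing Biopython function.

```python
def parse_fasta_quality(lines):
    """Yield (identifier, [int scores]) records from FASTA-style quality lines."""
    identifier, scores = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, scores
            header = line[1:].split()
            # The identifier is the first word of the header line.
            identifier, scores = (header[0] if header else ""), []
        elif identifier is None:
            raise ValueError("Scores found before the first '>' header line")
        else:
            scores.extend(int(value) for value in line.split())
    if identifier is not None:
        yield identifier, scores


example = """>read1 some description
20 20 30
40
>read2
10 15
""".splitlines()
records = list(parse_fasta_quality(example))
print(records)  # [('read1', [20, 20, 30, 40]), ('read2', [10, 15])]
```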
I am also wondering how it would be received if I did something like
add a to_fasta method to SeqRecord instead of having to go through
writing it to a file using a SeqIO when all I want is the string.
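A minimal sketch of the idea, written as a free function on plain strings so it is self-contained; `to_fasta` is hypothetical and not a SeqRecord method. (Later Biopython releases added a `format()` method to SeqRecord that covers this use.) The sequence is wrapped by slicing rather than with `textwrap`, because `textwrap` may break lines at "-" gap characters.

```python
def to_fasta(identifier, sequence, description="", width=60):
    """Return a FASTA-formatted string for a single record."""
    header = ">" + identifier + (" " + description if description else "")
    sequence = str(sequence)
    # Fixed-width lines; slicing never splits on '-' like textwrap can.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines) + "\n"
```

For example, `to_fasta("seq1", "ACGT" * 20, "demo")` gives the header line followed by one line of 60 bases and one of 20.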
Finally, are there plans to move to a subversion repository at any
point?
Thanks!
Jared Flatow
From sbassi at gmail.com Tue Oct 16 01:09:16 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Tue, 16 Oct 2007 02:09:16 -0300
Subject: [Biopython-dev] Biopython status
In-Reply-To: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
Message-ID:
On 10/15/07, Jared Flatow wrote:
> I've just started using Biopython and I am wondering about the status
> of the group, since I've heard rumors that it's dying. So far I have
You could subscribe to the RSS feed of the CVS repository and you will see a lot
of activity. The developers list and the bug tracking program
(Bugzilla) are also pretty busy; that doesn't look like a dying group to
me :)
--
Molecular Biology course for programmers (Curso Biologia Molecular para programadores): http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lraiser: http://www.linspire.com/lraiser_success.php?serial=318
From mdehoon at c2b2.columbia.edu Tue Oct 16 01:37:14 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 16 Oct 2007 01:37:14 -0400
Subject: [Biopython-dev] Biopython status
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B639@mail2.exch.c2b2.columbia.edu>
Hi Jared,
> I've just started using Biopython and I am wondering about the status
> of the group, since I've heard rumors that it's dying.
From looking at the activity on the Biopython mailing lists in recent months,
it doesn't seem to be dying :-).
> So far I have found the library very useful, if not at times frustrating,
> though I will admit I am fairly new to developing in Python as well.
One thing to keep in mind is that Biopython started about eight years ago,
and some approaches that seemed to be a good idea at that time may not seem
to be so now. Nevertheless, I feel that Biopython is moving in the right
direction in terms of ease-of-use.
> First, how might I put these changes up for review in order
> to contribute back to the code base? The main changes have been to
> the AlignAce parser, since as it was it just ignored information
> contained in the alignace file regarding the motif instances (namely
> which input sequence they came from, where they started in the
> sequence, and what strand they were on).
In this case, it is a good idea to contact the current maintainer of
Bio.AlignAce, either via the mailing list or directly. From the Biopython
CVS, it seems that Bartek is currently the main maintainer of Bio.AlignAce,
so it would be a good idea to discuss with him.
> I have also needed to create a modified FASTA parser so that I
> can read things like quality score files.
At some point, Biopython had several (two or three?) Fasta parsers, two Fasta
formats, etc. This is a situation we should definitely avoid. So if your
modifications fit in well with the existing Fasta parser in Bio.SeqIO, it may
very well be accepted into Biopython. Otherwise, it's better to leave it out.
This is just my opinion though.
> I am also wondering how it would be received if I did something like
> add a to_fasta method to SeqRecord instead of having to go through
> writing it to a file using a SeqIO when all I want is the string.
This sounds like feature creep to me, so I would be against it. It's easy to
add code to Biopython, it's much harder to remove stuff. Code bloat is a real
problem in Biopython.
> Finally, are there plans to move to a subversion repository at any
> point?
There were some plans at some point, but I don't know the current status.
Best,
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 04:16:01 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 16 Oct 2007 09:16:01 +0100
Subject: [Biopython-dev] Biopython status
In-Reply-To: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
Message-ID: <47147341.4020708@maubp.freeserve.co.uk>
Jared Flatow wrote:
> I have also needed to create a modified FASTA parser so that I can
> read things like quality score files.
Could you be a little more specific? What exactly do you mean by
quality score files (links and/or examples)? It may be that this
warrants setting up a new file format in Bio.SeqIO.
> I would be happy to submit the changes to the group or an individual
> for inspection, but I would like to avoid having to maintain my own
> separate version of Biopython if possible.
As has already been said - please file some (enhancement) bugs and
attach your patches, or raise specific issues for discussion on this
mailing list.
Depending on the nature of your changes, you might be able to achieve
some of them by subclassing Biopython's objects - rather than literally
maintaining your own branch of the project.
> I am also wondering how it would be received if I did something like
> add a to_fasta method to SeqRecord instead of having to go through
> writing it to a file using a SeqIO when all I want is the string.
Out of interest, why do you want to create a FASTA record as a string?
Did you know you can write to a string in any Bio.SeqIO-supported
file format by using StringIO? Perhaps we should spell this out more
explicitly in the documentation, but a motivating example would help.
I would suggest rather than adding a to_fasta method to the SeqRecord,
simply write your own "seqrecord_to_string" function (or create a
subclass of SeqRecord with this method).
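A minimal sketch of such a "seqrecord_to_string" function, using only the standard library. It assumes the caller passes a plain id, description and sequence string rather than a real SeqRecord; the signature and the 60-column line wrapping are illustrative assumptions, not Biopython's API.

```python
import io


def seqrecord_to_string(record_id, description, sequence, width=60):
    """Return a single FASTA record as a string (hypothetical helper).

    The record is written into an in-memory io.StringIO handle rather than
    a file on disk, which is the same trick that works with Bio.SeqIO.
    """
    handle = io.StringIO()
    title = record_id + " " + description if description else record_id
    handle.write(">" + title + "\n")
    # Wrap the sequence at a fixed line width (60 is an assumed default).
    for start in range(0, len(sequence), width):
        handle.write(sequence[start:start + width] + "\n")
    return handle.getvalue()
```

With Biopython itself, the equivalent is to pass a StringIO handle to Bio.SeqIO.write(records, handle, "fasta") and then call getvalue() on the handle.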
> Finally, are there plans to move to a subversion repository at any
> point?
It was raised a while ago, and our cunning plan was to let BioPerl try
the move first. Once that has been proven, it should be fairly easy for
the OBF guys to also move us over. I should email them to see how
things stand...
Peter
From bartek at rezolwenta.eu.org Tue Oct 16 05:11:01 2007
From: bartek at rezolwenta.eu.org (bartek wilczynski)
Date: Tue, 16 Oct 2007 11:11:01 +0200
Subject: [Biopython-dev] Biopython status
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B639@mail2.exch.c2b2.columbia.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
<6243BAA9F5E0D24DA41B27997D1FD14402B639@mail2.exch.c2b2.columbia.edu>
Message-ID: <1192525861.4714802535dae@imp.rezolwenta.eu.org>
Michiel De Hoon wrote:
> > First, how might I put these changes up for review in order
> > to contribute back to the code base? The main changes have been to
> > the AlignAce parser, since as it was it just ignored information
> > contained in the alignace file regarding the motif instances (namely
> > which input sequence they came from, where they started in the
> > sequence, and what strand they were on).
>
> In this case, it is a good idea to contact the current maintainer of
> Bio.AlignAce, either via the mailing list or directly. From the Biopython
> CVS, it seems that Bartek is currently the main maintainer of Bio.AlignAce,
> so it would be a good idea to discuss with him.
I'm not dying either ;). I'm the author of the Bio.AlignAce module and if you
have any new code to contribute to it, I'll be glad to help you. The best way
to do it would be to submit an enhancement bug report in Bugzilla. If the
changes are small, you can just send them (as a diff) to the list and I'll
try to fit them to the current CVS version of Bio.AlignAce.
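For small changes, a unified diff (the format produced by `diff -u` or `cvs diff -u`) can also be generated with Python's standard difflib module. This is a minimal sketch; the file names and the one-line "before/after" contents are made-up placeholders.

```python
import difflib


def make_patch(old_text, new_text, old_name, new_name):
    """Return a unified diff, as one string, between two versions of a file."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(diff)


# Placeholder contents: one changed line in a hypothetical local copy.
old = "lines.append(line.rstrip().replace(' ', ''))\n"
new = "lines.append(line.rstrip())\n"
patch = make_patch(old, new, "FastaIO.py.orig", "FastaIO.py")
```

Attaching the resulting text to a Bugzilla report, rather than pasting it into an email body, avoids mail clients re-wrapping long lines.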
Bartek Wilczynski
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 05:55:37 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 05:55:37 -0400
Subject: [Biopython-dev] [Bug 2380] New: Bio.Nexus is adding "?" and "-" to
Bio.Data.IUPACData.ambiguous_dna_values
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2380
Summary: Bio.Nexus is adding "?" and "-" to
Bio.Data.IUPACData.ambiguous_dna_values
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: minor
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
This issue was raised in Bug 2366, where a unit test was found to be "polluting"
ambiguous_dna_values; the culprit was later identified as Bio.Nexus via test_Nexus.py.
We need to check whether Bio.Nexus should be making a copy of this dict, or perhaps
defining a subclass of the alphabet (maybe using the Gapped() class).
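A minimal sketch of the "make a copy" option, assuming a small stand-in dict rather than the real Bio.Data.IUPACData table; the values used for "?" and "-" are illustrative only, not what Bio.Nexus actually stores.

```python
# Stand-in for a shared, module-level table such as
# Bio.Data.IUPACData.ambiguous_dna_values (hypothetical subset, not the real dict).
AMBIGUOUS_DNA_VALUES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "N": "ACGT",
}


def gapped_values_polluting():
    """Problem pattern: adds gap/missing symbols to the shared dict itself."""
    table = AMBIGUOUS_DNA_VALUES  # same object, so every importer sees the change
    table["?"] = "ACGT-"  # illustrative values for missing data and gaps
    table["-"] = "-"
    return table


def gapped_values_copy():
    """Safer pattern: extend a private copy and leave the shared dict untouched."""
    table = dict(AMBIGUOUS_DNA_VALUES)  # shallow copy is enough: values are immutable str
    table["?"] = "ACGT-"
    table["-"] = "-"
    return table
```

Because the copy is made before adding keys, any later test that inspects the shared table sees only the original IUPAC entries.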
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 05:56:37 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 05:56:37 -0400
Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in
(Reverse)complement functions in Bio.Seq
In-Reply-To:
Message-ID: <200710160956.l9G9ub18007735@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2366
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 05:56 EST -------
Fix committed (after Michiel's OK on the mailing list), marking as fixed.
Checking in Tests/test_seq.py;
/home/repository/biopython/biopython/Tests/test_seq.py,v <-- test_seq.py
new revision: 1.6; previous revision: 1.5
done
Checking in Tests/output/test_seq;
/home/repository/biopython/biopython/Tests/output/test_seq,v <-- test_seq
new revision: 1.6; previous revision: 1.5
done
Checking in Bio/Seq.py;
/home/repository/biopython/biopython/Bio/Seq.py,v <-- Seq.py
new revision: 1.17; previous revision: 1.16
done
I've filed Bug 2380 for the Nexus issue:
Bio.Nexus is adding "?" and "-" to Bio.Data.IUPACData.ambiguous_dna_values
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 06:11:09 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 06:11:09 -0400
Subject: [Biopython-dev] [Bug 2381] New: translate and transcibe method for
the the Seq object (in Bio.Seq)
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
Summary: translate and transcibe method for the the Seq object
(in Bio.Seq)
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
Biopython has translation and transcription modules (Bio/Translate.py and
Bio/Transcribe.py), but I find them a little bit complicated to use.
There are module-level functions translate, transcribe, and back_transcribe in
Bio/Seq.py which take either a string, a Seq object or a MutableSeq object.
I would like to add similar methods to the Seq object (also defined in Bio/Seq.py)
to make this functionality more accessible from a Seq object.
NOTE: Python strings have a translate method of their own which is rather
different. Having the Seq translate method doing a biological translation
makes sense.
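A minimal sketch of the proposed design, in which the methods simply delegate to module-level functions. MiniSeq is a hypothetical, stripped-down stand-in written for illustration; it is not Biopython's Seq class.

```python
def transcribe(dna):
    """Module-level function: DNA -> RNA by replacing T with U (str or MiniSeq)."""
    return str(dna).replace("T", "U").replace("t", "u")


def back_transcribe(rna):
    """Module-level function: RNA -> DNA by replacing U with T."""
    return str(rna).replace("U", "T").replace("u", "t")


class MiniSeq:
    """Hypothetical stand-in for a Seq object, holding a plain string."""

    def __init__(self, data):
        self.data = str(data)

    def __str__(self):
        return self.data

    def __repr__(self):
        return "MiniSeq(%r)" % self.data

    # The proposed methods delegate to the module-level functions above,
    # so each operation still has a single implementation.
    def transcribe(self):
        return MiniSeq(transcribe(self))

    def back_transcribe(self):
        return MiniSeq(back_transcribe(self))
```

For contrast, the built-in str.translate only maps characters one-to-one: "ACGT".translate(str.maketrans("ACGT", "TGCA")) gives "TGCA", a complement rather than a protein.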
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 06:13:35 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 06:13:35 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200710161013.l9GADZtJ008751@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|translate and transcibe |translate and transcibe
|method for the the Seq |methods for the Seq object
|object (in Bio.Seq) |(in Bio.Seq)
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 06:13 EST -------
fixed typo in the bug summary
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 06:26:44 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 06:26:44 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200710161026.l9GAQixw009268@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
------- Comment #2 from dalloliogm at gmail.com 2007-10-16 06:26 EST -------
I find it difficult to translate a sequence in all 6 reading frames with a single
command.
Currently I use something like this:
for i in range(3):
    translate(seq[i:])  # forward frames only
which is not very nice.
It would be nice to add a parameter to the translate function, as in the
EMBOSS application transeq
(http://emboss.sourceforge.net/apps/cvs/emboss/apps/transeq.html), something
like this:
>>> a = Seq('CAGCTAGCT')
>>> a.translate()
[(translation of a in the frame 0)]
>>> a.translate(1)
[(translation of a in the frame 1)]
>>> a.translate(F)
[(translation of a in the 3 forward frames)]
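A minimal, standard-library-only sketch of a six-frame translation, using the NCBI standard genetic code (table 1). Frames +1 to +3 start at offsets 0 to 2 of the given strand and frames -1 to -3 at offsets 0 to 2 of the reverse complement; this numbering is an assumption and may differ from transeq's own convention. Incomplete trailing codons are dropped and unrecognised codons become "X".

```python
from itertools import product

# NCBI standard genetic code (table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna):
    """Translate complete codons starting at position 0; unknown codons -> 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna):
    return dna.upper().translate(COMPLEMENT)[::-1]


def six_frame_translation(dna):
    """Return a dict mapping frame (+1, +2, +3, -1, -2, -3) to a protein string."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames
```

For the example sequence CAGCTAGCT, frame +1 gives "QLA" and frame -1 (read from the reverse complement AGCTAGCTG) gives "S*L".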
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 06:46:47 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 06:46:47 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200710161046.l9GAklI6010391@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 06:46 EST -------
Doing a three- or six-frame translation is, however, fairly common, and perhaps
warrants an "official" implementation in Bio.SeqUtils.
My current inclination is to try to keep the Bio.Seq translation function as
simple as possible. There are lots of possible options to worry about, and
catering to them all could make the translate method rather daunting.
Perhaps things like the frame (or even the starting nucleotide) could be handled
in Bio.Translate only. Another "special case" example I personally would like
is an option to check that the first codon is a valid start codon for the specified
codon table, and to translate it as methionine (M). Then there is the question
of whether Bio.Translate's "translate_to_stop" functionality should be exposed in a
Seq method.
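A minimal sketch of those two "special case" options as flags on a hypothetical translate_cds function (not an existing Biopython call). It hard-codes the NCBI standard table, whose listed start codons are ATG plus the alternatives TTG and CTG; other codon tables would need their own start list.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
# Start codons for NCBI table 1: ATG plus the alternative starts TTG and CTG.
START_CODONS = ("ATG", "TTG", "CTG")


def translate_cds(dna, check_start=True, to_stop=True):
    """Translate a coding sequence (hypothetical helper).

    check_start: require a valid start codon and translate it as 'M',
                 even for alternative starts such as TTG (normally L).
    to_stop:     stop at the first in-frame stop codon instead of emitting '*'.
    """
    dna = dna.upper()
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    if not codons:
        return ""
    protein = []
    if check_start:
        if codons[0] not in START_CODONS:
            raise ValueError("First codon %r is not a start codon" % codons[0])
        protein.append("M")
        codons = codons[1:]
    for codon in codons:
        aa = CODON_TABLE.get(codon, "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)
```

With both flags on, "TTGGCTTAGGCT" translates to "MA"; with both off, the same input gives "LA*A".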
Note there is yet another (!) translation function Bio.SeqUtils.translate()
which is frame aware [personally I would mark a lot of this module as
deprecated].
From jflatow at northwestern.edu Tue Oct 16 12:02:19 2007
From: jflatow at northwestern.edu (Jared Flatow)
Date: Tue, 16 Oct 2007 11:02:19 -0500
Subject: [Biopython-dev] Biopython status
In-Reply-To: <47147341.4020708@maubp.freeserve.co.uk>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
<47147341.4020708@maubp.freeserve.co.uk>
Message-ID: <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
Please forgive me for ever doubting your health, it seems the group
is very much alive!
On Oct 16, 2007, at 3:16 AM, Peter wrote:
> Jared Flatow wrote:
>> I have also needed to create a modified FASTA parser so that I can
>> read things like quality score files.
>
> Could you be a little more specific? What exactly do you mean by
> quality score files (links and/or examples)? It may be that this
> warrants setting up a new file format in Bio.SeqIO.
That is what I did. The quality score files I meant are simply FASTA-like
records that indicate the quality of each base read by a
sequencing machine, on a scale of something like 1 to 64. The values
are tab-separated and correspond to 'reads' in another FASTA file
that contains the actual sequences read. This is the way the 454
GSFlex machines output their sequencing reads, so for every set of
reads there will be a pair of files, 454Reads.fna and 454Reads.qual. The
only difference between a parser that processes these qual files and
one that processes the sequence files is that it shouldn't get rid of
spaces, and the newlines should not be stripped but converted into
spaces (when 454 starts a new line of scores, it omits the space).
Essentially I have made a duplicate of FastaIO's iterator, named it
something else, made these two small changes, and put an entry for it
in the SeqIO file.
16,17c16,17
< def GSQualIterator(handle, alphabet = single_letter_alphabet, title2ids = None) :
< """Generator function to iterate over GSFlex quality records (as SeqRecord objects).
---
> def FastaIterator(handle, alphabet = single_letter_alphabet, title2ids = None) :
> """Generator function to iterate over Fasta records (as SeqRecord objects).
54c54
< lines.append(line.rstrip()) # .replace(" ","")) leave off the replacing internal spaces so we can process qscore files (jf)
---
> lines.append(line.rstrip().replace(" ",""))
58c58
< yield SeqRecord(Seq(" ".join(lines), alphabet),
---
> yield SeqRecord(Seq("".join(lines), alphabet),
63a64,199
As you can see, a parser like this might be useful for other FASTA-like
formats as well and is in no way specific to the GS quality
files (it's just a space-preserving parser). If it were to be
implemented in Biopython, you might call it something else.
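A minimal standalone sketch of such a space-preserving reader, written with the standard library rather than as a patch to FastaIO. The name parse_qual is hypothetical, and it returns plain lists of integers instead of Seq objects. Splitting on any whitespace covers both tabs and spaces, and joining lines with a space keeps scores on either side of a line break apart.

```python
import io


def parse_qual(handle):
    """Yield (title, scores) tuples from a FASTA-like quality file.

    Line breaks are treated as separators, so "40 40" followed by a new
    line "38" gives [40, 40, 38] rather than gluing "40" and "38" together.
    Any lines before the first '>' are ignored.
    """
    title, lines = None, []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if title is not None:
                yield title, [int(v) for v in " ".join(lines).split()]
            title, lines = line[1:], []
        elif title is not None and line:
            lines.append(line)
    if title is not None:
        yield title, [int(v) for v in " ".join(lines).split()]


example = io.StringIO(">read1 length=3\n40 40\n38\n>read2\n20\t21\n")
records = list(parse_qual(example))
```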
>
>> I would be happy to submit the changes to the group or an individual
>> for inspection, but I would like to avoid having to maintain my own
>> separate version of Biopython if possible.
>
> As has already been said - please file some (enhancement) bugs and
> attach your patches, or raise specific issues for discussion on this
> mailing list.
>
> Depending on the nature of your changes, you might be able to achieve
> some of them by subclassing Biopython's objects - rather than literally
> maintaining your own branch of the project.
>
>> I am also wondering how it would be received if I did something like
>> add a to_fasta method to SeqRecord instead of having to go
>> through writing it to a file using a SeqIO when all I want is the
>> string.
>
> Out of interest, why do you want to create a FASTA record as a string?
I am serving the fasta from a database of sequences dynamically via a
web server.
>
> Did you know you can write to a string using any Bio.SeqIO supported
> file format using StringIO? Perhaps we should spell this out more
> explicitly in the documentation, but a motivating example would help.
This is what I do now, but it seems like a hack to me to go this
route. To always have to write to a file feels strange, but I see
that it would be messy to go OO since there are so many formats.
However, giving preference to FASTA over other formats by building it
in doesn't seem like such a terrible idea. I do have mixed
feelings about 'bloating' the code which is why I asked, and you have
convinced me that this is not quite appropriate given existing
convention. However the idea would be to put the to_fasta or
to_format method inside the SeqRecord, then to call it from the IO
when needed to actually write to a file, but call it directly when
all that is wanted is a string...
>
> I would suggest rather than adding a to_fasta method to the
> SeqRecord, simply write your own "seqrecord_to_string" function (or
> create a subclass of SeqRecord with this method).
>
I'll leave it alone for now until I can come up with a real proposal =)
>> Finally, are there plans to move to a subversion repository at any
>> point?
>
> It was raised a while ago, and our cunning plan was to let BioPerl try
> the move first. Once that has been proven, it should be fairly easy for
> the OBF guys to also move us over. I should email them to see how
> things stand...
BioPerl seems to be the guinea pigs for everything. Leading the way
on this might put a stop to those nasty rumors about Biopython.
Best Regards,
Jared
From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 12:47:48 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 16 Oct 2007 17:47:48 +0100
Subject: [Biopython-dev] CVS to SVN
In-Reply-To: <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk>
<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
Message-ID: <4714EB34.8000207@maubp.freeserve.co.uk>
Jared wrote:
> Leading the way on this ... [CVS to SVN]
I would say one reason why we aren't charging ahead with a move from CVS
to Subversion is that only a few posters on this mailing list actively WANT
to move to Subversion, and no-one has really championed the move (yet).
I'm sure if we as a group wanted to do this, then the OBF would be happy to
assist. After all, moving us rather than BioPerl as the first CVS/SVN
migration should be easier as we have a smaller code base.
Peter
From jflatow at northwestern.edu Tue Oct 16 14:46:53 2007
From: jflatow at northwestern.edu (Jared Flatow)
Date: Tue, 16 Oct 2007 13:46:53 -0500
Subject: [Biopython-dev] 454 GSFlex quality score files
In-Reply-To: <4714EBC7.1040504@maubp.freeserve.co.uk>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk>
<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
<4714EBC7.1040504@maubp.freeserve.co.uk>
Message-ID: <48D92CF4-04B5-42F9-92D2-3A2D9D2FE7E2@northwestern.edu>
Hi Peter,
>>>> I have also needed to create a modified FASTA parser so that I
>>>> can read things like quality score files.
>>>
>>> Could you be a little more specific? What exactly do you mean by
>>> quality score files (links and/or examples)? It may be that this
>>> warrants setting up a new file format in Bio.SeqIO.
>> That is what I did. The quality score files I meant are simply
>> FASTA-like records that indicate the quality of each base read
>> by a sequencing machine, on a scale of something like 1 to
>> 64. The values are tab-separated and correspond to 'reads' in
>> another FASTA file that contains the actual sequences read. This
>> is the way the 454 GSFlex machines output their sequencing reads,
>> so for every set of reads there will be a pair of files,
>> 454Reads.fna and 454Reads.qual. The only difference between a parser
>> that processes these qual files and one that processes the sequence
>> files is that it shouldn't get rid of spaces, and the newlines
>> should not be stripped but converted into spaces (when 454
>> starts a new line of scores, it omits the space). Essentially I
>> have made a duplicate of FastaIO's iterator, named it something
>> else, made these two small changes, and put an entry for it in the
>> SeqIO file.
>
> Patches and emails don't do well together. Could you file an
> enhancement bug, and then upload your code as an attachment? If
> you have a few examples of matched pairs of FASTA files and quality
> files which you can contribute that would be very helpful too.
>
Yes I'll get on that.
> It looks like you are trying to construct a "sequence" of numerical
> values (rather than a sequence of letters like nucleotides/amino
> acids). As written I don't think it would work for element access/
> slicing etc. However, with some extra work I suppose we could
> stretch the Seq object in this way - and define a new
> "IntegerAlphabet".
>
> But on balance, I don't think "lists of quality values" should be
> treated in the same way as sequences (and thus it doesn't seem to
> belong in Bio.SeqIO).
>
I agree.
> Alternatively you could regard the quality scores as sequence meta-data
> or annotation. One idea would be to generate SeqRecord
> objects containing dummy sequences of the correct length made up of
> the ambiguous character "N", with the associated quality scores
> held as a list of integers in the SeqRecord's annotation
> dictionary. Then it would fit into the Bio.SeqIO framework [I was
> thinking of something similar for PTT files, NCBI Protein tables,
> where again we have annotation but not the actual sequence].
I agree, and this way is most flexible.
>
> Maybe there should just be a separate parser for GSFlex quality
> records which returns an iterator giving each record name with a list
> of integers. A more elegant scheme would read in the pair of files
> together (the FASTA file and the quality file) and generate nicely
> annotated SeqRecords with the sequence and the quality. This isn't
> really possible with the Bio.SeqIO framework.
>
Yes, at first I liked this idea best, but it puts some constraints on
the way these things are read in. For example, if it is to be an iterator,
you must have a guarantee that these files contain exactly the same
sequences in exactly the same order. That could
potentially be fine for the GSFlex files, but I wonder if there might
somewhere down the line be a use for quality information about
sequences in other cases. If I am not mistaken, some sources now use
upper/lower case letters to indicate a two-level (high/low) degree of
confidence in a sequence letter. In any event, this seems like an
unnecessary restriction.
The way I do it now is I load the reads into a database, then update
the database when I read in a quality score file. I think Biopython
should have a simple way of implementing something similar which can
solve both our metadata problems.
In Bio.Fasta there are parsers which really belong in
Bio.SeqIO.FastaIO, if anywhere. How about Bio.Fasta becoming the more
general FASTA reader, with nothing to do with sequences? It could iterate
over a FASTA file using the '>' as the record separator, creating
Record objects, much like it does now, except without processing them
at all or assuming they are sequences:
    >Record.header
    Record.data
Now Bio.SeqIO.FastaIO could use Bio.Fasta to iterate over the Record
objects in a file and transform them into SeqRecord objects. If you
like, you could provide it with a function header_todict, which takes a
string (in this case Record.header) and returns a dictionary, which
gets unpacked and passed to the SeqRecord initializer. Basically,
Bio.SeqIO.FastaIO would return a generator that looks something like this:
(SeqRecord(seq=cleanup(record.data), **header_todict(record.header))
for record in Bio.Fasta.parse(file))
I could then also use the Bio.Fasta.parse function to parse my quality
files and add them as metadata:
# I create an initial SeqRecord dictionary using the Bio.SeqIO.FastaIO parser
seq_dict = SeqIO.to_dict(SeqIO.FastaIO.parse(seq_file, my_header_todict))
# Then I iterate over the records in the qual file and look them up in
# seq_dict using the same header parsing function I passed to create my
# initial SeqRecords, setting the quality scores as I find them
for record in Bio.Fasta.parse(qual_file):
    seq_dict[my_header_todict(record.header)['id']].quality = \
        my_qualitycleanup(record.data)
I hope that makes sense. The advantage to doing it this way is that I
can reuse my header parsing function for both the sequence and the
metadata, and I can do whatever I want with the fasta record data
without writing a whole new parser. The SeqIO FASTA parsing functions
would just make some default assumptions (like the data being a sequence).
Let me know what you think.
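A minimal standard-library sketch of this proposal. parse_fasta_records and FastaRecord are hypothetical stand-ins for the suggested generic Bio.Fasta reader, and plain dicts stand in for SeqRecord objects. The point it illustrates is that one header-parsing function is reused for both files, so the quality file may list records in a different order.

```python
import io
from collections import namedtuple

# Hypothetical generic record: the '>' line and the raw data lines, unprocessed.
FastaRecord = namedtuple("FastaRecord", ["header", "data"])


def parse_fasta_records(handle):
    """Split a FASTA-like file on '>' lines without interpreting the data."""
    header, data = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield FastaRecord(header, data)
            header, data = line[1:], []
        elif header is not None:
            data.append(line)
    if header is not None:
        yield FastaRecord(header, data)


def my_header_todict(header):
    """Example header parser: first word is the id, the rest is the description."""
    parts = header.split(None, 1)
    return {"id": parts[0], "description": parts[1] if len(parts) > 1 else ""}


def load_reads(seq_handle, qual_handle, header_todict=my_header_todict):
    """Build {id: read} from the sequence file, then attach quality scores by id."""
    reads = {}
    for rec in parse_fasta_records(seq_handle):
        info = header_todict(rec.header)
        info["seq"] = "".join(line.strip() for line in rec.data)
        reads[info["id"]] = info
    for rec in parse_fasta_records(qual_handle):
        key = header_todict(rec.header)["id"]  # same header parser as above
        # Raises KeyError if the quality file names a read not in the sequence file.
        reads[key]["quality"] = [int(v) for v in " ".join(rec.data).split()]
    return reads


reads = load_reads(io.StringIO(">r1 first\nACG\nT\n>r2\nGG\n"),
                   io.StringIO(">r2\n30 31\n>r1 first\n40 40\n39 38\n"))
```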
Jared
From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 12:50:15 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 16 Oct 2007 17:50:15 +0100
Subject: [Biopython-dev] 454 GSFlex quality score files
In-Reply-To: <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk>
<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
Message-ID: <4714EBC7.1040504@maubp.freeserve.co.uk>
Hi Jared,
>>> I have also needed to create a modified FASTA parser so that I can
>>> read things like quality score files.
>>
>> Could you be a little more specific? What exactly do you mean by
>> quality score files (links and/or examples)? It may be that this
>> warrants setting up a new file format in Bio.SeqIO.
>
> That is what I did. The quality score files I meant are simply FASTA-like
> records that indicate the quality of each base read by a
> sequencing machine, on a scale of something like 1 to 64. The values
> are tab-separated and correspond to 'reads' in another FASTA file
> that contains the actual sequences read. This is the way the 454
> GSFlex machines output their sequencing reads, so for every set of
> reads there will be a pair of files, 454Reads.fna and 454Reads.qual. The
> only difference between a parser that processes these qual files and
> one that processes the sequence files is that it shouldn't get rid of
> spaces, and the newlines should not be stripped but converted into
> spaces (when 454 starts a new line of scores, it omits the space).
> Essentially I have made a duplicate of FastaIO's iterator, named it
> something else, made these two small changes, and put an entry for it
> in the SeqIO file.
Patches and emails don't do well together. Could you file an
enhancement bug, and then upload your code as an attachment? If you
have a few examples of matched pairs of FASTA files and quality files
which you can contribute that would be very helpful too.
It looks like you are trying to construct a "sequence" of numerical
values (rather than a sequence of letters like nucleotides/amino acids).
As written I don't think it would work for element access/slicing
etc. However, with some extra work I suppose we could stretch the Seq
object in this way - and define a new "IntegerAlphabet".
But on balance, I don't think "lists of quality values" should be
treated in the same way as sequences (and thus it doesn't seem to belong
in Bio.SeqIO).
Alternatively you could regard the quality scores as sequence meta-data
or annotation. One idea would be to generate SeqRecord objects
containing dummy sequences of the correct length made up of the
ambiguous character "N", with the associated quality scores held as a
list of integers in the SeqRecord's annotation dictionary. Then it
would fit into the Bio.SeqIO framework [I was thinking of something
similar for PTT files, NCBI Protein tables, where again we have
annotation but not the actual sequence].
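A rough sketch of the dummy-sequence idea, using only the standard library. QualityRecord is a hypothetical stand-in for SeqRecord (not the Biopython class), and the "quality" annotation key is just an illustrative name:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QualityRecord:
    """Hypothetical stand-in for a SeqRecord (not the Biopython class)."""
    id: str
    seq: str
    annotations: Dict[str, List[int]] = field(default_factory=dict)


def record_from_qualities(name: str, scores: List[int]) -> QualityRecord:
    # The real bases are unknown here, so use a dummy sequence of the
    # ambiguous character "N" with exactly one letter per quality score.
    record = QualityRecord(id=name, seq="N" * len(scores))
    # "quality" is an illustrative key name, not an established convention
    record.annotations["quality"] = list(scores)
    return record


rec = record_from_qualities("ERSGEES02IKV6B", [27, 28, 21, 27])
print(rec.seq)                     # NNNN
print(rec.annotations["quality"])  # [27, 28, 21, 27]
```

Keeping the dummy sequence the same length as the score list means slicing or length checks on the record stay meaningful even without the real bases.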
Maybe there should just be a separate parser for GSFlex quality records
which returns an iterator giving each record name with a list of
integers. A more elegant scheme would read in the pair of files together
(the FASTA file and the quality file) and generate nicely annotated
SeqRecords with the sequence and the quality. This isn't really
possible with the Bio.SeqIO framework.
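For the separate-parser option, a minimal sketch of an iterator over a 454-style .qual file, yielding each record name with a list of integers. The function name iterate_quality is hypothetical, and it assumes the record name is the first word of the '>' line:

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def iterate_quality(handle: TextIO) -> Iterator[Tuple[str, List[int]]]:
    """Yield (record name, list of integer scores) for each '>' record.

    The record name is taken to be the first word of the '>' line.
    """
    name: Optional[str] = None
    scores: List[int] = []
    for line in handle:
        if line.startswith(">"):
            if name is not None:
                yield name, scores
            words = line[1:].split(None, 1)
            name = words[0] if words else ""
            scores = []
        elif name is not None:
            # split() breaks on any whitespace, including the newline, so a
            # score at the end of one line is never glued to the next line's
            scores.extend(int(x) for x in line.split())
    if name is not None:
        yield name, scores


example = io.StringIO(">read1 length=3\n27 28\n21\n>read2 length=2\n30 12\n")
records = list(iterate_quality(example))
print(records)  # [('read1', [27, 28, 21]), ('read2', [30, 12])]
```

Using str.split() with no argument copes with the 454 habit of omitting the trailing space before a newline, which is the problem Jared described above.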
Peter
From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 15:33:54 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 16 Oct 2007 20:33:54 +0100
Subject: [Biopython-dev] 454 GSFlex quality score files
In-Reply-To: <48D92CF4-04B5-42F9-92D2-3A2D9D2FE7E2@northwestern.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EBC7.1040504@maubp.freeserve.co.uk>
<48D92CF4-04B5-42F9-92D2-3A2D9D2FE7E2@northwestern.edu>
Message-ID: <47151222.1060502@maubp.freeserve.co.uk>
> In Bio.Fasta there are Parsers which really belong in
> Bio.SeqIO.FastaIO, if anywhere. How about Bio.Fasta becomes the more
> general Fasta reader, nothing to do with sequences. ...
In actual fact, the Bio.Fasta module predates Bio.SeqIO, and I was
thinking in a few releases time of suggesting its deprecation (but not
just yet as for several years it was the best documented and most used
parser in Biopython).
If we do decide to keep Bio.Fasta (or extend it), then perhaps
Bio.SeqIO.FastaIO should become just a wrapper for Bio.Fasta.
I'm still digressing your ideas to turn Bio.Fasta into a generic parser
that copes with sequences, quality scores, or anything else.
Peter
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 15:57:35 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 15:57:35 -0400
Subject: [Biopython-dev] [Bug 2382] New: Generic FASTA parser
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
Summary: Generic FASTA parser
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: jflatow at northwestern.edu
I would like to be able to read in and iterate over records in generic FASTA
files of the format:
>header
data
>header
data
...
This iterator should return Bio.Fasta.Record objects with the corresponding
header and data fields.
I suggest putting this inside the existing Bio.Fasta module and updating
Bio.SeqIO.FastaIO to use this iterator and transform the records returned into
Bio.SeqRecord objects.
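A minimal standard-library sketch of the kind of iterator being asked for. FastaLikeRecord (a named tuple) stands in for Bio.Fasta.Record and is hypothetical; the data is returned exactly as read so each caller can decide how to interpret it:

```python
import io
from typing import Iterator, List, NamedTuple, Optional, TextIO


class FastaLikeRecord(NamedTuple):
    header: str  # the '>' line, minus the '>' and the line ending
    data: str    # every following line up to the next '>', untouched


def parse_fasta_like(handle: TextIO) -> Iterator[FastaLikeRecord]:
    """Yield one FastaLikeRecord per '>' line, leaving the data unparsed.

    Any lines before the first '>' (such as a file-wide header) are skipped.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        if line.startswith(">"):
            if header is not None:
                yield FastaLikeRecord(header, "".join(chunks))
            header = line[1:].rstrip("\r\n")
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield FastaLikeRecord(header, "".join(chunks))


handle = io.StringIO("preamble\n>seq1 demo\nACGT\nTT\n>qual1\n27 28\n21\n")
records = list(parse_fasta_like(handle))
# Each caller decides what the data means:
sequence = "".join(records[0].data.split())         # 'ACGTTT'
scores = [int(x) for x in records[1].data.split()]  # [27, 28, 21]
```

Skipping anything before the first '>' is an assumption on my part; it would also cover files that start with a column-name line.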
This should make it easier to add metadata to SeqRecord objects parsed in from
FASTA. Consider the following example for illustration. I have data from a
genome sequencing machine that outputs pairs of files: one contains the
sequence reads, and the other contains estimates of the quality of each base
call in the sequence.
The sequence file might look something like this (only with hundreds of
thousands more entries):
>ERSGEES02IKV6B length=97 xy=3401_1361 region=2 run=R_runname
CAATATAATTTCTCTTAAAATTATTCCCATGGCCAGGTGTGGTGGCTCACACCTGTAGTC
CCGGCACTTTGGGAGGCCAAGGCACACAGGGGATAGG
>ERSGEES02GGZDB length=142 xy=2536_2685 region=2 run= R_runname
GGTCTCCAGTGCCCTGTCTCCCCATATTTCTGACACACCTTCTCACAGCCTGGCCCATCT
TGCTGGGTCCCTCTTCTCCTCCCTTCCTGCTCCATTTGTCAACACTGCTGGGACATTAGA
ATTCAGATCTCCCGGGTCACCG
>ERSGEES02JQUCP length=113 xy=3879_0663 region=2 run= R_runname
AAAGTGACTAAAGAATCAATTTACATTAATATTCTATGTGAACAGGCAAAATACTTACAA
AGAAGTAGAGAAAATATGAATTCAGTACAGAATTCAGATCTCCCGGGTCACCG
The corresponding quality score file might look something like this:
>ERSGEES02IKV6B length=97 xy=3401_1361 region=2 run= R_runname
27 28 21 27 27 27 28 22 28 25 3 27 27 27 28 21 33 31 20 6 28 21 26 26 18 28 25
2 26 25 29 23 31 24 27 29 22 27 27 27 29 23 27 31 25 27 27 27 27 27 27 32 26 27
27 27 27 26 27 33
30 12 32 26 27 27 27 33 30 12 33 30 12 26 31 25 33 27 32 28 33 28 27 27 27 27
27 26 33 32 20 7 27 27 27 32 26
>ERSGEES02GGZDB length=142 xy=2536_2685 region=2 run= R_runname
28 9 26 24 27 27 20 26 18 25 27 32 29 10 26 26 27 18 25 32 30 17 1 25 27 22 32
30 12 27 27 22 26 25 27 23 25 28 21 32 27 27 27 25 26 27 26 25 27 20 26 26 19
28 25 3 25 27 22 27
19 24 24 24 32 29 11 24 34 31 17 23 23 30 23 27 25 30 23 27 33 31 17 27 20 28
21 27 25 26 26 30 24 27 33 31 13 26 27 27 31 25 27 25 23 26 16 26 27 30 27 7 27
27 27 32 27 26 26 32
27 30 26 27 27 27 27 27 27 27 30 27 6 34 31 17 27 21 27 32 28 18
>ERSGEES02JQUCP length=113 xy=3879_0663 region=2 run= R_runname
29 26 5 25 27 24 27 27 27 30 27 7 26 27 19 25 26 31 26 34 32 16 20 27 26 32 27
32 28 27 25 26 18 27 25 27 26 26 24 27 31 25 27 27 31 26 26 34 32 23 11 26 22
27 32 26 27 26 32 30
11 26 31 24 27 27 25 23 27 27 33 30 19 4 17 26 25 26 31 27 30 26 27 26 22 26 18
24 27 26 32 26 32 28 27 27 25 27 25 24 25 31 28 10 34 31 15 27 21 27 28 21 27
I would like to be able to do the following:
# create a function to parse the header line and return a dictionary
def parse_gsflex_header(gs_header):
    parts = gs_header.split(' ')
    assert len(parts) == 5
    xy = parts[2].split('=')[1].split('_')
    return {'name': parts[0],
            'length': parts[1].split('=')[1],
            'xpos': xy[0],
            'ypos': xy[1],
            'region': parts[3].split('=')[1],
            'run': parts[4].split('=')[1]}
# Bio.SeqIO.FastaIO wraps the Bio.Fasta parser, might look something like this
class Fasta():  # or however it's organized
    @staticmethod
    def data_toseq(data):
        # do some parsing of the data
        return Seq(...)
    @staticmethod
    def parse(file, header_todict):
        return (SeqRecord(seq=Fasta.data_toseq(record.data),
                          **header_todict(record.header))
                for record in Bio.Fasta.parse(file))
# I create an initial SeqRecord dictionary using the Bio.SeqIO.FastaIO parser
seq_dict = SeqIO.to_dict(SeqIO.FastaIO.parse(seq_file, parse_gsflex_header))
# Then I iterate over the records in the qual file, look each one up in
# seq_dict, and set the quality scores as I find them
for record in Bio.Fasta.parse(qual_file):
    seq_dict[my_header_todict(record.header)['id']].quality = \
        my_qualitycleanup(record.data)
This would work well for parsing all kinds of FASTA-like files and provides a
simple mechanism for dealing with them record by record.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 16:03:33 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 16:03:33 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162003.l9GK3XmF007588@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
------- Comment #1 from jflatow at northwestern.edu 2007-10-16 16:03 EST -------
My mistake, the parse_gsflex_header function should look something like this:
import re
def parse_gsflex_header(header):
    parts = re.split(r'[,|]?\s+', header, maxsplit=1)
    assert len(parts) == 2
    return {'id': parts[0],
            'description': header}
def my_qualitycleanup(data):
    # split() on any whitespace, so scores on adjacent lines stay separate
    return [int(x) for x in data.split()]
From jflatow at northwestern.edu Tue Oct 16 16:11:04 2007
From: jflatow at northwestern.edu (Jared Flatow)
Date: Tue, 16 Oct 2007 15:11:04 -0500
Subject: [Biopython-dev] 454 GSFlex quality score files
In-Reply-To: <47151222.1060502@maubp.freeserve.co.uk>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EBC7.1040504@maubp.freeserve.co.uk>
<48D92CF4-04B5-42F9-92D2-3A2D9D2FE7E2@northwestern.edu>
<47151222.1060502@maubp.freeserve.co.uk>
Message-ID: <156C46BF-1798-43D5-BA10-2A94FC63A3AB@northwestern.edu>
On Oct 16, 2007, at 2:33 PM, Peter wrote:
> > In Bio.Fasta there are Parsers which really belong in
> > Bio.SeqIO.FastaIO, if anywhere. How about Bio.Fasta becomes the more
> > general Fasta reader, nothing to do with sequences. ...
>
> In actual fact, the Bio.Fasta module predates Bio.SeqIO, and I was
> thinking in a few releases time of suggesting its deprecation (but
> not just yet as for several years it was the best documented and
> most used parser in Biopython).
>
I see, it looks like it's meant to be deprecated; I was just saying
it's actually doing SeqIO functionality.
> If we do decide to keep Bio.Fasta (or extend it), then perhaps
> Bio.SeqIO.FastaIO should become just a wrapper for Bio.Fasta.
>
> I'm still digressing your ideas to turn Bio.Fasta into a generic
> parser that copes with sequences, quality scores, or anything else.
I'm not quite sure what you mean by digressing; if you mean thinking
it over, then great =) Otherwise I hope you'll seriously consider it
anyway. Either way, I think I posted a more coherent message on
bugzilla with some example data and motivation.
jared
From jflatow at northwestern.edu Tue Oct 16 16:14:16 2007
From: jflatow at northwestern.edu (Jared Flatow)
Date: Tue, 16 Oct 2007 15:14:16 -0500
Subject: [Biopython-dev] CVS to SVN
In-Reply-To: <4714EB34.8000207@maubp.freeserve.co.uk>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk>
<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
<4714EB34.8000207@maubp.freeserve.co.uk>
Message-ID: <6DFB6FBB-CC55-41D1-8D35-4906E6B502CF@northwestern.edu>
> I would say one reason why we aren't charging ahead with a move
> from CVS to subversion is only a few posters on this mailing list
> actively WANT to move to subversion, and no-one has really
> championed the move (yet).
Does that mean most developers don't WANT to move, or just that they
don't ACTIVELY want to move?
jared
From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 16:42:18 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 16 Oct 2007 21:42:18 +0100
Subject: [Biopython-dev] 454 GSFlex quality score files
In-Reply-To: <156C46BF-1798-43D5-BA10-2A94FC63A3AB@northwestern.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EBC7.1040504@maubp.freeserve.co.uk> <48D92CF4-04B5-42F9-92D2-3A2D9D2FE7E2@northwestern.edu> <47151222.1060502@maubp.freeserve.co.uk>
<156C46BF-1798-43D5-BA10-2A94FC63A3AB@northwestern.edu>
Message-ID: <4715222A.2070909@maubp.freeserve.co.uk>
Jared Flatow wrote:
> On Oct 16, 2007, at 2:33 PM, Peter wrote:
>
>>> In Bio.Fasta there are Parsers which really belong in
>>> Bio.SeqIO.FastaIO, if anywhere. How about Bio.Fasta becomes the more
>>> general Fasta reader, nothing to do with sequences. ...
>> In actual fact, the Bio.Fasta module predates Bio.SeqIO, and I was
>> thinking in a few releases time of suggesting its deprecation (but
>> not just yet as for several years it was the best documented and
>> most used parser in Biopython).
>
> I see, it looks like it's meant to be deprecated; I was just saying
> it's actually doing SeqIO functionality.
Well, deprecating Bio.Fasta is currently just a suggestion for the
future; we should still canvass opinion on the main mailing list before
taking that action.
>> If we do decide to keep Bio.Fasta (or extend it), then perhaps
>> Bio.SeqIO.FastaIO should become just a wrapper for Bio.Fasta.
>>
>> I'm still digressing your ideas to turn Bio.Fasta into a generic
>> parser that copes with sequences, quality scores, or anything else.
That was a typo, but you managed to guess my meaning. I meant to say:
I'm still digesting [i.e. thinking about] your ideas to turn Bio.Fasta
into a generic parser that copes with sequences, quality scores, or
anything else.
> I'm not quite sure what you mean by digressing; if you mean thinking
> it over, then great =) Otherwise I hope you'll seriously consider it
> anyway. Either way, I think I posted a more coherent message on
> bugzilla with some example data and motivation.
I'll take a look at Bug 2382 - Generic FASTA parser
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
Peter
From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 17:01:29 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 16 Oct 2007 22:01:29 +0100
Subject: [Biopython-dev] CVS to SVN
In-Reply-To: <6DFB6FBB-CC55-41D1-8D35-4906E6B502CF@northwestern.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EB34.8000207@maubp.freeserve.co.uk>
<6DFB6FBB-CC55-41D1-8D35-4906E6B502CF@northwestern.edu>
Message-ID: <471526A9.1010709@maubp.freeserve.co.uk>
Jared Flatow wrote:
>> I would say one reason why we aren't charging ahead with a move
>> from CVS to subversion is only a few posters on this mailing list
>> actively WANT to move to subversion, and no-one has really
>> championed the move (yet).
>
> Does that mean most developers don't WANT to move, or just that they
> don't ACTIVELY want to move?
Going back over the archives, Chris Lasher was most vocal in supporting
the move, and there were a few other positive voices.
Speaking for myself, I have no strong desire either way, and I don't
think Michiel objected either (except over the timing). Then as now, we
are hoping to get the next release out "shortly", so after that would be
a good time to make the switch.
[I'm assuming we won't lose any revision history or comments, and that
things like the web based ViewCVS and its RSS feed will still be available]
Peter
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 17:02:03 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 17:02:03 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162102.l9GL23rr010250@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 17:02 EST -------
Are there any other "FASTA like" formats you know of, in addition to
traditional sequence data and the 454 GSFlex quality score files?
We could do this using the old Scanner/Consumer model (see the pre-Martel
parser, CVS revision 1.8 of Bio/Fasta/__init__.py, for example).
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Fasta/__init__.py?rev=1.8&cvsroot=biopython&content-type=text/vnd.viewcvs-markup
The scanner would be the same for all formats, and would pass the data with
whitespace (spaces, new lines etc) as is. We could then have one consumer for
each supported FASTA variant:
_Scanner Scans a FASTA-format stream.
_RecordConsumer Consumes FASTA data to a Record object.
_SequenceConsumer Consumes FASTA data to a Sequence object.
_QualityConsumer (new) could build a list of integers for each record?
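One way the scanner/consumer split might look, as a hypothetical sketch: the class names and the start_record/data_line/end_record event methods are my own and do not reproduce the historical Bio.Fasta interface. The single scanner passes whitespace through untouched, and each consumer decides how to interpret it:

```python
import io
from typing import TextIO


class Scanner:
    """Reads a FASTA-like stream and reports events to a consumer.

    Data lines are passed on exactly as read, whitespace included.
    """

    def feed(self, handle: TextIO, consumer) -> None:
        in_record = False
        for line in handle:
            if line.startswith(">"):
                if in_record:
                    consumer.end_record()
                consumer.start_record(line[1:].rstrip("\r\n"))
                in_record = True
            elif in_record:
                consumer.data_line(line)
        if in_record:
            consumer.end_record()


class RawRecordConsumer:
    """Collects (header, raw data text) pairs."""

    def __init__(self) -> None:
        self.records = []

    def start_record(self, header: str) -> None:
        self._header = header
        self._lines = []

    def data_line(self, line: str) -> None:
        self._lines.append(line)

    def end_record(self) -> None:
        self.records.append((self._header, "".join(self._lines)))


class QualityConsumer(RawRecordConsumer):
    """Receives the same events but builds a list of integers per record."""

    def end_record(self) -> None:
        text = "".join(self._lines)
        self.records.append((self._header, [int(x) for x in text.split()]))


sample = ">r1\n27 28\n21\n>r2\n30\n"
raw = RawRecordConsumer()
Scanner().feed(io.StringIO(sample), raw)
qual = QualityConsumer()
Scanner().feed(io.StringIO(sample), qual)
print(raw.records)   # [('r1', '27 28\n21\n'), ('r2', '30\n')]
print(qual.records)  # [('r1', [27, 28, 21]), ('r2', [30])]
```

Supporting another FASTA variant would then only mean writing one more consumer class.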
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 17:26:29 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 17:26:29 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162126.l9GLQT8O011239@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
------- Comment #3 from jflatow at northwestern.edu 2007-10-16 17:26 EST -------
On second thought, let me just rewrite all the code:
# The Bio.Fasta parser
class Fasta():  # or whatever
    @staticmethod
    def parse(file):
        # return an iterator over the file as Bio.Fasta.Records
        # for the records, trim newline from header, don't do anything to data
        pass
# The Bio.SeqIO.FastaIO wrapper for Bio.Fasta
class FastaIO():  # or however it's organized
    @staticmethod
    def header_todict(header):
        parts = re.split(r'[,|]?\s+', header, maxsplit=1)
        assert len(parts) == 2
        return {'id': parts[0],
                'description': header}
    @staticmethod
    def data_toseq(data, alphabet):
        return Seq(re.sub(r'\s+', '', data), alphabet)
    @staticmethod
    def parse(file, header_todict=None, alphabet=single_letter_alphabet):
        if header_todict is None:
            header_todict = FastaIO.header_todict
        return (SeqRecord(seq=FastaIO.data_toseq(record.data, alphabet),
                          **header_todict(record.header))
                for record in Bio.Fasta.parse(file))
# Now to use these in my example I can do
seq_dict = SeqIO.to_dict(SeqIO.FastaIO.parse(seq_file))
for record in Bio.Fasta.parse(qual_file):
    rec_id = SeqIO.FastaIO.header_todict(record.header)['id']
    seq_dict[rec_id].quality = [int(x) for x in record.data.split()]
# Suppose instead I have an alignment file, which looks like this:
>contigname
A A 10 64
T T 9 64
C C 9 64
...
# and so on, where the first column is a reference sequence, the second column
# is a consensus sequence, the third column is the number of reads aligned,
# and the fourth column is the combined quality score
# Now it's just as easy for me to parse this into an object
class ContigAlign():
    def __init__(self, name, ref, consensus, numreads, qscore):
        self.name = name
        self.ref = ref
        self.consensus = consensus
        self.numreads = numreads
        self.qscore = qscore
# I'll make a dictionary of my ContigAligns
d = {}
for record in Bio.Fasta.parse(file):
    # split each line into its four fields, then transpose rows into columns
    rows = [line.split() for line in record.data.splitlines() if line.strip()]
    (ref, consensus, numreads, qscore) = zip(*rows)
    d[record.header] = ContigAlign(record.header, ref, consensus, numreads,
                                   qscore)
# maybe I would turn ref and consensus into Seqs, but you get the point
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 17:38:45 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 17:38:45 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162138.l9GLcj29011655@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 17:38 EST -------
In comment 3, did you just make up this file format as an example?
>contigname
A A 10 64
T T 9 64
C C 9 64
...
with four columns: reference sequence, consensus, number of reads aligned, and
combined quality score.
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 17:58:38 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 17:58:38 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162158.l9GLwc68012343@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
------- Comment #5 from jflatow at northwestern.edu 2007-10-16 17:58 EST -------
Nope, they actually have a file format that looks like this:
Position Consensus Quality Score Depth Signal StdDeviation
>contig00001 1
1 G 64 2 1.00 0.00
2 A 64 2 1.00 0.00
3 G 64 2 1.00 0.00
4 A 64 2 1.00 0.00
5 G 64 2 2.00 0.00
6 G 64 2 2.00 0.00
7 A 64 2 3.00 0.00
8 A 64 2 3.00 0.00
9 A 64 2 3.00 0.00
10 C 64 2 2.00 0.00
11 C 64 2 2.00 0.00
12 T 64 2 1.00 0.00
13 C 64 2 3.00 0.00
14 C 64 2 3.00 0.00
15 C 64 2 3.00 0.00
16 G 64 2 1.00 0.00
17 T 64 2 1.00 0.00
18 G 64 2 1.00 0.00
19 A 64 2 1.00 0.00
20 T 64 2 1.00 0.00
21 C 64 2 2.00 0.00
22 C 64 2 2.00 0.00
Note the file-wide header at the top of the file. A generic FASTA-like parser
might simply skip ahead to the first '>'; alternatively we could strip that
header beforehand, but it would be nice if the parser were smart enough to
handle it.
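A sketch of how the position table above could be read, assuming whitespace-separated columns in the order shown in the column-name line (Position, Consensus, Quality Score, Depth, Signal, StdDeviation). The names Position and parse_position_table are hypothetical:

```python
import io
from typing import Dict, List, NamedTuple, Optional, TextIO


class Position(NamedTuple):
    position: int
    base: str
    quality: int
    depth: int
    signal: float
    std_dev: float


def parse_position_table(handle: TextIO) -> Dict[str, List[Position]]:
    """Map each contig name to its rows, skipping text before the first '>'."""
    contigs: Dict[str, List[Position]] = {}
    current: Optional[List[Position]] = None
    for line in handle:
        if line.startswith(">"):
            # e.g. ">contig00001 1": keep the first word as the contig name;
            # the trailing number is not explained in the thread, so ignore it
            current = contigs.setdefault(line[1:].split()[0], [])
        elif current is not None and line.strip():
            pos, base, qual, depth, signal, std = line.split()
            current.append(Position(int(pos), base, int(qual), int(depth),
                                    float(signal), float(std)))
    return contigs


sample = io.StringIO(
    "Position Consensus Quality Score Depth Signal StdDeviation\n"
    ">contig00001 1\n"
    "1 G 64 2 1.00 0.00\n"
    "2 A 64 2 1.00 0.00\n"
    "3 G 64 2 1.00 0.00\n"
)
table = parse_position_table(sample)
consensus = "".join(p.base for p in table["contig00001"])  # 'GAG'
```

Because str.split() treats tabs and spaces alike, the same code should work whether the real file is tab or space separated.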
Also, here is another sample FASTA-like file format they use for pair
alignments:
>ERSGEES01EM5WC, 2..30 of 95 and ERSGEES01C1ZV2, 1..29 of 268 (29/29 ident)
2 CGGTGACCCGGGAGATCTGAATTCCTGGT 30
1 CGGTGACCCGGGAGATCTGAATTCCTGGT 29
>ERSGEES01EM5WC, 2..29 of 95 and ERSGEES01DMS5T, 1..28 of 259 (28/28 ident)
2 CGGTGACCCGGGAGATCTGAATTCCTGG 29
1 CGGTGACCCGGGAGATCTGAATTCCTGG 28
>ERSGEES01EM5WC, 29..2 of 95 and ERSGEES01D8GDV, 205..232 of 232 (28/28 ident)
29 CCAGGAATTCAGATCTCCCGGGTCACCG 2
205 CCAGGAATTCAGATCTCCCGGGTCACCG 232
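The pair alignment headers carry structured information that a regular expression can pull apart. A hypothetical sketch, assuming every header follows exactly the pattern in the three examples above; start > end (as in 29..2) presumably marks a reverse-strand match, but that is a guess:

```python
import re

# Field names are my own labels; "query"/"subject" just mean first/second read.
PAIR_HEADER = re.compile(
    r"^>?(?P<query>[^,\s]+), (?P<q_start>\d+)\.\.(?P<q_end>\d+) of (?P<q_len>\d+)"
    r" and (?P<subject>[^,\s]+), (?P<s_start>\d+)\.\.(?P<s_end>\d+) of (?P<s_len>\d+)"
    r" \((?P<identical>\d+)/(?P<aligned>\d+) ident\)$"
)


def parse_pair_header(line: str) -> dict:
    match = PAIR_HEADER.match(line.strip())
    if match is None:
        raise ValueError("unrecognised pair alignment header: %r" % line)
    fields = match.groupdict()
    # Everything except the two read names is a number
    return {key: (value if key in ("query", "subject") else int(value))
            for key, value in fields.items()}


header = parse_pair_header(
    ">ERSGEES01EM5WC, 29..2 of 95 and ERSGEES01D8GDV, 205..232 of 232 (28/28 ident)")
print(header["query"], header["q_start"], header["q_end"])  # ERSGEES01EM5WC 29 2
```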
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 18:09:06 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 18:09:06 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162209.l9GM96N5012764@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
------- Comment #6 from jflatow at northwestern.edu 2007-10-16 18:09 EST -------
The reference/consensus one was inspired by yet another format they have: there
are 2 tools they provide, one for mapping to an existing sequence, the other
for ab initio contig building. The mapping one has the extra reference column.
As you can see it might be hard to keep up with all these similar formats as
part of Biopython (these are only from one source). Certainly the common ones
should have wrappers but we should also be able to easily get the stream of
records.
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 18:13:48 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 18:13:48 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162213.l9GMDmBM012914@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 18:13 EST -------
Could you attach a few of these real files? Please include where they came
from, i.e. the company whose software writes such output, and what they call
each file format variant.
If you can get a matched set (i.e. all associated with the same few sequences)
then even better.
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 19:09:00 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 19:09:00 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162309.l9GN90wg015092@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
------- Comment #8 from jflatow at northwestern.edu 2007-10-16 19:08 EST -------
The files are very large, I assure you they are just longer versions of what I
have supplied here though. The company is Roche Diagnostics. The initial
reads/quality files are the output of the 454 GSFlex genome sequencing
machines. They have two pieces of software: gsMapper and gsAssembler which
output the contigs.
Reads/Quality files from the machine are called:
454Reads.{fna,qual}
gs* output:
454{All,Large}Contigs.{fna,qual}
454PairAlign.txt
454AlignmentInfo.tsv
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 20:10:45 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 20:10:45 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200710170010.l9H0AjYe018147@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp 2007-10-16 20:10 EST -------
> Note there is yet another (!) translation function Bio.SeqUtils.translate()
> which is frame aware [personally I would mark a lot of this module as
> deprecated].
Given the various translate functions we already have in Biopython, why do you
want to add another one? Is there something the "translate" method can do that
the "translate" function cannot? Since the "translate" function can take Seq
objects as well as simple strings, I'd prefer the "translate" function over a
"translate" method.
From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 12:49:18 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 16 Oct 2007 17:49:18 +0100
Subject: [Biopython-dev] SeqRecord to file format as string
In-Reply-To: <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk>
<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
Message-ID: <4714EB8E.3000700@maubp.freeserve.co.uk>
>> Did you know you can write to a string using any Bio.SeqIO supported
>> file format using StringIO? Perhaps we should spell this out more
>> explicitly in the documentation, but a motivating example would help.
>
> This is what I do now, but it seems like a hack to me to go this
> route. To always have to write to a file feels strange, but I see
> that it would be messy to go OO since there are so many formats.
> However, giving preference to fasta over other formats by making it
> innate doesn't seem like such a terrible idea. I do have mixed
> feelings about 'bloating' the code which is why I asked, and you have
> convinced me that this is not quite appropriate given existing
> convention. However the idea would be to put the to_fasta or
> to_format method inside the SeqRecord, then to call it from the IO
> when needed to actually write to a file, but call it directly when
> all that is wanted is a string...
It's debatable, isn't it? I suspect that for most users, when they want a
record in a particular file format it's for writing to a file. However,
adding a to_format() method to a SeqRecord makes some sense (suitable for
sequential file formats only). This would take a format name and return
a string, by calling Bio.SeqIO with a StringIO object internally.
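A sketch of the to_format() idea using an in-memory handle, as described. SimpleRecord, write_fasta and the WRITERS registry are hypothetical stand-ins for SeqRecord and Bio.SeqIO, not their real APIs:

```python
import io
from typing import Callable, Dict, Iterable, TextIO


class SimpleRecord:
    """Hypothetical stand-in for SeqRecord with just id, description, seq."""

    def __init__(self, id: str, seq: str, description: str = "") -> None:
        self.id = id
        self.seq = seq
        self.description = description

    def to_format(self, format_name: str) -> str:
        # Reuse the ordinary file writer, pointed at an in-memory handle
        handle = io.StringIO()
        WRITERS[format_name]([self], handle)
        return handle.getvalue()


def write_fasta(records: Iterable[SimpleRecord], handle: TextIO,
                wrap: int = 60) -> None:
    for rec in records:
        title = "%s %s" % (rec.id, rec.description) if rec.description else rec.id
        handle.write(">%s\n" % title)
        for start in range(0, len(rec.seq), wrap):
            handle.write(rec.seq[start:start + wrap] + "\n")


# Hypothetical registry of sequential-format writers, keyed by format name
WRITERS: Dict[str, Callable[[Iterable[SimpleRecord], TextIO], None]] = {
    "fasta": write_fasta,
}

rec = SimpleRecord("seq1", "ACGT", "example")
print(rec.to_format("fasta"), end="")
```

The record only delegates to the writer, so the file-writing code stays the single place that knows the format.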
Peter
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 22:17:28 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 22:17:28 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710170217.l9H2HSAx024040@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
------- Comment #9 from mdehoon at ims.u-tokyo.ac.jp 2007-10-16 22:17 EST -------
If all these special fasta files are coming from Roche Diagnostics, I'd suggest
creating a module rather than trying to put this in Bio.SeqIO. Bio.SeqIO is
one of the few modules in Biopython that is used by most users, so I'd like to
keep it as clean as possible. To avoid confusion for users who just want
to parse regular Fasta files, I think the module should not be called
Bio.Fasta. In addition, I doubt we'd get much code reuse from a generic
Bio.Fasta module beyond what is needed for the Roche files, since the only
thing they have in common is that they use ">" to separate records.
With a separate module to handle the Roche files, my preferred usage would be
something like this:
from Bio import SeqIO, GSFlex # Or whatever you'd like to call it
seqrecords = SeqIO.parse(open("mysequences.fa"), "fasta")
qualities = GSFlex.parse(open("myqualities.qual"), "quality")
for seqrecord, quality in zip(seqrecords, qualities):
    seqrecord.quality = quality
Note that "quality" is currently not a field of the SeqRecord class, but with
SeqRecord being a Python class, we can just add fields on the fly.
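One caveat with zip() in the loop above: it silently stops at the shorter input and never checks that the two records describe the same read. A hypothetical sketch of a stricter pairing, using plain (name, value) tuples in place of SeqRecords and the GSFlex parser:

```python
from typing import Iterable, Iterator, List, Tuple


def pair_by_name(seqs: Iterable[Tuple[str, str]],
                 quals: Iterable[Tuple[str, List[int]]]
                 ) -> Iterator[Tuple[str, str, List[int]]]:
    """Walk both files in step, refusing to pair records that do not match."""
    qual_iter = iter(quals)
    for name, seq in seqs:
        try:
            qual_name, scores = next(qual_iter)
        except StopIteration:
            raise ValueError("quality file ended before %s" % name) from None
        if qual_name != name:
            raise ValueError("record mismatch: %s vs %s" % (name, qual_name))
        if len(scores) != len(seq):
            raise ValueError("%s: %d bases but %d scores"
                             % (name, len(seq), len(scores)))
        yield name, seq, scores
    if next(qual_iter, None) is not None:
        raise ValueError("quality file has extra records")


pairs = list(pair_by_name([("r1", "ACG"), ("r2", "TT")],
                          [("r1", [20, 30, 40]), ("r2", [10, 11])]))
```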
From mdehoon at c2b2.columbia.edu Tue Oct 16 22:30:41 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 16 Oct 2007 22:30:41 -0400
Subject: [Biopython-dev] CVS to SVN
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EB34.8000207@maubp.freeserve.co.uk>
<6DFB6FBB-CC55-41D1-8D35-4906E6B502CF@northwestern.edu>
<471526A9.1010709@maubp.freeserve.co.uk>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B63B@mail2.exch.c2b2.columbia.edu>
> > Does that mean most developers don't WANT to move, or just that they
> > don't ACTIVELY want to move?
>
> Speaking for myself, I have no strong desire either way, and I don't
> think Michiel objected either (except over the timing).
I don't have an objection against SVN either now or later. If you want to do
the CVS->SVN conversion, just make sure to inform the developers when they
should pause making commits to CVS during the actual move.
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
-----Original Message-----
From: biopython-dev-bounces at lists.open-bio.org on behalf of Peter
Sent: Tue 10/16/2007 5:01 PM
To: Jared Flatow; biopython-dev at lists.open-bio.org
Subject: Re: [Biopython-dev] CVS to SVN
Jared Flatow wrote:
>> I would say one reason why we aren't charging ahead with a move
>> from CVS to subversion is only a few posters on this mailing list
>> actively WANT to move to subversion, and no-one has really
>> championed the move (yet).
>
> Does that mean most developers don't WANT to move, or just that they
> don't ACTIVELY want to move?
Going back over the archives, Chris Lasher was most vocal in supporting
the move, and there were a few other positive voices.
Speaking for myself, I have no strong desire either way, and I don't
think Michiel objected either (except over the timing). Then as now, we
are hoping to get the next release out "shortly", so after that would be
a good time to make the switch.
[I'm assuming we won't lose any revision history or comments, and that
things like the web based ViewCVS and its RSS feed will still be available]
Peter
_______________________________________________
Biopython-dev mailing list
Biopython-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython-dev
From mdehoon at c2b2.columbia.edu Tue Oct 16 22:45:34 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 16 Oct 2007 22:45:34 -0400
Subject: [Biopython-dev] SeqRecord to file format as string
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk>
<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
<4714EB8E.3000700@maubp.freeserve.co.uk>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B63C@mail2.exch.c2b2.columbia.edu>
How about the following:
SeqIO.write(sequences, handle, format) returns the properly formatted string
if handle==None.
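A minimal sketch of the behaviour proposed above. It uses a hypothetical FASTA-only `write_fasta` helper and plain `(identifier, sequence)` tuples instead of SeqRecord objects (both are assumptions, not the Bio.SeqIO API): when no handle is given, the output is collected in a StringIO and returned as a string.

```python
from io import StringIO


def write_fasta(records, handle=None, width=60):
    """Write (identifier, sequence) pairs in FASTA format.

    Hypothetical helper: with handle=None the text is returned as a string,
    otherwise it is written to the file-like handle and None is returned.
    """
    out = StringIO() if handle is None else handle
    for identifier, sequence in records:
        out.write(">%s\n" % identifier)
        # Wrap long sequences at a fixed line width.
        for start in range(0, len(sequence), width):
            out.write(sequence[start:start + width] + "\n")
    if handle is None:
        return out.getvalue()
    return None


text = write_fasta([("seq1", "ACGTACGT"), ("seq2", "GGCC")])
print(text)
```

One trade-off of this design is that the return value depends on an argument; a separate function or method that always returns a string avoids that.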
--Michiel.
-----Original Message-----
From: biopython-dev-bounces at lists.open-bio.org on behalf of Peter
Sent: Tue 10/16/2007 12:49 PM
To: Jared Flatow
Cc: biopython-dev at lists.open-bio.org
Subject: Re: [Biopython-dev] SeqRecord to file format as string
>> Did you know you can write to a string using any Bio.SeqIO supported
>> file format using StringIO? Perhaps we should spell this out more
>> explicitly in the documentation, but a motivating example would help.
>
> This is what I do now, but it seems like a hack to me to go this
> route. To always have to write to a file feels strange, but I see
> that it would be messy to go OO since there are so many formats.
> However, giving preference to fasta over other formats by making it
> innate doesn't seem like such a terrible idea. I do have mixed
> feelings about 'bloating' the code which is why I asked, and you have
> convinced me that this is not quite appropriate given existing
> convention. However the idea would be to put the to_fasta or
> to_format method inside the SeqRecord, then to call it from the IO
> when needed to actually write to a file, but call it directly when
> all that is wanted is a string...
It's debatable, isn't it? I suspect that for most users, when they want a
record in a particular file format it's for writing to a file. However,
adding a to_format() method to a SeqRecord makes some sense (suitable for
sequential file formats only). This would take a format name and return
a string, by calling Bio.SeqIO with a StringIO object internally.
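A sketch of the to_format() idea, assuming a hypothetical `RecordSketch` class and a FASTA-only `_write_fasta` writer standing in for SeqRecord and Bio.SeqIO. The point it illustrates is that the method reuses the file-writing code by handing it an in-memory StringIO handle, so only sequential formats (one record at a time) fit this pattern.

```python
from io import StringIO


def _write_fasta(records, handle):
    """Hypothetical stand-in for a Bio.SeqIO writer (FASTA only)."""
    for record in records:
        header = record.id
        if record.description:
            header += " " + record.description
        handle.write(">%s\n%s\n" % (header, record.seq))


class RecordSketch(object):
    """Hypothetical SeqRecord-like class (not Biopython's SeqRecord)."""

    # Map format names to writer functions; only sequential formats fit here.
    _writers = {"fasta": _write_fasta}

    def __init__(self, seq, record_id, description=""):
        self.seq = seq
        self.id = record_id
        self.description = description

    def to_format(self, fmt):
        """Return this record as a string in the requested file format."""
        try:
            writer = self._writers[fmt]
        except KeyError:
            raise ValueError("Unsupported format: %r" % fmt)
        # Reuse the file-writing code by pointing it at an in-memory handle.
        handle = StringIO()
        writer([self], handle)
        return handle.getvalue()


record = RecordSketch("ACTGACCGTGC", "demo", "example record")
print(record.to_format("fasta"))
```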
Peter
From bugzilla-daemon at portal.open-bio.org Wed Oct 17 03:01:53 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 17 Oct 2007 03:01:53 -0400
Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in
(Reverse)complement functions in Bio.Seq
In-Reply-To:
Message-ID: <200710170701.l9H71rML002584@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2366
mdehoon at ims.u-tokyo.ac.jp changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |REOPENED
Resolution|FIXED |
------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2007-10-17 03:01 EST -------
The Biopython test currently fails:
======================================================================
FAIL: test_seq
----------------------------------------------------------------------
Traceback (most recent call last):
File "run_tests.py", line 151, in runTest
self.runSafeTest()
File "run_tests.py", line 188, in runSafeTest
expected_handle)
File "run_tests.py", line 289, in compare_output
"\nOutput : "+`output_line` + "\nExpected: "+`expected_line`
AssertionError:
Output : "Seq('ACBDGHKMNSRUWVYX', Alphabet()) -> Seq('XRBWAYSNKMDCHVGU', Alphabet())\n"
Expected: "Seq('ACBDGHKMNSRUWVYX', Alphabet()) -> Seq('XYVWARSNMKHCDBGU', Alphabet())\n"
----------------------------------------------------------------------
This is with a fresh checkout from CVS.
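To see where the two strings in that failure come from, here is a sketch of an IUPAC-aware reverse complement. It assumes an RNA complement table (A pairs with U) and treats X as complementing to itself; both are assumptions for illustration, not Bio.Seq's actual tables. Under those assumptions it reproduces the Output line above. The Expected line, by contrast, reverses the sequence but complements only A, C, G and U.

```python
# IUPAC nucleotide complements for RNA (A pairs with U).
# Each ambiguity code maps to the code for the complementary set,
# e.g. R (A/G) -> Y (U/C). X is treated as "unknown" and kept as-is
# (an assumption made for this sketch). T is deliberately absent.
RNA_COMPLEMENT = {
    "A": "U", "U": "A", "C": "G", "G": "C",
    "R": "Y", "Y": "R", "K": "M", "M": "K",
    "S": "S", "W": "W", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N", "X": "X",
}


def complement(seq, table=RNA_COMPLEMENT):
    """Complement each letter using the lookup table."""
    return "".join(table[base] for base in seq)


def reverse_complement(seq, table=RNA_COMPLEMENT):
    """Complement, then reverse."""
    return complement(seq, table)[::-1]


print(reverse_complement("ACBDGHKMNSRUWVYX"))  # XRBWAYSNKMDCHVGU
```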
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Wed Oct 17 04:01:12 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 17 Oct 2007 04:01:12 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200710170801.l9H81CVn005428@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-17 04:01 EST -------
> Given the various translate functions we already have in
> Biopython, why do you want to add another one? Is there
> something the "translate" method can do that the "translate"
> function cannot? Since the "translate" function can take Seq
> objects as well as simple strings, I'd prefer the "translate"
> function over a "translate" method.
It's a style thing, allowing a more object orientated coding
style. For comparison, look at the evolution of the string module
in Python, whose functions were phased out in favour of string object methods.
In terms of capabilities/arguments, I think the Bio.Seq.translate()
function and the suggested new Bio.Seq.Seq.translate() object
method should be equivalent. In fact, I would have one call the other
internally.
Currently, if you work with strings, you can use the following nice
concise style:
from Bio import Seq #The module
my_str = "ACTGACCGTGC"
print Seq.translate(my_str)
However, if you want to use Seq objects, this becomes rather a mess (in my
opinion):
from Bio import Seq #The module
from Bio.Alphabet.IUPAC import unambiguous_dna
my_seq = Seq.Seq("ACTGACCGTGC", unambiguous_dna)
print Seq.translate(my_seq)
I would like to be able to do this, using an object method:
from Bio.Seq import Seq
from Bio.Alphabet.IUPAC import unambiguous_dna
my_seq = Seq("ACTGACCGTGC", unambiguous_dna)
print my_seq.translate()
Another bonus for people who think OO is that doing dir(my_seq) would
list these useful methods. Right now the user has to know to go
looking in the Bio.Seq module for a function.
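A minimal sketch of "one calling the other internally", assuming the standard genetic code (NCBI table 1) and unambiguous input. `SeqSketch` is a hypothetical stand-in for the real Seq class, and building the codon table from a 64-letter string is an illustrative shortcut, not Bio.Translate's code.

```python
# Standard genetic code (NCBI table 1), codons listed in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip([a + b + c for a in BASES for b in BASES for c in BASES], AMINO_ACIDS)
)


def translate(sequence):
    """Translate unambiguous DNA/RNA (a str or anything str() accepts).

    Any trailing partial codon is ignored; ambiguous codons raise KeyError.
    """
    dna = str(sequence).upper().replace("U", "T")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


class SeqSketch(object):
    """Hypothetical minimal Seq-like class (not Bio.Seq.Seq)."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        return self.data

    def translate(self):
        # The method just calls the module-level function, so the two
        # can never disagree.
        return SeqSketch(translate(self.data))


my_seq = SeqSketch("ACTGACCGTGC")
print(my_seq.translate())          # TDR
print("translate" in dir(my_seq))  # True
```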
From bugzilla-daemon at portal.open-bio.org Wed Oct 17 04:06:51 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 17 Oct 2007 04:06:51 -0400
Subject: [Biopython-dev] [Bug 2382] Generic Roche or GSFlex "FASTA" parser
In-Reply-To:
Message-ID: <200710170806.l9H86ppn006217@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Generic FASTA parser |Generic Roche or GSFlex
| |"FASTA" parser
------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-17 04:06 EST -------
Now that I'm clear where these files come from, I would agree with Michiel that
a separate Bio.GSFlex or Bio.Roche module would make more sense. I've added
these keywords to the bug summary (to help anyone searching in future).
P.S. From Michiel's example, you could use the existing SeqRecord annotations
dictionary if you wanted to avoid adding a new attribute to the objects on the
fly.
for seqrecord, quality in zip(seqrecords, qualities):
    #seqrecord.quality = quality
    #If you would rather use the existing dictionary:
    seqrecord.annotations["quality"] = quality
From bugzilla-daemon at portal.open-bio.org Wed Oct 17 04:46:41 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 17 Oct 2007 04:46:41 -0400
Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in
(Reverse)complement functions in Bio.Seq
In-Reply-To:
Message-ID: <200710170846.l9H8kfYq008185@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2366
------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-17 04:46 EST -------
Fixed, I think.
I had some RNA/DNA with the U and T the wrong way round... and when I recently
tweaked the alphabet detection in Bio/Seq.py this had an impact.
The root issue is that we don't check that the sequence's letters agree with
the Alphabet when creating a Seq object (where the Alphabet supplied has an
explicit list of letters). That would have caught the error in the test suite
much earlier. Maybe I should file an enhancement bug on this issue.
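A sketch of the check described above, using hypothetical `AlphabetSketch` and `CheckedSeq` classes rather than Bio.Alphabet and Bio.Seq. The letters are only validated when the alphabet defines an explicit letter list; generic alphabets without one are accepted unchecked.

```python
class AlphabetSketch(object):
    """Hypothetical alphabet; letters=None means 'no explicit letter list'."""

    def __init__(self, letters):
        self.letters = letters


class CheckedSeq(object):
    """Hypothetical Seq-like class that validates letters on creation."""

    def __init__(self, data, alphabet):
        letters = getattr(alphabet, "letters", None)
        # Only check when the alphabet defines an explicit letter list.
        if letters is not None:
            bad = sorted(set(data) - set(letters))
            if bad:
                raise ValueError(
                    "Letters %s not in alphabet %r" % ("".join(bad), letters)
                )
        self.data = data
        self.alphabet = alphabet


unambiguous_dna = AlphabetSketch("GATC")
CheckedSeq("GATTACA", unambiguous_dna)      # accepted
try:
    CheckedSeq("GAUUACA", unambiguous_dna)  # U in a DNA sequence
except ValueError as err:
    print(err)
```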
From bugzilla-daemon at portal.open-bio.org Wed Oct 17 11:20:51 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 17 Oct 2007 11:20:51 -0400
Subject: [Biopython-dev] [Bug 2382] Generic Roche or GSFlex "FASTA" parser
In-Reply-To:
Message-ID: <200710171520.l9HFKpXj030514@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
------- Comment #11 from jflatow at northwestern.edu 2007-10-17 11:20 EST -------
Putting all this stuff in a separate module sounds very reasonable to me. I
would like to implement the design we have been discussing, and I will name it
whatever you think is appropriate, but I would like to do it the way that seems
*right* to me. By that I mean building on streams of
>header
data
...
since I think this pattern could eventually be used to actually clean up the
rest of the FASTA stuff, not make it worse. I also believe there could
potentially be other instances when a more general FASTA parser would be
useful, even if we don't know of them yet. How does this sound?
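A sketch of the generic pattern being proposed: one generator that only knows about `>` header lines, leaving sequence or quality handling to the caller. The function name and the quality layout (whitespace-separated integers) are assumptions for illustration, not an existing Biopython parser.

```python
import io


def iterate_header_blocks(handle, marker=">"):
    """Yield (header, data_lines) pairs from a '>header' / data stream.

    Each record starts at a line beginning with `marker`; the lines that
    follow, up to the next marker line, are its data. Lines before the
    first header are skipped in this sketch.
    """
    header, lines = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(marker):
            if header is not None:
                yield header, lines
            header, lines = line[len(marker):], []
        elif header is not None:
            lines.append(line)
    if header is not None:
        yield header, lines


# Sequence file: join the data lines.
fasta_text = ">read1\nACGT\nGG\n>read2\nTTA\n"
for header, lines in iterate_header_blocks(io.StringIO(fasta_text)):
    print(header, "".join(lines))

# Quality file in the same layout: split the data lines into integers.
qual_text = ">read1\n40 40 30\n20\n"
for header, lines in iterate_header_blocks(io.StringIO(qual_text)):
    print(header, [int(x) for line in lines for x in line.split()])
```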
From bugzilla-daemon at portal.open-bio.org Wed Oct 17 20:46:19 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 17 Oct 2007 20:46:19 -0400
Subject: [Biopython-dev] [Bug 2382] Generic Roche or GSFlex "FASTA" parser
In-Reply-To:
Message-ID: <200710180046.l9I0kJfN027373@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
------- Comment #12 from mdehoon at ims.u-tokyo.ac.jp 2007-10-17 20:46 EST -------
> I also believe there could
> potentially be other instances when a more general FASTA parser would be
> useful, even if we don't know of them yet. How does this sound?
To me, it sounds premature to create a general Fasta parser if we don't know
other instances where it might be useful. (For comparison, note that
Biopython's general parser framework described in section 5.3 of the tutorial
is hardly used in recent Biopython parsers). By all means, keep the module
short and simple.
From bugzilla-daemon at portal.open-bio.org Thu Oct 18 00:33:34 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 18 Oct 2007 00:33:34 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200710180433.l9I4XYeY004357@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2007-10-18 00:33 EST -------
If we add translate, transcribe methods to Seq objects, can we then deprecate
Bio.Transcribe, Bio.Translate?
From bugzilla-daemon at portal.open-bio.org Thu Oct 18 01:21:15 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 18 Oct 2007 01:21:15 -0400
Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with
egenix mxTextTools 3.0
In-Reply-To:
Message-ID: <200710180521.l9I5LFVS006209@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2361
------- Comment #36 from mdehoon at ims.u-tokyo.ac.jp 2007-10-18 01:21 EST -------
Looking at the test_GenBankFormat failure again. This is the only test that
fails with the Biopython currently in CVS, using mxTextTools 3.0.
This test is the only test for Bio.expressions. If we remove
test_GenBankFormat, we should deprecate Bio.expressions. Of all the Biopython
tests, only test_format_registry depends on Bio.expressions. This test relies
on the function _load_registries() in Bio/__init__.py. Skipping this function
call in Bio/__init__.py affects no other Biopython test.
So I'd like to suggest the following for the upcoming release:
-) Remove test_GenBankFormat.py and test_format_registry.py
-) Add DeprecationWarnings to Bio.expressions.
-) Skip the call to _load_registries() in Bio/__init__.py
We then have a working Biopython again with the recent mxTextTools, with
minimal disruptive changes to the code.
Any objections?
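For the "Add DeprecationWarnings" step, the usual Python pattern is a `warnings.warn(..., DeprecationWarning)` call at the top level of the deprecated module, so it fires on import. This sketch wraps that call in a hypothetical `warn_deprecated` function purely so it can be demonstrated; the message wording is also an assumption.

```python
import warnings


def warn_deprecated(module_name):
    """Issue the warning a deprecated module would emit on import.

    In a real package this warnings.warn(...) call would usually sit at the
    top level of the module (e.g. its __init__.py); wrapping it in a
    function here is just to make it easy to demonstrate.
    """
    warnings.warn(
        "%s is deprecated and will be removed in a future release." % module_name,
        DeprecationWarning,
        stacklevel=2,
    )


# DeprecationWarning may be hidden by default, so record it explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_deprecated("Bio.expressions")
print(caught[0].category.__name__, caught[0].message)
```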
From bugzilla-daemon at portal.open-bio.org Thu Oct 18 06:35:01 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 18 Oct 2007 06:35:01 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200710181035.l9IAZ1DH022693@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-18 06:35 EST -------
Michiel in comment 6 wrote:
> If we add translate, transcribe methods to Seq objects, can we then
> deprecate Bio.Transcribe, Bio.Translate?
Bio.Transcribe - Yes
====================
Bio.Transcribe is so trivial we could recreate that code in Bio.Seq and then
deprecate Bio.Transcribe without losing any functionality:
- transcribe
- back_transcribe
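To show why these two are trivial: assuming the input is the coding strand (an assumption; template-strand input would need a reverse complement as well), transcription and back-transcription are single-letter substitutions. These are illustrative functions, not the Bio.Transcribe code.

```python
def transcribe(dna):
    """DNA -> RNA, assuming the input is the coding strand: T becomes U."""
    return dna.replace("T", "U").replace("t", "u")


def back_transcribe(rna):
    """RNA -> DNA (coding strand): U becomes T."""
    return rna.replace("U", "T").replace("u", "t")


print(transcribe("ATGGCCATTGTAATG"))  # AUGGCCAUUGUAAUG
```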
Bio.Translate - Maybe
=====================
Initially I was just proposing to add:
- translate (including all stop codons)
I was simply going to have Bio.Seq call Bio.Translate to do the work.
It would be nice to simplify Biopython by also deprecating Bio.Translate, but
if we want to do this without loss of current functionality we need to consider
including the following in Bio.Seq:
- translate_to_stop (all amino acids up to but excluding the first stop)
- back_translate (gives a single possible nucleotide sequence)
From biopython-dev at maubp.freeserve.co.uk Thu Oct 18 16:06:10 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Thu, 18 Oct 2007 21:06:10 +0100
Subject: [Biopython-dev] BioSQL documentation
Message-ID: <4717BCB2.2070301@maubp.freeserve.co.uk>
I was just having a look at:
http://biopython.org/DIST/docs/biosql/python_biosql_basic.html
The source for this HTML and PDF document lives here in the BioSQL CVS:
biosql-schema/doc/biopython/python_biosql_basic.tex
It would be nice to update the following section:
> 3.3 Loading a GenBank file into the database
>
> ...
>
> Now we want to do the loading of the file into the database. The
> Biopython implementation works by taking a standard Iterator object
> that returns Biopython SeqFeature objects and then doing the loading.
I think that should actually say "... that returns Biopython SeqRecord
objects containing SeqFeature objects ..."
> ... our GenBank file, which we can do with the following code:
>
>>>> from Bio import GenBank
>>>> parser = GenBank.FeatureParser()
>>>> iterator = GenBank.Iterator(open("cor6_6.gb"), parser)
That can now be done with Bio.SeqIO which should be clearer, and
hopefully make it easier to see how to extend this to SwissProt etc:
from Bio import SeqIO
iterator = SeqIO.parse(open("cor6_6.gb"), "genbank")
I would do this myself, but I don't have a BioSQL database set up
right now. It would also be nice if the documentation didn't skip the
bit about setting up the database with the BioSQL schema, or at least
had links to the relevant BioSQL documentation.
Peter
From bugzilla-daemon at portal.open-bio.org Thu Oct 18 22:15:01 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 18 Oct 2007 22:15:01 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200710190215.l9J2F1bo006275@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp 2007-10-18 22:15 EST -------
> It would be nice to simplify Biopython by also deprecating Bio.Translate,
To avoid a plethora of translation functions, I would prefer that.
> but if we want to do this without loss of current functionality we
> need to consider including the following in Bio.Seq:
> - translate_to_stop (all amino acids up to but excluding the first stop)
Whether or not to stop translating at the first stop codon could be an argument
to the translate method. As an alternative, it may be preferable to have a
split() method that splits the sequences at the stop codons. Such a method
could be applied to all protein sequences, not only those created by
translate().
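Once a protein string uses `*` for stops, both suggestions reduce to plain string splitting. A sketch on plain strings, with hypothetical function names: a protein with no stop comes back unchanged, and a trailing stop leaves an empty final piece.

```python
def split_at_stops(protein, stop="*"):
    """Split a translated protein string at each stop symbol."""
    return protein.split(stop)


def up_to_first_stop(protein, stop="*"):
    """Residues up to, but excluding, the first stop symbol."""
    return protein.split(stop, 1)[0]


print(split_at_stops("MKV*MA*"))     # ['MKV', 'MA', '']
print(up_to_first_stop("AACIDR*"))  # AACIDR
```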
> - back_translate (gives a single possible nucleotide sequence)
Does anybody actually use this function?
From salish at picasso.ucsf.edu Fri Oct 19 03:12:53 2007
From: salish at picasso.ucsf.edu (Howard Salis)
Date: Fri, 19 Oct 2007 00:12:53 -0700
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for
the Seq object (in Bio.Seq)
In-Reply-To: <200710190215.l9J2F1bo006275@portal.open-bio.org>
References:
<200710190215.l9J2F1bo006275@portal.open-bio.org>
Message-ID: <9fa7e98e0710190012t52ceb9dbx564ba3720d4359f2@mail.gmail.com>
Yes. Back-translating a sequence is important in codon optimization,
searching for homologous proteins, etc.
> > - back_translate (gives a single possible nucleotide sequence)
> Does anybody actually use this function?
From bugzilla-daemon at portal.open-bio.org Fri Oct 19 08:38:59 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 19 Oct 2007 08:38:59 -0400
Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with
egenix mxTextTools 3.0
In-Reply-To:
Message-ID: <200710191238.l9JCcx4I001886@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2361
------- Comment #37 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-19 08:38 EST -------
I would have suggested adding an mxTextTools version check to
test_GenBankFormat.py and raising the external dependency error if 3.0 is
found. That would "solve" the problem test case, and after the next release we
could begin the process of deprecating the modules you suggested.
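A sketch of the version check suggested above, so the test is skipped rather than failed. `MissingExternalDependencyError` here is a local stand-in for the exception the test suite treats as "skip", and how the version string is read from mx.TextTools is left out as an assumption.

```python
import re


class MissingExternalDependencyError(Exception):
    """Local stand-in for the exception the test suite treats as 'skip'."""


def parse_version(text):
    """Turn '3.0.0' into (3, 0, 0), stopping at the first non-numeric part."""
    numbers = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def require_mxtexttools_before_3(version_text):
    """Skip (rather than fail) when mxTextTools 3.0 or later is installed."""
    if parse_version(version_text) >= (3, 0):
        raise MissingExternalDependencyError(
            "test_GenBankFormat needs mxTextTools < 3.0, found %s" % version_text
        )


require_mxtexttools_before_3("2.1.0")  # passes silently
```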
But I'm OK with your suggestion Michiel (comment 36), although it seems a
little drastic.
From biopython-dev at maubp.freeserve.co.uk Fri Oct 19 04:08:41 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Fri, 19 Oct 2007 09:08:41 +0100
Subject: [Biopython-dev] [Bug 2381] back-translate
In-Reply-To: <9fa7e98e0710190012t52ceb9dbx564ba3720d4359f2@mail.gmail.com>
References: <200710190215.l9J2F1bo006275@portal.open-bio.org>
<9fa7e98e0710190012t52ceb9dbx564ba3720d4359f2@mail.gmail.com>
Message-ID: <47186609.3090408@maubp.freeserve.co.uk>
Howard Salis wrote:
> Yes. Back-translating a sequence is important in codon optimization,
> searching for homologous proteins, etc.
Unlike forward translation, transcription, back-transcription,
complements and reverse complements, back-translation is not a
one-to-one mapping.
In your examples, which of these would you want:
- all possible back translations (as unambiguous nucleotides)
- all possible back translations (as ambiguous nucleotides)
- a possible back translation (using ambiguous nucleotides)
- a possible back translation (using unambiguous nucleotides)
For example, back translating Tyr => UAC or UAU => UAY (nice and
clear - we can represent this perfectly with a single ambiguous codon).
On the other hand, Arg => AGA, AGG, CGA, CGC, CGG, CGU => AGR or CGN
Oh, and would you expect DNA or RNA back?
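The Tyr and Arg cases can be checked mechanically: take the set of bases seen at each codon position and map it to its IUPAC code. This sketch assumes RNA letters and is an illustration, not a Biopython function. It also shows why Arg needs two codons: merging all six gives MGN, which also matches AGC and AGU (Ser).

```python
# IUPAC codes for RNA, keyed by the set of bases each one stands for.
# All 15 non-empty subsets of {A, C, G, U} are covered.
IUPAC_RNA = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("U"): "U",
    frozenset("AG"): "R", frozenset("CU"): "Y",
    frozenset("GC"): "S", frozenset("AU"): "W",
    frozenset("GU"): "K", frozenset("AC"): "M",
    frozenset("CGU"): "B", frozenset("AGU"): "D",
    frozenset("ACU"): "H", frozenset("ACG"): "V",
    frozenset("ACGU"): "N",
}


def collapse_codons(codons):
    """Tightest single ambiguous codon covering all the given codons."""
    # zip(*codons) walks the codons column by column (position 1, 2, 3).
    return "".join(IUPAC_RNA[frozenset(column)] for column in zip(*codons))


print(collapse_codons(["UAC", "UAU"]))                # UAY
print(collapse_codons(["AGA", "AGG"]))                # AGR
print(collapse_codons(["CGA", "CGC", "CGG", "CGU"]))  # CGN
# All six Arg codons at once over-cover: MGN also matches AGC/AGU (Ser).
print(collapse_codons(["AGA", "AGG", "CGA", "CGC", "CGG", "CGU"]))  # MGN
```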
Peter
From salish at picasso.ucsf.edu Fri Oct 19 12:31:49 2007
From: salish at picasso.ucsf.edu (Howard Salis)
Date: Fri, 19 Oct 2007 09:31:49 -0700
Subject: [Biopython-dev] [Bug 2381] back-translate
In-Reply-To: <47186609.3090408@maubp.freeserve.co.uk>
References:
<200710190215.l9J2F1bo006275@portal.open-bio.org>
<9fa7e98e0710190012t52ceb9dbx564ba3720d4359f2@mail.gmail.com>
<47186609.3090408@maubp.freeserve.co.uk>
Message-ID: <9fa7e98e0710190931q3b589488p55b8863cc0e38380@mail.gmail.com>
Yes, I know it's a one-to-many mapping. But that's why it's nice to
have a handy subroutine for doing it.
For codon optimization, all possible back translations with
unambiguous nucleotides would be best. Then, one evaluates some
objective function over all possible sequences to find an optimal one.
Optimality depends on the application, but eliminating restriction
sites, avoiding certain repetitive or transposon sequences, etc. are all
very common.
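A sketch of the "enumerate, then score" approach for codon optimization, assuming the standard genetic code and DNA letters. The objective here (rejecting candidates that contain the EcoRI site GAATTC) is a hypothetical stand-in for whatever criteria a real application would use. The number of candidates is the product of the codon counts per residue, so exhaustive enumeration is only practical for short peptides.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons listed in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]

# Reverse table: amino acid -> list of codons (DNA letters).
BACK_TABLE = {}
for codon, aa in zip(CODONS, AMINO_ACIDS):
    BACK_TABLE.setdefault(aa, []).append(codon)


def all_back_translations(protein):
    """Yield every unambiguous DNA sequence encoding `protein`."""
    for codons in product(*(BACK_TABLE[aa] for aa in protein)):
        yield "".join(codons)


def candidates_without_site(protein, site):
    """Example objective: drop candidates containing a given site."""
    return [dna for dna in all_back_translations(protein) if site not in dna]


# EcoRI site GAATTC, used here only as an illustrative constraint.
print(candidates_without_site("MEF", "GAATTC"))
# ['ATGGAATTT', 'ATGGAGTTT', 'ATGGAGTTC']
```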
For searching for homologous proteins, it would be best to have the
back-translate function produce something that could be fed into an
alignment program or regular expression. Then, one could align a
database of sequences with your back-translated protein to determine
which sequence is most similar to your protein. Basically, this is
what BlastP does (you might want to look up its algorithm to determine
a good way of doing this, if you wish to reproduce it in Biopython or,
otherwise, rely on the NCBI webserver).
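A sketch of the regular-expression route, assuming the standard genetic code and DNA letters: each residue becomes a group of its alternative codons. Unlike an alignment program, this finds only exact encodings on the given strand, with no mismatches or gaps.

```python
import re

# Standard genetic code (NCBI table 1), codons listed in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
BACK_TABLE = {}
for codon, aa in zip([a + b + c for a in BASES for b in BASES for c in BASES],
                     AMINO_ACIDS):
    BACK_TABLE.setdefault(aa, []).append(codon)


def back_translation_regex(protein):
    """Compile a regex matching any DNA that encodes `protein` exactly."""
    pattern = "".join("(?:%s)" % "|".join(BACK_TABLE[aa]) for aa in protein)
    return re.compile(pattern)


regex = back_translation_regex("MEF")
match = regex.search("CCATGGAGTTTAA")
print(match.start(), match.group())  # 2 ATGGAGTTT
```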
What does the current back-translate function output?
-Howard
From bugzilla-daemon at portal.open-bio.org Mon Oct 22 05:07:05 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 22 Oct 2007 05:07:05 -0400
Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in
(Reverse)complement functions in Bio.Seq
In-Reply-To:
Message-ID: <200710220907.l9M975hw013729@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2366
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|REOPENED |RESOLVED
Resolution| |FIXED
------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-22 05:07 EST -------
Marking as fixed
From biopython-dev at maubp.freeserve.co.uk Mon Oct 22 08:30:59 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Mon, 22 Oct 2007 13:30:59 +0100
Subject: [Biopython-dev] [Bug 2381] back-translate
In-Reply-To: <9fa7e98e0710190931q3b589488p55b8863cc0e38380@mail.gmail.com>
References: <200710190215.l9J2F1bo006275@portal.open-bio.org> <9fa7e98e0710190012t52ceb9dbx564ba3720d4359f2@mail.gmail.com> <47186609.3090408@maubp.freeserve.co.uk>
<9fa7e98e0710190931q3b589488p55b8863cc0e38380@mail.gmail.com>
Message-ID: <471C9803.8050709@maubp.freeserve.co.uk>
Howard Salis wrote:
>
> What does the current back-translate function output?
>
Short example,
>>> from Bio import Translate
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet.IUPAC import unambiguous_dna
>>> my_dna = Seq("GCCGCATGCATAGATAGATAG", unambiguous_dna)
>>> my_prot = Translate.unambiguous_dna_by_id[11].translate(my_dna)
>>> my_prot
Seq('AACIDR*', HasStopCodon(IUPACProtein(), '*'))
>>> Translate.unambiguous_dna_by_id[11].back_translate(my_prot)
Seq('GCTGCTTGTATTGATCGTTAA', IUPACUnambiguousDNA())
>>> my_dna
Seq('GCCGCATGCATAGATAGATAG', IUPACUnambiguousDNA())
i.e. The current back_translate picks one possible back translation.
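One simple deterministic rule, taking the first codon for each amino acid when the standard table is listed in T, C, A, G order, happens to give the same answer as the session above for this input. That is an observation about this one example, not a statement about how Bio.Translate actually chooses its codons.

```python
# Standard genetic code (NCBI table 1), codons listed in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Keep only the first codon seen for each amino acid (and for the stop, '*').
FIRST_CODON = {}
for codon, aa in zip([a + b + c for a in BASES for b in BASES for c in BASES],
                     AMINO_ACIDS):
    FIRST_CODON.setdefault(aa, codon)


def back_translate_one(protein):
    """Return one possible DNA back-translation (first-codon rule)."""
    return "".join(FIRST_CODON[aa] for aa in protein)


print(back_translate_one("AACIDR*"))  # GCTGCTTGTATTGATCGTTAA
```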
Peter
From bugzilla-daemon at portal.open-bio.org Mon Oct 22 12:52:12 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 22 Oct 2007 12:52:12 -0400
Subject: [Biopython-dev] [Bug 2386] New: Bio.Seq.Seq and MutableSeq count()
method only works for single residues
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2386
Summary: Bio.Seq.Seq and MutableSeq count() method only works for
single residues
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: minor
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
Logging this bug to report an issue raised on the mailing list by Jimmy
Musselwhite.
The Seq and MutableSeq objects' count methods only work for single
residues (e.g. "G"). Zero is returned when asked to count a multi-character
string, "GG" for example.
For compatibility with strings, my_seq.count("GG") should work as expected,
returning the same value as my_seq.tostring().count("GG")
Doing this for the Seq object is trivial. Adding support for the MutableSeq
could be done via the tostring() method but there might be a more elegant
solution with less overhead.
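A sketch for the mutable case, assuming (hypothetically) that residues are stored as a list of single characters: single residues are counted on the list directly, and longer substrings fall back to a joined string. Like str.count, this counts non-overlapping matches, so "GGG" contains one "GG".

```python
class MutableSeqSketch(object):
    """Hypothetical MutableSeq-like class storing residues in a list."""

    def __init__(self, data):
        self.data = list(data)

    def count(self, sub):
        """Count non-overlapping occurrences of sub, like str.count."""
        if len(sub) == 1:
            # Single residue: count list items without building a string.
            return self.data.count(sub)
        # Longer substrings: join once and defer to str.count.
        return "".join(self.data).count(sub)


seq = MutableSeqSketch("AGGGTGG")
print(seq.count("G"), seq.count("GG"))  # 5 2
```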
From mdehoon at c2b2.columbia.edu Tue Oct 23 20:46:58 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 23 Oct 2007 20:46:58 -0400
Subject: [Biopython-dev] Removing deprecated functionality
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B640@mail2.exch.c2b2.columbia.edu>
Hi everybody,
We now have a fixed Biopython in CVS that works with both the old and the new
mxTextTools. I am planning to create the new Biopython release later this
week.
Bio.Kabat and the blast and blasturl functions in Bio.Blast.NCBIWWW were
deprecated in previous Biopython releases. The two functions in Bio.Blast.NCBIWWW have
been deprecated in favor of qblast starting with Biopython 1.40b (February
2005). I am planning to remove this functionality for release 1.44 -- let us
know if this would cause you some problems.
--Michiel.
From bugzilla-daemon at portal.open-bio.org Thu Oct 25 12:58:19 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 25 Oct 2007 12:58:19 -0400
Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with
egenix mxTextTools 3.0
In-Reply-To:
Message-ID: <200710251658.l9PGwJgB007432@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2361
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #38 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-25 12:58 EST -------
Marking as fixed, Michiel made the changes outlined in comment 36 in CVS.
From bugzilla-daemon at portal.open-bio.org Thu Oct 25 13:02:50 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 25 Oct 2007 13:02:50 -0400
Subject: [Biopython-dev] [Bug 2374] Updated Bio.lcc code.
In-Reply-To:
Message-ID: <200710251702.l9PH2oC8008104@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2374
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Uppdated lcc code. |Updated Bio.lcc code.
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-25 13:02 EST -------
Sebastian - any feedback on my above comment?
P.S. Corrected spelling in bug title.
From bugzilla-daemon at portal.open-bio.org Thu Oct 25 13:22:46 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 25 Oct 2007 13:22:46 -0400
Subject: [Biopython-dev] [Bug 2374] Updated Bio.lcc code.
In-Reply-To:
Message-ID: <200710251722.l9PHMkFm009816@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2374
------- Comment #4 from sbassi at gmail.com 2007-10-25 13:22 EST -------
(In reply to comment #3)
> Sebastian - any feedback on my above comment?
>
> P.S. Corrected spelling in bug title.
>
I do agree with most of your comments, but I can't implement them right now
because of my current workload.
LCC stands for Local Composition Complexity (see here
http://mrw.interscience.wiley.com/emrw/9780470015902/els/article/a0005260/current/abstract)
Please move it to Bio/SeqUtils/.
I don't know the values for ambiguous nucleotides; I will check this for the
next version.
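For readers unfamiliar with the measure: one common formulation of LCC for a single window is the Shannon entropy of its A/C/G/T composition, taken in log base 4 so values run from 0 (one base only) to 1 (all four equally frequent). The sketch below assumes that formulation and simply ignores other characters; it is not the code in Bio/SeqUtils/lcc.py.

```python
import math


def lcc_window(seq):
    """Entropy (log base 4) of the A/C/G/T composition of one window."""
    seq = seq.upper()
    counts = [seq.count(base) for base in "ACGT"]
    total = float(sum(counts))
    if total == 0:
        return 0.0
    entropy = 0.0
    for n in counts:
        if n:
            p = n / total
            entropy -= p * math.log(p, 4)
    return entropy


def lcc_profile(seq, window):
    """LCC for every full-length window, sliding one base at a time."""
    return [lcc_window(seq[i:i + window]) for i in range(len(seq) - window + 1)]


print(lcc_profile("AAAACGT", 4))
```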
From bugzilla-daemon at portal.open-bio.org Thu Oct 25 14:01:50 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 25 Oct 2007 14:01:50 -0400
Subject: [Biopython-dev] [Bug 2374] Updated Bio.lcc code.
In-Reply-To:
Message-ID: <200710251801.l9PI1oRF012742@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2374
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-25 14:01 EST -------
I've checked this in as Bio/SeqUtils/lcc.py (and deprecated Bio/lcc.py which
had a slightly different API since you dropped the start/end options in
lcc_mult).
From bugzilla-daemon at portal.open-bio.org Thu Oct 25 14:25:49 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 25 Oct 2007 14:25:49 -0400
Subject: [Biopython-dev] [Bug 2374] Updated Bio.lcc code.
In-Reply-To:
Message-ID: <200710251825.l9PIPnEG015022@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2374
------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-25 14:25 EST -------
Also updated Bio/SeqUtils/lcc.py to cope with Seq and MutableSeq objects in
addition to strings.
I also added a unit test, test_SeqUtils.py, which covers both Bio.SeqUtils.lcc and
Bio.SeqUtils.CheckSum and replaces my older test_CheckSum.py.
From bugzilla-daemon at portal.open-bio.org Thu Oct 25 18:03:15 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 25 Oct 2007 18:03:15 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcribe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200710252203.l9PM3For029293@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-25 18:03 EST -------
Created an attachment (id=795)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=795&action=view)
Rough patch to add methods to Bio.Seq
Part of this patch is to add ambiguous_generic_by_id and
ambiguous_generic_by_name entries to Bio.Data.CodonTable, variants of the
unambiguous generic_by_id and generic_by_name tables. Using this lets us
properly translate ambiguous sequences.
This patch does NOT tackle back_translate, or have special treatment of
start/stop codons, in the Seq methods.
This patch does NOT mark Bio.Translate or Bio.Transcribe as deprecated.
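To illustrate what translating an ambiguous sequence involves, here is a minimal, self-contained sketch. It assumes the standard genetic code (NCBI table 1) and resolves an ambiguous codon by expanding its IUPAC codes: if every possible codon encodes the same amino acid, that residue is used, otherwise X. This is a simplification for illustration only. The function names are hypothetical, and the patch itself works through the new ambiguous_generic_by_id / ambiguous_generic_by_name tables in Bio.Data.CodonTable, which may handle some cases differently.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes (U is treated as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def transcribe(dna):
    """DNA coding strand to mRNA: replace T with U (hypothetical helper)."""
    return dna.upper().replace("T", "U")


def translate_codon(codon):
    """Translate one possibly ambiguous codon; X if the residue is uncertain."""
    options = [IUPAC.get(base, "") for base in codon.upper()]
    residues = {CODON_TABLE["".join(c)] for c in product(*options)}
    return residues.pop() if len(residues) == 1 else "X"


def translate(seq):
    """Translate codon by codon, ignoring any trailing partial codon."""
    seq = str(seq)
    return "".join(translate_codon(seq[i:i + 3]) for i in range(0, len(seq) - 2, 3))


print(transcribe("ATGGCN"))
print(translate("ATGGCNTAR"))  # GCN is always Ala, TAR is always a stop
print(translate("TTN"))        # TTN could be Phe or Leu, so X
```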
From mdehoon at c2b2.columbia.edu Thu Oct 25 22:30:56 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Thu, 25 Oct 2007 22:30:56 -0400
Subject: [Biopython-dev] CVS freeze
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B643@mail2.exch.c2b2.columbia.edu>
Hi everybody,
With the fixed Biopython now in CVS, I'm planning to make the new Biopython
release tomorrow (Saturday). I'd therefore like to ask you not to make
commits to CVS starting from 0:01 GMT on Saturday, until the new release is
out. If you make any commits before that time, please double-check that all
the Biopython tests still run. Also, if you have some important patches for
which you need more time, please let us know.
Thanks!
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From bsouthey at gmail.com Fri Oct 26 11:12:14 2007
From: bsouthey at gmail.com (Bruce Southey)
Date: Fri, 26 Oct 2007 10:12:14 -0500
Subject: [Biopython-dev] CVS freeze
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B643@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B643@mail2.exch.c2b2.columbia.edu>
Message-ID:
Hi,
Just in case you are not aware of it, UniProt is going to make a
substantial change to the DE line in SwissProt/TrEMBL format 'Not
before: 01-Feb-2008'. See
http://www.expasy.org/sprot/relnotes/sp_soon.html#DE
Bruce
From biopython-dev at maubp.freeserve.co.uk Fri Oct 26 11:24:57 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Oct 2007 16:24:57 +0100
Subject: [Biopython-dev] DE line in SwissProt/TrEMBL format
In-Reply-To:
References: <6243BAA9F5E0D24DA41B27997D1FD14402B643@mail2.exch.c2b2.columbia.edu>
Message-ID: <472206C9.6060407@maubp.freeserve.co.uk>
Bruce Southey wrote:
> Hi,
> Just in case you are not aware of it, UniProt is going to make a
> substantial change to the DE line in SwissProt/TrEMBL format 'Not
> before: 01-Feb-2008'. See
> http://www.expasy.org/sprot/relnotes/sp_soon.html#DE
>
> Bruce
Thanks for the heads up. I don't think we need to worry about that for
the planned release. We should still be able to parse the new files;
it's just that the new structured description will be stored as a single
concatenated string in Biopython.
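As an illustration of what "stored as a single concatenated string" means, here is a sketch that joins the DE lines of one entry. The example lines imitate the announced structured style (RecName/AltName with Full= fields) but are made up, and the joining rule (strip the five-character line code, join with spaces) is an assumption, not a description of the Bio.SwissProt parser.

```python
def join_description(lines):
    """Concatenate the DE lines of one SwissProt entry (hypothetical helper).

    Each line starts with the two-letter code "DE" followed by three spaces,
    so the data begins at column 6 (index 5).
    """
    return " ".join(line[5:].strip() for line in lines if line.startswith("DE"))


# Made-up example in the new structured DE style.
entry_lines = [
    "ID   EXAMPLE_HUMAN           Reviewed;         100 AA.",
    "DE   RecName: Full=Example protein 1;",
    "DE            Short=EP1;",
    "DE   AltName: Full=Example antigen;",
]
print(join_description(entry_lines))
```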
Peter
From mdehoon at c2b2.columbia.edu Fri Oct 26 23:12:46 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Fri, 26 Oct 2007 23:12:46 -0400
Subject: [Biopython-dev] CVS freeze
References: <6243BAA9F5E0D24DA41B27997D1FD14402B643@mail2.exch.c2b2.columbia.edu>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B644@mail2.exch.c2b2.columbia.edu>
Thanks for letting us know. I think that it is OK though to make the release
now, as we'll probably have another release before the date the
SwissProt/TrEMBL format changes.
--Michiel.
From mdehoon at c2b2.columbia.edu Sun Oct 28 02:32:40 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Sun, 28 Oct 2007 02:32:40 -0400
Subject: [Biopython-dev] Biopython release 1.44 ready
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
Hi everybody,
Biopython release 1.44 is now available for download from the Biopython
website at http://biopython.org.
This release includes lots of code improvements and fixes in the Blast
interface and parsers, sequence input/output, the SwissProt parser, the
clustering routines, as well as a brand new module for population genetics.
For reasons of compatibility, some radical changes were necessary in some
parts of the code; please let us know if you find some functionality missing.
My thanks to all code contributors who made this new release possible.
--Michiel on behalf of the Biopython developers
From mdehoon at c2b2.columbia.edu Sun Oct 28 02:35:12 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Sun, 28 Oct 2007 02:35:12 -0400
Subject: [Biopython-dev] Non-ASCII character in PopGen documentation
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B646@mail2.exch.c2b2.columbia.edu>
While making the 1.44 release, I noticed that a non-ASCII character in a
formula in the PopGen documentation was causing problems for Hevea. As I
couldn't guess what the formula should be, I replaced this formula by a
placeholder (see CVS). Can somebody have a look at this and try to fix it?
Thanks!
--Michiel.
From biopython-dev at maubp.freeserve.co.uk Sun Oct 28 05:43:56 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sun, 28 Oct 2007 09:43:56 +0000
Subject: [Biopython-dev] Biopython release 1.44 ready
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
Message-ID: <472459DC.3050907@maubp.freeserve.co.uk>
Michiel De Hoon wrote:
> Hi everybody,
>
> Biopython release 1.44 is now available for download from the Biopython
> website at http://biopython.org.
Grand. Thank you for dedicating your weekend to this, Michiel.
A couple of questions. First, the main wiki page is locked and needs updating
to mention the new release. Who has access?
Secondly, I see you have updated the open-bio news feed,
http://news.open-bio.org/
What about http://biopython.org/news/ - which appears not to have been
used at all since it was started. Perhaps we can just have a filtered
view of http://news.open-bio.org/ here?
Related to this, on the wiki News page perhaps we can just show the last
few items from http://news.open-bio.org/index.rdf (even though some of
them are for other Bio* projects).
Peter
From tiagoantao at gmail.com Sun Oct 28 14:24:55 2007
From: tiagoantao at gmail.com (Tiago Antao)
Date: Sun, 28 Oct 2007 18:24:55 +0000
Subject: [Biopython-dev] Non-ASCII character in PopGen documentation
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B646@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B646@mail2.exch.c2b2.columbia.edu>
Message-ID: <4724D3F7.40105@gmail.com>
I currently have a different version of Tutorial.tex here (I have other
things already written in preparation for future versions). I don't know
how the non-ascii character got there. The formula is:
\[ N_{m} = \frac{1 - F_{st}}{4F_{st}} \]
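This is Wright's island-model approximation F_st = 1/(1 + 4N_m), rearranged for N_m, the number of migrants per generation. As a quick numerical check, here is a small helper (hypothetical, not part of Bio.PopGen) that evaluates it:

```python
def migrants_per_generation(fst):
    """Nm = (1 - Fst) / (4 * Fst), from Wright's island model (hypothetical helper)."""
    if not 0 < fst <= 1:
        raise ValueError("Fst must be in the interval (0, 1]")
    return (1 - fst) / (4 * fst)


print(migrants_per_generation(0.2))  # (1 - 0.2) / 0.8, roughly one migrant per generation
```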
--
tiagoantao at gmail.com
http://tiago.org/ps
From tiagoantao at gmail.com Sun Oct 28 16:54:06 2007
From: tiagoantao at gmail.com (Tiago Antao)
Date: Sun, 28 Oct 2007 20:54:06 +0000
Subject: [Biopython-dev] Biopython release 1.44 ready
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
Message-ID: <4724F6EE.50805@gmail.com>
Hi,
Michiel De Hoon wrote:
> This release includes lots of code improvements and fixes in the Blast
> interface and parsers, sequence input/output, the SwissProt parser, the
> clustering routines, as well as a brand new module for population genetics.
> For reasons of compatibility, some radical changes were necessary in some
> parts of the code; please let us know if you find some functionality missing.
Is it OK to resume uploading non-stable code to CVS? I have a few things
that I would like to add to the population genetics module (coalescent
simulation), but they still need polishing (mainly documentation and test
code).
Tiago
From biopython-dev at maubp.freeserve.co.uk Sun Oct 28 16:55:42 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sun, 28 Oct 2007 20:55:42 +0000
Subject: [Biopython-dev] Biopython release 1.44 ready
In-Reply-To: <4724F6EE.50805@gmail.com>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
<4724F6EE.50805@gmail.com>
Message-ID: <4724F74E.5010801@maubp.freeserve.co.uk>
Tiago Antao wrote:
> Is it OK to resume uploading non stable code to CVS? I have a few things
> that I would like to add to the population genetics module (coalescent
> simulation), but that still needs polishing (mainly documenting and test
> code).
Wait and see what Michiel says. However, perhaps you should hold off
a few more days, in case there are any teething problems with
Biopython 1.44 that would warrant making another release ASAP.
Peter
From biopython-dev at maubp.freeserve.co.uk Sun Oct 28 15:59:41 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sun, 28 Oct 2007 19:59:41 +0000
Subject: [Biopython-dev] mxTextTools optional?
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
Message-ID: <4724EA2D.3020609@maubp.freeserve.co.uk>
Michiel De Hoon wrote:
> Hi everybody,
>
> Biopython release 1.44 is now available for download from the Biopython
> website at http://biopython.org.
Given some of the changes (deprecations) made in this release, perhaps
we can now change setup.py so that mxTextTools is merely suggested, but
not required (the same way we treat reportlab and Numeric).
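One common way to make a dependency "suggested but not required" is to probe for it at install time and only warn when it is missing. A minimal sketch, assuming hypothetical is_available / check_optional helpers; the module names are the usual import names for these packages, and this is not the actual Biopython setup.py code:

```python
import importlib
import warnings

# Optional packages: import name -> what is lost without it (illustrative only).
OPTIONAL = {
    "mx.TextTools": "the remaining Martel-based parsers",
    "reportlab": "graphics output",
    "Numeric": "numerical code",
}


def is_available(module_name):
    """Return True if module_name can be imported, False otherwise."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def check_optional(packages=OPTIONAL):
    """Warn (rather than abort) for each optional package that is missing."""
    missing = [name for name in packages if not is_available(name)]
    for name in missing:
        warnings.warn("%s not found; %s will be unavailable" % (name, packages[name]))
    return missing


print("Missing optional packages:", check_optional())
```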
Any comments?
Peter
From mdehoon at c2b2.columbia.edu Sun Oct 28 21:12:48 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Sun, 28 Oct 2007 21:12:48 -0400
Subject: [Biopython-dev] Biopython release 1.44 ready
References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
<472459DC.3050907@maubp.freeserve.co.uk>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B647@mail2.exch.c2b2.columbia.edu>
> Michiel De Hoon wrote:
> > Hi everybody,
> >
> > Biopython release 1.44 is now available for download from the Biopython
> > website at http://biopython.org.
>
> Grand. Thank you for dedicating your weekend to this Michiel.
>
> A couple of questions, the main Wiki page is locked and needs updating
> to mention the new release. Who has access?
Apparently I do. I updated this page now.
--Michiel.
From mdehoon at c2b2.columbia.edu Sun Oct 28 21:57:18 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Sun, 28 Oct 2007 21:57:18 -0400
Subject: [Biopython-dev] Biopython release 1.44 ready
References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
<4724F6EE.50805@gmail.com>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B648@mail2.exch.c2b2.columbia.edu>
> Is it OK to resume uploading non stable code to CVS? I have a few things
> that I would like to add to the population genetics module (coalescent
> simulation), but that still needs polishing (mainly documenting and test
> code).
Sure, go ahead.
--Michiel.
From mdehoon at c2b2.columbia.edu Sun Oct 28 22:01:16 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Sun, 28 Oct 2007 22:01:16 -0400
Subject: [Biopython-dev] Biopython release 1.44 ready
References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
<4724F6EE.50805@gmail.com>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B649@mail2.exch.c2b2.columbia.edu>
On second thought, I agree with Peter's suggestion of waiting a few days to
see if any disasters show up with the 1.44 release. Sorry!
--Michiel.
From mdehoon at c2b2.columbia.edu Sun Oct 28 22:02:12 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Sun, 28 Oct 2007 22:02:12 -0400
Subject: [Biopython-dev] mxTextTools optional?
References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
<4724EA2D.3020609@maubp.freeserve.co.uk>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B64A@mail2.exch.c2b2.columbia.edu>
The fewer software packages Biopython requires, the better, to keep things
simple for users, not to mention developers. For a future release, we should
check if the modules that still rely on mxTextTools can be replaced by
pure-Python code.
--Michiel.
From bugzilla-daemon at portal.open-bio.org Mon Oct 29 13:21:03 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 29 Oct 2007 13:21:03 -0400
Subject: [Biopython-dev] [Bug 2390] New: Error importing Swiss Prot in BioSQL
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2390
Summary: Error importing Swiss Prot in BioSQL
Product: Biopython
Version: Not Applicable
Platform: Macintosh
OS/Version: MacOS X
Status: NEW
Severity: normal
Priority: P2
Component: BioSQL
AssignedTo: biopython-dev at biopython.org
ReportedBy: Biosql at hotmail.com
Hello,
I already reported this problem on the mailing list: I can't import the
SwissProt flat file into BioSQL, even with the new version (1.44) of Biopython.
Here's the script I'm using:
from BioSQL import BioSeqDatabase
from Bio.SwissProt import SProt
server = BioSeqDatabase.open_database(driver = 'MySQLdb', user = '',
                                      passwd = '', host = 'localhost', db = 'bioseqdb')
s_parser = SProt.SequenceParser()
s_iterator = SProt.Iterator(open('path to/uniprot_sprot.dat', 'r'), s_parser)
db = server.new_database('Swiss')
db.load(s_iterator)
And here's the error:
Traceback (most recent call last):
  File '', line 1, in
  File '/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py', line 414, in load
    db_loader.load_seqrecord(cur_record)
  File '/sw/lib/python2.5/site-packages/BioSQL/Loader.py', line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File '/sw/lib/python2.5/site-packages/BioSQL/Loader.py', line 250, in _load_bioentry_table
    version))
  File '/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py', line 277, in execute
    self.cursor.execute(sql, args or ())
  File '/sw/lib/python2.5/site-packages/MySQLdb/cursors.py', line 151, in execute
    query = query % db.literal(args)
TypeError: not all arguments converted during string formatting
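As for what the final error message means: the last frame shows MySQLdb filling in the query's %s placeholders with Python's % operator (`query = query % db.literal(args)`), and that operator raises this TypeError when it receives more values than the string has placeholders. A minimal reproduction in plain Python, with a made-up SQL statement and a hypothetical fill_placeholders helper; it does not establish what causes bug 2390 itself:

```python
def fill_placeholders(sql, args):
    """Mimic the substitution step: apply Python's % operator to the query."""
    return sql % args


sql = "INSERT INTO bioentry (name, accession) VALUES (%s, %s)"  # two placeholders
print(fill_placeholders(sql, ("'ABC1_HUMAN'", "'P12345'")))     # two values: fine

try:
    fill_placeholders(sql, ("'ABC1_HUMAN'", "'P12345'", "1"))  # three values
except TypeError as err:
    message = str(err)
    print("TypeError:", message)
```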
Thanks for the help !
Jonathan
From bugzilla-daemon at portal.open-bio.org Mon Oct 29 13:23:54 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 29 Oct 2007 13:23:54 -0400
Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL
In-Reply-To:
Message-ID: <200710291723.l9THNsun017818@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2390
------- Comment #1 from Biosql at hotmail.com 2007-10-29 13:23 EST -------
Created an attachment (id=799)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=799&action=view)
Sample of Swiss Prot flat file
From bugzilla-daemon at portal.open-bio.org Mon Oct 29 15:19:01 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 29 Oct 2007 15:19:01 -0400
Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL
In-Reply-To:
Message-ID: <200710291919.l9TJJ1O2026999@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2390
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |biopython-
| |bugzilla at maubp.freeserve.co.
| |uk
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-29 15:19 EST -------
I'm trying to narrow down the problem:
* Have you tried different input SwissProt files?
* Have you tried a GenBank file (using the GenBank parser)?
* Did you check the username/password as suggested on the mailing list (empty
strings look wrong to me)?
Peter
From biopython-dev at maubp.freeserve.co.uk Mon Oct 29 15:58:45 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Oct 2007 19:58:45 +0000
Subject: [Biopython-dev] BioRegistry, Bio.db
Message-ID: <320fb6e00710291258v533fed71u490ec1aadff3359c@mail.gmail.com>
While looking over the Tutorial this evening (and making some sequence-related
updates), I noticed that the section "BioRegistry - automatically finding
sequence sources" (in the Cook Book chapter) doesn't work anymore.
I believe that Bio.db is set up by the complicated and uncommented
code in Bio/__init__.py by calling Bio/config/DBRegistry.py; this was
commented out for Biopython 1.44.
Does anyone use this module? I've never really looked at it in depth,
but it looks interesting and perhaps worth saving. Note if we do want
to resurrect it, it needs a unit test.
At first glance, the only Martel dependency here is for recognising
error conditions and giving nice messages instead. If that's all it
is used for, then perhaps we can switch to regular expressions
instead.
Peter
From biopython-dev at maubp.freeserve.co.uk Mon Oct 29 17:39:50 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Oct 2007 21:39:50 +0000
Subject: [Biopython-dev] Removing deprecated functionality
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B640@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B640@mail2.exch.c2b2.columbia.edu>
Message-ID: <320fb6e00710291439t6f636964i9681e2b0c90e6c96@mail.gmail.com>
On 10/24/07, Michiel De Hoon wrote:
> Bio.Kabat and ,,, were deprecated in previous Biopython. ....
> I am planning to remove this functionality for release 1.44
I see you removed the files Bio/Kabat/*.py for Biopython 1.44, but is
it OK if we remove the now empty directory as well?
Peter
From mdehoon at c2b2.columbia.edu Mon Oct 29 21:06:38 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Mon, 29 Oct 2007 21:06:38 -0400
Subject: [Biopython-dev] Removing Bio.Kabat
References: <320fb6e00710291438x3f7d7d57t77b06e4b2221c470@mail.gmail.com>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B64E@mail2.exch.c2b2.columbia.edu>
As far as I know, it is not possible to remove a directory in CVS. See
http://www.thathost.com/wincvs-howto/cvsdoc/cvs_7.html#SEC69
I believe that it is possible to remove a directory by hand from the CVS
source tree, but it is not the official way to do it. Hopefully, we can
remove directories once we're using SVN.
--Michiel.
From bugzilla-daemon at portal.open-bio.org Tue Oct 30 08:25:20 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 08:25:20 -0400
Subject: [Biopython-dev] [Bug 2386] Bio.Seq.Seq and MutableSeq count()
method only works for single residues
In-Reply-To:
Message-ID: <200710301225.l9UCPKjo026963@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2386
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-30 08:25 EST -------
Created an attachment (id=800)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=800&action=view)
Patch to Bio/Seq.py count methods
Lets the Seq and MutableSeq count methods take either a single-letter or a
multiple-letter argument, which can be a string, Seq object or MutableSeq
object. Adds doc-strings.
Includes a trivial mini-test which would be used in the Seq unit test instead.
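A minimal sketch of the behaviour the patch describes, using a hypothetical MiniSeq class rather than the real Bio.Seq code. The argument may be a one-letter or multi-letter string or another sequence object, and the work is delegated to str.count, so matches are counted without overlaps, just as for Python strings.

```python
class MiniSeq:
    """Toy stand-in for a sequence object (hypothetical, not Bio.Seq.Seq)."""

    def __init__(self, data):
        self.data = str(data)

    def __str__(self):
        return self.data

    def count(self, sub):
        """Count non-overlapping occurrences of sub.

        sub may be a plain string or any object whose str() gives the letters,
        such as another MiniSeq.
        """
        return self.data.count(str(sub))


s = MiniSeq("AAAACGTACG")
print(s.count("A"))             # single letter
print(s.count("AA"))            # multi-letter, non-overlapping like str.count
print(s.count(MiniSeq("ACG")))  # sequence object as the argument
```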
From chris.lasher at gmail.com Tue Oct 30 10:17:29 2007
From: chris.lasher at gmail.com (Chris Lasher)
Date: Tue, 30 Oct 2007 10:17:29 -0400
Subject: [Biopython-dev] Biopython SVN Transition OK'd
Message-ID: <128a885f0710300717p7d91a4adjfaddc9c496974e67@mail.gmail.com>
Hi all,
Biopython just got the okay from OpenBio to transition from CVS to
Subversion--a good step in the right direction (though I've recently
started transitioning from SVN to Bazaar VCS). All we have to do is
come up with a date when the CVS repository can be locked down and
taken offline.
Also, I need to know what is needed from me in terms of helping all
the devs migrate to SVN. I produced a screencast series on Subversion
at
http://showmedo.com/videos/series?name=bfNi2X3Xg
and there is a transition guide at
http://svnbook.red-bean.com/en/1.4/svn.forcvs.html
Would providing links to these on the wiki be sufficient? What further
information would you like to know? Subversion is not a radical
departure from CVS and many of the commands are a one-to-one mapping.
The biggest differences are that commits apply to the whole repository rather
than to individual files, and that directories are tracked as well.
Let's get a discussion on this and set a date soon.
Chris
From bugzilla-daemon at portal.open-bio.org Tue Oct 30 10:25:01 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 10:25:01 -0400
Subject: [Biopython-dev] [Bug 2386] Bio.Seq.Seq and MutableSeq count()
method only works for single residues
In-Reply-To:
Message-ID: <200710301425.l9UEP19U002945@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2386
------- Comment #2 from dalloliogm at gmail.com 2007-10-30 10:25 EST -------
The new code is good, but please consider implementing case-insensitive
searches:
>>> Seq('AACCCCaa').count('a')
2
>>> Seq('AACCCCaa').count('a', 'i')
4
They could be useful in many cases, because sometimes one has to deal with
mixed-case sequences.
I think the easiest way to implement this would be by using regular
expressions.
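A minimal sketch of the case-insensitive counting suggested above, written as a standalone function (count_ignore_case is a hypothetical name, not part of Bio.Seq). It follows the regular-expression route the comment proposes: re.escape keeps the search string literal, and re.findall returns non-overlapping matches, so for sequence letters the result agrees with str.count on upper-cased input.

```python
import re


def count_ignore_case(seq, sub):
    """Count non-overlapping, case-insensitive occurrences of sub in seq."""
    return len(re.findall(re.escape(str(sub)), str(seq), flags=re.IGNORECASE))


print(count_ignore_case("AACCCCaa", "a"))
print(count_ignore_case("AACCCCaa", "a") == "AACCCCaa".upper().count("A"))
```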
From bugzilla-daemon at portal.open-bio.org Tue Oct 30 14:02:49 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 14:02:49 -0400
Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL
In-Reply-To:
Message-ID: <200710301802.l9UI2n1J020073@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2390
------- Comment #3 from Biosql at hotmail.com 2007-10-30 14:02 EST -------
(In reply to comment #2)
> I'm trying to narrow down the problem:
> * Have you tried different input SwissProt files?
> * Have you tried a GenBank file (using the GenBank parser)?
> * Did you check the username/password as suggested on the mailing list (empty
> strings look wrong to me)?
>
> Peter
>
I'm sorry Peter, the reply you sent me on the mailing list was cut in half and
I didn't see the rest of your message until I read it directly on the
mailing list.
I tried to parse cor6_6.gb with the GenBank parser and I'm getting the same
result; sorry I didn't try this before. I also tried what you suggested with
the SeqIO module, using cor6_6.gb and also a SwissProt file, and I'm still
getting the same TypeError, which is:
Traceback (most recent call last):
  File "DB_Gen.py", line 25, in <module>
    db.load(iterator)
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 414, in load
    db_loader.load_seqrecord(cur_record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 250, in _load_bioentry_table
    version))
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 277, in execute
    self.cursor.execute(sql, args or ())
  File "build/bdist.macosx-10.4-ppc/egg/MySQLdb/cursors.py", line 151, in execute
TypeError: not all arguments converted during string formatting
It seems to me that the problem could be with the MySQLdb module, but I don't
understand why, since I'm using the latest release 1.2.2c1, and I've also tried
it with the stable 1.2.2 release.
Am I right?
From bugzilla-daemon at portal.open-bio.org Tue Oct 30 15:06:38 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 15:06:38 -0400
Subject: [Biopython-dev] [Bug 2386] Bio.Seq.Seq and MutableSeq count()
method only works for single residues
In-Reply-To:
Message-ID: <200710301906.l9UJ6cDZ023596@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2386
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-30 15:06 EST -------
I really don't want to make the Seq count method different to the Python string
count method.
Speaking of which, the string uses count(sub [, start[, end]]) to allow
searching with an optional start and further optional end index. I should
probably add that.
In the case of single letter searches, my_seq.count("A") + my_seq.count("a") is
a simple enough way of doing things. Counting case-insensitive variants of a
longer sub-sequence like "ATG" wouldn't be so easy. It would be nice if the
Python re library would work directly on Seq objects (without having to
explicitly turn them into strings first).
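The sketch below forwards to `str.count`. Because of that, the optional start and end arguments behave exactly as they do for strings. `MiniSeq` is a hypothetical, string-backed stand-in and not the real `Bio.Seq.Seq` class. The last line shows the two-count workaround for a single letter.

```python
class MiniSeq(object):
    """Hypothetical, minimal string-backed sequence (not Bio.Seq.Seq)."""

    def __init__(self, data):
        self._data = str(data)

    def __str__(self):
        return self._data

    def count(self, sub, *args):
        """Mirror str.count(sub[, start[, end]]); sub may be a str or MiniSeq."""
        # Forwarding *args keeps the optional start/end semantics identical
        # to the built-in string method.
        return self._data.count(str(sub), *args)


my_seq = MiniSeq("ATGCATGatgA")
print(my_seq.count("ATG"))                    # 2
print(my_seq.count("ATG", 1))                 # 1 (search starts at index 1)
print(my_seq.count("A") + my_seq.count("a"))  # 4, single-letter workaround
```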
From bugzilla-daemon at portal.open-bio.org Tue Oct 30 15:06:52 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 15:06:52 -0400
Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL
In-Reply-To:
Message-ID: <200710301906.l9UJ6q7l023634@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2390
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-30 15:06 EST -------
Thanks for that. It looks like we can *probably* rule out a problem in the
sequence parsing.
Unfortunately I haven't used BioSQL myself (yet), and don't have a
system set up here that I can try this on.
It appears (just from reading the stack trace) that there is some mismatch
between the SQL query (which I assume contains Python % placeholders) and the
list of arguments (to go in these placeholders).
If you fancy investigating this further yourself, I would start by
adding a breakpoint at line 277 of BioSQL/BioSeqDatabase.py to check the
contents of the sql and args variables. Or just add some print statements
immediately before line 277: self.cursor.execute(sql, args or ())
I hope someone else on the mailing list will have some suggestions...
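One way to take this further is to compare the two counts directly. The sketch below assumes that MySQLdb substitutes the arguments into the query with Python's % operator, which is what the TypeError raised from MySQLdb/cursors.py suggests. `count_placeholders` and `check_sql_args` are hypothetical helpers, not part of BioSQL or MySQLdb. The table and column names are made up. Calling check_sql_args(sql, args) just before line 277 would report a mismatch in plainer terms.

```python
import re


def count_placeholders(sql):
    """Count %s placeholders (the style MySQLdb uses) in an SQL string.

    Literal percent signs written as %% are removed first so that they
    are not mistaken for placeholders.
    """
    return len(re.findall(r"%s", sql.replace("%%", "")))


def check_sql_args(sql, args):
    """Raise ValueError if the placeholder and argument counts differ."""
    n_placeholders = count_placeholders(sql)
    n_args = len(args or ())
    if n_placeholders != n_args:
        raise ValueError("SQL has %d placeholder(s) but %d argument(s) were given"
                         % (n_placeholders, n_args))


# Hypothetical query with six placeholders, but seven arguments.
sql = ("INSERT INTO bioentry (col1, col2, col3, col4, col5, col6)"
       " VALUES (%s, %s, %s, %s, %s, %s)")
args = (1, 2, "name", "acc", "id", "div", 1)

try:
    check_sql_args(sql, args)
except ValueError as err:
    print(err)  # SQL has 6 placeholder(s) but 7 argument(s) were given

# Plain % formatting fails with the same TypeError as in the traceback.
try:
    sql % args
except TypeError as err:
    print(err)  # not all arguments converted during string formatting
```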
From bugzilla-daemon at portal.open-bio.org Tue Oct 30 15:22:30 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 15:22:30 -0400
Subject: [Biopython-dev] [Bug 2386] Bio.Seq.Seq and MutableSeq count()
method only works for single residues
In-Reply-To:
Message-ID: <200710301922.l9UJMUoM024725@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2386
------- Comment #4 from howard.salis at gmail.com 2007-10-30 15:22 EST -------
How about the upper and lower methods for Seq classes?
Then, one could do my_seq.upper().count("ATG")
Would that work well?
-Howard
(In reply to comment #3)
> I really don't want to make the Seq count method different to the Python string
> count method.
>
> Speaking of which, the string uses count(sub [, start[, end]]) to allow
> searching with an optional start and further optional end index. I should
> probably add that.
>
> In the case of single letter searches, my_seq.count("A") + my_seq.count("a") is
> a simple enough way of doing things. Counting case-insensitive variants of a
> longer sub-sequence like "ATG" wouldn't be so easy. It would be nice if the
> Python re library would work directly on Seq objects (without having to
> explicitly turn them into strings first).
>
From biopython-dev at maubp.freeserve.co.uk Tue Oct 30 15:30:29 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 30 Oct 2007 19:30:29 +0000
Subject: [Biopython-dev] Biopython SVN Transition
In-Reply-To: <128a885f0710300717p7d91a4adjfaddc9c496974e67@mail.gmail.com>
References: <128a885f0710300717p7d91a4adjfaddc9c496974e67@mail.gmail.com>
Message-ID: <47278655.8090300@maubp.freeserve.co.uk>
Chris Lasher wrote:
> Hi all,
>
> Biopython just got the okay from OpenBio to transition from CVS to
> Subversion--a good step in the right direction (though I've recently
> started transitioning from SVN to Bazaar VCS). All we have to do is
> come up with a date when the CVS repository can be locked down and
> taken offline.
I was wondering if anyone would start suggesting moving to git or
something else ;)
Michiel - are you expecting any complications from CVS to SVN regarding
the build process?
Another thought; will existing developer accounts "just work" on the SVN
system? Also do you (Chris) have CVS access, and if not do you need or
want it?
> Also, I need to know what is needed from me in terms of helping all
> the devs migrate to SVN. I produced a screencast series on Subversion
> at http://showmedo.com/videos/series?name=bfNi2X3Xg and there is a
> transition guide at http://svnbook.red-bean.com/en/1.4/svn.forcvs.html
Sadly that didn't play in gnash 0.8, and I don't have Adobe's Flash
plugin working on my 64-bit Ubuntu. I'll have to check that out on
Windows later in the week :)
If you are able to field any queries on the mailing list, that would
probably be fine.
> Would providing links to these on the wiki be sufficient?
If you could look after that aspect of the wiki, that would be great.
> What further information would you like to know? Subversion is not a
> radical departure from CVS and many of the commands are a one-to-one
> mapping. The biggest difference is commits occur for the whole
> repository, not on a per-file basis, and directories are tracked, as
> well.
The fact that CVS and SVN are relatively similar is probably one reason
why no-one has raised any real objections to the move.
> Let's get a discussion on this and set a date soon.
In terms of timing, how long do you/the OBF guys expect the transfer to
take? And would they prefer to do this over a weekend or mid week?
Barring any problems with Biopython 1.44 which would force us to do
another release in the very short term, I guess in the next fortnight is
reasonable (especially if we only expect a couple of days downtime).
Of course, I personally want to start working on the Seq objects and
alignments - and Tiago wants to get back to his Population Genetics module.
Peter
P.S. Would you or any of the people doing the transition be able to sort
out bug 2363?
http://bugzilla.open-bio.org/show_bug.cgi?id=2363
From bugzilla-daemon at portal.open-bio.org Tue Oct 30 15:33:40 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 15:33:40 -0400
Subject: [Biopython-dev] [Bug 2386] Bio.Seq.Seq and MutableSeq count()
method only works for single residues
In-Reply-To:
Message-ID: <200710301933.l9UJXedO025330@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2386
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-30 15:33 EST -------
Adding .upper() and .lower() methods is on my mental todo list, just a bit
lower down my priorities than the .count() method (this bug) and the biological
methods covered in bug 2381.
One of us should file an enhancement bug for .upper() and .lower().
I agree they are needed to make the Seq object more string-like. However, the
implementation is non-trivial due to the alphabet object (which may define a
case-sensitive list of expected letters).
And yes, once these methods are supported, doing
my_seq.upper().count("ATG") would work fine.
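This sketch shows why the alphabet makes upper() and lower() non-trivial. If the alphabet carries a case-sensitive string of expected letters, that string has to change case along with the sequence. It is a minimal illustration under that assumption. `SimpleAlphabet` and `CasedSeq` are hypothetical classes, not Biopython's real Alphabet and Seq. Both methods return a new object and leave the original untouched, as str.upper() does.

```python
class SimpleAlphabet(object):
    """Hypothetical alphabet; letters is an optional string of expected letters."""

    def __init__(self, letters=None):
        self.letters = letters


class CasedSeq(object):
    """Hypothetical immutable sequence with upper()/lower() (not Bio.Seq.Seq)."""

    def __init__(self, data, alphabet=None):
        self._data = str(data)
        self.alphabet = alphabet or SimpleAlphabet()

    def __str__(self):
        return self._data

    def count(self, sub, *args):
        return self._data.count(str(sub), *args)

    def _change_case(self, convert):
        # The awkward part: a case-sensitive alphabet must be converted as
        # well, or the new sequence would hold "unexpected" letters.
        letters = self.alphabet.letters
        if letters is not None:
            letters = convert(letters)
        return CasedSeq(convert(self._data), SimpleAlphabet(letters))

    def upper(self):
        """Return a new upper-case sequence; self is left unchanged."""
        return self._change_case(str.upper)

    def lower(self):
        """Return a new lower-case sequence; self is left unchanged."""
        return self._change_case(str.lower)


dna = CasedSeq("ATGcccatgA", SimpleAlphabet("acgt"))
print(dna.upper().count("ATG"))      # 2
print(dna.upper().alphabet.letters)  # ACGT
print(dna)                           # ATGcccatgA (unchanged)
```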
From bugzilla-daemon at portal.open-bio.org Tue Oct 30 15:45:35 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 15:45:35 -0400
Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string,
even subclass string?
In-Reply-To:
Message-ID: <200710301945.l9UJjZlQ026374@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2351
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Make SeqRecord subclass Seq |Make Seq more like a string,
|subclass string? |even subclass string?
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-30 15:45 EST -------
I modified the title to focus on the Seq object.
See also bug 2386 (about the count method), and bug 2381 (about biological
methods).
(In reply to comment #4)
> (In reply to comment #3)
> > It does not add any .short() method to give a truncated representation
> > string like the current str() method gives.
>
> Why not? This new method should not cause any compatibility problem
Mainly because I'm not convinced that we need a .short() method, and it's harder
to remove things at a later date (as people may be using them).
Surely my_seq[:50] or, depending on the context, str(my_seq[:50]) is enough?
From bugzilla-daemon at portal.open-bio.org Tue Oct 30 18:32:12 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 18:32:12 -0400
Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL
In-Reply-To:
Message-ID: <200710302232.l9UMWCb3004960@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2390
------- Comment #5 from Biosql at hotmail.com 2007-10-30 18:32 EST -------
It seems that a %s is missing at line 243 in Loader.py: there are only six %s
placeholders in the SQL query, but seven arguments are passed when loading the
bioentry. So I added a %s and that part of the loading now works, but another
problem arises after this:
Traceback (most recent call last):
  File "DB_Gen.py", line 25, in <module>
    db.load(iterator)
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 415, in load
    db_loader.load_seqrecord(cur_record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 253, in _load_bioentry_table
    bioentry_id = self.adaptor.last_id('bioentry')
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 148, in last_id
    return self.dbutils.last_id(self.cursor, table)
  File "/sw/lib/python2.5/site-packages/BioSQL/DBUtils.py", line 35, in last_id
    return cursor.insert_id()
AttributeError: 'Cursor' object has no attribute 'insert_id'
I'm going to look into it tomorrow.
Jonathan
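The AttributeError shows that this MySQLdb cursor does not provide insert_id(). Below is a sketch of a more defensive helper. It assumes the driver implements the optional DB-API 2.0 (PEP 249) `cursor.lastrowid` attribute. `last_insert_id` is a hypothetical name, not the actual code in BioSQL/DBUtils.py. The standard-library `sqlite3` module is used only so the example runs without a MySQL server. Note that sqlite3 uses `?` placeholders where MySQLdb uses `%s`.

```python
import sqlite3


def last_insert_id(cursor):
    """Return the id generated by the cursor's most recent INSERT.

    Hypothetical helper: prefers the optional DB-API 2.0 ``lastrowid``
    attribute and only falls back to an older ``insert_id()`` method.
    """
    row_id = getattr(cursor, "lastrowid", None)
    if row_id is not None:
        return row_id
    if hasattr(cursor, "insert_id"):
        return cursor.insert_id()
    raise AttributeError("cursor provides neither lastrowid nor insert_id()")


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO bioentry (name) VALUES (?)", ("cor6_6",))
print(last_insert_id(cur))  # 1
```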
From biopython-dev at maubp.freeserve.co.uk Tue Oct 30 18:36:43 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 30 Oct 2007 22:36:43 +0000
Subject: [Biopython-dev] BioRegistry, Bio.db
In-Reply-To: <320fb6e00710291258v533fed71u490ec1aadff3359c@mail.gmail.com>
References: <320fb6e00710291258v533fed71u490ec1aadff3359c@mail.gmail.com>
Message-ID: <4727B1FB.2020803@maubp.freeserve.co.uk>
Peter wrote:
> While looking over the Tutorial this evening (and making some sequence
> related updates), I noticed that the section "BioRegistry -
> automatically finding sequence sources" (in the Cook Book chapter)
> doesn't work any more.
Does anyone here use this? Should I ask on the main list?
> I believe that Bio.db is set up by the complicated and undocumented
> code in Bio/__init__.py, which calls Bio/config/DBRegistry.py; this was
> commented out for Biopython 1.44.
Confirmed. After uncommenting the call to _load_registries() in
Bio/__init__.py the example in the tutorial using Bio.db works.
Note you do get a DeprecationWarning about the concurrent behaviour
provided by Bio.MultiProc, but I have not explored any further.
Thoughts?
Peter
From mdehoon at c2b2.columbia.edu Tue Oct 30 21:05:22 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 30 Oct 2007 21:05:22 -0400
Subject: [Biopython-dev] Biopython SVN Transition
References: <128a885f0710300717p7d91a4adjfaddc9c496974e67@mail.gmail.com>
<47278655.8090300@maubp.freeserve.co.uk>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B64F@mail2.exch.c2b2.columbia.edu>
> Michiel - are you expecting any complications from CVS to SVN regarding
> the build process?
For the build process, we are not doing anything very complicated with CVS,
so I doubt that there will be any major problems when we start using SVN.
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From bugzilla-daemon at portal.open-bio.org Tue Oct 30 21:30:20 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 21:30:20 -0400
Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string,
even subclass string?
In-Reply-To:
Message-ID: <200710310130.l9V1UKEN014287@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2351
------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2007-10-30 21:30 EST -------
First, let's think about what a Seq object should look like, before getting into
implementation details.
In my opinion, a Seq object is essentially a string, but with some added
functionality that is useful in biological contexts. Currently, this is
limited to specifying an alphabet. Personally, I never used such an alphabet,
so in practice I prefer using a simple string instead of a Seq object.
However, if we extend its functionality, I think a Seq class can be useful
enough to warrant its existence in Biopython.
In short, to my mind a Seq object should have the following properties:
1) A Seq object is basically a string, so it should behave as if it were
subclassed from string.
2) As a result, functions that have a sequence as an argument, but don't need
the added features of a Seq object, should work with strings as well as Seq
objects.
3) The sequence should be mutable, so that we won't need a separate MutableSeq
class. This also implies that a Seq class cannot subclass from string, since
strings are not mutable.
4) Currently, Seq objects have an associated alphabet; SeqRecord objects have
annotations, dbxrefs, a description, features, id, and name. I think a new Seq
object should have both, so that we can avoid having both a Seq and a SeqRecord
class. Of course, some or all of these fields can remain None.
5) A Seq class should have methods that one expects from a sequence class, in
particular complement(), reverse_complement(), perhaps a modified count() that
can ignore case.
With respect to 3), we'd probably have to write such a Seq class in C.
The end result would be a Seq class that actually has some benefit to the user,
without requiring its use when a string suffices, and avoids having three
classes (Seq, MutableSeq, SeqRecord) for essentially the same thing.
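The following is a rough pure-Python sketch of the five properties listed above. It is an illustration under stated assumptions, not a design. `AnnotatedSeq` and `gc_fraction` are hypothetical names. The letters are kept in a Python list to make the object mutable, instead of the C implementation suggested for point 3. As point 3 implies, the class does not subclass str.

```python
# Unknown letters are passed through unchanged by complement().
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G",
               "a": "t", "t": "a", "g": "c", "c": "g"}


class AnnotatedSeq(object):
    """Hypothetical sketch of one mutable, annotated sequence class."""

    def __init__(self, data, alphabet=None, id=None, name=None,
                 description=None):
        self._data = list(data)       # list of letters, so it is mutable (3)
        self.alphabet = alphabet      # Seq-style alphabet (4)
        self.id = id                  # SeqRecord-style fields, default None (4)
        self.name = name
        self.description = description
        self.annotations = {}
        self.dbxrefs = []
        self.features = []

    # String-like behaviour (1)
    def __str__(self):
        return "".join(self._data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        # For brevity, indexing and slicing both return plain strings.
        return str(self)[index]

    def __setitem__(self, index, value):
        # Slice assignment accepts any string; a single index expects one letter.
        self._data[index] = value

    def count(self, sub, *args):
        return str(self).count(str(sub), *args)

    # Biological methods (5); these return new objects.
    def complement(self):
        return AnnotatedSeq((_COMPLEMENT.get(c, c) for c in self._data),
                            alphabet=self.alphabet)

    def reverse_complement(self):
        return AnnotatedSeq(reversed(str(self.complement())),
                            alphabet=self.alphabet)


def gc_fraction(seq):
    """Accepts a plain string or an AnnotatedSeq alike (2)."""
    text = str(seq).upper()
    return (text.count("G") + text.count("C")) / float(len(text))


s = AnnotatedSeq("ATGCaa", id="demo")
s[0] = "G"                                    # edit in place
print(s)                                      # GTGCaa
print(s.reverse_complement())                 # ttGCAC
print(gc_fraction(s), gc_fraction("GTGCaa"))  # 0.5 0.5
```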
From chris.lasher at gmail.com Wed Oct 31 01:55:03 2007
From: chris.lasher at gmail.com (Chris Lasher)
Date: Wed, 31 Oct 2007 01:55:03 -0400
Subject: [Biopython-dev] Biopython SVN Transition
In-Reply-To: <47278655.8090300@maubp.freeserve.co.uk>
References: <128a885f0710300717p7d91a4adjfaddc9c496974e67@mail.gmail.com>
<47278655.8090300@maubp.freeserve.co.uk>
Message-ID: <128a885f0710302255y4c34ac8axa48f48b253d5854a@mail.gmail.com>
On 10/30/07, Peter wrote:
> I was wondering if anyone would start suggesting moving to git or
> something else ;)
I tried Git and didn't like it. Bazaar suits me much better, and it
even has support for SVN repositories with bzr-svn. Git is not truly
cross-platform. It performs terribly on Windows. This left me looking
at Mercurial (Hg) and Bazaar (bzr). I liked the direction that Bazaar
was moving in and their emphasis on testing with real unit/regression
tests. For those interested, you can see some of the "literature" I
read through on my del.icio.us page:
http://del.icio.us/gotgenes/dscm
> Another thought; will existing developer accounts "just work" on the SVN
> system? Also do you (Chris) have CVS access, and if not do you need or
> want it?
The existing developer accounts will "just work" because they're going
to do SVN over SSH. I have SSH access on the machine and CVS access as
well. Thanks for checking.
> > Also, I need to know what is needed from me in terms of helping all
> > the devs migrate to SVN. I produced a screencast series on Subversion
> > at http://showmedo.com/videos/series?name=bfNi2X3Xg and there is a
> > transition guide at http://svnbook.red-bean.com/en/1.4/svn.forcvs.html
>
> Sadly that didn't play in gnash 0.8, and I don't have Adobe's Flash
> plugin working on my 64-bit Ubuntu. I'll have to check that out on
> Windows later in the week :)
Bummer! Does nspluginwrapper not work?
> If you are able to field any queries on the mailing list, that would
> probably be fine.
I'd be happy to do that.
Should this page be renamed to SVN, in line with the CVS page?
> > Would providing links to these on the wiki be sufficient?
>
> If you could look after that aspect of the wiki, that would be great.
At some point I had started this:
http://biopython.org/wiki/Subversion_migration
> > What further information would you like to know? Subversion is not a
> > radical departure from CVS and many of the commands are a one-to-one
> > mapping. The biggest difference is commits occur for the whole
> > repository, not on a per-file basis, and directories are tracked, as
> > well.
>
> The fact that CVS and SVN are relatively similar is probably one reason
> why no-one has raised any real objections to the move.
>
> > Let's get a discussion on this and set a date soon.
>
> In terms of timing, how long do you/the OBF guys expect the transfer to
> take? And would they prefer to do this over a weekend or mid week?
Not sure, let me ask Jason Stajich.
> Barring any problems with Biopython 1.44 which would force us to do
> another release in the very short term, I guess in the next fortnight is
> reasonable (especially if we only expect a couple of days downtime).
I think we could expect less than a full day of downtime.
> Of course, I personally want to start working on the Seq objects and
> alignments - and Tiago wants to get back to his Population Genetics module.
By all means, continue using CVS until I get a firm date for the
Biopython Devs. Even if you have uncommitted changes when the CVS
server goes down, you can simply copy the files to your checked out
copy of the SVN repository and continue as is.
> P.S. Would you or any of the people doing the transition be able to sort
> out bug 2363?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2363
That's a very good question. I wonder if cvs2svn is capable of picking
up those errors in commits and choosing the proper format. I had trouble
getting hold of an expert who could tell me how to identify files
committed as binary, and how to change them to text (or vice
versa). Perhaps I should send an email to the Subversion mailing list,
or to the CVS list if it's still active. I'll also check whether
Jason knows.
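One way to find files committed as binary is to scan the header output of `cvs log -h`. In that output, each file's `keyword substitution:` field reads `b` for binary files and usually `kv` for text. The sketch below assumes that output layout. The sample text and file names are invented for illustration, and `find_binary_files` is a hypothetical helper. Files found this way can typically be switched back to text mode with `cvs admin -kkv <file>`, but check that against the CVS manual before touching the repository.

```python
def find_binary_files(cvs_log_output):
    """Return working-file names whose keyword substitution mode is "b".

    Expects the header text printed by "cvs log -h" in a working copy.
    """
    binary = []
    current = None
    for line in cvs_log_output.splitlines():
        if line.startswith("Working file:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("keyword substitution:") and current is not None:
            if line.split(":", 1)[1].strip() == "b":
                binary.append(current)
        elif line.startswith("====="):
            current = None  # end of this file's entry
    return binary


# Illustrative output only; real paths and fields will differ.
SAMPLE = """\
RCS file: /repo/biopython/Tests/example_data.txt,v
Working file: Tests/example_data.txt
head: 1.2
keyword substitution: b
total revisions: 2
=============================================================================
RCS file: /repo/biopython/Bio/example.py,v
Working file: Bio/example.py
head: 1.20
keyword substitution: kv
total revisions: 20
=============================================================================
"""

print(find_binary_files(SAMPLE))  # ['Tests/example_data.txt']
```

To use it against a real checkout, pass it the decoded output of running `cvs log -h` in the working copy. `cvs rlog` output has no `Working file:` lines, so the parser would need adjusting for that.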
From bugzilla-daemon at portal.open-bio.org Wed Oct 31 05:54:24 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 31 Oct 2007 05:54:24 -0400
Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string,
even subclass string?
In-Reply-To:
Message-ID: <200710310954.l9V9sOw7014572@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2351
------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-31 05:54 EST -------
> In short, to my mind a Seq object should have the following properties:
> 1) A Seq object is basically a string, so it should behave as if it were
> subclassed from string.
I agree, where possible the Seq object should act like a string.
In particular str(my_seq) should give the full string.
> 2) As a result, functions that have a sequence as an argument, but don't
> need the added features of a Seq object, should work with strings as well
> as Seq objects.
Again, I agree. I've double-checked this works for some of the recently
updated SeqUtils functionality. I would hope we get this "for free" once the
Seq object itself becomes more string like.
> 3) The sequence should be mutable, so that we won't need a separate
> MutableSeq class. This also implies that a Seq class cannot subclass from
> string, since strings are not mutable.
Why? Python strings are not mutable, and this isn't usually a problem.
Personally, I have never needed a mutable sequence and have only ever used them
in test cases. Having the basic Seq non-mutable means we can leverage existing
string functionality and optimizations.
Also, writing a new mutable sequence in C seems like a big maintenance load in
the long term (and may complicate the cross-platform build process). Surely we
can get good enough performance via the array-of-characters route currently
used?
On a related note: the fact that the current MutableSeq methods like
reverse_complement() work in place rather than returning a new object makes
switching between Seq and MutableSeq fiddly.
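The sketch below contrasts the two method styles behind that remark. `ImmutableStyleSeq` and `InPlaceStyleSeq` are hypothetical classes, not Biopython's Seq and MutableSeq. For brevity, only upper-case A/C/G/T are handled. The same call returns a new object in one case and None in the other. Code written for one class can therefore silently misbehave with the other.

```python
# Upper-case DNA only, for brevity.
_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def _reverse_complement_letters(letters):
    return [_PAIRS[letter] for letter in reversed(letters)]


class ImmutableStyleSeq(object):
    """Hypothetical: methods return a new object and leave self alone."""

    def __init__(self, data):
        self._data = str(data)

    def __str__(self):
        return self._data

    def reverse_complement(self):
        return ImmutableStyleSeq("".join(_reverse_complement_letters(self._data)))


class InPlaceStyleSeq(object):
    """Hypothetical: methods edit self in place and return None."""

    def __init__(self, data):
        self._data = list(data)

    def __str__(self):
        return "".join(self._data)

    def reverse_complement(self):
        self._data = _reverse_complement_letters(self._data)


a = ImmutableStyleSeq("AATG")
b = InPlaceStyleSeq("AATG")
print(a.reverse_complement(), a)  # CATT AATG
print(b.reverse_complement(), b)  # None CATT
```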
> 4) Currently, Seq objects have an associated alphabet; SeqRecord objects
> [also] have annotations, dbxrefs, a description, features, id, and name.
> I think a new Seq object should have both, so that we can avoid having both
> a Seq and a SeqRecord class. Of course, some or all of these fields can
> remain None.
I don't really see the benefit over the current scheme. I'm happy with the
division between Seq and SeqRecord, but we could go for SeqRecord being a more
annotated subclass of the Seq class. This would be similar to Bioperl's Seq,
PrimarySeq, or RichSeq objects.
Something I do want to add is slicing for SeqRecords, which would return a new
SeqRecord with a sensible name/id/description. For this to be really useful I
think we need to add "per residue annotation", such as lists or strings of
information the same length as the sequence (e.g. predicted secondary
structure, or sequencing quality scores), which would also get sliced when
slicing a SeqRecord.
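Here is a minimal sketch of SeqRecord slicing with per-residue annotation. It assumes the annotations are stored in a dict of lists or strings, each the same length as the sequence. `LetterAnnotatedRecord` and `per_letter` are hypothetical names. The sequence is a plain string for brevity, and the structure and confidence values are made up.

```python
class LetterAnnotatedRecord(object):
    """Hypothetical SeqRecord-like object with per-residue annotation."""

    def __init__(self, seq, id="<unknown id>", name="<unknown name>",
                 description="", per_letter=None):
        self.seq = seq
        self.id = id
        self.name = name
        self.description = description
        self.per_letter = dict(per_letter or {})
        # Every per-residue annotation must match the sequence length.
        for key, values in self.per_letter.items():
            if len(values) != len(seq):
                raise ValueError("%s has length %d, expected %d"
                                 % (key, len(values), len(seq)))

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        if not isinstance(index, slice):
            return self.seq[index]  # single position: just the residue
        # Slice the sequence and each per-residue annotation together.
        # Keeping id/name/description unchanged is one possible policy.
        sliced = dict((key, values[index])
                      for key, values in self.per_letter.items())
        return LetterAnnotatedRecord(self.seq[index], self.id, self.name,
                                     self.description, sliced)


# Made-up example values, for illustration only.
rec = LetterAnnotatedRecord("MKTAYIAK", id="example",
                            per_letter={"structure": "CCHHHHHC",
                                        "confidence": [9, 8, 7, 9, 9, 8, 6, 5]})
sub = rec[2:6]
print(sub.seq, sub.per_letter["structure"], sub.per_letter["confidence"])
# TAYI HHHH [7, 9, 9, 8]
```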
> 5) A Seq class should have methods that one expects from a sequence class,
> in particular complement(), reverse_complement(), perhaps a modified count()
> that can ignore case.
Usually mixed-case sequences are used for a reason, and the user may need both
case-sensitive and case-insensitive counts. I would keep .count() case-sensitive
like a real string, and suggest .upper().count() as a simple workaround for
case-insensitive counts.
Plus the Seq object should have methods for forward and back transcription and
translation; see bug 2381.
A more drastic change we could consider is getting rid of the alphabet as an
explicit property, and having ProteinSeq, NucleotideSeq, DnaSeq and RnaSeq
(decorator/sub)classes which would have only the relevant biological sequence
methods. We would lose the expected "letters" feature of the alphabet, but I
don't think this is really helpful at the moment because the Seq class does not
enforce it.
Otherwise I would advocate that, when creating a Seq object (or editing a
MutableSeq object), the new letters should be screened against
self.alphabet.letters (if present).
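The sketch below screens new letters against `alphabet.letters` whenever that attribute is defined. `SimpleAlphabet` and `CheckedSeq` are hypothetical stand-ins for the real alphabet and Seq classes. A mutable variant would run the same `_check` on any letters being assigned.

```python
class SimpleAlphabet(object):
    """Hypothetical alphabet; letters=None means 'no restriction'."""

    def __init__(self, letters=None):
        self.letters = letters


class CheckedSeq(object):
    """Hypothetical sequence that screens its letters against its alphabet."""

    def __init__(self, data, alphabet=None):
        self.alphabet = alphabet or SimpleAlphabet()
        self._data = str(data)
        self._check(self._data)

    def _check(self, text):
        letters = getattr(self.alphabet, "letters", None)
        if letters is None:
            return  # the alphabet does not define expected letters
        unexpected = set(text) - set(letters)
        if unexpected:
            raise ValueError("Letters %s not in alphabet %r"
                             % ("".join(sorted(unexpected)), letters))

    def __str__(self):
        return self._data


dna = SimpleAlphabet("ACGT")
print(CheckedSeq("GATTACA", dna))  # GATTACA
try:
    CheckedSeq("GATTACAN", dna)
except ValueError as err:
    print(err)  # Letters N not in alphabet 'ACGT'
```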
On balance I favour making gradual changes which don't change the current
scheme (Seq with Alphabet property; SeqRecord with Seq property). Anything
more drastic might best be pursued on a new branch which could become Biopython
2.0
P.S. We should try not to implicitly assume that the elements in a sequence are
single letters. What about working with protein structures which contain
modified amino acids (with defined three-letter codes) that do not map back to
single letters?
From idoerg at gmail.com Tue Oct 2 16:00:41 2007
From: idoerg at gmail.com (Iddo Friedberg)
Date: Tue, 2 Oct 2007 09:00:41 -0700
Subject: [Biopython-dev] [BioPython] Bio.MultiProc
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
References: <46E6A845.3030601@c2b2.columbia.edu>
<6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
Message-ID:
Would it be possible to include the module, comment out the unworkable
source code and print a deprecation warning when it is imported? That way
we:
1) Don't have a clunky module BUT
2) we warn anyone who uses it (but didn't happen to read your post) that it
is deprecated when they install a new biopython version AND
3) Leave an option of fixing and commenting the code back in (i.e. it is not
lost forever).
Also, is it possible to track down the original author?
./I
--
I. Friedberg
"The only problem with troubleshooting is that
sometimes trouble shoots back."
From biopython-dev at maubp.freeserve.co.uk Tue Oct 2 16:55:53 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 02 Oct 2007 17:55:53 +0100
Subject: [Biopython-dev] Bio.MultiProc / Bio.FormatIO
In-Reply-To:
References: <46E6A845.3030601@c2b2.columbia.edu> <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
Message-ID: <47027819.1010207@maubp.freeserve.co.uk>
Iddo Friedberg wrote:
> Would it be possible to include the module, comment out the unworkable
> source code and print a deprecation warning when it is imported?
That is sort of what Michiel did - he's just added a deprecation
warning, but not touched the code itself.
This isn't an option for some of the more "integrated" bits of code like
Bio.FormatIO which I suggested removing in Bug 2361 (see also my email
to the main list on 19 September):
http://bugzilla.open-bio.org/show_bug.cgi?id=2361#c27
Peter
From rhaygood at duke.edu Tue Oct 2 23:59:43 2007
From: rhaygood at duke.edu (Ralph Haygood)
Date: Tue, 2 Oct 2007 19:59:43 -0400 (EDT)
Subject: [Biopython-dev] Statistics code
In-Reply-To: <6d941f120709291328q6a9aae97kdcf489549cc9b3f0@mail.gmail.com>
References: <6d941f120709291328q6a9aae97kdcf489549cc9b3f0@mail.gmail.com>
Message-ID:
Tiago,
Sorry to be so long replying---I've been almost drowning in work.
Use anything you find useful in my code. If you do write an article
about it, I'd be glad to be a coauthor, not just in name but actually
to help with writing the discussion of sequence statistics.
There *is* a lot of stuff in my code, not all of it generally
important. For example, few people will care about indel statistics,
beyond counting them and maybe getting the frequency distribution of
their lengths. The things most people will care about are K (the
number of polymorphic sites), Watterson's theta, pi, Tajima's D, Fu
and Li's D, Fay and Wu's H, F_ST, and McDonald--Kreitman testing.
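To make the first few of these concrete, here is a rough sketch (not Ralph's code) computing K, Watterson's theta and pi for an alignment of equal-length sequences, assuming every site is unambiguous. It uses the textbook definitions: Watterson's theta is K / a_n with a_n = 1 + 1/2 + ... + 1/(n-1) for n sequences, and pi is the mean number of differences between pairs of sequences (per sequence, not per site). The sample alignment is made up.

```python
from itertools import combinations


def segregating_sites(seqs):
    """0-based positions of polymorphic (segregating) sites in an alignment."""
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]


def watterson_theta(seqs):
    """Watterson's estimator K / a_n, with a_n = sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))
    return len(segregating_sites(seqs)) / a_n


def nucleotide_diversity(seqs):
    """pi: mean number of differences per pair of sequences (not per site)."""
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return differences / float(len(pairs))


sample = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "ACGTACGT"]
print(segregating_sites(sample))     # [2, 7], so K = 2
print(watterson_theta(sample))       # 2 / (1 + 1/2 + 1/3) = 12/11
print(nucleotide_diversity(sample))  # 7 differences over 6 pairs = 7/6
```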
As for ambiguous nucleotides, my code handles them in one of two ways,
at the programmer's option. By default, a site at which any sequence
in the alignment contains an ambiguous nucleotide is ignored; for
example,
ACRGTY
ACAGTC
is effectively equivalent to
ACGT
ACGT .
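The default mode amounts to discarding every column that contains a non-ACGT character. A minimal sketch, assuming upper-case, equal-length sequences; `drop_ambiguous_sites` is a hypothetical name, not part of Ralph's code:

```python
UNAMBIGUOUS = set("ACGT")


def drop_ambiguous_sites(seqs):
    """Remove every alignment column in which any sequence has a character
    other than A, C, G or T (so gap columns would be removed too)."""
    kept_columns = [col for col in zip(*seqs) if set(col) <= UNAMBIGUOUS]
    if not kept_columns:
        return ["" for _ in seqs]
    # Transpose the surviving columns back into one string per sequence.
    return ["".join(row) for row in zip(*kept_columns)]


print(drop_ambiguous_sites(["ACRGTY", "ACAGTC"]))  # ['ACGT', 'ACGT']
```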
However, if the 'expand_diplotypes' option is specified when the
Sample object is constructed, each sequence in the alignment is
interpreted as a diplotype and converted into a pair of pseudo-
haplotypes, two-fold ambiguous nucleotides (R, Y, W, S, M, and K)
being interpreted as heterozygous; for example,
ACRGTY
ACAGTC
is effectively equivalent to
ACAGTC
ACGGTT
ACAGTC
ACAGTC .
In expand_diplotypes mode, sites containing three- or four-fold
ambiguous nucleotides are still ignored. Also, you'll get a warning
if you request a statistic that depends on correct SNP phasing, which
most statistics don't. So far, I've found these two operating modes
sufficient for my needs.
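The expand_diplotypes conversion can be sketched the same way. This hypothetical helper (not the real Sample API) splits each two-fold code into its two bases and gives the alphabetically first one to the first pseudo-haplotype. That reproduces the example above, but the phasing is arbitrary, which is exactly why phase-dependent statistics deserve a warning.

```python
# Two-fold IUPAC codes and the two bases each stands for (alphabetical order).
TWO_FOLD = {"R": "AG", "Y": "CT", "W": "AT", "S": "CG", "M": "AC", "K": "GT"}
ALLOWED = set("ACGT") | set(TWO_FOLD)


def expand_diplotypes(seqs):
    """Split each diplotype into two pseudo-haplotypes.

    Columns containing anything other than A, C, G, T or a two-fold code
    (e.g. the three- and four-fold codes B, D, H, V, N) are dropped. The
    phase is arbitrary: the first base of each pair goes to the first
    pseudo-haplotype.
    """
    keep = [i for i, col in enumerate(zip(*seqs)) if set(col) <= ALLOWED]
    haplotypes = []
    for seq in seqs:
        first, second = [], []
        for i in keep:
            pair = TWO_FOLD.get(seq[i], seq[i] * 2)  # plain base: homozygous
            first.append(pair[0])
            second.append(pair[1])
        haplotypes.append("".join(first))
        haplotypes.append("".join(second))
    return haplotypes


print(expand_diplotypes(["ACRGTY", "ACAGTC"]))
# ['ACAGTC', 'ACGGTT', 'ACAGTC', 'ACAGTC']
```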
I think your plan sounds very reasonable, just adding sequence
statistics at a pace that's comfortable for you. Any time you have
questions, feel free to ask me, and I'll give you whatever benefit
there is in my opinion and experience.
I'm happy for all this to happen on biopython-dev, so that other
people (e.g., Alex Lancaster) can add to it. I'll leave it to the
core developers to tell us if we're too noisy. (I'd recommend still
sending messages to me with copies to biopython-dev, however, so that
I don't accidentally miss them on biopython-dev, which I don't always
read carefully.)
Ralph
On Sat, 29 Sep 2007, Tiago Antão wrote:
> Hi Ralph,
>
> Hope all is good with you. I am now finally starting to commit
> statistics code to Biopython. But before I go ahead I would like to
> ask you for some advice (plus some extra comments):
>
> About code merging and authorship:
>
> I am finally looking at your code. There is really lots of stuff
> there! Would it be OK with you if I merged your code with mine into
> Bio.PopGen.Stats? Obviously the copyright/authorship for the module
> would be co-shared as would any authorship of any article deriving
> from it...
>
> About a strategy to advance:
>
> 1. I personally don't have any experience, really, with working with
> sequence data (my background is in SNPs, microsatellites/STRs, AFLPs and
> that sort of stuff)
> 2. Starting on Monday I am beginning a PhD which will require, part
> time, sequence analysis
> 3. What I mean by 1 and 2 is that I currently don't have the maturity to
> architect and design a good framework for sequence analysis, but I will
> gain it with time.
> My plan is then to defer all sequence code until I feel I know what I
> am doing (although I was still thinking of providing something like
> BioPerl's facility for extracting all SNPs from sequences).
> If this is OK with you, I plan to start committing code the week
> starting this Monday.
>
> About request for insight:
>
> If you have any comments to offer on issues regarding representing
> indels and ambiguous data (i.e. ambiguous nucleotides), they might be
> useful, as I suppose that is the biggest issue that makes me afraid of
> sequence code.
>
>
> Finally: I would summarize our discussion here on biopython-dev (I am
> not taking it there directly just because you might not want your code
> on Biopython or might want it in other terms).
>
> Thanks,
> Tiago
>
From mdehoon at c2b2.columbia.edu Wed Oct 3 00:18:59 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 2 Oct 2007 20:18:59 -0400
Subject: [Biopython-dev] [BioPython] Bio.MultiProc
References: <46E6A845.3030601@c2b2.columbia.edu><6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62D@mail2.exch.c2b2.columbia.edu>
> Would it be possible to include the module, comment out the unworkable
> source code and print a deprecation warning when it is imported?
That is what I did.
> 3) Leave an option of fixing and commenting the code back in (i.e. it is not
> lost forever).
Even after removing the code in some future release, the code will not be
lost forever. It can always be retrieved from CVS and from older Biopython
releases.
> Also, is it possible to track down the original author?
That would be Jeff Chang.
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From tiagoantao at gmail.com Wed Oct 3 10:14:33 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 3 Oct 2007 11:14:33 +0100
Subject: [Biopython-dev] Coalescent code
Message-ID: <6d941f120710030314g73e38aa4w8c3b473eeaa18cc9@mail.gmail.com>
Hi,
I had planned to start committing statistics-related code this
weekend, but (contrary to my expectations) I have been getting requests
for the coalescent code. As such, I am planning to commit the coalescent
code instead.
It is quite straightforward code, with only one issue on which I would
like advice: some of the code (for modeling demographies) requires
some templates (very small text files, about 10 of them, each around 700
bytes) to go along with it. Where should I put the files in Biopython?
Also, on installation those files have to be put somewhere...
Tiago
--
http://www.tiago.org/ps
From biopython-dev at maubp.freeserve.co.uk Wed Oct 3 14:18:21 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 03 Oct 2007 15:18:21 +0100
Subject: [Biopython-dev] Coalescent code
In-Reply-To: <6d941f120710030314g73e38aa4w8c3b473eeaa18cc9@mail.gmail.com>
References: <6d941f120710030314g73e38aa4w8c3b473eeaa18cc9@mail.gmail.com>
Message-ID: <4703A4AD.7030008@maubp.freeserve.co.uk>
Tiago Antão wrote:
> It is quite straightforward code, with only one issue on which I would
> like advice: some of the code (for modeling demographies) requires
> some templates (very small text files, about 10 of them, each around 700
> bytes) to go along with it. Where should I put the files in Biopython?
> Also, on installation those files have to be put somewhere...
There is a similar precedent with Bio/EUtils/DTDs (where the data files
are XML DTD files). I guess you could have the 10 plain text data files
in with the python files (or under a subdirectory). Opinions?
I should really refresh myself on current python packaging guidelines...
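One common arrangement, sketched here under assumptions: keep the templates in a subdirectory of the package, declare them to distutils with the `package_data` argument of `setup()` so they are installed alongside the .py files, and locate them at runtime relative to the module's own `__file__`. The `data` directory name, the `.par` suffix and the helper names are all hypothetical.

```python
import os
import tempfile


def template_dir(module_file):
    """Directory holding data files shipped next to a module.

    `module_file` is normally the calling module's __file__; the "data"
    subdirectory name is an assumption of this sketch.
    """
    return os.path.join(os.path.dirname(os.path.abspath(module_file)), "data")


def list_templates(module_file, suffix=".par"):
    """Sorted names of the template files with the given suffix."""
    directory = template_dir(module_file)
    return sorted(name for name in os.listdir(directory) if name.endswith(suffix))


# Demonstration: a throwaway directory stands in for the installed package.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "data"))
for name in ("b.par", "a.par", "notes.txt"):
    open(os.path.join(root, "data", name), "w").close()
fake_module = os.path.join(root, "simcoal_templates.py")
print(list_templates(fake_module))  # ['a.par', 'b.par']
```

Inside the package, a module would call `list_templates(__file__)`; the matching (hypothetical) setup.py entry would be along the lines of `package_data={"Bio.PopGen.SimCoal": ["data/*.par"]}`.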
Peter
From tiagoantao at gmail.com Wed Oct 3 15:37:17 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 3 Oct 2007 16:37:17 +0100
Subject: [Biopython-dev] Statistics code
In-Reply-To:
References: <6d941f120709291328q6a9aae97kdcf489549cc9b3f0@mail.gmail.com>
Message-ID: <6d941f120710030837k1aa2d4ak7eca8e6e27e35fdd@mail.gmail.com>
Ralph,
Thanks for the detailed explanation. Because of a couple of requests I
had, I am going to commit first the coalescent code, but after the
coalescent code is in, I will pick this up.
Tiago
--
http://www.tiago.org/ps
From tiagoantao at gmail.com Wed Oct 3 16:04:07 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 3 Oct 2007 17:04:07 +0100
Subject: [Biopython-dev] Coalescent code
In-Reply-To: <4703A4AD.7030008@maubp.freeserve.co.uk>
References: <6d941f120710030314g73e38aa4w8c3b473eeaa18cc9@mail.gmail.com>
<4703A4AD.7030008@maubp.freeserve.co.uk>
Message-ID: <6d941f120710030904k70b098dcnbbc40bc3420ea831@mail.gmail.com>
Hi
On 10/3/07, Peter wrote:
> There is a similar precedent with Bio/EUtils/DTDs (where the data files
> are XML DTD files). I guess you could have the 10 plain text data files
> in with the python files (or under a subdirectory). Opinions?
In the meantime, I will start committing the code (I can easily
accommodate the details of where to put the files later, when
there is a decision).
Michiel, please, please don't include the SimCoal code I am about to
commit in the next public release.
Regards,
Tiago
From mdehoon at c2b2.columbia.edu Thu Oct 4 00:39:47 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Wed, 3 Oct 2007 20:39:47 -0400
Subject: [Biopython-dev] Coalescent code
References: <6d941f120710030314g73e38aa4w8c3b473eeaa18cc9@mail.gmail.com><4703A4AD.7030008@maubp.freeserve.co.uk>
<6d941f120710030904k70b098dcnbbc40bc3420ea831@mail.gmail.com>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62E@mail2.exch.c2b2.columbia.edu>
> Michiel, please, please don't include the SimCoal code I am about to
> commit in the next public release.
To avoid confusion, please don't commit code to CVS that you don't want to be
included in the next Biopython release.
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From bugzilla-daemon at portal.open-bio.org Thu Oct 4 02:10:13 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 3 Oct 2007 22:10:13 -0400
Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with
egenix mxTextTools 3.0
In-Reply-To:
Message-ID: <200710040210.l942ADGF030763@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2361
------- Comment #30 from mdehoon at ims.u-tokyo.ac.jp 2007-10-03 22:10 EST -------
Looking at the patch for Bio.FormatIO:
-------------------------
#Would like to have just issued a deprecation warning, and removed this
#module later. However, due to the FormatIO code in Bio/SeqRecord.py the
#deprecation warning would be triggered whenever someone used the SeqRecord.
raise ImportError("Bio.FormatIO has been removed. Please try Bio.SeqIO instead")
-------------------------
Since the patch for Bio/SeqRecord.py removes its dependence on Bio.FormatIO, is
it still necessary to raise an ImportError instead of issuing a
DeprecationWarning?
From bugzilla-daemon at portal.open-bio.org Fri Oct 5 09:44:09 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 5 Oct 2007 05:44:09 -0400
Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with
egenix mxTextTools 3.0
In-Reply-To:
Message-ID: <200710050944.l959i9BX029760@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2361
------- Comment #31 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-05 05:44 EST -------
In terms of typical usage, SeqRecord does not depend on FormatIO.
However, from a code perspective, FormatIO and SeqRecord "depend" on each
other.
If we remove the FormatIO "hooks" from SeqRecord.py (so that SeqRecord does not
depend on FormatIO), then FormatIO breaks. Rather than leaving in a broken
module, I wanted to remove it. A DeprecationWarning doesn't seem right if
FormatIO is removed, which is why I suggested an ImportError.
We might be able instead to MOVE the FormatIO hooks out of SeqRecord and then
issue a DeprecationWarning for FormatIO ... but it looks rather complicated,
and probably means tackling the Bio.config code as well.
From bugzilla-daemon at portal.open-bio.org Fri Oct 5 11:05:49 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 5 Oct 2007 07:05:49 -0400
Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with
egenix mxTextTools 3.0
In-Reply-To:
Message-ID: <200710051105.l95B5nXW001755@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2361
------- Comment #32 from mdehoon at ims.u-tokyo.ac.jp 2007-10-05 07:05 EST -------
> If we remove the FormatIO "hooks" from SeqRecord.py (so that SeqRecord does not
> depend on FormatIO), then FormatIO breaks. Rather than leaving in a broken
> module, I wanted to remove it. A DeprecationWarning doesn't seem right if
> FormatIO is removed, which is why I suggested an ImportError.
OK, I see. As far as I'm concerned, your patch is fine then.
From bugzilla-daemon at portal.open-bio.org Fri Oct 5 13:46:51 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 5 Oct 2007 09:46:51 -0400
Subject: [Biopython-dev] [Bug 2174] FDist Support in BioPython
In-Reply-To:
Message-ID: <200710051346.l95Dkpc2010074@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2174
tiagoantao at gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
------- Comment #6 from tiagoantao at gmail.com 2007-10-05 09:46 EST -------
It is implemented, documented and with test code.
From tiagoantao at gmail.com Fri Oct 5 14:26:43 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 5 Oct 2007 15:26:43 +0100
Subject: [Biopython-dev] Configuration files
Message-ID: <6d941f120710050726s4ca53349h1b8d499650e5726a@mail.gmail.com>
Hi,
Is there any (Biopython standard) way to configure Biopython during
runtime? When writing code sometimes I think it would be very
convenient (especially to the programmer using Biopython) to abstract
some configuration parameters away from the code. Things like the
location of binaries, hosts, user names (and maybe passwords) of
databases, timeout parameters, etc. These could be stored in a
configuration file (or registry entry, or whatever), thus saving users
from having to supply them in their code...
Just an idea...
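As an illustration of the idea only (Biopython had no such facility; the section and option names are hypothetical), the standard configparser module, ConfigParser in Python 2, can read INI-style settings such as binary locations and timeouts, with defaults for anything missing:

```python
import configparser

# Hypothetical contents of a per-user settings file.
EXAMPLE_CONFIG = """
[blast]
blastall = /usr/local/bin/blastall
database_dir = /data/blastdb

[network]
timeout = 30
"""


def load_settings(text):
    """Parse INI-style text and return a ConfigParser."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


settings = load_settings(EXAMPLE_CONFIG)
blastall = settings.get("blast", "blastall")
timeout = settings.getint("network", "timeout", fallback=60)
# A missing section or option falls back to a default instead of raising.
user = settings.get("database", "user", fallback="anonymous")
print(blastall, timeout, user)
```

In practice the text would come from a file, e.g. `parser.read([os.path.expanduser("~/.biopythonrc")])` (a hypothetical file name); `read` silently skips files that do not exist.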
Tiago
--
http://www.tiago.org/ps
From bugzilla-daemon at portal.open-bio.org Mon Oct 8 11:14:30 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 8 Oct 2007 07:14:30 -0400
Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with
egenix mxTextTools 3.0
In-Reply-To:
Message-ID: <200710081114.l98BEUZh019757@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2361
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #759 is|0 |1
obsolete| |
------- Comment #33 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-08 07:14 EST -------
(From update of attachment 759)
Applied these changes to CVS.
From biopython-dev at maubp.freeserve.co.uk Mon Oct 8 10:52:48 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Mon, 08 Oct 2007 11:52:48 +0100
Subject: [Biopython-dev] Configuration files
In-Reply-To: <6d941f120710050726s4ca53349h1b8d499650e5726a@mail.gmail.com>
References: <6d941f120710050726s4ca53349h1b8d499650e5726a@mail.gmail.com>
Message-ID: <470A0C00.50505@maubp.freeserve.co.uk>
Tiago Antão wrote:
> Hi,
>
> Is there any (Biopython standard) way to configure Biopython during
> runtime? When writing code sometimes I think it would be very
> convenient (especially to the programmer using Biopython) to abstract
> some configuration parameters away from the code. Things like the
> location of binaries, hosts, user names (and maybe passwords) of
> databases, timeout parameters, etc. These could be stored in a
> configuration file (or registry entry, or whatever), thus saving users
> from having to supply them in their code...
> Just an idea...
This sounds like a fairly general thing (i.e. for all of python) rather
than being Biopython specific.
For example, I find a lot of my scripts have a few if statements at the
top setting locations of files and executables based on which
user/machine I'm running on (I use both Windows and a couple of Linux
boxes with different user names).
e.g. Where are the blast executables, the blast databases, and my genome
collection, ...
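The per-machine if statements can be folded into a lookup table keyed by host name, with an environment variable allowed to override it. A sketch; the host names, paths and the `BLAST_DB_DIR` variable are made up:

```python
import os
import socket

# Hypothetical per-machine settings, keyed by host name.
MACHINE_SETTINGS = {
    "lab-linux-1": {"blast_bin": "/usr/local/blast/bin", "blast_db": "/data/blastdb"},
    "home-pc": {"blast_bin": r"C:\blast\bin", "blast_db": r"C:\blast\db"},
}
DEFAULTS = {"blast_bin": "/usr/bin", "blast_db": os.path.expanduser("~/blastdb")}


def settings_for(hostname=None, environ=None):
    """Merge the defaults, this host's entry, and an environment override."""
    if hostname is None:
        hostname = socket.gethostname()
    if environ is None:
        environ = os.environ
    settings = dict(DEFAULTS)
    settings.update(MACHINE_SETTINGS.get(hostname, {}))
    # The (hypothetical) environment variable wins over both.
    if "BLAST_DB_DIR" in environ:
        settings["blast_db"] = environ["BLAST_DB_DIR"]
    return settings


print(settings_for())
```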
Peter
From bugzilla-daemon at portal.open-bio.org Mon Oct 8 11:30:03 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 8 Oct 2007 07:30:03 -0400
Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with
egenix mxTextTools 3.0
In-Reply-To:
Message-ID: <200710081130.l98BU36u021016@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2361
------- Comment #34 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-08 07:30 EST -------
Recap, most of the issues were resolved by switching Bio.Fasta from Martel to
pure python. Additionally:
test_Fasta - 'fixed' by deprecating the Mindy indexing functions
test_KEGG - fixed by switching from Martel to pure python
test_format_registry - 'fixed' by removing FormatIO
test_geo - fixed by switching from Martel to pure python
test_GenBankFormat - this entire test is for the little-used Martel GenBank
expression, and this works with mxTextTools 2.0 but fails with mxTextTools 3.0
From mdehoon at c2b2.columbia.edu Tue Oct 9 04:34:28 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 9 Oct 2007 00:34:28 -0400
Subject: [Biopython-dev] Output of Biopython tests
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu>
Hi everybody,
With the help of several Biopython developers, especially Peter, the problems
with Martel and the new mxTextTools release have now been solved (in the
sense that all unit tests now succeed). So we're a lot closer to a new
Biopython release. Thanks everybody!
When I was running the Biopython tests, one thing bothered me though. All
Biopython tests now have a corresponding output file that contains the output
the test should generate if it runs correctly. For some tests, this makes
perfect sense, particularly if the output is large. For others, on the other
hand, having the test output explicitly in a file doesn't actually add much
information. For example, the output for test_psw is
test_psw
test_AlignmentColumn_assertions (test_psw.TestPSW) ... ok
test_AlignmentColumn_full (test_psw.TestPSW) ... ok
test_AlignmentColumn_kinds (test_psw.TestPSW) ... ok
test_AlignmentColumn_repr (test_psw.TestPSW) ... ok
test_Alignment_assertions (test_psw.TestPSW) ... ok
test_Alignment_normal (test_psw.TestPSW) ... ok
test_ColumnUnit (test_psw.TestPSW) ... ok
Doctest: Bio.Wise.psw.parse_line ... ok
----------------------------------------------------------------------
Ran 8 tests in 0.002s
OK
For comparison, this is the test output if test_psw.py fails:
test_AlignmentColumn_assertions (__main__.TestPSW) ... ok
test_AlignmentColumn_full (__main__.TestPSW) ... ok
test_AlignmentColumn_kinds (__main__.TestPSW) ... FAIL
test_AlignmentColumn_repr (__main__.TestPSW) ... ok
test_Alignment_assertions (__main__.TestPSW) ... ok
test_Alignment_normal (__main__.TestPSW) ... ok
test_ColumnUnit (__main__.TestPSW) ... ok
Doctest: Bio.Wise.psw.parse_line ... ok
======================================================================
FAIL: test_AlignmentColumn_kinds (__main__.TestPSW)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test_psw.py", line 47, in test_AlignmentColumn_kinds
self.assertEqual(ac.kind,
"some_funny_output_I_made_up_instead_of_INSERT")
AssertionError: 'INSERT' != 'some_funny_output_I_made_up_instead_of_INSERT'
----------------------------------------------------------------------
Ran 8 tests in 0.000s
The point is that for this test, having the output explicitly is not needed
in order to identify the problem.
Now, for some tests having the output explicitly actually causes a problem.
I'm thinking about those unit tests that only run if some particular software
is installed on the system (for example, SQL). In those cases, we need to
distinguish failure due to missing software from a true failure (the former
may not bother the user much if he's not interested in that particular part
of Biopython). If a test cannot be run because of missing prerequisites,
currently a unit test generates an ImportError, which is then caught inside
run_tests. Hence, we get the following output when running the Biopython
tests:
test_BioSQL ... Skipping test because of import error: Skipping BioSQL tests -- enable tests in Tests/test_BioSQL.py
ok
When you look inside test_BioSQL.py, you'll see that the actual error is not
an ImportError. In addition, if a true ImportError occurs during the test,
the test will inadvertently be treated as skipped.
My solution would be to skip tests inside test_BioSQL if the prerequisites
are not met. However, in that case the test output no longer agrees with the
expected test output, generating a failure message.
I'd therefore like to suggest the following:
1) Keep the test output, but let each test_* script (instead of run_tests.py)
be responsible for comparing the test output with the expected output.
2) If the expected output is trivial, simply use the assert statements to
verify the test output instead of storing them in a file and reading them
from there.
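A minimal sketch of the two proposals, assuming Python's unittest module: the test script captures what the code under test prints and compares it with the expected text itself (proposal 1), while a trivial result is checked with a plain assertion (proposal 2). The function describe_codons and the expected text are hypothetical, made up for illustration; this is not how run_tests.py works.

```python
import io
import unittest
from contextlib import redirect_stdout


def describe_codons(seq):
    """Hypothetical code under test: print the offset and text of each codon."""
    for i in range(0, len(seq) - 2, 3):
        print(i, seq[i:i + 3])


# Proposal 1: the test script holds the expected output itself (it could
# equally read it from a file) and does the comparison, not run_tests.py.
EXPECTED_OUTPUT = "0 ATG\n3 TCA\n"


class TestOutput(unittest.TestCase):
    def test_printed_output(self):
        buffer = io.StringIO()
        with redirect_stdout(buffer):
            describe_codons("ATGTCA")
        self.assertEqual(buffer.getvalue(), EXPECTED_OUTPUT)

    # Proposal 2: a trivial result needs no stored output at all.
    def test_codon_count(self):
        self.assertEqual(len("ATGTCA") // 3, 2)
```

Such a script can be run on its own with "python -m unittest", so the comparison logic travels with the test rather than living in the central runner.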
Any objections?
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From mhobbs_of_lawson at bigpond.com Tue Oct 9 02:18:39 2007
From: mhobbs_of_lawson at bigpond.com (mhobbs_of_lawson)
Date: Tue, 9 Oct 2007 12:18:39 +1000
Subject: [Biopython-dev] translate
Message-ID: <5496247.1191896319102.JavaMail.root@web06sl>
Hi,
Please can someone tell me what is wrong here. I simply want to be able to translate ambiguous DNA which includes an 'NNN' triplet.
Thanks,
Matthew
>>> from Bio import Seq
>>> from Bio.Alphabet import IUPAC
>>> from Bio import Translate
>>> s = "NNNTCAAAAAGGTGCATCTAGATG"
>>> dna = Seq.Seq(s, IUPAC.ambiguous_dna)
>>> trans = Translate.ambiguous_dna_by_id[1]
>>> print trans.translate(dna)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
File "/cygdrive/c/Python24/Lib/site-packages/Bio/Translate.py", line 20, in translate
append(get(s[i:i+3], stop_symbol))
File "/cygdrive/c/Python24/Lib/site-packages/Bio/Data/CodonTable.py", line 544, in get
return self.__getitem__(codon)
File "/cygdrive/c/Python24/Lib/site-packages/Bio/Data/CodonTable.py", line 577, in __getitem__
raise TranslationError, codon # does not code
Bio.Data.CodonTable.TranslationError: NNN
From biopython-dev at maubp.freeserve.co.uk Tue Oct 9 11:54:29 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 09 Oct 2007 12:54:29 +0100
Subject: [Biopython-dev] translate
In-Reply-To: <5496247.1191896319102.JavaMail.root@web06sl>
References: <5496247.1191896319102.JavaMail.root@web06sl>
Message-ID: <470B6BF5.607@maubp.freeserve.co.uk>
mhobbs_of_lawson wrote:
> Hi,
>
> Please can someone tell me what is wrong here. I simply want to be able to translate ambiguous DNA which includes an 'NNN' triplet.
A very reasonable request. I assume you expect just an X for an NNN codon?
I have the general impression that some of Biopython's handling of
ambiguous sequences isn't all wonderful... something I have started to
tackle in bug 2366:
http://bugzilla.open-bio.org/show_bug.cgi?id=2366
Obviously sequence manipulation is a core bit of functionality - and I
would like at least one other person to comment on that code before I
risk committing it ;)
Translation of ambiguous codons would be next on my hit list... as right
now it doesn't seem to do what I would expect at all.
In the short term, manually adding additional mappings to the forward
table (a python dictionary) would probably "fix" your specific issue.
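A standalone sketch of a more general fix, using only the standard library rather than Bio.Translate: an ambiguous codon is expanded into every unambiguous codon it could stand for, and it translates to a single symbol only if all of them agree; otherwise it becomes "X". The table is the NCBI standard code (table 1); the function names are hypothetical. Unlike adding 'NNN' to the forward table, this also handles codons such as GGN.

```python
from itertools import product

BASES = "TCAG"
# NCBI standard code (table 1), codons ordered with the first base varying
# slowest and each position running through T, C, A, G.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes and the bases each one may represent.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon, table=STANDARD_TABLE):
    """Translate one codon, resolving IUPAC ambiguity codes.

    Returns the shared symbol if every possible unambiguous codon encodes
    the same residue (or stop), otherwise "X".
    """
    codon = codon.upper()
    if codon in table:
        return table[codon]
    options = {table["".join(c)] for c in product(*(IUPAC_DNA[b] for b in codon))}
    return options.pop() if len(options) == 1 else "X"


def translate(seq, table=STANDARD_TABLE):
    """Translate every complete codon in seq."""
    return "".join(translate_codon(seq[i:i + 3], table)
                   for i in range(0, len(seq) - 2, 3))
```

With this sketch, translate("NNNTCAAAAAGGTGCATCTAGATG") gives "XSKRCI*M". A codon such as TAN, which could be Tyr or a stop, also falls back to "X" here; that simply reuses "X" rather than adopting any dedicated symbol.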
While we are on this topic, we use "*" for stop codons and "X" for an
ambiguous amino acid - but is anyone aware of a character convention for
something that might be either a stop codon or an amino acid? (other
than just using "X" for this too)?
Peter
From biopython-dev at maubp.freeserve.co.uk Tue Oct 9 11:44:01 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 09 Oct 2007 12:44:01 +0100
Subject: [Biopython-dev] Output of Biopython tests
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu>
Message-ID: <470B6981.3020707@maubp.freeserve.co.uk>
Michiel De Hoon wrote:
> When I was running the Biopython tests, one thing bothered me though.
> All Biopython tests now have a corresponding output file that
> contains the output the test should generate if it runs correctly.
> For some tests, this makes perfect sense, particularly if the output
> is large. For others, on the other hand, having the test output
> explicitly in a file doesn't actually add much information.
Is this actually a problem? It gives us a simple unified test framework
where developers can use whatever fancy test frameworks they want to.
Personally I have tried to write simple scripts with meaningful output
(plus often additional assertions). I think that because these are very
simple, they can double as examples/documentation for the curious.
My personal view is that some of the "fancy frameworks" used in some
test cases are very intimidating to a beginner (and act as a barrier to
taking the code and modifying it for their own use).
> The point is that for this test, having the output explicitly is not
> needed in order to identify the problem.
True. I would have written that particular test to give some meaningful
output; I find it makes it easier to start debugging why a test fails.
> Now, for some tests having the output explicitly actually causes a
> problem. I'm thinking about those unit tests that only run if some
> particular software is installed on the system (for example, SQL). In
> those cases, we need to distinguish failure due to missing software
> from a true failure (the former may not bother the user much if he's
> not interested in that particular part of Biopython). If a test
> cannot be run because of missing prerequisites, currently a unit test
> generates an ImportError, which is then caught inside run_tests.
> ...
> When you look inside test_BioSQL.py, you'll see that the actual error
> is not an ImportError. In addition, if a true ImportError occurs
> during the test, the test will inadvertently be treated as skipped.
Perhaps we should introduce a MissingExternalDependency error instead,
used for this specific case, and catch that in run_tests.py, while
treating ImportError as a real error.
As you say, if we have done some dramatic restructuring (such as
removing a module) there could be some REAL ImportErrors which we might
risk ignoring.
> I'd therefore like to suggest the following:
> 1) Keep the test output, but let each test_* script (instead of
> run_tests.py) be responsible for comparing the test output with the
> expected output.
I'm not keen on that - it means duplication of code (or at least some
common functionality to call) and makes writing simple tests that little
bit harder. I like the fact that the more verbose test scripts can be
run on their own as an example of what the module can do.
> 2) If the expected output is trivial, simply use the assert
> statements to verify the test output instead of storing them in a
> file and reading them from there.
By all means, test trivial output with assertions. I already do this
within many of my "verbose" tests where I want to keep the console
output reasonably short.
Peter
From tiagoantao at gmail.com Tue Oct 9 14:27:18 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 9 Oct 2007 15:27:18 +0100
Subject: [Biopython-dev] Configuration files
In-Reply-To: <470A0C00.50505@maubp.freeserve.co.uk>
References: <6d941f120710050726s4ca53349h1b8d499650e5726a@mail.gmail.com>
<470A0C00.50505@maubp.freeserve.co.uk>
Message-ID: <6d941f120710090727m787c08abn13665c662727446c@mail.gmail.com>
Would it be interesting to have something like
config = Bio.Config.getConfig()
fdist_path = config['PopGen.FDistDir']
Something that:
1. Would allow for a standard configuration mechanism (as opposed to
having different styles for each module/author)
2. Would abstract away how the configuration is stored (registry, conf
file, ...)
If there was an agreement on doing this (or something along these
lines), I would volunteer the time to do it.
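Bio.Config.getConfig() is only a proposal here; the following is a hypothetical sketch of what it might look like if backed by an INI-style file read with the standard configparser module. The file location (~/.biopython.ini) and the "Section.option" key layout are assumptions chosen to match config['PopGen.FDistDir']; other backends, such as the Windows registry, would need their own reader behind the same function.

```python
import configparser
import os

# Assumed location of a per-user configuration file (hypothetical).
DEFAULT_CONFIG_PATH = os.path.join(os.path.expanduser("~"), ".biopython.ini")


def get_config(path=DEFAULT_CONFIG_PATH):
    """Return a flat dict mapping "Section.option" to its string value.

    A missing file simply yields an empty dict, so callers can fall back
    to their own defaults.
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names such as FDistDir case-sensitive
    parser.read(path)  # ConfigParser.read silently skips files that do not exist
    return {
        f"{section}.{option}": value
        for section in parser.sections()
        for option, value in parser.items(section)
    }

# Example file contents:
#   [PopGen]
#   FDistDir = /opt/fdist
# config = get_config()
# fdist_path = config.get("PopGen.FDistDir")
```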
On 10/8/07, Peter wrote:
> Tiago Antão wrote:
> > Hi,
> >
> > Is there any (Biopython standard) way to configure Biopython during
> > runtime? When writing code sometimes I think it would be very
> > convenient (especially to the programmer using Biopython) to abstract
> > some configuration parameters away from the code. Things like the
> > location of binaries, hosts, user names (and maybe passwords) of
> > databases, timeout parameters, etc. These could be stored in a
> > configuration file (or registry entry, or whatever), sparing users
> > from having to supply them in their code...
> > Just an idea...
>
> This sounds like a fairly general thing (i.e. for all of python) rather
> than being Biopython specific.
>
> For example, I find a lot of my scripts have a few if statements at the
> top setting locations of files and executables based on which
> user/machine I'm running on (I use both Windows and a couple of Linux
> boxes with different user names).
>
> e.g. Where are the blast executables, the blast databases, and my genome
> collection, ...
>
> Peter
>
>
--
http://www.tiago.org/ps
From mhobbs_of_lawson at bigpond.com Tue Oct 9 23:07:43 2007
From: mhobbs_of_lawson at bigpond.com (Matthew Hobbs)
Date: Wed, 10 Oct 2007 09:07:43 +1000
Subject: [Biopython-dev] translate
In-Reply-To: <470B6BF5.607@maubp.freeserve.co.uk>
References: <5496247.1191896319102.JavaMail.root@web06sl>
<470B6BF5.607@maubp.freeserve.co.uk>
Message-ID: <470C09BF.8050906@bigpond.com>
Thanks Peter for your reply.
Peter wrote:
> mhobbs_of_lawson wrote:
>> Please can someone tell me what is wrong here. I simply want to be
>> able to translate ambiguous DNA which includes an 'NNN' triplet.
>
> A very reasonable request. I assume you expect just an X for an NNN codon?
yep
> In the short term, manually adding additional mappings to the forward
> table (a python dictionary) would probably "fix" your specific issue.
OK - so this works:
from Bio import Seq
from Bio.Alphabet import IUPAC
from Bio import Translate
s = "NNNTCAAAAAGGTGCATCTAGATG"
dna = Seq.Seq(s, IUPAC.ambiguous_dna)
trans = Translate.ambiguous_dna_by_id[1]
trans.table.forward_table.forward_table['NNN'] = 'X'
print trans.translate(dna)
> While we are on this topic, we use "*" for stop codons and "X" for an
> ambiguous amino acid - but is anyone aware of a character convention for
> something that might be either a stop codon or an amino acid? (other
> than just using "X" for this too)?
No, I don't know.
Thanks,
Matthew
From mdehoon at c2b2.columbia.edu Thu Oct 11 10:31:59 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Thu, 11 Oct 2007 06:31:59 -0400
Subject: [Biopython-dev] Output of Biopython tests
References: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu>
<470B6981.3020707@maubp.freeserve.co.uk>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B636@mail2.exch.c2b2.columbia.edu>
> Perhaps we should introduce a MissingExternalDependency error instead,
> used for this specific case, and catch that in run_tests.py, while
> treating ImportError as a real error.
OK. I added a MissingExternalDependencyError exception to Bio/__init__.py,
and modified BioSQL, Bio.GFF, and some test scripts accordingly. When
MissingExternalDependencyError occurs in a test, a warning is printed but it
is not counted as a failure.
--Michiel.
From mdehoon at c2b2.columbia.edu Thu Oct 11 10:44:56 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Thu, 11 Oct 2007 06:44:56 -0400
Subject: [Biopython-dev] function enumerate in Bio/GFF/GenericTools.py;
Bio/DocSQL.py
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B637@mail2.exch.c2b2.columbia.edu>
Do we still need the function "enumerate" in Bio/GFF/GenericTools.py and
Bio/DocSQL.py?
AFAICT, this function does exactly the same as the Python built-in enumerate
function.
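A quick way to check whether a hand-rolled helper is redundant is to compare it with the built-in, which has been available since Python 2.3. The legacy_enumerate function below is a hypothetical stand-in; the actual helpers in Bio/GFF/GenericTools.py and Bio/DocSQL.py may differ in detail.

```python
def legacy_enumerate(sequence):
    """Hypothetical pre-2.3 style helper: pair each item with its index."""
    return zip(range(len(sequence)), sequence)


features = ["gene", "CDS", "mRNA"]

# Both produce the same (index, item) pairs in the same order.
assert list(legacy_enumerate(features)) == list(enumerate(features))

for index, feature in enumerate(features):
    print(index, feature)
```

One difference worth checking before removing such a helper: a len()-based version only works on sequences, while the built-in accepts any iterable.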
--Michiel.
From biopython-dev at maubp.freeserve.co.uk Thu Oct 11 20:44:46 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Thu, 11 Oct 2007 21:44:46 +0100
Subject: [Biopython-dev] Revised tutorial
Message-ID: <470E8B3E.6080709@maubp.freeserve.co.uk>
In anticipation of the next release, I've done some more work on the
tutorial today -- in particular the section on the Seq object which I
have turned into a new chapter.
If anyone has the time to go over this soon that would be great. I'll be
away tomorrow (Friday) but will probably have time to make any revisions
needed at the weekend.
It's here in CVS:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/Tutorial.tex?cvsroot=biopython
This is a LaTeX file which gets turned into the PDF and HTML versions of
the tutorial using pdflatex and hevea. If you want to proof read but
don't know anything about LaTeX then I can probably email you the PDF
version for comment (half a megabyte).
Peter
From sbassi at gmail.com Thu Oct 11 22:48:39 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Thu, 11 Oct 2007 19:48:39 -0300
Subject: [Biopython-dev] Revised tutorial
In-Reply-To: <470E8B3E.6080709@maubp.freeserve.co.uk>
References: <470E8B3E.6080709@maubp.freeserve.co.uk>
Message-ID:
Hello,
I can't resolve all the dependencies to install hevea so I can't
generate the dvi from the tex file. Could you please send me by email
the final PDF?
Best,
SB.
--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318
From mdehoon at c2b2.columbia.edu Fri Oct 12 01:53:19 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Thu, 11 Oct 2007 21:53:19 -0400
Subject: [Biopython-dev] Output of Biopython tests
References: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu> <470B6981.3020707@maubp.freeserve.co.uk>
<6243BAA9F5E0D24DA41B27997D1FD14402B636@mail2.exch.c2b2.columbia.edu>
<470E3E7E.1000301@maubp.freeserve.co.uk>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B638@mail2.exch.c2b2.columbia.edu>
Peter wrote:
> Michiel De Hoon wrote:
> > OK. I added a MissingExternalDependencyError exception to Bio/__init__.py,
> > and modified BioSQL, Bio.GFF, and some test scripts accordingly. When
> > MissingExternalDependencyError occurs in a test, a warning is printed but it
> > is not counted as a failure.
>
> I might have defined the exception within the test framework rather than
> Bio/__init__.py, but now that it's there we can start to use it in things
> like modules that wrap external tools.
That is why I put it in Bio/__init__.py; Bio/GFF/__init__.py is already using
this exception (outside of the testing framework).
> I've updated Tests/requires_internet.py and Tests/requires_wise.py to
> match (I don't have wise on my machine which is why I noticed it still
> threw an ImportError).
Thanks! I missed those.
> Is there anything I can do to help get things ready for the release of
> Biopython 1.44?
At some point, somebody will need to go through the documentation to check if
everything documented there still works with the Biopython in CVS, and to
remove sections in the documentation describing deprecated code. But it's
probably better to wait until after we decide what to do with
test_GenBankFormat.
> If you do have time to give the patch on bug 2366 a check, I think it
> would be worth including before the next release.
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2366
No time to check it. But I'd be happy to rely on your judgement and include
it.
--Michiel.
From bugzilla-daemon at portal.open-bio.org Fri Oct 12 02:32:05 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 11 Oct 2007 22:32:05 -0400
Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with
egenix mxTextTools 3.0
In-Reply-To:
Message-ID: <200710120232.l9C2W5e9022504@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2361
------- Comment #35 from mdehoon at ims.u-tokyo.ac.jp 2007-10-11 22:32 EST -------
> test_GenBankFormat - this entire test is for the little-used Martel GenBank
> expression, and this works with mxTextTools 2.0 but fails with mxTextTools 3.0
If it's little-used, should we include it for the next release or can it be
removed? If we remove the test, should we then also remove the corresponding
module?
From biopython-dev at maubp.freeserve.co.uk Thu Oct 11 20:37:52 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Thu, 11 Oct 2007 21:37:52 +0100
Subject: [Biopython-dev] Output of Biopython tests
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B636@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B634@mail2.exch.c2b2.columbia.edu> <470B6981.3020707@maubp.freeserve.co.uk>
<6243BAA9F5E0D24DA41B27997D1FD14402B636@mail2.exch.c2b2.columbia.edu>
Message-ID: <470E89A0.1010502@maubp.freeserve.co.uk>
Michiel De Hoon wrote:
>> Perhaps we should introduce a MissingExternalDependency error instead,
>> used for this specific case, and catch that in run_tests.py, while
>> treating ImportError as a real error.
>
> OK. I added a MissingExternalDependencyError exception to Bio/__init__.py,
> and modified BioSQL, Bio.GFF, and some test scripts accordingly. When
> MissingExternalDependencyError occurs in a test, a warning is printed but it
> is not counted as a failure.
I might have defined the exception within the test framework rather than
Bio/__init__.py, but now that it's there we can start to use it in things
like modules that wrap external tools.
I've updated Tests/requires_internet.py and Tests/requires_wise.py to
match (I don't have wise on my machine which is why I noticed it still
threw an ImportError).
This means run_tests.py now runs without errors using CVS on my 64 bit
Linux machine (bar the mxTextTools 3.0 issue with test_GenBankFormat.py,
bug 2361).
Is there anything I can do to help get things ready for the release of
Biopython 1.44?
If you do have time to give the patch on bug 2366 a check, I think it
would be worth including before the next release.
http://bugzilla.open-bio.org/show_bug.cgi?id=2366
Peter
From fennan at gmail.com Mon Oct 15 09:48:45 2007
From: fennan at gmail.com (Fernando)
Date: Mon, 15 Oct 2007 11:48:45 +0200
Subject: [Biopython-dev] Database into variables
Message-ID: <7b13e61d0710150248v72a550d6h38e1467edf5073eb@mail.gmail.com>
Hi everybody,
I am thinking of including some algorithms that I work with in Biopython.
My first concern is that I'm using a local image of the Gene Ontology
database to perform several operations. To avoid these database
accesses, I could precompute the information I need and load it when the
module is imported. How should I do it? Is there a style guideline for loading
external variables or something like that? Any other ideas/suggestions?
Thanks
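One possible approach, sketched with the standard pickle module: a separate script extracts the needed information from the local Gene Ontology database once, and the module loads the stored result the first time it is needed. The function names, the file layout and the sample term are assumptions for illustration, not an existing Biopython guideline.

```python
import pickle

# Loaded data, keyed by file path, so each file is read at most once.
_cache = {}


def save_precomputed(data, path):
    """Run once, offline: store information extracted from the database."""
    with open(path, "wb") as handle:
        pickle.dump(data, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_precomputed(path):
    """Load the stored data on first use and reuse it on later calls."""
    if path not in _cache:
        with open(path, "rb") as handle:
            _cache[path] = pickle.load(handle)
    return _cache[path]
```

Loading lazily on first use, rather than at import time, avoids slowing down every import of the package. A plain text format would be more portable across Python versions than a pickle, and pickles should never be loaded from untrusted sources.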
From fennan at gmail.com Mon Oct 15 10:28:56 2007
From: fennan at gmail.com (Fernando)
Date: Mon, 15 Oct 2007 12:28:56 +0200
Subject: [Biopython-dev] Precompute database information
Message-ID: <7b13e61d0710150328l354bfb5eu1b76ed05024a65c4@mail.gmail.com>
Hi everybody,
I am thinking of including some algorithms that I work with in Biopython.
My first concern is that I'm using a local image of the Gene Ontology
database to perform several operations. To avoid these database
accesses, I could precompute the information I need and load it when the
module is imported. How should I do it? Is there a style guideline for loading
external variables or something like that? Any other ideas/suggestions?
Thanks
From bugzilla-daemon at portal.open-bio.org Mon Oct 15 11:11:26 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 15 Oct 2007 07:11:26 -0400
Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in
(Reverse)complement functions in Bio.Seq
In-Reply-To:
Message-ID: <200710151111.l9FBBQOE012625@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2366
tiagoantao at gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |tiagoantao at gmail.com
------- Comment #3 from tiagoantao at gmail.com 2007-10-15 07:11 EST -------
I had a look at the test code and tried to find which test case is changing the
ambiguous_dna dict.
I used this little script (putting it here as it might be useful for detecting
these types of problems):
for i in test_*py; do
python run_tests.py $i;
done
It turns out that it is test_Nexus.py. Further inspection of the code seems
to reveal that it is not the test case that pollutes the dictionary but the
Nexus module itself.
Maybe it makes sense to raise a bug on the Nexus module... Any comments on
these findings?
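The shell loop runs each test in isolation. An in-process alternative, sketched below under the assumption that each test can be called as a plain function, takes a deep copy of the shared dictionary before each test and reports any test that leaves it changed. The table contents and test names are hypothetical.

```python
import copy


def find_polluters(tests, shared):
    """Return the names of tests that leave the `shared` dict modified.

    `tests` maps a test name to a zero-argument callable. After a polluting
    test, `shared` is restored in place so later tests start clean.
    """
    polluters = []
    for name, test in tests.items():
        snapshot = copy.deepcopy(shared)
        test()
        if shared != snapshot:
            polluters.append(name)
            shared.clear()
            shared.update(snapshot)
    return polluters


# Hypothetical example: one test quietly adds '?' to a shared table.
dna_values = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "GATC"}
tests = {
    "test_seq": lambda: None,
    "test_nexus": lambda: dna_values.update({"?": "GATC-"}),
}
print(find_polluters(tests, dna_values))
```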
From bugzilla-daemon at portal.open-bio.org Mon Oct 15 14:16:00 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 15 Oct 2007 10:16:00 -0400
Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in
(Reverse)complement functions in Bio.Seq
In-Reply-To:
Message-ID: <200710151416.l9FEG01A023797@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2366
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-15 10:16 EST -------
Thanks for that Tiago,
I guess we should file a bug on Bio.Nexus on the alphabet issue; it may be that
it should create a copy or subclass of the ambiguous DNA alphabet in order to
include "?" (I imagine that Nexus uses this rather than "N"), and see if it is
using the Gapped() alphabet system or not.
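A sketch of the "copy rather than modify" fix, using plain dictionaries instead of Biopython's alphabet classes. The table below is a hypothetical stand-in for the shared ambiguous DNA values, and treating "?" as "any base or a gap" is an assumption about the Nexus convention.

```python
# Hypothetical stand-in for a shared, module-level table of ambiguity codes.
ambiguous_dna_values = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "GATC"}


def make_nexus_dna_values(base=ambiguous_dna_values):
    """Return a copy of `base` extended with the Nexus '?' symbol.

    The shared table is copied, never modified, so other modules keep
    seeing the original alphabet.
    """
    values = dict(base)
    # Assumption: '?' (missing data) may stand for any base or a gap.
    values["?"] = base["N"] + "-"
    return values


nexus_dna_values = make_nexus_dna_values()
```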
Did you have any comments on this patch for (reverse) complements?
From jflatow at northwestern.edu Tue Oct 16 00:08:13 2007
From: jflatow at northwestern.edu (Jared Flatow)
Date: Mon, 15 Oct 2007 19:08:13 -0500
Subject: [Biopython-dev] Biopython status
Message-ID: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
Hi all,
I've just started using Biopython and I am wondering about the status
of the group, since I've heard rumors that it's dying. So far I have
found the library very useful, if at times frustrating, though I
will admit I am fairly new to developing in Python as well. I have been
hesitant to make changes to existing code; however, I have found that
in a few cases it has been by far the best way to accomplish what I
need, and I have only done so in cases where it seems to be the *right*
thing to do.
With that in mind, I have a few questions I was hoping you all could
answer. First, how might I put these changes up for review in order
to contribute back to the code base? The main changes have been to
the AlignAce parser, since as it was it just ignored information
contained in the alignace file regarding the motif instances (namely
which input sequence they came from, where they started in the
sequence, and what strand they were on). I have also needed to create
a modified FASTA parser so that I can read things like quality score
files. I would be happy to submit the changes to the group or an
individual for inspection, but I would like to avoid having to
maintain my own separate version of Biopython if possible.
I am also wondering how it would be received if I did something like
add a to_fasta method to SeqRecord instead of having to go through
writing it to a file using a SeqIO when all I want is the string.
Finally, are there plans to move to a subversion repository at any
point?
Thanks!
Jared Flatow
From sbassi at gmail.com Tue Oct 16 05:09:16 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Tue, 16 Oct 2007 02:09:16 -0300
Subject: [Biopython-dev] Biopython status
In-Reply-To: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
Message-ID:
On 10/15/07, Jared Flatow wrote:
> I've just started using Biopython and I am wondering about the status
> of the group, since I've heard rumors that it's dying. So far I have
You could subscribe to the RSS feed of the CVS and you will see a lot
of activity. The developers list and the bug tracker (Bugzilla) are
also pretty busy; that doesn't look like a dying group to
me :)
From mdehoon at c2b2.columbia.edu Tue Oct 16 05:37:14 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 16 Oct 2007 01:37:14 -0400
Subject: [Biopython-dev] Biopython status
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B639@mail2.exch.c2b2.columbia.edu>
Hi Jared,
> I've just started using Biopython and I am wondering about the status
> of the group, since I've heard rumors that it's dying.
From looking at the activity on the Biopython mailing lists in recent months,
it doesn't seem to be dying :-).
> So far I have found the library very useful, if not at times frustrating,
> though I will admit I am fairly new to developing in Python as well.
One thing to keep in mind is that Biopython started about eight years ago,
and some approaches that seemed to be a good idea at that time may not seem
to be so now. Nevertheless, I feel that Biopython is moving in the right
direction in terms of ease-of-use.
> First, how might I put these changes up for review in order
> to contribute back to the code base? The main changes have been to
> the AlignAce parser, since as it was it just ignored information
> contained in the alignace file regarding the motif instances (namely
> which input sequence they came from, where they started in the
> sequence, and what strand they were on).
In this case, it is a good idea to contact the current maintainer of
Bio.AlignAce, either via the mailing list or directly. From the Biopython
CVS, it seems that Bartek is currently the main maintainer of Bio.AlignAce,
so it would be a good idea to discuss with him.
> I have also needed to create a modified FASTA parser so that I
> can read things like quality score files.
At some point, Biopython had several (two or three?) Fasta parsers, two Fasta
formats, etc. This is a situation we should definitely avoid. So if your
modifications fit in well with the existing Fasta parser in Bio.SeqIO, it may
very well be accepted into Biopython. Otherwise, it's better to leave it out.
This is just my opinion though.
> I am also wondering how it would be received if I did something like
> add a to_fasta method to SeqRecord instead of having to go through
> writing it to a file using a SeqIO when all I want is the string.
This sounds like feature creep to me, so I would be against it. It's easy to
add code to Biopython, it's much harder to remove stuff. Code bloat is a real
problem in Biopython.
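Without adding a method to SeqRecord, the string can be obtained by writing to an in-memory file-like object instead of a real file. The write_fasta function below is a hypothetical, stdlib-only stand-in for a FASTA writer (records are plain (id, description, sequence) tuples, and the 60-column wrapping is an assumption); the same idea applies to any writer that accepts a handle, by passing a StringIO object.

```python
import io


def write_fasta(records, handle, width=60):
    """Write (id, description, sequence) tuples to any file-like handle."""
    for record_id, description, sequence in records:
        handle.write(f">{record_id} {description}".rstrip() + "\n")
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")


# Writing to an in-memory handle yields the FASTA text as a string,
# with no temporary file and no extra method on the record class.
buffer = io.StringIO()
write_fasta([("seq1", "example", "ATGC" * 20)], buffer)
fasta_text = buffer.getvalue()
```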
> Finally, are there plans to move to a subversion repository at any
> point?
There were some plans at some point, but I don't know the current status.
Best,
--Michiel.
From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 08:16:01 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 16 Oct 2007 09:16:01 +0100
Subject: [Biopython-dev] Biopython status
In-Reply-To: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
Message-ID: <47147341.4020708@maubp.freeserve.co.uk>
Jared Flatow wrote:
> I have also needed to create a modified FASTA parser so that I can
> read things like quality score files.
Could you be a little more specific - what exactly do you mean by
quality score files (links and/or examples)? It may be that this
warrants setting up a new file format in Bio.SeqIO.
> I would be happy to submit the changes to the group or an individual
> for inspection, but I would like to avoid having to maintain my own
> separate version of Biopython if possible.
As has already been said - please file some (enhancement) bugs and
attach your patches, or raise specific issues for discussion on this
mailing list.
Depending on the nature of your changes, you might be able to achieve
some of them by subclassing Biopython's objects - rather than literally
maintaining your own branch of the project.
> I am also wondering how it would be received if I did something like
> add a to_fasta method to SeqRecord instead of having to go through
> writing it to a file using a SeqIO when all I want is the string.
Out of interest, why do you want to create a FASTA record as a string?
Did you know you can write to a string in any Bio.SeqIO-supported
file format using StringIO? Perhaps we should spell this out more
explicitly in the documentation, but a motivating example would help.
I would suggest rather than adding a to_fasta method to the SeqRecord,
simply write your own "seqrecord_to_string" function (or create a
subclass of SeqRecord with this method).
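A minimal sketch of such a "seqrecord_to_string" helper, in plain Python so it runs without Biopython. It writes into an in-memory `StringIO` handle rather than a file, which is the same trick as handing a `StringIO` handle to `Bio.SeqIO.write`. The argument list (plain strings instead of a SeqRecord) and the 60-column line width are assumptions made for illustration.

```python
from io import StringIO


def seqrecord_to_string(record_id, sequence, description="", width=60):
    """Return one FASTA record as a string, built in memory (no temp file).

    Hypothetical helper: takes plain strings instead of a SeqRecord so the
    sketch is self-contained; the 60-column wrap is an assumed default.
    """
    handle = StringIO()
    title = record_id + " " + description if description else record_id
    handle.write(">" + title + "\n")
    # Wrap the sequence into fixed-width lines, as FASTA writers usually do.
    for start in range(0, len(sequence), width):
        handle.write(sequence[start:start + width] + "\n")
    return handle.getvalue()
```

With a real SeqRecord the equivalent would be `SeqIO.write([record], handle, "fasta")` on a `StringIO` handle, followed by `handle.getvalue()`.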
> Finally, are there plans to move to a subversion repository at any
> point?
It was raised a while ago, and our cunning plan was to let BioPerl try
the move first. Once that has been proven, it should be fairly easy for
the OBF guys to also move us over. I should email them to see how
things stand...
Peter
From bartek at rezolwenta.eu.org Tue Oct 16 09:11:01 2007
From: bartek at rezolwenta.eu.org (bartek wilczynski)
Date: Tue, 16 Oct 2007 11:11:01 +0200
Subject: [Biopython-dev] Biopython status
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B639@mail2.exch.c2b2.columbia.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
<6243BAA9F5E0D24DA41B27997D1FD14402B639@mail2.exch.c2b2.columbia.edu>
Message-ID: <1192525861.4714802535dae@imp.rezolwenta.eu.org>
Michiel De Hoon wrote:
> > First, how might I put these changes up for review in order
> > to contribute back to the code base? The main changes have been to
> > the AlignAce parser, since as it was it just ignored information
> > contained in the alignace file regarding the motif instances (namely
> > which input sequence they came from, where they started in the
> > sequence, and what strand they were on).
>
> In this case, it is a good idea to contact the current maintainer of
> Bio.AlignAce, either via the mailing list or directly. From the Biopython
> CVS, it seems that Bartek is currently the main maintainer of Bio.AlignAce,
> so it would be a good idea to discuss with him.
I'm not dying either ;). I'm the author of the Bio.AlignAce module and if you
have any new code to contribute to it, I'll be glad to help you. The best way
to do it would be to submit an enhancement bug report in bugzilla. If the
changes are smaller, you can just send them (as a diff) to the list and I'll
try to fit them to the current CVS version of Bio.AlignAce.
Bartek Wilczynski
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 09:55:37 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 05:55:37 -0400
Subject: [Biopython-dev] [Bug 2380] New: Bio.Nexus is adding "?" and "-" to
Bio.Data.IUPACData.ambiguous_dna_values
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2380
Summary: Bio.Nexus is adding "?" and "-" to
Bio.Data.IUPACData.ambiguous_dna_values
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: minor
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
This issue was raised in Bug 2366, where a unit test was found to be "polluting"
ambiguous_dna_values; the culprit was later identified as Bio.Nexus via test_Nexus.py.
Need to see if Bio.Nexus should be making a copy of this dict, or perhaps
defining a subclass of the alphabet (using the Gapped() class maybe).
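The "make a copy" option can be illustrated with a small sketch. The dict below is a hypothetical stand-in for `Bio.Data.IUPACData.ambiguous_dna_values`, and the values assigned to "?" and "-" are placeholders, not necessarily what Bio.Nexus actually uses. The point is only that assigning a module-level dict to a new name does not copy it.

```python
# Hypothetical stand-in for Bio.Data.IUPACData.ambiguous_dna_values: one
# module-level dict that every importer shares.
ambiguous_dna_values = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "GATC"}


def gapped_values_in_place():
    """Anti-pattern: mutates the shared dict, so every other importer sees it."""
    values = ambiguous_dna_values  # a second name for the same object
    values["?"] = "GATC-"
    values["-"] = "-"
    return values


def gapped_values_copy():
    """Safer: extend a private copy; the shared dict is left untouched."""
    values = dict(ambiguous_dna_values)  # shallow copy is enough: values are str
    values["?"] = "GATC-"
    values["-"] = "-"
    return values
```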
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 09:56:37 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 05:56:37 -0400
Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in
(Reverse)complement functions in Bio.Seq
In-Reply-To:
Message-ID: <200710160956.l9G9ub18007735@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2366
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 05:56 EST -------
Fix committed (after Michiel's OK on the mailing list), marking as fixed.
Checking in Tests/test_seq.py;
/home/repository/biopython/biopython/Tests/test_seq.py,v <-- test_seq.py
new revision: 1.6; previous revision: 1.5
done
Checking in Tests/output/test_seq;
/home/repository/biopython/biopython/Tests/output/test_seq,v <-- test_seq
new revision: 1.6; previous revision: 1.5
done
Checking in Bio/Seq.py;
/home/repository/biopython/biopython/Bio/Seq.py,v <-- Seq.py
new revision: 1.17; previous revision: 1.16
done
I've filed Bug 2380 for the Nexus issue:
Bio.Nexus is adding "?" and "-" to Bio.Data.IUPACData.ambiguous_dna_values
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 10:11:09 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 06:11:09 -0400
Subject: [Biopython-dev] [Bug 2381] New: translate and transcibe method for
the the Seq object (in Bio.Seq)
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
Summary: translate and transcibe method for the the Seq object
(in Bio.Seq)
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
Biopython has translation and transcription modules (Bio/Translate.py and
Bio/Transcribe.py) but I find them a little bit complicated to use.
There are module level functions translate, transcribe, and back_transcribe in
Bio/Seq.py which take either a string, a Seq object or a MutableSeq object.
I would like to add similar methods to the Seq object (also defined in Bio/Seq.py)
to make this functionality more accessible from a Seq object.
NOTE: Python strings have a translate method of their own which is rather
different. Having the Seq translate method do a biological translation
makes sense.
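A minimal sketch of what such methods could look like, using a hypothetical `SketchSeq` class rather than Biopython's actual `Seq`. It hard-codes only the standard genetic code (NCBI table 1, built from the usual TCAG-ordered 64-letter string), maps unknown or ambiguous codons to "X", and ignores a trailing partial codon. All of these are simplifying assumptions.

```python
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


class SketchSeq:
    """Hypothetical minimal Seq-like class (not Biopython's Seq)."""

    def __init__(self, data):
        self.data = data.upper()

    def __str__(self):
        return self.data

    def transcribe(self):
        """DNA -> RNA: replace T with U."""
        return SketchSeq(self.data.replace("T", "U"))

    def back_transcribe(self):
        """RNA -> DNA: replace U with T."""
        return SketchSeq(self.data.replace("U", "T"))

    def translate(self):
        """Biological translation (unlike str.translate); partial codon ignored."""
        dna = self.data.replace("U", "T")
        codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
        # Unknown or ambiguous codons become "X" in this sketch.
        return SketchSeq("".join(CODON_TABLE.get(c, "X") for c in codons))
```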
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 10:13:35 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 06:13:35 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200710161013.l9GADZtJ008751@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|translate and transcibe |translate and transcibe
|method for the the Seq |methods for the Seq object
|object (in Bio.Seq) |(in Bio.Seq)
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 06:13 EST -------
fixed typo in the bug summary
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 10:26:44 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 06:26:44 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200710161026.l9GAQixw009268@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
------- Comment #2 from dalloliogm at gmail.com 2007-10-16 06:26 EST -------
I find it difficult to translate a sequence in the 6 reading frames with a single
command.
Currently I use something like this:
for i in range(3):  # the three forward frames
    translate(seq[i:])
which is not very nice.
It would be nice to add a parameter to the translate function, as in the
EMBOSS application transeq
(http://emboss.sourceforge.net/apps/cvs/emboss/apps/transeq.html), something
like this:
>>> a = Seq('CAGCTAGCT')
>>> a.translate()
[(translation of a in the frame 0)]
>>> a.translate(1)
[(translation of a in the frame 1)]
>>> a.translate(F)
[(translation of a in the 3 forward frames)]
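A six-frame translation needs the three offsets 0, 1 and 2 on both the forward strand and its reverse complement. The sketch below is a hypothetical helper, not an existing Biopython function. It uses only the standard genetic code, and the +1/+2/+3, -1/-2/-3 frame labels are a common convention assumed here, not transeq's exact output format. Note that Python's own `str.translate` is used for the complement step, which is the unrelated string method mentioned in the bug report.

```python
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate whole codons only; unknown codons become "X"."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def reverse_complement(dna):
    # str.translate here is the character-mapping string method.
    return dna.translate(COMPLEMENT)[::-1]


def six_frame_translations(dna):
    """Return {frame: protein} for frames +1..+3 and -1..-3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames
```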
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 10:46:47 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 06:46:47 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200710161046.l9GAklI6010391@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 06:46 EST -------
Doing a three/six frame translation is however fairly common, and perhaps
warrants an "official" implementation in Bio.SeqUtils.
My current inclination is to try and keep the Bio.Seq translation function as
simple as possible. There are lots of possible options to worry about...
catering to them all could make the translate method rather daunting.
Perhaps things like the frame (or even the starting nucleotide) could be done
in Bio.Translate only. Another "special case" example I personally would like
is an option to check the first codon is a valid start codon for the specified
codon table, and to translate it as methionine (M). Then there is the question
of whether Bio.Translate's "translate_to_stop" functionality should be exposed in a
Seq method.
Note there is yet another (!) translation function Bio.SeqUtils.translate()
which is frame aware [personally I would mark a lot of this module as
deprecated].
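The two special cases mentioned above (treat a valid start codon as methionine, and stop at the first stop codon) could be sketched as keyword options on a standalone function. The keyword names `cds` and `to_stop` are illustrative, and the start-codon set {ATG, CTG, TTG} is an assumption for the standard table; a real implementation would take both the translations and the start codons from the chosen codon table.

```python
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
START_CODONS = {"ATG", "CTG", "TTG"}  # assumed start codons for this table


def translate(dna, to_stop=False, cds=False):
    """Translate whole codons; optional start-codon check and early stop."""
    dna = dna.upper()
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    protein = []
    if cds:
        if not codons or codons[0] not in START_CODONS:
            raise ValueError("sequence does not begin with a start codon")
        protein.append("M")  # any valid start codon is read as methionine
        codons = codons[1:]
    for codon in codons:
        amino_acid = CODON_TABLE.get(codon, "X")
        if to_stop and amino_acid == "*":
            break  # like translate_to_stop: drop the stop and what follows
        protein.append(amino_acid)
    return "".join(protein)
```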
From jflatow at northwestern.edu Tue Oct 16 16:02:19 2007
From: jflatow at northwestern.edu (Jared Flatow)
Date: Tue, 16 Oct 2007 11:02:19 -0500
Subject: [Biopython-dev] Biopython status
In-Reply-To: <47147341.4020708@maubp.freeserve.co.uk>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu>
<47147341.4020708@maubp.freeserve.co.uk>
Message-ID: <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
Please forgive me for ever doubting your health; it seems the group
is very much alive!
On Oct 16, 2007, at 3:16 AM, Peter wrote:
> Jared Flatow wrote:
>> I have also needed to create a modified FASTA parser so that I can
>> read things like quality score files.
>
> Could you be a little more specific - what exactly do you mean by a
> quality score files (links and/or examples). It may be that this
> warrants setting up a new file format in Bio.SeqIO
That is what I did. The quality score files I meant are simply
FASTA-like records that indicate the quality of each base pair read
from a sequencing machine, on a scale of something like 1 to 64. The
values are tab separated and correspond to 'reads' in another FASTA
file that contains the actual sequences read. This is the way the 454
GSFlex machines output their sequencing reads, so for every set of
reads there will be a pair of 454Reads.fna, 454Reads.qual files. The
only difference between a parser that processes these qual files and
one that processes the sequence files is that it shouldn't get rid of
spaces, and the newlines should not be stripped but converted into
spaces (when 454 writes a newline of scores they omit the space).
Essentially I have made a duplicate of FastaIO's iterator, named it
something else, made these two small changes and put an entry for it
in the SeqIO file.
16,17c16,17
< def GSQualIterator(handle, alphabet = single_letter_alphabet, title2ids = None) :
< """Generator function to iterate over GSFlex quality records (as SeqRecord objects).
---
> def FastaIterator(handle, alphabet = single_letter_alphabet, title2ids = None) :
> """Generator function to iterate over Fasta records (as SeqRecord objects).
54c54
< lines.append(line.rstrip()) # .replace(" ","")) leave off the replacing internal spaces so we can process qscore files (jf)
---
> lines.append(line.rstrip().replace(" ",""))
58c58
< yield SeqRecord(Seq(" ".join(lines), alphabet),
---
> yield SeqRecord(Seq("".join(lines), alphabet),
63a64,199
As you can see, a parser like this might be useful for other FASTA-like
formats as well and is in no way specific to the GS quality
files (it's just a space-preserving parser). If it were to be
implemented in Biopython you might call it something else.
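The two changes described above (keep internal spaces, and treat line breaks as separators rather than gluing lines together) can be sketched as a small standalone parser. The name `parse_qual` is hypothetical. Unlike the patched FastaIO iterator, it returns each record's scores as a list of integers rather than as a Seq. It accepts any iterable of lines, such as an open file.

```python
def parse_qual(handle):
    """Yield (title, [int, ...]) for each '>' record in a quality file.

    Any run of spaces, tabs or line breaks separates two scores, so scores
    that 454 splits across lines are not glued together.
    """
    title, scores = None, []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if title is not None:
                yield title, scores
            title, scores = line[1:], []
        elif title is not None:
            # str.split() with no argument splits on any whitespace run.
            scores.extend(int(value) for value in line.split())
    if title is not None:
        yield title, scores
```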
>
>> I would be happy to submit the changes to the group or an individual
>> for inspection, but I would like to avoid having to maintain my own
>> separate version of Biopython if possible.
>
> As has already been said - please file some (enhancement) bugs and
> attach your patches, or raise specific issues for discussion on this
> mailing list.
>
> Depending on the nature of your changes, you might be able to achieve
> some of them by subclassing Biopython's objects - rather than literally
> maintaining your own branch of the project.
>
>> I am also wondering how it would be received if I did something like
>> add a to_fasta method to SeqRecord instead of having to go
>> through writing it to a file using a SeqIO when all I want is the
>> string.
>
> Out of interest, why do you want to create a FASTA record as a string?
I am serving the fasta from a database of sequences dynamically via a
web server.
>
> Did you know you can write to a string using any Bio.SeqIO supported
> file format using StringIO? Perhaps we should spell this out more
> explicitly in the documentation, but a motivating example would help.
This is what I do now, but it seems like a hack to me to go this
route. To always have to write to a file feels strange, but I see
that it would be messy to go OO since there are so many formats.
However, giving preference to fasta over other formats by making it
innate doesn't seem like such a terrible idea. I do have mixed
feelings about 'bloating' the code which is why I asked, and you have
convinced me that this is not quite appropriate given existing
convention. However the idea would be to put the to_fasta or
to_format method inside the SeqRecord, then to call it from the IO
when needed to actually write to a file, but call it directly when
all that is wanted is a string...
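The design described here can be sketched with a per-format registry: each record can render itself as text, and the file-writing layer simply asks each record for that text. Everything below (`SketchRecord`, `to_format`, `FORMATTERS`, `write`) is hypothetical and only shows the shape of the idea. For what it is worth, later Biopython releases added a `SeqRecord.format()` method that returns a record as a string in a given format.

```python
class SketchRecord:
    """Hypothetical stand-in for SeqRecord that knows how to format itself."""

    def __init__(self, record_id, sequence):
        self.id = record_id
        self.seq = sequence

    def to_format(self, fmt):
        """Return this record as text; the IO layer reuses this method."""
        if fmt not in FORMATTERS:
            raise ValueError("unsupported format: %r" % fmt)
        return FORMATTERS[fmt](self)


def _fasta(record):
    # Single-line FASTA with no wrapping, to keep the sketch short.
    return ">%s\n%s\n" % (record.id, record.seq)


# Registry of per-format formatters; only FASTA is filled in here.
FORMATTERS = {"fasta": _fasta}


def write(records, handle, fmt):
    """Writing to a file just delegates to each record's to_format()."""
    count = 0
    for record in records:
        handle.write(record.to_format(fmt))
        count += 1
    return count
```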
>
> I would suggest rather than adding a to_fasta method to the
> SeqRecord, simply write your own "seqrecord_to_string" function (or
> create a subclass of SeqRecord with this method).
>
I'll leave it alone for now until I can come up with a real proposal =)
>> Finally, are there plans to move to a subversion repository at any
>> point?
>
> It was raised a while ago, and our cunning plan was to let BioPerl try
> the move first. Once that has been proven, it should be fairly easy for
> the OBF guys to also move us over. I should email them to see how
> things stand...
BioPerl seems to be the guinea pigs for everything. Leading the way
on this might put a stop to those nasty rumors about Biopython.
Best Regards,
Jared
From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 16:47:48 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 16 Oct 2007 17:47:48 +0100
Subject: [Biopython-dev] CVS to SVN
In-Reply-To: <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk>
<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
Message-ID: <4714EB34.8000207@maubp.freeserve.co.uk>
Jared wrote:
> Leading the way on this ... [CVS to SVN]
I would say one reason why we aren't charging ahead with a move from CVS
to subversion is that only a few posters on this mailing list actively WANT
to move to subversion, and no-one has really championed the move (yet).
I'm sure if we as a group wanted to do this, then the OBF would be happy to
assist. After all, moving us rather than BioPerl as the first CVS/SVN
migration should be easier as we have a smaller code base.
Peter
From jflatow at northwestern.edu Tue Oct 16 18:46:53 2007
From: jflatow at northwestern.edu (Jared Flatow)
Date: Tue, 16 Oct 2007 13:46:53 -0500
Subject: [Biopython-dev] 454 GSFlex quality score files
In-Reply-To: <4714EBC7.1040504@maubp.freeserve.co.uk>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk>
<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
<4714EBC7.1040504@maubp.freeserve.co.uk>
Message-ID: <48D92CF4-04B5-42F9-92D2-3A2D9D2FE7E2@northwestern.edu>
Hi Peter,
>>>> I have also needed to create a modified FASTA parser so that I
>>>> can read things like quality score files.
>>>
>>> Could you be a little more specific - what exactly do you mean by a
>>> quality score files (links and/or examples). It may be that this
>>> warrants setting up a new file format in Bio.SeqIO
>> That is what I did. The quality score files I meant are simply
>> FASTA-like records that indicate the quality of each base pair
>> read from a sequencing machine, on a scale of something like 1 to
>> 64. The values are tab separated and correspond to 'reads' in
>> another FASTA file that contains the actual sequences read. This
>> is the way the 454 GSFlex machines output their sequencing reads,
>> so for every set of reads there will be a pair of 454Reads.fna,
>> 454Reads.qual files. The only difference between a parser that
>> processes these qual files and one that processes the sequence
>> files is that it shouldn't get rid of spaces, and the newlines
>> should not be stripped but converted into spaces (when 454
>> writes a newline of scores they omit the space). Essentially I
>> have made a duplicate of FastaIO's iterator, named it something
>> else, made these two small changes and put an entry for it in the
>> SeqIO file.
>
> Patches and emails don't do well together. Could you file an
> enhancement bug, and then upload your code as an attachment? If
> you have a few examples of matched pairs of FASTA files and quality
> files which you can contribute that would be very helpful too.
>
Yes I'll get on that.
> It looks like you are trying to construct a "sequence" of numerical
> values (rather than a sequence of letters like nucleotides/amino
> acids). As written I don't think it would work for element access/
> slicing etc. However, with some extra work I suppose we could
> stretch the Seq object in this way - and define a new
> "IntegerAlphabet".
>
> But on balance, I don't think "lists of quality values" should be
> treated in the same way as sequences (and thus it doesn't seem to
> belong in Bio.SeqIO).
>
I agree.
> Alternatively you could regard the quality scores as sequence meta-data or
> annotation. One idea would be to generate SeqRecord
> objects containing dummy sequences of the correct length made up of
> the ambiguous character "N", with the associated quality scores
> held as a list of integers in the SeqRecord's annotation
> dictionary. Then it would fit into the Bio.SeqIO framework [I was
> thinking of something similar for PTT files, NCBI Protein tables,
> where again we have annotation but not the actual sequence].
I agree, and this way is most flexible.
>
> Maybe there should just be a separate parser for GSFlex quality
> records which returns an iterator giving each record name with a list
> of integers. A more elegant scheme would read in the pair of files
> together (the FASTA file and the quality file) and generate nicely
> annotated SeqRecords with the sequence and the quality. This isn't
> really possible with the Bio.SeqIO framework.
>
Yes, at first I liked this idea best, but it puts some constraints on
the way these things are read in. For example, if it is to be an iterator,
you must have a guarantee that these files contain exactly the same
sequences in exactly the same order. This could potentially be fine
for the GSFlex files, but I suspect that somewhere down the line there
might be a use for quality information about sequences in other cases.
If I am not mistaken, some sources now use upper/lower case letters to
indicate a two-level degree of confidence in a sequence letter. In any
event, requiring matched files seems like an unnecessary restriction.
The way I do it now is I load the reads into a database, then update
the database when I read in a quality score file. I think Biopython
should have a simple way of implementing something similar which can
solve both our metadata problems.
In Bio.Fasta there are Parsers which really belong in
Bio.SeqIO.FastaIO, if anywhere. How about Bio.Fasta becomes the more
general Fasta reader, nothing to do with sequences. It can iterate
over a FASTA file using the '>' as the record separator, creating
Record objects, much like it does now, except without processing them
at all or assuming they are sequences.
>Record.header
Record.data
Now Bio.SeqIO.FastaIO can use Bio.Fasta to iterate over the Record
objects in a file and transform them into SeqRecord objects. If you
like, you can provide it with a function header_todict, which takes a
string (in this case Record.header) and returns a dictionary, which
gets unpacked and passed to the SeqRecord initializer. Basically the
Bio.SeqIO.FastaIO returns a generator that looks something like this:
(SeqRecord(seq=cleanup(record.data), **header_todict(record.header))
for record in Bio.Fasta.parse(file))
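A minimal sketch of the proposed split, assuming the generic reader yields records with `header` and `data` fields as described. The names `Record`, `parse` and `parse_sequences` are hypothetical, and `parse_sequences` builds plain dicts instead of real SeqRecord objects so the sketch runs without Biopython. The data lines are returned untouched (joined with newlines), so the caller decides whether they are sequence letters, quality scores or anything else.

```python
from collections import namedtuple

# Hypothetical record type: the header line (without ">") and the raw data.
Record = namedtuple("Record", ["header", "data"])


def parse(handle):
    """Yield a Record per '>' block, making no assumption about the data."""
    header, lines = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield Record(header, "\n".join(lines))
            header, lines = line[1:], []
        elif header is not None:
            lines.append(line)
    if header is not None:
        yield Record(header, "\n".join(lines))


def parse_sequences(handle, header_todict, cleanup):
    """Sequence layer on top of the generic reader (dicts stand in for SeqRecords)."""
    return (dict(seq=cleanup(record.data), **header_todict(record.header))
            for record in parse(handle))
```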
I can also use the Bio.Fasta.parse function now to parse my quality
files and add them as metadata:
# I create an initial SeqRecord dictionary using the Bio.SeqIO.FastaIO parser
seq_dict = SeqIO.to_dict(SeqIO.FastaIO.parse(seq_file, my_header_todict))
# Then I iterate over the sequences in the qual file and look them up
# in the seq_dict using the same header parsing function
# I passed to create my initial SeqRecords, setting the quality
# scores as I find them
for record in Bio.Fasta.parse(qual_file):
    seq_dict[my_header_todict(record.header)['id']].quality = \
        my_qualitycleanup(record.data)
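The lookup-by-id approach can be sketched without Biopython. Because quality records are matched by id through a dictionary, the two files do not need to list the reads in the same order. The function name, the dict-based records standing in for SeqRecords, and the length check are assumptions for illustration.

```python
def attach_qualities(seq_records, qual_records, header_to_id):
    """Attach quality scores to sequences by id, in whatever order they arrive.

    seq_records:  iterable of dicts with "id" and "seq" keys (stand-ins for
                  SeqRecord objects).
    qual_records: iterable of (header, data) pairs from the quality file.
    header_to_id: hypothetical header parser returning the record id.
    """
    by_id = {record["id"]: record for record in seq_records}
    for header, data in qual_records:
        record = by_id[header_to_id(header)]  # KeyError if the id is unknown
        scores = [int(value) for value in data.split()]
        if len(scores) != len(record["seq"]):
            raise ValueError("%s: %d scores for %d bases"
                             % (record["id"], len(scores), len(record["seq"])))
        record["quality"] = scores
    return by_id
```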
I hope that makes sense. The advantage to doing it this way is that I
can reuse my header parsing function for both the sequence and the
metadata, and I can do whatever I want with the fasta record data
without writing a whole new parser. The SeqIO fasta parsing functions
just make some default assumptions (like the data being a sequence).
Let me know what you think.
Jared
From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 16:50:15 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 16 Oct 2007 17:50:15 +0100
Subject: [Biopython-dev] 454 GSFlex quality score files
In-Reply-To: <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk>
<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
Message-ID: <4714EBC7.1040504@maubp.freeserve.co.uk>
Hi Jared,
>>> I have also needed to create a modified FASTA parser so that I can
>>> read things like quality score files.
>>
>> Could you be a little more specific - what exactly do you mean by a
>> quality score files (links and/or examples). It may be that this
>> warrants setting up a new file format in Bio.SeqIO
>
> That is what I did. The quality score files I meant are simply
> FASTA-like records that indicate the quality of each base pair read
> from a sequencing machine, on a scale of something like 1 to 64. The
> values are tab separated and correspond to 'reads' in another FASTA
> file that contains the actual sequences read. This is the way the 454
> GSFlex machines output their sequencing reads, so for every set of
> reads there will be a pair of 454Reads.fna, 454Reads.qual files. The
> only difference between a parser that processes these qual files and
> one that processes the sequence files is that it shouldn't get rid of
> spaces, and the newlines should not be stripped but converted into
> spaces (when 454 writes a newline of scores they omit the space).
> Essentially I have made a duplicate of FastaIO's iterator, named it
> something else, made these two small changes and put an entry for it
> in the SeqIO file.
Patches and emails don't do well together. Could you file an
enhancement bug, and then upload your code as an attachment? If you
have a few examples of matched pairs of FASTA files and quality files
which you can contribute that would be very helpful too.
It looks like you are trying to construct a "sequence" of numerical
values (rather than a sequence of letters like nucleotides/amino acids).
As written I don't think it would work for element access/slicing
etc. However, with some extra work I suppose we could stretch the Seq
object in this way - and define a new "IntegerAlphabet".
But on balance, I don't think "lists of quality values" should be
treated in the same way as sequences (and thus it doesn't seem to belong
in Bio.SeqIO).
Alternatively you could regard the quality scores as sequence meta-data
or annotation. One idea would be to generate SeqRecord objects
containing dummy sequences of the correct length made up of the
ambiguous character "N", with the associated quality scores held as a
list of integers in the SeqRecord's annotation dictionary. Then it
would fit into the Bio.SeqIO framework [I was thinking of something
similar for PTT files, NCBI Protein tables, where again we have
annotation but not the actual sequence].
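A minimal sketch of the "dummy sequence" idea, using a hypothetical dataclass in place of a real SeqRecord. The record's sequence is one "N" per score, and the integer scores sit in the annotations dictionary. The annotation key "quality" is an assumption, not an established Biopython convention.

```python
from dataclasses import dataclass, field


@dataclass
class SketchRecord:
    """Hypothetical stand-in for SeqRecord: sequence text, id, annotations."""
    seq: str
    id: str
    annotations: dict = field(default_factory=dict)


def quality_only_record(record_id, scores):
    """Placeholder sequence of "N" (one per score) plus the scores as annotation."""
    return SketchRecord(seq="N" * len(scores), id=record_id,
                        annotations={"quality": list(scores)})
```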
Maybe there should just be a separate parser for GSFlex quality records
which returns an iterator giving each record name with a list of
integers. A more elegant scheme would read in the pair of files together
(the FASTA file and the quality file) and generate nicely annotated
SeqRecords with the sequence and the quality. This isn't really
possible with the Bio.SeqIO framework.
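Reading the pair of files together can be sketched as walking two iterators in lockstep. This only works if both files list the same reads in the same order, so the sketch checks ids, lengths and record counts instead of silently pairing mismatched entries. The function name and the (id, payload) input shape are assumptions.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel for "one file ran out before the other"


def paired_records(seq_records, qual_records):
    """Yield (id, sequence, scores), requiring both files in the same order.

    seq_records:  iterable of (id, sequence) pairs from the FASTA file.
    qual_records: iterable of (id, scores) pairs from the quality file.
    """
    for seq_rec, qual_rec in zip_longest(seq_records, qual_records,
                                         fillvalue=_MISSING):
        if seq_rec is _MISSING or qual_rec is _MISSING:
            raise ValueError("files contain different numbers of records")
        (seq_id, sequence), (qual_id, scores) = seq_rec, qual_rec
        if seq_id != qual_id:
            raise ValueError("files out of step: %r vs %r" % (seq_id, qual_id))
        if len(sequence) != len(scores):
            raise ValueError("%s: %d bases but %d scores"
                             % (seq_id, len(sequence), len(scores)))
        yield seq_id, sequence, scores
```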
Peter
From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 19:33:54 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 16 Oct 2007 20:33:54 +0100
Subject: [Biopython-dev] 454 GSFlex quality score files
In-Reply-To: <48D92CF4-04B5-42F9-92D2-3A2D9D2FE7E2@northwestern.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EBC7.1040504@maubp.freeserve.co.uk>
<48D92CF4-04B5-42F9-92D2-3A2D9D2FE7E2@northwestern.edu>
Message-ID: <47151222.1060502@maubp.freeserve.co.uk>
> In Bio.Fasta there are Parsers which really belong in
> Bio.SeqIO.FastaIO, if anywhere. How about Bio.Fasta becomes the more
> general Fasta reader, nothing to do with sequences. ...
In actual fact, the Bio.Fasta module predates Bio.SeqIO, and I was
thinking in a few releases' time of suggesting its deprecation (but not
just yet, as for several years it was the best documented and most used
parser in Biopython).
If we do decide to keep Bio.Fasta (or extend it), then perhaps
Bio.SeqIO.FastaIO should become just a wrapper for Bio.Fasta.
I'm still digesting your ideas to turn Bio.Fasta into a generic parser
that copes with sequences, quality scores, or anything else.
Peter
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 19:57:35 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 15:57:35 -0400
Subject: [Biopython-dev] [Bug 2382] New: Generic FASTA parser
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
Summary: Generic FASTA parser
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: jflatow at northwestern.edu
I would like to be able to read in and iterate over records in generic FASTA files
of the format:
>header
data
>header
data
...
This iterator should return Bio.Fasta.Record objects with the corresponding
header and data fields.
I suggest putting this inside the existing Bio.Fasta module and updating
Bio.SeqIO.FastaIO to use this iterator and transform the records returned into
Bio.SeqRecord objects.
This should make it easier to add metadata to SeqRecord objects parsed in from
FASTA. Consider the following example for illustration. I have data from a
genome sequencing machine that outputs pairs of files: one contains the
sequence reads, and the other contains estimates of the
quality of each base call in the sequence.
The sequence file might look something like this (only with hundreds of
thousands more entries):
>ERSGEES02IKV6B length=97 xy=3401_1361 region=2 run=R_runname
CAATATAATTTCTCTTAAAATTATTCCCATGGCCAGGTGTGGTGGCTCACACCTGTAGTC
CCGGCACTTTGGGAGGCCAAGGCACACAGGGGATAGG
>ERSGEES02GGZDB length=142 xy=2536_2685 region=2 run= R_runname
GGTCTCCAGTGCCCTGTCTCCCCATATTTCTGACACACCTTCTCACAGCCTGGCCCATCT
TGCTGGGTCCCTCTTCTCCTCCCTTCCTGCTCCATTTGTCAACACTGCTGGGACATTAGA
ATTCAGATCTCCCGGGTCACCG
>ERSGEES02JQUCP length=113 xy=3879_0663 region=2 run= R_runname
AAAGTGACTAAAGAATCAATTTACATTAATATTCTATGTGAACAGGCAAAATACTTACAA
AGAAGTAGAGAAAATATGAATTCAGTACAGAATTCAGATCTCCCGGGTCACCG
The corresponding quality score file might look something like this:
>ERSGEES02IKV6B length=97 xy=3401_1361 region=2 run= R_runname
27 28 21 27 27 27 28 22 28 25 3 27 27 27 28 21 33 31 20 6 28 21 26 26 18 28 25
2 26 25 29 23 31 24 27 29 22 27 27 27 29 23 27 31 25 27 27 27 27 27 27 32 26 27
27 27 27 26 27 33
30 12 32 26 27 27 27 33 30 12 33 30 12 26 31 25 33 27 32 28 33 28 27 27 27 27
27 26 33 32 20 7 27 27 27 32 26
>ERSGEES02GGZDB length=142 xy=2536_2685 region=2 run= R_runname
28 9 26 24 27 27 20 26 18 25 27 32 29 10 26 26 27 18 25 32 30 17 1 25 27 22 32
30 12 27 27 22 26 25 27 23 25 28 21 32 27 27 27 25 26 27 26 25 27 20 26 26 19
28 25 3 25 27 22 27
19 24 24 24 32 29 11 24 34 31 17 23 23 30 23 27 25 30 23 27 33 31 17 27 20 28
21 27 25 26 26 30 24 27 33 31 13 26 27 27 31 25 27 25 23 26 16 26 27 30 27 7 27
27 27 32 27 26 26 32
27 30 26 27 27 27 27 27 27 27 30 27 6 34 31 17 27 21 27 32 28 18
>ERSGEES02JQUCP length=113 xy=3879_0663 region=2 run= R_runname
29 26 5 25 27 24 27 27 27 30 27 7 26 27 19 25 26 31 26 34 32 16 20 27 26 32 27
32 28 27 25 26 18 27 25 27 26 26 24 27 31 25 27 27 31 26 26 34 32 23 11 26 22
27 32 26 27 26 32 30
11 26 31 24 27 27 25 23 27 27 33 30 19 4 17 26 25 26 31 27 30 26 27 26 22 26 18
24 27 26 32 26 32 28 27 27 25 27 25 24 25 31 28 10 34 31 15 27 21 27 28 21 27
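Since the quality file holds one score per base call, a record's scores can be read by splitting on any whitespace and, presumably, checked against the length= value in its header. A small sketch (parse_quality_block is a hypothetical helper, not part of Biopython):

```python
from typing import List, Optional


def parse_quality_block(data: str, expected_length: Optional[int] = None) -> List[int]:
    """Convert the score lines of one .qual record into a list of ints.

    str.split() with no argument splits on any run of whitespace, including
    newlines, so the line breaks inside the block need no special handling.
    """
    scores = [int(token) for token in data.split()]
    if expected_length is not None and len(scores) != expected_length:
        raise ValueError(f"expected {expected_length} scores, found {len(scores)}")
    return scores


block = "27 28 21 27\n27 27 28\n"
print(parse_quality_block(block, expected_length=7))
```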
I would like to be able to do the following:
import Bio.Fasta  # with the proposed parse() function
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
# create a function to parse the header line and return a dictionary
def parse_gsflex_header(gs_header):
    parts = gs_header.split(' ')
    assert len(parts) == 5
    xy = parts[2].split('=')[1].split('_')
    return {'name': parts[0],
            'length': parts[1].split('=')[1],
            'xpos': xy[0],
            'ypos': xy[1],
            'region': parts[3].split('=')[1],
            'run': parts[4].split('=')[1]}
# Bio.SeqIO.FastaIO wraps the Bio.Fasta parser, and might look something like this
class FastaIO:  # or however it's organized
    @staticmethod
    def data_toseq(data):
        # do some parsing of the data
        return Seq(...)
    @staticmethod
    def parse(file, header_todict):
        return (SeqRecord(seq=FastaIO.data_toseq(record.data),
                          **header_todict(record.header))
                for record in Bio.Fasta.parse(file))
# I create an initial SeqRecord dictionary using the Bio.SeqIO.FastaIO parser
seq_dict = SeqIO.to_dict(SeqIO.FastaIO.parse(seq_file, parse_gsflex_header))
# Then I iterate over the records in the qual file, look each one up in
# seq_dict, and set the quality scores as I find them
for record in Bio.Fasta.parse(qual_file):
    seq_dict[my_header_todict(record.header)['id']].quality = \
        my_qualitycleanup(record.data)
This would work well for parsing all kinds of FASTA-like files and provides a
simple mechanism for dealing with them record by record.
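A minimal sketch of the kind of generic iterator being requested, for illustration only: FastaLikeRecord and iter_fasta_like are hypothetical names rather than the Bio.Fasta API. The header has its '>' and newline trimmed, and the data block is handed back untouched so each caller decides how to interpret it:

```python
import io
from typing import Iterator, List, NamedTuple, Optional, TextIO


class FastaLikeRecord(NamedTuple):
    """One '>'-delimited record: header text plus the raw, unparsed data."""
    header: str  # header line without the leading '>' or trailing newline
    data: str    # all lines up to the next '>' line, newlines included


def iter_fasta_like(handle: TextIO) -> Iterator[FastaLikeRecord]:
    """Yield one FastaLikeRecord per '>' line; anything before the first '>' is skipped."""
    header: Optional[str] = None
    lines: List[str] = []
    for line in handle:
        if line.startswith(">"):
            if header is not None:
                yield FastaLikeRecord(header, "".join(lines))
            header = line[1:].rstrip("\r\n")
            lines = []
        elif header is not None:
            lines.append(line)
    if header is not None:
        yield FastaLikeRecord(header, "".join(lines))


# The same iterator serves sequence files and quality files alike.
example = io.StringIO(">read1 length=4\nACGT\n>read1 length=4\n27 28\n21 30\n")
for record in iter_fasta_like(example):
    print(record.header, repr(record.data))
```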
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 20:03:33 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 16:03:33 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162003.l9GK3XmF007588@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
------- Comment #1 from jflatow at northwestern.edu 2007-10-16 16:03 EST -------
My mistake: the parse_gsflex_header function should look something like this:
import re
def parse_gsflex_header(header):
    parts = re.split(r'[,|]?\s+', header, maxsplit=1)
    assert len(parts) == 2
    return {'id': parts[0],
            'description': header}
def my_qualitycleanup(data):
    # split() with no argument splits on any whitespace, including newlines
    return [int(x) for x in data.split()]
From jflatow at northwestern.edu Tue Oct 16 20:11:04 2007
From: jflatow at northwestern.edu (Jared Flatow)
Date: Tue, 16 Oct 2007 15:11:04 -0500
Subject: [Biopython-dev] 454 GSFlex quality score files
In-Reply-To: <47151222.1060502@maubp.freeserve.co.uk>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EBC7.1040504@maubp.freeserve.co.uk>
<48D92CF4-04B5-42F9-92D2-3A2D9D2FE7E2@northwestern.edu>
<47151222.1060502@maubp.freeserve.co.uk>
Message-ID: <156C46BF-1798-43D5-BA10-2A94FC63A3AB@northwestern.edu>
On Oct 16, 2007, at 2:33 PM, Peter wrote:
> > In Bio.Fasta there are Parsers which really belong in
> > Bio.SeqIO.FastaIO, if anywhere. How about Bio.Fasta becomes the more
> > general Fasta reader, nothing to do with sequences. ...
>
> In actual fact, the Bio.Fasta module predates Bio.SeqIO, and I was
> thinking in a few releases time of suggesting its deprecation (but
> not just yet as for several years it was the best documented and
> most used parser in Biopython).
>
I see, it looks like it's meant to be deprecated; I was just saying
it's actually doing SeqIO functionality.
> If we do decide to keep Bio.Fasta (or extend it), then perhaps
> Bio.SeqIO.FastaIO should become just a wrapper for Bio.Fasta
>
> I'm still digressing your ideas to turn Bio.Fasta into a generic
> parser that copes with sequences, qualities scores, or anything else.
I'm not quite sure what you meant by digressing; if you mean thinking
it over, then great =) Otherwise I hope you'll seriously consider it
anyway. Either way, I think I posted a more coherent message on
bugzilla with some example data and motivation.
jared
From jflatow at northwestern.edu Tue Oct 16 20:14:16 2007
From: jflatow at northwestern.edu (Jared Flatow)
Date: Tue, 16 Oct 2007 15:14:16 -0500
Subject: [Biopython-dev] CVS to SVN
In-Reply-To: <4714EB34.8000207@maubp.freeserve.co.uk>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk>
<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
<4714EB34.8000207@maubp.freeserve.co.uk>
Message-ID: <6DFB6FBB-CC55-41D1-8D35-4906E6B502CF@northwestern.edu>
> I would say one reason why we aren't charging ahead with a move
> from CVS to subversion is only a few posters on this mailing list
> actively WANT to move to subversion, and no-one has really
> championed the move (yet).
Does that mean most developers don't WANT to move, or just that they
don't ACTIVELY want to move?
jared
From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 20:42:18 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 16 Oct 2007 21:42:18 +0100
Subject: [Biopython-dev] 454 GSFlex quality score files
In-Reply-To: <156C46BF-1798-43D5-BA10-2A94FC63A3AB@northwestern.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EBC7.1040504@maubp.freeserve.co.uk> <48D92CF4-04B5-42F9-92D2-3A2D9D2FE7E2@northwestern.edu> <47151222.1060502@maubp.freeserve.co.uk>
<156C46BF-1798-43D5-BA10-2A94FC63A3AB@northwestern.edu>
Message-ID: <4715222A.2070909@maubp.freeserve.co.uk>
Jared Flatow wrote:
> On Oct 16, 2007, at 2:33 PM, Peter wrote:
>
>>> In Bio.Fasta there are Parsers which really belong in
>>> Bio.SeqIO.FastaIO, if anywhere. How about Bio.Fasta becomes the more
>>> general Fasta reader, nothing to do with sequences. ...
>> In actual fact, the Bio.Fasta module predates Bio.SeqIO, and I was
>> thinking in a few releases time of suggesting its deprecation (but
>> not just yet as for several years it was the best documented and
>> most used parser in Biopython).
>
> I see, it looks like it's meant to be deprecated; I was just saying
> it's actually doing SeqIO functionality.
Well, deprecating Bio.Fasta is currently just a suggestion for the future;
we should still canvass opinion on the main mailing list before taking
that action.
>> If we do decide to keep Bio.Fasta (or extend it), then perhaps
>> Bio.SeqIO.FastaIO should become just a wrapper for Bio.Fasta
>>
>> I'm still digressing your ideas to turn Bio.Fasta into a generic
>> parser that copes with sequences, qualities scores, or anything else.
That was a typo, but you managed to guess my meaning. I meant to say:
I'm still digesting [i.e. thinking about] your ideas to turn Bio.Fasta
into a generic parser that copes with sequences, quality scores, or
anything else.
> I'm not quite sure what you meant by digressing; if you mean thinking
> it over, then great =) Otherwise I hope you'll seriously consider it
> anyway. Either way, I think I posted a more coherent message on
> bugzilla with some example data and motivation.
I'll take a look at Bug 2382 - Generic FASTA parser
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
Peter
From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 21:01:29 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 16 Oct 2007 22:01:29 +0100
Subject: [Biopython-dev] CVS to SVN
In-Reply-To: <6DFB6FBB-CC55-41D1-8D35-4906E6B502CF@northwestern.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EB34.8000207@maubp.freeserve.co.uk>
<6DFB6FBB-CC55-41D1-8D35-4906E6B502CF@northwestern.edu>
Message-ID: <471526A9.1010709@maubp.freeserve.co.uk>
Jared Flatow wrote:
>> I would say one reason why we aren't charging ahead with a move
>> from CVS to subversion is only a few posters on this mailing list
>> actively WANT to move to subversion, and no-one has really
>> championed the move (yet).
>
> Does that mean most developers don't WANT to move, or just that they
> don't ACTIVELY want to move?
Going back over the archives, Chris Lasher was most vocal in supporting
the move, and there were a few other positive voices.
Speaking for myself, I have no strong desire either way, and I don't
think Michiel objected either (except over the timing). Then as now, we
are hoping to get the next release out "shortly", so after that would be
a good time to make the switch.
[I'm assuming we won't lose any revision history or comments, and that
things like the web based ViewCVS and its RSS feed will still be available]
Peter
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 21:02:03 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 17:02:03 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162102.l9GL23rr010250@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 17:02 EST -------
Are there any other "FASTA like" formats you know of, in addition to
traditional sequence data and the 454 GSFlex quality score files?
We could do this using the old Scanner/Consumer model (see, for example, the
pre-Martel parser in CVS revision 1.8 of Bio/Fasta/__init__.py).
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Fasta/__init__.py?rev=1.8&cvsroot=biopython&content-type=text/vnd.viewcvs-markup
The scanner would be the same for all formats, and would pass the data with
whitespace (spaces, new lines etc) as is. We could then have one consumer for
each supported FASTA variant:
_Scanner Scans a FASTA-format stream.
_RecordConsumer Consumes FASTA data to a Record object.
_SequenceConsumer Consumes FASTA data to a Sequence object.
_QualityConsumer (new) could build a list of integers for each record?
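A rough sketch of that design, with one shared scanner and a consumer per variant. The class names echo the list above, but this is a hypothetical re-creation for illustration, not the code from CVS revision 1.8, and keying records by the first word of the header is an assumption:

```python
import io


class FastaScanner:
    """Format-agnostic scanner: passes header and data lines, unparsed, to a consumer."""

    def feed(self, handle, consumer):
        for line in handle:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                consumer.start_record(line[1:])
            elif line.strip():
                consumer.data_line(line)
        consumer.end()


class SequenceConsumer:
    """Collects {record id: sequence string}, dropping whitespace in the data."""

    def __init__(self):
        self.records = {}
        self._id = None
        self._chunks = []

    def start_record(self, header):
        self.end()  # finish the previous record, if any
        self._id = header.split()[0]

    def data_line(self, line):
        if self._id is not None:  # ignore lines before the first '>'
            self._chunks.append("".join(line.split()))

    def end(self):
        if self._id is not None:
            self.records[self._id] = "".join(self._chunks)
        self._id, self._chunks = None, []


class QualityConsumer:
    """Collects {record id: [int, ...]} from 454-style quality records."""

    def __init__(self):
        self.records = {}
        self._current = None

    def start_record(self, header):
        self._current = self.records.setdefault(header.split()[0], [])

    def data_line(self, line):
        if self._current is not None:  # ignore lines before the first '>'
            self._current.extend(int(token) for token in line.split())

    def end(self):
        self._current = None


scanner = FastaScanner()
quals = QualityConsumer()
scanner.feed(io.StringIO(">r1 length=3\n27 28\n3\n>r2\n40\n"), quals)
print(quals.records)
```

Adding a new variant then means writing one more small consumer class; the scanner stays unchanged.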
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 21:26:29 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 17:26:29 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162126.l9GLQT8O011239@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
------- Comment #3 from jflatow at northwestern.edu 2007-10-16 17:26 EST -------
On second thought, let me just rewrite all the code:
import re
import Bio.Fasta  # with the proposed parse() function
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Alphabet import single_letter_alphabet
# The Bio.Fasta parser
class Fasta:  # or whatever
    @staticmethod
    def parse(file):
        """Return an iterator over the file as Bio.Fasta.Record objects.
        For the records, trim the newline from the header but don't do
        anything to the data."""
# The Bio.SeqIO.FastaIO wrapper for Bio.Fasta
class FastaIO:  # or however it's organized
    @staticmethod
    def header_todict(header):
        parts = re.split(r'[,|]?\s+', header, maxsplit=1)
        assert len(parts) == 2
        return {'id': parts[0],
                'description': header}
    @staticmethod
    def data_toseq(data, alphabet):
        # strip all whitespace (including newlines) from the sequence data
        return Seq(re.sub(r'\s+', '', data), alphabet)
    @staticmethod
    def parse(file, header_todict=None, alphabet=single_letter_alphabet):
        # default to FastaIO.header_todict (it can't be named as a default
        # argument while the class body is still being defined)
        if header_todict is None:
            header_todict = FastaIO.header_todict
        return (SeqRecord(seq=FastaIO.data_toseq(record.data, alphabet),
                          **header_todict(record.header))
                for record in Bio.Fasta.parse(file))
# Now to use these in my example I can do
seq_dict = SeqIO.to_dict(SeqIO.FastaIO.parse(seq_file))
for record in Bio.Fasta.parse(qual_file):
    seq_id = Bio.SeqIO.FastaIO.header_todict(record.header)['id']
    seq_dict[seq_id].quality = [int(x) for x in record.data.split()]
# Suppose instead I have an alignment file, which looks like this:
#   >contigname
#   A A 10 64
#   T T 9 64
#   C C 9 64
#   ...
# and so on, where the first column is a reference sequence, the second
# column is a consensus sequence, the third column is the number of reads
# aligned, and the fourth column is the combined quality score.
# Now it's just as easy for me to parse this into an object
class ContigAlign:
    def __init__(self, name, ref, consensus, numreads, qscore):
        self.name = name
        self.ref = ref
        self.consensus = consensus
        self.numreads = numreads
        self.qscore = qscore
# I'll make a dictionary of my ContigAligns
d = {}
for record in Bio.Fasta.parse(file):
    rows = [line.split() for line in record.data.splitlines() if line.strip()]
    # zip(*rows) transposes the rows into one tuple per column
    (ref, consensus, numreads, qscore) = zip(*rows)
    d[record.header] = ContigAlign(record.header, ref, consensus, numreads,
                                   qscore)
# Maybe I would turn ref and consensus into Seqs, but you get the point
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 21:38:45 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 17:38:45 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162138.l9GLcj29011655@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 17:38 EST -------
In comment 3, did you just make up this file format as an example?
>contigname
A A 10 64
T T 9 64
C C 9 64
...
with four columns: reference sequence, consensus, number of reads aligned, and
combined quality score.
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 21:58:38 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 17:58:38 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162158.l9GLwc68012343@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
------- Comment #5 from jflatow at northwestern.edu 2007-10-16 17:58 EST -------
Nope, they actually have a file format that looks like this:
Position Consensus Quality Score Depth Signal StdDeviation
>contig00001 1
1 G 64 2 1.00 0.00
2 A 64 2 1.00 0.00
3 G 64 2 1.00 0.00
4 A 64 2 1.00 0.00
5 G 64 2 2.00 0.00
6 G 64 2 2.00 0.00
7 A 64 2 3.00 0.00
8 A 64 2 3.00 0.00
9 A 64 2 3.00 0.00
10 C 64 2 2.00 0.00
11 C 64 2 2.00 0.00
12 T 64 2 1.00 0.00
13 C 64 2 3.00 0.00
14 C 64 2 3.00 0.00
15 C 64 2 3.00 0.00
16 G 64 2 1.00 0.00
17 T 64 2 1.00 0.00
18 G 64 2 1.00 0.00
19 A 64 2 1.00 0.00
20 T 64 2 1.00 0.00
21 C 64 2 2.00 0.00
22 C 64 2 2.00 0.00
Note the file-wide header line at the top of the file. A generic FASTA-like
parser might skip everything before the first '>'; alternatively we could strip
that line beforehand, but it would be nice if the parser were smart enough to
handle it.
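One way to read this per-position contig format, sketched under the assumption that every data line has exactly the six whitespace-separated columns shown (the mapping tool's extra reference column would need another field). ContigPosition and parse_alignment_info are hypothetical names. The column-title line is skipped simply because it comes before the first '>':

```python
import io
from typing import Dict, List, NamedTuple


class ContigPosition(NamedTuple):
    position: int
    consensus: str
    quality: int
    depth: int
    signal: float
    std_deviation: float


def parse_alignment_info(handle) -> Dict[str, List[ContigPosition]]:
    """Parse per-position contig text into {contig name: [ContigPosition, ...]}.

    Lines before the first '>' (such as the column-title line) are skipped,
    and only the first word of each '>' line is kept as the contig name.
    """
    contigs: Dict[str, List[ContigPosition]] = {}
    current = None
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith(">"):
            current = contigs.setdefault(fields[0][1:], [])
        elif current is not None:
            pos, base, qual, depth, signal, sd = fields  # assumes exactly six columns
            current.append(ContigPosition(int(pos), base, int(qual), int(depth),
                                          float(signal), float(sd)))
    return contigs


sample = """Position Consensus Quality Score Depth Signal StdDeviation
>contig00001 1
1 G 64 2 1.00 0.00
2 A 64 2 1.00 0.00
"""
print(parse_alignment_info(io.StringIO(sample)))
```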
Also, here is another sample FASTA-like file format they use for pair
alignments:
>ERSGEES01EM5WC, 2..30 of 95 and ERSGEES01C1ZV2, 1..29 of 268 (29/29 ident)
2 CGGTGACCCGGGAGATCTGAATTCCTGGT 30
1 CGGTGACCCGGGAGATCTGAATTCCTGGT 29
>ERSGEES01EM5WC, 2..29 of 95 and ERSGEES01DMS5T, 1..28 of 259 (28/28 ident)
2 CGGTGACCCGGGAGATCTGAATTCCTGG 29
1 CGGTGACCCGGGAGATCTGAATTCCTGG 28
>ERSGEES01EM5WC, 29..2 of 95 and ERSGEES01D8GDV, 205..232 of 232 (28/28 ident)
29 CCAGGAATTCAGATCTCCCGGGTCACCG 2
205 CCAGGAATTCAGATCTCCCGGGTCACCG 232
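The pair-alignment headers follow a fixed pattern, so a regular expression can split them into fields. This is a sketch assuming every header matches the three examples exactly; reading the second number in "(29/29 ident)" as the aligned length is a guess, and a start greater than the end (29..2) is kept as written rather than interpreted as a strand. PairAlignHeader and parse_pair_header are hypothetical names:

```python
import re
from typing import NamedTuple


class PairAlignHeader(NamedTuple):
    query: str
    query_start: int
    query_end: int
    query_length: int
    subject: str
    subject_start: int
    subject_end: int
    subject_length: int
    identical: int
    aligned: int


_PAIR_HEADER = re.compile(
    r">?([^,\s]+), (\d+)\.\.(\d+) of (\d+) and "
    r"([^,\s]+), (\d+)\.\.(\d+) of (\d+) \((\d+)/(\d+) ident\)"
)


def parse_pair_header(line: str) -> PairAlignHeader:
    """Split one pair-alignment header line into its named fields."""
    match = _PAIR_HEADER.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised pair alignment header: {line!r}")
    q, qs, qe, ql, s, ss, se, sl, ident, total = match.groups()
    return PairAlignHeader(q, int(qs), int(qe), int(ql),
                           s, int(ss), int(se), int(sl), int(ident), int(total))


print(parse_pair_header(">ERSGEES01EM5WC, 2..30 of 95 and ERSGEES01C1ZV2, 1..29 of 268 (29/29 ident)"))
```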
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 22:09:06 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 18:09:06 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162209.l9GM96N5012764@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
------- Comment #6 from jflatow at northwestern.edu 2007-10-16 18:09 EST -------
The reference/consensus one was inspired by yet another format they have: they
provide two tools, one for mapping to an existing sequence and the other for
ab initio contig building. The mapping one has the extra reference column.
As you can see, it might be hard to keep up with all these similar formats as
part of Biopython (and these are only from one source). Certainly the common
ones should have wrappers, but we should also be able to easily get the stream
of records.
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 22:13:48 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 18:13:48 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162213.l9GMDmBM012914@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-16 18:13 EST -------
Could you attach a few of these real files? Please also say where they came
from (i.e. the company whose software writes such output) and what they call
each file format variant.
If you can get a matched set (i.e. all associated with the same few sequences)
then even better.
From bugzilla-daemon at portal.open-bio.org Tue Oct 16 23:09:00 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 19:09:00 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710162309.l9GN90wg015092@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
------- Comment #8 from jflatow at northwestern.edu 2007-10-16 19:08 EST -------
The files are very large, but I assure you they are just longer versions of what I
have supplied here. The company is Roche Diagnostics. The initial
reads/quality files are the output of the 454 GSFlex genome sequencing
machines. They have two pieces of software: gsMapper and gsAssembler which
output the contigs.
Reads/Quality files from the machine are called:
454Reads.{fna,qual}
gs* output:
454{All,Large}Contigs.{fna,qual}
454PairAlign.txt
454AlignmentInfo.tsv
From bugzilla-daemon at portal.open-bio.org Wed Oct 17 00:10:45 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 20:10:45 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200710170010.l9H0AjYe018147@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp 2007-10-16 20:10 EST -------
> Note there is yet another (!) translation function Bio.SeqUtils.translate()
> which is frame aware [personally I would mark a lot of this module as
> deprecated].
Given the various translate functions we already have in Biopython, why do you
want to add another one? Is there something the "translate" method can do that
the "translate" function cannot? Since the "translate" function can take Seq
objects as well as simple strings, I'd prefer the "translate" function over a
"translate" method.
From biopython-dev at maubp.freeserve.co.uk Tue Oct 16 16:49:18 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 16 Oct 2007 17:49:18 +0100
Subject: [Biopython-dev] SeqRecord to file format as string
In-Reply-To: <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk>
<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
Message-ID: <4714EB8E.3000700@maubp.freeserve.co.uk>
>> Did you know you can write to a string using any Bio.SeqIO supported
>> file format using StringIO? Perhaps we should spell this out more
>> explicitly in the documentation, but a motivating example would help.
>
> This is what I do now, but it seems like a hack to me to go this
> route. To always have to write to a file feels strange, but I see
> that it would be messy to go OO since there are so many formats.
> However, giving preference to fasta over other formats by making it
> innate doesn't seem like such a terrible idea. I do have mixed
> feelings about 'bloating' the code which is why I asked, and you have
> convinced me that this is not quite appropriate given existing
> convention. However the idea would be to put the to_fasta or
> to_format method inside the SeqRecord, then to call it from the IO
> when needed to actually write to a file, but call it directly when
> all that is wanted is a string...
It's debatable, isn't it? I suspect that for most users, when they want a
record in a particular file format it's for writing to a file. However,
adding a to_format() method to a SeqRecord makes some sense (suitable for
sequential file formats only). This would take a format name and return
a string, by calling Bio.SeqIO with a StringIO object internally.
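A sketch of the to_format() idea, kept self-contained so it runs without Biopython: Record stands in for SeqRecord and WRITERS for Bio.SeqIO's format dispatch, and both are hypothetical. The point is that the file writer and the string method share one code path through io.StringIO (later Biopython releases provide a SeqRecord.format() method for this):

```python
import io
from typing import Callable, Dict, Iterable, TextIO


class Record:
    """Minimal stand-in for a SeqRecord: an id, a description and a sequence."""

    def __init__(self, record_id: str, seq: str, description: str = ""):
        self.id = record_id
        self.seq = seq
        self.description = description

    def to_format(self, fmt: str) -> str:
        """Return this record in the named format by writing to an in-memory handle."""
        handle = io.StringIO()
        WRITERS[fmt]([self], handle)  # same writer used for real files
        return handle.getvalue()


def write_fasta(records: Iterable[Record], handle: TextIO, width: int = 60) -> None:
    """Write records as FASTA, wrapping the sequence every `width` letters."""
    for rec in records:
        handle.write(f">{rec.id} {rec.description}".rstrip() + "\n")
        for start in range(0, len(rec.seq), width):
            handle.write(rec.seq[start:start + width] + "\n")


# Hypothetical registry of format name -> writer, standing in for Bio.SeqIO.
WRITERS: Dict[str, Callable[..., None]] = {"fasta": write_fasta}

print(Record("read1", "ACGT" * 20, "demo").to_format("fasta"), end="")
```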
Peter
From bugzilla-daemon at portal.open-bio.org Wed Oct 17 02:17:28 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Oct 2007 22:17:28 -0400
Subject: [Biopython-dev] [Bug 2382] Generic FASTA parser
In-Reply-To:
Message-ID: <200710170217.l9H2HSAx024040@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
------- Comment #9 from mdehoon at ims.u-tokyo.ac.jp 2007-10-16 22:17 EST -------
If all these special fasta files are coming from Roche Diagnostics, I'd suggest
creating a separate module rather than trying to put this in Bio.SeqIO. Bio.SeqIO
is one of the few modules in Biopython that is used by most users, so I'd like to
keep it as clean as possible. To avoid confusion for users who just want
to parse regular Fasta files, I think the module should not be called
Bio.Fasta. In addition, I doubt we'd get much code reuse from a generic
Bio.Fasta module beyond what is needed for the Roche files, since the only
thing they have in common is that they use ">" to separate records.
With a separate module to handle the Roche files, my preferred usage would be
something like this:
from Bio import SeqIO, GSFlex # Or whatever you'd like to call it
seqrecords = list(SeqIO.parse(open("mysequences.fa"), "fasta"))  # keep the records
qualities = GSFlex.parse(open("myqualities.qual"), "quality")
for seqrecord, quality in zip(seqrecords, qualities):
    seqrecord.quality = quality
Note that "quality" is currently not a field of the SeqRecord class, but with
SeqRecord being a Python class, we can just add fields on the fly.
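Pairing with zip() assumes the two files list the same reads in the same order. A slightly more defensive variant checks that assumption as it goes. It assumes the quality parser yields (id, scores) pairs, and Rec and attach_qualities are hypothetical stand-ins rather than Biopython classes:

```python
from itertools import zip_longest

_MISSING = object()  # sentinel for "one file ran out of records first"


class Rec:
    """Minimal stand-in for a SeqRecord: just an id and a sequence."""

    def __init__(self, record_id, seq):
        self.id = record_id
        self.seq = seq


def attach_qualities(seq_records, quality_records):
    """Pair records by position, as zip() does, but check counts, ids and lengths.

    quality_records is assumed to yield (id, [score, ...]) pairs.
    """
    for rec, qual in zip_longest(seq_records, quality_records, fillvalue=_MISSING):
        if rec is _MISSING or qual is _MISSING:
            raise ValueError("sequence and quality files hold different numbers of records")
        qual_id, scores = qual
        if qual_id != rec.id:
            raise ValueError(f"records out of step: {rec.id!r} vs {qual_id!r}")
        if len(scores) != len(rec.seq):
            raise ValueError(f"{rec.id}: {len(scores)} scores for {len(rec.seq)} bases")
        rec.quality = scores  # new attribute added on the fly
        yield rec


reads = [Rec("r1", "ACG"), Rec("r2", "TT")]
quals = [("r1", [30, 31, 32]), ("r2", [20, 21])]
for read in attach_qualities(reads, quals):
    print(read.id, read.quality)
```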
From mdehoon at c2b2.columbia.edu Wed Oct 17 02:30:41 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 16 Oct 2007 22:30:41 -0400
Subject: [Biopython-dev] CVS to SVN
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk> <7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu> <4714EB34.8000207@maubp.freeserve.co.uk>
<6DFB6FBB-CC55-41D1-8D35-4906E6B502CF@northwestern.edu>
<471526A9.1010709@maubp.freeserve.co.uk>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B63B@mail2.exch.c2b2.columbia.edu>
> > Does that mean most developers don't WANT to move, or just that they
> > don't ACTIVELY want to move?
>
> Speaking for myself, I have no strong desire either way, and I don't
> think Michiel objected either (except over the timing).
I don't object to SVN, either now or later. If you want to do
the CVS->SVN conversion, just make sure to inform the developers when they
should pause making commits to CVS during the actual move.
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
-----Original Message-----
From: biopython-dev-bounces at lists.open-bio.org on behalf of Peter
Sent: Tue 10/16/2007 5:01 PM
To: Jared Flatow; biopython-dev at lists.open-bio.org
Subject: Re: [Biopython-dev] CVS to SVN
Jared Flatow wrote:
>> I would say one reason why we aren't charging ahead with a move
>> from CVS to subversion is only a few posters on this mailing list
>> actively WANT to move to subversion, and no-one has really
>> championed the move (yet).
>
> Does that mean most developers don't WANT to move, or just that they
> don't ACTIVELY want to move?
Going back over the archives, Chris Lasher was most vocal in supporting
the move, and there were a few other positive voices.
Speaking for myself, I have no strong desire either way, and I don't
think Michiel objected either (except over the timing). Then as now, we
are hoping to get the next release out "shortly", so after that would be
a good time to make the switch.
[I'm assuming we won't lose any revision history or comments, and that
things like the web based ViewCVS and its RSS feed will still be available]
Peter
_______________________________________________
Biopython-dev mailing list
Biopython-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython-dev
From mdehoon at c2b2.columbia.edu Wed Oct 17 02:45:34 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 16 Oct 2007 22:45:34 -0400
Subject: [Biopython-dev] SeqRecord to file format as string
References: <0616CDF3-C4CB-4954-916C-A307A9CB9DD0@northwestern.edu> <47147341.4020708@maubp.freeserve.co.uk>
<7981A30E-BA08-4748-8FA3-4D7B82AF0F59@northwestern.edu>
<4714EB8E.3000700@maubp.freeserve.co.uk>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B63C@mail2.exch.c2b2.columbia.edu>
How about the following:
SeqIO.write(sequences, handle, format) returns the properly formatted string
if handle==None.
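Michiel's alternative, sketched with a toy FASTA-only writer (this write() is a hypothetical stand-in, not Bio.SeqIO.write): when handle is None the function writes into an io.StringIO and returns its contents; otherwise it writes to the handle and returns None.

```python
import io
from typing import Iterable, Optional, TextIO, Tuple


def write(records: Iterable[Tuple[str, str]], handle: Optional[TextIO],
          fmt: str) -> Optional[str]:
    """Write (id, sequence) pairs to handle, or return the text if handle is None."""
    if fmt != "fasta":
        raise ValueError(f"this sketch only knows 'fasta', not {fmt!r}")
    target = io.StringIO() if handle is None else handle
    for record_id, seq in records:
        target.write(f">{record_id}\n{seq}\n")
    # Only the handle=None case hands the formatted text back to the caller.
    return target.getvalue() if handle is None else None


print(write([("a", "ACGT"), ("b", "GG")], None, "fasta"), end="")
```

One trade-off of this design is that the return type depends on an argument's value; a separately named function that always returns a string would avoid that, at the cost of one more name in the API.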
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
-----Original Message-----
From: biopython-dev-bounces at lists.open-bio.org on behalf of Peter
Sent: Tue 10/16/2007 12:49 PM
To: Jared Flatow
Cc: biopython-dev at lists.open-bio.org
Subject: Re: [Biopython-dev] SeqRecord to file format as string
>> Did you know you can write to a string using any Bio.SeqIO supported
>> file format using StringIO? Perhaps we should spell this out more
>> explicitly in the documentation, but a motivating example would help.
>
> This is what I do now, but it seems like a hack to me to go this
> route. To always have to write to a file feels strange, but I see
> that it would be messy to go OO since there are so many formats.
> However, giving preference to fasta over other formats by making it
> innate doesn't seem like such a terrible idea. I do have mixed
> feelings about 'bloating' the code which is why I asked, and you have
> convinced me that this is not quite appropriate given existing
> convention. However the idea would be to put the to_fasta or
> to_format method inside the SeqRecord, then to call it from the IO
> when needed to actually write to a file, but call it directly when
> all that is wanted is a string...
It's debatable, isn't it? I suspect that for most users, when they want a
record in a particular file format it's for writing to a file. However,
adding a to_format() method to a SeqRecord makes some sense (suitable for
sequential file formats only). This would take a format name and return
a string, by calling Bio.SeqIO with a StringIO object internally.
Peter
From bugzilla-daemon at portal.open-bio.org Wed Oct 17 07:01:53 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 17 Oct 2007 03:01:53 -0400
Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in
(Reverse)complement functions in Bio.Seq
In-Reply-To:
Message-ID: <200710170701.l9H71rML002584@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2366
mdehoon at ims.u-tokyo.ac.jp changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |REOPENED
Resolution|FIXED |
------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2007-10-17 03:01 EST -------
The Biopython test currently fails:
======================================================================
FAIL: test_seq
----------------------------------------------------------------------
Traceback (most recent call last):
File "run_tests.py", line 151, in runTest
self.runSafeTest()
File "run_tests.py", line 188, in runSafeTest
expected_handle)
File "run_tests.py", line 289, in compare_output
"\nOutput : "+`output_line` + "\nExpected: "+`expected_line`
AssertionError:
Output : "Seq('ACBDGHKMNSRUWVYX', Alphabet()) -> Seq('XRBWAYSNKMDCHVGU', Alphabet())\n"
Expected: "Seq('ACBDGHKMNSRUWVYX', Alphabet()) -> Seq('XYVWARSNMKHCDBGU', Alphabet())\n"
----------------------------------------------------------------------
This is with a fresh checkout from CVS.
From bugzilla-daemon at portal.open-bio.org Wed Oct 17 08:01:12 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 17 Oct 2007 04:01:12 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200710170801.l9H81CVn005428@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-17 04:01 EST -------
> Given the various translate functions we already have in
> Biopython, why do you want to add another one? Is there
> something the "translate" method can do that the "translate"
> function cannot? Since the "translate" function can take Seq
> objects as well as simple strings, I'd prefer the "translate"
> function over a "translate" method.
It's a style thing, allowing a more object-oriented coding
style. For comparison, look at the evolution of Python's string module,
whose functions were phased out in favour of string object methods.
In terms of capabilities/arguments, I think the Bio.Seq.translate()
function and the suggested new Bio.Seq.Seq.translate() object
method should be equivalent. In fact, I would have one call the other
internally.
Currently, if you work with strings, you can use the following nice
concise style:
from Bio import Seq #The module
my_str = "ACTGACCGTGC"
print Seq.translate(my_str)
However, if you want to use Seq objects, this becomes rather a mess (in my
opinion):
from Bio import Seq #The module
from Bio.Alphabet.IUPAC import unambiguous_dna
my_seq = Seq.Seq("ACTGACCGTGC", unambiguous_dna)
print Seq.translate(my_seq)
I would like to be able to do this, using an object method:
from Bio.Seq import Seq
from Bio.Alphabet.IUPAC import unambiguous_dna
my_seq = Seq("ACTGACCGTGC", unambiguous_dna)
print my_seq.translate()
Another bonus for people who think in OO terms is that dir(my_seq) would
list these useful methods. Right now the user has to know to go
looking in the Bio.Seq module for a function.
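A minimal sketch of the "one calls the other internally" idea, assuming a toy class `MiniSeq` (hypothetical, not Biopython's Seq) and a hand-built standard genetic code (NCBI table 1). The module-level `translate()` function does the work and the method just forwards to it, so the two stay equivalent and `dir()` shows the method:

```python
# Standard genetic code (NCBI table 1), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(sequence):
    """Module-level function: works on plain strings or anything str() converts."""
    dna = str(sequence).upper().replace("U", "T")
    # Only complete codons are translated; a trailing partial codon is ignored
    # (an assumption). Ambiguous letters would raise KeyError in this sketch.
    return "".join(CODON_TABLE[dna[i:i + 3]]
                   for i in range(0, len(dna) - len(dna) % 3, 3))


class MiniSeq:
    """Toy stand-in for a Seq object (hypothetical, for illustration only)."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        return self.data

    def translate(self):
        # The method simply delegates to the function.
        return MiniSeq(translate(self))


my_seq = MiniSeq("ACTGACCGTGC")
print(my_seq.translate())          # TDR
print("translate" in dir(my_seq))  # True
```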
From bugzilla-daemon at portal.open-bio.org Wed Oct 17 08:06:51 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 17 Oct 2007 04:06:51 -0400
Subject: [Biopython-dev] [Bug 2382] Generic Roche or GSFlex "FASTA" parser
In-Reply-To:
Message-ID: <200710170806.l9H86ppn006217@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Generic FASTA parser |Generic Roche or GSFlex
| |"FASTA" parser
------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-17 04:06 EST -------
Now that I'm clear where these files come from, I would agree with Michiel that
a separate Bio.GSFlex or Bio.Roche module would make more sense. I've added
these keywords to the bug summary (to help anyone searching in future).
P.S. From Michiel's example, you could use the existing SeqRecord annotations
dictionary if you wanted to avoid adding a new attribute to the objects on the
fly.
for seqrecord, quality in zip(seqrecords, qualities):
#seqrecord.quality = quality
#If you would rather use the existing dictionary:
seqrecord.annotations["quality"] = quality
From bugzilla-daemon at portal.open-bio.org Wed Oct 17 08:46:41 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 17 Oct 2007 04:46:41 -0400
Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in
(Reverse)complement functions in Bio.Seq
In-Reply-To:
Message-ID: <200710170846.l9H8kfYq008185@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2366
------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-17 04:46 EST -------
Fixed, I think.
I had some RNA/DNA with the U and T the wrong way round, and when I recently
tweaked the alphabet detection in Bio/Seq.py this had an impact.
The root issue is that we don't check that the sequence's letters agree with the
Alphabet when creating a Seq object (where the Alphabet supplied has an
explicit list of letters). That would have caught the error in the test suite
much earlier. Maybe I should file an enhancement bug on this issue.
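The missing check could look something like this minimal sketch. It assumes the alphabet exposes its permitted letters as a plain string. The names `check_letters` and `UNAMBIGUOUS_DNA_LETTERS` are hypothetical and not part of Bio.Alphabet:

```python
UNAMBIGUOUS_DNA_LETTERS = "GATC"  # hypothetical letter set for unambiguous DNA


def check_letters(sequence, letters):
    """Return the sequence unchanged, or raise ValueError on foreign letters."""
    bad = set(sequence.upper()) - set(letters.upper())
    if bad:
        raise ValueError("Letters %s not in alphabet %r"
                         % ("".join(sorted(bad)), letters))
    return sequence


check_letters("ACGT", UNAMBIGUOUS_DNA_LETTERS)  # accepted
try:
    # An RNA 'U' in a sequence declared as DNA is caught at creation time.
    check_letters("ACGU", UNAMBIGUOUS_DNA_LETTERS)
except ValueError as err:
    print(err)
```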
From bugzilla-daemon at portal.open-bio.org Wed Oct 17 15:20:51 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 17 Oct 2007 11:20:51 -0400
Subject: [Biopython-dev] [Bug 2382] Generic Roche or GSFlex "FASTA" parser
In-Reply-To:
Message-ID: <200710171520.l9HFKpXj030514@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
------- Comment #11 from jflatow at northwestern.edu 2007-10-17 11:20 EST -------
Putting all this stuff in a separate module sounds very reasonable to me. I
would like to implement the design we have been discussing, and I will name it
whatever you think is appropriate, but I would like to do it the way that seems
*right* to me. I mean by that building off of streams of
>header
data
...
since I think this pattern could eventually be used to actually clean up the
rest of the FASTA stuff, not make it worse. I also believe there could
potentially be other instances when a more general FASTA parser would be
useful, even if we don't know of them yet. How does this sound?
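A short sketch of a parser built on the ">header / data" stream pattern. It assumes that records start with ">" and that data lines are simply concatenated. The `joiner` parameter is a hypothetical way to reuse the same loop for space-separated quality files. `parse_records` is an illustrative name, not an existing Biopython function:

```python
def parse_records(lines, joiner=""):
    """Yield (header, data) pairs from '>header' lines followed by data lines.

    joiner="" suits sequence data; joiner=" " suits space-separated quality scores.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, joiner.join(chunks)
            header, chunks = line[1:], []
        # Data lines before the first header, and blank lines, are skipped (assumption).
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, joiner.join(chunks)


print(list(parse_records(">read1\nACGT\nTT\n>read2\nGG\n".splitlines())))
# [('read1', 'ACGTTT'), ('read2', 'GG')]
```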
From bugzilla-daemon at portal.open-bio.org Thu Oct 18 00:46:19 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 17 Oct 2007 20:46:19 -0400
Subject: [Biopython-dev] [Bug 2382] Generic Roche or GSFlex "FASTA" parser
In-Reply-To:
Message-ID: <200710180046.l9I0kJfN027373@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2382
------- Comment #12 from mdehoon at ims.u-tokyo.ac.jp 2007-10-17 20:46 EST -------
> I also believe there could
> potentially be other instances when a more general FASTA parser would be
> useful, even if we don't know of them yet. How does this sound?
To me, it sounds premature to create a general Fasta parser if we don't know
other instances where it might be useful. (For comparison, note that
Biopython's general parser framework described in section 5.3 of the tutorial
is hardly used in recent Biopython parsers). By all means, keep the module
short and simple.
From bugzilla-daemon at portal.open-bio.org Thu Oct 18 04:33:34 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 18 Oct 2007 00:33:34 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200710180433.l9I4XYeY004357@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2007-10-18 00:33 EST -------
If we add translate, transcribe methods to Seq objects, can we then deprecate
Bio.Transcribe, Bio.Translate?
From bugzilla-daemon at portal.open-bio.org Thu Oct 18 05:21:15 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 18 Oct 2007 01:21:15 -0400
Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with
egenix mxTextTools 3.0
In-Reply-To:
Message-ID: <200710180521.l9I5LFVS006209@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2361
------- Comment #36 from mdehoon at ims.u-tokyo.ac.jp 2007-10-18 01:21 EST -------
Looking at the test_GenBankFormat failure again. This is the only test that
fails with the Biopython currently in CVS, using mxTextTools 3.0.
This test is the only test for Bio.expressions. If we remove
test_GenBankFormat, we should deprecate Bio.expressions. Of all the Biopython
tests, only test_format_registry depends on Bio.expressions. This test relies
on the function _load_registries() in Bio/__init__.py. Skipping this function
call in Bio/__init__.py affects no other Biopython test.
So I'd like to suggest the following for the upcoming release:
-) Remove test_GenBankFormat.py and test_format_registry.py
-) Add DeprecationWarnings to Bio.expressions.
-) Skip the call to _load_registries() in Bio/__init__.py
We then have a working Biopython again with the recent mxTextTools, with
minimal disruptive changes to the code.
Any objections?
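The "Add DeprecationWarnings" step usually comes down to one `warnings.warn` call at module level. This sketch wraps that call in a hypothetical helper so the behaviour can be shown outside a real module, because the actual Bio.expressions code isn't shown here. Recent Python versions hide DeprecationWarning by default in most contexts, so the demo turns it on explicitly:

```python
import warnings


def _warn_deprecated(module_name):
    """What a deprecated module could call at import time (hypothetical helper)."""
    warnings.warn("%s is deprecated and will be removed in a future release."
                  % module_name, DeprecationWarning, stacklevel=2)


# Inside Bio/expressions/__init__.py this would sit at module level, e.g.
# _warn_deprecated("Bio.expressions")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    _warn_deprecated("Bio.expressions")
print(caught[0].category.__name__)  # DeprecationWarning
```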
From bugzilla-daemon at portal.open-bio.org Thu Oct 18 10:35:01 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 18 Oct 2007 06:35:01 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200710181035.l9IAZ1DH022693@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-18 06:35 EST -------
Michiel in comment 6 wrote:
> If we add translate, transcribe methods to Seq objects, can we then
> deprecate Bio.Transcribe, Bio.Translate?
Bio.Transcribe - Yes
====================
Bio.Transcribe is so trivial we could recreate that code in Bio.Seq and then
deprecate Bio.Transcribe without losing any functionality:
- transcribe
- back_transcribe
Bio.Translate - Maybe
=====================
Initially I was just proposing to add:
- translate (including all stop codons)
I was simply going to have Bio.Seq call Bio.Translate to do the work.
It would be nice to simplify Biopython by also deprecating Bio.Translate, but
if we want to do this without loss of current functionality we need to consider
including the following in Bio.Seq:
- translate_to_stop (all amino acids up to but excluding the first stop)
- back_translate (gives a single possible nucleotide sequence)
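One way to cover translate_to_stop without a separate function is a flag on translate. The sketch below assumes the standard genetic code (NCBI table 1). The `to_stop` argument name is this sketch's choice, not a confirmed Biopython signature:

```python
# Standard genetic code (NCBI table 1), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES),
                       AMINO_ACIDS))


def translate(sequence, to_stop=False):
    """Translate complete codons; with to_stop=True, stop before the first stop codon."""
    dna = str(sequence).upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if to_stop and amino_acid == "*":
            break  # exclude the first stop and everything after it
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GCCGCATGCATAGATAGATAG"))                # AACIDR*
print(translate("GCCGCATGCATAGATAGATAG", to_stop=True))  # AACIDR
```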
From biopython-dev at maubp.freeserve.co.uk Thu Oct 18 20:06:10 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Thu, 18 Oct 2007 21:06:10 +0100
Subject: [Biopython-dev] BioSQL documentation
Message-ID: <4717BCB2.2070301@maubp.freeserve.co.uk>
I was just having a look at:
http://biopython.org/DIST/docs/biosql/python_biosql_basic.html
The source for this HTML and PDF document lives here in the BioSQL CVS:
biosql-schema/doc/biopython/python_biosql_basic.tex
It would be nice to update the following section:
> 3.3 Loading a GenBank file into the database
>
> ...
>
> Now we want to do the loading of the file into the database. The
> Biopython implementation works by taking a standard Iterator object
> that returns Biopython SeqFeature objects and then doing the loading.
I think that should actually say "... that returns Biopython SeqRecord
objects containing SeqFeature objects ..."
> ... our GenBank file, which we can do with the following code:
>
>>>> from Bio import GenBank
>>>> parser = GenBank.FeatureParser()
>>>> iterator = GenBank.Iterator(open("cor6_6.gb"), parser)
That can now be done with Bio.SeqIO which should be clearer, and
hopefully make it easier to see how to extend this to SwissProt etc:
from Bio import SeqIO
iterator = SeqIO.parse(open("cor6_6.gb"), "genbank")
I would do this myself, but I don't have a BioSQL database set up
right now. It would also be nice if the documentation didn't skip the
bit about setting up the database with the BioSQL schema, or at least
had links to the relevant BioSQL documentation.
Peter
From bugzilla-daemon at portal.open-bio.org Fri Oct 19 02:15:01 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 18 Oct 2007 22:15:01 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200710190215.l9J2F1bo006275@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp 2007-10-18 22:15 EST -------
> It would be nice to simplify Biopython by also deprecating Bio.Translate,
To avoid a plethora of translation functions, I would prefer that.
> but if we want to do this without loss of current functionality we
> need to consider including the following in Bio.Seq:
> - translate_to_stop (all amino acids up to but excluding the first stop)
Whether or not to stop translating at the first stop codon could be an argument
to the translate method. As an alternative, it may be preferable to have a
split() method that splits the sequences at the stop codons. Such a method
could be applied to all protein sequences, not only those created by
translate().
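The split() idea can be sketched on plain strings, assuming stops are written as "*". Empty pieces are kept on purpose, so the first piece always equals the translate-to-stop result, even when the protein starts with a stop. `split_at_stops` is a hypothetical name:

```python
def split_at_stops(protein, stop="*"):
    """Split a protein string at every stop symbol, keeping empty pieces."""
    return str(protein).split(stop)


print(split_at_stops("MKV*MLL**A"))   # ['MKV', 'MLL', '', 'A']
print(split_at_stops("AACIDR*")[0])   # AACIDR (same as translating to the first stop)
```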
> - back_translate (gives a single possible nucleotide sequence)
Does anybody actually use this function?
From salish at picasso.ucsf.edu Fri Oct 19 07:12:53 2007
From: salish at picasso.ucsf.edu (Howard Salis)
Date: Fri, 19 Oct 2007 00:12:53 -0700
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for
the Seq object (in Bio.Seq)
In-Reply-To: <200710190215.l9J2F1bo006275@portal.open-bio.org>
References:
<200710190215.l9J2F1bo006275@portal.open-bio.org>
Message-ID: <9fa7e98e0710190012t52ceb9dbx564ba3720d4359f2@mail.gmail.com>
Yes. Back-translating a sequence is important in codon optimization,
searching for homologous proteins, etc.
> > - back_translate (gives a single possible nucleotide sequence)
> Does anybody actually use this function?
From bugzilla-daemon at portal.open-bio.org Fri Oct 19 12:38:59 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 19 Oct 2007 08:38:59 -0400
Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with
egenix mxTextTools 3.0
In-Reply-To:
Message-ID: <200710191238.l9JCcx4I001886@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2361
------- Comment #37 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-19 08:38 EST -------
I would have suggested adding an mxTextTools version check to
test_GenBankFormat.py and raising the missing external dependency error if 3.0 is
found. That would "solve" the problem test case, and after the next release we
could begin the process of deprecating the modules you suggested.
But I'm OK with your suggestion Michiel (comment 36), although it seems a
little drastic.
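The version-check alternative might look like this sketch. It assumes the installed mxTextTools version is available as a string. How to read it from the package is not assumed here. `MissingExternalDependencyError` is a local stand-in for the test suite's "missing dependency" exception, and `require_old_mxtexttools` is a hypothetical name:

```python
class MissingExternalDependencyError(Exception):
    """Local stand-in for the test suite's missing-dependency exception."""


def require_old_mxtexttools(version_string):
    """Raise (so the test is skipped) if mxTextTools 3.0 or later is installed."""
    major = int(version_string.split(".")[0])
    if major >= 3:
        raise MissingExternalDependencyError(
            "test_GenBankFormat does not work with mxTextTools %s" % version_string)


require_old_mxtexttools("2.0.6")  # older release: test runs as normal
try:
    require_old_mxtexttools("3.0.0")
except MissingExternalDependencyError as err:
    print(err)
```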
From biopython-dev at maubp.freeserve.co.uk Fri Oct 19 08:08:41 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Fri, 19 Oct 2007 09:08:41 +0100
Subject: [Biopython-dev] [Bug 2381] back-translate
In-Reply-To: <9fa7e98e0710190012t52ceb9dbx564ba3720d4359f2@mail.gmail.com>
References: <200710190215.l9J2F1bo006275@portal.open-bio.org>
<9fa7e98e0710190012t52ceb9dbx564ba3720d4359f2@mail.gmail.com>
Message-ID: <47186609.3090408@maubp.freeserve.co.uk>
Howard Salis wrote:
> Yes. Back-translating a sequence is important in codon optimization,
> searching for homologous proteins, etc.
Unlike forward translation, transcription, back-transcription,
complements and reverse complements, back-translation is not a
one-to-one mapping.
In your examples, would you want:
- all possible back translations (as unambiguous nucleotides)
- all possible back translations (as ambiguous nucleotides)
- a possible back translation (using ambiguous nucleotides)
- a possible back translation (using un-ambiguous nucleotides)
For example, back translating Tyr => UAC or UAU => UAY (nice and
clear - we can represent this perfectly with a single ambiguous codon).
On the other hand, Arg => AGA, AGG, CGA, CGC, CGG, CGU => AGR or CGN
Oh, and would you expect DNA or RNA back?
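The Tyr/Arg examples can be reproduced with a sketch that groups an amino acid's codons by their first two bases and collapses the third position to one IUPAC code. This gives an exact set of ambiguous codons, with no extra codons admitted. It assumes the standard genetic code and returns DNA. For RNA, replace T with U. `ambiguous_codons` is a hypothetical name:

```python
# Standard genetic code (NCBI table 1), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES),
                       AMINO_ACIDS))

# IUPAC code for each (sorted) set of bases.
IUPAC_CODE = {"A": "A", "C": "C", "G": "G", "T": "T",
              "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
              "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N"}


def ambiguous_codons(amino_acid):
    """Exact ambiguous-codon representation of all codons for one amino acid."""
    third_bases_by_prefix = {}
    for codon, aa in CODON_TABLE.items():
        if aa == amino_acid:
            third_bases_by_prefix.setdefault(codon[:2], set()).add(codon[2])
    return sorted(prefix + IUPAC_CODE["".join(sorted(thirds))]
                  for prefix, thirds in third_bases_by_prefix.items())


print(ambiguous_codons("Y"))  # ['TAY']
print(ambiguous_codons("R"))  # ['AGR', 'CGN']
```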
Peter
From salish at picasso.ucsf.edu Fri Oct 19 16:31:49 2007
From: salish at picasso.ucsf.edu (Howard Salis)
Date: Fri, 19 Oct 2007 09:31:49 -0700
Subject: [Biopython-dev] [Bug 2381] back-translate
In-Reply-To: <47186609.3090408@maubp.freeserve.co.uk>
References:
<200710190215.l9J2F1bo006275@portal.open-bio.org>
<9fa7e98e0710190012t52ceb9dbx564ba3720d4359f2@mail.gmail.com>
<47186609.3090408@maubp.freeserve.co.uk>
Message-ID: <9fa7e98e0710190931q3b589488p55b8863cc0e38380@mail.gmail.com>
Yes, I know it's a one-to-many mapping. But that's why it's nice to
have a handy subroutine for doing it.
For codon optimization, all possible back translations with
unambiguous nucleotides would be best. Then, one evaluates some
objective function over all possible sequences to find an optimal one.
Optimality depends on the application, but eliminating restriction
sites, avoiding certain repetitive or transposon sequences, etc is
very common.
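The codon-optimization workflow described here (enumerate every unambiguous back translation, then filter or score) can be sketched with `itertools.product`. The standard genetic code is assumed. The objective, avoiding an EcoRI site (GAATTC), is only an example. The candidate count is the product of the codon counts per residue, so this only works for short peptides. Real tools need heuristics:

```python
import itertools

# Standard genetic code (NCBI table 1), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES),
                       AMINO_ACIDS))
CODONS_FOR = {}
for codon, aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)


def all_back_translations(protein):
    """Yield every unambiguous DNA sequence encoding the protein (grows very fast)."""
    choices = [sorted(CODONS_FOR[aa]) for aa in protein]
    for combo in itertools.product(*choices):
        yield "".join(combo)


def avoids_ecori(dna):
    """Example objective (an assumption): reject sequences containing GAATTC."""
    return "GAATTC" not in dna


candidates = [s for s in all_back_translations("EF") if avoids_ecori(s)]
print(candidates)  # ['GAATTT', 'GAGTTC', 'GAGTTT']
```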
For searching for homologous proteins, it would be best to have the
back-translate function produce something that could be fed into an
alignment program or regular expression. Then, one could align a
database of sequences with your back-translated protein to determine
which sequence is most similar to your protein. Basically, this is
what BlastP does (you might want to look up its algorithm to determine
a good way of doing this, if you wish to reproduce it in Biopython or,
otherwise, rely on the NCBI webserver).
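For the regular-expression route, a back translation can be turned into a pattern with one alternation group per residue. This sketch assumes the standard genetic code and DNA input. It finds only exact matches, with no mismatches or scoring as an alignment program would have. `back_translation_regex` is a hypothetical name:

```python
import re

# Standard genetic code (NCBI table 1), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES),
                       AMINO_ACIDS))
CODONS_FOR = {}
for codon, aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)


def back_translation_regex(protein):
    """Build a regex matching any DNA sequence that encodes the protein."""
    parts = []
    for aa in protein:
        codons = sorted(CODONS_FOR[aa])
        parts.append(codons[0] if len(codons) == 1 else "(?:%s)" % "|".join(codons))
    return "".join(parts)


pattern = back_translation_regex("MY")
print(pattern)                                       # ATG(?:TAC|TAT)
print(re.search(pattern, "CCATGTATGG") is not None)  # True
```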
What does the current back-translate function output?
-Howard
From bugzilla-daemon at portal.open-bio.org Mon Oct 22 09:07:05 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 22 Oct 2007 05:07:05 -0400
Subject: [Biopython-dev] [Bug 2366] Ambiguous nucleotides in
(Reverse)complement functions in Bio.Seq
In-Reply-To:
Message-ID: <200710220907.l9M975hw013729@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2366
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|REOPENED |RESOLVED
Resolution| |FIXED
------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-22 05:07 EST -------
Marking as fixed
From biopython-dev at maubp.freeserve.co.uk Mon Oct 22 12:30:59 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Mon, 22 Oct 2007 13:30:59 +0100
Subject: [Biopython-dev] [Bug 2381] back-translate
In-Reply-To: <9fa7e98e0710190931q3b589488p55b8863cc0e38380@mail.gmail.com>
References: <200710190215.l9J2F1bo006275@portal.open-bio.org> <9fa7e98e0710190012t52ceb9dbx564ba3720d4359f2@mail.gmail.com> <47186609.3090408@maubp.freeserve.co.uk>
<9fa7e98e0710190931q3b589488p55b8863cc0e38380@mail.gmail.com>
Message-ID: <471C9803.8050709@maubp.freeserve.co.uk>
Howard Salis wrote:
>
> What does the current back-translate function output?
>
Short example,
>>> from Bio import Translate
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet.IUPAC import unambiguous_dna
>>> my_dna = Seq("GCCGCATGCATAGATAGATAG", unambiguous_dna)
>>> my_prot = Translate.unambiguous_dna_by_id[11].translate(my_dna)
>>> my_prot
Seq('AACIDR*', HasStopCodon(IUPACProtein(), '*'))
>>> Translate.unambiguous_dna_by_id[11].back_translate(my_prot)
Seq('GCTGCTTGTATTGATCGTTAA', IUPACUnambiguousDNA())
>>> my_dna
Seq('GCCGCATGCATAGATAGATAG', IUPACUnambiguousDNA())
i.e. The current back_translate picks one possible back translation.
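"Pick one possible back translation" needs some deterministic rule. This sketch simply takes the alphabetically first codon for each residue. That is an arbitrary assumption, and it won't necessarily match the choices Bio.Translate makes above. A codon-usage table would be a more realistic rule. The round trip through `translate` checks that the result is a valid back translation:

```python
# Standard genetic code (NCBI table 1), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES),
                       AMINO_ACIDS))
CODONS_FOR = {}
for codon, aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)


def translate(dna):
    """Translate complete codons of an unambiguous DNA string."""
    return "".join(CODON_TABLE[dna[i:i + 3]]
                   for i in range(0, len(dna) - len(dna) % 3, 3))


def back_translate(protein):
    """Return one DNA sequence encoding the protein (alphabetically first codon)."""
    return "".join(min(CODONS_FOR[aa]) for aa in protein)


dna = back_translate("AACIDR*")
print(dna)             # GCAGCATGCATAGACAGATAA
print(translate(dna))  # AACIDR*
```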
Peter
From bugzilla-daemon at portal.open-bio.org Mon Oct 22 16:52:12 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 22 Oct 2007 12:52:12 -0400
Subject: [Biopython-dev] [Bug 2386] New: Bio.Seq.Seq and MutableSeq count()
method only works for single residues
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2386
Summary: Bio.Seq.Seq and MutableSeq count() method only works for
single residues
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: minor
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
Logging this bug to report an issue raised on the mailing list by Jimmy
Musselwhite.
The Seq and MutableSeq objects' count methods only work for single
residues (e.g. "G"). Zero is returned when asked to count a multicharacter
string, "GG" for example.
For compatibility with strings, my_seq.count("GG") should work as expected,
returning the same value as my_seq.tostring().count("GG")
Doing this for the Seq object is trivial. Adding support for the MutableSeq
could be done via the tostring() method but there might be a more elegant
solution with less overhead.
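A sketch of a multi-character count that works directly on a list of characters, assuming a MutableSeq-like object backed by such a list (`count_in_chars` is a hypothetical name). It follows `str.count` semantics, which means non-overlapping matches, so "GGGG" contains "GG" twice, not three times. Whether this beats `"".join(...).count(...)` would need measuring. In CPython the join is often quicker:

```python
def count_in_chars(chars, sub):
    """Non-overlapping count of sub in a list of single characters (str.count semantics)."""
    if not sub:
        return len(chars) + 1  # str.count("") also returns len + 1
    target, width = list(sub), len(sub)
    found, i = 0, 0
    while i <= len(chars) - width:
        if chars[i:i + width] == target:
            found += 1
            i += width  # skip past the match: matches may not overlap
        else:
            i += 1
    return found


mutable = list("ACGGTGGG")
print(count_in_chars(mutable, "GG"))  # 2
print("".join(mutable).count("GG"))   # 2, the tostring()-style alternative
```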
From mdehoon at c2b2.columbia.edu Wed Oct 24 00:46:58 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 23 Oct 2007 20:46:58 -0400
Subject: [Biopython-dev] Removing deprecated functionality
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B640@mail2.exch.c2b2.columbia.edu>
Hi everybody,
We now have a fixed Biopython in CVS that works with both the old and the new
mxTextTools. I am planning to create the new Biopython release later this
week.
Bio.Kabat and the blast and blasturl functions in Bio.Blast.NCBIWWW were
deprecated in previous Biopython releases. The two functions in Bio.Blast.NCBIWWW have
been deprecated in favor of qblast starting with Biopython 1.40b (February
2005). I am planning to remove this functionality for release 1.44 -- let us
know if this would cause you some problems.
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From bugzilla-daemon at portal.open-bio.org Thu Oct 25 16:58:19 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 25 Oct 2007 12:58:19 -0400
Subject: [Biopython-dev] [Bug 2361] Test Suite Failures from Martel/Sax with
egenix mxTextTools 3.0
In-Reply-To:
Message-ID: <200710251658.l9PGwJgB007432@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2361
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #38 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-25 12:58 EST -------
Marking as fixed, Michiel made the changes outlined in comment 36 in CVS.
From bugzilla-daemon at portal.open-bio.org Thu Oct 25 17:02:50 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 25 Oct 2007 13:02:50 -0400
Subject: [Biopython-dev] [Bug 2374] Updated Bio.lcc code.
In-Reply-To:
Message-ID: <200710251702.l9PH2oC8008104@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2374
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Uppdated lcc code. |Updated Bio.lcc code.
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-25 13:02 EST -------
Sebastian - any feedback on my above comment?
P.S. Corrected spelling in bug title.
From bugzilla-daemon at portal.open-bio.org Thu Oct 25 17:22:46 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 25 Oct 2007 13:22:46 -0400
Subject: [Biopython-dev] [Bug 2374] Updated Bio.lcc code.
In-Reply-To:
Message-ID: <200710251722.l9PHMkFm009816@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2374
------- Comment #4 from sbassi at gmail.com 2007-10-25 13:22 EST -------
(In reply to comment #3)
> Sebastian - any feedback on my above comment?
>
> P.S. Corrected spelling in bug title.
>
I do agree with most of your comments, but I can't implement them right now
because of my current workload.
LCC stands for Local Composition Complexity (see here
http://mrw.interscience.wiley.com/emrw/9780470015902/els/article/a0005260/current/abstract)
Please move it to Bio/SeqUtils/.
I don't know the values for ambiguous nucleotides; I will check this for the next
version.
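For readers unfamiliar with LCC, here is a sketch of the idea. It assumes LCC for a sequence is the base-4 Shannon entropy of its A/C/G/T composition, which ranges from 0 for a single repeated base to 1 for equal amounts of all four. We believe this is what the simple variant in the lcc module computes, but check the module itself. Ambiguous letters here still count toward the length but contribute no term. That is one arbitrary choice for the open question above:

```python
import math


def lcc_simple(seq):
    """Base-4 Shannon entropy of nucleotide composition (assumed LCC definition)."""
    s = str(seq).upper()
    if not s:
        raise ValueError("Empty sequence")
    total = 0.0
    for base in "ACGT":
        p = s.count(base) / float(len(s))
        if p > 0:
            total -= p * math.log(p, 4)
    return total


print(lcc_simple("ACGT"))  # 1.0 (maximum complexity)
print(lcc_simple("AAAA"))  # 0.0 (minimum complexity)
```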
From bugzilla-daemon at portal.open-bio.org Thu Oct 25 18:01:50 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 25 Oct 2007 14:01:50 -0400
Subject: [Biopython-dev] [Bug 2374] Updated Bio.lcc code.
In-Reply-To:
Message-ID: <200710251801.l9PI1oRF012742@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2374
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-25 14:01 EST -------
I've checked this in as Bio/SeqUtils/lcc.py (and deprecated Bio/lcc.py which
had a slightly different API since you dropped the start/end options in
lcc_mult).
From bugzilla-daemon at portal.open-bio.org Thu Oct 25 18:25:49 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 25 Oct 2007 14:25:49 -0400
Subject: [Biopython-dev] [Bug 2374] Updated Bio.lcc code.
In-Reply-To:
Message-ID: <200710251825.l9PIPnEG015022@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2374
------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-25 14:25 EST -------
Also updated Bio/SeqUtils/lcc.py to cope with Seq and MutableSeq objects in
addition to strings.
Plus added a unit test, test_SeqUtils.py which covers both Bio.SeqUtils.lcc and
Bio.SeqUtils.CheckSum and replaces my older test_CheckSum.py
From bugzilla-daemon at portal.open-bio.org Thu Oct 25 22:03:15 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 25 Oct 2007 18:03:15 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200710252203.l9PM3For029293@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-25 18:03 EST -------
Created an attachment (id=795)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=795&action=view)
Rough patch to add methods to Bio.Seq
Part of this patch is to add ambiguous_generic_by_id and
ambiguous_generic_by_name entries to Bio.Data.CodonTable, variants of the
unambiguous generic_by_id and generic_by_name tables. Using this lets us
properly translate ambiguous sequences.
This patch does NOT tackle back_translate, or have special treatment of
start/stop codons, in the Seq methods.
This patch does NOT mark Bio.Translate or Bio.Transcribe as deprecated.
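The idea behind translating with ambiguous codon tables can be sketched as follows. Expand each IUPAC code in a codon, translate every concrete codon, and return the amino acid only if they all agree. The standard genetic code is assumed, and disagreement is reported as "X". A fuller table might use B (Asn/Asp) or Z (Gln/Glu) instead. `translate_ambiguous_codon` is a hypothetical name, not the patch's code:

```python
import itertools

# Standard genetic code (NCBI table 1), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES),
                       AMINO_ACIDS))

# Bases represented by each IUPAC nucleotide code (U treated as T).
IUPAC_BASES = {"A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
               "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
               "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}


def translate_ambiguous_codon(codon):
    """Translate one possibly ambiguous codon; 'X' if the possibilities disagree."""
    expansions = itertools.product(*(IUPAC_BASES[b] for b in codon.upper()))
    amino_acids = {CODON_TABLE["".join(c)] for c in expansions}
    return amino_acids.pop() if len(amino_acids) == 1 else "X"


print([translate_ambiguous_codon(c) for c in ("TAY", "CGN", "TRA", "MGN")])
# ['Y', 'R', '*', 'X']
```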
From mdehoon at c2b2.columbia.edu Fri Oct 26 02:30:56 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Thu, 25 Oct 2007 22:30:56 -0400
Subject: [Biopython-dev] CVS freeze
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B643@mail2.exch.c2b2.columbia.edu>
Hi everybody,
With the fixed Biopython now in CVS, I'm planning to make the new Biopython
release tomorrow (Saturday). I'd therefore like to ask you not to make
commits to CVS starting from 0:01 GMT on Saturday, until the new release is
out. If you make any commits before that time, please double-check that all
the Biopython tests still run. Also, if you have some important patches for
which you need more time, please let us know.
Thanks!
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From bsouthey at gmail.com Fri Oct 26 15:12:14 2007
From: bsouthey at gmail.com (Bruce Southey)
Date: Fri, 26 Oct 2007 10:12:14 -0500
Subject: [Biopython-dev] CVS freeze
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B643@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B643@mail2.exch.c2b2.columbia.edu>
Message-ID:
Hi,
Just in case you are not aware of it, UniProt is going to make a
substantial change to the DE line in SwissProt/TrEMBL format 'Not
before: 01-Feb-2008'. See
http://www.expasy.org/sprot/relnotes/sp_soon.html#DE
Bruce
From biopython-dev at maubp.freeserve.co.uk Fri Oct 26 15:24:57 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Oct 2007 16:24:57 +0100
Subject: [Biopython-dev] DE line in SwissProt/TrEMBL format
In-Reply-To:
References: <6243BAA9F5E0D24DA41B27997D1FD14402B643@mail2.exch.c2b2.columbia.edu>
Message-ID: <472206C9.6060407@maubp.freeserve.co.uk>
Bruce Southey wrote:
> Hi,
> Just in case you are not aware of it, UniProt is going to make a
> substantial change to the DE line in SwissProt/TrEMBL format 'Not
> before: 01-Feb-2008'. See
> http://www.expasy.org/sprot/relnotes/sp_soon.html#DE
>
> Bruce
Thanks for the heads up. I don't think we need to worry about that for
the planned release. We should still be able to parse the new files;
it's just that the new structured description will be stored as a single
concatenated string in Biopython.
Peter
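To illustrate what "stored as a single concatenated string" means, here is a minimal sketch that joins the text of all DE lines with spaces. The helper name `join_de_lines` is hypothetical (it is not Biopython's SwissProt parser), and the example DE lines are made up, assuming a structured RecName/Short/Flags layout along the lines of the announced change.

```python
def join_de_lines(lines):
    """Concatenate the text of all DE lines into one description string.

    Hypothetical helper: keeps only lines starting with the "DE" line code,
    drops the 5-character line prefix plus surrounding whitespace, and
    joins the pieces with single spaces.
    """
    parts = []
    for line in lines:
        if line.startswith("DE   "):
            parts.append(line[5:].strip())
    return " ".join(parts)


# Made-up record fragment; only the DE lines contribute to the description.
record_lines = [
    "DE   RecName: Full=Example protein 1;",
    "DE            Short=EP1;",
    "DE   Flags: Precursor;",
    "OS   Example organism.",
]
print(join_de_lines(record_lines))
# RecName: Full=Example protein 1; Short=EP1; Flags: Precursor;
```

The structure (which part is RecName, Short or Flags) is lost in the joined string, which is the trade-off Peter describes.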
From mdehoon at c2b2.columbia.edu Sat Oct 27 03:12:46 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Fri, 26 Oct 2007 23:12:46 -0400
Subject: [Biopython-dev] CVS freeze
References: <6243BAA9F5E0D24DA41B27997D1FD14402B643@mail2.exch.c2b2.columbia.edu>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B644@mail2.exch.c2b2.columbia.edu>
Thanks for letting us know. I think that it is OK though to make the release
now, as we'll probably have another release before the date the
SwissProt/TrEMBL format changes.
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From mdehoon at c2b2.columbia.edu Sun Oct 28 06:32:40 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Sun, 28 Oct 2007 02:32:40 -0400
Subject: [Biopython-dev] Biopython release 1.44 ready
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
Hi everybody,
Biopython release 1.44 is now available for download from the Biopython
website at http://biopython.org.
This release includes lots of code improvements and fixes in the Blast
interface and parsers, sequence input/output, the SwissProt parser, the
clustering routines, as well as a brand new module for population genetics.
For reasons of compatibility, some radical changes were necessary in some
parts of the code; please let us know if you find some functionality missing.
My thanks to all code contributors who made this new release possible.
--Michiel on behalf of the Biopython developers
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From mdehoon at c2b2.columbia.edu Sun Oct 28 06:35:12 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Sun, 28 Oct 2007 02:35:12 -0400
Subject: [Biopython-dev] Non-ASCII character in PopGen documentation
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B646@mail2.exch.c2b2.columbia.edu>
While making the 1.44 release, I noticed that a non-ASCII character in a
formula in the PopGen documentation was causing problems for Hevea. As I
couldn't guess what the formula should be, I replaced this formula by a
placeholder (see CVS). Can somebody have a look at this and try to fix it?
Thanks!
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From biopython-dev at maubp.freeserve.co.uk Sun Oct 28 09:43:56 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sun, 28 Oct 2007 09:43:56 +0000
Subject: [Biopython-dev] Biopython release 1.44 ready
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
Message-ID: <472459DC.3050907@maubp.freeserve.co.uk>
Michiel De Hoon wrote:
> Hi everybody,
>
> Biopython release 1.44 is now available for download from the Biopython
> website at http://biopython.org.
Grand. Thank you for dedicating your weekend to this Michiel.
A couple of questions. First, the main wiki page is locked and needs updating
to mention the new release. Who has access?
Secondly, I see you have updated the open-bio news feed,
http://news.open-bio.org/
What about http://biopython.org/news/ - which appears not to have been
used at all since it was started. Perhaps we can just have a filtered
view of http://news.open-bio.org/ here?
Related to this, on the wiki News page perhaps we can just show the last
few items from http://news.open-bio.org/index.rdf (even though some of
them are for other Bio* projects).
Peter
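A rough sketch of the "filtered view" idea: parse an RSS 1.0 (RDF) feed such as index.rdf and keep only the latest items whose title mentions a keyword. The function name and the keyword-in-title filter are assumptions for illustration; fetching the feed itself (e.g. with urllib) is left out.

```python
import xml.etree.ElementTree as ET

# In RSS 1.0 (RDF) feeds, <item> elements in this namespace sit directly
# under the <rdf:RDF> root element.
RSS10 = "{http://purl.org/rss/1.0/}"


def latest_items(rdf_text, keyword=None, limit=5):
    """Return up to `limit` (title, link) pairs from an RSS 1.0 feed.

    If keyword is given, only items whose title contains it (ignoring case)
    are kept. Items keep feed order, which for news feeds is usually
    newest first.
    """
    root = ET.fromstring(rdf_text)
    items = []
    for item in root.findall(RSS10 + "item"):
        title = item.findtext(RSS10 + "title", default="")
        link = item.findtext(RSS10 + "link", default="")
        if keyword is None or keyword.lower() in title.lower():
            items.append((title, link))
    return items[:limit]
```

Filtering on the title is the simplest choice; a real feed might offer a category element that would be more reliable.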
From tiagoantao at gmail.com Sun Oct 28 18:24:55 2007
From: tiagoantao at gmail.com (Tiago Antao)
Date: Sun, 28 Oct 2007 18:24:55 +0000
Subject: [Biopython-dev] Non-ASCII character in PopGen documentation
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B646@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B646@mail2.exch.c2b2.columbia.edu>
Message-ID: <4724D3F7.40105@gmail.com>
I currently have a different version of Tutorial.tex here (I have other
things already written in preparation for future versions). I don't know
how the non-ASCII character got there. The formula is:
\[ N_{m} = \frac{1 - F_{st}}{4F_{st}} \]
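The formula is the island-model approximation F_st = 1/(1 + 4Nm) solved for Nm. A small sketch for checking values by hand (the function name is made up for illustration, not part of Bio.PopGen):

```python
def migrants_per_generation(fst):
    """Estimate Nm, the number of migrants per generation, from Fst.

    Uses Nm = (1 - Fst) / (4 * Fst), obtained by rearranging the
    island-model approximation Fst = 1 / (1 + 4 * Nm).
    """
    if not 0 < fst <= 1:
        raise ValueError("Fst must be in the interval (0, 1]")
    return (1 - fst) / (4 * fst)


print(migrants_per_generation(0.5))  # 0.25
```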
--
tiagoantao at gmail.com
http://tiago.org/ps
From tiagoantao at gmail.com Sun Oct 28 20:54:06 2007
From: tiagoantao at gmail.com (Tiago Antao)
Date: Sun, 28 Oct 2007 20:54:06 +0000
Subject: [Biopython-dev] Biopython release 1.44 ready
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
Message-ID: <4724F6EE.50805@gmail.com>
Hi,
Michiel De Hoon wrote:
> This release includes lots of code improvements and fixes in the Blast
> interface and parsers, sequence input/output, the SwissProt parser, the
> clustering routines, as well as a brand new module for population genetics.
> For reasons of compatibility, some radical changes were necessary in some
> parts of the code; please let us know if you find some functionality missing.
Is it OK to resume uploading non-stable code to CVS? I have a few things
that I would like to add to the population genetics module (coalescent
simulation), but they still need polishing (mainly documentation and test
code).
Tiago
--
tiagoantao at gmail.com
http://tiago.org/ps
From biopython-dev at maubp.freeserve.co.uk Sun Oct 28 20:55:42 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sun, 28 Oct 2007 20:55:42 +0000
Subject: [Biopython-dev] Biopython release 1.44 ready
In-Reply-To: <4724F6EE.50805@gmail.com>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
<4724F6EE.50805@gmail.com>
Message-ID: <4724F74E.5010801@maubp.freeserve.co.uk>
Tiago Antao wrote:
> Is it OK to resume uploading non stable code to CVS? I have a few things
> that I would like to add to the population genetics module (coalescent
> simulation), but that still needs polishing (mainly documenting and test
> code).
Wait and see what Michiel says. However, perhaps you should hold off
a few more days, in case there are any teething problems with
Biopython 1.44 that would warrant making another release ASAP.
Peter
From biopython-dev at maubp.freeserve.co.uk Sun Oct 28 19:59:41 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sun, 28 Oct 2007 19:59:41 +0000
Subject: [Biopython-dev] mxTextTools optional?
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
Message-ID: <4724EA2D.3020609@maubp.freeserve.co.uk>
Michiel De Hoon wrote:
> Hi everybody,
>
> Biopython release 1.44 is now available for download from the Biopython
> website at http://biopython.org.
Given some of the changes (deprecations) made in this release, perhaps
we can now change setup.py so that mxTextTools is merely suggested, but
not required (the same way we treat reportlab and Numeric).
Any comments?
Peter
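A minimal sketch of "suggested but not required": try each optional package, warn if it is missing, and carry on instead of aborting. This is not Biopython's actual setup.py; the table of packages and the assumption that mxTextTools is importable as `mx.TextTools` are illustrative.

```python
import importlib
import warnings

# Hypothetical table: import name -> package name shown to the user.
# Assumes mxTextTools is importable as "mx.TextTools".
OPTIONAL_PACKAGES = {
    "mx.TextTools": "mxTextTools",
    "reportlab": "ReportLab",
    "Numeric": "Numeric",
}


def missing_optional_packages(packages=OPTIONAL_PACKAGES):
    """Warn about, and return the names of, optional packages that fail to import.

    A missing package does not stop the install; only the modules that
    need it would be unusable.
    """
    missing = []
    for import_name, package_name in packages.items():
        try:
            importlib.import_module(import_name)
        except ImportError:
            warnings.warn("%s not found; modules that need it will not work."
                          % package_name)
            missing.append(package_name)
    return missing


if __name__ == "__main__":
    print("Missing optional packages:", missing_optional_packages())
```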
From mdehoon at c2b2.columbia.edu Mon Oct 29 01:12:48 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Sun, 28 Oct 2007 21:12:48 -0400
Subject: [Biopython-dev] Biopython release 1.44 ready
References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
<472459DC.3050907@maubp.freeserve.co.uk>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B647@mail2.exch.c2b2.columbia.edu>
> Michiel De Hoon wrote:
> > Hi everybody,
> >
> > Biopython release 1.44 is now available for download from the Biopython
> > website at http://biopython.org.
>
> Grand. Thank you for dedicating your weekend to this Michiel.
>
> A couple of questions, the main Wiki page is locked and needs updating
> to mention the new release. Who has access?
Apparently I do. I have now updated this page.
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From mdehoon at c2b2.columbia.edu Mon Oct 29 01:57:18 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Sun, 28 Oct 2007 21:57:18 -0400
Subject: [Biopython-dev] Biopython release 1.44 ready
References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
<4724F6EE.50805@gmail.com>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B648@mail2.exch.c2b2.columbia.edu>
> Is it OK to resume uploading non stable code to CVS? I have a few things
> that I would like to add to the population genetics module (coalescent
> simulation), but that still needs polishing (mainly documenting and test
> code).
Sure, go ahead.
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From mdehoon at c2b2.columbia.edu Mon Oct 29 02:01:16 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Sun, 28 Oct 2007 22:01:16 -0400
Subject: [Biopython-dev] Biopython release 1.44 ready
References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
<4724F6EE.50805@gmail.com>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B649@mail2.exch.c2b2.columbia.edu>
On second thought, I agree with Peter's suggestion of waiting a few days to
see if any disasters show up with the 1.44 release. Sorry!
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From mdehoon at c2b2.columbia.edu Mon Oct 29 02:02:12 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Sun, 28 Oct 2007 22:02:12 -0400
Subject: [Biopython-dev] mxTextTools optional?
References: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu>
<4724EA2D.3020609@maubp.freeserve.co.uk>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B64A@mail2.exch.c2b2.columbia.edu>
The fewer software packages Biopython requires, the better, to keep things
simple for users, not to mention developers. For a future release, we should
check if the modules that still rely on mxTextTools can be replaced by
pure-Python code.
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From bugzilla-daemon at portal.open-bio.org Mon Oct 29 17:21:03 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 29 Oct 2007 13:21:03 -0400
Subject: [Biopython-dev] [Bug 2390] New: Error importing Swiss Prot in BioSQL
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2390
Summary: Error importing Swiss Prot in BioSQL
Product: Biopython
Version: Not Applicable
Platform: Macintosh
OS/Version: MacOS X
Status: NEW
Severity: normal
Priority: P2
Component: BioSQL
AssignedTo: biopython-dev at biopython.org
ReportedBy: Biosql at hotmail.com
Hello,
I already reported this problem on the mailing list: I can't import the
SwissProt flat file into BioSQL, even with the new version (1.44) of Biopython.
Here's the script I'm using :
from BioSQL import BioSeqDatabase
from Bio.SwissProt import SProt
server = BioSeqDatabase.open_database(driver='MySQLdb', user='', passwd='',
                                      host='localhost', db='bioseqdb')
s_parser = SProt.SequenceParser()
s_iterator = SProt.Iterator(open('path to/uniprot_sprot.dat', 'r'), s_parser)
db = server.new_database('Swiss')
db.load(s_iterator)
And here's the error:
Traceback (most recent call last):
  File '', line 1, in
  File '/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py', line 414, in load
    db_loader.load_seqrecord(cur_record)
  File '/sw/lib/python2.5/site-packages/BioSQL/Loader.py', line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File '/sw/lib/python2.5/site-packages/BioSQL/Loader.py', line 250, in _load_bioentry_table
    version))
  File '/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py', line 277, in execute
    self.cursor.execute(sql, args or ())
  File '/sw/lib/python2.5/site-packages/MySQLdb/cursors.py', line 151, in execute
    query = query % db.literal(args)
TypeError: not all arguments converted during string formatting
Thanks for the help!
Jonathan
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Oct 29 17:23:54 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 29 Oct 2007 13:23:54 -0400
Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL
In-Reply-To:
Message-ID: <200710291723.l9THNsun017818@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2390
------- Comment #1 from Biosql at hotmail.com 2007-10-29 13:23 EST -------
Created an attachment (id=799)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=799&action=view)
Sample of Swiss Prot flat file
From bugzilla-daemon at portal.open-bio.org Mon Oct 29 19:19:01 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 29 Oct 2007 15:19:01 -0400
Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL
In-Reply-To:
Message-ID: <200710291919.l9TJJ1O2026999@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2390
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |biopython-
| |bugzilla at maubp.freeserve.co.
| |uk
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-29 15:19 EST -------
I'm trying to narrow down the problem:
* Have you tried different input SwissProt files?
* Have you tried a GenBank file (using the GenBank parser)?
* Did you check the username/password as suggested on the mailing list (empty
strings look wrong to me)?
Peter
From biopython-dev at maubp.freeserve.co.uk Mon Oct 29 19:58:45 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Oct 2007 19:58:45 +0000
Subject: [Biopython-dev] BioRegistry, Bio.db
Message-ID: <320fb6e00710291258v533fed71u490ec1aadff3359c@mail.gmail.com>
While looking over the Tutorial this evening (and making some sequence
related updates), I noticed that the section "BioRegistry - automatically
finding sequence sources" (in the Cook Book chapter) doesn't work anymore.
I believe that Bio.db is set up by the complicated and uncommented
code in Bio/__init__.py by calling Bio/config/DBRegistry.py; this was
commented out for Biopython 1.44.
Does anyone use this module? I've never really looked at it in depth,
but it looks interesting and perhaps worth saving. Note if we do want
to resurrect it, it needs a unit test.
At first glance, the only Martel dependency here is for recognising
error conditions and giving nice error messages. If that's all it
is used for, then perhaps we can switch to regular expressions
instead.
Peter
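A rough sketch of recognising error conditions with the standard re module instead of Martel. The patterns and the `find_error_message` name are hypothetical; the real error texts returned by each sequence source would have to be collected and checked one by one.

```python
import re

# Hypothetical patterns; each remote source's actual error text would
# need to be checked before relying on these.
ERROR_PATTERNS = [
    re.compile(r"^\s*ERROR\b.*$", re.IGNORECASE | re.MULTILINE),
    re.compile(r"no (?:entries|documents) found", re.IGNORECASE),
]


def find_error_message(text):
    """Return the first recognised error message in text, or None."""
    for pattern in ERROR_PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(0).strip()
    return None
```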
From biopython-dev at maubp.freeserve.co.uk Mon Oct 29 21:39:50 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Oct 2007 21:39:50 +0000
Subject: [Biopython-dev] Removing deprecated functionality
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B640@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B640@mail2.exch.c2b2.columbia.edu>
Message-ID: <320fb6e00710291439t6f636964i9681e2b0c90e6c96@mail.gmail.com>
On 10/24/07, Michiel De Hoon wrote:
> Bio.Kabat and ,,, were deprecated in previous Biopython. ....
> I am planning to remove this functionality for release 1.44
I see you removed the files Bio/Kabat/*.py for Biopython 1.44, but is
it OK if we remove the now empty directory as well?
Peter
From mdehoon at c2b2.columbia.edu Tue Oct 30 01:06:38 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Mon, 29 Oct 2007 21:06:38 -0400
Subject: [Biopython-dev] Removing Bio.Kabat
References: <320fb6e00710291438x3f7d7d57t77b06e4b2221c470@mail.gmail.com>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B64E@mail2.exch.c2b2.columbia.edu>
As far as I know, it is not possible to remove a directory in CVS. See
http://www.thathost.com/wincvs-howto/cvsdoc/cvs_7.html#SEC69
I believe that it is possible to remove a directory by hand from the CVS
source tree, but it is not the official way to do it. Hopefully, we can
remove directories once we're using SVN.
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From bugzilla-daemon at portal.open-bio.org Tue Oct 30 12:25:20 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 08:25:20 -0400
Subject: [Biopython-dev] [Bug 2386] Bio.Seq.Seq and MutableSeq count()
method only works for single residues
In-Reply-To:
Message-ID: <200710301225.l9UCPKjo026963@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2386
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-30 08:25 EST -------
Created an attachment (id=800)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=800&action=view)
Patch to Bio/Seq.py count methods
Lets the Seq and MutableSeq count methods take either a single letter or a
multiple-letter argument, which can be a string, Seq object or MutableSeq
object. Adds doc-strings.
Includes a trivial mini-test, which would eventually move into the Seq unit test.
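A minimal sketch of the behaviour the patch describes, written against a hypothetical stand-in class rather than the real Bio.Seq.Seq: the argument may be a string or any Seq-like object, and the optional start/end follow Python's str.count(sub[, start[, end]]).

```python
class SketchSeq:
    """Hypothetical, minimal stand-in for a Seq-like class (not Bio.Seq.Seq)."""

    def __init__(self, data):
        self.data = str(data)

    def __str__(self):
        return self.data

    def count(self, sub, start=0, end=None):
        """Count non-overlapping occurrences of sub, like str.count.

        sub may be a plain string or any object whose str() is the sequence
        (such as another SketchSeq); start and end work as for str.count.
        """
        if end is None:
            end = len(self.data)
        return self.data.count(str(sub), start, end)


s = SketchSeq("AAAACGTAAA")
print(s.count("A"))              # 7
print(s.count(SketchSeq("AA")))  # 3 (non-overlapping, as for str)
print(s.count("A", 5))           # 3
```

Delegating to str.count keeps the semantics (non-overlapping matches, slice-style bounds) identical to Python strings.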
From chris.lasher at gmail.com Tue Oct 30 14:17:29 2007
From: chris.lasher at gmail.com (Chris Lasher)
Date: Tue, 30 Oct 2007 10:17:29 -0400
Subject: [Biopython-dev] Biopython SVN Transition OK'd
Message-ID: <128a885f0710300717p7d91a4adjfaddc9c496974e67@mail.gmail.com>
Hi all,
Biopython just got the okay from OpenBio to transition from CVS to
Subversion--a good step in the right direction (though I've recently
started transitioning from SVN to Bazaar VCS). All we have to do is
come up with a date when the CVS repository can be locked down and
taken offline.
Also, I need to know what is needed from me in terms of helping all
the devs migrate to SVN. I produced a screencast series on Subversion
at
http://showmedo.com/videos/series?name=bfNi2X3Xg
and there is a transition guide at
http://svnbook.red-bean.com/en/1.4/svn.forcvs.html
Would providing links to these on the wiki be sufficient? What further
information would you like to know? Subversion is not a radical
departure from CVS and many of the commands are a one-to-one mapping.
The biggest difference is that commits apply to the whole repository, not
to individual files, and that directories are tracked as well.
Let's get a discussion on this and set a date soon.
Chris
From bugzilla-daemon at portal.open-bio.org Tue Oct 30 14:25:01 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 10:25:01 -0400
Subject: [Biopython-dev] [Bug 2386] Bio.Seq.Seq and MutableSeq count()
method only works for single residues
In-Reply-To:
Message-ID: <200710301425.l9UEP19U002945@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2386
------- Comment #2 from dalloliogm at gmail.com 2007-10-30 10:25 EST -------
The new code is good, but please consider implementing case-insensitive
searches:
>>> Seq('AACCCCaa').count('a')
2
>>> Seq('AACCCCaa').count('a', 'i')
4
These could be useful in many cases, because sometimes one has to deal with
mixed-case sequences.
I think the easiest way to implement this would be by using regular
expressions.
From bugzilla-daemon at portal.open-bio.org Tue Oct 30 18:02:49 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 14:02:49 -0400
Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL
In-Reply-To:
Message-ID: <200710301802.l9UI2n1J020073@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2390
------- Comment #3 from Biosql at hotmail.com 2007-10-30 14:02 EST -------
(In reply to comment #2)
I'm sorry Peter, the reply you sent me on the mailing list was cut in half and
I didn't see the rest of your message until I read it directly on the
mailing list.
I tried to parse cor6_6.gb with the GenBank parser and I'm getting the same
result; sorry I didn't try this before. I also tried what you suggested with
the SeqIO module, with both cor6_6.gb and a SwissProt file, and I'm still
getting the same TypeError, which is:
Traceback (most recent call last):
  File "DB_Gen.py", line 25, in
    db.load(iterator)
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 414, in load
    db_loader.load_seqrecord(cur_record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 250, in _load_bioentry_table
    version))
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 277, in execute
    self.cursor.execute(sql, args or ())
  File "build/bdist.macosx-10.4-ppc/egg/MySQLdb/cursors.py", line 151, in execute
TypeError: not all arguments converted during string formatting
It seems to me that the problem could be with the MySQLdb module, but I don't
understand why, since I'm using the latest release 1.2.2c1, and I've also tried it
with the stable 1.2.2 release.
Am I right?
From bugzilla-daemon at portal.open-bio.org Tue Oct 30 19:06:38 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 15:06:38 -0400
Subject: [Biopython-dev] [Bug 2386] Bio.Seq.Seq and MutableSeq count()
method only works for single residues
In-Reply-To:
Message-ID: <200710301906.l9UJ6cDZ023596@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2386
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-30 15:06 EST -------
I really don't want to make the Seq count method different to the python string
count method.
Speaking of which, the string uses count(sub [, start[, end]]) to allow
searching with an optional start and further optional end index. I should
probably add that.
In the case of single letter searches, my_seq.count("A") + my_seq.count("a") is
a simple enough way of doing things. Counting case-insensitive variants of a
longer sub-sequence like "ATG" wouldn't be so easy. It would be nice if the
python re library would work directly on Seq objects (without having to
explicitly turn them into strings first).
From bugzilla-daemon at portal.open-bio.org Tue Oct 30 19:06:52 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 15:06:52 -0400
Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL
In-Reply-To:
Message-ID: <200710301906.l9UJ6q7l023634@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2390
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-30 15:06 EST -------
Thanks for that. It looks like we can *probably* rule out a problem in the
sequence parsing.
Unfortunately I haven't used BioSQL myself (yet), and don't have a
system set up here I can try this on.
It appears (just from reading the stack trace) that there is some mismatch
between the SQL query (which I assume contains python % placeholders) and the
list of arguments (to go in these placeholders).
If you fancy investigating this further yourself, I would start by
adding a break point at BioSQL/BioSeqDatabase.py line 277 to check the
contents of the sql and args variables. Or, just add some print statements
just before line 277: self.cursor.execute(sql, args or ())
I hope someone else on the mailing list will have some suggestions...
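To make the suspected mismatch concrete: plain % formatting raises exactly this TypeError when there are more arguments than placeholders. Below is a small sketch of a check that could be dropped in before line 277 while debugging. The helper names and the example SQL are hypothetical, and it assumes the query uses MySQLdb's %s placeholder style.

```python
import re


def count_placeholders(sql):
    """Count %s placeholders in a format-style SQL string.

    Escaped percent signs ("%%") are removed first so they are not counted.
    """
    return len(re.findall(r"%s", sql.replace("%%", "")))


def check_query_args(sql, args):
    """Raise ValueError if the number of %s placeholders != number of args."""
    expected = count_placeholders(sql)
    if expected != len(args):
        raise ValueError(
            "SQL has %d placeholder(s) but %d argument(s) were given\n"
            "sql: %s\nargs: %r" % (expected, len(args), sql, args)
        )


# Reproduce the error class from the traceback: more args than placeholders.
try:
    "INSERT INTO bioentry (name, version) VALUES (%s, %s)" % ("a", "b", "c")
except TypeError as err:
    print(err)  # not all arguments converted during string formatting
```

Calling check_query_args(sql, args) just before the execute call would print both the SQL and the arguments whenever they disagree.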
From bugzilla-daemon at portal.open-bio.org Tue Oct 30 19:22:30 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 15:22:30 -0400
Subject: [Biopython-dev] [Bug 2386] Bio.Seq.Seq and MutableSeq count()
method only works for single residues
In-Reply-To:
Message-ID: <200710301922.l9UJMUoM024725@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2386
------- Comment #4 from howard.salis at gmail.com 2007-10-30 15:22 EST -------
How about the upper and lower methods for Seq classes?
Then, one could do my_seq.upper().count("ATG")
Would that work well?
-Howard
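A sketch of the two case-insensitive approaches discussed in this bug (upper-casing first, or a regular expression with re.IGNORECASE), working on plain strings. Both helper names are hypothetical; Seq-like objects are simply converted with str() first, since the re module needs strings. Both give non-overlapping counts, matching str.count.

```python
import re


def count_case_insensitive(seq, sub):
    """Count non-overlapping, case-insensitive occurrences of sub in seq.

    seq and sub may be plain strings or Seq-like objects; both are
    converted with str() and upper-cased before counting.
    """
    return str(seq).upper().count(str(sub).upper())


def count_case_insensitive_re(seq, sub):
    """Same count using a regular expression with re.IGNORECASE."""
    return len(re.findall(re.escape(str(sub)), str(seq), flags=re.IGNORECASE))


print(count_case_insensitive("AACCCCaa", "a"))        # 4
print(count_case_insensitive_re("atgATGaTg", "ATG"))  # 3
```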
(In reply to comment #3)
> I really don't want to make the Seq count method different to the python string
> count method.
>
> Speaking of which, the string uses count(sub [, start[, end]]) to allow
> searching with an optional start and further optional end index. I should
> probably add that.
>
> In the case of single letter searches, my_seq.count("A") + my_seq.count("a") is
> a simple enough way of doing things. Counting case insensitive variants of a
> longer sub-sequence like "ATG" wouldn't be so easy. It would be nice if the
> python re library would work directly on Seq objects (without having to
> explicitly turn them into strings first).
>
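For reference, this is how Python's own `str.count` behaves, which is the behaviour comment #3 wants the Seq `count` method to copy: optional start and end indices, non-overlapping matches, and case sensitivity. The case-insensitive variant via `.upper()` is shown on a plain string, since the Seq `.upper()` method discussed here did not exist yet.

```python
seq = "ATGaaaATGatg"

# Case-sensitive, like str.count: only the two upper-case "ATG" match.
print(seq.count("ATG"))                 # 2
# Optional start (and end) indices, as in count(sub[, start[, end]]).
print(seq.count("ATG", 1))              # 1
print(seq.count("ATG", 0, 3))           # 1
# Case-insensitive counting of a multi-letter sub-sequence.
print(seq.upper().count("ATG"))         # 3
# Single letters: summing both cases, as suggested in comment #3.
print(seq.count("A") + seq.count("a"))  # 6
# Matches are non-overlapping: "AAAA" contains "AA" twice, not three times.
print("AAAA".count("AA"))               # 2
```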
From biopython-dev at maubp.freeserve.co.uk Tue Oct 30 19:30:29 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 30 Oct 2007 19:30:29 +0000
Subject: [Biopython-dev] Biopython SVN Transition
In-Reply-To: <128a885f0710300717p7d91a4adjfaddc9c496974e67@mail.gmail.com>
References: <128a885f0710300717p7d91a4adjfaddc9c496974e67@mail.gmail.com>
Message-ID: <47278655.8090300@maubp.freeserve.co.uk>
Chris Lasher wrote:
> Hi all,
>
> Biopython just got the okay from OpenBio to transition from CVS to
> Subversion--a good step in the right direction (though I've recently
> started transitioning from SVN to Bazaar VCS). All we have to do is
> come up with a date when the CVS repository can be locked down and
> taken offline.
I was wondering if anyone would start suggesting moving to git or
something else ;)
Michiel - are you expecting any complications from CVS to SVN regarding
the build process?
Another thought; will existing developer accounts "just work" on the SVN
system? Also do you (Chris) have CVS access, and if not do you need or
want it?
> Also, I need to know what is needed from me in terms of helping all
> the devs migrate to SVN. I produced a screencast series on Subversion
> at http://showmedo.com/videos/series?name=bfNi2X3Xg and there is a
> transition guide at http://svnbook.red-bean.com/en/1.4/svn.forcvs.html
Sadly that didn't play with gnash 0.8, and I don't have Adobe's Flash
plugin working on my 64bit Ubuntu. I'll have to check that out on
Windows later in the week :)
If you are able to field any queries on the mailing list, that would
probably be fine.
> Would providing links to these on the wiki be sufficient?
If you could look after that aspect of the wiki, that would be great.
> What further information would you like to know? Subversion is not a
> radical departure from CVS and many of the commands are a one-to-one
> mapping. The biggest difference is commits occur for the whole
> repository, not on a per-file basis, and directories are tracked, as
> well.
The fact that CVS and SVN are relatively similar is probably one reason
why no-one has raised any real objections to the move.
> Let's get a discussion on this and set a date soon.
In terms of timing, how long do you/the OBF guys expect the transfer to
take? And would they prefer to do this over a weekend or mid week?
Barring any problems with Biopython 1.44 which would force us to do
another release in the very short term, I guess in the next fortnight is
reasonable (especially if we only expect a couple of days downtime).
Of course, I personally want to start working on the Seq objects and
alignments - and Tiago wants to get back to his Population Genetics module.
Peter
P.S. Would you or any of the people doing the transition be able to sort
out bug 2363?
http://bugzilla.open-bio.org/show_bug.cgi?id=2363
From bugzilla-daemon at portal.open-bio.org Tue Oct 30 19:33:40 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 15:33:40 -0400
Subject: [Biopython-dev] [Bug 2386] Bio.Seq.Seq and MutableSeq count()
method only works for single residues
In-Reply-To:
Message-ID: <200710301933.l9UJXedO025330@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2386
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-30 15:33 EST -------
Adding .upper() and .lower() methods is on my mental todo list, just a bit
lower down my priorities than the .count() method (this bug) and the biological
methods covered in bug 2381.
One of us should file an enhancement bug for .upper() and .lower().
I agree they are needed to make the Seq object more string like. However the
implementation is non-trivial due to the alphabet object (which may define a
case sensitive list of expected letters).
And yes, once these methods are supported then doing
my_seq.upper().count("ATG") would work fine.
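The complication with the alphabet can be illustrated with a toy class. Everything below is hypothetical (`ToySeq` and `ToyAlphabet` are not Biopython classes); it assumes only, as the comment says, that an alphabet may carry a case-sensitive `letters` string. In that case `upper()` has to return a sequence whose alphabet letters are upper-cased too, otherwise the new sequence would no longer match its own alphabet.

```python
class ToyAlphabet(object):
    """Hypothetical alphabet with an optional, case-sensitive letter set."""

    def __init__(self, letters=None):
        self.letters = letters


class ToySeq(object):
    """Hypothetical minimal sequence class, for illustration only."""

    def __init__(self, data, alphabet=None):
        self._data = data
        self.alphabet = alphabet or ToyAlphabet()

    def __str__(self):
        return self._data

    def count(self, sub):
        # Case-sensitive, exactly like str.count
        return self._data.count(str(sub))

    def upper(self):
        # Upper-case the data *and* the expected letters; otherwise an
        # alphabet such as "acgt" would no longer describe the new sequence.
        letters = self.alphabet.letters
        new_alpha = ToyAlphabet(letters.upper() if letters else None)
        return ToySeq(self._data.upper(), new_alpha)

    def lower(self):
        letters = self.alphabet.letters
        new_alpha = ToyAlphabet(letters.lower() if letters else None)
        return ToySeq(self._data.lower(), new_alpha)


my_seq = ToySeq("atgATGccc", ToyAlphabet("acgt"))
print(my_seq.upper().count("ATG"))      # 2
print(my_seq.upper().alphabet.letters)  # ACGT
```

A real implementation would also have to decide what to do with mixed-case alphabets (e.g. "ACGTacgt" would become "ACGTACGT" with duplicates), which is part of why this is non-trivial.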
From bugzilla-daemon at portal.open-bio.org Tue Oct 30 19:45:35 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 15:45:35 -0400
Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string,
even subclass string?
In-Reply-To:
Message-ID: <200710301945.l9UJjZlQ026374@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2351
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Make SeqRecord subclass Seq |Make Seq more like a string,
|subclass string? |even subclass string?
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-30 15:45 EST -------
I modified the title to focus on the Seq object.
See also bug 2386 (about the count method), and bug 2381 (about biological
methods).
(In reply to comment #4)
> (In reply to comment #3)
> > It does not add any .short() method to give a truncated representation
> > string like the current str() method gives.
>
> Why not? This new method should not cause any compatibility problem
Mainly because I'm not convinced that we need a .short() method, and it's harder
to remove things at a later date (as people may be using them).
Surely my_seq[:50] or depending on the context, str(my_seq[:50]), is enough?
From bugzilla-daemon at portal.open-bio.org Tue Oct 30 22:32:12 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 18:32:12 -0400
Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL
In-Reply-To:
Message-ID: <200710302232.l9UMWCb3004960@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2390
------- Comment #5 from Biosql at hotmail.com 2007-10-30 18:32 EST -------
It seems that a %s is missing at line 243 in Loader.py: there are only six %s
placeholders in the SQL query, but seven arguments are passed in when loading
the bioentry table. So I added a %s and the loading now works, but another
problem arises after this:
Traceback (most recent call last):
  File "DB_Gen.py", line 25, in
    db.load(iterator)
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 415, in load
    db_loader.load_seqrecord(cur_record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 253, in _load_bioentry_table
    bioentry_id = self.adaptor.last_id('bioentry')
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 148, in last_id
    return self.dbutils.last_id(self.cursor, table)
  File "/sw/lib/python2.5/site-packages/BioSQL/DBUtils.py", line 35, in last_id
    return cursor.insert_id()
AttributeError: 'Cursor' object has no attribute 'insert_id'
I'm going to check it tomorrow.
Jonathan
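The traceback shows `DBUtils.last_id` calling `cursor.insert_id()`, which this cursor object does not provide. The DB-API 2.0 specification (PEP 249) defines an optional `cursor.lastrowid` attribute for this purpose. The sketch below is a hypothetical fallback, not the actual BioSQL fix: `last_insert_id` prefers `lastrowid` and only then tries an `insert_id()` method. It is demonstrated with the standard library's `sqlite3` driver rather than MySQL, and assumes the driver sets `lastrowid` after an INSERT.

```python
import sqlite3


def last_insert_id(cursor):
    """Hypothetical helper: id of the row inserted by the last execute().

    Prefers the optional DB-API 2.0 `lastrowid` attribute and falls back to
    an `insert_id()` method if the driver's cursor happens to have one.
    """
    row_id = getattr(cursor, "lastrowid", None)
    if row_id is not None:
        return row_id
    if hasattr(cursor, "insert_id"):
        return cursor.insert_id()
    raise NotImplementedError("driver exposes neither lastrowid nor insert_id()")


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO bioentry (name) VALUES (?)", ("first",))
cur.execute("INSERT INTO bioentry (name) VALUES (?)", ("second",))
print(last_insert_id(cur))  # 2
```

Because `bioentry_id` is declared `INTEGER PRIMARY KEY`, SQLite uses it as the rowid, so `lastrowid` equals the new `bioentry_id` here; other databases may need a driver-specific approach.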
From biopython-dev at maubp.freeserve.co.uk Tue Oct 30 22:36:43 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 30 Oct 2007 22:36:43 +0000
Subject: [Biopython-dev] BioRegistry, Bio.db
In-Reply-To: <320fb6e00710291258v533fed71u490ec1aadff3359c@mail.gmail.com>
References: <320fb6e00710291258v533fed71u490ec1aadff3359c@mail.gmail.com>
Message-ID: <4727B1FB.2020803@maubp.freeserve.co.uk>
Peter wrote:
> While looking over the Tutorial this evening (and making some sequence
> related updates), I noticed that the section "BioRegistry -
> automatically finding sequence sources" (in the Cook Book chapter)
> doesn't work any more.
Does anyone here use this? Should I ask on the main list?
> I believe that Bio.db is set up by the complicated and uncommented
> code in Bio/__init__.py by calling Bio/config/DBRegistry.py - this was
> commented out for Biopython 1.44
Confirmed. After uncommenting the call to _load_registries() in
Bio/__init__.py the example in the tutorial using Bio.db works.
Note that you do get a DeprecationWarning about the concurrent behaviour
provided by Bio.MultiProc, but I have not explored this any further.
Thoughts?
Peter
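For anyone wanting to look further at the DeprecationWarning, Python's standard `warnings` module can record it rather than just print it. The function below is a hypothetical stand-in for deprecated code such as Bio.MultiProc (it is not the real module) and is only there to show the mechanism.

```python
import warnings


def old_concurrent_helper():
    """Hypothetical stand-in for deprecated code (not the real Bio.MultiProc)."""
    warnings.warn("Bio.MultiProc is deprecated", DeprecationWarning, stacklevel=2)
    return "done"


with warnings.catch_warnings(record=True) as caught:
    # DeprecationWarning is often hidden by default; record every occurrence.
    warnings.simplefilter("always")
    result = old_concurrent_helper()

for w in caught:
    print("%s: %s" % (w.category.__name__, w.message))
```

Alternatively, running the script with `python -W error::DeprecationWarning` turns the warning into an exception, whose traceback shows exactly which code triggered it.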
From mdehoon at c2b2.columbia.edu Wed Oct 31 01:05:22 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 30 Oct 2007 21:05:22 -0400
Subject: [Biopython-dev] Biopython SVN Transition
References: <128a885f0710300717p7d91a4adjfaddc9c496974e67@mail.gmail.com>
<47278655.8090300@maubp.freeserve.co.uk>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B64F@mail2.exch.c2b2.columbia.edu>
> Michiel - are you expecting any complications from CVS to SVN regarding
> the build process?
For the build process, we are not doing anything very complicated with CVS,
so I doubt that there will be any major problems when we start using SVN.
--Michiel.
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From bugzilla-daemon at portal.open-bio.org Wed Oct 31 01:30:20 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Oct 2007 21:30:20 -0400
Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string,
even subclass string?
In-Reply-To:
Message-ID: <200710310130.l9V1UKEN014287@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2351
------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2007-10-30 21:30 EST -------
First, let's think about what a Seq object should look like, before getting into
implementation details.
In my opinion, a Seq object is essentially a string, but with some added
functionality that is useful in biological contexts. Currently, this is
limited to specifying an alphabet. Personally, I never used such an alphabet,
so in practice I prefer using a simple string instead of a Seq object.
However, if we extend its functionality, I think a Seq class can be useful
enough to warrant its existence in Biopython.
In short, to my mind a Seq object should have the following properties:
1) A Seq object is basically a string, so it should behave as if it were
subclassed from string.
2) As a result, functions that have a sequence as an argument, but don't need
the added features of a Seq object, should work with strings as well as Seq
objects.
3) The sequence should be mutable, so that we won't need a separate MutableSeq
class. This also implies that a Seq class cannot subclass from string, since
strings are not mutable.
4) Currently, Seq objects have an associated alphabet; SeqRecord objects have
annotations, dbxrefs, a description, features, id, and name. I think a new Seq
object should have both, so that we can avoid having both a Seq and a SeqRecord
class. Of course, some or all of these fields can remain None.
5) A Seq class should have methods that one expects from a sequence class, in
particular complement(), reverse_complement(), perhaps a modified count() that
can ignore case.
With respect to 3), we'd probably have to write such a Seq class in C.
The end result would be a Seq class that actually has some benefit to the user,
without requiring its use when a string suffices, and avoids having three
classes (Seq, MutableSeq, SeqRecord) for essentially the same thing.
From chris.lasher at gmail.com Wed Oct 31 05:55:03 2007
From: chris.lasher at gmail.com (Chris Lasher)
Date: Wed, 31 Oct 2007 01:55:03 -0400
Subject: [Biopython-dev] Biopython SVN Transition
In-Reply-To: <47278655.8090300@maubp.freeserve.co.uk>
References: <128a885f0710300717p7d91a4adjfaddc9c496974e67@mail.gmail.com>
<47278655.8090300@maubp.freeserve.co.uk>
Message-ID: <128a885f0710302255y4c34ac8axa48f48b253d5854a@mail.gmail.com>
On 10/30/07, Peter wrote:
> I was wondering if anyone would start suggesting moving to git or
> something else ;)
I tried Git and didn't like it. Bazaar suits me much better, and it
even has support for SVN repositories with bzr-svn. Git is not truly
cross-platform. It performs terribly on Windows. This left me looking
at Mercurial (Hg) and Bazaar (bzr). I liked the direction that Bazaar
was moving in and their emphasis on testing with real unit/regression
tests. For those interested, you can see some of the "literature" I
read through on my del.icio.us page:
http://del.icio.us/gotgenes/dscm
> Another thought; will existing developer accounts "just work" on the SVN
> system? Also do you (Chris) have CVS access, and if not do you need or
> want it?
The existing developer accounts will "just work" because they're going
to do SVN over SSH. I have SSH access on the machine and CVS access as
well. Thanks for checking.
> > Also, I need to know what is needed from me in terms of helping all
> > the devs migrate to SVN. I produced a screencast series on Subversion
> > at http://showmedo.com/videos/series?name=bfNi2X3Xg and there is a
> > transition guide at http://svnbook.red-bean.com/en/1.4/svn.forcvs.html
>
> Sadly that didn't play with gnash 0.8, and I don't have Adobe's Flash
> plugin working on my 64bit Ubuntu. I'll have to check that out on
> Windows later in the week :)
Bummer! Does nspluginwrapper not work?
> If you are able to field any queries on the mailing list, that would
> probably be fine.
I'd be happy to do that.
Should this page be renamed to SVN to be in line with the CVS page?
> > Would providing links to these on the wiki be sufficient?
>
> If you could look after that aspect of the wiki, that would be great.
At some point I had started this:
http://biopython.org/wiki/Subversion_migration
> > What further information would you like to know? Subversion is not a
> > radical departure from CVS and many of the commands are a one-to-one
> > mapping. The biggest difference is commits occur for the whole
> > repository, not on a per-file basis, and directories are tracked, as
> > well.
>
> The fact that CVS and SVN are relatively similar is probably one reason
> why no-one has raised any real objections to the move.
>
> > Let's get a discussion on this and set a date soon.
>
> In terms of timing, how long do you/the OBF guys expect the transfer to
> take? And would they prefer to do this over a weekend or mid week?
Not sure, let me ask Jason Stajich.
> Barring any problems with Biopython 1.44 which would force us to do
> another release in the very short term, I guess in the next fortnight is
> reasonable (especially if we only expect a couple of days downtime).
I think we could expect less than a full day downtime.
> Of course, I personally want to start working on the Seq objects and
> alignments - and Tiago wants to get back to his Population Genetics module.
By all means, continue using CVS until I get a firm date for the
Biopython Devs. Even if you have uncommitted changes when the CVS
server goes down, you can simply copy the files to your checked out
copy of the SVN repository and continue as is.
> P.S. Would you or any of the people doing the transition be able to sort
> out bug 2363?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2363
That's a very good question. I wonder if cvs2svn is capable of picking
up those errors in commits and choosing the proper format. I had trouble
getting a hold of an expert who could tell me how to identify files
committed as binary files, and how to change that to text (or vice
versa). I should send an email to the Subversion mailing list,
perhaps, or the CVS list if it's still active. I'll also check to see
if Jason knows.
From bugzilla-daemon at portal.open-bio.org Wed Oct 31 09:54:24 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 31 Oct 2007 05:54:24 -0400
Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string,
even subclass string?
In-Reply-To:
Message-ID: <200710310954.l9V9sOw7014572@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2351
------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-10-31 05:54 EST -------
> In short, to my mind a Seq object should have the following properties:
> 1) A Seq object is basically a string, so it should behave as if it were
> subclassed from string.
I agree, where possible the Seq object should act like a string.
In particular str(my_seq) should give the full string.
> 2) As a result, functions that have a sequence as an argument, but don't
> need the added features of a Seq object, should work with strings as well
> as Seq objects.
Again, I agree. I've double-checked this works for some of the recently
updated SeqUtils functionality. I would hope we get this "for free" once the
Seq object itself becomes more string-like.
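One simple way to get this behaviour, sketched below on the assumption that a Seq object's `str()` returns the full sequence (as requested above), is for such functions to convert their argument with `str()` first. `gc_fraction` is a hypothetical example, not the SeqUtils implementation, and `FakeSeq` merely stands in for a Seq object.

```python
def gc_fraction(seq):
    """Hypothetical example: fraction of G/C in a str or Seq-like object.

    Works for anything whose str() gives the full sequence, so plain
    strings and (string-like) Seq objects are handled the same way.
    """
    s = str(seq).upper()
    if not s:
        return 0.0
    return (s.count("G") + s.count("C")) / float(len(s))


class FakeSeq(object):
    """Minimal stand-in for a Seq object whose str() is the full sequence."""

    def __init__(self, data):
        self._data = data

    def __str__(self):
        return self._data


print(gc_fraction("ATGC"))           # 0.5
print(gc_fraction(FakeSeq("GGCA")))  # 0.75
```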
> 3) The sequence should be mutable, so that we won't need a separate
> MutableSeq class. This also implies that a Seq class cannot subclass from
> string, since strings are not mutable.
Why? Python strings are not mutable, and this isn't usually a problem.
Personally, I have never needed a mutable sequence and have only ever used them
in test cases. Having the basic Seq non-mutable means we can leverage existing
string functionality and optimizations.
Also, writing a new mutable sequence in C seems like a big maintenance load in
the long term (and may complicate the cross-platform build process). Surely we
can get good enough performance via the array-of-characters route currently
used?
On a related note: the fact that the current MutableSeq methods like
reverse_complement() work in place rather than returning a new object makes
switching between the Seq and MutableSeq fiddly.
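The difference between the two calling conventions is easy to see side by side. Below, a plain string is used for the immutable style and a Python list for the mutable style; both functions and the complement table (unambiguous DNA only) are hypothetical illustrations, not the Biopython API. With the in-place style the function returns None, so code written for one style silently breaks with the other.

```python
# Hypothetical complement table for unambiguous DNA only.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Immutable style (like Seq): return a new string, input untouched."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def reverse_complement_in_place(bases):
    """Mutable style (like the current MutableSeq): modify the list, return None."""
    bases.reverse()
    for i, base in enumerate(bases):
        bases[i] = COMPLEMENT[base]


seq = "ATGC"
rc = reverse_complement(seq)  # seq is unchanged, rc == "GCAT"

mutable = list("ATGC")
result = reverse_complement_in_place(mutable)
# result is None; mutable is now ["G", "C", "A", "T"]
```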
> 4) Currently, Seq objects have an associated alphabet; SeqRecord objects
> [also] have annotations, dbxrefs, a description, features, id, and name.
> I think a new Seq object should have both, so that we can avoid having both
> a Seq and a SeqRecord class. Of course, some or all of these fields can
> remain None.
I don't really see the benefit over the current scheme. I'm happy with the
division between Seq and SeqRecord, but we could go for SeqRecord being a more
annotated subclass of the Seq class. This would be similar to Bioperl's Seq,
PrimarySeq, or RichSeq objects.
Something I do want to add is slicing for SeqRecords, which would return a new
SeqRecord with a sensible name/id/description. I think for this to be really
useful we need to add "per residue annotation", such as lists or strings of
information the same length as the sequence (e.g. predicted secondary
structure, or sequencing quality scores), which would also get sliced when
slicing a SeqRecord.
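A minimal sketch of the idea, with hypothetical names throughout (`ToyRecord`, `per_letter` and keeping the parent id are illustrative assumptions, not an existing Biopython API): each per-residue annotation must have the same length as the sequence, and slicing the record slices the sequence and every such annotation together.

```python
class ToyRecord(object):
    """Hypothetical record with per-residue annotations, for illustration."""

    def __init__(self, seq, id="<unknown>", per_letter=None):
        self.seq = seq
        self.id = id
        self.per_letter = dict(per_letter or {})
        for key, values in self.per_letter.items():
            # Each per-residue annotation must match the sequence length.
            if len(values) != len(seq):
                raise ValueError("annotation %r has wrong length" % key)

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        if not isinstance(index, slice):
            return self.seq[index]  # a single residue
        # Slice the sequence and every per-residue annotation together.
        sliced = dict((key, values[index])
                      for key, values in self.per_letter.items())
        return ToyRecord(self.seq[index], id=self.id, per_letter=sliced)


rec = ToyRecord("MKTAYIAK", id="example",
                per_letter={"ss": "CCHHHHHC",
                            "quality": [30, 31, 32, 33, 34, 35, 36, 37]})
sub = rec[2:5]  # sub.seq == "TAY", sub.per_letter["ss"] == "HHH"
```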
> 5) A Seq class should have methods that one expects from a sequence class,
> in particular complement(), reverse_complement(), perhaps a modified count()
> that can ignore case.
Usually mixed case sequences are used for a reason, and the user may need both
case-sensitive and case-insensitive counts. I would keep .count() case
sensitive like a real string, and suggest .upper().count() as a simple
workaround for case-insensitive counts.
Plus the Seq object should have methods for forward and back transcription and
translation; see Bug 2381.
A more drastic change we could consider is getting rid of the alphabet as an
explicit property, and having ProteinSeq, NucleotideSeq, DnaSeq and RnaSeq
(decorator/sub)classes which would have only the relevant biological sequence
methods. We would lose the expected "letters" feature of the alphabet, but I
don't think this is really helpful at the moment because the Seq class does not
enforce it.
Otherwise I would advocate when creating a Seq object (or editing a MutableSeq
object) the new letters should be screened against self.alphabet.letters (if
present).
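The screening suggested here could look roughly like this. `invalid_letters` and `CheckedSeq` are hypothetical; the sketch assumes only that an alphabet object may have a `letters` attribute (a string of allowed characters, possibly None) and skips the check when it is absent.

```python
class ToyAlphabet(object):
    """Hypothetical alphabet; `letters` of None means no restriction."""

    def __init__(self, letters=None):
        self.letters = letters


def invalid_letters(data, alphabet):
    """Return the sorted letters in `data` not allowed by `alphabet`."""
    letters = getattr(alphabet, "letters", None)
    if not letters:
        return ""  # nothing to screen against
    return "".join(sorted(set(data) - set(letters)))


class CheckedSeq(object):
    """Hypothetical sequence that screens its letters on creation."""

    def __init__(self, data, alphabet=None):
        bad = invalid_letters(data, alphabet)
        if bad:
            raise ValueError("Letters %r not in alphabet %r"
                             % (bad, alphabet.letters))
        self._data = data
        self.alphabet = alphabet

    def __str__(self):
        return self._data
```

Note that the check is case-sensitive, so it interacts with the .upper()/.lower() question discussed above.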
On balance I favour making gradual changes which don't change the current
scheme (Seq with Alphabet property; SeqRecord with Seq property). Anything
more drastic might best be pursued on a new branch which could become Biopython
2.0
P.S. We should try not to implicitly assume that the elements in a sequence are
single letters. What about when working with protein structures which contain
modified amino acids (with defined three letter codes) which do not map back to
single letters?
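To make the point concrete: if a sequence is stored as a list of residue codes rather than a string, three-letter codes for modified residues fit naturally, whereas a one-letter string cannot represent them. The mapping below covers only a few residues and is an illustrative assumption, not a complete table; MSE (selenomethionine) is used as an example of a modified residue, and 'X' as the fallback is a common convention rather than a rule.

```python
# Partial, illustrative mapping from three-letter codes to one-letter codes.
THREE_TO_ONE = {"MET": "M", "LYS": "K", "ALA": "A", "GLY": "G"}

# A residue sequence as a list of codes, including a modified residue
# (MSE, selenomethionine) that has no entry in the mapping above.
residues = ["MET", "LYS", "MSE", "ALA", "GLY"]


def to_one_letter(codes, unknown="X"):
    """Collapse to a one-letter string; unmapped codes become `unknown`."""
    return "".join(THREE_TO_ONE.get(code, unknown) for code in codes)


one_letter = to_one_letter(residues)  # "MKXAG": the MSE identity is lost
print(one_letter)
```

Indexing and slicing still work on the list form (`residues[1:3]` gives `["LYS", "MSE"]`), so a list-based sequence keeps information that the one-letter string discards.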