From bugzilla-daemon at portal.open-bio.org Tue Jul 1 04:36:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 Jul 2008 04:36:33 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
with identical taxa names
In-Reply-To:
Message-ID: <200807010836.m618aXO8014712@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2531
fkauff at biologie.uni-kl.de changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
------- Comment #13 from fkauff at biologie.uni-kl.de 2008-07-01 04:36 EST -------
Just uploaded a new Nexus.py to CVS.
First, the taxlabels command in a taxa block is now ignored. For a standard
nexus file, taxon labels are in the matrix, and a taxon block is irrelevant.
The only exceptions are transposed matrices, which are not supported by Nexus.py
anyway.
Without the added confusion of a separate taxlabels command, it is now fairly
easy to deal with duplicate names. Both self.taxlabels and self.matrix now
carry the same set of unique taxon names.
All example files seem to work fine for me. Unless I hear otherwise, I will close
this bug.
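As a rough illustration of the duplicate-name handling described above, the sketch below renames repeated taxon labels so that every label ends up unique. The function name `make_unique_labels` and the `.copy<N>` suffix scheme are assumptions made for this example only; they are not necessarily what Nexus.py does.

```python
def make_unique_labels(labels):
    """Return a new list in which repeated labels get a numeric suffix.

    Hypothetical helper: the '.copy<N>' suffix is an assumed convention.
    """
    seen = set()
    unique = []
    for label in labels:
        candidate = label
        counter = 1
        # Keep trying new suffixes until the candidate has not been used yet.
        while candidate in seen:
            counter += 1
            candidate = "%s.copy%d" % (label, counter)
        seen.add(candidate)
        unique.append(candidate)
    return unique


print(make_unique_labels(["Homo", "Pan", "Homo", "Homo"]))
```

The first occurrence of a name is left untouched, so files without duplicates pass through unchanged.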
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Jul 1 05:01:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 Jul 2008 05:01:29 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
with identical taxa names
In-Reply-To:
Message-ID: <200807010901.m6191TxO015999@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2531
------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-01 05:01 EST -------
Does this mean that there will be no way to see the original non-unique names
from within Bio.Nexus? I agree they are a pain, but it would be nice to
preserve them.
I haven't read the Nexus specs (restricted article), but do they comment on
the issue of repeated identifiers?
From bugzilla-daemon at portal.open-bio.org Tue Jul 1 05:13:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 Jul 2008 05:13:02 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
with identical taxa names
In-Reply-To:
Message-ID: <200807010913.m619D2vK016454@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2531
------- Comment #15 from fkauff at biologie.uni-kl.de 2008-07-01 05:13 EST -------
Yes, the original non-unique names are currently not preserved. It would be
fairly easy to keep them, if desired.
The NEXUS specs (Maddison et al.) state that non-unique names "should be avoided if
this might cause ambiguity", which imho they always do. In my experience, though,
names sometimes become identical due to truncation etc., so I needed a way to
deal with it instead of just throwing an error.
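To make the truncation problem concrete, here is a small sketch that groups names by their truncated form and reports the clashes, while keeping the original names available. The helper name and the 10-character width are assumptions for illustration (some strict output formats impose a limit of this kind).

```python
from collections import defaultdict


def find_truncation_clashes(names, width=10):
    """Group original names by truncated form and return only the clashes."""
    groups = defaultdict(list)
    for name in names:
        groups[name[:width]].append(name)
    # Only truncated forms shared by two or more original names are a problem.
    return {short: full for short, full in groups.items() if len(full) > 1}


names = ["Saccharomyces_cerevisiae", "Saccharomyces_bayanus", "Homo_sapiens"]
print(find_truncation_clashes(names))
```

Because the mapping keeps the full names, they could still be shown to the user after the labels have been made unique.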
Frank
From bugzilla-daemon at portal.open-bio.org Tue Jul 1 09:16:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 Jul 2008 09:16:57 -0400
Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq
objects
In-Reply-To:
Message-ID: <200807011316.m61DGvGS029051@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2532
------- Comment #2 from fkauff at biologie.uni-kl.de 2008-07-01 09:16 EST -------
I opt for (b): an easy one-time addition to Bio.Alphabet, easy to use for
everyone (instead of creating their own uppercase-lowercase variants of those
terribly complicated Biopython alphabet classes), and easy to change for all
other modules if lowercase-uppercase is what they want (or need).
Nexus.py and Phd.py certainly need to allow lowercase characters, as this is
very common.
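Option (b) is not spelled out in this message; assuming it means providing ready-made alphabets that accept both cases, the idea can be sketched in plain Python as below. The names `UPPER_DNA`, `MIXED_CASE_DNA` and `is_valid` are hypothetical and do not come from Bio.Alphabet.

```python
# Unambiguous DNA letters; the mixed-case set is derived from it once.
UPPER_DNA = "GATC"
MIXED_CASE_DNA = UPPER_DNA + UPPER_DNA.lower()


def is_valid(sequence, letters):
    """Return True if every character of sequence is one of the given letters."""
    return all(char in letters for char in sequence)


print(is_valid("GATTaca", UPPER_DNA))       # lowercase letters are rejected
print(is_valid("GATTaca", MIXED_CASE_DNA))  # lowercase letters are accepted
```

Deriving the mixed-case set from the uppercase one keeps the two definitions from drifting apart.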
Frank
From bugzilla-daemon at portal.open-bio.org Tue Jul 1 11:56:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 Jul 2008 11:56:03 -0400
Subject: [Biopython-dev] [Bug 2533] New: Support for simple "tab" format in
Bio.SeqIO
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2533
Summary: Support for simple "tab" format in Bio.SeqIO
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
Requested on the mailing list by Giovanni Marco Dall'Olio:
http://lists.open-bio.org/pipermail/biopython/2008-June/004312.html
See BioPerl:
http://www.bioperl.org/wiki/Tab_sequence_format
Suggested implementation to follow.
From bugzilla-daemon at portal.open-bio.org Tue Jul 1 11:57:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 Jul 2008 11:57:26 -0400
Subject: [Biopython-dev] [Bug 2533] Support for simple "tab" format in
Bio.SeqIO
In-Reply-To:
Message-ID: <200807011557.m61FvQN5006042@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2533
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-01 11:57 EST -------
Created an attachment (id=962)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=962&action=view)
New file Bio/SeqIO/TabIO.py
Treats the first field as the record's .id (and .name)
Treats the second field as the record's sequence.
When writing, uses only record.id and record.seq
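The behaviour listed above can be sketched in a few lines of plain Python. This is not the attached TabIO.py; the `Record` tuple and the function names `parse_tab` and `write_tab` are stand-ins invented for this example.

```python
import io
from collections import namedtuple

# Minimal stand-in for a SeqRecord, holding only what the format stores.
Record = namedtuple("Record", ["id", "name", "seq"])


def parse_tab(handle):
    """Yield one Record per non-blank line of 'identifier<TAB>sequence'."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # Raises ValueError if the line does not have exactly two fields.
        identifier, seq = line.split("\t")
        identifier = identifier.strip()
        yield Record(id=identifier, name=identifier, seq=seq.strip())


def write_tab(records, handle):
    """Write only the id and sequence of each record; return the record count."""
    count = 0
    for record in records:
        handle.write("%s\t%s\n" % (record.id, record.seq))
        count += 1
    return count


source = io.StringIO("gi|1\tACGT\ngi|2\tTTGA\n")
records = list(parse_tab(source))
output = io.StringIO()
write_tab(records, output)
print(output.getvalue() == source.getvalue())
```

Since only the identifier and sequence are written, any description or annotation on a record is lost on output.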
From bugzilla-daemon at portal.open-bio.org Tue Jul 1 12:00:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 Jul 2008 12:00:59 -0400
Subject: [Biopython-dev] [Bug 2533] Support for simple "tab" format in
Bio.SeqIO
In-Reply-To:
Message-ID: <200807011600.m61G0xUp006217@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2533
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-01 12:00 EST -------
Created an attachment (id=963)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=963&action=view)
Patch to add the "tab" format to Bio.SeqIO and update the unit test output
The plumbing to make Bio.SeqIO (and Bio.AlignIO) aware of the new format.
Adds the reader/writer mapping to Bio/SeqIO/__init__.py (trivial) and gives the
updated output from test_SeqIO.py (trivial to regenerate with "python
run_tests.py -g test_SeqIO.py").
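For readers unfamiliar with this kind of plumbing, the following is a simplified sketch of a format-name-to-reader mapping. It is not the actual contents of Bio/SeqIO/__init__.py; the dictionary name, the toy readers and the `parse` function are all assumptions for illustration.

```python
import io


def _fasta_ids(handle):
    """Toy reader: yield the identifier from each FASTA title line."""
    for line in handle:
        if line.startswith(">"):
            yield line[1:].split()[0]


def _tab_ids(handle):
    """Toy reader: yield the first tab separated field of each line."""
    for line in handle:
        if line.strip():
            yield line.split("\t")[0]


# Supporting a new format only needs one new entry in this mapping.
FORMAT_TO_READER = {
    "fasta": _fasta_ids,
    "tab": _tab_ids,
}


def parse(handle, format_name):
    """Look up the reader for format_name and apply it to the handle."""
    try:
        reader = FORMAT_TO_READER[format_name]
    except KeyError:
        raise ValueError("Unknown format %r" % format_name)
    return reader(handle)


print(list(parse(io.StringIO("seq1\tACGT\nseq2\tGGCC\n"), "tab")))
```

With a table like this, the format-specific code stays in its own module and the public entry point does not change.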
From biopython at maubp.freeserve.co.uk Wed Jul 2 06:33:35 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 11:33:35 +0100
Subject: [Biopython-dev] Catching more error conditions in Bio.Entrez
Message-ID: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com>
Hello Michiel et al.,
I've already added a few if statements to the end of
Bio.Entrez._open() to catch a few errors I'd observed, and I've just
found another example:
>>> from Bio import Entrez
>>> Entrez.efetch("nucleotide", id="fiction", rettype="fasta").read()
'\n'
>>> Entrez.efetch("nucleotide", id="fiction").read()
'\n'
This seems to happen for any invalid identifier. Are you happy for me
to check for this as an error too? Are there any valid reasons to get
back an empty dataset like this?
Also, I was wondering if we should raise a ValueError rather than
IOError if we are fairly sure the problem is with the arguments rather
than the network or the server being unavailable.
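The kind of check being proposed could look roughly like the sketch below. `check_entrez_data` is a hypothetical helper, not part of Bio.Entrez, and whether a blank reply should count as an error at all is exactly the open question here.

```python
def check_entrez_data(data):
    """Return data unchanged, or raise ValueError if it is blank.

    Hypothetical helper, not part of Bio.Entrez. ValueError is used on the
    assumption that a blank reply points at the arguments, not the network.
    """
    if not data.strip():
        raise ValueError("No data returned; check the identifier and database")
    return data


try:
    check_entrez_data("\n")
except ValueError as err:
    print("Rejected: %s" % err)
```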
Peter
From sdavis2 at mail.nih.gov Wed Jul 2 07:18:43 2008
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 2 Jul 2008 07:18:43 -0400
Subject: [Biopython-dev] Catching more error conditions in Bio.Entrez
In-Reply-To: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com>
References: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com>
Message-ID: <264855a00807020418qc858370r4083f0db9db3197a@mail.gmail.com>
On Wed, Jul 2, 2008 at 6:33 AM, Peter wrote:
> Hello Michiel et al.,
>
> I've already added a few if statements to the end of
> Bio.Entrez._open() to catch a few errors I'd observed, and I've just
> found another example:
>
>>>> from Bio import Entrez
>>>> Entrez.efetch("nucleotide", id="fiction", rettype="fasta").read()
> '\n'
>>>> Entrez.efetch("nucleotide", id="fiction").read()
> '\n'
>
> This seems to happen for any invalid identifier. Are you happy for me
> to check for this as an error too? Are there any valid reasons to get
> back an empty dataset like this?
If the ability to use history is added, then an empty dataset could be
a valid return after an empty search. For id-based-searches, I'm not
sure I would raise an error for an empty set being returned anyway.
Just my $0.02.
Sean
> Also, I was wondering if we should raise a ValueError rather than
> IOError if we are fairly sure the problem is with the arguments rather
> than the network or the server being unavailable.
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>
From biopython at maubp.freeserve.co.uk Wed Jul 2 07:34:32 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 12:34:32 +0100
Subject: [Biopython-dev] Catching more error conditions in Bio.Entrez
In-Reply-To: <264855a00807020418qc858370r4083f0db9db3197a@mail.gmail.com>
References: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com>
<264855a00807020418qc858370r4083f0db9db3197a@mail.gmail.com>
Message-ID: <320fb6e00807020434p474cect7a7b0d51148d7760@mail.gmail.com>
>> This seems to happen for any invalid identifier. Are you happy for me
>> to check for this as an error too? Are there any valid reasons to get
>> back an empty dataset like this?
>
> If the ability to use history is added, then an empty dataset could be
> a valid return after an empty search. ...
Bio.Entrez has always supported the history; it's just up to the user
to take advantage of it. I've included an example in the tutorial to
explain how to do this, cut and pasted below:
from Bio import Entrez
# Search with the history feature switched on.
search_handle = Entrez.esearch(db="nucleotide", term="Opuntia and rpl16",
                               usehistory="y", email="history.user@example.com")
search_results = Entrez.read(search_handle)
search_handle.close()
gi_list = search_results["IdList"]
count = int(search_results["Count"])
assert count == len(gi_list)
session_cookie = search_results["WebEnv"]
query_key = search_results["QueryKey"]
# Now use the history session cookie and query key to download the
# results in batches.
batch_size = 3
out_handle = open("orchid_rpl16.fasta", "w")
for start in range(0, count, batch_size):
    end = min(count, start + batch_size)
    print("Going to download record %i to %i" % (start + 1, end))
    fetch_handle = Entrez.efetch(db="nucleotide", rettype="fasta",
                                 retstart=start, retmax=batch_size,
                                 webenv=session_cookie, query_key=query_key,
                                 email="history.user@example.com")
    data = fetch_handle.read()
    fetch_handle.close()
    out_handle.write(data)
out_handle.close()
Feedback on the tutorial or the example is of course welcome.
> For id-based-searches, I'm not sure I would raise an error for an empty
> set being returned anyway.
>
> Just my $0.02.
I was wondering about this kind of thing... maybe some more testing of
these kinds of examples would be in order.
Peter
From biopython at maubp.freeserve.co.uk Wed Jul 2 09:03:36 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 14:03:36 +0100
Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in
Bio.SeqIO
Message-ID: <320fb6e00807020603s63f8339ag5b8140f1943ceb47@mail.gmail.com>
Hi all,
Do any of you have any comments or feedback on this suggested new
"simple tab separated" format for Bio.SeqIO? To match BioPerl I plan
on calling it the "tab" format - see below.
Any real world example files would be good for the test suite.
One nice thing is it adds another output format, something we're a bit
short of in Bio.SeqIO with only fasta and some alignment formats (now
handled via Bio.AlignIO, i.e. pfam/stockholm, clustal and phylip).
Peter
---------- Forwarded message ----------
From: Peter
Date: Tue, Jul 1, 2008 at 5:06 PM
Subject: Re: [BioPython] Sequence from Fasta
To: dalloliogm at gmail.com
Cc: biopython at biopython.org
Giovanni wrote:
> yes, I think it will be useful to implement.
> I know of people who have written a customized fasta2tab script and
> use it quite frequently, so it would be good to support such a task.
> As you said before this format is commonly used in combination with
> grep/gawk scripts.
I've gone for the simple option for parsing the first field: it's used
as the record identifier (.id) and name only (nothing clever). Here is my
suggested code, which you are welcome to download and try out.
Bug 2533 - Support for simple "tab" format in Bio.SeqIO
http://bugzilla.open-bio.org/show_bug.cgi?id=2533
If you want to try this yourself you'll need to download the new file
TabIO.py into the Bio/SeqIO folder and update Bio/SeqIO/__init__.py to
tell it about the new format (two new lines, see patch).
Peter
From biopython at maubp.freeserve.co.uk Wed Jul 2 09:21:29 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 14:21:29 +0100
Subject: [Biopython-dev] Questions about the NEXUS format
Message-ID: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com>
Hello again Frank,
Since you are Biopython's NEXUS expert, I've got a couple of hopefully trivial
questions about the format, which connect to how best to handle it in the
Bio.SeqIO and Bio.AlignIO modules. http://biopython.org/wiki/SeqIO
http://biopython.org/wiki/AlignIO
My short questions are:
Q1: Can a file contain more than one NEXUS record (i.e. concatenation,
with more than one #NEXUS line)?
Q2: Can a NEXUS record/file contain more than one alignment (matrix block)?
If the answer to either of those is a "yes", then any example files
you could contribute would be very helpful.
I have a more complicated question too, which would help me to resolve Bug 2227:
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
Q3: Given a generic Alignment object (e.g. from one of the other
parsers), can I construct a corresponding Nexus object where the
aligned sequences are used for the matrix? If so, how?
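One way to side-step Q3, sketched here under stated assumptions, is to write the text of a minimal NEXUS data block directly from the aligned sequences rather than building a Nexus object. The function name `alignment_to_nexus` and the use of plain (name, sequence) pairs in place of an Alignment object are choices made for this example; names containing whitespace or punctuation would need quoting, which is not handled.

```python
def alignment_to_nexus(rows, datatype="dna"):
    """Return NEXUS text for a list of (name, sequence) pairs.

    Sketch only: assumes simple names that need no quoting.
    """
    if not rows:
        raise ValueError("Need at least one sequence")
    lengths = set(len(seq) for name, seq in rows)
    if len(lengths) != 1:
        raise ValueError("Sequences must all be the same length")
    nchar = lengths.pop()
    width = max(len(name) for name, seq in rows)
    lines = [
        "#NEXUS",
        "begin data;",
        "dimensions ntax=%d nchar=%d;" % (len(rows), nchar),
        "format datatype=%s missing=? gap=-;" % datatype,
        "matrix",
    ]
    for name, seq in rows:
        # Pad the names so the sequences line up in columns.
        lines.append("%s %s" % (name.ljust(width), seq))
    lines.extend([";", "end;"])
    return "\n".join(lines) + "\n"


print(alignment_to_nexus([("A", "ACGT"), ("Bb", "AC-T")]))
```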
Thank you,
Peter
From mjldehoon at yahoo.com Wed Jul 2 09:30:06 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 2 Jul 2008 06:30:06 -0700 (PDT)
Subject: [Biopython-dev] Bio.IntelliGenetics
Message-ID: <29487.55988.qm@web62410.mail.re1.yahoo.com>
Bio.IntelliGenetics contains a parser for sequence data in the IntelliGenetics format.
In this format, each sequence has a name and comments, and in addition there can also be an overall comment to the file.
Currently the parser in Bio.IntelliGenetics stores this information in Bio.IntelliGenetics.Record.Record objects (one record per sequence; the overall comment is inadvertently added to the first sequence in the file). I think it makes more sense to use a SeqRecord for that, and to deprecate Bio.IntelliGenetics.Record.Record.
In that case, Bio.SeqIO looks like a more suitable place for this parser.
The user would see something like this:
>>> from Bio import SeqIO
>>> handle = open("mydatafile.txt")
>>> records = SeqIO.parse(handle, "ig")
>>> records.comment
"This is the overall comment"
>>> for record in records:
# ... record is a SeqRecord.
Because of the overall comment, SeqIO.parse cannot simply return a generator function. It must return a full-fledged class, but one with an iterator.
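A minimal sketch of such a class is given below. It assumes, purely for illustration, that file-level comment lines start with ";;", and it yields the remaining lines as stand-ins for SeqRecord objects; the class name and the `header_prefix` argument are hypothetical.

```python
import io


class CommentedRecordIterator(object):
    """Iterator that also exposes a file-level comment as an attribute."""

    def __init__(self, handle, header_prefix=";;"):
        self._handle = handle
        self._pending = None
        header = []
        # Read the header straight away so .comment is set before iteration.
        for line in handle:
            if line.startswith(header_prefix):
                header.append(line[len(header_prefix):].strip())
            else:
                # First non-header line; hand it out on the first next() call.
                self._pending = line
                break
        self.comment = "\n".join(header)

    def __iter__(self):
        return self

    def __next__(self):
        if self._pending is not None:
            line, self._pending = self._pending, None
            return line.rstrip("\n")
        return next(self._handle).rstrip("\n")


records = CommentedRecordIterator(io.StringIO(";;overall comment\nrec1\nrec2\n"))
print(records.comment)
for record in records:
    print(record)
```

The cost of this design is that the handle must be read up to the first record as soon as the object is created, which a plain generator function would defer.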
Any objections, anybody?
--Michiel
From biopython at maubp.freeserve.co.uk Wed Jul 2 09:48:31 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 14:48:31 +0100
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <29487.55988.qm@web62410.mail.re1.yahoo.com>
References: <29487.55988.qm@web62410.mail.re1.yahoo.com>
Message-ID: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com>
On Wed, Jul 2, 2008 at 2:30 PM, Michiel de Hoon wrote:
> Bio.IntelliGenetics contains a parser for sequence data in the IntelliGenetics format.
Just to be upfront, I'm not familiar with this format, but I've had a
look at the examples.
> In this format, each sequence has a name and comments, and in addition there can
> also be an overall comment to the file.
OK. This is also the case in other file formats; for example, GenBank
files can have a free format text header at the start, but we ignore
this.
How would you separate the file header comment from the first record
comment? Some files include what looks like a file header but the
lines all seem to start with "; ". Maybe look for "; LOCUS..."?
Given the whole comment seems to be free format I don't think this is
very nice.
On the other hand, some of the sample inputs include a number of
lines starting ";; Modified by ..." which would be easy to separate
(one semi colon versus two semi colons). These are clearly file-level
header lines, rather than being part of the first record.
> Currently the parser in Bio.IntelliGenetics stores this information in
> Bio.IntelliGenetics.Record.Record objects (one record per sequence; the
> overall comment is inadvertently added to the first sequence in the file). I
> think it makes more sense to use a SeqRecord for that, and to deprecate
> Bio.IntelliGenetics.Record.Record.
If all the data extracted by the Bio.IntelliGenetics parser could be
dealt with using the SeqRecord parser added to Bio.SeqIO, then yes
deprecating Bio.IntelliGenetics sounds fine.
> In that case, Bio.SeqIO looks like a more suitable place for this parser.
> The user would see something like this:
>>>> from Bio import SeqIO
>>>> handle = open("mydatafile.txt")
>>>> records = SeqIO.parse(handle, "ig")
>>>> records.comment
> "This is the overall comment"
>>>> for record in records:
> # ... record is a SeqRecord.
I see you are using "ig" as the format name, matching EMBOSS. Good :)
http://emboss.sourceforge.net/docs/themes/seqformats/ig
> Because of the overall comment, SeqIO.parse cannot simply return a
> generator function. It must return a full-fledged class, but one with an iterator.
Not necessarily. We can still use a simple generator function and either throw
away the header comment, or include it with the first record (or even with every
record). If you did create an iterator class, would you make the header available
as a property of the iterator?
Given the apparently fuzzy boundary between the file header and the first record
header, I would just opt to treat it all as a comment for the first record, and
use a simple generator function.
Peter
From fkauff at biologie.uni-kl.de Wed Jul 2 10:01:01 2008
From: fkauff at biologie.uni-kl.de (Frank Kauff)
Date: Wed, 02 Jul 2008 16:01:01 +0200
Subject: [Biopython-dev] Questions about the NEXUS format
In-Reply-To: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com>
References: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com>
Message-ID: <486B8A1D.8090806@biologie.uni-kl.de>
Hi Peter,
Peter wrote:
> Hello again Frank,
>
> As Biopython's NEXUS expert, I've got a couple of hopefully trivial
> questions about the format, which connect to how best to handle it in the
> Bio.SeqIO and Bio.AlignIO modules. http://biopython.org/wiki/SeqIO
> http://biopython.org/wiki/AlignIO
>
> My short questions are:
>
> Q1: Can a file contain more than one NEXUS record (i.e. concatenation,
> with more than one #NEXUS line)?
>
As far as I know: no. #NEXUS just indicates the file being a NEXUS file,
the concept of "records" is not part of a nexus file
> Q2: Can a NEXUS record/file contain more than one alignment (matrix block)?
>
>
I just had a quick look at the old Maddison et al. introductory paper of
Nexus, and it says that "although the nexus standard does not impose
constraints on the number of blocks, particular programs will". I don't
know of any program that would read more than one data block and keep
both of them.
> If the answer to either of those is a "yes", then any example files
> you could contribute would be very helpful.
>
> I have a more complicated question too, which would help me to resolve Bug 2227:
> http://bugzilla.open-bio.org/show_bug.cgi?id=2227
>
> Q3: Given a generic Alignment object (e.g. from one of the other
> parsers), can I construct a corresponding Nexus object where the
> aligned sequences are used for the matrix? If so, how?
>
Hmmm - not really. Nexus.py does not support "empty" nexus class objects
that could be filled with data (just tried) . But it would actually be a
nice thing to have. I'll put this on my to do list.
Cheers,
Frank
> Thank you,
>
> Peter
>
>
From biopython at maubp.freeserve.co.uk Wed Jul 2 10:01:13 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 15:01:13 +0100
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com>
References: <29487.55988.qm@web62410.mail.re1.yahoo.com>
<320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com>
Message-ID: <320fb6e00807020701k2f5bf546j2d5ef3514a24e31a@mail.gmail.com>
Hello again,
Interestingly, the IntelliGenetics format looks the same as the MASE alignment
file format:
http://www.bioperl.org/wiki/Mase_multiple_alignment_format
On the other hand, the EMBOSS example is clearly not a multiple
sequence alignment:
http://emboss.sourceforge.net/docs/themes/seqformats/ig
Adding the parser to Bio.SeqIO would let us read in alignments too via
Bio.AlignIO (which will offload the parsing to Bio.SeqIO and then try
and convert the SeqRecords into an Alignment).
Peter
From biopython at maubp.freeserve.co.uk Wed Jul 2 10:06:40 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 15:06:40 +0100
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <320fb6e00807020705qea5976j9a5e2cd0896f391d@mail.gmail.com>
References: <29487.55988.qm@web62410.mail.re1.yahoo.com>
<320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com>
<320fb6e00807020705qea5976j9a5e2cd0896f391d@mail.gmail.com>
Message-ID: <320fb6e00807020706l28309346m57e7bd884a0b7b9b@mail.gmail.com>
Forgot to send this to the list, another point about IntelliGenetics vs MASE
---------- Forwarded message ----------
From: Peter
Date: Wed, Jul 2, 2008 at 3:05 PM
Subject: Re: [Biopython-dev] Bio.IntelliGenetics
To: mjldehoon at yahoo.com
> How would you separate the file header comment from the first record
> comment? Some files include what looks like a file header but the
> lines all seem to start with "; ". Maybe look for "; LOCUS..."?
> Given the whole comment seems to be free format I don't think this is
> very nice.
>
> On the other hand, some of the sample inputs include a number of
> lines starting ";; Modified by ..." which would be easy to separate
> (one semi colon versus two semi colons). These are clearly file-level
> header lines, rather than being part of the first record.
I found an old link I had added on the wiki page for SeqIO development,
http://pbil.univ-lyon1.fr/help/formats.html
This clearly describes the MASE format as having (optional) header
lines starting with two semi colons. But are MASE and
IntelliGenetics the same thing?
Peter
From biopython at maubp.freeserve.co.uk Wed Jul 2 10:12:48 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 15:12:48 +0100
Subject: [Biopython-dev] Questions about the NEXUS format
In-Reply-To: <486B8A1D.8090806@biologie.uni-kl.de>
References: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com>
<486B8A1D.8090806@biologie.uni-kl.de>
Message-ID: <320fb6e00807020712y54874e02k6854b92e1711358d@mail.gmail.com>
>> My short questions are:
>>
>> Q1: Can a file contain more than one NEXUS record (i.e. concatenation,
>> with more than one #NEXUS line)?
>
> As far as I know: no. #NEXUS just indicates the file being a NEXUS file, the
> concept of "records" is not part of a nexus file
OK, thank you.
>> Q2: Can a NEXUS record/file contain more than one alignment (matrix
>> block)?
>
> I just had a quick look at the old Maddison et al. introductory paper of
> Nexus, and it says that "although the nexus standard does not impose
> constraints on the number of blocks, particular programs will". I don't know
> of any program that would read more than one data block and keep both of
> them.
So that is a "yes in theory", but it doesn't sound worth worrying about.
>> Q3: Given a generic Alignment object (e.g. from one of the other
>> parsers), can I construct a corresponding Nexus object where the
>> aligned sequences are used for the matrix? If so, how?
>
> Hmmm - not really. Nexus.py does not support "empty" nexus class objects
> that could be filled with data (just tried) . But it would actually be a
> nice thing to have. I'll put this on my to do list.
Thanks,
Peter
From mjldehoon at yahoo.com Wed Jul 2 10:15:16 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 2 Jul 2008 07:15:16 -0700 (PDT)
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com>
Message-ID: <529945.38158.qm@web62404.mail.re1.yahoo.com>
> On the other hand, some of the sample inputs include a number of
> lines starting ";; Modified by ..." which would be easy to separate
> (one semi colon versus two semi colons). These are clearly file-level
> header lines, rather than being part of the first record.
According to the website mentioned in Bio/IntelliGenetics/__init__.py, the file-level comments have two semi colons, as opposed to the sequence-specific comments, which have one semi colon.
http://pbil.univ-lyon1.fr/help/formats.html
> If you did create an iterator class, would you make the
> header available as a property of the iterator?
I am not sure what you mean by a property of the iterator. I was thinking to simply add a field to the class.
---Michiel.
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 10:38:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 Jul 2008 10:38:52 -0400
Subject: [Biopython-dev] [Bug 2524] Handle missing libraries like TextTools
in run_tests.py
In-Reply-To:
Message-ID: <200807021438.m62Ecqma013815@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2524
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|Documentation |Unit Tests
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 10:38 EST -------
Filing under "Unit Tests".
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 10:39:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 Jul 2008 10:39:22 -0400
Subject: [Biopython-dev] [Bug 2469] requires_wise.py fails on Windows (test
suite)
In-Reply-To:
Message-ID: <200807021439.m62EdMM9013903@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2469
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|Main Distribution |Unit Tests
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 10:39 EST -------
Filing under "Unit Tests"
From biopython at maubp.freeserve.co.uk Wed Jul 2 10:56:00 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 15:56:00 +0100
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <529945.38158.qm@web62404.mail.re1.yahoo.com>
References: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com>
<529945.38158.qm@web62404.mail.re1.yahoo.com>
Message-ID: <320fb6e00807020756r4de8ed4bi3f8b409d75996a14@mail.gmail.com>
>> If you did create an iterator class, would you make the
>> header available as a property of the iterator?
>
> I am not sure what you mean by a property of the iterator. I was
> thinking to simply add a field to the class.
Adding the file header field to the iterator class? You could, I suppose.
Right now all the Bio.SeqIO parsers use generator functions (unlike those
in Bio.AlignIO), though I have no objection to returning iterator
classes instead.
I don't really like the idea of Bio.SeqIO parsers returning anything
other than SeqRecord objects - even if it is indirectly via a richer
iterator object. I see the Bio.SeqIO as a common unified API, and the
downside is sometimes extra data doesn't really fit.
If there really is some important meta-data in a file format that
applies to all the records, then it cannot easily be represented in
the Bio.SeqIO system except as annotation added to every single
SeqRecord. e.g. Add the header to the annotations dictionary under
"file-header" or something.
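A sketch of that last option follows, with a tiny stand-in class instead of a real SeqRecord. The "file-header" key is the one floated above; `SimpleRecord` and `annotate_with_header` are hypothetical names for this example.

```python
class SimpleRecord(object):
    """Stand-in for a SeqRecord with an annotations dictionary."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq
        self.annotations = {}


def annotate_with_header(records, header):
    """Generator: copy the file-level header onto every record it yields."""
    for record in records:
        record.annotations["file-header"] = header
        yield record


records = [SimpleRecord("a", "ACGT"), SimpleRecord("b", "TTGA")]
for record in annotate_with_header(records, "Modified by ..."):
    print(record.id, record.annotations["file-header"])
```

This keeps the parser a plain generator of records, at the price of repeating the same header text on each one.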
Peter
From mjldehoon at yahoo.com Wed Jul 2 11:29:31 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 2 Jul 2008 08:29:31 -0700 (PDT)
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <320fb6e00807020705qea5976j9a5e2cd0896f391d@mail.gmail.com>
Message-ID: <318336.37817.qm@web62405.mail.re1.yahoo.com>
--- On Wed, 7/2/08, Peter wrote:
> I found an old link I had added on the wiki page for SeqIO
> development,
> http://pbil.univ-lyon1.fr/help/formats.html
>
> This clearly describes the MASE format as having
> (optional) header
> lines starting with two semi colons. But are MASE and
> IntelliGenetics the same thing?
It may be that the link in Bio/IntelliGenetics/__init__.py actually does not pertain to the IntelliGenetics format. Except for this link (which as you point out actually talks about the MASE format, not the IntelliGenetics format), I have seen no description elsewhere of these file-wide comments preceded by a double semi-colon in the IntelliGenetics format. Even Biopython doesn't treat these consistently: the tests for Bio.IntelliGenetics include comments with the double semi-colon, but the parser doesn't treat them differently from sequence-specific comments.
So let's do the following:
For the IntelliGenetics parser, do not look for double semi-colon comments. Only check if the first character in a line is a semi-colon, and if so, treat it as a sequence-specific comment. This is what Bio.IntelliGenetics currently does anyway.
Replace the parser class in Bio.IntelliGenetics by a generator function, and integrate it with Bio.SeqIO. Then, let's replace the IntelliGenetics tests by files that do not contain the double semi-colon comments.
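The proposal above could be sketched as the generator function below. It is not the existing Bio.IntelliGenetics code. It assumes each record consists of one or more ";" comment lines, then a title line, then sequence lines, and that a trailing "1" or "2" on the sequence is an end marker to be dropped; the names `parse_ig` and `_strip_end_marker` are hypothetical.

```python
import io


def _strip_end_marker(seq):
    # Assumption: a trailing "1" or "2" marks the end of the sequence.
    return seq[:-1] if seq.endswith(("1", "2")) else seq


def parse_ig(handle):
    """Yield (title, comment, sequence) tuples, one per record."""
    comments, title, seq_parts = [], None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        if line.startswith(";"):
            if title is not None:
                # A comment line after sequence data starts the next record.
                yield title, "\n".join(comments), _strip_end_marker("".join(seq_parts))
                comments, title, seq_parts = [], None, []
            # Any line starting with a semi-colon is a record comment.
            comments.append(line[1:].strip())
        elif title is None:
            title = line.strip()
        else:
            seq_parts.append(line.replace(" ", ""))
    if title is not None:
        yield title, "\n".join(comments), _strip_end_marker("".join(seq_parts))


text = ";comment one\nSEQ1\nACGT\nTTGA1\n;comment two\nSEQ2\nGGCC1\n"
for record in parse_ig(io.StringIO(text)):
    print(record)
```

Under this scheme a ";;" line is simply a comment whose text happens to begin with a semi-colon, so no file-level header needs to be tracked.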
Does that sound OK?
--Michiel.
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 12:28:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 Jul 2008 12:28:19 -0400
Subject: [Biopython-dev] [Bug 2535] New: Support for PIR / NBRF format in
Bio.SeqIO
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2535
Summary: Support for PIR / NBRF format in Bio.SeqIO
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
BioPerl and EMBOSS both refer to this as the "pir" format, although EMBOSS also
supports "nbrf" as an alternative.
http://bioperl.org/wiki/PIR_sequence_format
Patch to follow, a new parser and writer in plain python. The existing Martel
based parser in Bio.NBRF could then be deprecated.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 12:30:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 Jul 2008 12:30:28 -0400
Subject: [Biopython-dev] [Bug 2535] Support for PIR / NBRF format in
Bio.SeqIO
In-Reply-To:
Message-ID: <200807021630.m62GUS5B025377@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2535
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 12:30 EST -------
Created an attachment (id=964)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=964&action=view)
New file Bio/SeqIO/PirIO.py
Note that the details of storing the sequence type may need tweaking for better
agreement with the de-facto conventions from the GenBank parser.
As part of this the following dictionary may be useful, from Bio/NBRF/ValSeq.py
valid_sequence_dict = {
    "P1": "complete protein", "F1": "protein fragment",
    "DL": "linear DNA", "DC": "circular DNA", "RL": "linear RNA",
    "RC": "circular RNA", "N3": "transfer RNA", "N1": "other",
}
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 13:37:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 Jul 2008 13:37:05 -0400
Subject: [Biopython-dev] [Bug 2535] Support for PIR / NBRF format in
Bio.SeqIO
In-Reply-To:
Message-ID: <200807021737.m62Hb5lX031417@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2535
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 13:37 EST -------
My patch doesn't accept the "N1" sequence type mentioned in Bio/NBRF/ValSeq.py
Also when recording a SeqRecord from a non-PIR input, we could try and guess
the sequence type. The alphabet itself is one clue. GenBank and EMBL files
should also record if the sequence is linear or circular, as well as a sequence
type.
For proteins, I don't see how to decide between P1 and F1 though (complete
protein vs protein fragment). Maybe default to F1?
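One way the guessing described above could be sketched is shown below. The function name guess_pir_type and its arguments are hypothetical; the two-letter codes are those listed in Bio/NBRF/ValSeq.py, and falling back to F1 for proteins of unknown completeness is only the tentative default floated in this comment.

```python
def guess_pir_type(molecule, circular=False, complete=False):
    """Return a PIR/NBRF two-letter sequence type code for a molecule type."""
    molecule = molecule.lower()
    if molecule == "protein":
        # Completeness usually cannot be inferred, so F1 is the fallback.
        return "P1" if complete else "F1"
    if molecule == "dna":
        return "DC" if circular else "DL"
    if molecule == "rna":
        return "RC" if circular else "RL"
    raise ValueError("Cannot guess a PIR type for %r" % molecule)


plasmid_code = guess_pir_type("DNA", circular=True)
peptide_code = guess_pir_type("protein")
```

A caller would still need to derive the molecule type (for example from the alphabet) and the topology (from GenBank or EMBL annotation) before using something like this.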
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 15:51:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 Jul 2008 15:51:49 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
for '-F F' and make it safe
In-Reply-To:
Message-ID: <200807021951.m62Jpnx3012202@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2508
------- Comment #9 from mmokrejs at ribosome.natur.cuni.cz 2008-07-02 15:51 EST -------
Even better docs:
http://blog.doughellmann.com/2007/07/pymotw-subprocess.html
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 09:24:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 09:24:32 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
with identical taxa names
In-Reply-To:
Message-ID: <200807031324.m63DOWDA018278@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2531
------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 09:24 EST -------
Hi Frank,
I see you've updated Bio/Nexus/Nexus.py with CVS revision 1.16 to record the
original taxon order with and without the name changes.
n.unaltered_taxlabels = Original names in order with duplicates
n.original_taxon_order = Modified names in order, suitable as keys to n.matrix
I'll update Bio.SeqIO / Bio.AlignIO to take advantage of this shortly, storing
the original name and the modified unique name as the SeqRecord's name and id
properties.
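For illustration, pairing original and de-duplicated names could look something like the sketch below. The make_unique function and its ".copyN" suffix scheme are hypothetical and are not taken from Nexus.py; the sketch also ignores the corner case where a generated name collides with a label already present in the file.

```python
def make_unique(names):
    """Return names in order, renaming repeats so every entry is distinct."""
    seen = {}
    unique = []
    for name in names:
        count = seen.get(name, 0) + 1
        seen[name] = count
        # The first occurrence keeps its name; later ones get a numbered suffix.
        unique.append(name if count == 1 else "%s.copy%d" % (name, count))
    return unique


original = ["Homo", "Pan", "Homo", "Homo"]
modified = make_unique(original)
# Each pair is (original name, unique id), mirroring the name/id split above.
pairs = list(zip(original, modified))
```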
Peter
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 09:52:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 09:52:08 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
with identical taxa names
In-Reply-To:
Message-ID: <200807031352.m63Dq8el021720@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2531
------- Comment #17 from fkauff at biologie.uni-kl.de 2008-07-03 09:52 EST -------
Hi Peter,
I'd strongly suggest to use self.taxlabels instead of
self.original_taxon_order. The latter is only for compatibility, and
original_taxon_order just maps taxlabels. Actually it might make sense to give
a deprecation warning if original_taxon_order is used, and it should be removed
in some future release.
Frank
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 10:06:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 10:06:46 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807031406.m63E6kct023377@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #584 is|0 |1
obsolete| |
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 10:06 EST -------
(From update of attachment 584)
With Bio/Nexus/Nexus.py revision 1.16 from CVS, it is now possible to create an
empty Nexus object and add sequences to it. This code is now obsolete.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 10:13:38 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 10:13:38 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807031413.m63EDcGj024034@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 10:13 EST -------
Created an attachment (id=965)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=965&action=view)
Bio/Nexus/Nexus.py handle support in write_nexus_data()
With Bio/Nexus/Nexus.py revision 1.16 from CVS, it is now possible to create an
empty Nexus object and add sequences to it:
#Read in an alignment object, e.g. with Bio.AlignIO
from Bio import AlignIO
alignment = AlignIO.read(open("example.aln"), "clustal")
#Make a Nexus object
from Bio.Nexus import Nexus
handle = open("test.nex", "w")
n = Nexus.Nexus()
n.alphabet = alignment._alphabet
for record in alignment:
    n.add_sequence(record.id, record.seq.tostring())
n.write_nexus_data(handle)
handle.close()
There are two problems with write_nexus_data(), firstly it doesn't accept a
StringIO handle (see also Bug 2454).
Secondly, if given a handle it closes it. This would break the above code, or
how I typically use StringIO.
This patch addresses these points.
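The general pattern behind both points can be sketched as follows: accept either a filename or any object with a write() method, and close only what the function itself opened. The name write_text is hypothetical and this is not the contents of the attached patch, just an illustration of the idea.

```python
import io


def write_text(target, text):
    """Write text to a filename or to an already-open handle."""
    if hasattr(target, "write"):
        # Caller-owned handle (real file or StringIO): write and leave it open.
        target.write(text)
        return
    # We were given a filename, so we open the file and close it ourselves.
    with open(target, "w") as handle:
        handle.write(text)


buffer = io.StringIO()
write_text(buffer, "#NEXUS\n")
contents = buffer.getvalue()  # only possible because the handle is still open
```

Checking for a write() method rather than a specific file type is what lets StringIO objects through.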
Frank, are you happy for me to commit this change?
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 12:02:30 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 12:02:30 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807031602.m63G2Unc032671@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 12:02 EST -------
Created an attachment (id=966)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=966&action=view)
Patch to Bio/AlignIO/NexusIO.py adding write support
This patch requires the Bio.Nexus handle fix (patch in attachment 965, comment
4).
My method for constructing an empty DNA, RNA, or Protein Nexus object is
perhaps inelegant. This is required in order to setup the alphabet,
ambiguous_values and unambiguous_letters properties which otherwise default to
DNA.
Also note that the Nexus add_sequence() method does not seem to support
duplicated taxa names. Perhaps this method could update the
unaltered_taxlabels property and use the _unique_label method to cope with
duplicate names?
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 12:08:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 12:08:26 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
with identical taxa names
In-Reply-To:
Message-ID: <200807031608.m63G8QS3000534@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2531
------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 12:08 EST -------
I have changed my use of original_taxon_order to just taxlabels (code now in
Bio/AlignIO/NexusIO.py rather than Bio/SeqIO/NexusIO.py).
I agree, adding a deprecation warning to the original_taxon_order get/set
functions would make sense.
P.S. Thanks for adding the unaltered_taxlabels property.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From biopython at maubp.freeserve.co.uk Fri Jul 4 04:11:06 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 4 Jul 2008 09:11:06 +0100
Subject: [Biopython-dev] What happened to Biopython 1.46?
Message-ID: <320fb6e00807040111h182411d5lea14575f2906e7ba@mail.gmail.com>
We were recently talking about doing another release, but as you may
have noticed nothing has been announced.
Michiel devoted a good chunk of his weekend to preparing Biopython
1.46 and uploaded it to the servers on Sunday 29th. He didn't issue
an announcement email at the time due to the problem with the wiki
being read only (now fixed). However, on the Monday evening I
realised I'd done something really stupid in Bio.Data.CodonTable just
before the CVS freeze. Table 15 (Blepharisma Macronuclear) would be
used whenever a translation table was requested by name. This change
has been reverted, and I've added further translation checks in
test_seq.py to avoid any similar issue in future.
So, while there is a Biopython 1.46, we're not going to advertise it
because the translation functionality is subtly wrong. However, it is
up on the website, and linked to with an errata statement.
Michiel will kindly try and prepare Biopython 1.47 soon... so please
hold off any big changes in CVS until then.
And I'm hereby publicly promising to treat him to dinner - hopefully
we'll be in the same country at the same time this year!
Peter
From bugzilla-daemon at portal.open-bio.org Fri Jul 4 04:39:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Jul 2008 04:39:35 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807040839.m648dZnX025882@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
------- Comment #6 from fkauff at biologie.uni-kl.de 2008-07-04 04:39 EST -------
(In reply to comment #4)
> Created an attachment (id=965)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=965&action=view) [details]
> Bio/Nexus/Nexus.py handle support in write_nexus_data()
>
> With Bio/Nexus/Nexus.py revision 1.16 from CVS, it is now possible to create an
> empty Nexus object and add sequences to it:
> ...
> There are two problems with write_nexus_data(), firstly it doesn't accept a
> StringIO handle (see also Bug 2454).
>
> Secondly, if given a handle it closes it. This would break the above code, or
> how I typically use StringIO.
>
> This patch addresses these points.
>
> Frank, are you happy for me to commit this change?
>
Very nice. Go for it :-)
Cheers,
Frank
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Jul 4 04:53:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Jul 2008 04:53:10 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807040853.m648rAHL026783@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #965 is|0 |1
obsolete| |
------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-04 04:53 EST -------
(From update of attachment 965)
> > This patch addresses these points.
> >
> > Frank, are you happy for me to commit this change?
> >
>
> Very nice. Go for it :-)
>
Thanks Frank.
Checking in Nexus.py;
/home/repository/biopython/biopython/Bio/Nexus/Nexus.py,v <-- Nexus.py
new revision: 1.17; previous revision: 1.16
done
Peter
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Jul 4 04:56:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Jul 2008 04:56:10 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807040856.m648uAAG027012@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #966 is|0 |1
obsolete| |
------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-04 04:56 EST -------
(From update of attachment 966)
There is a slight problem with this patch on the alphabet selection (it uses
"dna" when it should use "rna").
I'll postpone dealing with writing Nexus files in Bio.SeqIO / Bio.AlignIO until
after the next Biopython release.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Jul 4 05:13:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Jul 2008 05:13:25 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807040913.m649DPap027929@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
------- Comment #9 from fkauff at biologie.uni-kl.de 2008-07-04 05:13 EST -------
(In reply to comment #5)
> Created an attachment (id=966)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=966&action=view) [details]
> Patch to Bio/AlignIO/NexusIO.py adding write support
>
> This patch requires the Bio.Nexus handle fix (patch in attachment 965 [details], comment
> 4).
>
> My method for constructing an empty DNA, RNA, or Protein Nexus object is
> perhaps inelegant. This is required in order to setup the alphabet,
> ambiguous_values and unambiguous_letters properties which otherwise default to
> DNA.
>
> Also note that the Nexus add_sequence() method does not seem to support
> duplicated taxa names. Perhaps this method could update the
> unaltered_taxlabels property and use the _unique_label method to cope with
> duplicate names?
>
Ok, I updated add_sequence and will commit the changes soon.
F
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Jul 4 05:20:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Jul 2008 05:20:07 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807040920.m649K7MI028352@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
------- Comment #10 from fkauff at biologie.uni-kl.de 2008-07-04 05:20 EST -------
(In reply to comment #9)
> >
> > Also note that the Nexus add_sequence() method does not seem to support
> > duplicated taxa names. Perhaps this method could update the
> > unaltered_taxlabels property and use the _unique_label method to cope with
> > duplicate names?
> >
> Ok, I updated add_sequence and will commit the changes soon.
>
Checking in biopython/Bio/Nexus/Nexus.py;
/home/repository/biopython/biopython/Bio/Nexus/Nexus.py,v <-- Nexus.py
new revision: 1.18; previous revision: 1.17
Frank
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From mjldehoon at yahoo.com Fri Jul 4 06:24:06 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 4 Jul 2008 03:24:06 -0700 (PDT)
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <320fb6e00807020911w2bec03a6w5ec8b50f60a50238@mail.gmail.com>
Message-ID: <36286.77119.qm@web62412.mail.re1.yahoo.com>
> I'm assuming we'd put the new IntelliGenetics to
> SeqRecord parser in Bio/SeqIO/IgIO.py (based on
> the format name of "ig" used in EMBOSS).
OK.
> Would we then also deprecate Bio.IntelliGenetics?
Yes. Otherwise, it's replicated functionality.
> Do you want to make these changes, or should I?
Either way is fine with me. If you want to include this in Bio.SeqIO, go ahead. If you prefer me to do it, please let me know.
> > Then, let's replace the IntelliGenetics tests by
> files that do not contain the double
> > semi-colon comments.
>
> Why not just leave the double colon lines alone? The parser
> should be able to cope.
In the example files in test/IntelliGenetics, lines with a ';;' clearly have a different interpretation from the sequence-specific comments starting with ';'. I am fine with skipping the ';;' lines, but if we'd include them with the sequence-specific comments we'd be misrepresenting the file.
--Michiel.
--- On Wed, 7/2/08, Peter wrote:
> From: Peter
> Subject: Re: [Biopython-dev] Bio.IntelliGenetics
> To: mjldehoon at yahoo.com
> Date: Wednesday, July 2, 2008, 12:11 PM
> > It may be that the link in
> Bio/IntelliGenetics/__init__.py actually does not pertain
> to
> > the IntelliGenetics format. Except for this link
> (which as you point out actually talks
> > about the MASE format, not the IntelliGenetics
> format), I have seen no description
> > elsewhere of these file-wide comments preceded by a
> double semi-colon in the
> > IntelliGenetics format. Even Biopython doesn't
> treat these consistently: The tests
> > for Bio.IntelliGenetics include comments with the
> double semi-colon, but the parser
> > doesn't treat them differently from
> sequence-specific comments.
>
> Maybe we should ask BioPerl if they distinguish between the
> IntelliGenetics and MASE formats?
>
> Looking back over the old mailing list, at the time they
> did think the
> two were the same:
> http://lists.open-bio.org/pipermail/biopython-dev/2001-October/000626.html
>
> > So let's do the following:
> > For the IntelliGenetics parser, do not look for double
> semi-colon comments. Only check
> > if the first character in a line is a semi-colon, and
> if so, treat it as a sequence-specific
> > comment. This is what Bio.IntelliGenetics currently
> does anyway.
> > Replace the parser class in Bio.IntelliGenetics by a
> generator function, and integrate it with
> > Bio.SeqIO.
>
> I'm assuming we'd put the new IntelliGenetics to
> SeqRecord parser in
> Bio/SeqIO/IgIO.py (based on the format name of
> "ig" used in EMBOSS).
> Would we then also deprecate Bio.IntelliGenetics?
>
> Do you want to make these changes, or should I?
>
> > Then, let's replace the IntelliGenetics tests by
> files that do not contain the double
> > semi-colon comments.
>
> Why not just leave the double colon lines alone? The parser
> should be
> able to cope.
>
> Peter
From biopython at maubp.freeserve.co.uk Fri Jul 4 10:31:55 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 4 Jul 2008 15:31:55 +0100
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <36286.77119.qm@web62412.mail.re1.yahoo.com>
References: <320fb6e00807020911w2bec03a6w5ec8b50f60a50238@mail.gmail.com>
<36286.77119.qm@web62412.mail.re1.yahoo.com>
Message-ID: <320fb6e00807040731h787c66e6t10a4edd31dffdbc2@mail.gmail.com>
>> Do you want to make these changes, or should I?
>
> Either way is fine with me. If you want to include this in Bio.SeqIO, go ahead.
OK. I've added a simple parser to CVS as Bio/SeqIO/IgIO.py for
IntelliGenetics/MASE files using the format name "ig" to match EMBOSS.
The existing three sample files are now being used in test_SeqIO.py
and one of them also in test_AlignIO.py as well.
If anyone wants to scan over the code, I'd be delighted to have feedback.
Adding support for writing these files should be easy. Do you think
this is worth implementing?
Before we deprecate Bio.IntelliGenetics I suggest we ask on the
mailing list if anyone is using it.
> In the example files in test/IntelliGenetics, lines with a ';;' clearly have a different interpretation
> from the sequence-specific comments starting with ';'. I am fine with skipping the ';;' lines, but
> if we'd include them with the sequence-specific comments we'd be misrepresenting the file.
I am ignoring the ";;" lines at the start of the file.
Peter
From mjldehoon at yahoo.com Sat Jul 5 04:24:41 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 5 Jul 2008 01:24:41 -0700 (PDT)
Subject: [Biopython-dev] CVS freeze for release 1.47
Message-ID: <223850.14172.qm@web62404.mail.re1.yahoo.com>
Hi everybody,
I'll start on release 1.47 from now, so please don't make any commits to CVS until the release is out.
Thanks!
--Michiel.
From mjldehoon at yahoo.com Sat Jul 5 20:00:17 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 5 Jul 2008 17:00:17 -0700 (PDT)
Subject: [Biopython-dev] Biopython release 1.47
Message-ID: <287726.364.qm@web62412.mail.re1.yahoo.com>
We are pleased to announce the release of Biopython 1.47.
This release includes a new Bio.AlignIO module, updates to Bio.Blast, parsers for NCBI's Entrez E-Utilities, numerous other code improvements and fixes, and an extended and updated documentation. In particular if you use Biopython to access NCBI's E-Utilities, we encourage you to download and install this release to ensure full compliance with NCBI's access rules.
Source distributions and Windows installers are available from the Biopython website at http://biopython.org. My thanks to all code contributors who made this new release possible.
--Michiel on behalf of the Biopython developers
From sbassi at gmail.com Sun Jul 6 15:53:54 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Sun, 6 Jul 2008 16:53:54 -0300
Subject: [Biopython-dev] NCBIStandalon not compatible with previous versions,
is this a bug?
Message-ID:
NCBIStandalone changed in 1.46 due to bug #2508.
So this code that was working before, no longer works:
result, err = NCBIStandalone.blastall(b_exe, "blastn",
b_db, f_name, expectation=1e-10, descriptions=1)
The error trace is:
File "/mnt/hda2/bio/biopython-1.46/build/lib.linux-i686-2.5/Bio/Blast/NCBIStandalone.py",
line 1986, in _security_check_parameters
if ";" in value or "&&" in value :
TypeError: argument of type 'float' is not iterable
So I had to rewrite the code as:
result, err = NCBIStandalone.blastall(b_exe, "blastn",
b_db, f_name, expectation="1e-10", descriptions="1")
The problem is the function "_security_check_parameters", that assumes
that all arguments are strings.
Proposed solutions:
1) Leave it as is (this is not a bug). Some tutorial should be changed (?)
2) Modify line 1986 from:
if ";" in value or "&&" in value :
To this:
if ";" in value or "&&" in str(value) :
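Note that in option 2 as written, the first test (";" in value) is still applied to the uncast value, so a float or int would raise the same TypeError before str(value) is ever reached. A minimal sketch of a check that casts once and tests the resulting string is given below; the function name check_parameters and the error message are hypothetical and this is not the actual NCBIStandalone code.

```python
def check_parameters(**keywords):
    """Raise ValueError if any argument looks like a shell command injection."""
    for key, value in keywords.items():
        # Cast once so numbers and strings go through the same test.
        text = str(value)
        if ";" in text or "&&" in text:
            raise ValueError("Rejecting suspicious argument for %s" % key)


# Numeric arguments pass through without a TypeError.
check_parameters(expectation=1e-10, descriptions=1)
```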
From bugzilla-daemon at portal.open-bio.org Mon Jul 7 06:47:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Jul 2008 06:47:48 -0400
Subject: [Biopython-dev] [Bug 2447] EUtils cannot parse PubMed XML for ACS
journals
In-Reply-To:
Message-ID: <200807071047.m67Almjb027271@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2447
------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp 2008-07-07 06:47 EST -------
Using Biopython release 1.47;
Bio.Entrez can parse the XML for this PMID:
>>> from Bio import Entrez
>>> PMID = "17238260"
>>> handle = Entrez.efetch(db='pubmed', id=PMID, retmode='xml')
>>> record = Entrez.read(handle)
>>>
Noel, can you use Bio.Entrez instead of Bio.EUtils?
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Jul 7 06:55:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Jul 2008 06:55:10 -0400
Subject: [Biopython-dev] [Bug 2448] Bio.EUtils can't handle accented author
names
In-Reply-To:
Message-ID: <200807071055.m67AtAWu027543@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2448
------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2008-07-07 06:55 EST -------
Using Bio.Entrez in Biopython release 1.47:
>>> from Bio import Entrez
>>> handle = Entrez.efetch(db='pubmed', id=pmids, retmode='xml')
>>> records = Entrez.read(handle)
>>> records[0]['MedlineCitation']['Article']['AuthorList']
[{u'LastName': 'Matamala', u'Initials': 'AR', u'ForeName': 'Adelio R'},
{u'LastName': 'Almonacid', u'Initials': 'DE', u'ForeName': 'Daniel E'},
{u'LastName': 'Figueroa', u'Initials': 'MF', u'ForeName': 'Maximiliano F'},
{u'LastName': u'Mart\xednez-Oyanedel', u'Initials': 'J', u'ForeName':
u'Jos\xe9'}, {u'LastName': 'Bunster', u'Initials': 'MC', u'ForeName': 'Marta
C'}]
Noel, is this sufficient for your needs?
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Jul 7 07:12:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Jul 2008 07:12:26 -0400
Subject: [Biopython-dev] [Bug 2448] Bio.EUtils can't handle accented author
names
In-Reply-To:
Message-ID: <200807071112.m67BCQAB028433@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2448
------- Comment #3 from baoilleach at gmail.com 2008-07-07 07:12 EST -------
Thanks Michiel, but I found a workaround a day later so don't worry about me. I
was just letting you know about the bug...
Noel
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From biopython at maubp.freeserve.co.uk Mon Jul 7 09:07:24 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 7 Jul 2008 14:07:24 +0100
Subject: [Biopython-dev] NCBIStandalon not compatible with previous
versions, is this a bug?
In-Reply-To:
References:
Message-ID: <320fb6e00807070607m2cee88b1n9b2b2194d96c3c12@mail.gmail.com>
On Sun, Jul 6, 2008 at 8:53 PM, Sebastian Bassi wrote:
> NCBIStandalone changed in 1.46 due to bug #2508.
> So this code that was working before, no longer works:
>
> result, err = NCBIStandalone.blastall(b_exe, "blastn",
> b_db, f_name, expectation=1e-10, descriptions=1)
>
> The error trace is:
>
> File "/mnt/hda2/bio/biopython-1.46/build/lib.linux-i686-2.5/Bio/Blast/NCBIStandalone.py",
> line 1986, in _security_check_parameters
> if ";" in value or "&&" in value :
> TypeError: argument of type 'float' is not iterable
>
> So I had to rewrite the code as:
>
> result, err = NCBIStandalone.blastall(b_exe, "blastn",
> b_db, f_name, expectation="1e-10", descriptions="1")
>
> The problem is the function "_security_check_parameters", that assumes
> that all arguments are strings.
>
> Proposed solutions:
>
> 1) Leave it as is (this is not a bug). Some tutorial should be changed (?)
> 2) Modify line 1986 from:
> if ";" in value or "&&" in value :
> To this:
> if ";" in value or "&&" in str(value) :
I would say it's a bug, and casting into a string on line 1986 looks
like the best fix. I won't be able to do this until tomorrow
afternoon at the latest - if you could file a bug that would be
helpful in case I forget ;)
Thanks
Peter
From bugzilla-daemon at portal.open-bio.org Mon Jul 7 13:08:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Jul 2008 13:08:40 -0400
Subject: [Biopython-dev] [Bug 2538] New: _security_check_parameters assumes
all arguments are strings
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2538
Summary: _security_check_parameters assumes all arguments are
strings
Product: Biopython
Version: 1.46
Platform: PC
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: sbassi at gmail.com
This code no longer works:
result, err = NCBIStandalone.blastall(b_exe, "blastn",
b_db, f_name, expectation=1e-10, descriptions=1)
Because new _security_check_parameters function assumes all blastall parameters
are strings, but expectation and descriptions are float and int.
Proposed fix:
Modify line 1986 from:
if ";" in value or "&&" in value :
To this:
if ";" in value or "&&" in str(value) :
From sbassi at gmail.com Mon Jul 7 16:30:14 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Mon, 7 Jul 2008 17:30:14 -0300
Subject: [Biopython-dev] Alignment problem. bug?
Message-ID:
I would like to confirm whether this is a bug or not. If I get
confirmation, I will file it in Bugzilla.
With this code:
from Bio import Clustalw
from Bio.Clustalw import MultipleAlignCL
cline = MultipleAlignCL('foralig.txt')
cline.set_output("alig.aln")
alignment = Clustalw.do_alignment(cline)
I get:
Traceback (most recent call last):
File "/mnt/hda2/py252/bin/ii.py", line 112, in
alignment = Clustalw.do_alignment(cline)
File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py",
line 125, in do_alignment
return parse_file(out_file, alphabet)
File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py",
line 47, in parse_file
generic_alignment = AlignIO.read(handle, "clustal")
File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/__init__.py",
line 299, in read
first = iterator.next()
File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/ClustalIO.py",
line 169, in next
raise ValueError("Could not parse line:\n%s" % line)
ValueError: Could not parse line:
I tested with Biopython 1.47 and 1.46 with the input file:
http://www.pastecode.com.ar/f44f28b41 (download at
http://www.pastecode.com.ar/pastebin.php?dl=f44f28b41)
The clustal program is running, because I can see its output on disk
(posted here: http://www.pastecode.com.ar/f275a5475). It seems it
fails to parse it.
I also tested in an older version (I guess it is 1.44) and it works
OK. So I think the problem was introduced in 1.46.
--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Tutorial libre de Python: http://tinyurl.com/2az5d5
From bugzilla-daemon at portal.open-bio.org Tue Jul 8 04:41:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 8 Jul 2008 04:41:02 -0400
Subject: [Biopython-dev] [Bug 2538] _security_check_parameters assumes all
arguments are strings
In-Reply-To:
Message-ID: <200807080841.m688f2VL020100@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2538
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-08 04:41 EST -------
Included a float in the unit test for _security_check_parameters() added in Bug
2508:
Tests/test_NCBIStandalone.py revision: 1.15;
Fixed the string assumption in:
Bio/Blast/NCBIStandalone.py revision: 1.74;
Note that in your suggested fix Sebastian, both the "in" expressions need
casting to a string.
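To illustrate that point, here is a minimal sketch, not the code in
NCBIStandalone.py: check_parameters is a hypothetical name, and the
ValueError is an assumption made for the example. Casting each value to a
string once means both membership tests see a string, so floats and ints
no longer raise TypeError.

```python
def check_parameters(param_dict):
    """Hypothetical stand-in for a shell-safety check on keyword arguments.

    Raises ValueError if any value, once converted to a string,
    contains ";" or "&&".
    """
    for key, value in param_dict.items():
        # Cast once, so that BOTH "in" tests operate on a string.
        str_value = str(value)
        if ";" in str_value or "&&" in str_value:
            raise ValueError("Rejected suspicious parameter: %s=%s"
                             % (key, str_value))


# Numeric values are accepted without a TypeError:
check_parameters({"expectation": 1e-10, "descriptions": 1})
```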
Thanks for reporting this!
Peter
From biopython at maubp.freeserve.co.uk Tue Jul 8 04:51:31 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jul 2008 09:51:31 +0100
Subject: [Biopython-dev] Alignment problem. bug?
In-Reply-To:
References:
Message-ID: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com>
On Mon, Jul 7, 2008 at 9:30 PM, Sebastian Bassi wrote:
> I would like to confirm whether this is a bug or not. If I get
> confirmation, I will file it in Bugzilla.
It does look like a bug to me...
> With this code:
>
> from Bio import Clustalw
> from Bio.Clustalw import MultipleAlignCL
>
> cline = MultipleAlignCL('foralig.txt')
> cline.set_output("alig.aln")
> alignment = Clustalw.do_alignment(cline)
>
> I get:
>
> Traceback (most recent call last):
> File "/mnt/hda2/py252/bin/ii.py", line 112, in
> alignment = Clustalw.do_alignment(cline)
> File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py",
> line 125, in do_alignment
> return parse_file(out_file, alphabet)
> File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py",
> line 47, in parse_file
> generic_alignment = AlignIO.read(handle, "clustal")
> File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/__init__.py",
> line 299, in read
> first = iterator.next()
> File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/ClustalIO.py",
> line 169, in next
> raise ValueError("Could not parse line:\n%s" % line)
> ValueError: Could not parse line:
>
>
> I tested with Biopython 1.47 and 1.46 with the input file:
> http://www.pastecode.com.ar/f44f28b41 (download at
> http://www.pastecode.com.ar/pastebin.php?dl=f44f28b41)
> The clustal program is running, because I can see its output on disk
> (posted here: http://www.pastecode.com.ar/f275a5475). It seems it
> fails to parse it.
>
> I also tested in an older version (I guess it is 1.44) and it works
> OK. So I think the problem was introduced in 1.46.
For Biopython 1.46+ I switched the Bio.Clustalw parser to internally
call Bio.AlignIO, so one thing you could try is reverting
Bio/Clustalw/__init__.py to the older version (e.g. that shipped with
Biopython 1.45).
You haven't said which version of the ClustalW tool you are using -
maybe 2.0? If so, there could be a subtle change in the output
format since 1.83. If you could run the tool by hand and share the
output that would be helpful to try and track down this issue.
I don't seem to have any version of ClustalW installed on my current
machine, so it will take me a little longer to reproduce this here.
Could you file a bug please, and attach the example input and the
output when run by hand at the command line.
Thanks,
Peter
From bugzilla-daemon at portal.open-bio.org Tue Jul 8 04:52:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 8 Jul 2008 04:52:06 -0400
Subject: [Biopython-dev] [Bug 2538] _security_check_parameters assumes all
arguments are strings
In-Reply-To:
Message-ID: <200807080852.m688q6Ce020588@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2538
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-08 04:52 EST -------
Forgot to mark this as fixed - sorry for the extra email!
From biopython at maubp.freeserve.co.uk Tue Jul 8 07:02:37 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jul 2008 12:02:37 +0100
Subject: [Biopython-dev] Alignment problem. bug?
In-Reply-To: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com>
References:
<320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com>
Message-ID: <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>
On Tue, Jul 8, 2008 at 9:51 AM, Peter wrote:
> On Mon, Jul 7, 2008 at 9:30 PM, Sebastian Bassi wrote:
>> I would like to confirm whether this is a bug or not. If I get
>> confirmation, I will file it in Bugzilla.
>
> It does look like a bug to me...
I've reproduced this using Clustalw 2.0.8 (online at the EBI) and with
Clustalw 2.0.9 (installed locally). It was a problem parsing Clustal
files where the first line of the consensus was blank (and would
probably affect Clustal W 1.83 too).
I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12
Could you update this file and re-test please Sebastian? Also, may I
add a test alignment file based on your example to CVS please?
Thanks,
Peter
From mjldehoon at yahoo.com Tue Jul 8 08:47:48 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 8 Jul 2008 05:47:48 -0700 (PDT)
Subject: [Biopython-dev] Bio.Sequencing
Message-ID: <570915.67657.qm@web62415.mail.re1.yahoo.com>
Hi everybody,
Is somebody currently actively maintaining Bio.Sequencing? Frank perhaps?
I'd like to make some changes to Bio.Sequencing with regards to bug #2454:
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
Just to make sure that I am not treading on other people's work.
--Michiel
From fkauff at biologie.uni-kl.de Tue Jul 8 09:12:39 2008
From: fkauff at biologie.uni-kl.de (Frank Kauff)
Date: Tue, 08 Jul 2008 15:12:39 +0200
Subject: [Biopython-dev] Bio.Sequencing
In-Reply-To: <570915.67657.qm@web62415.mail.re1.yahoo.com>
References: <570915.67657.qm@web62415.mail.re1.yahoo.com>
Message-ID: <487367C7.2050702@biologie.uni-kl.de>
Hi all,
Michiel de Hoon wrote:
> Hi everybody,
>
> Is somebody currently actively maintaining Bio.Sequencing? Frank perhaps?
>
Not me. Green lights from my side.
Frank
> I'd like to make some changes to Bio.Sequencing with regards to bug #2454:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2454
>
> Just to make sure that I am not treading on other people's work.
>
>
> --Michiel
>
From biopython at maubp.freeserve.co.uk Tue Jul 8 10:36:43 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jul 2008 15:36:43 +0100
Subject: [Biopython-dev] Bio.Sequencing
In-Reply-To: <570915.67657.qm@web62415.mail.re1.yahoo.com>
References: <570915.67657.qm@web62415.mail.re1.yahoo.com>
Message-ID: <320fb6e00807080736v26388f2ake12303c5b752c5e9@mail.gmail.com>
On Tue, Jul 8, 2008 at 1:47 PM, Michiel de Hoon wrote:
> Hi everybody,
>
> Is somebody currently actively maintaining Bio.Sequencing? Frank perhaps?
> I'd like to make some changes to Bio.Sequencing with regards to bug #2454:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2454
>
> Just to make sure that I am not treading on other people's work.
My only comment is watch out for the fact that Bio.SeqIO is now
calling Bio.Sequencing for the "ace" and "phd" formats.
On a related note, I'd had some ideas for making the Ace parser more
user friendly by further extending the doc strings and defining
__str__ or __repr__ methods for some of the "line type classes" which
otherwise must be explored by using dir() to discover the properties.
I haven't actually done any work on this yet though.
Peter
From sbassi at gmail.com Tue Jul 8 11:38:29 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Tue, 8 Jul 2008 12:38:29 -0300
Subject: [Biopython-dev] Alignment problem. bug?
In-Reply-To: <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>
References:
<320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com>
<320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>
Message-ID:
On Tue, Jul 8, 2008 at 8:02 AM, Peter wrote:
> I've reproduced this using Clustalw 2.0.8 (online at the EBI) and with
> Clustalw 2.0.9 (installed locally). It was a problem parsing Clustal
> files where the first line of the consensus was blank (and would
> probably affect Clustal W 1.83 too).
Yes, I used ClustalW 1.83
> I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12
> Could you update this file and re-test please Sebastian? Also, may I
> add a test alignment file based on your example to CVS please?
Ok, I will test it today. You can use my file or any possible derivation of it.
Best,
SB.
From biopython at maubp.freeserve.co.uk Tue Jul 8 11:56:20 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jul 2008 16:56:20 +0100
Subject: [Biopython-dev] Alignment problem. bug?
In-Reply-To:
References:
<320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com>
<320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>
Message-ID: <320fb6e00807080856s55d77962h9ceedd160ca8002b@mail.gmail.com>
>> I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12
>> Could you update this file and re-test please Sebastian? Also, may I
>> add a test alignment file based on your example to CVS please?
>
> Ok, I will test it today. You can use my file or any possible derivation of it.
Thanks - I have added a two sequence version of your example as
Tests/Clustalw/odd_consensus.aln
Peter
From sbassi at gmail.com Tue Jul 8 12:52:13 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Tue, 8 Jul 2008 13:52:13 -0300
Subject: [Biopython-dev] Alignment problem. bug?
In-Reply-To: <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>
References:
<320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com>
<320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>
Message-ID:
On Tue, Jul 8, 2008 at 8:02 AM, Peter wrote:
> I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12
Just to confirm that it works now. Thank you!
Best,
SB.
From biopython at maubp.freeserve.co.uk Wed Jul 9 07:11:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Jul 2008 12:11:16 +0100
Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in
Bio.SeqIO
In-Reply-To: <320fb6e00807020603s63f8339ag5b8140f1943ceb47@mail.gmail.com>
References: <320fb6e00807020603s63f8339ag5b8140f1943ceb47@mail.gmail.com>
Message-ID: <320fb6e00807090411o44530c46wc1ffdc8cdc5442fe@mail.gmail.com>
Now that Biopython 1.47 is out, are there any comments/objections to
my committing this to CVS?
Bug 2533 - Support for simple "tab" format in Bio.SeqIO
http://bugzilla.open-bio.org/show_bug.cgi?id=2533
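
For anyone who has not looked at the bug, a rough sketch of the idea,
assuming the "tab" format means one record per line with the identifier,
a tab, then the sequence. parse_tab_lines is a hypothetical helper written
for this illustration and is not the code attached to the bug; skipping
blank lines is also an assumption.

```python
def parse_tab_lines(lines):
    """Yield (identifier, sequence) pairs from two-column tab-separated lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # assumption: blank lines are simply skipped
        # Exactly two columns are expected; anything else raises ValueError.
        identifier, sequence = line.split("\t")
        yield identifier.strip(), sequence.strip()


records = list(parse_tab_lines(["seq1\tACGT\n", "seq2\tGGCC\n"]))
```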
Thanks,
Peter
P.S. Any real world example files would be good for the test suite.
From lpritc at scri.ac.uk Wed Jul 9 08:14:04 2008
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Wed, 09 Jul 2008 13:14:04 +0100
Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in
Bio.SeqIO
In-Reply-To: <320fb6e00807090411o44530c46wc1ffdc8cdc5442fe@mail.gmail.com>
Message-ID:
Only that you might want to consider Axon Text File format as a
self-describing tab-separated format which would facilitate storage and
recovery of all attributes of a sequence. There's a spec here:
http://www.moleculardevices.com/pages/software/gn_genepix_file_formats.html
On 09/07/2008 12:11, "Peter" wrote:
> Now that Biopython 1.47 is out, are there any comments/objections to
> my committing this to CVS?
>
> Bug 2533 - Support for simple "tab" format in Bio.SeqIO
> http://bugzilla.open-bio.org/show_bug.cgi?id=2533
>
> Thanks,
>
> Peter
>
> P.S. Any real world example files would be good for the test suite.
--
Dr Leighton Pritchard B.Sc.(Hons) MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405
From biopython at maubp.freeserve.co.uk Wed Jul 9 08:30:26 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Jul 2008 13:30:26 +0100
Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in
Bio.SeqIO
In-Reply-To:
References: <320fb6e00807090411o44530c46wc1ffdc8cdc5442fe@mail.gmail.com>
Message-ID: <320fb6e00807090530j43a3e2c9y48bef4993587881f@mail.gmail.com>
On Wed, Jul 9, 2008 at 1:14 PM, Leighton Pritchard wrote:
> Only that you might want to consider Axon Text File format as a
> self-describing tab-separated format which would facilitate storage and
> recovery of all attributes of a sequence. There's a spec here:
>
> http://www.moleculardevices.com/pages/software/gn_genepix_file_formats.html
>
It's an interesting and flexible file format, but I don't see any
standard column name for "sequence" which would be of particular
interest from the point of view of the Bio.SeqIO module. If there is
a de-facto convention for storing sequence data in an Axon Text File,
then we could adopt this within Bio.SeqIO. Otherwise, I think any
Axon Text File parser added to Biopython would have to be of much more
general nature (and not part of Bio.SeqIO).
Peter
From biopython at maubp.freeserve.co.uk Wed Jul 9 09:03:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Jul 2008 14:03:16 +0100
Subject: [Biopython-dev] Simple __getitem__ for Alignments
Message-ID: <320fb6e00807090603o6b087ceeuce0b87c13627552a@mail.gmail.com>
Now that the latest release is out (Biopython 1.47), Bio.AlignIO
should start to get used more. I anticipate more people getting
frustrated with the current Alignment object, and would like to make
another baby-step in improving it.
I'd like to add a minimal __getitem__ method, as described in Bug 1944
comment 15,
http://bugzilla.open-bio.org/show_bug.cgi?id=1944#c15
> def __getitem__(self, index) :
>     """Access part of the alignment.
>
>     You can access a row of the alignment as a SeqRecord using an integer
>     index (think of the alignment as a list of SeqRecord objects here):
>
>         first_record = my_alignment[0]
>         last_record = my_alignment[-1]
>
>     Right now, this is the ONLY indexing operation supported. The
>     use of two indices and slice notation to extract a sub-alignment,
>     row, column or letter is under discussion for a future update."""
>     if isinstance(index, int) :
>         #e.g. result = align[x]
>         #Return a SeqRecord
>         return self._records[index]
>     else :
>         raise TypeError("Not currently supported, but may be in future.")
From the discussion on Bug 1944, this doesn't seem to be contentious -
the debate is about more advanced slicing operations.
I'd also like to add a __len__ method which would return the number of
SeqRecord objects (i.e. the number of rows). This would then let the
alignment be treated very much like a read-only list of SeqRecord
objects. Remember, we can already iterate over the rows in the
alignment as SeqRecord objects.
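
As a rough sketch of the intended behaviour, here is a toy class that is
not the real Alignment object. ToyAlignment is a hypothetical name, and
plain strings stand in for SeqRecord objects so the example is
self-contained:

```python
class ToyAlignment(object):
    """Hypothetical stand-in for an alignment holding a list of records."""

    def __init__(self, records):
        self._records = list(records)

    def __len__(self):
        # Number of rows (records) in the alignment.
        return len(self._records)

    def __getitem__(self, index):
        # Only a single integer index is supported, as proposed.
        if isinstance(index, int):
            return self._records[index]
        raise TypeError("Not currently supported, but may be in future.")

    def __iter__(self):
        # Iterating over rows is assumed to work already.
        return iter(self._records)


align = ToyAlignment(["ACGT", "AC-T", "A-GT"])
first_record = align[0]
last_record = align[-1]
number_of_rows = len(align)
```

With these methods in place, anything other than a single integer (for
example align[0:2]) raises TypeError, which leaves room to define slicing
later without changing existing behaviour.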
Any comments?
Peter
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 09:21:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 09:21:13 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807091321.m69DLD9g031282@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #20 from mdehoon at ims.u-tokyo.ac.jp 2008-07-09 09:21 EST -------
(In reply to comment #16)
I have uploaded an alternative parser in Bio.Sequencing.Phd to CVS. Feel free
to have a look and comment. If everybody is OK, I'll add a DeprecationWarning
to the previous parser.
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 09:37:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 09:37:44 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807091337.m69DbiM5031944@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #21 from fkauff at biologie.uni-kl.de 2008-07-09 09:37 EST -------
Michiel,
while you're at it - could you update my email in the source as well? And
Cymon's email is now cy at cymon.org. Thanks!
Frank
(In reply to comment #20)
> (In reply to comment #16)
> I have uploaded an alternative parser in Bio.Sequencing.Phd to CVS. Feel free
> to have a look and comment. If everybody is OK, I'll add a DeprecationWarning
> to the previous parser.
>
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 09:38:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 09:38:18 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807091338.m69DcIDu031986@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #22 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-09 09:38 EST -------
In reply to comment 20 about the updates to Bio.Sequencing.Phd, I see you've
also updated Bio.SeqIO.PhdIO in CVS (good).
I would suggest you add yourself to the copyright statement for this module,
and add some doc string entries to the new read and parse functions. I haven't
looked over the details of the new code (other than confirming test_Phd.py and
test_SeqIO.py seem happy).
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 10:28:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 10:28:36 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807091428.m69ESaGm001621@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #23 from mdehoon at ims.u-tokyo.ac.jp 2008-07-09 10:28 EST -------
(In reply to comment #21)
> Michiel,
>
> while you're at it - could you update my email in the source as well? And
> Cymon's email is now
I have updated your address, but I'd prefer to hold off on Cymon's without his
direct permission -- spammers are watching too, you know.
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 14:33:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 14:33:42 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
for '-F F' and make it safe
In-Reply-To:
Message-ID: <200807091833.m69IXgcV013783@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2508
mmokrejs at ribosome.natur.cuni.cz changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |REOPENED
Resolution|FIXED |
------- Comment #10 from mmokrejs at ribosome.natur.cuni.cz 2008-07-09 14:33 EST -------
OK, so my old code not yet converted to biopython-1.47 gives me:
_textframe = blast.blast_and_htmlize(_query_sequence, _usermode,
upload_temp_path, blast_path, uri, _align_view, _matrix)
File "/home/mmokrejs/public_html/IRES2/blast.py", line 548, in
blast_and_htmlize
_blast_out, _error_info, _blast_file = blastall(blast_path + targetdb,
query_sequence, upload_temp_path, mode='sequence', align_view=align_view,
matrix=matrix)
File "/home/mmokrejs/public_html/IRES2/blast.py", line 506, in blastall
_blast_out, _error_info = NCBIStandalone.blastall('/usr/bin/blastall',
'blastn', blast_db, _blast_file, matrix=matrix + ' -F 0', wordsize=_wordsize,
gap_open=_gap_open, gap_extend=_gap_extend, strands=_strands,
alignments=_alignments, descriptions=_descriptions, expectation=_expectation,
align_view=align_view)
File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line
1620, in blastall
_security_check_parameters(keywds)
File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line
1986, in _security_check_parameters
if ";" in value or "&&" in value :
TypeError: argument of type 'int' is not iterable
It turns out I am passing in:
{'matrix': 'NUC.4.4 -F 0', 'strands': 3, 'expectation': 100, 'wordsize': 4,
'gap_extend': 1, 'gap_open': 1, 'alignments': 99999, 'descriptions': 9999}
I don't think it makes sense to require users to pass strings instead of
numbers to the function.
While looking into the _security_check_parameters() I think you should also
check for "||" - the logical OR as interpreted by shell and redirections ">"
and "<".
FIX:
-if ";" in value or "&&" in value:
+if ";" in str(value) or "&&" in str(value) or "||" in str(value) or ">" in
str(value) or "<" in str(value):
My apologies that I did not test earlier.
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 14:38:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 14:38:08 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
for '-F F' and make it safe
In-Reply-To:
Message-ID: <200807091838.m69Ic82k014070@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2508
------- Comment #11 from mmokrejs at ribosome.natur.cuni.cz 2008-07-09 14:38 EST -------
Don't know if you want to leave in the back-door to pass in another argument
with its value. If not, prevent spaces as well. Values never contain spaces
unless wrapped by single or double-quotes. I find it perfectly legal to tell
blastall:
-d "/some/db /another/db /yet/another" to search over three databases at once.
It seems blastall does not honour -d when it is specified 3 times on its command line.
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 16:12:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 16:12:40 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
for '-F F' and make it safe
In-Reply-To:
Message-ID: <200807092012.m69KCeO2018087@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2508
------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-09 16:12 EST -------
The issue with non-string arguments (e.g. float or integers) was reported by
Sebastian Bassi (Bug 2538) and has since been fixed in CVS - sadly this was
after the release of Biopython 1.47.
As you've demonstrated there are valid reasons to want to include spaces. I
would rather not add a check which requires lots of special casing.
I'm leaving this bug open to consider extending _security_check_parameters() to
prevent the use of pipes and redirection (i.e. "|", "<" and ">") which sounds
reasonable. A third opinion wouldn't hurt of course!
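
One way the extended check could look, purely as a sketch and not what is
in CVS: UNSAFE_TOKENS and find_unsafe_token are hypothetical names, and the
token list is just the set mentioned in this bug. Spaces are deliberately
not rejected, so a value naming several databases still passes.

```python
# Tokens discussed in this bug; "|" also catches "||".
UNSAFE_TOKENS = (";", "&&", "|", "<", ">")


def find_unsafe_token(value):
    """Return the first unsafe token found in str(value), or None."""
    # Cast first, so ints and floats are handled like strings.
    str_value = str(value)
    for token in UNSAFE_TOKENS:
        if token in str_value:
            return token
    return None


# Spaces alone are allowed; redirection is flagged.
multi_db_result = find_unsafe_token("/some/db /another/db /yet/another")
redirect_result = find_unsafe_token("NUC.4.4 > out.txt")
```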
From bugzilla-daemon at portal.open-bio.org Thu Jul 10 06:30:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Jul 2008 06:30:28 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807101030.m6AAUSew025300@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #24 from fkauff at biologie.uni-kl.de 2008-07-10 06:30 EST -------
> (In reply to comment #21)
> > Michiel,
> >
> > while you're at it - could you update my email in the source as well? And
> > Cymon's email is now
>
> I have updated your address, but I'd prefer to hold off on Cymon's without his
> direct permission -- spammers are watching too, you know.
>
Contacted Cymon, reply below:
Hi Frank,
...
>
> Do you want your email address updated in the ace/phd parser code? Or
> removed (just the email, not the name, of course)? Don't know if you follow
> biopython-dev lately.
I don't actually follow the -dev list but perhaps I should, as I think
I'm going to be using and doing far more diverse bioinformatics stuff
(now that I'm employed as a bioinformatician :)
Anyway, the email can be changed to cymon.cox at gmail.com - best to go
through google I think as their spam filters tend to be pretty good.
Cheers, C.
(In reply to comment #23)
From bugzilla-daemon at portal.open-bio.org Thu Jul 10 12:24:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Jul 2008 12:24:27 -0400
Subject: [Biopython-dev] [Bug 2533] Support for simple "tab" format in
Bio.SeqIO
In-Reply-To:
Message-ID: <200807101624.m6AGORlL012526@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2533
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-10 12:24 EST -------
Checked in, marking as fixed.
Bio/SeqIO/TabIO.py initial revision: 1.1
Bio/SeqIO/__init__.py new revision: 1.33
Tests/output/test_SeqIO new revision: 1.25
From bugzilla-daemon at portal.open-bio.org Thu Jul 10 23:20:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Jul 2008 23:20:11 -0400
Subject: [Biopython-dev] [Bug 2542] New: AlignInfo.py fails a test
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2542
Summary: AlignInfo.py fails a test
Product: Biopython
Version: 1.46
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: sbassi at gmail.com
When I run:
$ python2.5 /mnt/hda2/bio/biopython-1.47/Bio/Align/AlignInfo.py
The first 2 tests pass, but then I get:
Traceback (most recent call last):
File "/mnt/hda2/bio/biopython-1.47/Bio/Align/AlignInfo.py", line 723, in
print summary.information_content()
File "/mnt/hda2/bio/biopython-1.47/Bio/Align/AlignInfo.py", line 508, in
information_content
raise ValueError, errstr
ValueError: Error in alphabet: not Nucleotide or Protein, supply expected
frequencies
I've also tried without the AlignIO:
from Bio import Alphabet
from Bio.Align.Generic import Alignment
from Bio.Seq import Seq
from Bio.Align.AlignInfo import SummaryInfo
seq1 = 'MHQAIFIYQIGYPLKSGYIQSIRSPEYDNW'
seq2 = 'MH--IFIYQIGYALKSGYIQSIRSPEY-NW'
a = Alignment(Alphabet.ProteinAlphabet)
a.add_sequence("asp",seq1)
a.add_sequence("unk",seq2)
summary = SummaryInfo(a)
summary.information_content()
Traceback (most recent call last):
File "/mnt/hda2/py252/bin/align.py", line 16, in
summary.information_content()
File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Align/AlignInfo.py",
line 508, in information_content
raise ValueError, errstr
ValueError: Error in alphabet: not Nucleotide or Protein, supply expected
frequencies
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 04:49:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 04:49:08 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807110849.m6B8n8Xg022720@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2542
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 04:49 EST -------
Going over your example code:
>>> from Bio import Alphabet
>>> from Bio.Align.Generic import Alignment
>>> from Bio.Align.AlignInfo import SummaryInfo
>>> seq1 = 'MHQAIFIYQIGYPLKSGYIQSIRSPEYDNW'
>>> seq2 = 'MH--IFIYQIGYALKSGYIQSIRSPEY-NW'
>>> a = Alignment(Alphabet.ProteinAlphabet)
First problem, you gave the Alignment object an Alphabet class, rather than an
instance of the class. I guess we should add an explicit check to the Alignment
object...
You should have used:
>>> a = Alignment(Alphabet.ProteinAlphabet())
Or, if you prefer perhaps:
>>> a = Alignment(Alphabet.generic_protein)
Then when we get to the information_content, there is another issue:
>>> a.add_sequence("asp",seq1)
>>> a.add_sequence("unk",seq2)
>>> summary = SummaryInfo(a)
>>> summary.information_content()
Traceback (most recent call last):
...
AttributeError: ProteinAlphabet instance has no attribute 'gap_char'
The trouble here is that SummaryInfo class is looking for a declared gap
character in the protein alphabet - and none has been declared. Your example
sequences appear to use "-" as a gap, but you haven't declared this.
Try this:
from Bio import Alphabet
from Bio.Align.Generic import Alignment
from Bio.Seq import Seq
from Bio.Align.AlignInfo import SummaryInfo
seq1 = 'MHQAIFIYQIGYPLKSGYIQSIRSPEYDNW'
seq2 = 'MH--IFIYQIGYALKSGYIQSIRSPEY-NW'
a = Alignment(Alphabet.Gapped(Alphabet.generic_protein, "-"))
a.add_sequence("asp",seq1)
a.add_sequence("unk",seq2)
summary = SummaryInfo(a)
print summary.information_content()
You mentioned having a similar issue with Bio.AlignIO - could you attach the
example file to this bug with some trivial code showing your problem?
Thanks, Peter.
P.S. Please update to Biopython 1.47 rather than using 1.46
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 05:50:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 05:50:49 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807110950.m6B9on7t025902@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2542
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 05:50 EST -------
I think I've fixed the "Quick test" failure when running Bio/Align/AlignInfo.py
directly. I don't know how I missed that before...
/home/repository/biopython/biopython/Bio/Align/AlignInfo.py,v <--
AlignInfo.py
new revision: 1.15; previous revision: 1.14
done
My opinion from looking at the AlignInfo code, and scanning back over the
CVS history, is that it was never used much with generic alphabets (as tend to
be returned by Bio.AlignIO). There may be other issues here - for example I've
spotted another problem case, doubly extended alphabets like a protein alphabet
with declared Gapped and WithStopCodon (which you *might* want in an
alignment).
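To make the "doubly extended alphabet" case concrete, here is a minimal
sketch of unwrapping nested alphabet wrappers. The classes below are
stand-ins written for illustration (named after the terms used above), and
base_alphabet() and find_gap_char() are hypothetical helpers, not the code
in AlignInfo.py:

```python
class ProteinAlphabet(object):
    """Stand-in for a plain protein alphabet (illustration only)."""


class Wrapper(object):
    """Stand-in for an alphabet wrapper that holds another alphabet."""

    def __init__(self, alphabet):
        self.alphabet = alphabet


class Gapped(Wrapper):
    """Wrapper that declares a gap character."""

    def __init__(self, alphabet, gap_char="-"):
        Wrapper.__init__(self, alphabet)
        self.gap_char = gap_char


class WithStopCodon(Wrapper):
    """Wrapper that declares a stop symbol."""

    def __init__(self, alphabet, stop_symbol="*"):
        Wrapper.__init__(self, alphabet)
        self.stop_symbol = stop_symbol


def base_alphabet(alphabet):
    """Hypothetical helper: peel off every wrapper layer."""
    while isinstance(alphabet, Wrapper):
        alphabet = alphabet.alphabet
    return alphabet


def find_gap_char(alphabet):
    """Hypothetical helper: look for gap_char in any wrapper layer."""
    while isinstance(alphabet, Wrapper):
        if hasattr(alphabet, "gap_char"):
            return alphabet.gap_char
        alphabet = alphabet.alphabet
    return None


# The gap declaration is hidden under the stop codon wrapper here
nested = WithStopCodon(Gapped(ProteinAlphabet(), "-"), "*")
print(type(base_alphabet(nested)).__name__)
print(find_gap_char(nested))
```

Code that only inspects the outermost object would miss the gap character
in this example, which is the sort of problem a loop over the layers avoids.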
From biopython at maubp.freeserve.co.uk Fri Jul 11 06:33:22 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 11 Jul 2008 11:33:22 +0100
Subject: [Biopython-dev] Checking alphabet argument in alignments
Message-ID: <320fb6e00807110333r1938510bne7e24d1ce7b5c0b@mail.gmail.com>
I'd like to add the following check to the __init__ method of the
Bio.Align.Generic.Alignment object (our base alignment class),
> if not (isinstance(alphabet, Alphabet.Alphabet) \
> or isinstance(alphabet, Alphabet.AlphabetEncoder)):
> raise ValueError("Invalid alphabet argument")
This will prevent subtle user errors like this:
from Bio import Alphabet
from Bio.Align.Generic import Alignment
a = Alignment(Alphabet.ProteinAlphabet)
which should be:
from Bio import Alphabet
from Bio.Align.Generic import Alignment
a = Alignment(Alphabet.ProteinAlphabet())
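To illustrate the effect of the proposed check, here is a minimal
self-contained sketch. The classes are stand-ins written for this example
(they only mimic the names in Bio.Alphabet and Bio.Align.Generic), so treat
it as an illustration under that assumption rather than the actual patch:

```python
class Alphabet(object):
    """Stand-in for Bio.Alphabet.Alphabet."""


class ProteinAlphabet(Alphabet):
    """Stand-in for Bio.Alphabet.ProteinAlphabet."""


class AlphabetEncoder(object):
    """Stand-in for Bio.Alphabet.AlphabetEncoder (e.g. a gapped alphabet)."""

    def __init__(self, alphabet):
        self.alphabet = alphabet


class Alignment(object):
    """Stand-in alignment class with the proposed argument check."""

    def __init__(self, alphabet):
        # A class object such as ProteinAlphabet is not an *instance*
        # of Alphabet, so it is rejected here.
        if not isinstance(alphabet, (Alphabet, AlphabetEncoder)):
            raise ValueError("Invalid alphabet argument")
        self._alphabet = alphabet


Alignment(ProteinAlphabet())                   # instance: accepted
Alignment(AlphabetEncoder(ProteinAlphabet()))  # wrapped instance: accepted
try:
    Alignment(ProteinAlphabet)                 # class, not instance
except ValueError as err:
    print("Rejected: %s" % err)
```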
The only downside I have thought of is if anyone has created their own
alphabet class which does NOT subclass the original
Bio.Alphabet.Alphabet class.
This same test could (should?) also be added to the Seq and MutableSeq objects.
What do people think?
Peter
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 06:39:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 06:39:48 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807111039.m6BAdm05028072@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2542
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 06:39 EST -------
In comment 2 I wrote:
> I've spotted another problem case, doubly extended alphabets like a
> protein alphabet declared Gapped and WithStopCodon (which you *might*
> want in an alignment).
This alphabet issue is fixed in CVS, as is another corner case of a divide by
zero error where an entire column consists of ignored characters.
Please re-test with Bio/Align/AlignInfo.py revision 1.16 from CVS.
Thanks
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 12:18:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 12:18:28 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807111618.m6BGISQ3013553@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #25 from mdehoon at ims.u-tokyo.ac.jp 2008-07-11 12:18 EST -------
(In reply to comment #24)
OK, I updated Phd.py.
The last module to consider is Ace.py; I'll upload a fixed version soon.
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 17:00:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 17:00:10 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807112100.m6BL0Aer026629@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2542
------- Comment #4 from sbassi at gmail.com 2008-07-11 17:00 EST -------
(In reply to comment #1)
> First problem, you gave the Alignment object an Alphabet class, rather than an
> instance of the class. I guess we should add an explicit check to the Alignment
> object...
Yes, that is my fault.
> You mentioned having a similar issue with Bio.AlignIO - could you attach the
> example file to this bug with some trivial code showing your problem?
Yes, this code with Bio.AlignIO also fails (I tried right now with AlignInfo.py
rev. 1.17):
from Bio.Align import AlignInfo
from Bio.Align.AlignInfo import SummaryInfo
from Bio import AlignIO
fn = open("secu3.aln")
alignment = AlignIO.read(fn, "clustal")
summary = SummaryInfo(alignment)
print summary.information_content()
And I got the following (this time I am not supplying any alphabet, at least
not explicitly):
Traceback (most recent call last):
File "/mnt/hda2/py252/bin/2542.py", line 12, in
print summary.information_content()
File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Align/AlignInfo.py",
line 499, in information_content
raise ValueError, errstr
ValueError: Error in alphabet: not Nucleotide or Protein, supply expected
frequencies
> P.S. Please update to Biopython 1.47 rather than using 1.46
I was using Biopython 1.47, but I reported it as 1.46 because 1.47 is not
available from the drop-down menu in the Bugzilla form.
>
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 17:02:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 17:02:24 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807112102.m6BL2OvF026827@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2542
------- Comment #5 from sbassi at gmail.com 2008-07-11 17:02 EST -------
Created an attachment (id=971)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=971&action=view)
This file is used by my example where information_content() fails when sequences
are retrieved with AlignIO
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 17:16:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 17:16:03 -0400
Subject: [Biopython-dev] [Bug 2443] Specifying the alphabet in Bio.SeqIO and
Bio.AlignIO
In-Reply-To:
Message-ID: <200807112116.m6BLG3SJ027522@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2443
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Specifying the alphabet in |Specifying the alphabet in
|Bio.SeqIO.parse() |Bio.SeqIO and Bio.AlignIO
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 17:16 EST -------
I'm broadening the scope of this enhancement bug to cover Bio.SeqIO and
Bio.AlignIO (both their read() and parse() functions).
See also alphabet issues raised on Bug 2542.
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 17:19:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 17:19:50 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807112119.m6BLJoTt027660@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2542
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 17:19 EST -------
> Yes, this code with Bio.AlignIO also fails (I tried right now with
> AlignInfo.py rev. 1.17):
>
> from Bio.Align import AlignInfo
> from Bio.Align.AlignInfo import SummaryInfo
> from Bio import AlignIO
> fn = open("secu3.aln")
> alignment = AlignIO.read(fn, "clustal")
> summary = SummaryInfo(alignment)
> print summary.information_content()
>
> And I got the following (this time I am not supplying any alphabet, at least
> not explicitly):
>
> Traceback (most recent call last):
> ...
> ValueError: Error in alphabet: not Nucleotide or Protein, supply expected
> frequencies
Good. That seems to be working as intended - alignment formats like FASTA or
Clustal do not specify the sequence type (unlike for example the Nexus format).
Perhaps Bio.AlignIO.read() and parse() should be able to accept an optional
alphabet argument? I had already been considering this for Bio.SeqIO so this
is a natural extension. See Bug 2443.
Unless information_content() can determine the sequence type (protein or
nucleotide) from the alignment alphabet, you have to help it by supplying an
appropriate e_freq_table argument.
Perhaps:
from Bio.Alphabet import IUPAC
from Bio.SubsMat import FreqTable
from Bio.Align.AlignInfo import SummaryInfo
from Bio import AlignIO
fn = open("secu3.aln")
alignment = AlignIO.read(fn, "clustal")
summary = SummaryInfo(alignment)
#Have a generic alphabet, without a declared gap char, so must
#provide the frequencies and chars to ignore explicitly:
expected = FreqTable.FreqTable({"A":0.25,"G":0.25,"T":0.25,"C":0.25},
FreqTable.FREQ, IUPAC.unambiguous_dna)
print summary.information_content(e_freq_table=expected,
chars_to_ignore=['-'])
This is probably safest. I'm doubtful that information_content() will choose
wisely if given mixed case or lower case sequences... if that is the case it
should be filed as a new bug.
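To show why the expected frequencies and the characters to ignore matter,
here is a rough stand-alone sketch of a per-column information content
calculation. The function name column_information and the exact formula
(sum of observed * log2(observed / expected)) are assumptions made for
illustration, not a copy of what AlignInfo.py does:

```python
import math


def column_information(column, expected, chars_to_ignore=("-",), log_base=2):
    """Information content of one alignment column (illustrative sketch).

    column is a string with one letter per sequence, expected maps each
    letter to its expected frequency.
    """
    residues = [c for c in column if c not in chars_to_ignore]
    if not residues:
        # Whole column ignored: return zero rather than dividing by zero
        return 0.0
    total = float(len(residues))
    info = 0.0
    for letter in set(residues):
        observed = residues.count(letter) / total
        info += observed * math.log(observed / expected[letter], log_base)
    return info


expected = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
for column in ["AAAA", "AACC", "ACGT", "A-A-", "----"]:
    print("%s %.2f" % (column, column_information(column, expected)))
```

A fully conserved column scores highest, a column matching the expected
frequencies scores zero, and a column made up entirely of ignored characters
is treated as zero (the divide by zero corner case mentioned in comment 3).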
>
> > P.S. Please update to Biopython 1.47 rather than using 1.46
>
> I was using Biopython 1.47, but I reported it as 1.46 because 1.47
> is not available from the drop-down menu in the Bugzilla form.
Thanks for the reminder - I've added that to Bugzilla now :)
I'm marking this bug as fixed now (after the updates to AlignInfo.py)
From peter at maubp.freeserve.co.uk Sat Jul 12 09:45:46 2008
From: peter at maubp.freeserve.co.uk (Peter)
Date: Sat, 12 Jul 2008 14:45:46 +0100
Subject: [Biopython-dev] Deprecating the HTML parser in Bio.Blast.NCBIWWW
Message-ID: <320fb6e00807120645u26321d71q30f72ed5808f700@mail.gmail.com>
For some time now we've been discouraging the use of the HTML and
plain text Blast parsers in favour of the XML format.
I think it would be a good idea to now officially deprecate the HTML
parser in Bio.Blast.NCBIWWW with warning messages when it is used. I
don't even know if it still works with the recent big revision to the
BLAST webpages, but I suspect not.
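As a rough sketch of what "warning messages when it is used" could look
like, the following uses the standard warnings module. The class name
HTMLBlastParser is a placeholder invented for this example, not the real
class in Bio.Blast.NCBIWWW:

```python
import warnings


class HTMLBlastParser(object):
    """Placeholder for a deprecated parser class (hypothetical name)."""

    def __init__(self):
        # stacklevel=2 points the warning at the caller's line
        warnings.warn(
            "The HTML BLAST parser is deprecated; "
            "please use the XML output and parser instead.",
            DeprecationWarning,
            stacklevel=2,
        )


# DeprecationWarning may be hidden by default, so enable it explicitly
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    HTMLBlastParser()

print(caught[0].category.__name__)
print(caught[0].message)
```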
However, the plain text parser in Bio.Blast.NCBIStandalone still has
its uses. In particular, right now the PSI-BLAST output in XML format
lacks some of the information found in the plain text output (new vs
reused entries) so it would be premature to deprecate our plain text
PSI parser. See Bug 2502 for details:
http://bugzilla.open-bio.org/show_bug.cgi?id=2502#c18
Peter
From bugzilla-daemon at portal.open-bio.org Sun Jul 13 12:23:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 13 Jul 2008 12:23:57 -0400
Subject: [Biopython-dev] [Bug 2543] New: Bio.Nexus.Trees can't handle named
ancestors
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2543
Summary: Bio.Nexus.Trees can't handle named ancestors
Product: Biopython
Version: 1.46
Platform: PC
OS/Version: FreeBSD
Status: NEW
Severity: normal
Priority: P2
Component: Other
AssignedTo: biopython-dev at biopython.org
ReportedBy: markd at soe.ucsc.edu
The following code produces:
ValueError: invalid literal for float(): Ancestor1
from Bio.Nexus import Trees
# from http://evolution.genetics.washington.edu/phylip/newicktree.html
treeStr = "(B:6.0,(A:5.0,C:3.0,E:4.0)Ancestor1:5.0,D:11.0);"
tree = Trees.Tree(treeStr)
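For reference, the label on an internal node sits between the closing
bracket and the colon, so a parser that hands "Ancestor1:5.0" (or just
"Ancestor1") to float() will fail as shown. Below is a small stand-alone
sketch of a recursive parser that accepts such labels. It is not the
Bio.Nexus.Trees code, parse_newick is a name made up for this example, and
it assumes no whitespace, quoted labels or comments in the tree string:

```python
def parse_newick(text):
    """Parse a Newick string into nested (name, length, children) tuples."""
    text = text.strip().rstrip(";")
    pos = 0

    def read_until(stop_chars):
        # Consume characters up to (not including) the next delimiter
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stop_chars:
            pos += 1
        return text[start:pos]

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip the opening bracket
            while True:
                children.append(parse_node())
                if pos >= len(text):
                    raise ValueError("Unbalanced brackets in tree")
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError("Unexpected character %r" % text[pos])
        # Leaves and internal nodes alike may carry a label here
        name = read_until(":,()") or None
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            length = float(read_until(",()"))
        return (name, length, children)

    return parse_node()


tree = parse_newick("(B:6.0,(A:5.0,C:3.0,E:4.0)Ancestor1:5.0,D:11.0);")
name, length, children = tree[2][1]
print(name, length, [child[0] for child in children])
```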
From bugzilla-daemon at portal.open-bio.org Mon Jul 14 06:17:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 14 Jul 2008 06:17:14 -0400
Subject: [Biopython-dev] [Bug 2543] Bio.Nexus.Trees can't handle named
ancestors
In-Reply-To:
Message-ID: <200807141017.m6EAHEhg019686@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2543
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-14 06:17 EST -------
This sounds like a job for Frank (the Bio.Nexus module author).
Can I ask if you've actually come across trees with named ancestor nodes in
"real life"? That would make this bug more important. If so, the name of the
tool would be interesting, and an example tree file would be great to add to
Biopython as a test case.
If on the other hand the only named ancestor tree you've ever tried is the
example from the Newick documentation, this doesn't seem such a high priority
(but still worth fixing).
Peter
From bugzilla-daemon at portal.open-bio.org Tue Jul 15 16:07:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 Jul 2008 16:07:56 -0400
Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string,
even subclass string?
In-Reply-To:
Message-ID: <200807152007.m6FK7umn009526@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2351
------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-15 16:07 EST -------
This is a suggested implementation of the split method for our Seq object,
modelled after the python string method which it calls internally. Note that I
have made the separator non-optional on the grounds that the string method's
default of white space isn't (usually) sensible for sequences. I'm happy to
change this if people think it's better to be as close as possible to the string
method.
def split(self, sep, maxsplit=None) :
    """Split method, like that of a python string.

    Return a list of the 'words' in the string (as Seq objects),
    using sep as the delimiter string. If maxsplit is given, at
    most maxsplit splits are done.

    Unlike the python string method, sep must be specified (as
    there shouldn't be any whitespace strings in a sequence).

    e.g. print my_seq.split("-")
    """
    # Compare with None so that maxsplit=0 (no splits) is honoured
    if maxsplit is not None :
        parts = self.data.split(sep, maxsplit)
    else :
        parts = self.data.split(sep)
    return [Seq(chunk, self.alphabet) for chunk in parts]
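As a quick way to try the idea without Biopython, here is a self-contained
sketch using a minimal stand-in Seq class (only .data and .alphabet, which
is an assumption for this example and far simpler than the real Bio.Seq.Seq):

```python
class Seq(object):
    """Minimal stand-in for Bio.Seq.Seq: sequence data plus an alphabet."""

    def __init__(self, data, alphabet=None):
        self.data = data
        self.alphabet = alphabet

    def split(self, sep, maxsplit=None):
        """Split on sep, returning Seq objects that keep the alphabet."""
        if maxsplit is None:
            parts = self.data.split(sep)
        else:
            parts = self.data.split(sep, maxsplit)
        return [Seq(chunk, self.alphabet) for chunk in parts]


protein = Seq("MKV*LLT*AG", "protein")
print([part.data for part in protein.split("*")])     # all stop codons
print([part.data for part in protein.split("*", 1)])  # first stop only
```

Each piece carries the parent's alphabet, so splitting a translated
sequence at "*" gives a list of protein fragments.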
From bugzilla-daemon at portal.open-bio.org Wed Jul 16 05:39:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 16 Jul 2008 05:39:01 -0400
Subject: [Biopython-dev] [Bug 2544] New: Bio.SeqIO improvements
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2544
Summary: Bio.SeqIO improvements
Product: Biopython
Version: 1.47
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: mmokrejs at ribosome.natur.cuni.cz
$ python
Python 2.5.2 (r252:60911, Jul 2 2008, 22:55:24)
[GCC 4.3.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import SeqIO
>>> handle = open("genbank-synthetic.gb")
>>> print seq_record
ID: EF452680.2
Name: EF452680
Description: Synthetic construct nitric oxide synthase (NOS) gene, partial cds.
/comment=On Feb 4, 2008 this sequence version replaced gi:145391444.
/sequence_version=2
/source=synthetic construct
/taxonomy=['other sequences', 'artificial sequences']
/keywords=['']
/references=[,
, , ]
/accessions=['EF452680']
/data_file_division=SYN
/date=11-JUN-2008
/organism=synthetic construct
/gi=166831528
Seq('TAGGCCTCTGCTTGCCGTTTGTTTCGTCAGCGATTTTTATAGTCTCAGCCTCCT...GCC',
IUPACAmbiguousDNA())
>>>
I do not see how I could access the value 'DNA' from the LOCUS line:
LOCUS EF452680 260 bp DNA linear SYN 11-JUN-2008
No, I do not want to read seq_record.features[0].qualifiers['mol_type'][0].
Could seq_record.features have a repr() function to give me something useful
instead of this?
>>> print seq_record.features
[, , ]
>>>
I don't see how to access the features documented anywhere in the Biopython
docs; pasting something like the following into the docs would give a user a
clue where to look for values:
>>> print seq_record.features[0].qualifiers
{'db_xref': ['taxon:32630'], 'mol_type': ['other DNA'], 'organism': ['synthetic
construct'], 'chromosome': ['Ib'], 'PCR_primers': ['fwd_seq:
aggcctctgcttgccgtttgtttcg, rev_seq: cgccggcggcacacgctcaactaattac']}
>>> print seq_record.features[1].qualifiers
{'gene': ['NOS']}
>>> print seq_record.features[2].qualifiers
{'product': ['nitric oxide synthase'], 'codon_start': ['2'], 'EC_number':
['1.14.13.39'], 'transl_table': ['11'], 'note': ['derived from Toxoplasma
gondii'], 'db_xref': ['GI:166831529'], 'translation':
['RPLLAVCFVSDFYSLSLLHFASVPFHESDGCVGRSHWLPGKHANYVKPAGARKRPEVGCRSSCLLRSVCCDILSPVRTRGN'],
'gene': ['NOS'], 'protein_id': ['ABP65329.2']}
>>> print seq_record.features[3].qualifiers
Traceback (most recent call last):
File "", line 1, in
IndexError: list index out of range
>>>
I wonder if I could access the above dicts as seq_record.features['source']
or seq_record.features['CDS']. Where have the 'source', 'gene' and 'CDS' gone?
From bugzilla-daemon at portal.open-bio.org Wed Jul 16 06:30:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 16 Jul 2008 06:30:21 -0400
Subject: [Biopython-dev] [Bug 2544] Bio.GenBank and SeqFeature improvements
In-Reply-To:
Message-ID: <200807161030.m6GAUL9x017920@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2544
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Bio.SeqIO improvements |Bio.GenBank and SeqFeature
| |improvements
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-16 06:30 EST -------
(In reply to comment #0)
> $ python
>
> Python 2.5.2 (r252:60911, Jul 2 2008, 22:55:24)
> [GCC 4.3.1] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from Bio import SeqIO
> >>> handle = open("genbank-synthetic.gb")
I'm guessing the missing line here was something like:
seq_record = SeqIO.read(handle, "genbank")
> >>> print seq_record
> ID: EF452680.2
> Name: EF452680
> Description: Synthetic construct nitric oxide synthase (NOS) gene, partial cds.
> /comment=On Feb 4, 2008 this sequence version replaced gi:145391444.
> /sequence_version=2
> /source=synthetic construct
> /taxonomy=['other sequences', 'artificial sequences']
> /keywords=['']
> /references=[,
> , instance at 0x834ceac>, ]
> /accessions=['EF452680']
> /data_file_division=SYN
> /date=11-JUN-2008
> /organism=synthetic construct
> /gi=166831528
> Seq('TAGGCCTCTGCTTGCCGTTTGTTTCGTCAGCGATTTTTATAGTCTCAGCCTCCT...GCC',
> IUPACAmbiguousDNA())
> >>>
>
>
> I do not see how I could access the value 'DNA' from the LOCUS line:
> LOCUS EF452680 260 bp DNA linear SYN 11-JUN-2008
Currently the sequence type (DNA, RNA, Protein) is used internally by the
GenBank parser to determine the alphabet. It is not currently recorded in the
SeqRecord object's annotation but could be. How about something like this?:
seq_record.annotations["seq_type"]
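Until something like that exists, one rough workaround is to read the
molecule type straight from the LOCUS line. This is only a sketch: the
helper name locus_molecule_type is made up, and it assumes a nucleotide
LOCUS line whose whitespace-separated fields include "bp" followed by the
molecule type, as in the example line quoted above:

```python
def locus_molecule_type(locus_line):
    """Return the field after 'bp' on a GenBank LOCUS line, or None."""
    fields = locus_line.split()
    if not fields or fields[0] != "LOCUS":
        raise ValueError("Not a LOCUS line")
    for index, field in enumerate(fields):
        # The molecule type (DNA, RNA, mRNA...) follows the length unit
        if field == "bp" and index + 1 < len(fields):
            return fields[index + 1]
    return None


line = "LOCUS EF452680 260 bp DNA linear SYN 11-JUN-2008"
print(locus_molecule_type(line))
```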
> No, I do not want to read seq_record.features[0].qualifiers['mol_type'][0].
Assuming that the first feature is the source (typically the case), and
assuming it has a specified molecule type, then your suggestion is one
workaround. But I agree, it's not nice.
> Could seq_record.features have a repr() function to give me something useful
> instead of this?
>
> >>> print seq_record.features
> [, instance at 0x837b9cc>, ]
Yes we could add that, but you wouldn't want to do that on a typical genome
with thousands of features. Adding a repr method for the Reference object is
also something I had wondered about doing.
> I don't see how to access the features documented anywhere in the Biopython
> docs; pasting something like the following into the docs would give a user a
> clue where to look for values:
>
> >>> print seq_record.features[0].qualifiers
> {'db_xref': ['taxon:32630'], 'mol_type': ['other DNA'], 'organism': ['synthetic
> construct'], 'chromosome': ['Ib'], 'PCR_primers': ['fwd_seq:
> aggcctctgcttgccgtttgtttcg, rev_seq: cgccggcggcacacgctcaactaattac']}
> >>> print seq_record.features[1].qualifiers
> {'gene': ['NOS']}
> >>> print seq_record.features[2].qualifiers
> {'product': ['nitric oxide synthase'], 'codon_start': ['2'], 'EC_number':
> ['1.14.13.39'], 'transl_table': ['11'], 'note': ['derived from Toxoplasma
> gondii'], 'db_xref': ['GI:166831529'], 'translation':
> ['RPLLAVCFVSDFYSLSLLHFASVPFHESDGCVGRSHWLPGKHANYVKPAGARKRPEVGCRSSCLLRSVCCDILSPVRTRGN'],
> 'gene': ['NOS'], 'protein_id': ['ABP65329.2']}
There is a minimal bit of text in what is currently Chapter 10 of the tutorial
on the SeqFeature object. I agree, this is an area that needs improvement.
Perhaps a full example of parsing a simple GenBank file in the SeqIO chapter
would help?
> >>> print seq_record.features[3].qualifiers
> Traceback (most recent call last):
> File "", line 1, in
> IndexError: list index out of range
You must have only three features (indexed 0, 1 and 2) which explains the index
error.
> I wonder if I could access the above dicts as seq_record.features['source']
> or seq_record.features['CDS']. Where have the 'source', 'gene' and 'CDS' gone?
As the .type attribute, try this:
for feature in seq_record.features:
    print feature.type
You can't access the features by type (e.g. seq_record.features['CDS']) because
there is generally more than one feature of each type.
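If you do want dictionary-style access, one option is to group the features
by type yourself, giving a list per type. The sketch below uses a
namedtuple as a stand-in for the SeqFeature object (an assumption so the
example runs on its own), and group_by_type is a helper invented here:

```python
from collections import defaultdict, namedtuple

# Stand-in for a SeqFeature: just the .type and .qualifiers attributes
Feature = namedtuple("Feature", ["type", "qualifiers"])

features = [
    Feature("source", {"mol_type": ["other DNA"]}),
    Feature("gene", {"gene": ["NOS"]}),
    Feature("CDS", {"gene": ["NOS"], "product": ["nitric oxide synthase"]}),
]


def group_by_type(features):
    """Map each feature type to the list of features of that type."""
    grouped = defaultdict(list)
    for feature in features:
        grouped[feature.type].append(feature)
    return grouped


by_type = group_by_type(features)
for cds in by_type["CDS"]:
    print(cds.qualifiers["product"][0])
```

The values are lists because a real record can contain many gene or CDS
features.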
Peter
P.S. Most of your comments are not on Bio.SeqIO itself, but actually about the
underlying Bio.GenBank parser or the SeqFeature object. I have therefore
changed the title.
From biopython at maubp.freeserve.co.uk Wed Jul 16 06:49:19 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 16 Jul 2008 11:49:19 +0100
Subject: [Biopython-dev] Dump file in Bio/Saf/__init__.py
Message-ID: <320fb6e00807160349r105bda08x3cf5e31915896a9b@mail.gmail.com>
Michiel,
I just noticed your CVS revision to Bio/Saf/__init__.py removing this
snippet of code:
dumpfile = open( 'dump', 'w' )
dumpfile.write( data )
dumpfile.close()
I recall seeing (and removing) a similar lump of diagnostics/debugging
code from another of Katharine Lindner's parsers.
There is still a similar bit of code in
Bio/IntelliGenetics/__init__.py which we could remove, but as the
whole module is now deprecated we could just wait for a few releases
and then remove it entirely.
Peter
From bugzilla-daemon at portal.open-bio.org Wed Jul 16 07:40:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 16 Jul 2008 07:40:53 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200807161140.m6GBerMH021048@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-16 07:40 EST -------
(In reply to comment #8)
> Whether or not to stop translating at the first stop codon could be an
> argument to the translate method. As an alternative, it may be preferable
> to have a split() method that splits the sequences at the stop codons.
> Such a method could be applied to all protein sequences, not only those
> created by translate().
Adding a split() method to the Seq object is a good idea in general (making the
Seq object more like a python string), and using my_protein.split("*") is a
nice example usage of this.
I have posted a possible implementation of the split() method for the Seq
object on Bug 2351 comment 15
http://bugzilla.open-bio.org/show_bug.cgi?id=2351#c15
From biopython at maubp.freeserve.co.uk Wed Jul 16 08:40:03 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 16 Jul 2008 13:40:03 +0100
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com>
References: <492634.64872.qm@web62414.mail.re1.yahoo.com>
<320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com>
Message-ID: <320fb6e00807160540w325fe995mea400b0014fd7c2e@mail.gmail.com>
>> But, there is also a set of interconnected modules where it's not 100%
>> clear if they can be removed without causing some surprises:
>> Bio.builders
>> Bio.config
>> Bio.dbdefs
>> Bio.formatdefs
>> Bio.dbdefs
>> Bio.expressions
>> Bio.FormatIO
>> Bio.Std
>> Bio.StdHandler
>> It is probably OK to remove these, since these were deprecated we did
>> not get a barrage of complaints from our users. Personally, I think it is
>> important to keep the code base clean, so I am in favor of removing
>> these (and see if anybody complains; in that case, we can always put
>> these modules back in and make a new release). But I can live with
>> keeping these modules for another release round. If anybody thinks
>> that that would be better, please let us know.
>
> Given some of these are very interconnected, I would be inclined to leave
> them in for one more release. However I'm content to see them go. If no
> one else has any qualms, then please carry on.
Now that Biopython 1.47 is out, it's probably time to remove
Bio.expressions (deprecated in 1.44) and explicitly deprecate the
rest:
Bio.builders
Bio.config
Bio.dbdefs
Bio.formatdefs
Bio.Std
Bio.StdHandler
(plus Bio.Writer which is part of this "Bioformat" code base?)
The final entry from your list, Bio.FormatIO, has already been removed.
Peter
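For readers unfamiliar with what "explicitly deprecate" involves in practice, here is a minimal sketch of a module-level deprecation warning using only the standard library. The helper name warn_deprecated is hypothetical and is not taken from the Biopython code base; the actual wording and placement used in the modules listed above may differ.

```python
import warnings


def warn_deprecated(module_name, replacement=None):
    """Emit a DeprecationWarning for a module slated for removal.

    Hypothetical helper for illustration only.
    """
    message = "%s is deprecated and will be removed in a future release." % module_name
    if replacement:
        message += " Please use %s instead." % replacement
    # stacklevel=2 attributes the warning to the caller, not to this helper.
    warnings.warn(message, DeprecationWarning, stacklevel=2)
    return message
```

A deprecated module would typically make such a call once at import time, so that users see the warning as soon as they import it.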
From mjldehoon at yahoo.com Wed Jul 16 10:14:07 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 16 Jul 2008 07:14:07 -0700 (PDT)
Subject: [Biopython-dev] Dump file in Bio/Saf/__init__.py
In-Reply-To: <320fb6e00807160349r105bda08x3cf5e31915896a9b@mail.gmail.com>
Message-ID: <729090.76301.qm@web62408.mail.re1.yahoo.com>
I removed a similar piece of code in one more module (I forgot which one).
While we're on the subject: the functionality currently in Bio.Saf probably belongs in Bio.AlignIO.
--Michiel.
--- On Wed, 7/16/08, Peter wrote:
> From: Peter
> Subject: [Biopython-dev] Dump file in Bio/Saf/__init__.py
> To: "BioPython-Dev Mailing List"
> Date: Wednesday, July 16, 2008, 6:49 AM
> Michiel,
>
> I just noticed your CVS revision to Bio/Saf/__init__.py
> removing this
> snippet of code:
>
> dumpfile = open( 'dump', 'w' )
> dumpfile.write( data )
> dumpfile.close()
>
> I recall seeing (and removing) a similar lump of
> diagnostics/debugging
> code from another of Katharine Lindner's parsers.
>
> There is still a similar bit of code in
> Bio/IntelliGenetics/__init__.py which we could remove, but
> as the
> whole module is now deprecated we could just wait for a few
> releases
> and then remove it entirely.
>
> Peter
From biopython at maubp.freeserve.co.uk Wed Jul 16 10:44:28 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 16 Jul 2008 15:44:28 +0100
Subject: [Biopython-dev] Dump file in Bio/Saf/__init__.py
In-Reply-To: <729090.76301.qm@web62408.mail.re1.yahoo.com>
References: <320fb6e00807160349r105bda08x3cf5e31915896a9b@mail.gmail.com>
<729090.76301.qm@web62408.mail.re1.yahoo.com>
Message-ID: <320fb6e00807160744y7d809533sb5c9cdc82c907aa9@mail.gmail.com>
On Wed, Jul 16, 2008 at 3:14 PM, Michiel de Hoon wrote:
> I removed a similar piece of code in one more module (I forgot which one).
Bio/MetaTool/__init__.py if anyone wanted to know. The CVS changes
RSS feed is handy:
http://biopython.org/wiki/Tracking_CVS_commits
> While we're on the subject: the functionality currently in Bio.Saf probably belongs in Bio.AlignIO.
Yes, it probably does - assuming anyone still uses the file format.
I'll take a look at that at some point.
Peter wrote:
>> There is still a similar bit of code in
>> Bio/IntelliGenetics/__init__.py which we could remove, but
>> as the whole module is now deprecated we could just wait
>> for a few releases and then remove it entirely.
I've removed the Bio.IntelliGenetics dumpfile code in CVS.
Peter
From bugzilla-daemon at portal.open-bio.org Wed Jul 16 11:01:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 16 Jul 2008 11:01:41 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807161501.m6GF1fuG028930@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #26 from mdehoon at ims.u-tokyo.ac.jp 2008-07-16 11:01 EST -------
I've uploaded a fixed parser in Bio.Sequencing.Ace to CVS; feel free to have a
look and comment.
From biopython at maubp.freeserve.co.uk Wed Jul 16 11:32:03 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 16 Jul 2008 16:32:03 +0100
Subject: [Biopython-dev] Dump file in Bio/Saf/__init__.py
In-Reply-To: <320fb6e00807160744y7d809533sb5c9cdc82c907aa9@mail.gmail.com>
References: <320fb6e00807160349r105bda08x3cf5e31915896a9b@mail.gmail.com>
<729090.76301.qm@web62408.mail.re1.yahoo.com>
<320fb6e00807160744y7d809533sb5c9cdc82c907aa9@mail.gmail.com>
Message-ID: <320fb6e00807160832w4eef825ek3ed4cfde1cc92cd2@mail.gmail.com>
>> While we're on the subject: the functionality currently in Bio.Saf probably belongs in Bio.AlignIO.
>
> Yes, it probably does - assuming anyone still uses the file format.
> I'll take a look at that at some point.
I've been looking at the PredictProtein site's SAF (Simple Alignment
Format) specification, which as far as I know is the only definition
(spelling errors and all). It's a free format somewhat like PHYLIP,
and for "nice" input files parsing shouldn't be too difficult.
However, some of the corner cases they give are frankly evil, and I
wonder if Bio.Saf is actually compliant.
See http://www.predictprotein.org/Dexa/optin_safDes.html
I'd like to propose deprecating Bio.Saf on the main mailing list.
If there are people wanting to use this SAF format, we can then
worry about implementing a non-Martel parser for this file format
in Bio.AlignIO instead - and explicitly test it can cope with all the
examples given.
Peter
P.S. I updated Bio.Saf to use the new URL for the PredictProtein site.
From bugzilla-daemon at portal.open-bio.org Wed Jul 16 12:08:38 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 16 Jul 2008 12:08:38 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807161608.m6GG8c0s031867@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #27 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-16 12:08 EST -------
Regarding the Bio.Sequencing.Ace changes (comment 26), some of it looks a bit
repetitive. Might a sub-function help here? Also, I was wondering if you
managed to fix Bug 2446 as a nice bonus.
Regarding the Bio.Sequencing.Phd changes, Michiel has now deprecated Frank &
Cymon's original scanner/consumer parser. I didn't think it made sense to
leave the original header as it was (with their old version number etc. and the
suggestion to contact them directly with bugs). They are of course still
listed in the copyright header.
New Bio.Sequencing.Phd docstring header text in CVS:
"""
Parser for PHD files output by PHRED and used by PHRAP and CONSED.
This module can be used used directly which will return Record objects
which should contain all the original data in the file.
Alternatively, using Bio.SeqIO with the "phd" format will call this module
internally. This will give SeqRecord objects for each contig sequence.
"""
Previous text:
"""
Parser for PHD files output by PHRED and used by PHRAP and CONSED.
Works fine with PHRED 0.020425.c
Version 1.1, 03/09/2004
written by Cymon J. Cox (cymon.cox at gmail.com ) and
Frank Kauff (fkauff 'AT' biologie.uni-kl.de).
Comments, bugs, problems, suggestions to one of us are welcome!
"""
Frank & Cymon - I should have asked first, but is this revised wording OK with
you?
From bugzilla-daemon at portal.open-bio.org Wed Jul 16 16:35:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 16 Jul 2008 16:35:13 -0400
Subject: [Biopython-dev] [Bug 2544] Bio.GenBank and SeqFeature improvements
In-Reply-To:
Message-ID: <200807162035.m6GKZDOn012941@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2544
------- Comment #2 from mmokrejs at ribosome.natur.cuni.cz 2008-07-16 16:35 EST -------
(In reply to comment #1)
>
> I'm guessing the missing line here was something like:
> seq_record = SeqIO.read(handle, "genbank")
Yes, I forgot to paste one line, sorry.
> > I do not see how I could access the value 'DNA' from the LOCUS line:
> > LOCUS EF452680 260 bp DNA linear SYN 11-JUN-2008
>
> Currently the sequence type (DNA, RNA, Protein) is used internally by the
> GenBank parser to determine the alphabet. It is not currently recorded in the
> SeqRecord object's annotation but could be. How about something like this?:
>
> seq_record.annotations["seq_type"]
I am not very familiar with the official GenBank names for the fields in the
LOCUS line, but I hope you are. Yes, that would be fine for me. I hope all the
other values from the LOCUS line can be accessed similarly as well.
> > Could seq_record.features have a repr() function to give me something useful
> > instead of this?
> >
> > >>> print seq_record.features
> > [, > instance at 0x837b9cc>, ]
>
> Yes we could add that, but you wouldn't want to do that on a typical genome
> with thousands of features. Adding a repr method for the Reference object is
> also something I had wondered about doing.
I think it could be there even for large records. It is up to the programmer
whether to use repr() or not, and while testing/learning it would be really
useful. Alternatively, the routine could check the number of features
internally and, if there are thousands, print the first few and then stop with
a clear message on how to force a full listing.
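The truncated listing suggested here could look roughly like the sketch below. The function name summarize_features and its limit parameter are hypothetical, chosen only for illustration; nothing like this is claimed to exist in Biopython.

```python
def summarize_features(features, limit=5):
    """Return a readable listing of at most `limit` features.

    Pass limit=None to list every feature.
    """
    shown = features if limit is None else features[:limit]
    lines = [repr(feature) for feature in shown]
    hidden = len(features) - len(shown)
    if hidden > 0:
        # Tell the reader that output was cut short and how to see all of it.
        lines.append("... %d more not shown; pass limit=None for the full listing" % hidden)
    return "\n".join(lines)
```

Keeping the cut-off in a helper like this, rather than in repr() itself, is one possible design: repr() stays cheap and predictable, while interactive users still get a short overview.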
> > I don't see documented anywhere in the biopython docs access the features,
> > pasting something like the following into docs would give a user clue where to
> > look for for values:
> >
> > >>> print seq_record.features[0].qualifiers
> > {'db_xref': ['taxon:32630'], 'mol_type': ['other DNA'], 'organism': ['synthetic
> > construct'], 'chromosome': ['Ib'], 'PCR_primers': ['fwd_seq:
> > aggcctctgcttgccgtttgtttcg, rev_seq: cgccggcggcacacgctcaactaattac']}
> > >>> print seq_record.features[1].qualifiers
> > {'gene': ['NOS']}
> > >>> print seq_record.features[2].qualifiers
> > {'product': ['nitric oxide synthase'], 'codon_start': ['2'], 'EC_number':
> > ['1.14.13.39'], 'transl_table': ['11'], 'note': ['derived from Toxoplasma
> > gondii'], 'db_xref': ['GI:166831529'], 'translation':
> > ['RPLLAVCFVSDFYSLSLLHFASVPFHESDGCVGRSHWLPGKHANYVKPAGARKRPEVGCRSSCLLRSVCCDILSPVRTRGN'],
> > 'gene': ['NOS'], 'protein_id': ['ABP65329.2']}
>
> There is a minimal bit of text in what is currently Chapter 10 of the tutorial
> on the SeqFeature object. I agree, this is an area that needs improvement.
Yes I read that before but it is too short, even after reading 2.4.2, 4.2.1,
9.2 and http://biopython.org/wiki/SeqIO.
>
> Perhaps a full example of parsing a simple GenBank file in the SeqIO chapter
> would help?
Definitely, you should pick some exceptional record having different fields,
I think the one I have shown is quite OK.
>
> > >>> print seq_record.features[3].qualifiers
> > Traceback (most recent call last):
> > File "<stdin>", line 1, in <module>
> > IndexError: list index out of range
>
> You must have only three features (indexed 0, 1 and 2) which explains the
> index error.
I knew, it was intentional. ;-)
>
> > I wonder if I could access the above dicts as seq_record.features['source']
> > or seq_record.features['CDS']. Where is the 'source', 'gene', 'CDS' gone?
>
> As the .type attribute, try this:
>
> for feature in seq_record.features:
>     print feature.type
>>> for feature in seq_record.features:
...     print feature.type
...
source
gene
CDS
>>>
>
> You can't access the features by type (e.g. seq_record.features['CDS'])
> because there is generally more than one feature of each type.
Yes, but how about seq_record.features['CDS'][index]? Could that be provided?
> P.S. Most of your comments are not on Bio.SeqIO itself, but actually about the
> underlying Bio.GenBank parser or the SeqFeature object. I have therefore
> changed the title.
Thanks!
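The lookup by type requested above (something like features['CDS'][index]) can already be approximated in user code by grouping on the .type attribute mentioned in the reply. The sketch below uses a stand-in Feature class so that it runs without Biopython; group_by_type is a hypothetical helper, not an existing SeqRecord method.

```python
from collections import defaultdict, namedtuple

# Stand-in for a SeqFeature; only .type and .qualifiers are used here.
Feature = namedtuple("Feature", ["type", "qualifiers"])


def group_by_type(features):
    """Map each feature type (e.g. 'CDS') to the list of features of that type."""
    grouped = defaultdict(list)
    for feature in features:
        grouped[feature.type].append(feature)
    return dict(grouped)


features = [
    Feature("source", {"organism": ["synthetic construct"]}),
    Feature("gene", {"gene": ["NOS"]}),
    Feature("CDS", {"gene": ["NOS"], "codon_start": ["2"]}),
]
by_type = group_by_type(features)
first_cds = by_type["CDS"][0]
```

Because several features can share a type, each dictionary value is a list, which is why an index is still needed after the type key.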
From tiagoantao at gmail.com Wed Jul 16 21:11:43 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 17 Jul 2008 02:11:43 +0100
Subject: [Biopython-dev] Biopython presentation at BOSC2008
Message-ID: <6d941f120807161811n44eafa5ata705fa3189443681@mail.gmail.com>
Hi all,
This year I will be delivering the Biopython presentation at BOSC
2008. The current draft is attached to this email (ppt format - yuck -
but the easiest to edit).
Comments, suggestions, changes are most welcome. Just one point, the
presentation is this Saturday, so if you have any comments, please send
them soon.
There is one slide still to be completed and a few presentation/looks
issues still to be ironed out.
Many thanks,
Tiago
--
"Data always beats theories. 'Look at data three times and then come
to a conclusion,' versus 'coming to a conclusion and searching for
some data.' The former will win every time."
- Matthew Simmons,
http://www.tiago.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bosc2008.ppt
Type: application/vnd.ms-powerpoint
Size: 482816 bytes
Desc: not available
URL:
From biopython at maubp.freeserve.co.uk Thu Jul 17 09:07:53 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 17 Jul 2008 14:07:53 +0100
Subject: [Biopython-dev] Biopython presentation at BOSC2008
In-Reply-To: <6d941f120807161811n44eafa5ata705fa3189443681@mail.gmail.com>
References: <6d941f120807161811n44eafa5ata705fa3189443681@mail.gmail.com>
Message-ID: <320fb6e00807170607s32af2744j479eb2b2e545f454@mail.gmail.com>
> Comments, suggestions, changes are most welcome. Just one point, the
> presentation is this Saturday, so if you have any comments, please send
> them soon.
I've sent Tiago some specific comments directly (little things). One
issue which might deserve wider discussion is the project's short term
goals. I would suggest putting:
* Moving from CVS to Subversion
* Make Sequence objects more OO and string-like
* More file formats in Bio.SeqIO and Bio.AlignIO
And also perhaps the Numeric to numpy move, Bug 2251
http://bugzilla.open-bio.org/show_bug.cgi?id=2251
I subscribe to the numpy mailing list and they seem to have been
making big strides in the documentation. Also it looks like they plan
to make Travis Oliphant's "Guide to NumPy" book free after "SciPy
2008" - which probably means the August 2008 SciPy conference at
Caltech rather than EuroSciPy 2008 in July in Germany.
Peter
From tiagoantao at gmail.com Thu Jul 17 17:45:56 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 17 Jul 2008 22:45:56 +0100
Subject: [Biopython-dev] Biopython presentation at BOSC2008
In-Reply-To: <6d941f120807161811n44eafa5ata705fa3189443681@mail.gmail.com>
References: <6d941f120807161811n44eafa5ata705fa3189443681@mail.gmail.com>
Message-ID: <6d941f120807171445t32178835n6f5dd77f11f3f004@mail.gmail.com>
Hi all,
I would like to thank all that sent comments. I used the vast majority
of comments sent, so feedback was really useful.
Tiago
On Thu, Jul 17, 2008 at 2:11 AM, Tiago Antão wrote:
> Hi all,
>
> This year I will be delivering the Biopython presentation at BOSC
> 2008. The current draft is attached to this email (ppt format - yuck -
> but the easiest to edit).
> Comments, suggestions, changes are most welcome. Just one point, the
> presentation is this Saturday, so if you have any comments, please send
> them soon.
>
> There is one slide still to be completed and a few presentation/looks
> issues still to be ironed out.
>
> Many thanks,
> Tiago
>
> --
> "Data always beats theories. 'Look at data three times and then come
> to a conclusion,' versus 'coming to a conclusion and searching for
> some data.' The former will win every time."
> - Matthew Simmons,
> http://www.tiago.org
>
--
"Data always beats theories. 'Look at data three times and then come
to a conclusion,' versus 'coming to a conclusion and searching for
some data.' The former will win every time."
- Matthew Simmons,
http://www.tiago.org
From bugzilla-daemon at portal.open-bio.org Fri Jul 18 20:07:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Jul 2008 20:07:02 -0400
Subject: [Biopython-dev] [Bug 1999] new frame translation method
In-Reply-To:
Message-ID: <200807190007.m6J0721C023043@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=1999
mmokrejs at ribosome.natur.cuni.cz changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mmokrejs at ribosome.natur.cuni
| |.cz
From bugzilla-daemon at portal.open-bio.org Fri Jul 18 20:09:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Jul 2008 20:09:26 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200807190009.m6J09Qm2023188@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
mmokrejs at ribosome.natur.cuni.cz changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mmokrejs at ribosome.natur.cuni
| |.cz
From bugzilla-daemon at portal.open-bio.org Fri Jul 18 20:30:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Jul 2008 20:30:36 -0400
Subject: [Biopython-dev] [Bug 2448] Bio.EUtils can't handle accented author
names
In-Reply-To:
Message-ID: <200807190030.m6J0Ua27024398@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2448
mmokrejs at ribosome.natur.cuni.cz changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mmokrejs at ribosome.natur.cuni
| |.cz
------- Comment #4 from mmokrejs at ribosome.natur.cuni.cz 2008-07-18 20:30 EST -------
(In reply to comment #2)
> {u'LastName': u'Mart\xednez-Oyanedel', u'Initials': 'J', u'ForeName':
> u'Jos\xe9'},
If I remember right, this is the escaped representation you get when you call
str() or repr() on such unicode data. Higher-level code could then try to
convert it back, but someone has to invent the magic code for that. In my
programs I avoid unicode objects, stick to utf8 and pass it back to the user.
But as I say, you can never use print(), str() or repr() because they are not
utf8/unicode safe. That should be one of the things to be fixed in python-3.
So in summary, when I raise an exception these values always get printed in
the above escaped form, but that is the only such case. I believe that as
long as you return the values the current code is OK. But I haven't tested
this.
Grepping related stuff from my programs gives e.g.:
self._connection = connect(unix_socket=unix_socket, db=dbname, user=username,
    passwd=password, init_command='SET AUTOCOMMIT=0', charset='utf8',
    use_unicode=False)
if self._connection.character_set_name() != 'utf8':
    # test whether we really have utf8 connection
    raise RuntimeError("Connection to mysql not in utf8 mode: %s" %
                       self._connection.character_set_name())
value = unicode(value).encode('utf8')
http://evanjones.ca/python-utf8.html
http://www.idealliance.org/proceedings/xtech05/papers/02-08-01/
http://www.amk.ca/python/howto/unicode
http://diveintopython.org/xml_processing/unicode.html
http://www.jorendorff.com/articles/unicode/python.html
from elementtree.ElementTree import parse, Element, SubElement, ElementTree
# use 'utf8' and not 'utf-8' for Element.write() !!!
# We must supply unicode values to ElementTree and not just utf8-encoded strings.
_value_node.text = _value.decode('utf8')
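To make the difference between the escaped display form and the actual UTF-8 bytes concrete, here is a small sketch. Note the assumption: it is written for Python 3, where str is Unicode text, whereas the snippets in this comment are Python 2. It uses the LastName value quoted from comment #2.

```python
# The author name from comment #2, as Unicode text.
name = "Mart\xednez-Oyanedel"

# Escaped display form, comparable to what Python 2's repr(u"...") prints.
escaped = ascii(name)

# Explicit encoding to UTF-8 bytes for storage or output.
utf8_bytes = name.encode("utf-8")

# Decoding the bytes recovers the original text unchanged.
round_trip = utf8_bytes.decode("utf-8")
```

The escaped form is only a way of displaying the string; the data itself is intact, which matches the point above that returning the values is fine even if printing them looks odd.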
From bugzilla-daemon at portal.open-bio.org Fri Jul 18 20:37:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Jul 2008 20:37:36 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
codons as stops
In-Reply-To:
Message-ID: <200807190037.m6J0baGc024748@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2530
mmokrejs at ribosome.natur.cuni.cz changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mmokrejs at ribosome.natur.cuni
| |.cz
------- Comment #6 from mmokrejs at ribosome.natur.cuni.cz 2008-07-18 20:37 EST -------
I was just about to report this bug. I use biopython to translate EST
sequences. They are full of sequencing errors: even when one knows the CDS
region, it is often interrupted by N's or by literal STOP codons. The
current implementation in biopython-1.47 broke this for me. I haven't tested the
attached patches but would propose making this strict check optional.
Currently there seems to be no way to pass some variable down to the code
telling it not to barf in such cases. I will attach my current hack.
From bugzilla-daemon at portal.open-bio.org Fri Jul 18 20:38:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Jul 2008 20:38:48 -0400
Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq
objects
In-Reply-To:
Message-ID: <200807190038.m6J0cmLK024884@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2532
------- Comment #3 from mmokrejs at ribosome.natur.cuni.cz 2008-07-18 20:38 EST -------
Created an attachment (id=972)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=972&action=view)
Hack not to break on Ns for unknown bases in ESTs
From bugzilla-daemon at portal.open-bio.org Sat Jul 19 08:47:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 19 Jul 2008 08:47:34 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
codons as stops
In-Reply-To:
Message-ID: <200807191247.m6JClYEO004649@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2530
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #955 is|0 |1
obsolete| |
------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-19 08:47 EST -------
(From update of attachment 955)
I've checked this code in, with most of the assertions moved into a new
unit test. This patch is now obsolete.
Checking in Bio/Data/CodonTable.py;
/home/repository/biopython/biopython/Bio/Data/CodonTable.py,v <--
CodonTable.py
new revision: 1.9; previous revision: 1.8
done
RCS file: /home/repository/biopython/biopython/Tests/test_CodonTable.py,v
done
Checking in Tests/test_CodonTable.py;
/home/repository/biopython/biopython/Tests/test_CodonTable.py,v <--
test_CodonTable.py
initial revision: 1.1
done
RCS file: /home/repository/biopython/biopython/Tests/output/test_CodonTable,v
done
Checking in Tests/output/test_CodonTable;
/home/repository/biopython/biopython/Tests/output/test_CodonTable,v <--
test_CodonTable
initial revision: 1.1
done
From bugzilla-daemon at portal.open-bio.org Sat Jul 19 08:52:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 19 Jul 2008 08:52:02 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
codons as stops
In-Reply-To:
Message-ID: <200807191252.m6JCq26c004896@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2530
------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-19 08:52 EST -------
(In reply to comment #6)
> I was just about to report this bug. I use biopython to translate EST
> sequences. They are full of sequencing errors although one knows the CDS
> region, still it is often interrupted by N's or by literal STOP codons. The
> current implementation in biopython-1.47 broke it for me. I haven't tested the
> attached patches but would propose to make this strict check optional.
> Currently it seems there is no way to pass down the code some variable not to
> barf in such cases. Will attach my current hack.
Do you have an example which "worked" in an older version of Biopython, but is
"broken" in Biopython 1.47?
From bugzilla-daemon at portal.open-bio.org Sat Jul 19 12:40:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 19 Jul 2008 12:40:58 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
codons as stops
In-Reply-To:
Message-ID: <200807191640.m6JGew57014127@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2530
------- Comment #9 from mmokrejs at ribosome.natur.cuni.cz 2008-07-19 12:40 EST -------
>gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence
AAGAAAACGAGAAGGACGGGGTTATATAGTAAGGTACAAACAGGGCANNNNNNCCATTACACGACCAACT
TCTTCGCCTTGCCCTTTTTCTCAGAGTCCTTGTGCGACAGGAACTCGACCTCGGTCGCAAGAGGCCCAGC
AAGTCGCGCTCCCTCGGGGTACCCAAGCACACTCATCTTGAAATGCTTCCCAACTCCCTCAATCCTTTCC
CGCAGCCCCGCATCCTCCTCGGTCGGTGCAAGTCGCGTCCATATCGACAATCGATAAAACTGCGGCCGCG
TCGACACGATCACACCTGTAATCAGCGACGCAGACCCACTTCCACCCGTCAGCGTCGGCGATGGGTCAAA
TGTTTCCCCGATCGCAGCCAGCATCGTATACAGCCACATCTTGTCTACGTTGGGTCGGTTTTTATCTTTG
GGCAGTTGGATACTCCATTTTCCTCCAAGCTTGTTCGCCTCGTCCTCCCATGCGGGAATAATTCCCTCCT
TGAAAAGGTAATAATTTGCCTTCTGGGGCAGTTGAGATGGCGGTATGATGTTGTTATATAACCCCCAAAA
CTCCNNNNNGCTATCAAAGNNNNNGACCCGCNNNNNGTCCNCCANNNACCCTTNNNCCNNNNNANNNCCG
GNNNNNNNNNNNNTGNGGGTCNNNNNNNNNGCTNNNNNNNNNNTNNNNNG
resulted as of Aug 5 2007 in a six-frame translation
>gi|45741280|gb|CK993509.1|CK993509 gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:+1
KKTRRTGLYSKVQTG***HYTTNFFALPFFSESLCDRNSTSVARGPASRAPSGYPSTLILKCFPTPSILSRSPASSSVGASRVHIDNR*NCGRVDTITPVISDADPLPPVSVGDGSNVSPIAASIVYSHILSTLGRFLSLGSWILHFPPSLFASSSHAGIIPS
LKR**FAFWGS*DGGMMLLYNPQNS**LSK**TR**S***P*****P******V***A*****
>gi|45741280|gb|CK993509.1|CK993509 gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:+2
RKREGRGYIVRYKQG***ITRPTSSPCPFSQSPCATGTRPRSQEAQQVALPRGTQAHSS*NASQLPQSFPAAPHPPRSVQVASISTIDKTAAASTRSHL*SATQTHFHPSASAMGQMFPRSQPASYTATSCLRWVGFYLWAVGYSIFLQACSPRPPMRE*FPP
*KGNNLPSGAVEMAV*CCYITPKT***YQ***P****P*TL*****R*****G**********
>gi|45741280|gb|CK993509.1|CK993509 gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:+3
ENEKDGVI**GTNRA**PLHDQLLRLALFLRVLVRQELDLGRKRPSKSRSLGVPKHTHLEMLPNSLNPFPQPRILLGRCKSRPYRQSIKLRPRRHDHTCNQRRRPTSTRQRRRWVKCFPDRSQHRIQPHLVYVGSVFIFGQLDTPFSSKLVRLVLPCGNNSLLEKVIICLLGQLRWRYDVVI*PPKL**AIK**DP**V***P************G**********
>gi|45741280|gb|CK993509.1|CK993509 gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:-1
**********W************P***L**AQ**KLS**LKTPNILL*YGGRVDGVFRLIMEKFLP**GRTLLLRLFEPPFTS*VDGFLFLAGLHLFYTDICYDRR*PLCKLGSGCDCPPSPRRSD*CPH*HSCAGVKIANSYTCAERGWLLLRPDALS*LPQPFVKFYSHEPMGLPRAERPGERWLQLKDSVFLRLFFPFRFFNQHIT**TGQTWNDILGQEEQK
>gi|45741280|gb|CK993509.1|CK993509 gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:-2
**********G*****G*****FP*T****P***NY***SKPPIYCCSMAVELTGSSV**WKSSSLNKGVPSCSACSNLLLPHRLTGFYFWLGCICSTPTYATTDASPFVNWVAAATAHLHPDAATNVHTSTAAPASK*LTAIPALNVAGSSYAPTPFPNSLNPS*SSTHTNPWGSLALNDPENAGSSSRTACS*DSFSRSASSTSTL***RDKHGMIYWGRKSKR
>gi|45741280|gb|CK993509.1|CK993509 gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:-3
*****S***L******A*****S***P**RP**ETI**PQNPQYIVVVWR*S*RGLPFNNGKVPPLIRAYPPAPLVRTSFYLIG*RVSIFGWVASVLHRHMLRPTLAPL*TG*RLRLPTFTQTQRLMSTLAQLRRRQNS*QLYLR*TWLAPPTPRRPFLTPSTLRKVLLTRTHGAPSR*TTRRTLAPAQGQRVPETLFPVPLLQPAHY***GTNME*YIGAGRAKE
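For reference, a six-frame translation of this kind can be sketched in plain Python as below. This is not the translate_ESTs.py script and not Biopython's implementation; the names translate_lenient and six_frames are hypothetical, and the sketch assumes the standard genetic code. Unlike the output above, where codons containing N appear as "*", this sketch writes "X" for any codon it cannot look up.

```python
from itertools import product

# Standard genetic code, in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string (A, C, G, T, N only)."""
    return dna.translate(COMPLEMENT)[::-1]


def translate_lenient(dna, unknown="X"):
    """Translate codon by codon; codons missing from the table become `unknown`."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # drop a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], unknown) for i in range(0, usable, 3))


def six_frames(dna):
    """Return a dict mapping '+1'..'+3' and '-1'..'-3' to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames["%s%d" % (strand, offset + 1)] = translate_lenient(seq[offset:])
    return frames
```

Applied to the first 15 bases of the EST above (AAGAAAACGAGAAGG), the three forward frames begin KKTRR, RKRE and ENEK, in agreement with the start of the +1, +2 and +3 translations shown.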
From bugzilla-daemon at portal.open-bio.org Sat Jul 19 12:44:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 19 Jul 2008 12:44:36 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
codons as stops
In-Reply-To:
Message-ID: <200807191644.m6JGiahx014350@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2530
------- Comment #10 from mmokrejs at ribosome.natur.cuni.cz 2008-07-19 12:44 EST -------
BTW, formatdb silently ignores asterisks, so you have to replace them with X
yourself; otherwise alignment outputs from blast do not reflect reality.
I don't know if I would prefer biopython to give me 'X' instead of '*'. Maybe
for codons containing 'N' or 'R' I would prefer 'X', while for true STOP codons
I would prefer '*'. It is nice that in the PIR database proteins that really
end at a STOP codon have a trailing '*'.
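The manual replacement described here is a one-liner; the sketch below additionally keeps a single trailing '*' so that a true terminal stop stays visible, in the spirit of the PIR convention mentioned. The function name mask_internal_stops is hypothetical, and whether to keep the trailing asterisk before running formatdb is left to the user.

```python
def mask_internal_stops(protein, mask="X"):
    """Replace internal '*' with `mask`, keeping one trailing '*' if present."""
    if protein.endswith("*"):
        return protein[:-1].replace("*", mask) + "*"
    return protein.replace("*", mask)
```

Replacing rather than deleting the asterisks keeps residue positions unchanged, so alignment coordinates still line up with the original translation.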
From bugzilla-daemon at portal.open-bio.org Sat Jul 19 16:24:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 19 Jul 2008 16:24:41 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
codons as stops
In-Reply-To:
Message-ID: <200807192024.m6JKOffD023599@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2530
------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-19 16:24 EST -------
How did you do the six translations in comment 9? Using Bio.Seq.translate()
would have failed with a TranslationError on any "NNN" codon or similar.
By common agreement "*" is used for a stop symbol. While "X" generally means
any amino acid, I have sometimes seen it used to mean any amino acid OR a stop
codon (in the NCBI translations in certain GenBank files).
Personally I think it would be nice if there was an agreed character for an
amino acid OR stop codon (e.g. "!" for example). However, as far as I know no
such convention exists, so we shouldn't invent one as the default in Biopython.
P.S. The nicest way to handle translate("NNN") isn't what I filed this bug
about. It's the fact that translate("{@}") or anything else like that returns
"*" and not an error.
From bugzilla-daemon at portal.open-bio.org Sat Jul 19 17:40:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 19 Jul 2008 17:40:35 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
codons as stops
In-Reply-To:
Message-ID: <200807192140.m6JLeZQR025907@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2530
------- Comment #12 from mmokrejs at ribosome.natur.cuni.cz 2008-07-19 17:40 EST -------
Created an attachment (id=973)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=973&action=view)
translate_ESTs.py
From bugzilla-daemon at portal.open-bio.org Sun Jul 20 10:46:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 20 Jul 2008 10:46:23 -0400
Subject: [Biopython-dev] [Bug 2547] New: Translation of ambiguous codons
like NNN and TAN
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2547
Summary: Translation of ambiguous codons like NNN and TAN
Product: Biopython
Version: 1.47
Platform: PC
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
It is often useful to translate ambiguous nucleotide sequences (e.g. EST
sequences), which may contain codons that could code for an amino acid
OR a stop codon (e.g. NNN, TNN or TAN).
See for example Bug 2530 comment 6 and comment 9.
Currently Bio.Seq.translate() will not translate such sequences and raises an
exception.
The following example shows correct translation of ambiguous codons which only
encode valid amino acid(s) OR valid stop codons (but not both):
from Bio.Seq import translate
assert translate("TAA") == "*"
assert translate("TAG") == "*"
assert translate("TAT") == "Y"
assert translate("TAC") == "Y"
#Recall ambiguous nucleotide Y means T or C (pYrimidine)
#so TAY = TAT or TAC which both code for Y (Tyr, Tyrosine)
assert translate("TAY") == "Y"
#Recall ambiguous nucleotide R means G or A (puRine)
#so TAR = TAG or TAA which both code for a stop codon
assert translate("TAR") == "*"
However, in Biopython 1.47 the following all raise an exception:
translate("TAN")
translate("TAM")
translate("TAK")
translate("TRR")
translate("TNN")
translate("NNN")
TAN, TAM, TAK, ... can code for Y or stop. More generally, "TRR" and "TNN" can
code multiple amino acids or a stop codon, and "NNN" can code for any amino
acid or a stop codon.
According to IUPAC, the single letter protein code X is an "unknown or 'other'
amino acid" (ignoring its historic and obsolete usage for selenocysteine, now
U).
http://www.chem.qmul.ac.uk/iupac/AminoAcid/A2021.html
This document does NOT cover the idea of stop codons, and I am not aware of any
additional symbol to mean "any amino acid OR a stop codon" which would be ideal
for this situation.
For comparison, the EMBOSS transeq tool will use X when given a codon which
could be either an amino acid OR a stop codon:
$ transeq -filter asis:NNNTANTARTAGTAYTAC
XX**YY
Therefore one solution would be to follow EMBOSS and return X for codons which
could be an amino acid OR a stop codon.
See also Bug 2530 on the related issue that Bio.Seq.translate() currently
translates invalid codons as "*" (presumably an accidental side effect of the
implementation).
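The EMBOSS-style behaviour proposed above could be sketched roughly as follows. This is plain Python 3 with a hand-built copy of the standard genetic code (NCBI table 1), not a patch against Bio.Seq. The names translate_ambiguous and _translate_codon are hypothetical, and for simplicity the sketch returns X for any codon whose possible meanings disagree (it does not attempt B or Z).

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases they can stand for.
AMBIGUOUS_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def _translate_codon(codon):
    """Translate one (possibly ambiguous) codon to a single letter."""
    try:
        choices = [AMBIGUOUS_DNA[base] for base in codon]
    except KeyError:
        raise ValueError("Invalid codon %r" % codon)
    # Expand the ambiguity codes and collect every possible translation.
    meanings = {STANDARD_TABLE["".join(c)] for c in product(*choices)}
    # One meaning: return it ("Y", "*", ...). Several: fall back to "X".
    return meanings.pop() if len(meanings) == 1 else "X"


def translate_ambiguous(sequence):
    """Translate a nucleotide string codon by codon, using X when unsure."""
    sequence = sequence.upper()
    if len(sequence) % 3 != 0:
        raise ValueError("Sequence length is not a multiple of three")
    return "".join(_translate_codon(sequence[i:i + 3])
                   for i in range(0, len(sequence), 3))


print(translate_ambiguous("NNNTANTARTAGTAYTAC"))  # XX**YY
```

With this approach TAR and TAY keep their unambiguous translations, while TAN, TAM, TAK, TRR, TNN and NNN all come out as X instead of raising an exception.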
From bugzilla-daemon at portal.open-bio.org Sun Jul 20 10:50:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 20 Jul 2008 10:50:22 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
codons as stops
In-Reply-To:
Message-ID: <200807201450.m6KEoMVZ017607@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2530
------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-20 10:50 EST -------
Martin,
I've filed Bug 2547 ("Translation of ambiguous codons like NNN and TAN") on the
separate issue of wanting to translate ambiguous codons as found in EST
sequences.
This bug (Bug 2530) is only for the mis-translation of invalid codons as stop
characters.
If there is agreement that changing the behaviour of Bio.Seq.translate() as
described in Bug 2547 is desirable, then we end up fixing both issues at the
same time.
Peter
From biopython at maubp.freeserve.co.uk Sun Jul 20 11:03:48 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 20 Jul 2008 16:03:48 +0100
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
codons as stops
In-Reply-To: <200807192140.m6JLeZQR025907@portal.open-bio.org>
References:
<200807192140.m6JLeZQR025907@portal.open-bio.org>
Message-ID: <320fb6e00807200803v57820ab8v2502d6e5671933cc@mail.gmail.com>
> ------- Comment #12 from mmokrejs -------
> Created an attachment (id=973)
> --> (http://bugzilla.open-bio.org/attachment.cgi?id=973&action=view)
> translate_ESTs.py
Martin,
I had some general comments on your code which you might find helpful.
Most of your variable names start with an underscore - this is very
unusual. There is a convention in Python that a single leading
underscore is used for private properties or methods of an object.
You used the following code to reverse a string by turning it into a
list and back again:
_reversed = list(_record.sequence)
_reversed.reverse()
_reversed = ''.join(_reversed)
For simply reversing a string, I would suggest using a stride of minus
one instead, reversed_string = old_string[::-1]
You then go on to take the reverse complement (without worrying about
ambiguous characters which could be present, e.g. R -> Y):
_reversed = list(_record.sequence)
_reversed.reverse()
_reversed = ''.join(_reversed)
_reversed = _reversed.translate(string.maketrans('AaTtGgCcUu', 'TtAaCcGgAa'), '')
I would suggest using the Bio.Seq.reverse_complement() function here instead.
Finally are you aware of the string formatting operator (%) in python?
The following code:
_outprothandle.write(''.join(('>', _record.gi, ' ',
_record.definition, ' frame:-3', '\n',
translate(_reversed[2:]).replace('*','X'), '\n')))
might typically be written as:
_outprothandle.write('>%s %s frame:-3\n%s\n' % (_record.gi,
_record.definition, translate(_reversed[2:]).replace('*','X')))
See http://docs.python.org/lib/typesseq-strings.html for more details
(and how to use named insertion points).
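Putting these suggestions together, here is a small self-contained sketch in plain Python 3 (hence str.maketrans rather than string.maketrans). The complement table and the function name reverse_complement_str are written out for illustration only; in practice Bio.Seq.reverse_complement() does this job. The record values are made up.

```python
# Complement table covering the IUPAC ambiguity codes as well as A, C, G, T, U.
_COMPLEMENT = str.maketrans(
    "ACGTURYSWKMBDHVNacgturyswkmbdhvn",
    "TGCAAYRSWMKVHDBNtgcaayrswmkvhdbn",
)


def reverse_complement_str(sequence):
    """Complement every letter, then reverse with a stride of minus one."""
    return sequence.translate(_COMPLEMENT)[::-1]


# Made-up values standing in for _record.gi and _record.definition.
gi = "12345"
definition = "example EST"
sequence = "ATGRCC"

# The % operator fills the placeholders from the tuple, left to right.
header = ">%s %s frame:%d" % (gi, definition, -1)
print(header)
print(reverse_complement_str(sequence))
```

Note that the ambiguity code R (A or G) is complemented to Y (C or T), which the ten-letter table in the original script would have left unchanged.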
Peter
From bugzilla-daemon at portal.open-bio.org Sun Jul 20 12:08:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 20 Jul 2008 12:08:22 -0400
Subject: [Biopython-dev] [Bug 2548] New: Updating IUPACData and
ExtendedIUPACProtein for U and O
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2548
Summary: Updating IUPACData and ExtendedIUPACProtein for U and O
Product: Biopython
Version: 1.47
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
The IUPAC data in Biopython has not been updated to officially use X for any
amino acid and U for selenocysteine (Sec). Nor do we support O for pyrrolysine
(Pyl) .
I haven't found an official statement from the IUPAC-IUBMB Joint Commission on
Biochemical Nomenclature via Google, but several major resources confirm this:
http://www.ebi.ac.uk/RESID/faq.html
http://www.uniprot.org/news/2008/02/26/release
http://doc.bioperl.org/bioperl-live/Bio/Tools/IUPAC.html
Patch to follow.
From bugzilla-daemon at portal.open-bio.org Sun Jul 20 12:26:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 20 Jul 2008 12:26:10 -0400
Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and
ExtendedIUPACProtein for U and O
In-Reply-To:
Message-ID: <200807201626.m6KGQAQZ021741@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2548
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-20 12:26 EST -------
See also: http://www.chem.qmul.ac.uk/iupac/AminoAcid/A2021.html
Taking the following as the current IUPAC standard, there is no direct mention
of the use of J in NMR as designation for signals assigned either to leucine
(L) or to isoleucine (I) which cannot be distinguished from each other.
I am therefore NOT intending to add J to Biopython's IUPAC extended protein
alphabet.
From bugzilla-daemon at portal.open-bio.org Sun Jul 20 12:54:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 20 Jul 2008 12:54:51 -0400
Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and
ExtendedIUPACProtein for U and O
In-Reply-To:
Message-ID: <200807201654.m6KGsp7L022759@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2548
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-20 12:54 EST -------
Created an attachment (id=974)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=974&action=view)
Adds U and O, clearly defines X, but does not add J
Does anyone have any definitive sources on the MW of these "new" amino acids?
Also I'd like to confirm if IUPAC have officially accepted "J" or not.
From bugzilla-daemon at portal.open-bio.org Sun Jul 20 14:30:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 20 Jul 2008 14:30:17 -0400
Subject: [Biopython-dev] [Bug 2547] Translation of ambiguous codons like NNN
and TAN
In-Reply-To:
Message-ID: <200807201830.m6KIUHMb028714@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2547
mmokrejs at ribosome.natur.cuni.cz changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mmokrejs at ribosome.natur.cuni
| |.cz
------- Comment #1 from mmokrejs at ribosome.natur.cuni.cz 2008-07-20 14:30 EST -------
Regarding the selenocysteine issue, expect "inconsistencies" between data files
released from NCBI. I haven't checked now but in 2002 I had the following
communication with NCBI staff:
GenBank format requires official IUPAC amino acid code that doesn't include
Selenocysteine and therefore it uses 'X'. FASTA format uses the NCBI extended
amino acid code that does include Selenocysteine 'U'.
> >gi_2983532 formate dehydrogenase alpha subunit [Aquifex aeolicus]
> MNYMDISRRGFLKLSVGSVGAGILGGLGFDLTPAYARVRDLKITKAKVTKSICPYCSVSCGILAYSLSDG
> AMNVKERIIHVEGNPDDPINRGTLCPKGATLRDFVNAPDRLTKPLYRPAGSTEWKEISWDEAIEKFARWV
> KDTRDRTFIHKDKAGRVVNRCDSIVWAVGSPLGNEEGWLMVKIGIALGLSARETQATIUHAPTVASLAPT
> ------------------------^
[cut]
>
> It seems there's a buggy version in
> ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Aquifex_aeolicus/AE000657.faa
> although the .gbk flatfile says "X" in case of "U".
From bugzilla-daemon at portal.open-bio.org Sun Jul 20 17:16:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 20 Jul 2008 17:16:48 -0400
Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and
ExtendedIUPACProtein for U and O
In-Reply-To:
Message-ID: <200807202116.m6KLGmdb005982@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2548
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #974 is|0 |1
obsolete| |
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-20 17:16 EST -------
Created an attachment (id=975)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=975&action=view)
Tested version of previous patch
This revision includes a workaround for missing molecular weights in the
_make_ambiguous_ranges() function, and has been tested with the full test suite
on Linux.
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 06:55:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Jul 2008 06:55:13 -0400
Subject: [Biopython-dev] [Bug 2547] Translation of ambiguous codons like NNN
and TAN
In-Reply-To:
Message-ID: <200807211055.m6LAtDHp009314@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2547
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 06:55 EST -------
(In reply to comment #1)
> Regarding the selenocysteine issue, expect "inconsistencies" between data files
> released from NCBI. I haven't checked now but in 2002 I had the following
> communication with NCBI staff ...
I think you meant to post this on Bug 2548.
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 07:04:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Jul 2008 07:04:14 -0400
Subject: [Biopython-dev] [Bug 2547] Translation of ambiguous codons like NNN
and TAN
In-Reply-To:
Message-ID: <200807211104.m6LB4E0w009769@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2547
------- Comment #3 from mmokrejs at ribosome.natur.cuni.cz 2008-07-21 07:04 EST -------
Yes, sorry. :(
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 07:10:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Jul 2008 07:10:02 -0400
Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and
ExtendedIUPACProtein for U and O
In-Reply-To:
Message-ID: <200807211110.m6LBA2H8010005@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2548
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 07:10 EST -------
I've gone over the GenBank release notes on this issue...
Quoting ftp://ftp.ncbi.nih.gov/genbank/release.notes/gb131.release.notes (Dated
August 15 2002, similar text appears in earlier files too as a warning of
intended changes)
==============================================================
1.3.3 Selenocysteine representation
At the May 1999 DDBJ/EMBL/GenBank collaborative meeting, it was learned
that IUPAC plans to adopt the letter 'U' for selenocysteine.
With this August 2002 release, selenocysteine residues are now presented
via residue abbreviation 'U', in both /translation and /transl_except
qualifiers.
==============================================================
By now they SHOULD have fixed any sequences which were using X for
selenocysteine to use U instead.
Quoting ftp://ftp.ncbi.nih.gov/genbank/release.notes/gb156.release.notes (Dated
October 15 2006, similar text appears in earlier files too as a warning of
intended changes)
==============================================================
1.3.4 New protein residue abbreviation for Pyrrolysine
Sequence databases use single-letter amino acid abbreviations to
record the primary structure (sequence) of amino acids in a polypeptide.
The table of abbreviations includes only those amino acids that are
encoded in the genetic code and directly inserted by a tRNA during the
process of protein translation. Post-translational modifications are
not represented in the sequence data itself, but may be described by
features annotated on the sequence.
The discovery of the 22nd naturally encoded amino acid, pyrrolysine,
and the recent submission of sequence records that should contain
this residue, require the adoption of a new amino acid abbreviation.
Because several letters are assigned to represent different experimental
ambiguities, the only letter still available for use is O (uppercase
letter o). Scientists working in the field have independently suggested
use of this letter, and it has a reasonable mnemonic, pyrrOlysine.
The IUPAC-IUBMB Joint Commission on Biochemical Nomenclature has agreed
that Pyl/O will be recommended for this amino acid.
The consequences for flatfile users are that O can now appear in CDS
/translation qualifiers, and that Pyl (the three-letter abbreviation)
can appear in CDS /transl_except qualifiers and in the /product and
/anticodon qualifiers of tRNA features. These changes are legal as of this
October 2006 GenBank Release.
Sample ASN.1, FASTA, GenBank flatfile, and INSDSeq XML files for CP000099,
which has a protein with a pyrrolysine residue, are available for testing
purposes at the NCBI FTP site:
ftp://ftp.ncbi.nih.gov/genbank/Pyrrolysine_Samples
Files:
CP000099.pse (print-form ASN.1 Seq-entry)
CP000099.gbff (GenBank flatfile)
CP000099.aa_fsa (protein FASTA)
CP000099.isx (INSDSeq XML)
==============================================================
And later on in the same file,
==============================================================
1.3.5 Protein residue J for leucine/isoleucine ambiguities
The residue abbreviation J is reserved for mass spectrometry experiments that
cannot distinguish leucine from isoleucine. Although this abbreviation has
been part of the IUPAC recommendations for some time, it has not previously
appeared in protein sequences in the GenBank database.
As of October 2006, abbreviation J is legal in CDS /translation qualifiers,
and Xle (the three-letter abbreviation) will be allowed in CDS /transl_except
qualifiers and in the /product and /anticodon qualifiers of tRNA features.
J will also be mapped to unknown (X) for the purpose of BLAST and other
sequence similarity search tools.
==============================================================
So, according to GenBank, "The residue abbreviation J is reserved for mass
spectrometry experiments that cannot distinguish leucine from isoleucine ...
this abbreviation has been part of the IUPAC recommendations for some time".
I would prefer a direct citation, but that seems good enough evidence to me to
include J in the Biopython IUPAC extended protein alphabet.
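As a rough illustration of what the extended alphabet would then contain, here is a sketch in plain Python 3. The constant and function names are hypothetical and this is not the content of the attached patch. The codes for J, U and O are those quoted from the GenBank release notes above, alongside the long-standing B (Asx), Z (Glx) and X (Xaa).

```python
# The twenty standard amino acids.
STANDARD_PROTEIN_LETTERS = "ACDEFGHIKLMNPQRSTVWY"

# Additional single-letter codes and their three-letter abbreviations.
EXTRA_PROTEIN_LETTERS = {
    "B": "Asx",  # aspartic acid or asparagine
    "Z": "Glx",  # glutamic acid or glutamine
    "X": "Xaa",  # unknown amino acid
    "J": "Xle",  # leucine or isoleucine
    "U": "Sec",  # selenocysteine
    "O": "Pyl",  # pyrrolysine
}

EXTENDED_PROTEIN_LETTERS = STANDARD_PROTEIN_LETTERS + "".join(EXTRA_PROTEIN_LETTERS)


def is_extended_protein(sequence):
    """Return True if every letter belongs to the extended alphabet."""
    return set(sequence.upper()) <= set(EXTENDED_PROTEIN_LETTERS)


def mask_for_similarity_search(sequence):
    """Map J to X, as the release notes describe for BLAST-style tools."""
    return sequence.replace("J", "X").replace("j", "x")


print(is_extended_protein("MKUOJ"))        # True
print(mask_for_similarity_search("LJI"))   # LXI
```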
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 07:18:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Jul 2008 07:18:12 -0400
Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and
ExtendedIUPACProtein for U and O
In-Reply-To:
Message-ID: <200807211118.m6LBICM8010531@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2548
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 07:18 EST -------
Regarding Martin's example (erroneously added to Bugzilla as Bug 2547 comment
1), the protein GI:2983532
Martin wrote "GenBank format requires official IUPAC amino acid code that
doesn't include Selenocysteine and therefore it uses 'X'."
That is out of date - IUPAC and GenBank both accept U for selenocysteine now
(see my notes in comment 4 of this bug).
Looking at these files:
ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Aquifex_aeolicus/AE000657.faa
ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Aquifex_aeolicus/AE000657.gbk
(feature translation)
They both give the same amino acid sequence for GI:2983532, which includes "U"
but not "X" as I had expected.
>gi|2983532|gb|AAC07107.1| formate dehydrogenase alpha subunit [Aquifex aeolicus VF5]
MNYMDISRRGFLKLSVGSVGAGILGGLGFDLTPAYARVRDLKITKAKVTKSICPYCSVSCGILAYSLSDG
AMNVKERIIHVEGNPDDPINRGTLCPKGATLRDFVNAPDRLTKPLYRPAGSTEWKEISWDEAIEKFARWV
KDTRDRTFIHKDKAGRVVNRCDSIVWAVGSPLGNEEGWLMVKIGIALGLSARETQATIUHAPTVASLAPT
FGRGAMTNNWVDISNSDLVFVMGGNPAENHPCGFKWAIKAREKRGAKIICIDPRFNRTAAVADIFVQIRP
GTDIAFLGGLINYVLQNEKYQKEYVRLHTTGPFIVREDFGFKDGLFTGYDPKTRSYDTTTWDYEFDPATG
YPKMDPEMKHPRCVLNILKEHYSRYTPEVVSQICGCSKEDFLRVAEEVAKCGAPNKFMTILYALGWTHHS
YGTQLIRTACMLQLLLGNIGCPGGGINALRGHSNVQGMTDLAGQNKNLPTYIKPPKPEEQTLAQHLKNRT
PRKLHPTSLNYWANYPKFFISFLKCMWGDAATPENDFAYDYLYKPEGGYNSWDKFIDDMYKGKIEGVVTA
ALNFLNNTPNAKKTVRALKNLKWMVVMDPFMIETAQFWKAEGLDPKEVKTEILVLPTAVFLEKEGSFTNS
ARWVKWKYKATDPPGDAKDEFWIFGRFFMKLKEFYEKEGGAFPEPILNLVWPYKNPYYPTAEEILTEING
YYTRDVDGHKKGERVRLFTDLRDDGSTACGGWLYCGVFPPEGNLAKRTDLSDPLGLGTYPNYAWNWPANR
RVLYNRASCDEKGRPWDPERPLLRWDPERDMWVGDIPDYPATAPPEKGIGAFIMLPEGKGRLFAAKSYVT
FKDGPLPEHYEPYESPVTNILHPNVPHNPVAKVYKSDLDLLGTPDKFPHVATTYRLTEHYHFWTKHLYGP
SLLAPVMFIEIPEELAKEKGIQNGDLVRVSTARASIEAIALVTKRIKPLKVAGKTVYTIGIPIHWGFEGL
VKGAITNFITPNVWDPNSRTPEFKGFLANIEKVKT
It is quite possible that during the transition from X to U for selenocysteine
there were inconsistencies in GenBank - but I hope/expect the NCBI have fixed
them all by now.
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 07:49:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Jul 2008 07:49:21 -0400
Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and
ExtendedIUPACProtein for U and O
In-Reply-To:
Message-ID: <200807211149.m6LBnLli012323@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2548
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #975 is|0 |1
obsolete| |
------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 07:49 EST -------
Created an attachment (id=976)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=976&action=view)
Adds J, U and O, and clearly defines X as an unknown amino acid
Based on the GenBank release notes' indirect confirmation that J is now an IUPAC
recommendation, I have updated my patch to include J as well. Note that this
requires a trivial update to test_seq.py (included in this patch).
Still ideally needs the MW filled in for U and O.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 11:25:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Jul 2008 11:25:59 -0400
Subject: [Biopython-dev] [Bug 2547] Translation of ambiguous codons like NNN
and TAN
In-Reply-To:
Message-ID: <200807211525.m6LFPxgs022821@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2547
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 11:25 EST -------
I've managed to cobble together my first ever Perl program from scratch, and
established that BioPerl does the same as EMBOSS - they use an "X" when the
codon could be either an amino acid OR a stop codon.
My quick BioPerl script,
================================================
use Bio::Seq;
$nuc_str = 'NNNTANTARTAGTAYTAC';
print "BioPerl translation of:\n";
$seq_obj = Bio::Seq->new(-seq => $nuc_str);
print $seq_obj->seq();
print "\n\n";
print "Sequence object's translation method:\n";
print $seq_obj->translate()->seq();
print "\n\n";
use Bio::Perl;
print "translate_as_string:\n";
print translate_as_string($nuc_str);
print "\n";
================================================
And the output:
================================================
BioPerl translation of:
NNNTANTARTAGTAYTAC
Sequence object's translation method:
XX**YY
translate_as_string:
XX**YY
================================================
There does seem to be a consensus building here!
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 11:38:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Jul 2008 11:38:03 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200807211538.m6LFc327023466@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 11:38 EST -------
For comparison, the following is copied from the BioPerl documentation about
their sequence object's translate method. It would be nice to follow some of
the same naming conventions for any optional arguments.
http://www.bioperl.org/Core/Latest/bptutorial.html#iii_3_1_manipulating_sequence_data_with_seq_methods
If we want to translate full coding regions (CDS) the way major nucleotide
databanks EMBL, GenBank and DDBJ do it, the translate() method has to perform
more checks. Specifically, translate() needs to confirm that the sequence has
appropriate start and terminator codons at the very beginning and the very end
of the sequence and that there are no terminator codons present within the
sequence in frame 0. In addition, if the genetic code being used has an
atypical (non-ATG) start codon, the translate() method needs to convert the
initial amino acid to methionine. These checks and conversions are triggered by
setting ``complete'' to 1:
$prot_obj = $my_seq_object->translate(-complete => 1);
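A Python analogue of these checks might look like the sketch below. It is an assumption about how such an option could behave, not an existing Bio.Seq API. The function name translate_cds is hypothetical, only the standard genetic code (NCBI table 1, with start codons TTG, CTG and ATG) is handled, ambiguous codons are not supported, and the terminator is dropped from the result.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
START_CODONS = {"TTG", "CTG", "ATG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def translate_cds(sequence):
    """Translate a complete coding sequence, checking start and stop codons."""
    sequence = sequence.upper()
    if len(sequence) % 3 != 0:
        raise ValueError("Sequence length is not a multiple of three")
    codons = [sequence[i:i + 3] for i in range(0, len(sequence), 3)]
    if len(codons) < 2:
        raise ValueError("A CDS needs at least a start and a stop codon")
    if codons[0] not in START_CODONS:
        raise ValueError("First codon %r is not a start codon" % codons[0])
    if codons[-1] not in STOP_CODONS:
        raise ValueError("Final codon %r is not a stop codon" % codons[-1])
    body = codons[1:-1]
    for codon in body:
        if codon not in STANDARD_TABLE:
            raise ValueError("Invalid codon %r" % codon)
        if codon in STOP_CODONS:
            raise ValueError("Extra in-frame stop codon %r" % codon)
    # The initial codon always becomes methionine, even if it is TTG or CTG.
    return "M" + "".join(STANDARD_TABLE[codon] for codon in body)


print(translate_cds("TTGGCCAAATGA"))  # MAK
```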
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 11:41:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Jul 2008 11:41:47 -0400
Subject: [Biopython-dev] [Bug 2547] Translation of ambiguous codons like NNN
and TAN
In-Reply-To:
Message-ID: <200807211541.m6LFflk5023670@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2547
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 11:41 EST -------
For reference, using the older Bio.Translate approach suffers the same
limitation (which is not surprising if you consider they both use the same
CodonTable objects internally):
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> from Bio import Translate
>>> standard_translator = Translate.ambiguous_dna_by_id[1]
The clear cut cases are fine,
>>> standard_translator.translate(Seq("TAR", IUPAC.ambiguous_dna))
Seq('*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> standard_translator.translate(Seq("TAY", IUPAC.ambiguous_dna))
Seq('Y', HasStopCodon(ExtendedIUPACProtein(), '*'))
When the codon could be an amino acid or a stop, we raise an exception:
>>> standard_translator.translate(Seq("NNN", IUPAC.ambiguous_dna))
Traceback (most recent call last):
...
Bio.Data.CodonTable.TranslationError: NNN
>>> standard_translator.translate(Seq("TAN", IUPAC.ambiguous_dna))
Traceback (most recent call last):
...
Bio.Data.CodonTable.TranslationError: TAN
From bugzilla-daemon at portal.open-bio.org Tue Jul 22 07:32:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 22 Jul 2008 07:32:10 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807221132.m6MBWAAF016950@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #28 from mdehoon at ims.u-tokyo.ac.jp 2008-07-22 07:32 EST -------
(In reply to comment #27)
> Regarding the Bio.Sequencing.Ace changes (comment 26), some of it looks a bit
> repetitive. Might a sub-function help here?
I thought about that, but each time the repetitive code is slightly different,
and I wonder if the end result will be clearer than what we have now.
> Also, I was wondering if you managed to fix Bug 2446 as a nice bonus.
I am planning to do so. I am checking with the polyphred people if the COMMENT
blocks are really intended and are here to stay (note that the polyphred
version that writes these COMMENT blocks is a beta version). Will update the
code once I hear back from them.
From mjldehoon at yahoo.com Tue Jul 22 07:38:13 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 22 Jul 2008 04:38:13 -0700 (PDT)
Subject: [Biopython-dev] Bio.KDTree
Message-ID: <108429.69921.qm@web62404.mail.re1.yahoo.com>
Hi everybody,
Bio.KDTree is currently the only piece of C++ code in Biopython. C++ doesn't compile cleanly on all platforms (for example it is missing in the Biopython installer for Python 2.3 on Windows); some platforms don't even have a C++ compiler. For this reason, setup.py asks the user each time if Bio.KDTree should be compiled. Does anybody (Thomas?) mind if I convert this code to plain C? That would be a nice weekend project. Then we can get rid of the question in setup.py, and Bio.KDTree can be made available on all platforms.
--Michiel.
From biopython at maubp.freeserve.co.uk Tue Jul 22 12:13:34 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 22 Jul 2008 17:13:34 +0100
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <320fb6e00807160540w325fe995mea400b0014fd7c2e@mail.gmail.com>
References: <492634.64872.qm@web62414.mail.re1.yahoo.com>
<320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com>
<320fb6e00807160540w325fe995mea400b0014fd7c2e@mail.gmail.com>
Message-ID: <320fb6e00807220913g64613854j7a1deb5b4357f726@mail.gmail.com>
On June 27, Michiel wrote:
> ..., there is also a set of interconnected modules where it's not 100%
> clear if they can be removed without causing some surprises:
> Bio.builders
> Bio.config
> Bio.dbdefs
> Bio.formatdefs
> Bio.dbdefs
> Bio.expressions
> Bio.FormatIO [already deprecated and removed]
> Bio.Std
> Bio.StdHandler
> It is probably OK to remove these, since these were deprecated we did
> not get a barrage of complaints from our users. Personally, I think it is
> important to keep the code base clean, so I am in favor of removing
> these (and see if anybody complains; in that case, we can always put
> these modules back in and make a new release). But I can live with
> keeping these modules for another release round. If anybody thinks
> that that would be better, please let us know.
Bio.expressions was already deprecated, and seems to be a dependency
of the following modules, which I have now explicitly deprecated in
CVS:
Bio.expressions (deprecated in Biopython 1.44)
Bio.config
Bio.dbdefs
Bio.formatdefs
Bio.dbdefs
It probably would be fine to remove these five modules now
(Bio.expressions, Bio.config, Bio.dbdefs, Bio.formatdefs and
Bio.dbdefs), since the indirect warning from Bio.expressions should
have alerted anyone who was using them. Or we can ship one more
release with them included?
Moving on, Bio.Std and Bio.StdHandler appear to be used by:
- Bio.expressions (deprecated in Biopython 1.44)
- Bio.config (now deprecated in CVS)
- Bio.builders (used by Mindy)
- Bio.Mindy (used by Bio.config which is now deprecated)
As far as I can tell, other historic usage of Mindy (e.g. in Bio.Fasta
and Bio.GenBank) has already been deprecated and removed. I think it
would therefore also be safe to deprecate these four together
(Bio.expressions, Bio.config, Bio.builders and Bio.Mindy), or start by
deprecating Bio.Mindy on its own.
Peter
From bugzilla-daemon at portal.open-bio.org Tue Jul 22 12:29:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 22 Jul 2008 12:29:27 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807221629.m6MGTRuo002799@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-22 12:29 EST -------
Frank,
Would you mind if I removed this print statement from the add_sequence()
method?:
print "WARNING: Sequence name %s is already present. Sequence was added as %s."
% (name,unique_name)
I'd like to be able to call this method in code from Bio.SeqIO / Bio.AlignIO to
write alignments, without getting warnings printed out.
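If the message is ever wanted again, one option would be the standard warnings module rather than print, since callers can then silence it. The following is only a minimal sketch: this add_sequence() is a toy stand-in working on a plain dict, and the ".2", ".3" suffix scheme is an assumption for illustration, not what Nexus.py does.

```python
import warnings


def add_sequence(matrix, name, sequence):
    """Toy stand-in: store `sequence` in the dict `matrix` under a unique name."""
    unique_name = name
    suffix = 1
    # Hypothetical renaming scheme: Alpha, Alpha.2, Alpha.3, ...
    while unique_name in matrix:
        suffix += 1
        unique_name = "%s.%d" % (name, suffix)
    if unique_name != name:
        # A warning can be filtered by the caller; a print statement cannot.
        warnings.warn("Sequence name %s is already present. "
                      "Sequence was added as %s." % (name, unique_name))
    matrix[unique_name] = sequence
    return unique_name


matrix = {}
add_sequence(matrix, "Alpha", "ACGT")
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # e.g. what a file writer could do
    add_sequence(matrix, "Alpha", "ACGA")
```

Code that writes alignments could then wrap its calls in catch_warnings(), while interactive users would still see the message.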
Thanks
Peter
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From biopython at maubp.freeserve.co.uk Tue Jul 22 12:33:53 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 22 Jul 2008 17:33:53 +0100
Subject: [Biopython-dev] Bio.KDTree
In-Reply-To: <108429.69921.qm@web62404.mail.re1.yahoo.com>
References: <108429.69921.qm@web62404.mail.re1.yahoo.com>
Message-ID: <320fb6e00807220933v1e6125a7lcb91b963a5dd5195@mail.gmail.com>
On Tue, Jul 22, 2008 at 12:38 PM, Michiel de Hoon wrote:
> Hi everybody,
>
> Bio.KDTree is currently the only piece of C++ code in Biopython. C++ doesn't
> compile cleanly on all platforms (for example it is missing in the Biopython
> installer for Python 2.3 on Windows); some platforms don't even have a C++
> compiler. For this reason, setup.py asks the user each time if Bio.KDTree
> should be compiled. Does anybody (Thomas?) mind if I convert this code to
> plain C? That would be a nice weekend project. Then we can get rid of the
> question in setup.py, and Bio.KDTree can be made available on all platforms.
If you want to spend your weekend doing this, it does sound like a
worthwhile incremental improvement to Biopython - and should simplify
the build process which is great.
Peter
P.S.
Have you noticed Bug 2489 "KDTree NN search without specifying radius"?
http://bugzilla.open-bio.org/show_bug.cgi?id=2489
From bugzilla-daemon at portal.open-bio.org Tue Jul 22 19:50:31 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 22 Jul 2008 19:50:31 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200807222350.m6MNoVXd024298@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
------- Comment #15 from mmokrejs at ribosome.natur.cuni.cz 2008-07-22 19:50 EST -------
(In reply to comment #5)
> Another bonus for people who think OO, is doing dir(my_seq) would
> list these useful methods. Right now the user has to know to go
> looking in the Bio.Seq module for a function.
I do this quite often and this is a weak point in current biopython. Good
catch!
Regarding the back_translate, I don't use it but people ask for it often so
don't remove it. Otherwise I won't know where else to get this functionality.
;-)
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Jul 22 20:05:09 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 22 Jul 2008 20:05:09 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807230005.m6N059QE025415@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
------- Comment #12 from fkauff at biologie.uni-kl.de 2008-07-22 20:05 EST -------
Peter,
No problem.
Cheers,
Frank
(In reply to comment #11)
> Frank,
>
> Would you mind if I removed this print statement from the add_sequence()
> method?:
>
> print "WARNING: Sequence name %s is already present. Sequence was added as %s."
> % (name,unique_name)
>
> I'd like to be able to call this method in code from Bio.SeqIO / Bio.AlignIO to
> write alignments, without getting warnings printed out.
>
> Thanks
>
> Peter
>
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Wed Jul 23 07:49:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 23 Jul 2008 07:49:33 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807231149.m6NBnX4P014410@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-23 07:49 EST -------
(In reply to comment #12)
> Peter,
>
> No problem.
>
> Cheers,
> Frank
Great. I've removed that print statement (and tweaked a few doc strings) in
CVS.
Checking in Nexus.py;
/home/repository/biopython/biopython/Bio/Nexus/Nexus.py,v <-- Nexus.py
new revision: 1.19; previous revision: 1.18
done
I'm just working on some alphabet stuff before adding support to write "nexus"
format files with Bio.SeqIO and Bio.AlignIO
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Wed Jul 23 08:33:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 23 Jul 2008 08:33:10 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807231233.m6NCXAk6018007@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-23 08:33 EST -------
Fixed in CVS - you can now write Nexus files using Bio.SeqIO or Bio.AlignIO,
provided the alphabet is declared as DNA, RNA or protein. You cannot use
generic alphabets or just nucleotide alphabets.
Multiple files have been changed, so a complete CVS update is the best way to
test this before the next release of Biopython.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Wed Jul 23 10:12:38 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 23 Jul 2008 10:12:38 -0400
Subject: [Biopython-dev] [Bug 2543] Bio.Nexus.Trees can't handle named
ancestors
In-Reply-To:
Message-ID: <200807231412.m6NECc33027073@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2543
fkauff at biologie.uni-kl.de changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
------- Comment #2 from fkauff at biologie.uni-kl.de 2008-07-23 10:12 EST -------
I recently got some code that is supposed to be able to deal with labeled nodes
(probably from the author of this bug - can't check now, as I'm traveling and
don't have access to the files). I haven't looked at or tested the code yet, but
will do soon when I'm back.
Frank
(In reply to comment #1)
> This sounds like a job for Frank (the Bio.Nexus module author).
>
> Can I ask if you've actually come across trees with named ancestor nodes in
> "real life"? That would make this bug more important. If so, the name of the
> tool would be interesting, an example tree file would be great to add to
> Biopython as a test case.
>
> If on the other hand the only named ancestor tree you've ever tried is the
> example from the Newick documentation, this doesn't seem such a high priority
> (but still worth fixing).
>
> Peter
>
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From biopython at maubp.freeserve.co.uk Thu Jul 24 07:41:41 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 24 Jul 2008 12:41:41 +0100
Subject: [Biopython-dev] Removing the deprecated Bio.WWW modules
Message-ID: <320fb6e00807240441g5b21993dl7c84aebac0e2a988@mail.gmail.com>
Hi all,
We (Michiel) deprecated the Bio.WWW.* modules in Biopython 1.45, after
relocating most of the functionality:
Bio.WWW.ExPASy -> Bio.ExPASy
Bio.WWW.InterPro -> Bio.InterPro
Bio.WWW.NCBI -> Bio.Entrez
Bio.WWW.SCOP -> Bio.SCOP
Now that the deprecation warnings have been in place for a couple of
releases, I'd like to remove the four Bio.WWW.* modules, and leave
just Bio/WWW/__init__.py with a deprecation warning telling people
where to look for the relocated code.
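As a rough illustration of what such a stub could contain, here is a minimal sketch. The RELOCATED table is copied from the list above; the helper names (RELOCATED, deprecation_message, warn_deprecated) are hypothetical and not taken from the Biopython source.

```python
import warnings

# Old module -> new home, as listed in this message.
RELOCATED = {
    "Bio.WWW.ExPASy": "Bio.ExPASy",
    "Bio.WWW.InterPro": "Bio.InterPro",
    "Bio.WWW.NCBI": "Bio.Entrez",
    "Bio.WWW.SCOP": "Bio.SCOP",
}


def deprecation_message():
    """Build one message naming every relocated module."""
    moves = ", ".join("%s -> %s" % (old, new)
                      for old, new in sorted(RELOCATED.items()))
    return "Bio.WWW is deprecated; its modules have moved: " + moves


def warn_deprecated():
    """Emit the warning; in a real stub this would run at import time."""
    # stacklevel=2 points the warning at the caller rather than this helper.
    warnings.warn(deprecation_message(), DeprecationWarning, stacklevel=2)
```

In an actual Bio/WWW/__init__.py the warnings.warn() call would presumably sit at module level so that it fires on "import Bio.WWW".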
Any comments or objections?
Peter
From mjldehoon at yahoo.com Thu Jul 24 20:19:33 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 24 Jul 2008 17:19:33 -0700 (PDT)
Subject: [Biopython-dev] Removing the deprecated Bio.WWW modules
In-Reply-To: <320fb6e00807240441g5b21993dl7c84aebac0e2a988@mail.gmail.com>
Message-ID: <502434.4415.qm@web62406.mail.re1.yahoo.com>
Note that Bio.WWW.__init__.py contains some code that is used in other modules. Most (but not all) of these modules are deprecated themselves. For the non-deprecated modules, it's probably easiest to just copy the code from Bio.WWW.__init__.py over to avoid having to import Bio.WWW.
--Michiel.
--- On Thu, 7/24/08, Peter wrote:
> From: Peter
> Subject: [Biopython-dev] Removing the deprecated Bio.WWW modules
> To: "BioPython-Dev Mailing List"
> Date: Thursday, July 24, 2008, 7:41 AM
> Hi all,
>
> We (Michiel) deprecated the Bio.WWW.* modules in Biopython
> 1.45, after
> relocating most of the functionality:
>
> Bio.WWW.ExPASy -> Bio.ExPASy
> Bio.WWW.InterPro -> Bio.InterPro
> Bio.WWW.NCBI -> Bio.Entrez
> Bio.WWW.SCOP -> Bio.SCOP
>
> Now that the deprecation warnings have been in place for a
> couple of
> releases, I'd like to remove the four Bio.WWW.*
> modules, and leave
> just Bio/WWW/__init__.py with a deprecation warning telling
> people
> where to look for the relocated code.
>
> Any comments or objections?
>
> Peter
From biopython at maubp.freeserve.co.uk Fri Jul 25 06:31:49 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 25 Jul 2008 11:31:49 +0100
Subject: [Biopython-dev] Updating the installation instructions
Message-ID: <320fb6e00807250331k47ec64dcoe246933f0d02682b@mail.gmail.com>
As Nick Matzke has pointed out,
http://biopython.org/DIST/docs/install/Installation.html and
http://biopython.org/DIST/docs/install/Installation.pdf are somewhat
out of date.
I've updated the source LaTeX file in CVS to cover python 2.5 being
the latest stable python, mxTextTools is now optional (but 2.0 is
preferred over 3.0), and removed the bits about the "Classic" Mac (pre
OS X).
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/install/Installation.tex?cvsroot=biopython
The reportlab instructions probably need updating too - although we
should double check if everything is happy with ReportLab 2 as part of
this.
If anyone wants to skim over the revised version and look for anything
I've missed or other improvements, that would be great.
Peter
From biopython at maubp.freeserve.co.uk Fri Jul 25 07:21:31 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 25 Jul 2008 12:21:31 +0100
Subject: [Biopython-dev] Removing the deprecated Bio.WWW modules
In-Reply-To: <502434.4415.qm@web62406.mail.re1.yahoo.com>
References: <320fb6e00807240441g5b21993dl7c84aebac0e2a988@mail.gmail.com>
<502434.4415.qm@web62406.mail.re1.yahoo.com>
Message-ID: <320fb6e00807250421w15b1d8a9qe9d5d178c233ec7b@mail.gmail.com>
On Fri, Jul 25, 2008 at 1:19 AM, Michiel de Hoon wrote:
> Note that Bio.WWW.__init__.py contains some code that is used in other modules.
> Most (but not all) of these modules are deprecated themselves. For the
> non-deprecated modules, it's probably easiest to just copy the code from
> Bio.WWW.__init__.py over to avoid having to import Bio.WWW.
Good catch - I didn't do my recursive grep correctly. The file
Bio/WWW/__init__.py just contains a RequestLimiter class, and this is
currently used in:
Bio/Blast/NCBIWWW.py (used in qblast, simple to recode as in Bio.Entrez)
Bio/config/_support.py (completely deprecated)
Bio/Prosite/__init__.py (in the deprecated ExPASyDictionary class)
Bio/SwissProt/SProt.py (in the deprecated ExPASyDictionary class)
Note I have just updated Bio.Prosite and Bio.SwissProt to use
Bio.ExPASy rather than Bio.WWW.ExPASy which means we can delete the
deprecated Bio/WWW/ExPASy.py, InterPro.py, NCBI.py and SCOP.py now.
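For anyone wondering how little code would need copying, a request limiter of this kind can be quite small. The sketch below is an assumption about the general idea (enforce a minimum delay between successive requests), not a copy of the class in Bio/WWW/__init__.py; the clock and sleep parameters are hypothetical additions that make it easy to test.

```python
import time


class RequestLimiter(object):
    """Enforce a minimum delay (in seconds) between successive requests."""

    def __init__(self, delay, clock=time.time, sleep=time.sleep):
        self.delay = delay
        self.last_time = None  # time of the previous request, if any
        self._clock = clock
        self._sleep = sleep

    def wait(self):
        """Block until at least `delay` seconds have passed since the last call."""
        now = self._clock()
        if self.last_time is not None:
            remaining = self.delay - (now - self.last_time)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self.last_time = now
```

A caller would create one limiter per remote service and call limiter.wait() immediately before each request.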
Peter
From bugzilla-daemon at portal.open-bio.org Sat Jul 26 18:05:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 26 Jul 2008 18:05:24 -0400
Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and
ExtendedIUPACProtein for U and O
In-Reply-To:
Message-ID: <200807262205.m6QM5Ow9021435@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2548
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-26 18:05 EST -------
Checking in Bio/Alphabet/IUPAC.py;
/home/repository/biopython/biopython/Bio/Alphabet/IUPAC.py,v <-- IUPAC.py
new revision: 1.3; previous revision: 1.2
done
Checking in Bio/Data/IUPACData.py;
/home/repository/biopython/biopython/Bio/Data/IUPACData.py,v <-- IUPACData.py
new revision: 1.5; previous revision: 1.4
done
Checking in Tests/test_seq.py;
/home/repository/biopython/biopython/Tests/test_seq.py,v <-- test_seq.py
new revision: 1.15; previous revision: 1.14
done
Marking as fixed, although still ideally needs the MW filled in for U and O.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sun Jul 27 11:30:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 27 Jul 2008 11:30:37 -0400
Subject: [Biopython-dev] [Bug 2550] New: Alphabet problems when adding
sequences
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2550
Summary: Alphabet problems when adding sequences
Product: Biopython
Version: 1.47
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
#Create three sequences as Seq objects,
>>> from Bio import Alphabet
>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import Seq
>>> a = Seq("ACTG", Alphabet.generic_dna)
>>> b = Seq("AC-TG", Alphabet.Gapped(Alphabet.generic_dna, "-"))
>>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-"))
>>> a
Seq('ACTG', DNAAlphabet())
>>> b
Seq('AC-TG', Gapped(DNAAlphabet(), '-'))
>>> c
Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-'))
#Now try adding them together...
>>> b+c
Seq('AC-TGAC-TG', Gapped(DNAAlphabet(), '-'))
>>> a+b
Traceback (most recent call last):
File "", line 1, in ?
File
"/home/maubp/repository/biopython/build/lib.linux-x86_64-2.4/Bio/Seq.py", line
77, in __add__
elif other.alphabet.contains(self.alphabet):
File "/home/maubp/lib/python2.4/site-packages/Bio/Alphabet/__init__.py", line
95, in contains
return other.gap_char == self.gap_char and \
AttributeError: DNAAlphabet instance has no attribute 'gap_char'
I would expect to get:
Seq('ACTGAC-TG', Gapped(DNAAlphabet(), '-'))
Similar example, but using proteins
>>> p = Seq("ACDEFG", Alphabet.generic_protein)
>>> q = Seq("ACDEFG", IUPAC.protein)
>>> r = Seq("ACDEFG*", Alphabet.HasStopCodon(IUPAC.protein, "*"))
>>> p
Seq('ACDEFG', ProteinAlphabet())
>>> q
Seq('ACDEFG', IUPACProtein())
>>> r
Seq('ACDEFG*', HasStopCodon(IUPACProtein(), '*'))
#Now try adding these together...
>>> p+q
Seq('ACDEFGACDEFG', ProteinAlphabet())
>>> p+r
Traceback (most recent call last):
File "", line 1, in ?
File
"/home/maubp/repository/biopython/build/lib.linux-x86_64-2.4/Bio/Seq.py", line
77, in __add__
elif other.alphabet.contains(self.alphabet):
File "/home/maubp/lib/python2.4/site-packages/Bio/Alphabet/__init__.py", line
110, in contains
return other.stop_symbol == self.stop_symbol and \
AttributeError: ProteinAlphabet instance has no attribute 'stop_symbol'
Here is an example of a more reasonable failure,
>>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-"))
>>> d = Seq('AC.TG', Alphabet.Gapped(IUPAC.unambiguous_dna, '.'))
>>> c
Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-'))
>>> d
Seq('AC.TG', Gapped(IUPACUnambiguousDNA(), '.'))
>>> c+d
Traceback (most recent call last):
File "", line 1, in ?
File
"/home/maubp/repository/biopython/build/lib.linux-x86_64-2.4/Bio/Seq.py", line
80, in __add__
raise TypeError, ("incompatable alphabets", str(self.alphabet),
TypeError: ('incompatable alphabets', "Gapped(IUPACUnambiguousDNA(), '-')",
"Gapped(IUPACUnambiguousDNA(), '.')")
I am OK with this failing with a TypeError. However, one might argue that
reverting to a generic DNA alphabet with no declared gap character would be
desirable:
Seq("AC-TGAC.TG", DNAAlphabet())
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sun Jul 27 11:59:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 27 Jul 2008 11:59:50 -0400
Subject: [Biopython-dev] [Bug 2550] Alphabet problems when adding sequences
In-Reply-To:
Message-ID: <200807271559.m6RFxoej018165@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2550
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-27 11:59 EST -------
Trying to fix this by changing the Alphabet and AlphabetEncoder classes'
contains method only is nasty, and wouldn't cover situations like this:
p = Seq("PKL-PAK", Gapped(generic_protein,"-"))
q = Seq("ADKS*", HasStopCodon(generic_protein,"*"))
where you might expect something like:
p+q == Seq("PKL-PAKADKS*", HasStopCodon(Gapped(generic_protein,"-"),"*"))
Taken literally, neither of these two alphabets contains the other - but there
is a fairly obvious consensus alphabet! I think the best solution would
require changes to the Seq object's add method to pick a consensus alphabet in
the non-simple cases where one alphabet is clearly a sub-set of the other.
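To make the idea concrete, here is a self-contained sketch of picking a consensus alphabet for two operands. All the classes below are toy stand-ins defined only for this example (they mimic the names in Bio.Alphabet but are not the real ones), and consensus_alphabet() is a hypothetical helper, not the code in the patch.

```python
class Alphabet(object):
    """Toy base class standing in for a generic alphabet."""


class ProteinAlphabet(Alphabet):
    pass


class IUPACProtein(ProteinAlphabet):
    pass


class Gapped(object):
    def __init__(self, alphabet, gap_char):
        self.alphabet = alphabet
        self.gap_char = gap_char


class HasStopCodon(object):
    def __init__(self, alphabet, stop_symbol):
        self.alphabet = alphabet
        self.stop_symbol = stop_symbol


def unwrap(alphabet):
    """Strip the wrappers; return (base alphabet, gap_char, stop_symbol)."""
    gap_char = stop_symbol = None
    while isinstance(alphabet, (Gapped, HasStopCodon)):
        if isinstance(alphabet, Gapped):
            gap_char = alphabet.gap_char
        else:
            stop_symbol = alphabet.stop_symbol
        alphabet = alphabet.alphabet
    return alphabet, gap_char, stop_symbol


def pick_one(x, y, what):
    """Return whichever value is set; refuse two different values."""
    if x is not None and y is not None and x != y:
        raise ValueError("More than one %s present" % what)
    return x if x is not None else y


def consensus_alphabet(a, b):
    base_a, gap_a, stop_a = unwrap(a)
    base_b, gap_b, stop_b = unwrap(b)
    # Keep the more general base alphabet (the superclass instance).
    if isinstance(base_a, type(base_b)):
        base = base_b
    elif isinstance(base_b, type(base_a)):
        base = base_a
    else:
        raise TypeError("Incompatible base alphabets")
    gap_char = pick_one(gap_a, gap_b, "gap character")
    stop_symbol = pick_one(stop_a, stop_b, "stop symbol")
    result = base
    if gap_char is not None:
        result = Gapped(result, gap_char)
    if stop_symbol is not None:
        result = HasStopCodon(result, stop_symbol)
    return result
```

With this toy model, combining Gapped(ProteinAlphabet(), "-") with HasStopCodon(ProteinAlphabet(), "*") yields a HasStopCodon wrapping a Gapped wrapping a ProteinAlphabet, which is the shape suggested above.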
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sun Jul 27 14:54:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 27 Jul 2008 14:54:01 -0400
Subject: [Biopython-dev] [Bug 2550] Alphabet problems when adding sequences
In-Reply-To:
Message-ID: <200807271854.m6RIs1wZ025718@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2550
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-27 14:54 EST -------
Created an attachment (id=977)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=977&action=view)
Patch to Bio/Seq.py and Bio/Alphabet/__init__.py
This uses some (private) alphabet functions in Bio/Alphabet/__init__.py (where
I have already put a few bits extracted from or used by Bio.Align and
Bio.AlignIO), and makes the old Alphabet .contains method effectively obsolete.
Test case update to follow.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sun Jul 27 14:56:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 27 Jul 2008 14:56:47 -0400
Subject: [Biopython-dev] [Bug 2550] Alphabet problems when adding sequences
In-Reply-To:
Message-ID: <200807271856.m6RIulpl025828@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2550
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-27 14:56 EST -------
Created an attachment (id=978)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=978&action=view)
Patches for test_seq.py and test_GACrossover.py
Adds a new block of tests to test_seq.py to explicitly check a number of
different alphabet combinations.
Also tweaks test_GACrossover.py to define its test alphabet as a subclass of a
suitable generic class in Bio.Alphabet, as otherwise it is not recognised as a
valid alphabet.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sun Jul 27 15:06:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 27 Jul 2008 15:06:22 -0400
Subject: [Biopython-dev] [Bug 2550] Alphabet problems when adding sequences
In-Reply-To:
Message-ID: <200807271906.m6RJ6MBk026364@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2550
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-27 15:06 EST -------
With the patch, repeating the example in my comment 0,
>>> from Bio import Alphabet
>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import Seq
>>> a = Seq("ACTG", Alphabet.generic_dna)
>>> b = Seq("AC-TG", Alphabet.Gapped(Alphabet.generic_dna, "-"))
>>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-"))
>>> a
Seq('ACTG', DNAAlphabet())
>>> b
Seq('AC-TG', Gapped(DNAAlphabet(), '-'))
>>> c
Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-'))
>>> b+c
Seq('AC-TGAC-TG', Gapped(DNAAlphabet(), '-'))
>>> a+b
Seq('ACTGAC-TG', Gapped(DNAAlphabet(), '-'))
>>> a+c
Seq('ACTGAC-TG', Gapped(DNAAlphabet(), '-'))
i.e. All the above additions work now.
>>> p = Seq("ACDEFG", Alphabet.generic_protein)
>>> q = Seq("ACDEFG", IUPAC.protein)
>>> r = Seq("ACDEFG*", Alphabet.HasStopCodon(IUPAC.protein, "*"))
>>> p
Seq('ACDEFG', ProteinAlphabet())
>>> q
Seq('ACDEFG', IUPACProtein())
>>> r
Seq('ACDEFG*', HasStopCodon(IUPACProtein(), '*'))
>>> p+q
Seq('ACDEFGACDEFG', ProteinAlphabet())
>>> p+r
Seq('ACDEFGACDEFG*', HasStopCodon(ProteinAlphabet(), '*'))
These work too.
>>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-"))
>>> d = Seq('AC.TG', Alphabet.Gapped(IUPAC.unambiguous_dna, '.'))
>>> c
Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-'))
>>> d
Seq('AC.TG', Gapped(IUPACUnambiguousDNA(), '.'))
>>> c+d
Traceback (most recent call last):
File "", line 1, in ?
File "Bio/Seq.py", line 78, in __add__
a = Alphabet._consensus_alphabet([self.alphabet, other.alphabet])
File "/home/maubp/repository/biopython/Bio/Alphabet/__init__.py", line 199,
in _consensus_alphabet
raise ValueError("More than one gap character present")
ValueError: More than one gap character present
The error message has changed (and is more explicit), but I think this is a
real failure case.
Then based on the example in my comment 1,
>>> p = Seq("PKL-PAK", Alphabet.Gapped(Alphabet.generic_protein,"-"))
>>> q = Seq("ADKS*", Alphabet.HasStopCodon(Alphabet.generic_protein,"*"))
>>> p+q
Seq('PKL-PAKADKS*', HasStopCodon(Gapped(ProteinAlphabet(), '-'), '*'))
This works now too.
One final example of a valid failure:
>>> q = Seq("ADKS*", Alphabet.HasStopCodon(Alphabet.generic_protein,"*"))
>>> r = Seq("SRFG@", Alphabet.HasStopCodon(Alphabet.generic_protein,"@"))
>>> q+r
Traceback (most recent call last):
File "", line 1, in ?
File "Bio/Seq.py", line 78, in __add__
a = Alphabet._consensus_alphabet([self.alphabet, other.alphabet])
File "/home/maubp/repository/biopython/Bio/Alphabet/__init__.py", line 208,
in _consensus_alphabet
raise ValueError("More than one stop symbol present")
ValueError: More than one stop symbol present
I'd be grateful if anyone could test this, or comment on the code. While
adding private functions to Bio.Alphabet is a reasonable short term solution
(and means we can change arguments and names without breaking people's
scripts!), some of this functionality might be best exposed publicly.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 05:26:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 05:26:03 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To:
Message-ID: <200807280926.m6S9Q3Cn032456@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=1944
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #943 is|0 |1
obsolete| |
------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-28 05:26 EST -------
(From update of attachment 943)
Checked in as part of
Bio/Align/Generic.py revision 1.10
Adding __len__ would also be sensible, and perhaps __nonzero__ (which could
check the number of rows AND columns).
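A minimal sketch of what these two methods might look like, using a toy class that holds its rows as plain strings. ToyAlignment is hypothetical and is not Bio.Align.Generic.Alignment; __bool__ is the Python 3 spelling of __nonzero__.

```python
class ToyAlignment(object):
    """Toy alignment: a list of equal-length strings, one per row."""

    def __init__(self, rows):
        self._rows = list(rows)

    def get_alignment_length(self):
        """Number of columns (0 for an alignment with no rows)."""
        return len(self._rows[0]) if self._rows else 0

    def __len__(self):
        """Number of rows, so len(alignment) behaves like len(list_of_rows)."""
        return len(self._rows)

    def __bool__(self):
        """True only if there is at least one row AND at least one column."""
        return len(self._rows) > 0 and self.get_alignment_length() > 0

    __nonzero__ = __bool__  # Python 2 looks for this name instead
```

Without __nonzero__, truth testing would fall back to __len__, so an alignment of several empty sequences would count as true.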
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 05:37:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 05:37:27 -0400
Subject: [Biopython-dev] [Bug 2551] New: Adding advanced __getitem__ to
generic alignment, e.g. align[1:2, 5:-5]
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2551
Summary: Adding advanced __getitem__ to generic alignment, e.g.
align[1:2,5:-5]
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
BugsThisDependsOn: 2507
I'm filing this as a separate sub-issue from Bug 1944. The idea is to enhance
the minimal __getitem__ method now in CVS to allow accessing of rows
(sequences), columns, or sub-alignments.
A possible __getitem__ doc string:
Depending on the indices, you can get a SeqRecord object
(representing a single row), a Seq object (for a single column),
a string (for a single character) or another alignment
(representing some part or all of the alignment).
align[r,c] gives a single character as a string
align[r] gives a row as a SeqRecord
align[r,:] gives a row as a SeqRecord
align[:,c] gives a column as a Seq (using the alignment's alphabet)
align[:] and align[:,:] give a copy of the alignment
Anything else gives a sub alignment, e.g.
align[0:2] or align[0:2,:] uses only row 0 and 1
align[:,1:3] uses only columns 1 and 2
align[0:2,1:3] uses only rows 0 & 1 and only cols 1 & 2
Doing this nicely will build on adding annotation aware slicing support to the
SeqRecord, which is Bug 2507.
There is some __getitem__ code on Bug 1944 Attachment 732 and Bug 1944
Attachment 770.
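The indexing rules in the proposed doc string can be sketched on a toy class whose rows are plain strings. This is only an illustration under that simplification: ToyAlignment is hypothetical, rows and columns come back as str rather than SeqRecord or Seq, and mixed cases such as align[r, 1:3] or align[0:2, c] simply return a string here.

```python
class ToyAlignment(object):
    """Toy alignment: a list of equal-length strings, one per row."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __getitem__(self, index):
        if isinstance(index, int):
            return self.rows[index]                 # align[r] -> one row
        if isinstance(index, slice):
            return ToyAlignment(self.rows[index])   # align[0:2], align[:]
        if not (isinstance(index, tuple) and len(index) == 2):
            raise TypeError("Invalid index type")
        row_index, col_index = index
        if isinstance(row_index, int):
            # align[r,c] -> one character; align[r,:] -> the whole row
            return self.rows[row_index][col_index]
        if isinstance(col_index, int):
            # align[:,c] -> one column, read top to bottom
            return "".join(row[col_index] for row in self.rows[row_index])
        # Two slices -> sub-alignment, e.g. align[0:2,1:3] or align[:,:]
        return ToyAlignment([row[col_index] for row in self.rows[row_index]])


align = ToyAlignment(["ACGATC", "CCGATC", "ACGATG"])
first_column = align[:, 0]
sub_alignment = align[0:2, 1:3]
```

Python passes align[a, b] to __getitem__ as a single tuple, which is why the method dispatches on the type of its one argument.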
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 05:37:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 05:37:29 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
element access and slicing
In-Reply-To:
Message-ID: <200807280937.m6S9bTY8000615@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2507
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
OtherBugsDependingO| |2551
nThis| |
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 05:48:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 05:48:56 -0400
Subject: [Biopython-dev] [Bug 2552] New: Adding alignments
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2552
Summary: Adding alignments
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
This is related to the very broad alignment bug 1944.
Given two alignments, it can make sense to talk about adding them together.
However we can either add by row, or by column.
e.g. Consider this alignment, a
DNAAlphabet() alignment with 3 rows and 14 columns
ACGATCAGCTAGCT Alpha
CCGATCAGCTAGCT Beta
ACGATGAGCTAGCT Gamma
Doing a+a by column would give:
DNAAlphabet() alignment with 3 rows and 28 columns
ACGATCAGCTAGCTACGATCAGCTAGCT Alpha
CCGATCAGCTAGCTCCGATCAGCTAGCT Beta
ACGATGAGCTAGCTACGATGAGCTAGCT Gamma
This sort of operation is often done to combine alignments from multiple genes
(after first sorting the rows to ensure the species names are in the same
order). To implement this would ideally require the ability to add SeqRecord
objects together, doing something sensible with the annotation and in
particular the identifiers.
Doing a+a by row would give:
DNAAlphabet() alignment with 6 rows and 14 columns
ACGATCAGCTAGCT Alpha
CCGATCAGCTAGCT Beta
ACGATGAGCTAGCT Gamma
ACGATCAGCTAGCT Alpha
CCGATCAGCTAGCT Beta
ACGATGAGCTAGCT Gamma
This particular example, a+a, is perhaps unrealistic due to the repeated
identifiers, but I imagine there are some real use cases for this operation.
More generally, suppose we have two alignments a and b. Treating each
alignment as a list of SeqRecord objects, you might expect:
a.extend(b) -> addition by row
a+b -> addition by row
However, I would suggest for alignment objects:
a.extend(b) -> addition by row, requires all sequences to be the same length
(same number of columns)
a+b -> addition by column, requires same number of sequences (rows)
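A minimal sketch of the two kinds of addition, using plain lists of
(identifier, sequence) tuples rather than Biopython objects. The function names
add_by_row and add_by_column are hypothetical and not part of any Biopython
API, and keeping the left-hand identifier in the column case is just one
possible choice.

```python
def add_by_row(a, b):
    """Stack the rows of b under those of a (same number of columns required)."""
    lengths = {len(seq) for _, seq in a + b}
    if len(lengths) > 1:
        raise ValueError("All sequences must be the same length")
    return a + b


def add_by_column(a, b):
    """Join each row of a with the matching row of b (same number of rows required)."""
    if len(a) != len(b):
        raise ValueError("Alignments must have the same number of rows")
    # Assumption: rows are already in matching order; keep the left-hand identifier.
    return [(id_a, seq_a + seq_b) for (id_a, seq_a), (_, seq_b) in zip(a, b)]


a = [
    ("Alpha", "ACGATCAGCTAGCT"),
    ("Beta", "CCGATCAGCTAGCT"),
    ("Gamma", "ACGATGAGCTAGCT"),
]
by_row = add_by_row(a, a)        # 6 rows, 14 columns
by_column = add_by_column(a, a)  # 3 rows, 28 columns
```

Under this reading, a.extend(b) would correspond to add_by_row and a+b to
add_by_column.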
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 05:53:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 05:53:34 -0400
Subject: [Biopython-dev] [Bug 2553] New: Adding SeqRecord objects to an
alignment (append or extend)
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2553
Summary: Adding SeqRecord objects to an alignment (append or
extend)
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
Currently a Bio.Align.Generic.Alignment object stores the rows as SeqRecord
objects, but only exposes a public API for adding row sequences as strings.
As suggested on Bug 1944, it would make sense to treat the Alignment as a list
of SeqRecord objects and therefore support the list methods .append() and
.extend() for the addition of more rows as SeqRecord objects.
I would make the .append() method enforce the expectation that all rows are the
same length, and that the new sequence's alphabet is compatible with the
declared alignment alphabet.
See also Bug 2552 - Adding alignments
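A rough sketch of the length check described above, with a toy class standing
in for the alignment. RowStore is hypothetical and stores (identifier,
sequence) tuples instead of SeqRecord objects; the alphabet compatibility check
is left out entirely.

```python
class RowStore:
    """Hypothetical stand-in for an alignment: a list of equal-length rows."""

    def __init__(self):
        self._rows = []

    def append(self, row_id, sequence):
        # The first row fixes the alignment length; later rows must match it.
        if self._rows and len(sequence) != len(self._rows[0][1]):
            raise ValueError(
                "New sequence is not of length %i" % len(self._rows[0][1])
            )
        self._rows.append((row_id, sequence))

    def extend(self, rows):
        # extend is just repeated append, so every row is checked.
        for row_id, sequence in rows:
            self.append(row_id, sequence)

    def __len__(self):
        return len(self._rows)


store = RowStore()
store.extend([("Alpha", "ACGT-"), ("Beta", "ACGTT")])
try:
    store.append("Gamma", "ACG")  # wrong length, should be refused
    rejected = False
except ValueError:
    rejected = True
```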
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 05:57:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 05:57:52 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To:
Message-ID: <200807280957.m6S9vqJd001617@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=1944
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
BugsThisDependsOn|2507 |
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #17 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-28 05:57 EST -------
I've filed bugs on what I think are the remaining issues raised here (Bug
1944), and am now closing this issue (as it's getting very long and hard to
follow):
Bug 2551 - The __getitem__ method (accessing part of the alignment as a
character string, row, column or sub-alignment).
Bug 2552 - Adding alignments
Bug 2553 - Adding SeqRecord objects to an alignment (append or extend)
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 05:57:54 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 05:57:54 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
element access and slicing
In-Reply-To:
Message-ID: <200807280957.m6S9vspm001632@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2507
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
OtherBugsDependingO|1944 |
nThis| |
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 06:13:38 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 06:13:38 -0400
Subject: [Biopython-dev] [Bug 2554] New: Creating an Alignment from a list
of SeqRecord objects
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2554
Summary: Creating an Alignment from a list of SeqRecord objects
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
BugsThisDependsOn: 2553
It would be nice to be able to supply a list (or iterator) of SeqRecord objects
when creating an alignment object. This would also make the
Bio.SeqIO.to_alignment() function obsolete.
Currently, the __init__ method takes just an alphabet:
    def __init__(self, alphabet):
        """Initialize a new Alignment object.

        Arguments:
        o alphabet - The alphabet to use for the sequence objects that are
                     created. This alphabet must be a gapped type.
        """
        #...
My plan is to accept a list of SeqRecord objects (possibly empty) and an
optional alphabet. If the alphabet is omitted, a consensus can be determined
from the SeqRecord alphabets.
This can be made backwards compatible:
    def __init__(self, records, alphabet=None):
        """Initialize a new Alignment object.

        Arguments:
        records  - A list (or iterator) of SeqRecord objects, whose sequences
                   are all the same length. This can be an empty list.
        alphabet - The alphabet for the whole alignment, typically a gapped
                   alphabet, which should be a superset of the individual
                   record alphabets. If omitted, a consensus alphabet is used.
        """
        if not (isinstance(records, Alphabet.Alphabet) \
                or isinstance(records, Alphabet.AlphabetEncoder)):
            if alphabet is None :
                #Backwards compatible mode!
                alphabet = records
                records = []
            else :
                raise ValueError("Invalid records argument")
        #...
I would expect the implementation to depend on Bug 2553 - Adding SeqRecord
objects to an alignment (append or extend).
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 06:13:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 06:13:41 -0400
Subject: [Biopython-dev] [Bug 2553] Adding SeqRecord objects to an alignment
(append or extend)
In-Reply-To:
Message-ID: <200807281013.m6SADf6o002429@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2553
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
OtherBugsDependingO| |2554
nThis| |
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 06:49:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 06:49:45 -0400
Subject: [Biopython-dev] [Bug 2554] Creating an Alignment from a list of
SeqRecord objects
In-Reply-To:
Message-ID: <200807281049.m6SAnjbE003984@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2554
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-28 06:49 EST -------
There is an unwanted "not" in the code snippet in comment 0.
Here is a preliminary implementation of the revised __init__ method plus append
and extend (Bug 2553):
    def __init__(self, records, alphabet=None):
        """Initialize a new Alignment object.

        Arguments:
        records  - A list (or iterator) of SeqRecord objects, whose sequences
                   are all the same length. This can be an empty list.
        alphabet - The alphabet for the whole alignment, typically a gapped
                   alphabet, which should be a super-set of the individual
                   record alphabets. If omitted, a consensus alphabet is used.

        NOTE - Earlier versions of Biopython only accepted a single argument,
        an alphabet. This is still supported via a backwards compatible
        "hack" so as not to disrupt existing scripts and users.
        """
        if isinstance(records, Alphabet.Alphabet) \
        or isinstance(records, Alphabet.AlphabetEncoder):
            if alphabet is None :
                #Backwards compatible mode!
                alphabet = records
                records = []
            else :
                raise ValueError("Invalid records argument")
        if alphabet is not None :
            if not (isinstance(alphabet, Alphabet.Alphabet) \
                    or isinstance(alphabet, Alphabet.AlphabetEncoder)):
                raise ValueError("Invalid alphabet argument")
            self._alphabet = alphabet
        else :
            #Default while we add sequences, will take a consensus later
            self._alphabet = Alphabet.single_letter_alphabet
        self._records = []
        self.extend(records)
        if alphabet is None :
            #No alphabet was given, take a consensus alphabet
            #TODO - Use a generator expression once we drop python 2.3:
            self._alphabet = Alphabet._consensus_alphabet( \
                [rec.seq.alphabet for rec in self._records])

    def extend(self, records) :
        """Add more SeqRecord objects to the alignment as rows.

        They must all have the same length as the original alignment, and have
        alphabets compatible with the alignment's alphabet."""
        for rec in records :
            self.append(rec)

    def append(self, record) :
        """Add one more SeqRecord object to the alignment as a new row.

        This must have the same length as the original alignment (unless this
        is the first record), and have an alphabet compatible with the
        alignment's alphabet."""
        if not isinstance(record, SeqRecord) :
            raise TypeError("New sequence is not a SeqRecord object")
        if self._records and len(record) != self.get_alignment_length() :
            raise ValueError("New sequence is not of length %i" \
                             % self.get_alignment_length())
        #Using not self._alphabet.contains(record.seq.alphabet) needs fixing
        #for AlphabetEncoders (e.g. gapped versus ungapped).
        if not Alphabet._are_alphabets_compatible(self._alphabet, \
                                                  record.seq.alphabet) :
            raise ValueError("New sequence's alphabet is incompatible")
        self._records.append(record)
The unit tests look fine with this addition. Of course, new tests to verify
this functionality explicitly should then be added (and I could take advantage
of this in Bio.AlignIO too).
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 06:54:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 06:54:12 -0400
Subject: [Biopython-dev] [Bug 2554] Creating an Alignment from a list of
SeqRecord objects
In-Reply-To:
Message-ID: <200807281054.m6SAsClZ004173@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2554
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-28 06:54 EST -------
Regarding the code in comment 1, the private function
_are_alphabets_compatible() isn't in CVS; it's something I was playing with on
Bug 2550 - Alphabet problems when adding sequences.
However, I hope that this conveys my overall intention for the Alignment
object.
From bugzilla-daemon at portal.open-bio.org Tue Jul 29 22:22:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 29 Jul 2008 22:22:59 -0400
Subject: [Biopython-dev] [Bug 2557] New: AlignIO::write fails when
delegating to SeqIO::write
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2557
Summary: AlignIO::write fails when delegating to SeqIO::write
Product: Biopython
Version: 1.47
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: rsuri at cs.utexas.edu
In line 185 of "biopython/Bio/AlignIO/__init__.py" in the current CVS version,
there's a call to SeqIO::write with only 2 arguments instead of the required 3.
The call "SeqIO.write(alignment.get_all_seqs(), format)" should be
"SeqIO.write(alignment.get_all_seqs(), handle, format)" (i.e. pass the handle
object).
I know this happens when trying to output to FASTA format, and it appears to do
so more generally whenever the SeqIO module can be used for output.
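To illustrate why the missing handle matters, here is a sketch with a stand-in
write function that has the same three-argument shape (sequences, handle,
format) as described above. It is not Bio.SeqIO.write itself, and its FASTA
output is deliberately simplistic.

```python
import io


def write(sequences, handle, format):
    """Stand-in writer taking (sequences, handle, format); FASTA only."""
    if format != "fasta":
        raise ValueError("Unknown format %r" % format)
    count = 0
    for name, seq in sequences:
        handle.write(">%s\n%s\n" % (name, seq))
        count += 1
    return count


rows = [("Alpha", "ACGT"), ("Beta", "ACGA")]
handle = io.StringIO()

# Two-argument call, as in the reported bug: the format string lands in the
# handle slot and Python complains about the missing third argument.
try:
    write(rows, "fasta")
    error = None
except TypeError as exc:
    error = exc

# Corrected call with the handle passed through.
written = write(rows, handle, "fasta")
```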
From bugzilla-daemon at portal.open-bio.org Tue Jul 29 22:36:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 29 Jul 2008 22:36:07 -0400
Subject: [Biopython-dev] [Bug 2558] New: AlignIO nexus parsing chokes on
superfluous comma
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2558
Summary: AlignIO nexus parsing chokes on superfluous comma
Product: Biopython
Version: 1.47
Platform: All
URL: http://www.cs.utexas.edu/~rsuri/M3579.NX
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: rsuri at cs.utexas.edu
The URL above points to a nexus file (also available from TreeBase with Matrix
accession #M3579) that causes BioPython to raise an error when reading it with
the AlignIO module. In the "Trees" section of the input file, the final taxon
("Lecanorales") has a trailing comma that causes BioPython to fail (search for
the line beginning with "59"). I've verified that manually deleting the
offending comma is a valid workaround.
I don't know what the nexus format specification says, but this is poor form
for BioPython, in my opinion. It seems reasonable enough to allow for some
slack like this when reading formatted files.
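One way such slack could look, as a sketch only: skip empty entries when
splitting a TRANSLATE-style list on commas. parse_translate is a hypothetical
helper, not how Bio.Nexus parses the command, and it assumes labels contain
neither commas nor whitespace.

```python
def parse_translate(options):
    """Parse "1 'Name_a', 2 'Name_b'," into {1: "Name_a", 2: "Name_b"}."""
    table = {}
    for entry in options.split(","):
        entry = entry.strip()
        if not entry:
            # Empty entry, e.g. from a trailing comma: skip it.
            continue
        try:
            number, label = entry.split(None, 1)
        except ValueError:
            raise ValueError("Format error in entry %r" % entry)
        table[int(number)] = label.strip().strip("'")
    return table


table = parse_translate("58 'Micarea_adnata', 59 'Lecanorales',")
```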
From bugzilla-daemon at portal.open-bio.org Wed Jul 30 04:55:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 30 Jul 2008 04:55:42 -0400
Subject: [Biopython-dev] [Bug 2557] AlignIO::write fails when delegating to
SeqIO::write
In-Reply-To:
Message-ID: <200807300855.m6U8tgLU019854@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2557
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-30 04:55 EST -------
That's embarrassing for me! Bug confirmed and fixed in CVS.
I've used a very slightly simpler fix, taking advantage of the fact that you
can iterate for the SeqRecords within an alignment:
SeqIO.write(alignment, handle, format)
I've also updated the Bio.AlignIO unit test to cover writing a couple of the
formats supported via Bio.SeqIO ("fasta" and "tab"), although it might make
sense to try all of them...
Checking in Bio/AlignIO/__init__.py;
/home/repository/biopython/biopython/Bio/AlignIO/__init__.py,v <--
__init__.py
new revision: 1.10; previous revision: 1.9
done
Checking in Tests/test_AlignIO.py;
/home/repository/biopython/biopython/Tests/test_AlignIO.py,v <--
test_AlignIO.py
new revision: 1.12; previous revision: 1.11
done
Checking in Tests/output/test_AlignIO;
/home/repository/biopython/biopython/Tests/output/test_AlignIO,v <--
test_AlignIO
new revision: 1.10; previous revision: 1.9
done
Thank you for reporting this oversight,
Peter
From bugzilla-daemon at portal.open-bio.org Wed Jul 30 05:23:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 30 Jul 2008 05:23:59 -0400
Subject: [Biopython-dev] [Bug 2558] Bio.Nexus chokes on TRANSLATE block with
superfluous comma
In-Reply-To:
Message-ID: <200807300923.m6U9Nx8l021492@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2558
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|AlignIO nexus parsing chokes|Bio.Nexus chokes on
|on superfluous comma |TRANSLATE block with
| |superfluous comma
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-30 05:23 EST -------
This is an issue in the Bio.Nexus module, so it's a job for Frank.
Do you know if this affects all the NEXUS files from www.treebase.org? I've
tried downloading several trees, but their FTP site is just timing out for me.
According to http://www.treebase.org/treebase/submit.html they request that
trees be uploaded in the NEXUS file format, so it's possible that just a
minority of their trees have this trailing comma.
Note that this may be an invalid file (a TRANSLATE block with trailing comma),
but as you say it looks relatively straightforward to cope with. However, I
have had a quick look at the Bio.Nexus code, and I don't entirely understand
what Frank's parser is doing here, so it's not going to be a quick fix from me.
Quick bit of python to show the stack trace:
>>> from Bio.Nexus import Nexus
>>> n = Nexus(open("M3579.NX"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'module' object is not callable
>>> n = Nexus.Nexus(open("M3579.NX"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/XXX/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Nexus/Nexus.py", line 552, in __init__
    self.read(input)
  File "/Users/XXX/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Nexus/Nexus.py", line 614, in read
    self._parse_nexus_block(title, contents)
  File "/Users/XXX/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Nexus/Nexus.py", line 655, in _parse_nexus_block
    getattr(self,'_'+line.command)(line.options)
  File "/Users/XXX/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Nexus/Nexus.py", line 922, in _translate
    raise NexusError,'Format error in line %s.' % options
Bio.Nexus.Nexus.NexusError: Format error in line 1 'Rolfidium_coccocarpioides',
2 'Mycoblastus_sanguinarius', 3 'Protoblastenia_rupestris', 4
'Myxobilimbia_sabuletorum', 5 'Byssoloma_leucoblepharum', 6
'Stereocaulon_tomentosum', 7 'Scoliciosporum_umbrinum', 8
'Haematomma_ochroleucum', 9 'Glyphopeltis_ligustica', 10
'Catinaria_atropurpurea', 11 'Miriquidica_garovaglii', 12
'Sphaerophorus_globosus', 13 'Lecidea_atrosanguinea', 14
'Cladonia_peziziformis', 15 'Stereocaulon_pileatum', 16
'Frutidella_caesioatra', 17 'Fellhanera_bouteillei', 18 'Tonina_cinereovirens',
19 'Helocarpon_crassipes', 20 'Micarea_alabastrites', 21
'Squamarina_lentigera', 22 'Lecanora_intumescens', 23 'Bellemerea_diamarta', 24
'Lopadium_disciforme', 25 'Herteliana_taylorii', 26 'Lepraria_lobificans', 27
'Psilolechia_leprosa', 28 'Protomicarea_limosa', 29 'Calopadia_foliicola', 30
'Fellhanera_subtilis', 31 'Pyrrhospora_quernea', 32 'Lecidella_meiococca', 33
'Hypogymnia_physodes', 34 'Ramalina_fastigiata', 35 'Halecania_alpivaga', 36
'Platismatia_glauca', 37 'Lepraria_bergensis', 38 'Micarea_micrococca', 39
'Lecania_atrynoides', 40 'Crocynia_gossypina', 41 'Psilolechia_lucida', 42
'Lecanora_allophana', 43 'Cladonia_digitata', 44 'Schadonia_fecunda', 45
'Psorula_rufonigra', 46 'Adelolecia_pilati', 47 'Lecidea_turgidula', 48
'Micarea_sylvicola', 49 'Lecidea_fuscoatra', 50 'Psora_rubiformis', 51
'Micarea_erratica', 52 'Megalaria_grossa', 53 'Lecidea_silacea', 54
'Micarea_intrusa', 55 'Psora_decipiens', 56 'Tephromela_atra', 57
'Bacidia_rosella', 58 'Micarea_adnata', 59 'Lecanorales',.
From bugzilla-daemon at portal.open-bio.org Wed Jul 30 08:57:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 30 Jul 2008 08:57:00 -0400
Subject: [Biopython-dev] [Bug 2558] Bio.Nexus chokes on TRANSLATE block with
superfluous comma
In-Reply-To:
Message-ID: <200807301257.m6UCv0co031445@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2558
fkauff at biologie.uni-kl.de changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
------- Comment #2 from fkauff at biologie.uni-kl.de 2008-07-30 08:57 EST -------
I'm all for a little bit of slack in parsers, but this looks in my opinion like
a straightforward syntax error in the nexus file. I work with nexus files
daily, and have never encountered such a trailing comma. What really confuses
me is that there are 58 taxa in the data set, and no. 59 Lecanorales is an
extra entry, with no data and no occurrence in the tree. I don't think this is
proper nexus format.
Frank
(In reply to comment #1)
From bugzilla-daemon at portal.open-bio.org Thu Jul 31 11:58:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 31 Jul 2008 11:58:08 -0400
Subject: [Biopython-dev] [Bug 2560] New: Adding BLAST support to Bio.AlignIO
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2560
Summary: Adding BLAST support to Bio.AlignIO
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
I think it can sometimes be useful to regard a BLAST output file as a series of
pairwise alignments - and therefore it makes sense to add it to Bio.AlignIO as
another input file format.
http://biopython.org/wiki/AlignIO
Note that the AlignIO API will not allow any "clumping" of the pairwise
alignments (or HSPs in Blast terminology) according to the query or the target
sequence - you just get them all one after the other.
I will attach a rough Bio/AlignIO/BlastIO.py file which attempts to mimic the
naming conventions in the fasta-m10 parser. This currently uses Bio.Blast to
do the actual parsing, and then just uses the Blast results to build alignment
objects with two sequences each.
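The "one after the other" behaviour can be sketched with plain dictionaries
standing in for parsed BLAST records. The field names used here (query, hits,
target, hsps) are invented for illustration; they are not the Bio.Blast record
attributes, and this is not the attached BlastIO.py.

```python
def iter_pairwise(blast_records):
    """Yield one two-row alignment per HSP, in file order, with no grouping."""
    for record in blast_records:
        for hit in record["hits"]:
            for query_seq, target_seq in hit["hsps"]:
                if len(query_seq) != len(target_seq):
                    raise ValueError("HSP rows differ in length")
                yield [(record["query"], query_seq), (hit["target"], target_seq)]


records = [
    {
        "query": "query1",
        "hits": [
            {"target": "subjA", "hsps": [("ACG-T", "ACGGT"), ("TTGA", "TTGA")]},
            {"target": "subjB", "hsps": [("ACGT", "AC-T")]},
        ],
    }
]
alignments = list(iter_pairwise(records))
```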
I suggest using the format names "blast" and "blastxml" for the plain text and
XML output formats following BioPerl (although I would prefer "blast-xml" to
"blastxml"), see http://www.bioperl.org/wiki/HOWTO:SearchIO#Design
From bugzilla-daemon at portal.open-bio.org Thu Jul 31 12:00:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 31 Jul 2008 12:00:23 -0400
Subject: [Biopython-dev] [Bug 2560] Adding BLAST support to Bio.AlignIO
In-Reply-To:
Message-ID: <200807311600.m6VG0NAq021299@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2560
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-31 12:00 EST -------
Created an attachment (id=980)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=980&action=view)
New file Bio/AlignIO/BlastIO.py
The included "self test" just parses all the unit tests (excluding the
PSI-Blast and HTML files).
From bugzilla-daemon at portal.open-bio.org Tue Jul 1 09:01:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 Jul 2008 05:01:29 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
with identical taxa names
In-Reply-To:
Message-ID: <200807010901.m6191TxO015999@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2531
------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-01 05:01 EST -------
Does this mean that there will be no way to see the original non-unique names
from within Bio.Nexus? I agree they are a pain, but it would be nice to
preserve them.
I haven't read the Nexus specs (restricted article), but do they comment on
the issue of repeated identifiers?
From bugzilla-daemon at portal.open-bio.org Tue Jul 1 09:13:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 Jul 2008 05:13:02 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
with identical taxa names
In-Reply-To:
Message-ID: <200807010913.m619D2vK016454@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2531
------- Comment #15 from fkauff at biologie.uni-kl.de 2008-07-01 05:13 EST -------
Yes, the original non-unique names are currently not preserved. It would be
fairly easy to keep them, if desired.
The NEXUS specs (Maddison et al.) state that non-unique names "should be
avoided if this might cause ambiguity", which imho they always do. But in my
experience names sometimes become identical due to truncation etc., so I needed
a way to deal with it instead of just throwing an error.
Frank
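For illustration, one simple way to make colliding names unique while keeping
the originals alongside. The helper and the ".copyN" suffix scheme are
assumptions made for this sketch, not necessarily what Nexus.py does, and the
sketch does not guard against an input name that already ends in such a suffix.

```python
def uniquify(names):
    """Return (unique_names, original_names); repeats get a numeric suffix."""
    seen = {}
    unique = []
    for name in names:
        count = seen.get(name, 0)
        seen[name] = count + 1
        # First occurrence keeps its name; later ones get .copy1, .copy2, ...
        candidate = name if count == 0 else "%s.copy%i" % (name, count)
        unique.append(candidate)
    return unique, list(names)


unique, originals = uniquify(["Homo_sapie", "Pan_troglo", "Homo_sapie"])
```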
From bugzilla-daemon at portal.open-bio.org Tue Jul 1 13:16:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 Jul 2008 09:16:57 -0400
Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq
objects
In-Reply-To:
Message-ID: <200807011316.m61DGvGS029051@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2532
------- Comment #2 from fkauff at biologie.uni-kl.de 2008-07-01 09:16 EST -------
I opt for (b): an easy one-time addition to Bio.Alphabet, easy to use for
everyone (instead of creating their own uppercase-lowercase variants of those
terribly complicated biopython alphabet classes), and easy to change for all
other modules if lowercase-uppercase is what they want (or need).
Nexus.py and Phd.py certainly need to allow lowercase characters, as this is
very common.
Frank
From bugzilla-daemon at portal.open-bio.org Tue Jul 1 15:56:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 Jul 2008 11:56:03 -0400
Subject: [Biopython-dev] [Bug 2533] New: Support for simple "tab" format in
Bio.SeqIO
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2533
Summary: Support for simple "tab" format in Bio.SeqIO
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
Requested on the mailing list by Giovanni Marco Dall'Olio:
http://lists.open-bio.org/pipermail/biopython/2008-June/004312.html
See BioPerl:
http://www.bioperl.org/wiki/Tab_sequence_format
Suggested implementation to follow.
From bugzilla-daemon at portal.open-bio.org Tue Jul 1 15:57:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 Jul 2008 11:57:26 -0400
Subject: [Biopython-dev] [Bug 2533] Support for simple "tab" format in
Bio.SeqIO
In-Reply-To:
Message-ID: <200807011557.m61FvQN5006042@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2533
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-01 11:57 EST -------
Created an attachment (id=962)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=962&action=view)
New file Bio/SeqIO/TabIO.py
Treats the first field as the record's .id (and .name)
Treats the second field as the record's sequence.
When writing, uses only record.id and record.seq
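The behaviour described above can be sketched in plain Python. This is not the attached TabIO.py; the function names parse_tab and format_tab are hypothetical, and plain tuples stand in for SeqRecord objects:

```python
def parse_tab(lines):
    """Yield (identifier, sequence) pairs from tab separated lines.

    Hypothetical sketch: the first field is the identifier, the
    second field is the sequence.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skipping blank lines is an assumption
        # A line without a tab raises ValueError when unpacking.
        identifier, sequence = line.split("\t", 1)
        yield identifier.strip(), sequence.strip()


def format_tab(identifier, sequence):
    """Return one output line using only the identifier and the sequence."""
    return "%s\t%s\n" % (identifier, sequence)


records = list(parse_tab(["gi|1\tACGT\n", "\n", "gi|2\tTTGA\n"]))
print(records)
print(format_tab(*records[0]))
```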
From bugzilla-daemon at portal.open-bio.org Tue Jul 1 16:00:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 Jul 2008 12:00:59 -0400
Subject: [Biopython-dev] [Bug 2533] Support for simple "tab" format in
Bio.SeqIO
In-Reply-To:
Message-ID: <200807011600.m61G0xUp006217@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2533
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-01 12:00 EST -------
Created an attachment (id=963)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=963&action=view)
Patch to add the "tab" format to Bio.SeqIO and update the unit test output
The plumbing to make Bio.SeqIO (and Bio.AlignIO) aware of the new format.
Adds the reader/writer mapping to Bio/SeqIO/__init__.py (trivial) and gives the
updated output from test_SeqIO.py (trivial to regenerate with "python
run_tests.py -g test_SeqIO.py").
From biopython at maubp.freeserve.co.uk Wed Jul 2 10:33:35 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 11:33:35 +0100
Subject: [Biopython-dev] Catching more error conditions in Bio.Entrez
Message-ID: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com>
Hello Michiel et al.,
I've already added a few if statements to the end of
Bio.Entrez._open() to catch a few errors I'd observed, and I've just
found another example:
>>> from Bio import Entrez
>>> Entrez.efetch("nucleotide", id="fiction", rettype="fasta").read()
'\n'
>>> Entrez.efetch("nucleotide", id="fiction").read()
'\n'
This seems to happen for any invalid identifier. Are you happy for me
to check for this as an error too? Are there any valid reasons to get
back an empty dataset like this?
Also, I was wondering if we should raise a ValueError rather than
IOError if we are fairly sure the problem is with the arguments rather
than the network or the server being unavailable.
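A check of this kind might look like the sketch below. The function check_efetch_data is hypothetical and is not part of Bio.Entrez; it only illustrates raising ValueError for a reply that is empty apart from whitespace, such as the '\n' shown above:

```python
def check_efetch_data(data):
    """Return data unchanged, or raise ValueError if it is effectively empty.

    Hypothetical helper: an all-whitespace reply is treated as a sign
    that the arguments (e.g. the identifier) were wrong, not the network.
    """
    if not data.strip():
        raise ValueError("Empty response; the identifier may be invalid")
    return data


try:
    check_efetch_data("\n")
except ValueError as err:
    print("Rejected: %s" % err)
```

Whether an empty reply should be an error at all is a separate question; with history-based queries an empty result could be legitimate.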
Peter
From sdavis2 at mail.nih.gov Wed Jul 2 11:18:43 2008
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 2 Jul 2008 07:18:43 -0400
Subject: [Biopython-dev] Catching more error conditions in Bio.Entrez
In-Reply-To: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com>
References: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com>
Message-ID: <264855a00807020418qc858370r4083f0db9db3197a@mail.gmail.com>
On Wed, Jul 2, 2008 at 6:33 AM, Peter wrote:
> Hello Michiel et al.,
>
> I've already added a few if statements to the end of
> Bio.Entrez._open() to catch a few errors I'd observed, and I've just
> found another example:
>
>>>> from Bio import Entrez
>>>> Entrez.efetch("nucleotide", id="fiction", rettype="fasta").read()
> '\n'
>>>> Entrez.efetch("nucleotide", id="fiction").read()
> '\n'
>
> This seems to happen for any invalid identifier. Are you happy for me
> to check for this as an error too? Are there any valid reasons to get
> back an empty dataset like this?
If the ability to use history is added, then an empty dataset could be
a valid return after an empty search. For id-based-searches, I'm not
sure I would raise an error for an empty set being returned anyway.
Just my $0.02.
Sean
> Also, I was wondering if we should raise a ValueError rather than
> IOError if we are fairly sure the problem is with the arguments rather
> than the network or the server being unavailable.
>
> Peter
From biopython at maubp.freeserve.co.uk Wed Jul 2 11:34:32 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 12:34:32 +0100
Subject: [Biopython-dev] Catching more error conditions in Bio.Entrez
In-Reply-To: <264855a00807020418qc858370r4083f0db9db3197a@mail.gmail.com>
References: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com>
<264855a00807020418qc858370r4083f0db9db3197a@mail.gmail.com>
Message-ID: <320fb6e00807020434p474cect7a7b0d51148d7760@mail.gmail.com>
>> This seems to happen for any invalid identifier. Are you happy for me
>> to check for this as an error too? Are there any valid reasons to get
>> back an empty dataset like this?
>
> If the ability to use history is added, then an empty dataset could be
> a valid return after an empty search. ...
Bio.Entrez has always supported the history, it's just up to the user
to take advantage of it. I've included an example in the tutorial to
explain how to do this, cut and pasted below:
from Bio import Entrez
search_handle = Entrez.esearch(db="nucleotide", term="Opuntia and rpl16",
                               usehistory="y", email="history.user at example.com")
search_results = Entrez.read(search_handle)
search_handle.close()
gi_list = search_results["IdList"]
count = int(search_results["Count"])
assert count == len(gi_list)
session_cookie = search_results["WebEnv"]
query_key = search_results["QueryKey"]
# Now use the history session cookie and query key to download the
# results in batches
batch_size = 3
out_handle = open("orchid_rpl16.fasta", "w")
for start in range(0, count, batch_size):
    end = min(count, start+batch_size)
    print("Going to download record %i to %i" % (start+1, end))
    fetch_handle = Entrez.efetch(db="nucleotide", rettype="fasta",
                                 retstart=start, retmax=batch_size,
                                 webenv=session_cookie, query_key=query_key,
                                 email="history.user at example.com")
    data = fetch_handle.read()
    fetch_handle.close()
    out_handle.write(data)
out_handle.close()
Feedback on the tutorial or the example is of course welcome.
> For id-based-searches, I'm not sure I would raise an error for an empty
> set being returned anyway.
>
> Just my $0.02.
I was wondering about this kind of thing... maybe some more testing of
these kinds of examples would be in order.
Peter
From biopython at maubp.freeserve.co.uk Wed Jul 2 13:03:36 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 14:03:36 +0100
Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in
Bio.SeqIO
Message-ID: <320fb6e00807020603s63f8339ag5b8140f1943ceb47@mail.gmail.com>
Hi all,
Do any of you have any comments or feedback on this suggested new
"simple tab separated" format for Bio.SeqIO? To match BioPerl I plan
on calling it the "tab" format - see below.
Any real world example files would be good for the test suite.
One nice thing is it adds another output format, something we're a bit
short of in Bio.SeqIO with only fasta and some alignment formats (now
handled via Bio.AlignIO, i.e. pfam/stockholm, clustal and phylip).
Peter
---------- Forwarded message ----------
From: Peter
Date: Tue, Jul 1, 2008 at 5:06 PM
Subject: Re: [BioPython] Sequence from Fasta
To: dalloliogm at gmail.com
Cc: biopython at biopython.org
Giovanni wrote:
> yes, I think it will be useful to implement.
> I know of people who have written a customized fasta2tab script and
> use it quite frequently, so it would be good to support such a task.
> As you said before this format is commonly used in combination with
> grep/gawk scripts.
I've gone for the simple option about how to parse the first field: it's used
as the record identifier (.id) and name only (nothing clever). Here is my
suggested code, which you are welcome to download and try out.
Bug 2533 - Support for simple "tab" format in Bio.SeqIO
http://bugzilla.open-bio.org/show_bug.cgi?id=2533
If you want to try this yourself you'll need to download the new file
TabIO.py into the Bio/SeqIO folder and update Bio/SeqIO/__init__.py to
tell it about the new format (two new lines, see patch).
Peter
From biopython at maubp.freeserve.co.uk Wed Jul 2 13:21:29 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 14:21:29 +0100
Subject: [Biopython-dev] Questions about the NEXUS format
Message-ID: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com>
Hello again Frank,
As you're Biopython's NEXUS expert, I've got a couple of hopefully trivial
questions about the format, which connect to how best to handle it in the
Bio.SeqIO and Bio.AlignIO modules. http://biopython.org/wiki/SeqIO
http://biopython.org/wiki/AlignIO
My short questions are:
Q1: Can a file contain more than one NEXUS record (i.e. concatenation,
with more than one #NEXUS line)?
Q2: Can a NEXUS record/file contain more than one alignment (matrix block)?
If the answer to either of those is a "yes", then any example files
you could contribute would be very helpful.
I have a more complicated question too, which would help me to resolve Bug 2227:
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
Q3: Given a generic Alignment object (e.g. from one of the other
parsers), can I construct a corresponding Nexus object where the
aligned sequences are used for the matrix? If so, how?
Thank you,
Peter
From mjldehoon at yahoo.com Wed Jul 2 13:30:06 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 2 Jul 2008 06:30:06 -0700 (PDT)
Subject: [Biopython-dev] Bio.IntelliGenetics
Message-ID: <29487.55988.qm@web62410.mail.re1.yahoo.com>
Bio.IntelliGenetics contains a parser for sequence data in the IntelliGenetics format.
In this format, each sequence has a name and comments, and in addition there can also be an overall comment to the file.
Currently the parser in Bio.IntelliGenetics stores this information in Bio.IntelliGenetics.Record.Record objects (one record per sequence; the overall comment is inadvertently added to the first sequence in the file). I think it makes more sense to use a SeqRecord for that, and to deprecate Bio.IntelliGenetics.Record.Record.
In that case, Bio.SeqIO looks like a more suitable place for this parser.
The user would see something like this:
>>> from Bio import SeqIO
>>> handle = open("mydatafile.txt")
>>> records = SeqIO.parse(handle, "ig")
>>> records.comment
"This is the overall comment"
>>> for record in records:
...     pass  # ... record is a SeqRecord.
Because of the overall comment, SeqIO.parse cannot simply return a generator function. It must return a full-fledged class, but one with an iterator.
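A rough sketch of what such a class could look like follows. The class name RecordIterator and its constructor arguments are hypothetical, and plain strings stand in for SeqRecord objects; the point is only that an object can carry a comment attribute and still be iterated over lazily:

```python
class RecordIterator(object):
    """Iterator with an extra attribute holding a file-level comment.

    Hypothetical sketch: the comment is stored up front, while the
    records are produced lazily by the wrapped iterable.
    """

    def __init__(self, comment, records):
        self.comment = comment
        self._records = iter(records)

    def __iter__(self):
        return self

    def __next__(self):
        # Delegates to the wrapped iterator; StopIteration propagates.
        return next(self._records)

    next = __next__  # Python 2 spelling of the iterator protocol


records = RecordIterator("This is the overall comment",
                         (name for name in ["seq1", "seq2"]))
print(records.comment)
for record in records:
    print(record)
```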
Any objections, anybody?
--Michiel
From biopython at maubp.freeserve.co.uk Wed Jul 2 13:48:31 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 14:48:31 +0100
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <29487.55988.qm@web62410.mail.re1.yahoo.com>
References: <29487.55988.qm@web62410.mail.re1.yahoo.com>
Message-ID: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com>
On Wed, Jul 2, 2008 at 2:30 PM, Michiel de Hoon wrote:
> Bio.IntelliGenetics contains a parser for sequence data in the IntelliGenetics format.
Just to be upfront, I'm not familiar with this format, but I've had a
look at the examples.
> In this format, each sequence has a name and comments, and in addition there can
> also be an overall comment to the file.
OK. This is also the case in other file formats, for example GenBank
files can have a free-format text header at the start, but we ignore
this.
How would you separate the file header comment from the first record
comment? Some files include what looks like a file header but the
lines all seem to start with "; ". Maybe look for "; LOCUS..."?
Given the whole comment seems to be free format I don't think this is
very nice.
On the other hand, some of the sample inputs include a number of
lines starting ";; Modified by ..." which would be easy to separate
(one semi colon versus two semi colons). These are clearly file-level
header lines, rather than being part of the first record.
> Currently the parser in Bio.IntelliGenetics stores this information in
> Bio.IntelliGenetics.Record.Record objects (one record per sequence; the
> overall comment is inadvertently added to the first sequence in the file). I
> think it makes more sense to use a SeqRecord for that, and to deprecate
> Bio.IntelliGenetics.Record.Record.
If all the data extracted by the Bio.IntelliGenetics parser could be
dealt with using the SeqRecord parser added to Bio.SeqIO, then yes
deprecating Bio.IntelliGenetics sounds fine.
> In that case, Bio.SeqIO looks like a more suitable place for this parser.
> The user would see something like this:
>>>> from Bio import SeqIO
>>>> handle = open("mydatafile.txt")
>>>> records = SeqIO.parse(handle, "ig")
>>>> records.comment
> "This is the overall comment"
>>>> for record in records:
> # ... record is a SeqRecord.
I see you are using "ig" as the format name, matching EMBOSS. Good :)
http://emboss.sourceforge.net/docs/themes/seqformats/ig
> Because of the overall comment, SeqIO.parse cannot simply return a
> generator function. It must return a full-fledged class, but one with an iterator.
Not necessarily. We can still use a simple generator function and either throw
away the header comment, or include it with the first record (or even with
every record). If you did create an iterator class, would you make the header
available as a property of the iterator?
Given the apparently fuzzy boundary between the file header and the first
record header, I would just opt to treat it all as a comment for the first
record, and use a simple generator function.
Peter
From fkauff at biologie.uni-kl.de Wed Jul 2 14:01:01 2008
From: fkauff at biologie.uni-kl.de (Frank Kauff)
Date: Wed, 02 Jul 2008 16:01:01 +0200
Subject: [Biopython-dev] Questions about the NEXUS format
In-Reply-To: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com>
References: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com>
Message-ID: <486B8A1D.8090806@biologie.uni-kl.de>
Hi Peter,
Peter wrote:
> Hello again Frank,
>
> As you're Biopython's NEXUS expert, I've got a couple of hopefully trivial
> questions about the format, which connect to how best to handle it in the
> Bio.SeqIO and Bio.AlignIO modules. http://biopython.org/wiki/SeqIO
> http://biopython.org/wiki/AlignIO
>
> My short questions are:
>
> Q1: Can a file contain more than one NEXUS record (i.e. concatenation,
> with more than one #NEXUS line)?
>
As far as I know: no. #NEXUS just indicates that the file is a NEXUS file;
the concept of "records" is not part of a nexus file.
> Q2: Can a NEXUS record/file contain more than one alignment (matrix block)?
>
>
I just had a quick look at the old Maddison et al. introductory paper of
Nexus, and it says that "although the nexus standard does not impose
constraints on the number of blocks, particular programs will". I don't
know of any program that would read more than one data block and keep
both of them.
> If the answer to either of those is a "yes", then any example files
> you could contribute would be very helpful.
>
> I have a more complicated question too, which would help me to resolve Bug 2227:
> http://bugzilla.open-bio.org/show_bug.cgi?id=2227
>
> Q3: Given a generic Alignment object (e.g. from one of the other
> parsers), can I construct a corresponding Nexus object where the
> aligned sequences are used for the matrix? If so, how?
>
Hmmm - not really. Nexus.py does not support "empty" nexus class objects
that could be filled with data (just tried) . But it would actually be a
nice thing to have. I'll put this on my to do list.
Cheers,
Frank
> Thank you,
>
> Peter
>
>
From biopython at maubp.freeserve.co.uk Wed Jul 2 14:01:13 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 15:01:13 +0100
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com>
References: <29487.55988.qm@web62410.mail.re1.yahoo.com>
<320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com>
Message-ID: <320fb6e00807020701k2f5bf546j2d5ef3514a24e31a@mail.gmail.com>
Hello again,
Interestingly the IntelliGenetics looks the same as the MASE alignment
file format:
http://www.bioperl.org/wiki/Mase_multiple_alignment_format
On the other hand, the EMBOSS example is clearly not a multiple
sequence alignment:
http://emboss.sourceforge.net/docs/themes/seqformats/ig
Adding the parser to Bio.SeqIO would let us read in alignments too via
Bio.AlignIO (which will offload the parsing to Bio.SeqIO and then try
and convert the SeqRecords into an Alignment).
Peter
From biopython at maubp.freeserve.co.uk Wed Jul 2 14:06:40 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 15:06:40 +0100
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <320fb6e00807020705qea5976j9a5e2cd0896f391d@mail.gmail.com>
References: <29487.55988.qm@web62410.mail.re1.yahoo.com>
<320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com>
<320fb6e00807020705qea5976j9a5e2cd0896f391d@mail.gmail.com>
Message-ID: <320fb6e00807020706l28309346m57e7bd884a0b7b9b@mail.gmail.com>
Forgot to send this to the list, another point about IntelliGenetics vs MASE
---------- Forwarded message ----------
From: Peter
Date: Wed, Jul 2, 2008 at 3:05 PM
Subject: Re: [Biopython-dev] Bio.IntelliGenetics
To: mjldehoon at yahoo.com
> How would you separate the file header comment from the first record
> comment? Some files include what looks like a file header but the
> lines all seem to start with "; ". Maybe look for "; LOCUS..."?
> Given the whole comment seems to be free format I don't think this is
> very nice.
>
> On the other hand, some of the sample inputs include a number of
> lines starting ";; Modified by ..." which would be easy to separate
> (one semi colon versus two semi colons). These are clearly file-level
> header lines, rather than being part of the first record.
I found an old link I had added on the wiki page for SeqIO development,
http://pbil.univ-lyon1.fr/help/formats.html
This clearly describes the MASE format as having (optional) header
lines as starting with two semi colons. But are MASE and
IntelliGenetics the same thing?
Peter
From biopython at maubp.freeserve.co.uk Wed Jul 2 14:12:48 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 15:12:48 +0100
Subject: [Biopython-dev] Questions about the NEXUS format
In-Reply-To: <486B8A1D.8090806@biologie.uni-kl.de>
References: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com>
<486B8A1D.8090806@biologie.uni-kl.de>
Message-ID: <320fb6e00807020712y54874e02k6854b92e1711358d@mail.gmail.com>
>> My short questions are:
>>
>> Q1: Can a file contain more than one NEXUS record (i.e. concatenation,
>> with more than one #NEXUS line)?
>
> As far as I know: no. #NEXUS just indicates the file being a NEXUS file, the
> concept of "records" is not part of a nexus file
OK, thank you.
>> Q2: Can a NEXUS record/file contain more than one alignment (matrix
>> block)?
>
> I just had a quick look at the old Maddison et al. introductory paper of
> Nexus, and it says that "although the nexus standard does not impose
> constraints on the number of blocks, particular programs will". I don't know
> of any program that would read more than one data block and keep both of
> them.
So that is a "yes in theory", but it doesn't sound worth worrying about.
>> Q3: Given a generic Alignment object (e.g. from one of the other
>> parsers), can I construct a corresponding Nexus object where the
>> aligned sequences are used for the matrix? If so, how?
>
> Hmmm - not really. Nexus.py does not support "empty" nexus class objects
> that could be filled with data (just tried) . But it would actually be a
> nice thing to have. I'll put this on my to do list.
Thanks,
Peter
From mjldehoon at yahoo.com Wed Jul 2 14:15:16 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 2 Jul 2008 07:15:16 -0700 (PDT)
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com>
Message-ID: <529945.38158.qm@web62404.mail.re1.yahoo.com>
> On the other hand, some of the sample inputs include a number of
> lines starting ";; Modified by ..." which would be easy to separate
> (one semi colon versus two semi colons). These are clearly file-level
> header lines, rather than being part of the first record.
According to the website mentioned in Bio/IntelliGenetics/__init__.py, the file-level comments have two semi colons, as opposed to the sequence-specific comments, which have one semi colon.
http://pbil.univ-lyon1.fr/help/formats.html
> If you did create an iterator class, would you make the
> header available as a property of the iterator?
I am not sure what you mean by a property of the iterator. I was thinking to simply add a field to the class.
---Michiel.
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 14:38:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 Jul 2008 10:38:52 -0400
Subject: [Biopython-dev] [Bug 2524] Handle missing libraries like TextTools
in run_tests.py
In-Reply-To:
Message-ID: <200807021438.m62Ecqma013815@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2524
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|Documentation |Unit Tests
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 10:38 EST -------
Filing under "Unit Tests".
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 14:39:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 Jul 2008 10:39:22 -0400
Subject: [Biopython-dev] [Bug 2469] requires_wise.py fails on Windows (test
suite)
In-Reply-To:
Message-ID: <200807021439.m62EdMM9013903@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2469
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|Main Distribution |Unit Tests
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 10:39 EST -------
Filing under "Unit Tests"
From biopython at maubp.freeserve.co.uk Wed Jul 2 14:56:00 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 15:56:00 +0100
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <529945.38158.qm@web62404.mail.re1.yahoo.com>
References: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com>
<529945.38158.qm@web62404.mail.re1.yahoo.com>
Message-ID: <320fb6e00807020756r4de8ed4bi3f8b409d75996a14@mail.gmail.com>
>> If you did create an iterator class, would you make the
>> header available as a property of the iterator?
>
> I am not sure what you mean by a property of the iterator. I was
> thinking to simply add a field to the class.
Adding the file header field to the iterator class? You could do I suppose.
Right now all the Bio.SeqIO parsers use generator functions (although
not in AlignIO), but I have no objection to returning iterator
classes instead.
I don't really like the idea of Bio.SeqIO parsers returning anything
other than SeqRecord objects - even if it is indirectly via a richer
iterator object. I see the Bio.SeqIO as a common unified API, and the
downside is sometimes extra data doesn't really fit.
If there really is some important meta-data in a file format that
applies to all the records, then it cannot easily be represented in
the Bio.SeqIO system except as annotation added to every single
SeqRecord. e.g. Add the header to the annotations dictionary under
"file-header" or something.
Peter
From mjldehoon at yahoo.com Wed Jul 2 15:29:31 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 2 Jul 2008 08:29:31 -0700 (PDT)
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <320fb6e00807020705qea5976j9a5e2cd0896f391d@mail.gmail.com>
Message-ID: <318336.37817.qm@web62405.mail.re1.yahoo.com>
--- On Wed, 7/2/08, Peter wrote:
> I found an old link I had added on the wiki page for SeqIO
> development,
> http://pbil.univ-lyon1.fr/help/formats.html
>
> This clearly describes the MASE format as having
> (optional) header
> lines as starting with two semi colons. But are MASE and
> IntelliGenetics the same thing?
It may be that the link in Bio/IntelliGenetics/__init__.py actually does not pertain to the IntelliGenetics format. Except for this link (which as you point out actually talks about the MASE format, not the IntelliGenetics format), I have seen no description elsewhere of these file-wide comments preceded by a double semi-colon in the IntelliGenetics format. Even Biopython doesn't treat these consistently: The tests for Bio.IntelliGenetics include comments with the double semi-colon, but the parser doesn't treat them differently from sequence-specific comments.
So let's do the following:
For the IntelliGenetics parser, do not look for double semi-colon comments. Only check if the first character in a line is a semi-colon, and if so, treat it as a sequence-specific comment. This is what Bio.IntelliGenetics currently does anyway.
Replace the parser class in Bio.IntelliGenetics by a generator function, and integrate it with Bio.SeqIO. Then, let's replace the IntelliGenetics tests by files that do not contain the double semi-colon comments.
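A generator along these lines is sketched below. It is only an illustration, not the planned Bio.SeqIO code: the function name parse_ig is hypothetical, tuples stand in for SeqRecord objects, and the assumed layout (comment lines, one name line, sequence lines ending in "1" or "2") is my reading of the format rather than a checked specification:

```python
def parse_ig(lines):
    """Yield (name, comment, sequence) tuples from IntelliGenetics-style lines.

    Assumed layout per record: comment lines starting with ';', then one
    name line, then sequence lines, the last of which ends in '1' or '2'.
    """
    comments, name, seq = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if name is None:
            if line.startswith(";"):
                # Any leading semi-colon marks a sequence-specific comment,
                # so ';;' lines get no special treatment.
                comments.append(line.lstrip(";").strip())
            else:
                name = line
        else:
            seq.append(line)
            if line[-1] in "12":
                # Drop the terminator character and start a new record.
                yield name, "\n".join(comments), "".join(seq)[:-1]
                comments, name, seq = [], None, []
    if name is not None:
        # Tolerate a final record without a terminator (an assumption).
        yield name, "\n".join(comments), "".join(seq)


example = [";; Modified by x\n", "; first\n", "SEQ1\n", "ACGT\n", "TTGA1\n",
           "; second\n", "SEQ2\n", "GGCC2\n"]
for record in parse_ig(example):
    print(record)
```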
Does that sound OK?
--Michiel.
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 16:28:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 Jul 2008 12:28:19 -0400
Subject: [Biopython-dev] [Bug 2535] New: Support for PIR / NBRF format in
Bio.SeqIO
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2535
Summary: Support for PIR / NBRF format in Bio.SeqIO
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
BioPerl and EMBOSS both refer to this as the "pir" format, although EMBOSS also
supports "nbrf" as an alternative.
http://bioperl.org/wiki/PIR_sequence_format
Patch to follow, a new parser and writer in plain python. The existing Martel
based parser in Bio.NBRF could then be deprecated.
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 16:30:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 Jul 2008 12:30:28 -0400
Subject: [Biopython-dev] [Bug 2535] Support for PIR / NBRF format in
Bio.SeqIO
In-Reply-To:
Message-ID: <200807021630.m62GUS5B025377@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2535
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 12:30 EST -------
Created an attachment (id=964)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=964&action=view)
New file Bio/SeqIO/PirIO.py
Note that the details of storing the sequence type may need tweaking for better
agreement with the de-facto conventions from the GenBank parser.
As part of this the following dictionary may be useful, from Bio/NBRF/ValSeq.py
valid_sequence_dict = { "P1": "complete protein", "F1": "protein fragment", \
"DL": "linear DNA", "DC": "circular DNA", "RL": "linear RNA", \
"RC":"circular RNA", "N3": "transfer RNA", "N1": "other"
}
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 17:37:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 Jul 2008 13:37:05 -0400
Subject: [Biopython-dev] [Bug 2535] Support for PIR / NBRF format in
Bio.SeqIO
In-Reply-To:
Message-ID: <200807021737.m62Hb5lX031417@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2535
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 13:37 EST -------
My patch doesn't accept the "N1" sequence type mentioned in Bio/NBRF/ValSeq.py
Also when recording a SeqRecord from a non-PIR input, we could try and guess
the sequence type. The alphabet itself is one clue. GenBank and EMBL files
should also record if the sequence is linear or circular, as well as a sequence
type.
For proteins, I don't see how to decide between P1 and F1 though (complete
protein vs protein fragment). Maybe default to F1?
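One possible way to make that guess is sketched here. The helper guess_pir_type and its arguments are hypothetical (they are not in the attached patch); the two-letter codes are the ones from the Bio/NBRF/ValSeq.py dictionary quoted in comment #1, and proteins fall back to F1 as suggested above:

```python
def guess_pir_type(molecule, circular=False):
    """Pick a PIR/NBRF sequence type code; hypothetical helper.

    molecule is "protein", "DNA" or "RNA" (labels chosen for this sketch).
    """
    if molecule == "protein":
        # Cannot tell a complete protein from a fragment, so assume fragment.
        return "F1"
    if molecule == "DNA":
        return "DC" if circular else "DL"
    if molecule == "RNA":
        return "RC" if circular else "RL"
    raise ValueError("Unknown molecule type: %r" % molecule)


print(guess_pir_type("DNA", circular=True))
print(guess_pir_type("protein"))
```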
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 19:51:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 Jul 2008 15:51:49 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
for '-F F' and make it safe
In-Reply-To:
Message-ID: <200807021951.m62Jpnx3012202@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2508
------- Comment #9 from mmokrejs at ribosome.natur.cuni.cz 2008-07-02 15:51 EST -------
Even better docs:
http://blog.doughellmann.com/2007/07/pymotw-subprocess.html
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 13:24:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 09:24:32 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
with identical taxa names
In-Reply-To:
Message-ID: <200807031324.m63DOWDA018278@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2531
------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 09:24 EST -------
Hi Frank,
I see you've updated Bio/Nexus/Nexus.py with CVS revision 1.16 to record the
original taxon order with and without the name changes.
n.unaltered_taxlabels = Original names in order with duplicates
n.original_taxon_order = Modified names in order, suitable as keys to n.matrix
I'll update Bio.SeqIO / Bio.AlignIO to take advantage of this shortly, storing
the original name and the modified unique name as the SeqRecord's name and id
properties.
Peter
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 13:52:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 09:52:08 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
with identical taxa names
In-Reply-To:
Message-ID: <200807031352.m63Dq8el021720@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2531
------- Comment #17 from fkauff at biologie.uni-kl.de 2008-07-03 09:52 EST -------
Hi Peter,
I'd strongly suggest to use self.taxlabels instead of
self.original_taxon_order. The latter is only for compatibility, and
original_taxon_order just maps taxlabels. Actually it might make sense to give
a deprecation warning if original_taxon_order is used, and it should be removed
in some future release.
Frank
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 14:06:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 10:06:46 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807031406.m63E6kct023377@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #584 is|0 |1
obsolete| |
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 10:06 EST -------
(From update of attachment 584)
With Bio/Nexus/Nexus.py revision 1.16 from CVS, it is now possible to create an
empty Nexus object and add sequences to it. This code is now obsolete.
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 14:13:38 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 10:13:38 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807031413.m63EDcGj024034@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 10:13 EST -------
Created an attachment (id=965)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=965&action=view)
Bio/Nexus/Nexus.py handle support in write_nexus_data()
With Bio/Nexus/Nexus.py revision 1.16 from CVS, it is now possible to create an
empty Nexus object and add sequences to it:
#Read in an alignment object, e.g. with Bio.AlignIO
from Bio import AlignIO
alignment = AlignIO.read(open("example.aln"), "clustal")
#Make a Nexus object
from Bio.Nexus import Nexus
handle = open("test.nex", "w")
n = Nexus.Nexus()
n.alphabet = alignment._alphabet
for record in alignment :
    n.add_sequence(record.id, record.seq.tostring())
n.write_nexus_data(handle)
handle.close()
There are two problems with write_nexus_data(). Firstly, it doesn't accept a
StringIO handle (see also Bug 2454).
Secondly, if given a handle it closes it. This would break the above code, or
how I typically use StringIO.
This patch addresses these points.
Frank, are you happy for me to commit this change?
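For readers without the attachment, the intended behaviour can be sketched like this. It is not the actual patch: write_text() is a hypothetical helper that accepts either a filename or an open handle (including StringIO) and only closes handles it opened itself.

```python
from io import StringIO


def write_text(target, text):
    """Write text to a filename or to an already-open handle.

    Hypothetical helper for illustration; not part of Bio.Nexus.
    """
    if hasattr(target, "write"):
        # Caller-supplied handle (file, StringIO, ...): write to it,
        # but leave closing it to the caller.
        target.write(text)
    else:
        # Anything else is treated as a filename. We opened the file,
        # so we are responsible for closing it.
        with open(target, "w") as handle:
            handle.write(text)


handle = StringIO()
write_text(handle, "#NEXUS\n")
write_text(handle, "begin data;\n")  # still usable, handle was not closed
nexus_text = handle.getvalue()
```

Leaving caller-supplied handles open is what lets the same handle be written to repeatedly, or read back with getvalue() afterwards.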
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 16:02:30 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 12:02:30 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807031602.m63G2Unc032671@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 12:02 EST -------
Created an attachment (id=966)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=966&action=view)
Patch to Bio/AlignIO/NexusIO.py adding write support
This patch requires the Bio.Nexus handle fix (patch in attachment 965, comment
4).
My method for constructing an empty DNA, RNA, or Protein Nexus object is
perhaps inelegant. This is required in order to set up the alphabet,
ambiguous_values and unambiguous_letters properties which otherwise default to
DNA.
Also note that the Nexus add_sequence() method does not seem to support
duplicated taxa names. Perhaps this method could update the
unaltered_taxlabels property and use the _unique_label method to cope with
duplicate names?
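A rough sketch of the suggestion, for illustration only: unique_label() below is a hypothetical stand-in, not the real _unique_label method, and the "_2", "_3" suffix scheme is an assumption. The point is that the original names are kept in one list while the dictionary keys stay unique.

```python
def unique_label(existing, label):
    """Return label, or label plus a numeric suffix if it is taken.

    Hypothetical stand-in for illustration; the suffix scheme is an
    assumption, not what Bio.Nexus does.
    """
    if label not in existing:
        return label
    n = 2
    while "%s_%d" % (label, n) in existing:
        n += 1
    return "%s_%d" % (label, n)


unaltered_taxlabels = []  # original names, in order, duplicates kept
taxlabels = []            # unique names, usable as dictionary keys
matrix = {}

# Made-up example data with one duplicated taxon name.
for name, seq in [("Homo", "ACGT"), ("Pan", "ACGA"), ("Homo", "ACGC")]:
    key = unique_label(matrix, name)
    unaltered_taxlabels.append(name)
    taxlabels.append(key)
    matrix[key] = seq
```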
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 16:08:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 12:08:26 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
with identical taxa names
In-Reply-To:
Message-ID: <200807031608.m63G8QS3000534@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2531
------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 12:08 EST -------
I have changed my use of original_taxon_order to just taxlabels (code now in
Bio/AlignIO/NexusIO.py rather than Bio/SeqIO/NexusIO.py).
I agree, adding a deprecation warning to the original_taxon_order get/set
functions would make sense.
P.S. Thanks for adding the unaltered_taxlabels property.
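The deprecation warning discussed in this thread could look roughly like the sketch below. TaxonContainer is a hypothetical class used only for illustration, not the real Nexus class; it shows a property that maps onto taxlabels and warns on both get and set.

```python
import warnings


class TaxonContainer(object):
    """Hypothetical class for illustration; not the real Nexus class."""

    def __init__(self):
        self.taxlabels = []

    def _get_original_taxon_order(self):
        warnings.warn(
            "original_taxon_order is deprecated; use taxlabels instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.taxlabels

    def _set_original_taxon_order(self, value):
        warnings.warn(
            "original_taxon_order is deprecated; use taxlabels instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        self.taxlabels = value

    # The old name simply maps onto taxlabels.
    original_taxon_order = property(
        _get_original_taxon_order, _set_original_taxon_order
    )


container = TaxonContainer()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    container.original_taxon_order = ["t1", "t2"]
    labels = container.original_taxon_order
```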
From biopython at maubp.freeserve.co.uk Fri Jul 4 08:11:06 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 4 Jul 2008 09:11:06 +0100
Subject: [Biopython-dev] What happened to Biopython 1.46?
Message-ID: <320fb6e00807040111h182411d5lea14575f2906e7ba@mail.gmail.com>
We were recently talking about doing another release, but as you may
have noticed nothing has been announced.
Michiel devoted a good chunk of his weekend to preparing Biopython
1.46 and uploaded it to the servers on Sunday 29th. He didn't issue
an announcement email at the time due to the problem with the wiki
being read only (now fixed). However, on the Monday evening I
realised I'd done something really stupid in Bio.Data.CodonTable just
before the CVS freeze. Table 15 (Blepharisma Macronuclear) would be
used whenever a translation table was requested by name. This change
has been reverted, and I've added further translation checks in
test_seq.py to avoid any similar issue in future.
So, while there is a Biopython 1.46, we're not going to advertise it
because the translation functionality is subtly wrong. However, it is
up on the website, and linked to with an errata statement.
Michiel will kindly try and prepare Biopython 1.47 soon... so please
hold off any big changes in CVS until then.
And I'm hereby publicly promising to treat him to dinner - hopefully
we'll be in the same country at the same time this year!
Peter
From bugzilla-daemon at portal.open-bio.org Fri Jul 4 08:39:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Jul 2008 04:39:35 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807040839.m648dZnX025882@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
------- Comment #6 from fkauff at biologie.uni-kl.de 2008-07-04 04:39 EST -------
(In reply to comment #4)
> Created an attachment (id=965)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=965&action=view) [details]
> Bio/Nexus/Nexus.py handle support in write_nexus_data()
>
> With Bio/Nexus/Nexus.py revision 1.16 from CVS, it is now possible to create an
> empty Nexus object and add sequences to it:
> ...
> There are two problems with write_nexus_data(), firstly it doesn't accept a
> StringIO handle (see also Bug 2454).
>
> Secondly, if given a handle it closes it. This would break the above code, or
> how I typically use StringIO.
>
> This patch addresses these points.
>
> Frank, are you happy for me to commit this change?
>
Very nice. Go for it :-)
Cheers,
Frank
From bugzilla-daemon at portal.open-bio.org Fri Jul 4 08:53:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Jul 2008 04:53:10 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807040853.m648rAHL026783@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #965 is|0 |1
obsolete| |
------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-04 04:53 EST -------
(From update of attachment 965)
> > This patch addresses these points.
> >
> > Frank, are you happy for me to commit this change?
> >
>
> Very nice. Go for it :-)
>
Thanks Frank.
Checking in Nexus.py;
/home/repository/biopython/biopython/Bio/Nexus/Nexus.py,v <-- Nexus.py
new revision: 1.17; previous revision: 1.16
done
Peter
From bugzilla-daemon at portal.open-bio.org Fri Jul 4 08:56:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Jul 2008 04:56:10 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807040856.m648uAAG027012@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #966 is|0 |1
obsolete| |
------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-04 04:56 EST -------
(From update of attachment 966)
There is a slight problem with this patch on the alphabet selection (it uses
"dna" when it should use "rna").
I will postpone dealing with writing Nexus files in Bio.SeqIO / Bio.AlignIO until
after the next Biopython release.
From bugzilla-daemon at portal.open-bio.org Fri Jul 4 09:13:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Jul 2008 05:13:25 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807040913.m649DPap027929@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
------- Comment #9 from fkauff at biologie.uni-kl.de 2008-07-04 05:13 EST -------
(In reply to comment #5)
> Created an attachment (id=966)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=966&action=view) [details]
> Patch to Bio/AlignIO/NexusIO.py adding write support
>
> This patch requires the Bio.Nexus handle fix (patch in attachment 965 [details], comment
> 4).
>
> My method for constructing an empty DNA, RNA, or Protein Nexus object is
> perhaps inelegant. This is required in order to setup the alphabet,
> ambiguous_values and unambiguous_letters properties which otherwise default to
> DNA.
>
> Also note that the Nexus add_sequence() method does not seem to support
> duplicated taxa names. Perhaps this method could update the
> unaltered_taxlabels property and use the _unique_label method to cope with
> duplicate names?
>
Ok, I updated add_sequence and will commit the changes soon.
F
From bugzilla-daemon at portal.open-bio.org Fri Jul 4 09:20:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Jul 2008 05:20:07 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807040920.m649K7MI028352@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
------- Comment #10 from fkauff at biologie.uni-kl.de 2008-07-04 05:20 EST -------
(In reply to comment #9)
> >
> > Also note that the Nexus add_sequence() method does not seem to support
> > duplicated taxa names. Perhaps this method could update the
> > unaltered_taxlabels property and use the _unique_label method to cope with
> > duplicate names?
> >
> Ok, I updated add_sequence and will commit the changes soon.
>
Checking in biopython/Bio/Nexus/Nexus.py;
/home/repository/biopython/biopython/Bio/Nexus/Nexus.py,v <-- Nexus.py
new revision: 1.18; previous revision: 1.17
Frank
From mjldehoon at yahoo.com Fri Jul 4 10:24:06 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 4 Jul 2008 03:24:06 -0700 (PDT)
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <320fb6e00807020911w2bec03a6w5ec8b50f60a50238@mail.gmail.com>
Message-ID: <36286.77119.qm@web62412.mail.re1.yahoo.com>
> I'm assuming we'd put the new IntelliGenetics to
> SeqRecord parser in Bio/SeqIO/IgIO.py (based on
> the format name of "ig" used in EMBOSS).
OK.
> Would we then also deprecate Bio.IntelliGenetics?
Yes. Otherwise, it's replicated functionality.
> Do you want to make these changes, or should I?
Either way is fine with me. If you want to include this in Bio.SeqIO, go ahead. If you prefer me to do it, please let me know.
> > Then, let's replace the IntelliGenetics tests by
> files that do not contain the double
> > semi-colon comments.
>
> Why not just leave the double colon lines alone? The parser
> should be able to cope.
In the example files in test/IntelliGenetics, lines with a ';;' clearly have a different interpretation from the sequence-specific comments starting with ';'. I am fine with skipping the ';;' lines, but if we'd include them with the sequence-specific comments we'd be misrepresenting the file.
--Michiel.
--- On Wed, 7/2/08, Peter wrote:
> From: Peter
> Subject: Re: [Biopython-dev] Bio.IntelliGenetics
> To: mjldehoon at yahoo.com
> Date: Wednesday, July 2, 2008, 12:11 PM
> > It may be that the link in
> Bio/IntelliGenetics/__init__.py actually does not pertain
> to
> > the IntelliGenetics format. Except for this link
> (which as you point out actually talks
> > about the MASE format, not the IntelliGenetics
> format), I have seen no description
> > elsewhere of these file-wide comments preceded by a
> double semi-colon in the
> > IntelliGenetics format. Even Biopython doesn't
> treat these consistently: The tests
> > for Bio.IntelliGenetics include comments with the
> double semi-colon, but the parser
> > doesn't treat them differently from
> sequence-specific comments.
>
> Maybe we should ask BioPerl if they distinguish between the
> IntelliGenetics and MASE formats?
>
> Looking back over the old mailing list, at the time they
> did think the
> two were the same:
> http://lists.open-bio.org/pipermail/biopython-dev/2001-October/000626.html
>
> > So let's do the following:
> > For the IntelliGenetics parser, do not look for double
> semi-colon comments. Only check
> > if the first character in a line is a semi-colon, and
> if so, treat it as a sequence-specific
> > comment. This is what Bio.IntelliGenetics currently
> does anyway.
> > Replace the parser class in Bio.IntelliGenetics by a
> generator function, and integrate it with
> > Bio.SeqIO.
>
> I'm assuming we'd put the new IntelliGenetics to
> SeqRecord parser in
> Bio/SeqIO/IgIO.py (based on the format name of
> "ig" used in EMBOSS).
> Would we then also deprecate Bio.IntelliGenetics?
>
> Do you want to make these changes, or should I?
>
> > Then, let's replace the IntelliGenetics tests by
> files that do not contain the double
> > semi-colon comments.
>
> Why not just leave the double colon lines alone? The parser
> should be
> able to cope.
>
> Peter
From biopython at maubp.freeserve.co.uk Fri Jul 4 14:31:55 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 4 Jul 2008 15:31:55 +0100
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <36286.77119.qm@web62412.mail.re1.yahoo.com>
References: <320fb6e00807020911w2bec03a6w5ec8b50f60a50238@mail.gmail.com>
<36286.77119.qm@web62412.mail.re1.yahoo.com>
Message-ID: <320fb6e00807040731h787c66e6t10a4edd31dffdbc2@mail.gmail.com>
>> Do you want to make these changes, or should I?
>
> Either way is fine with me. If you want to include this in Bio.SeqIO, go ahead.
OK. I've added a simple parser to CVS as Bio/SeqIO/IgIO.py for
IntelliGenetics/MASE files using the format name "ig" to match EMBOSS.
The existing three sample files are now being used in test_SeqIO.py
and one of them also in test_AlignIO.py as well.
If anyone wants to scan over the code, I'd be delighted to have feedback.
Adding support for writing these files should be easy. Do you think
this is worth implementing?
Before we deprecate Bio.IntelliGenetics I suggest we ask on the
mailing list if anyone is using it.
> In the example files in test/IntelliGenetics, lines with a ';;' clearly have a different interpretation
> from the sequence-specific comments starting with ';'. I am fine with skipping the ';;' lines, but
> if we'd include them with the sequence-specific comments we'd be misrepresenting the file.
I am ignoring the ";;" lines at the start of the file.
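To illustrate the distinction being discussed, here is a minimal sketch that keeps ";;" lines apart from ";" comment lines. It is not the code in Bio/SeqIO/IgIO.py: split_ig_comments() is a hypothetical helper, and the example lines are made up.

```python
def split_ig_comments(lines):
    """Separate ';;' lines, ';' comment lines, and all other lines.

    Hypothetical helper for illustration; not the Bio.SeqIO parser.
    """
    file_wide, comments, other = [], [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(";;"):
            # Check the double semi-colon first, since these lines
            # also start with a single semi-colon.
            file_wide.append(line[2:].strip())
        elif line.startswith(";"):
            comments.append(line[1:].strip())
        else:
            other.append(line)
    return file_wide, comments, other


example = [
    ";; file-wide remark\n",
    "; comment for the first record\n",
    "SEQ1\n",
    "ACGTACGT\n",
]
file_wide, comments, other = split_ig_comments(example)
```

A parser that ignores the ";;" lines would simply discard the first list rather than attaching it to any record.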
Peter
From mjldehoon at yahoo.com Sat Jul 5 08:24:41 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 5 Jul 2008 01:24:41 -0700 (PDT)
Subject: [Biopython-dev] CVS freeze for release 1.47
Message-ID: <223850.14172.qm@web62404.mail.re1.yahoo.com>
Hi everybody,
I'll start on release 1.47 from now, so please don't make any commits to CVS until the release is out.
Thanks!
--Michiel.
From mjldehoon at yahoo.com Sun Jul 6 00:00:17 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 5 Jul 2008 17:00:17 -0700 (PDT)
Subject: [Biopython-dev] Biopython release 1.47
Message-ID: <287726.364.qm@web62412.mail.re1.yahoo.com>
We are pleased to announce the release of Biopython 1.47.
This release includes a new Bio.AlignIO module, updates to Bio.Blast, parsers for NCBI's Entrez E-Utilities, numerous other code improvements and fixes, and extended and updated documentation. In particular, if you use Biopython to access NCBI's E-Utilities, we encourage you to download and install this release to ensure full compliance with NCBI's access rules.
Source distributions and Windows installers are available from the Biopython website at http://biopython.org. My thanks to all code contributors who made this new release possible.
--Michiel on behalf of the Biopython developers
From sbassi at gmail.com Sun Jul 6 19:53:54 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Sun, 6 Jul 2008 16:53:54 -0300
Subject: [Biopython-dev] NCBIStandalon not compatible with previous versions,
is this a bug?
Message-ID:
NCBIStandalone changed in 1.46 due to bug #2508.
So this code that was working before, no longer works:
result, err = NCBIStandalone.blastall(b_exe, "blastn",
b_db, f_name, expectation=1e-10, descriptions=1)
The error trace is:
File "/mnt/hda2/bio/biopython-1.46/build/lib.linux-i686-2.5/Bio/Blast/NCBIStandalone.py",
line 1986, in _security_check_parameters
if ";" in value or "&&" in value :
TypeError: argument of type 'float' is not iterable
So I had to rewrite the code as:
result, err = NCBIStandalone.blastall(b_exe, "blastn",
b_db, f_name, expectation="1e-10", descriptions="1")
The problem is the function "_security_check_parameters", which assumes
that all arguments are strings.
Proposed solutions:
1) Leave it as is (this is not a bug). Some tutorial should be changed (?)
2) Modify line 1986 from:
if ";" in value or "&&" in value :
To this:
if ";" in value or "&&" in str(value) :
From bugzilla-daemon at portal.open-bio.org Mon Jul 7 10:47:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Jul 2008 06:47:48 -0400
Subject: [Biopython-dev] [Bug 2447] EUtils cannot parse PubMed XML for ACS
journals
In-Reply-To:
Message-ID: <200807071047.m67Almjb027271@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2447
------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp 2008-07-07 06:47 EST -------
Using Biopython release 1.47,
Bio.Entrez can parse the XML for this PMID:
>>> from Bio import Entrez
>>> PMID = "17238260"
>>> handle = Entrez.efetch(db='pubmed', id=PMID, retmode='xml')
>>> record = Entrez.read(handle)
>>>
Noel, can you use Bio.Entrez instead of Bio.EUtils?
From bugzilla-daemon at portal.open-bio.org Mon Jul 7 10:55:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Jul 2008 06:55:10 -0400
Subject: [Biopython-dev] [Bug 2448] Bio.EUtils can't handle accented author
names
In-Reply-To:
Message-ID: <200807071055.m67AtAWu027543@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2448
------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2008-07-07 06:55 EST -------
Using Bio.Entrez in Biopython release 1.47:
>>> from Bio import Entrez
>>> handle = Entrez.efetch(db='pubmed', id=pmids, retmode='xml')
>>> records = Entrez.read(handle)
>>> records[0]['MedlineCitation']['Article']['AuthorList']
[{u'LastName': 'Matamala', u'Initials': 'AR', u'ForeName': 'Adelio R'},
{u'LastName': 'Almonacid', u'Initials': 'DE', u'ForeName': 'Daniel E'},
{u'LastName': 'Figueroa', u'Initials': 'MF', u'ForeName': 'Maximiliano F'},
{u'LastName': u'Mart\xednez-Oyanedel', u'Initials': 'J', u'ForeName':
u'Jos\xe9'}, {u'LastName': 'Bunster', u'Initials': 'MC', u'ForeName': 'Marta
C'}]
Noel, is this sufficient for your needs?
From bugzilla-daemon at portal.open-bio.org Mon Jul 7 11:12:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Jul 2008 07:12:26 -0400
Subject: [Biopython-dev] [Bug 2448] Bio.EUtils can't handle accented author
names
In-Reply-To:
Message-ID: <200807071112.m67BCQAB028433@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2448
------- Comment #3 from baoilleach at gmail.com 2008-07-07 07:12 EST -------
Thanks Michiel, but I found a workaround a day later so don't worry about me. I
was just letting you know about the bug...
Noel
From biopython at maubp.freeserve.co.uk Mon Jul 7 13:07:24 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 7 Jul 2008 14:07:24 +0100
Subject: [Biopython-dev] NCBIStandalon not compatible with previous
versions, is this a bug?
In-Reply-To:
References:
Message-ID: <320fb6e00807070607m2cee88b1n9b2b2194d96c3c12@mail.gmail.com>
On Sun, Jul 6, 2008 at 8:53 PM, Sebastian Bassi wrote:
> NCBIStandalone changed in 1.46 due to bug #2508.
> So this code that was working before, no longer works:
>
> result, err = NCBIStandalone.blastall(b_exe, "blastn",
> b_db, f_name, expectation=1e-10, descriptions=1)
>
> The error trace is:
>
> File "/mnt/hda2/bio/biopython-1.46/build/lib.linux-i686-2.5/Bio/Blast/NCBIStandalone.py",
> line 1986, in _security_check_parameters
> if ";" in value or "&&" in value :
> TypeError: argument of type 'float' is not iterable
>
> So I had to rewrite the code as:
>
> result, err = NCBIStandalone.blastall(b_exe, "blastn",
> b_db, f_name, expectation="1e-10", descriptions="1")
>
> The problem is the function "_security_check_parameters", that assumes
> that all arguments are strings.
>
> Proposed solutions:
>
> 1) Leave it as is (this is not a bug). Some tutorial should be changed (?)
> 2) Modify line 1986 from:
> if ";" in value or "&&" in value :
> To this:
> if ";" in value or "&&" in str(value) :
I would say it's a bug, and casting into a string on line 1986 looks
like the best fix. I won't be able to do this until tomorrow
afternoon at the latest - if you could file a bug that would be
helpful in case I forget ;)
Thanks
Peter
From bugzilla-daemon at portal.open-bio.org Mon Jul 7 17:08:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Jul 2008 13:08:40 -0400
Subject: [Biopython-dev] [Bug 2538] New: _security_check_parameters assumes
all arguments are strings
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2538
Summary: _security_check_parameters assumes all arguments are
strings
Product: Biopython
Version: 1.46
Platform: PC
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: sbassi at gmail.com
This code no longer works:
result, err = NCBIStandalone.blastall(b_exe, "blastn",
b_db, f_name, expectation=1e-10, descriptions=1)
Because the new _security_check_parameters function assumes all blastall parameters
are strings, but expectation and descriptions are a float and an int.
Proposed fix:
Modify line 1986 from:
if ";" in value or "&&" in value :
To this:
if ";" in value or "&&" in str(value) :
From sbassi at gmail.com Mon Jul 7 20:30:14 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Mon, 7 Jul 2008 17:30:14 -0300
Subject: [Biopython-dev] Alignment problem. bug?
Message-ID:
I would like to confirm whether this is a bug or not. If I get
confirmation, I will file it in Bugzilla.
With this code:
from Bio import Clustalw
from Bio.Clustalw import MultipleAlignCL
cline = MultipleAlignCL('foralig.txt')
cline.set_output("alig.aln")
alignment = Clustalw.do_alignment(cline)
I get:
Traceback (most recent call last):
File "/mnt/hda2/py252/bin/ii.py", line 112, in
alignment = Clustalw.do_alignment(cline)
File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py",
line 125, in do_alignment
return parse_file(out_file, alphabet)
File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py",
line 47, in parse_file
generic_alignment = AlignIO.read(handle, "clustal")
File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/__init__.py",
line 299, in read
first = iterator.next()
File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/ClustalIO.py",
line 169, in next
raise ValueError("Could not parse line:\n%s" % line)
ValueError: Could not parse line:
I tested with Biopython 1.47 and 1.46 with the input file:
http://www.pastecode.com.ar/f44f28b41 (download at
http://www.pastecode.com.ar/pastebin.php?dl=f44f28b41)
The clustal program is running because I see in the disk its output
(posted here: http://www.pastecode.com.ar/f275a5475). It seems it
fails to parse it.
I also tested in an older version (I guess it is 1.44) and it works
OK. So I think the problem was introduced in 1.46.
--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Tutorial libre de Python: http://tinyurl.com/2az5d5
From bugzilla-daemon at portal.open-bio.org Tue Jul 8 08:41:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 8 Jul 2008 04:41:02 -0400
Subject: [Biopython-dev] [Bug 2538] _security_check_parameters assumes all
arguments are strings
In-Reply-To:
Message-ID: <200807080841.m688f2VL020100@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2538
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-08 04:41 EST -------
Included a float in the unit test for _security_check_parameters() added in Bug
2508:
Tests/test_NCBIStandalone.py revision: 1.15;
Fixed the string assumption in:
Bio/Blast/NCBIStandalone.py revision: 1.74;
Note that in your suggested fix, Sebastian, both the "in" expressions need
casting to a string.
Thanks for reporting this!
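As a sketch of the point about casting (not the code committed in NCBIStandalone.py): check_parameter() below is a hypothetical stand-in that converts the value to a string once, so that both "in" tests work for floats and ints as well as strings.

```python
def check_parameter(value):
    """Raise ValueError if the value contains shell command separators.

    Hypothetical stand-in for illustration; not the function in
    Bio/Blast/NCBIStandalone.py.
    """
    text = str(value)  # cast once, so both "in" tests see a string
    if ";" in text or "&&" in text:
        raise ValueError("Rejecting suspicious parameter: %r" % (value,))
    return text


# A float, an int and a plain string all pass the check.
safe = [check_parameter(v) for v in (1e-10, 1, "blastn")]

try:
    check_parameter("nr; echo oops")
    rejected = False
except ValueError:
    rejected = True
```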
Peter
From biopython at maubp.freeserve.co.uk Tue Jul 8 08:51:31 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jul 2008 09:51:31 +0100
Subject: [Biopython-dev] Alignment problem. bug?
In-Reply-To:
References:
Message-ID: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com>
On Mon, Jul 7, 2008 at 9:30 PM, Sebastian Bassi wrote:
> I would like to confirm that this is a bug ot not. If I get
> confirmation, I would fill it in bugzilla.
It does look like a bug to me...
> With this code:
>
> from Bio import Clustalw
> from Bio.Clustalw import MultipleAlignCL
>
> cline = MultipleAlignCL('foralig.txt')
> cline.set_output("alig.aln")
> alignment = Clustalw.do_alignment(cline)
>
> I get:
>
> Traceback (most recent call last):
> File "/mnt/hda2/py252/bin/ii.py", line 112, in
> alignment = Clustalw.do_alignment(cline)
> File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py",
> line 125, in do_alignment
> return parse_file(out_file, alphabet)
> File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py",
> line 47, in parse_file
> generic_alignment = AlignIO.read(handle, "clustal")
> File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/__init__.py",
> line 299, in read
> first = iterator.next()
> File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/ClustalIO.py",
> line 169, in next
> raise ValueError("Could not parse line:\n%s" % line)
> ValueError: Could not parse line:
>
>
> I tested with Biopython 1.47 and 1.46 with the input file:
> http://www.pastecode.com.ar/f44f28b41 (download at
> http://www.pastecode.com.ar/pastebin.php?dl=f44f28b41)
> The clustal program does run, because I can see its output on disk
> (posted here: http://www.pastecode.com.ar/f275a5475). It seems that
> Biopython fails to parse it.
>
> I also tested in an older version (I guess it is 1.44) and it works
> OK. So I think the problem was introduced in 1.46.
For Biopython 1.46+ I switched the Bio.Clustalw parser to internally
call Bio.AlignIO, so one thing you could try is reverting
Bio/Clustalw/__init__.py to the older version (e.g. that shipped with
Biopython 1.45).
You haven't said which version of the ClustalW tool you are using -
maybe 2.0? If so, there could be a subtle change in the output
format since 1.83. If you could run the tool by hand and share the
output that would be helpful to try and track down this issue.
I don't seem to have any version of ClustalW installed on my current
machine, so it will take me a little longer to reproduce this here.
Could you file a bug please, and attach the example input and the
output when run by hand at the command line.
Thanks,
Peter
From bugzilla-daemon at portal.open-bio.org Tue Jul 8 08:52:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 8 Jul 2008 04:52:06 -0400
Subject: [Biopython-dev] [Bug 2538] _security_check_parameters assumes all
arguments are strings
In-Reply-To:
Message-ID: <200807080852.m688q6Ce020588@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2538
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-08 04:52 EST -------
Forgot to mark this as fixed - sorry for the extra email!
From biopython at maubp.freeserve.co.uk Tue Jul 8 11:02:37 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jul 2008 12:02:37 +0100
Subject: [Biopython-dev] Alignment problem. bug?
In-Reply-To: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com>
References:
<320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com>
Message-ID: <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>
On Tue, Jul 8, 2008 at 9:51 AM, Peter wrote:
> On Mon, Jul 7, 2008 at 9:30 PM, Sebastian Bassi wrote:
>> I would like to confirm whether this is a bug or not. If I get
>> confirmation, I will file it in Bugzilla.
>
> It does look like a bug to me...
I've reproduced this using Clustalw 2.0.8 (online at the EBI) and with
Clustalw 2.0.9 (installed locally). It was a problem parsing Clustal
files where the first line of the consensus was blank (and it would
probably affect Clustal W 1.83 too).
I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12
Could you update this file and re-test please Sebastian? Also, may I
add a test alignment file based on your example to CVS please?
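The failure mode described above can be illustrated with a small sketch. This is not the code in Bio/AlignIO/ClustalIO.py; the function name parse_clustal and the sample data are made up for illustration, and the sketch simply assumes that a consensus line either starts with a space or, when no column is conserved, is completely blank.

```python
def parse_clustal(text):
    """Return (ids, sequences) from Clustal-style text; a sketch only."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("Missing CLUSTAL header")
    ids, seqs = [], {}
    for line in lines[1:]:
        # Consensus lines start with a space; when nothing is conserved the
        # whole line is blank, so both cases are skipped rather than rejected.
        if not line.strip() or line[0] == " ":
            continue
        fields = line.split()
        if len(fields) < 2:
            raise ValueError("Could not parse line:\n%s" % line)
        name, chunk = fields[0], fields[1]
        if name not in seqs:
            ids.append(name)
            seqs[name] = ""
        seqs[name] += chunk
    return ids, [seqs[name] for name in ids]


# Hypothetical two-block alignment whose first consensus line is blank.
example = "\n".join([
    "CLUSTAL W (1.83) multiple sequence alignment",
    "",
    "",
    "seqA            ACGT",
    "seqB            TGCA",
    "",
    "",
    "seqA            AAAA",
    "seqB            AAAA",
    "                ****",
])
ids, sequences = parse_clustal(example)
print(ids, sequences)
```

The sketch discards the consensus entirely, which sidesteps the problem; a real parser that keeps the consensus would need to pad a blank line to the block width instead.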
Thanks,
Peter
From mjldehoon at yahoo.com Tue Jul 8 12:47:48 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 8 Jul 2008 05:47:48 -0700 (PDT)
Subject: [Biopython-dev] Bio.Sequencing
Message-ID: <570915.67657.qm@web62415.mail.re1.yahoo.com>
Hi everybody,
Is somebody currently actively maintaining Bio.Sequencing? Frank perhaps?
I'd like to make some changes to Bio.Sequencing with regards to bug #2454:
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
Just to make sure that I am not treading on other people's work.
--Michiel
From fkauff at biologie.uni-kl.de Tue Jul 8 13:12:39 2008
From: fkauff at biologie.uni-kl.de (Frank Kauff)
Date: Tue, 08 Jul 2008 15:12:39 +0200
Subject: [Biopython-dev] Bio.Sequencing
In-Reply-To: <570915.67657.qm@web62415.mail.re1.yahoo.com>
References: <570915.67657.qm@web62415.mail.re1.yahoo.com>
Message-ID: <487367C7.2050702@biologie.uni-kl.de>
Hi all,
Michiel de Hoon wrote:
> Hi everybody,
>
> Is somebody currently actively maintaining Bio.Sequencing? Frank perhaps?
>
Not me. Green light from my side.
Frank
> I'd like to make some changes to Bio.Sequencing with regards to bug #2454:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2454
>
> Just to make sure that I am not treading on other people's work.
>
>
> --Michiel
From biopython at maubp.freeserve.co.uk Tue Jul 8 14:36:43 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jul 2008 15:36:43 +0100
Subject: [Biopython-dev] Bio.Sequencing
In-Reply-To: <570915.67657.qm@web62415.mail.re1.yahoo.com>
References: <570915.67657.qm@web62415.mail.re1.yahoo.com>
Message-ID: <320fb6e00807080736v26388f2ake12303c5b752c5e9@mail.gmail.com>
On Tue, Jul 8, 2008 at 1:47 PM, Michiel de Hoon wrote:
> Hi everybody,
>
> Is somebody currently actively maintaining Bio.Sequencing? Frank perhaps?
> I'd like to make some changes to Bio.Sequencing with regards to bug #2454:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2454
>
> Just to make sure that I am not treading on other people's work.
My only comment is watch out for the fact that Bio.SeqIO is now
calling Bio.Sequencing for the "ace" and "phd" formats.
On a related note, I'd had some ideas for making the Ace parser more
user friendly by further extending the doc strings and defining
__str__ or __repr__ methods for some of the "line type classes" which
otherwise must be explored by using dir() to discover the properties.
I haven't actually done any work on this yet though.
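To make the idea concrete, here is a minimal sketch of what __repr__ and __str__ could look like on such a class. The class ReadLine and its attribute names are hypothetical stand-ins, not the actual classes in the Ace parser.

```python
class ReadLine:
    """Hypothetical record holding the fields of one line of a parsed file."""

    def __init__(self, name, padded_bases, info_items):
        self.name = name
        self.padded_bases = padded_bases
        self.info_items = info_items

    def __repr__(self):
        # Unambiguous form listing every property, so dir() is not needed.
        return "ReadLine(name=%r, padded_bases=%r, info_items=%r)" % (
            self.name, self.padded_bases, self.info_items)

    def __str__(self):
        # Short human-readable summary for print().
        return "%s: %d padded bases" % (self.name, self.padded_bases)


read = ReadLine("read1", 120, 0)
print(repr(read))
print(read)
```

With __repr__ defined, simply typing the object's name at the interactive prompt shows its properties.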
Peter
From sbassi at gmail.com Tue Jul 8 15:38:29 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Tue, 8 Jul 2008 12:38:29 -0300
Subject: [Biopython-dev] Alignment problem. bug?
In-Reply-To: <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>
References:
<320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com>
<320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>
Message-ID:
On Tue, Jul 8, 2008 at 8:02 AM, Peter wrote:
> I've reproduced this using Clustalw 2.0.8 (online at the EBI) and with
> Clustalw 2.0.9 (installed locally). It was a problem parsing Clustal
> files where the first line of the consensus was blank (and it would
> probably affect Clustal W 1.83 too).
Yes, I used ClustalW 1.83
> I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12
> Could you update this file and re-test please Sebastian? Also, may I
> add a test alignment file based on your example to CVS please?
Ok, I will test it today. You can use my file or any possible derivation of it.
Best,
SB.
--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Tutorial libre de Python: http://tinyurl.com/2az5d5
From biopython at maubp.freeserve.co.uk Tue Jul 8 15:56:20 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jul 2008 16:56:20 +0100
Subject: [Biopython-dev] Alignment problem. bug?
In-Reply-To:
References:
<320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com>
<320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>
Message-ID: <320fb6e00807080856s55d77962h9ceedd160ca8002b@mail.gmail.com>
>> I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12
>> Could you update this file and re-test please Sebastian? Also, may I
>> add a test alignment file based on your example to CVS please?
>
> Ok, I will test it today. You can use my file or any possible derivation of it.
Thanks - I have added a two sequence version of your example as
Tests/Clustalw/odd_consensus.aln
Peter
From sbassi at gmail.com Tue Jul 8 16:52:13 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Tue, 8 Jul 2008 13:52:13 -0300
Subject: [Biopython-dev] Alignment problem. bug?
In-Reply-To: <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>
References:
<320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com>
<320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>
Message-ID:
On Tue, Jul 8, 2008 at 8:02 AM, Peter wrote:
> I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12
Just to confirm that it works now. Thank you!
Best,
SB.
From biopython at maubp.freeserve.co.uk Wed Jul 9 11:11:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Jul 2008 12:11:16 +0100
Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in
Bio.SeqIO
In-Reply-To: <320fb6e00807020603s63f8339ag5b8140f1943ceb47@mail.gmail.com>
References: <320fb6e00807020603s63f8339ag5b8140f1943ceb47@mail.gmail.com>
Message-ID: <320fb6e00807090411o44530c46wc1ffdc8cdc5442fe@mail.gmail.com>
Now that Biopython 1.47 is out, are there any comments/objections to
my committing this to CVS?
Bug 2533 - Support for simple "tab" format in Bio.SeqIO
http://bugzilla.open-bio.org/show_bug.cgi?id=2533
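For readers unfamiliar with the proposal, the following is a rough sketch of reading and writing such a file. It assumes one record per line with the identifier and the sequence separated by a single tab; the thread itself does not spell out the columns, and the function names parse_tab and format_tab are invented here, not the Bio.SeqIO API.

```python
def parse_tab(lines):
    """Yield (identifier, sequence) pairs from tab separated lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerating blank lines is an assumption of this sketch
        try:
            identifier, sequence = line.split("\t")
        except ValueError:
            raise ValueError("Expected two tab separated columns: %r" % line)
        yield identifier.strip(), sequence.strip()


def format_tab(records):
    """Return the records as text, one 'identifier<TAB>sequence' per line."""
    return "".join("%s\t%s\n" % (name, seq) for name, seq in records)


records = [("gi|1", "ACGT"), ("gi|2", "TTGA")]
text = format_tab(records)
print(list(parse_tab(text.splitlines(True))))
```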
Thanks,
Peter
P.S. Any real world example files would be good for the test suite.
From lpritc at scri.ac.uk Wed Jul 9 12:14:04 2008
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Wed, 09 Jul 2008 13:14:04 +0100
Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in
Bio.SeqIO
In-Reply-To: <320fb6e00807090411o44530c46wc1ffdc8cdc5442fe@mail.gmail.com>
Message-ID:
Only that you might want to consider Axon Text File format as a
self-describing tab-separated format which would facilitate storage and
recovery of all attributes of a sequence. There's a spec here:
http://www.moleculardevices.com/pages/software/gn_genepix_file_formats.html
On 09/07/2008 12:11, "Peter" wrote:
> Now that Biopython 1.47 is out, are there any comments/objections to
> my committing this to CVS?
>
> Bug 2533 - Support for simple "tab" format in Bio.SeqIO
> http://bugzilla.open-bio.org/show_bug.cgi?id=2533
>
> Thanks,
>
> Peter
>
> P.S. Any real world example files would be good for the test suite.
--
Dr Leighton Pritchard B.Sc.(Hons) MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405
From biopython at maubp.freeserve.co.uk Wed Jul 9 12:30:26 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Jul 2008 13:30:26 +0100
Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in
Bio.SeqIO
In-Reply-To:
References: <320fb6e00807090411o44530c46wc1ffdc8cdc5442fe@mail.gmail.com>
Message-ID: <320fb6e00807090530j43a3e2c9y48bef4993587881f@mail.gmail.com>
On Wed, Jul 9, 2008 at 1:14 PM, Leighton Pritchard wrote:
> Only that you might want to consider Axon Text File format as a
> self-describing tab-separated format which would facilitate storage and
> recovery of all attributes of a sequence. There's a spec here:
>
> http://www.moleculardevices.com/pages/software/gn_genepix_file_formats.html
>
It's an interesting and flexible file format, but I don't see any
standard column name for "sequence" which would be of particular
interest from the point of view of the Bio.SeqIO module. If there is
a de-facto convention for storing sequence data in an Axon Text File,
then we could adopt this within Bio.SeqIO. Otherwise, I think any
Axon Text File parser added to Biopython would have to be of much more
general nature (and not part of Bio.SeqIO).
Peter
From biopython at maubp.freeserve.co.uk Wed Jul 9 13:03:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Jul 2008 14:03:16 +0100
Subject: [Biopython-dev] Simple __getitem__ for Alignments
Message-ID: <320fb6e00807090603o6b087ceeuce0b87c13627552a@mail.gmail.com>
Now that the latest release is out (Biopython 1.47), Bio.AlignIO
should start to get used more. I anticipate more people getting
frustrated with the current Alignment object, and would like to make
another baby-step in improving it.
I'd like to add a minimal __getitem__ method, as described in Bug 1944
comment 15,
http://bugzilla.open-bio.org/show_bug.cgi?id=1944#c15
> def __getitem__(self, index):
>     """Access part of the alignment.
>
>     You can access a row of the alignment as a SeqRecord using an integer
>     index (think of the alignment as a list of SeqRecord objects here):
>
>         first_record = my_alignment[0]
>         last_record = my_alignment[-1]
>
>     Right now, this is the ONLY indexing operation supported. The
>     use of two indices and slice notation to extract a sub-alignment,
>     row, column or letter is under discussion for a future update."""
>     if isinstance(index, int):
>         # e.g. result = align[x]
>         # Return a SeqRecord
>         return self._records[index]
>     else:
>         raise TypeError("Not currently supported, but may be in future.")
From the discussion on Bug 1944, this doesn't seem to be contentious -
the debate is about more advanced slicing operations.
I'd also like to add a __len__ method which would return the number of
SeqRecord objects (i.e. the number of rows). This would then let the
alignment be treated very much like a read-only list of SeqRecord
objects. Remember, we can already iterate over the rows in the
alignment as SeqRecord objects.
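As a rough illustration of the "read-only list of rows" behaviour, here is a self-contained sketch. The class RowAlignment is a hypothetical stand-in, and plain strings stand in for SeqRecord objects; it is not the Bio.Align.Generic.Alignment implementation.

```python
class RowAlignment:
    """Hypothetical stand-in for the alignment class; rows are plain objects."""

    def __init__(self, records):
        self._records = list(records)

    def __len__(self):
        # Number of rows (records) in the alignment.
        return len(self._records)

    def __iter__(self):
        # Iterate over the rows, as the existing class already allows.
        return iter(self._records)

    def __getitem__(self, index):
        # Only a single integer index is accepted; slices are rejected.
        if isinstance(index, int):
            return self._records[index]
        raise TypeError("Not currently supported, but may be in future.")


align = RowAlignment(["MHQAIF", "MH--IF", "MHQ-IF"])
print(len(align))
print(align[0], align[-1])
```

Because no __setitem__ is defined, assigning to align[0] raises a TypeError, which keeps the rows read-only through this interface.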
Any comments?
Peter
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 13:21:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 09:21:13 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807091321.m69DLD9g031282@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #20 from mdehoon at ims.u-tokyo.ac.jp 2008-07-09 09:21 EST -------
(In reply to comment #16)
I have uploaded an alternative parser in Bio.Sequencing.Phd to CVS. Feel free
to have a look and comment. If everybody is OK, I'll add a DeprecationWarning
to the previous parser.
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 13:37:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 09:37:44 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807091337.m69DbiM5031944@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #21 from fkauff at biologie.uni-kl.de 2008-07-09 09:37 EST -------
Michiel,
while you're at it - could you update my email in the source as well? And
Cymon's email is now cy at cymon.org. Thanks!
Frank
(In reply to comment #20)
> (In reply to comment #16)
> I have uploaded an alternative parser in Bio.Sequencing.Phd to CVS. Feel free
> to have a look and comment. If everybody is OK, I'll add a DeprecationWarning
> to the previous parser.
>
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 13:38:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 09:38:18 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807091338.m69DcIDu031986@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #22 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-09 09:38 EST -------
In reply to comment 20 about the updates to Bio.Sequencing.Phd, I see you've
also updated Bio.SeqIO.PhdIO in CVS (good).
I would suggest you add yourself to the copyright statement for this module,
and add some doc string entries to the new read and parse functions. I haven't
looked over the details of the new code (other than confirming test_Phd.py and
test_SeqIO.py seem happy).
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 14:28:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 10:28:36 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807091428.m69ESaGm001621@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #23 from mdehoon at ims.u-tokyo.ac.jp 2008-07-09 10:28 EST -------
(In reply to comment #21)
> Michiel,
>
> while you're at it - could you update my email in the source as well? And
> Cymon's email is now
I have updated your address, but I'd prefer to hold off on Cymon's without his
direct permission -- spammers are watching too, you know.
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 18:33:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 14:33:42 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
for '-F F' and make it safe
In-Reply-To:
Message-ID: <200807091833.m69IXgcV013783@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2508
mmokrejs at ribosome.natur.cuni.cz changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |REOPENED
Resolution|FIXED |
------- Comment #10 from mmokrejs at ribosome.natur.cuni.cz 2008-07-09 14:33 EST -------
OK, so my old code not yet converted to biopython-1.47 gives me:
_textframe = blast.blast_and_htmlize(_query_sequence, _usermode,
upload_temp_path, blast_path, uri, _align_view, _matrix)
File "/home/mmokrejs/public_html/IRES2/blast.py", line 548, in
blast_and_htmlize
_blast_out, _error_info, _blast_file = blastall(blast_path + targetdb,
query_sequence, upload_temp_path, mode='sequence', align_view=align_view,
matrix=matrix)
File "/home/mmokrejs/public_html/IRES2/blast.py", line 506, in blastall
_blast_out, _error_info = NCBIStandalone.blastall('/usr/bin/blastall',
'blastn', blast_db, _blast_file, matrix=matrix + ' -F 0', wordsize=_wordsize,
gap_open=_gap_open, gap_extend=_gap_extend, strands=_strands,
alignments=_alignments, descriptions=_descriptions, expectation=_expectation,
align_view=align_view)
File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line
1620, in blastall
_security_check_parameters(keywds)
File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line
1986, in _security_check_parameters
if ";" in value or "&&" in value :
TypeError: argument of type 'int' is not iterable
It turns out I am passing in:
{'matrix': 'NUC.4.4 -F 0', 'strands': 3, 'expectation': 100, 'wordsize': 4,
'gap_extend': 1, 'gap_open': 1, 'alignments': 99999, 'descriptions': 9999}
I don't think it makes sense to require users to pass strings instead of
numbers to the function.
While looking into the _security_check_parameters() I think you should also
check for "||" - the logical OR as interpreted by shell and redirections ">"
and "<".
FIX:
-if ";" in value or "&&" in value:
+if ";" in str(value) or "&&" in str(value) or "||" in str(value) or ">" in
str(value) or "<" in str(value):
My apologies that I did not test earlier.
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 18:38:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 14:38:08 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
for '-F F' and make it safe
In-Reply-To:
Message-ID: <200807091838.m69Ic82k014070@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2508
------- Comment #11 from mmokrejs at ribosome.natur.cuni.cz 2008-07-09 14:38 EST -------
I don't know whether you want to leave open the back door of passing in another
argument along with its value. If not, reject spaces as well. Values never
contain spaces unless wrapped in single or double quotes. That said, I find it
perfectly legal to tell blastall:
-d "/some/db /another/db /yet/another" to search over three databases at once.
It does not seem to honour -d specified three times on its command line.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 20:12:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 16:12:40 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
for '-F F' and make it safe
In-Reply-To:
Message-ID: <200807092012.m69KCeO2018087@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2508
------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-09 16:12 EST -------
The issue with non-string arguments (e.g. floats or integers) was reported by
Sebastian Bassi (Bug 2538) and has since been fixed in CVS - sadly this was
after the release of Biopython 1.47.
As you've demonstrated there are valid reasons to want to include spaces. I
would rather not add a check which requires lots of special casing.
I'm leaving this bug open to consider extending _security_check_parameters() to
prevent the use of pipes and redirection (i.e. "|", "<" and ">") which sounds
reasonable. A third opinion wouldn't hurt of course!
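A minimal sketch of the kind of check under discussion follows. It is not the code in Bio/Blast/NCBIStandalone.py; the function name check_parameters and the exact list of rejected tokens are assumptions made for illustration. It casts every value to a string before testing, accepts spaces, and rejects command separators, pipes and redirections.

```python
def check_parameters(**keywords):
    """Raise ValueError if any value contains a shell control sequence."""
    # Hypothetical blacklist: command separators plus the pipe and
    # redirection characters suggested in this thread. "|" also covers "||".
    forbidden = (";", "&&", "|", "<", ">")
    for key, value in keywords.items():
        text = str(value)  # ints and floats are cast before the substring test
        for token in forbidden:
            if token in text:
                raise ValueError(
                    "Rejected argument %s=%r: contains %r" % (key, value, token)
                )


# Numbers and values containing spaces are accepted.
check_parameters(matrix="NUC.4.4 -F 0", strands=3, expectation=100.0)

# An attempt to chain a second command is rejected.
try:
    check_parameters(database="nr; echo oops")
except ValueError as err:
    print(err)
```

A blacklist like this is only a partial defence; passing the arguments as a list to the process launcher, without going through a shell, avoids the quoting problem altogether.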
From bugzilla-daemon at portal.open-bio.org Thu Jul 10 10:30:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Jul 2008 06:30:28 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807101030.m6AAUSew025300@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #24 from fkauff at biologie.uni-kl.de 2008-07-10 06:30 EST -------
> (In reply to comment #21)
> > Michiel,
> >
> > while you're at it - could you update my email in the source as well? And
> > Cymon's email is now
>
> I have updated your address, but I'd prefer to hold off on Cymon's without his
> direct permission -- spammers are watching too, you know.
>
Contacted Cymon, reply below:
Hi Frank,
...
>
> Do you want your email address updated in the ace/phd parser code? Or
> removed (just the email, not the name, of course)? Don't know if you follow
> biopython-dev lately.
I don't actually follow the -dev list but perhaps I should, as I think
I'm going to be using and doing far more diverse bioinformatics stuff
(now that I'm employed as a bioinformatician :)
Anyway, the email can be changed to cymon.cox at gmail.com - best to go
through google I think as their spam filters tend to be pretty good.
Cheers, C.
(In reply to comment #23)
From bugzilla-daemon at portal.open-bio.org Thu Jul 10 16:24:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Jul 2008 12:24:27 -0400
Subject: [Biopython-dev] [Bug 2533] Support for simple "tab" format in
Bio.SeqIO
In-Reply-To:
Message-ID: <200807101624.m6AGORlL012526@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2533
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-10 12:24 EST -------
Checked in, marking as fixed.
Bio/SeqIO/TabIO.py initial revision: 1.1
Bio/SeqIO/__init__.py new revision: 1.33
Tests/output/test_SeqIO new revision: 1.25
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 03:20:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Jul 2008 23:20:11 -0400
Subject: [Biopython-dev] [Bug 2542] New: AlignInfo.py fails a test
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2542
Summary: AlignInfo.py fails a test
Product: Biopython
Version: 1.46
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: sbassi at gmail.com
When I run:
$ python2.5 /mnt/hda2/bio/biopython-1.47/Bio/Align/AlignInfo.py
The first 2 tests pass, but then I get:
Traceback (most recent call last):
File "/mnt/hda2/bio/biopython-1.47/Bio/Align/AlignInfo.py", line 723, in
print summary.information_content()
File "/mnt/hda2/bio/biopython-1.47/Bio/Align/AlignInfo.py", line 508, in
information_content
raise ValueError, errstr
ValueError: Error in alphabet: not Nucleotide or Protein, supply expected
frequencies
I've also tried without the AlignIO:
from Bio import Alphabet
from Bio.Align.Generic import Alignment
from Bio.Seq import Seq
from Bio.Align.AlignInfo import SummaryInfo
seq1 = 'MHQAIFIYQIGYPLKSGYIQSIRSPEYDNW'
seq2 = 'MH--IFIYQIGYALKSGYIQSIRSPEY-NW'
a = Alignment(Alphabet.ProteinAlphabet)
a.add_sequence("asp",seq1)
a.add_sequence("unk",seq2)
summary = SummaryInfo(a)
summary.information_content()
Traceback (most recent call last):
File "/mnt/hda2/py252/bin/align.py", line 16, in
summary.information_content()
File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Align/AlignInfo.py",
line 508, in information_content
raise ValueError, errstr
ValueError: Error in alphabet: not Nucleotide or Protein, supply expected
frequencies
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 08:49:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 04:49:08 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807110849.m6B8n8Xg022720@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2542
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 04:49 EST -------
Going over your example code:
>>> from Bio import Alphabet
>>> from Bio.Align.Generic import Alignment
>>> from Bio.Align.AlignInfo import SummaryInfo
>>> seq1 = 'MHQAIFIYQIGYPLKSGYIQSIRSPEYDNW'
>>> seq2 = 'MH--IFIYQIGYALKSGYIQSIRSPEY-NW'
>>> a = Alignment(Alphabet.ProteinAlphabet)
First problem, you gave the Alignment object an Alphabet class, rather than an
instance of the class. I guess we should add an explicit check to the Alignment
object...
You should have used:
>>> a = Alignment(Alphabet.ProteinAlphabet())
Or, if you prefer perhaps:
>>> a = Alignment(Alphabet.generic_protein)
Then when we get to the information_content, there is another issue:
>>> a.add_sequence("asp",seq1)
>>> a.add_sequence("unk",seq2)
>>> summary = SummaryInfo(a)
>>> summary.information_content()
Traceback (most recent call last):
...
AttributeError: ProteinAlphabet instance has no attribute 'gap_char'
The trouble here is that the SummaryInfo class is looking for a declared gap
character in the protein alphabet - and none has been declared. Your example
sequences appear to use "-" as a gap, but you haven't declared this.
Try this:
from Bio import Alphabet
from Bio.Align.Generic import Alignment
from Bio.Seq import Seq
from Bio.Align.AlignInfo import SummaryInfo
seq1 = 'MHQAIFIYQIGYPLKSGYIQSIRSPEYDNW'
seq2 = 'MH--IFIYQIGYALKSGYIQSIRSPEY-NW'
a = Alignment(Alphabet.Gapped(Alphabet.generic_protein, "-"))
a.add_sequence("asp",seq1)
a.add_sequence("unk",seq2)
summary = SummaryInfo(a)
print summary.information_content()
You mentioned having a similar issue with Bio.AlignIO - could you attach the
example file to this bug with some trivial code showing your problem?
Thanks, Peter.
P.S. Please update to Biopython 1.47 rather than using 1.46
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 09:50:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 05:50:49 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807110950.m6B9on7t025902@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2542
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 05:50 EST -------
I think I've fixed the "Quick test" failure when running Bio/Align/AlignInfo.py
directly. I don't know how I missed that before...
/home/repository/biopython/biopython/Bio/Align/AlignInfo.py,v <--
AlignInfo.py
new revision: 1.15; previous revision: 1.14
done
My opinion from looking at the AlignInfo code, and scanning back over the
CVS history, is that it was never used much with generic alphabets (as tend to
be returned by Bio.AlignIO). There may be other issues here - for example I've
spotted another problem case, doubly extended alphabets like a protein alphabet
with declared Gapped and WithStopCodon (which you *might* want in an
alignment).
From biopython at maubp.freeserve.co.uk Fri Jul 11 10:33:22 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 11 Jul 2008 11:33:22 +0100
Subject: [Biopython-dev] Checking alphabet argument in alignments
Message-ID: <320fb6e00807110333r1938510bne7e24d1ce7b5c0b@mail.gmail.com>
I'd like to add the following check to the __init__ method of the
Bio.Align.Generic.Alignment object (our base alignment class),
> if not (isinstance(alphabet, Alphabet.Alphabet) \
> or isinstance(alphabet, Alphabet.AlphabetEncoder)):
> raise ValueError("Invalid alphabet argument")
This will prevent subtle user errors like this:
from Bio import Alphabet
from Bio.Align.Generic import Alignment
a = Alignment(Alphabet.ProteinAlphabet)
which should be:
from Bio import Alphabet
from Bio.Align.Generic import Alignment
a = Alignment(Alphabet.ProteinAlphabet())
The only downside I have thought of is if anyone has created their own
alphabet class which does NOT subclass the original
Bio.Alphabet.Alphabet class.
This same test could (should?) also be added to the Seq and MutableSeq objects.
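To show the effect of such a check, here is a minimal sketch using stand-in classes. The Alphabet, AlphabetEncoder, ProteinAlphabet and Alignment classes below are placeholders written for this example only, not the real Biopython code, and the tuple form of isinstance() is just a more compact way of writing the same test:

```python
# Stand-in classes for illustration only; the real ones live in Bio.Alphabet
# and Bio.Align.Generic.
class Alphabet(object):
    """Placeholder for Bio.Alphabet.Alphabet."""


class AlphabetEncoder(object):
    """Placeholder for Bio.Alphabet.AlphabetEncoder (e.g. Gapped)."""

    def __init__(self, alphabet):
        self.alphabet = alphabet


class ProteinAlphabet(Alphabet):
    """Placeholder for Bio.Alphabet.ProteinAlphabet."""


class Alignment(object):
    """Placeholder alignment that validates its alphabet argument."""

    def __init__(self, alphabet):
        # isinstance() accepts a tuple of classes, so one call covers both.
        if not isinstance(alphabet, (Alphabet, AlphabetEncoder)):
            raise ValueError("Invalid alphabet argument")
        self._alphabet = alphabet


Alignment(ProteinAlphabet())       # an instance: accepted
try:
    Alignment(ProteinAlphabet)     # the class itself: rejected
    rejected = False
except ValueError:
    rejected = True
```

With this in place the mistake is reported when the alignment is created, rather than much later as an AttributeError.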
What do people think?
Peter
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 10:39:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 06:39:48 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807111039.m6BAdm05028072@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2542
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 06:39 EST -------
In comment 2 I wrote:
> I've spotted another problem case, doubly extended alphabets like a
> protein alphabet declared Gapped and WithStopCodon (which you *might*
> want in an alignment).
This alphabet issue is fixed in CVS, as is another corner case of a divide by
zero error where an entire column consists of ignored characters.
Please re-test with Bio/Align/AlignInfo.py revision 1.16 from CVS.
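For anyone wondering what the divide by zero corner case looks like, here is a rough sketch of a per-column calculation. This is NOT the AlignInfo.py code: the function name, the relative entropy formula and the choice to return zero for a fully ignored column are all assumptions made for this example.

```python
import math


def column_information(column, expected, chars_to_ignore=("-",), log_base=2):
    """Information content of one alignment column (illustrative formula).

    Sums observed * log(observed / expected) over the letters present,
    after dropping any characters listed in chars_to_ignore.
    """
    counted = [c for c in column if c not in chars_to_ignore]
    if not counted:
        # The whole column was ignored, so there is nothing to divide by.
        # Returning zero here is an assumption, not the library's behaviour.
        return 0.0
    total = float(len(counted))
    info = 0.0
    for letter in set(counted):
        observed = counted.count(letter) / total
        info += observed * math.log(observed / expected[letter], log_base)
    return info


# Uniform expected frequencies for DNA, as a simple example.
dna = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
conserved = column_information("AAAA", dna)
all_gaps = column_information("----", dna)
```

Without the early return, a column such as "----" would leave total at zero and the frequency calculation would have nothing to divide by.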
Thanks
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 16:18:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 12:18:28 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807111618.m6BGISQ3013553@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #25 from mdehoon at ims.u-tokyo.ac.jp 2008-07-11 12:18 EST -------
(In reply to comment #24)
OK, I updated Phd.py.
The last module to consider is Ace.py; I'll upload a fixed version soon.
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 21:00:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 17:00:10 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807112100.m6BL0Aer026629@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2542
------- Comment #4 from sbassi at gmail.com 2008-07-11 17:00 EST -------
(In reply to comment #1)
> First problem, you gave the Alignment object an Alphabet class, rather than an
> instance of the class. I guess we should add an explicit check to the Alignment
> object...
Yes, that is my fault.
> You mentioned having a similar issue with Bio.AlignIO - could you attach the
> example file to this bug with some trivial code showing your problem?
Yes, this code with Bio.AlignIO also fails (I tried right now with AlignInfo.py
rev. 1.17):
from Bio.Align import AlignInfo
from Bio.Align.AlignInfo import SummaryInfo
from Bio import AlignIO
fn = open("secu3.aln")
alignment = AlignIO.read(fn, "clustal")
summary = SummaryInfo(alignment)
print summary.information_content()
And I got (and this time I am not supplying any alphabet, at least not
explicitly):
Traceback (most recent call last):
File "/mnt/hda2/py252/bin/2542.py", line 12, in <module>
print summary.information_content()
File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Align/AlignInfo.py",
line 499, in information_content
raise ValueError, errstr
ValueError: Error in alphabet: not Nucleotide or Protein, supply expected
frequencies
> P.S. Please update to Biopython 1.47 rather than using 1.46
I was using Biopython 1.47, but I reported it as 1.46 just because 1.47 is not
available from the drop-down menu in the Bugzilla form.
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 21:02:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 17:02:24 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807112102.m6BL2OvF026827@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2542
------- Comment #5 from sbassi at gmail.com 2008-07-11 17:02 EST -------
Created an attachment (id=971)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=971&action=view)
This file is used by my example where information_content() fails when sequences
are retrieved with AlignIO
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 21:16:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 17:16:03 -0400
Subject: [Biopython-dev] [Bug 2443] Specifying the alphabet in Bio.SeqIO and
Bio.AlignIO
In-Reply-To:
Message-ID: <200807112116.m6BLG3SJ027522@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2443
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Specifying the alphabet in |Specifying the alphabet in
|Bio.SeqIO.parse() |Bio.SeqIO and Bio.AlignIO
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 17:16 EST -------
I'm broadening the scope of this enhancement bug to cover Bio.SeqIO and
Bio.AlignIO (both their read() and parse() functions).
See also alphabet issues raised on Bug 2542.
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 21:19:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 17:19:50 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807112119.m6BLJoTt027660@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2542
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 17:19 EST -------
> Yes, this code with Bio.AlignIO also fails (I tried right now with
> AlignInfo.py rev. 1.17):
>
> from Bio.Align import AlignInfo
> from Bio.Align.AlignInfo import SummaryInfo
> from Bio import AlignIO
> fn = open("secu3.aln")
> alignment = AlignIO.read(fn, "clustal")
> summary = SummaryInfo(alignment)
> print summary.information_content()
>
> And I got (and this time I am not supplying any alphabet, at least not
> explicitly):
>
> Traceback (most recent call last):
> ...
> ValueError: Error in alphabet: not Nucleotide or Protein, supply expected
> frequencies
Good. That seems to be working as intended - alignment formats like FASTA or
Clustal do not specify the sequence type (unlike for example the Nexus format).
Perhaps Bio.AlignIO.read() and parse() should be able to accept an optional
alphabet argument? I had already been considering this for Bio.SeqIO so this
is a natural extension. See Bug 2443.
Unless information_content() can determine the sequence type (protein or
nucleotide) from the alignment alphabet, you have to help it by supplying an
appropriate e_freq_table argument.
Perhaps:
from Bio.Alphabet import IUPAC
from Bio.SubsMat import FreqTable
from Bio.Align.AlignInfo import SummaryInfo
from Bio import AlignIO
fn = open("secu3.aln")
alignment = AlignIO.read(fn, "clustal")
summary = SummaryInfo(alignment)
#Have a generic alphabet, without a declared gap char, so must
#provide the frequencies and chars to ignore explicitly:
expected = FreqTable.FreqTable({"A":0.25,"G":0.25,"T":0.25,"C":0.25},
FreqTable.FREQ, IUPAC.unambiguous_dna)
print summary.information_content(e_freq_table=expected,
chars_to_ignore=['-'])
This is probably safest. I'm doubtful that information_content() will choose
wisely if given mixed case or lower case sequences... if that is the case it
should be filed as a new bug.
>
> > P.S. Please update to Biopython 1.47 rather than using 1.46
>
> I was using Biopython 1.47, but I reported it as 1.46 just because 1.47
> is not available from the drop-down menu in the Bugzilla form.
Thanks for the reminder - I've added that to Bugzilla now :)
I'm marking this bug as fixed now (after the updates to AlignInfo.py)
From peter at maubp.freeserve.co.uk Sat Jul 12 13:45:46 2008
From: peter at maubp.freeserve.co.uk (Peter)
Date: Sat, 12 Jul 2008 14:45:46 +0100
Subject: [Biopython-dev] Deprecating the HTML parser in Bio.Blast.NCBIWWW
Message-ID: <320fb6e00807120645u26321d71q30f72ed5808f700@mail.gmail.com>
For some time now we've been discouraging the use of the HTML and
plain text Blast parsers in favour of the XML format.
I think it would be a good idea to now officially deprecate the HTML
parser in Bio.Blast.NCBIWWW with warning messages when it is used. I
don't even know if it still works with the recent big revision to the
BLAST webpages, but I suspect not.
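As a sketch of what "deprecate with warning messages" could look like, using only the standard library warnings module. The function name parse_html_blast and the wording of the message are made up for this example, and the body is a placeholder rather than a real parser:

```python
import io
import warnings


def parse_html_blast(handle):
    """Hypothetical stand-in for an HTML BLAST parser entry point."""
    warnings.warn(
        "The HTML BLAST parser is deprecated; please use the XML output instead.",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this function
    )
    return handle.read()  # placeholder for the real parsing


# DeprecationWarning is often hidden by default, so record it explicitly here.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    parse_html_blast(io.StringIO("<html></html>"))
```

The parser keeps working, but anyone running with warnings enabled gets told to move to the XML format.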
However, the plain text parser in Bio.Blast.NCBIStandalone still has
its uses. In particular, right now the PSI-BLAST output in XML format
lacks some of the information found in the plain text output (new vs
reused entries) so it would be premature to deprecate our plain text
PSI parser. See Bug 2502 for details:
http://bugzilla.open-bio.org/show_bug.cgi?id=2502#c18
Peter
From bugzilla-daemon at portal.open-bio.org Sun Jul 13 16:23:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 13 Jul 2008 12:23:57 -0400
Subject: [Biopython-dev] [Bug 2543] New: Bio.Nexus.Trees can't handle named
ancestors
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2543
Summary: Bio.Nexus.Trees can't handle named ancestors
Product: Biopython
Version: 1.46
Platform: PC
OS/Version: FreeBSD
Status: NEW
Severity: normal
Priority: P2
Component: Other
AssignedTo: biopython-dev at biopython.org
ReportedBy: markd at soe.ucsc.edu
The following code produces:
ValueError: invalid literal for float(): Ancestor1
from Bio.Nexus import Trees
# from http://evolution.genetics.washington.edu/phylip/newicktree.html
treeStr = "(B:6.0,(A:5.0,C:3.0,E:4.0)Ancestor1:5.0,D:11.0);"
tree = Trees.Tree(treeStr)
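As a rough illustration of the behaviour I would expect (not how Bio.Nexus.Trees is written), the text following a closing bracket, such as "Ancestor1:5.0", would be separated into an optional name and an optional branch length before anything is passed to float(). The helper name split_node_info is hypothetical, and the sketch ignores quoted labels and support values:

```python
def split_node_info(text):
    """Split node text like 'Ancestor1:5.0' into (name, branch_length).

    Either part may be missing, in which case None is returned for it.
    """
    name, colon, length = text.partition(":")
    name = name.strip() or None
    if not colon:
        # No colon at all, so there is no branch length to convert.
        return name, None
    return name, float(length)


named_ancestor = split_node_info("Ancestor1:5.0")
unnamed_ancestor = split_node_info(":5.0")
leaf_without_length = split_node_info("A")
```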
From bugzilla-daemon at portal.open-bio.org Mon Jul 14 10:17:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 14 Jul 2008 06:17:14 -0400
Subject: [Biopython-dev] [Bug 2543] Bio.Nexus.Trees can't handle named
ancestors
In-Reply-To:
Message-ID: <200807141017.m6EAHEhg019686@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2543
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-14 06:17 EST -------
This sounds like a job for Frank (the Bio.Nexus module author).
Can I ask if you've actually come across trees with named ancestor nodes in
"real life"? That would make this bug more important. If so, the name of the
tool would be interesting, and an example tree file would be great to add to
Biopython as a test case.
If on the other hand the only named ancestor tree you've ever tried is the
example from the Newick documentation, this doesn't seem such a high priority
(but still worth fixing).
Peter
From bugzilla-daemon at portal.open-bio.org Tue Jul 15 20:07:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 Jul 2008 16:07:56 -0400
Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string,
even subclass string?
In-Reply-To:
Message-ID: <200807152007.m6FK7umn009526@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2351
------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-15 16:07 EST -------
This is a suggested implementation of the split method for our Seq object,
modelled after the python string method which it calls internally. Note that I
have made the separator non-optional on the grounds that the string method's
default of white space isn't (usually) sensible for sequences. I'm happy to
change this if people think it's better to be as close as possible to the string
method.
def split(self, sep, maxsplit=None) :
    """Split method, like that of a python string.
    Return a list of the 'words' in the string (as Seq objects),
    using sep as the delimiter string. If maxsplit is given, at
    most maxsplit splits are done.
    Unlike the python string method, sep must be specified (as
    there shouldn't be any whitespace strings in a sequence).
    e.g. print my_seq.split("-")
    """
    #Test against None, as maxsplit=0 is a valid request for no splits
    if maxsplit is not None :
        parts = self.data.split(sep, maxsplit)
    else :
        parts = self.data.split(sep)
    return [Seq(chunk, self.alphabet) for chunk in parts]
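To try the idea out without touching Bio.Seq, here is a self-contained sketch using a tiny stand-in Seq class (just a string plus an alphabet, which is an assumption for this example and not the real class):

```python
class Seq(object):
    """Tiny stand-in for Bio.Seq.Seq, holding a string and an alphabet."""

    def __init__(self, data, alphabet=None):
        self.data = data
        self.alphabet = alphabet

    def split(self, sep, maxsplit=None):
        """Split on sep, returning Seq objects which share the alphabet."""
        if maxsplit is None:
            parts = self.data.split(sep)
        else:
            parts = self.data.split(sep, maxsplit)
        return [Seq(chunk, self.alphabet) for chunk in parts]


# A made-up protein with two stop codon symbols; "protein" is a dummy alphabet.
my_protein = Seq("MKT*LLA*G", "protein")
pieces = [s.data for s in my_protein.split("*")]
first_only = [s.data for s in my_protein.split("*", 1)]
```

Each piece is again a Seq carrying the parent's alphabet, so the results can be used wherever the original sequence could.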
From bugzilla-daemon at portal.open-bio.org Wed Jul 16 09:39:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 16 Jul 2008 05:39:01 -0400
Subject: [Biopython-dev] [Bug 2544] New: Bio.SeqIO improvements
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2544
Summary: Bio.SeqIO improvements
Product: Biopython
Version: 1.47
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: mmokrejs at ribosome.natur.cuni.cz
$ python
Python 2.5.2 (r252:60911, Jul 2 2008, 22:55:24)
[GCC 4.3.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import SeqIO
>>> handle = open("genbank-synthetic.gb")
>>> print seq_record
ID: EF452680.2
Name: EF452680
Description: Synthetic construct nitric oxide synthase (NOS) gene, partial cds.
/comment=On Feb 4, 2008 this sequence version replaced gi:145391444.
/sequence_version=2
/source=synthetic construct
/taxonomy=['other sequences', 'artificial sequences']
/keywords=['']
/references=[,
, , ]
/accessions=['EF452680']
/data_file_division=SYN
/date=11-JUN-2008
/organism=synthetic construct
/gi=166831528
Seq('TAGGCCTCTGCTTGCCGTTTGTTTCGTCAGCGATTTTTATAGTCTCAGCCTCCT...GCC',
IUPACAmbiguousDNA())
>>>
I do not see how I could access the value 'DNA' from the LOCUS line:
LOCUS EF452680 260 bp DNA linear SYN 11-JUN-2008
No, I do not want to read seq_record.features[0].qualifiers['mol_type'][0].
Could seq_record.features have a repr() function to give me something useful
instead of this?
>>> print seq_record.features
[, , ]
>>>
I don't see documented anywhere in the Biopython docs how to access the features;
pasting something like the following into the docs would give a user a clue where
to look for values:
>>> print seq_record.features[0].qualifiers
{'db_xref': ['taxon:32630'], 'mol_type': ['other DNA'], 'organism': ['synthetic
construct'], 'chromosome': ['Ib'], 'PCR_primers': ['fwd_seq:
aggcctctgcttgccgtttgtttcg, rev_seq: cgccggcggcacacgctcaactaattac']}
>>> print seq_record.features[1].qualifiers
{'gene': ['NOS']}
>>> print seq_record.features[2].qualifiers
{'product': ['nitric oxide synthase'], 'codon_start': ['2'], 'EC_number':
['1.14.13.39'], 'transl_table': ['11'], 'note': ['derived from Toxoplasma
gondii'], 'db_xref': ['GI:166831529'], 'translation':
['RPLLAVCFVSDFYSLSLLHFASVPFHESDGCVGRSHWLPGKHANYVKPAGARKRPEVGCRSSCLLRSVCCDILSPVRTRGN'],
'gene': ['NOS'], 'protein_id': ['ABP65329.2']}
>>> print seq_record.features[3].qualifiers
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>>
I wonder if I could access the above dicts as seq_record.features['source']
or seq_record.features['CDS']. Where have the 'source', 'gene', 'CDS' gone?
From bugzilla-daemon at portal.open-bio.org Wed Jul 16 10:30:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 16 Jul 2008 06:30:21 -0400
Subject: [Biopython-dev] [Bug 2544] Bio.GenBank and SeqFeature improvements
In-Reply-To:
Message-ID: <200807161030.m6GAUL9x017920@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2544
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Bio.SeqIO improvements |Bio.GenBank and SeqFeature
| |improvements
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-16 06:30 EST -------
(In reply to comment #0)
> $ python
>
> Python 2.5.2 (r252:60911, Jul 2 2008, 22:55:24)
> [GCC 4.3.1] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from Bio import SeqIO
> >>> handle = open("genbank-synthetic.gb")
I'm guessing the missing line here was something like:
seq_record = SeqIO.read(handle, "genbank")
> >>> print seq_record
> ID: EF452680.2
> Name: EF452680
> Description: Synthetic construct nitric oxide synthase (NOS) gene, partial cds.
> /comment=On Feb 4, 2008 this sequence version replaced gi:145391444.
> /sequence_version=2
> /source=synthetic construct
> /taxonomy=['other sequences', 'artificial sequences']
> /keywords=['']
> /references=[,
> , instance at 0x834ceac>, ]
> /accessions=['EF452680']
> /data_file_division=SYN
> /date=11-JUN-2008
> /organism=synthetic construct
> /gi=166831528
> Seq('TAGGCCTCTGCTTGCCGTTTGTTTCGTCAGCGATTTTTATAGTCTCAGCCTCCT...GCC',
> IUPACAmbiguousDNA())
> >>>
>
>
> I do not see how I could access the value 'DNA' from the LOCUS line:
> LOCUS EF452680 260 bp DNA linear SYN 11-JUN-2008
Currently the sequence type (DNA, RNA, Protein) is used internally by the
GenBank parser to determine the alphabet. It is not currently recorded in the
SeqRecord object's annotation but could be. How about something like this?:
seq_record.annotations["seq_type"]
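In the meantime, a very rough sketch of pulling that token straight out of a LOCUS line. This is not what the GenBank parser does internally; the function name is made up, and real LOCUS lines are column based and may omit the molecule type, in which case this simple token approach would return the wrong field:

```python
def locus_molecule_type(locus_line):
    """Return the token following 'bp' on a LOCUS line, or None.

    Assumes whitespace-separated fields with the molecule type present.
    """
    tokens = locus_line.split()
    if "bp" not in tokens:
        return None
    position = tokens.index("bp") + 1
    if position >= len(tokens):
        return None
    return tokens[position]


line = "LOCUS EF452680 260 bp DNA linear SYN 11-JUN-2008"
molecule_type = locus_molecule_type(line)
```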
> No, I do not want to read seq_record.features[0].qualifiers['mol_type'][0].
Assuming that the first feature is the source (typically the case), and
assuming it has a specified molecule type, then your suggestion is one
workaround. But I agree, it's not nice.
> Could seq_record.features have a repr() function to give me something useful
> instead of this?
>
> >>> print seq_record.features
> [, instance at 0x837b9cc>, ]
Yes we could add that, but you wouldn't want to do that on a typical genome
with thousands of features. Adding a repr method for the Reference object is
also something I had wondered about doing.
> I don't see documented anywhere in the Biopython docs how to access the features;
> pasting something like the following into the docs would give a user a clue where
> to look for values:
>
> >>> print seq_record.features[0].qualifiers
> {'db_xref': ['taxon:32630'], 'mol_type': ['other DNA'], 'organism': ['synthetic
> construct'], 'chromosome': ['Ib'], 'PCR_primers': ['fwd_seq:
> aggcctctgcttgccgtttgtttcg, rev_seq: cgccggcggcacacgctcaactaattac']}
> >>> print seq_record.features[1].qualifiers
> {'gene': ['NOS']}
> >>> print seq_record.features[2].qualifiers
> {'product': ['nitric oxide synthase'], 'codon_start': ['2'], 'EC_number':
> ['1.14.13.39'], 'transl_table': ['11'], 'note': ['derived from Toxoplasma
> gondii'], 'db_xref': ['GI:166831529'], 'translation':
> ['RPLLAVCFVSDFYSLSLLHFASVPFHESDGCVGRSHWLPGKHANYVKPAGARKRPEVGCRSSCLLRSVCCDILSPVRTRGN'],
> 'gene': ['NOS'], 'protein_id': ['ABP65329.2']}
There is a minimal bit of text in what is currently Chapter 10 of the tutorial
on the SeqFeature object. I agree, this is an area that needs improvement.
Perhaps a full example of parsing a simple GenBank file in the SeqIO chapter
would help?
> >>> print seq_record.features[3].qualifiers
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> IndexError: list index out of range
You must have only three features (indexed 0, 1 and 2) which explains the index
error.
> I wonder if I could access the above dicts as seq_record.features['source']
> or seq_record.features['CDS']. Where have the 'source', 'gene', 'CDS' gone?
As the .type attribute, try this:
for feature in seq_record.features:
    print feature.type
You can't access the features by type (e.g. seq_record.features['CDS']) because
there is generally more than one feature of each type.
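If you do want look-up by type, one option is to build your own dictionary of lists. The sketch below uses a stand-in Feature class (only a type and a qualifiers dict, an assumption for this example) rather than the real SeqFeature, and also shows the kind of short repr discussed above:

```python
from collections import defaultdict


class Feature(object):
    """Stand-in for a SeqFeature: just a type and a qualifiers dict."""

    def __init__(self, type, qualifiers):
        self.type = type
        self.qualifiers = qualifiers

    def __repr__(self):
        # Kept short on purpose, a genome may have thousands of features.
        return "Feature(type=%r)" % self.type


# Made-up features loosely based on the record in this bug report.
features = [
    Feature("source", {"mol_type": ["other DNA"]}),
    Feature("gene", {"gene": ["NOS"]}),
    Feature("CDS", {"gene": ["NOS"], "codon_start": ["2"]}),
]

# Map each feature type to a list, as a type can occur more than once.
by_type = defaultdict(list)
for feature in features:
    by_type[feature.type].append(feature)
```

Then by_type["CDS"] gives a list of all the CDS features, which copes with records that have more than one.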
Peter
P.S. Most of your comments are not on Bio.SeqIO itself, but actually about the
underlying Bio.GenBank parser or the SeqFeature object. I have therefore
changed the title.
From biopython at maubp.freeserve.co.uk Wed Jul 16 10:49:19 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 16 Jul 2008 11:49:19 +0100
Subject: [Biopython-dev] Dump file in Bio/Saf/__init__.py
Message-ID: <320fb6e00807160349r105bda08x3cf5e31915896a9b@mail.gmail.com>
Michiel,
I just noticed your CVS revision to Bio/Saf/__init__.py removing this
snippet of code:
dumpfile = open( 'dump', 'w' )
dumpfile.write( data )
dumpfile.close()
I recall seeing (and removing) a similar lump of diagnostics/debugging
code from another of Katharine Lindner's parsers.
There is still a similar bit of code in
Bio/IntelliGenetics/__init__.py which we could remove, but as the
whole module is now deprecated we could just wait for a few releases
and then remove it entirely.
Peter
From bugzilla-daemon at portal.open-bio.org Wed Jul 16 11:40:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 16 Jul 2008 07:40:53 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200807161140.m6GBerMH021048@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-16 07:40 EST -------
(In reply to comment #8)
> Whether or not to stop translating at the first stop codon could be an
> argument to the translate method. As an alternative, it may be preferable
> to have a split() method that splits the sequences at the stop codons.
> Such a method could be applied to all protein sequences, not only those
> created by translate().
Adding a split() method to the Seq object is a good idea in general (making the
Seq object more like a python string), and using my_protein.split("*") is a
nice example usage of this.
I have posted a possible implementation of the split() method for the Seq
object on Bug 2351 comment 15
http://bugzilla.open-bio.org/show_bug.cgi?id=2351#c15
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From biopython at maubp.freeserve.co.uk Wed Jul 16 12:40:03 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 16 Jul 2008 13:40:03 +0100
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com>
References: <492634.64872.qm@web62414.mail.re1.yahoo.com>
<320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com>
Message-ID: <320fb6e00807160540w325fe995mea400b0014fd7c2e@mail.gmail.com>
>> But, there is also a set of interconnected modules where it's not 100%
>> clear if they can be removed without causing some surprises:
>> Bio.builders
>> Bio.config
>> Bio.dbdefs
>> Bio.formatdefs
>> Bio.dbdefs
>> Bio.expressions
>> Bio.FormatIO
>> Bio.Std
>> Bio.StdHandler
>> It is probably OK to remove these, since these were deprecated we did
>> not get a barrage of complaints from our users. Personally, I think it is
>> important to keep the code base clean, so I am in favor of removing
>> these (and see if anybody complains; in that case, we can always put
>> these modules back in and make a new release). But I can live with
>> keeping these modules for another release round. If anybody thinks
>> that that would be better, please let us know.
>
> Given some of these are very interconnected, I would be inclined to leave
> them in for one more release. However I'm content to see them go. If no
> one else has any qualms, then please carry on.
Now that Biopython 1.47 is out, it's probably time to remove
Bio.expressions (deprecated in 1.44) and explicitly deprecate the
rest:
Bio.builders
Bio.config
Bio.dbdefs
Bio.formatdefs
Bio.Std
Bio.StdHandler
(plus Bio.Writer which is part of this "Bioformat" code base?)
The final entry from your list, Bio.FormatIO, has already been removed.
Peter
From mjldehoon at yahoo.com Wed Jul 16 14:14:07 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 16 Jul 2008 07:14:07 -0700 (PDT)
Subject: [Biopython-dev] Dump file in Bio/Saf/__init__.py
In-Reply-To: <320fb6e00807160349r105bda08x3cf5e31915896a9b@mail.gmail.com>
Message-ID: <729090.76301.qm@web62408.mail.re1.yahoo.com>
I removed a similar piece of code in one more module (I forgot which one).
While we're on the subject: the functionality currently in Bio.Saf probably belongs in Bio.AlignIO.
--Michiel.
--- On Wed, 7/16/08, Peter wrote:
> From: Peter
> Subject: [Biopython-dev] Dump file in Bio/Saf/__init__.py
> To: "BioPython-Dev Mailing List"
> Date: Wednesday, July 16, 2008, 6:49 AM
> Michiel,
>
> I just noticed your CVS revision to Bio/Saf/__init__.py removing this
> snippet of code:
>
> dumpfile = open( 'dump', 'w' )
> dumpfile.write( data )
> dumpfile.close()
>
> I recall seeing (and removing) a similar lump of diagnostics/debugging
> code from another of Katharine Lindner's parsers.
>
> There is still a similar bit of code in
> Bio/IntelliGenetics/__init__.py which we could remove, but as the
> whole module is now deprecated we could just wait for a few releases
> and then remove it entirely.
>
> Peter
From biopython at maubp.freeserve.co.uk Wed Jul 16 14:44:28 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 16 Jul 2008 15:44:28 +0100
Subject: [Biopython-dev] Dump file in Bio/Saf/__init__.py
In-Reply-To: <729090.76301.qm@web62408.mail.re1.yahoo.com>
References: <320fb6e00807160349r105bda08x3cf5e31915896a9b@mail.gmail.com>
<729090.76301.qm@web62408.mail.re1.yahoo.com>
Message-ID: <320fb6e00807160744y7d809533sb5c9cdc82c907aa9@mail.gmail.com>
On Wed, Jul 16, 2008 at 3:14 PM, Michiel de Hoon wrote:
> I removed a similar piece of code in one more module (I forgot which one).
Bio/MetaTool/__init__.py if anyone wanted to know. The CVS changes
RSS feed is handy:
http://biopython.org/wiki/Tracking_CVS_commits
> While we're on the subject: the functionality currently in Bio.Saf probably belongs in Bio.AlignIO.
Yes, it probably does - assuming anyone still uses the file format.
I'll take a look at that at some point.
Peter wrote:
>> There is still a similar bit of code in
>> Bio/IntelliGenetics/__init__.py which we could remove, but
>> as the whole module is now deprecated we could just wait
>> for a few releases and then remove it entirely.
I've removed the Bio.IntelliGenetics dumpfile code in CVS.
Peter
From bugzilla-daemon at portal.open-bio.org Wed Jul 16 15:01:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 16 Jul 2008 11:01:41 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807161501.m6GF1fuG028930@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #26 from mdehoon at ims.u-tokyo.ac.jp 2008-07-16 11:01 EST -------
I've uploaded a fixed parser in Bio.Sequencing.Ace to CVS; feel free to have a
look and comment.
From biopython at maubp.freeserve.co.uk Wed Jul 16 15:32:03 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 16 Jul 2008 16:32:03 +0100
Subject: [Biopython-dev] Dump file in Bio/Saf/__init__.py
In-Reply-To: <320fb6e00807160744y7d809533sb5c9cdc82c907aa9@mail.gmail.com>
References: <320fb6e00807160349r105bda08x3cf5e31915896a9b@mail.gmail.com>
<729090.76301.qm@web62408.mail.re1.yahoo.com>
<320fb6e00807160744y7d809533sb5c9cdc82c907aa9@mail.gmail.com>
Message-ID: <320fb6e00807160832w4eef825ek3ed4cfde1cc92cd2@mail.gmail.com>
>> While we're on the subject: the functionality currently in Bio.Saf probably belongs in Bio.AlignIO.
>
> Yes, it probably does - assuming anyone still uses the file format.
> I'll take a look at that at some point.
I've been looking at the PredictProtein site's SAF (Simple Alignment
Format) specification, which as far as I know is the only definition
(spelling errors and all). It's a free format somewhat like PHYLIP,
and for "nice" input files parsing shouldn't be too difficult.
However, some of the corner cases they give are frankly evil, and I
wonder if Bio.Saf is actually compliant.
See http://www.predictprotein.org/Dexa/optin_safDes.html
I'd like to propose deprecating Bio.Saf on the main mailing list.
If there are people wanting to use this SAF format, we can then
worry about implementing a non-Martel parser for this file format
in Bio.AlignIO instead - and explicitly test it can cope with all the
examples given.
Peter
P.S. I updated Bio.Saf to use the new URL for the PredictProtein site.
From bugzilla-daemon at portal.open-bio.org Wed Jul 16 16:08:38 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 16 Jul 2008 12:08:38 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807161608.m6GG8c0s031867@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #27 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-16 12:08 EST -------
Regarding the Bio.Sequencing.Ace changes (comment 26), some of it looks a bit
repetitive. Might a sub-function help here? Also, I was wondering if you
managed to fix Bug 2446 as a nice bonus.
Regarding the Bio.Sequencing.Phd changes, Michiel has now deprecated Frank &
Cymon's original scanner/consumer parser. I didn't think it made sense to
leave the original header as it was (with their old version number etc. and the
suggestion to contact them directly with bugs). They are of course still
listed in the copyright header.
New Bio.Sequencing.Phd docstring header text in CVS:
"""
Parser for PHD files output by PHRED and used by PHRAP and CONSED.
This module can be used directly, which will return Record objects
which should contain all the original data in the file.
Alternatively, using Bio.SeqIO with the "phd" format will call this module
internally. This will give SeqRecord objects for each contig sequence.
"""
Previous text:
"""
Parser for PHD files output by PHRED and used by PHRAP and CONSED.
Works fine with PHRED 0.020425.c
Version 1.1, 03/09/2004
written by Cymon J. Cox (cymon.cox at gmail.com ) and
Frank Kauff (fkauff 'AT' biologie.uni-kl.de).
Comments, bugs, problems, suggestions to one of us are welcome!
"""
Frank & Cymon - I should have asked first, but is this revised wording OK with
you?
From bugzilla-daemon at portal.open-bio.org Wed Jul 16 20:35:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 16 Jul 2008 16:35:13 -0400
Subject: [Biopython-dev] [Bug 2544] Bio.GenBank and SeqFeature improvements
In-Reply-To:
Message-ID: <200807162035.m6GKZDOn012941@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2544
------- Comment #2 from mmokrejs at ribosome.natur.cuni.cz 2008-07-16 16:35 EST -------
(In reply to comment #1)
>
> I'm guessing the missing line here was something like:
> seq_record = SeqIO.read(handle, "genbank")
Yes, I forgot to paste one line, sorry.
> > I do not see how I could access the value 'DNA' from the LOCUS line:
> > LOCUS EF452680 260 bp DNA linear SYN 11-JUN-2008
>
> Currently the sequence type (DNA, RNA, Protein) is used internally by the
> GenBank parser to determine the alphabet. It is not currently recorded in the
> SeqRecord object's annotation but could be. How about something like this?:
>
> seq_record.annotations["seq_type"]
I am not very familiar with the official naming of the fields in the LOCUS line
used by GenBank, but I hope you are. Yes, that would be fine for me. I hope all
other values from the LOCUS line can be accessed similarly as well.
> > Could seq_record.features have a repr() function to give me something useful
> > instead of this?
> >
> > >>> print seq_record.features
> > [, > instance at 0x837b9cc>, ]
>
> Yes we could add that, but you wouldn't want to do that on a typical genome
> with thousands of features. Adding a repr method for the Reference object is
> also something I had wondered about doing.
I think it could be there even for large records. It is up to the programmer
whether to use repr() or not, and while testing/learning it would be really
useful. Or at least the routine could internally check the number of features
and, in case there are thousands, print the first few and then stop with a
clear message on how to force the full listing.
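The idea of a repr() that stops after a fixed number of features could be
sketched roughly as below. This is only an illustration in modern Python 3:
the Feature and FeatureList classes and the max_shown limit are hypothetical
names and are not part of Biopython.

```python
class Feature:
    """Hypothetical stand-in for a sequence feature (not Bio.SeqFeature)."""

    def __init__(self, type, start, end):
        self.type = type
        self.start = start
        self.end = end

    def __repr__(self):
        return "Feature(type=%r, start=%d, end=%d)" % (
            self.type, self.start, self.end)


class FeatureList(list):
    """List whose repr() shows only the first few items of a long list."""

    max_shown = 3  # assumed cut-off, chosen only for this sketch

    def __repr__(self):
        if len(self) <= self.max_shown:
            return list.__repr__(self)
        shown = ", ".join(repr(item) for item in self[:self.max_shown])
        hidden = len(self) - self.max_shown
        return "[%s, ... %d more; use list(x) for the full listing]" % (
            shown, hidden)


# A record with a thousand features still gives a short summary.
features = FeatureList(Feature("gene", i, i + 10) for i in range(1000))
summary = repr(features)
```

Short lists print in full as usual, so the cut-off only matters for
genome-sized records.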
> > I don't see documented anywhere in the biopython docs access the features,
> > pasting something like the following into docs would give a user clue where to
> > look for for values:
> >
> > >>> print seq_record.features[0].qualifiers
> > {'db_xref': ['taxon:32630'], 'mol_type': ['other DNA'], 'organism': ['synthetic
> > construct'], 'chromosome': ['Ib'], 'PCR_primers': ['fwd_seq:
> > aggcctctgcttgccgtttgtttcg, rev_seq: cgccggcggcacacgctcaactaattac']}
> > >>> print seq_record.features[1].qualifiers
> > {'gene': ['NOS']}
> > >>> print seq_record.features[2].qualifiers
> > {'product': ['nitric oxide synthase'], 'codon_start': ['2'], 'EC_number':
> > ['1.14.13.39'], 'transl_table': ['11'], 'note': ['derived from Toxoplasma
> > gondii'], 'db_xref': ['GI:166831529'], 'translation':
> > ['RPLLAVCFVSDFYSLSLLHFASVPFHESDGCVGRSHWLPGKHANYVKPAGARKRPEVGCRSSCLLRSVCCDILSPVRTRGN'],
> > 'gene': ['NOS'], 'protein_id': ['ABP65329.2']}
>
> There is a minimal bit of text in what is currently Chapter 10 of the tutorial
> on the SeqFeature object. I agree, this is an area that needs improvement.
Yes I read that before but it is too short, even after reading 2.4.2, 4.2.1,
9.2 and http://biopython.org/wiki/SeqIO.
>
> Perhaps a full example of parsing a simple GenBank file in the SeqIO chapter
> would help?
Definitely, you should pick some exceptional record having different fields,
I think the one I have shown is quite OK.
>
> > >>> print seq_record.features[3].qualifiers
> > Traceback (most recent call last):
> > File "<stdin>", line 1, in <module>
> > IndexError: list index out of range
>
> You must have only three features (indexed 0, 1 and 2) which explains the
> index error.
I knew, it was intentional. ;-)
>
> > I wonder if I could access the above dicts as seq_record.features['source']
> > or seq_record.features['CDS']. Where is the 'source', 'gene', 'CDS' gone?
>
> As the .type attribute, try this:
>
> for feature in seq_record.features:
>     print feature.type
>>> for feature in seq_record.features:
...     print feature.type
...
source
gene
CDS
>>>
>
> You can't access the features by type (e.g. seq_record.features['CDS'])
> because there is generally more than one feature of each type.
Yes, but how about seq_record.features['CDS'][index]? Could that be provided?
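Until something like that exists, a lookup by type can be built in a few
lines of user code. The sketch below is plain Python 3 and uses a hypothetical
namedtuple in place of real SeqFeature objects; it only assumes each feature
has a .type attribute, as shown above. The group_by_type name is made up for
illustration.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a feature; only the .type attribute matters here.
Feature = namedtuple("Feature", ["type", "qualifiers"])


def group_by_type(features):
    """Return a dict mapping each feature type to a list of its features."""
    grouped = defaultdict(list)
    for feature in features:
        grouped[feature.type].append(feature)
    return dict(grouped)


features = [
    Feature("source", {"organism": ["synthetic construct"]}),
    Feature("gene", {"gene": ["NOS"]}),
    Feature("CDS", {"gene": ["NOS"], "codon_start": ["2"]}),
]
by_type = group_by_type(features)
first_cds = by_type["CDS"][0]  # the features['CDS'][index] style of access
```

Each dictionary value is a list because a record generally has more than one
feature of a given type.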
> P.S. Most of your comments are not on Bio.SeqIO itself, but actually about the
> underlying Bio.GenBank parser or the SeqFeature object. I have therefore
> changed the title.
Thanks!
From tiagoantao at gmail.com Thu Jul 17 01:11:43 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 17 Jul 2008 02:11:43 +0100
Subject: [Biopython-dev] Biopython presentation at BOSC2008
Message-ID: <6d941f120807161811n44eafa5ata705fa3189443681@mail.gmail.com>
Hi all,
This year I will be delivering the Biopython presentation at BOSC
2008. The current draft is attached to this email (ppt format - yuck -
but the easiest to edit).
Comments, suggestions, changes are most welcome. Just one point, the
presentation is this Saturday, so if you have any comments, please send
them soon.
There is one slide still to be completed and a few presentation/looks
issues still to be ironed out.
Many thanks,
Tiago
--
"Data always beats theories. 'Look at data three times and then come
to a conclusion,' versus 'coming to a conclusion and searching for
some data.' The former will win every time."
- Matthew Simmons,
http://www.tiago.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bosc2008.ppt
Type: application/vnd.ms-powerpoint
Size: 482816 bytes
Desc: not available
URL:
From biopython at maubp.freeserve.co.uk Thu Jul 17 13:07:53 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 17 Jul 2008 14:07:53 +0100
Subject: [Biopython-dev] Biopython presentation at BOSC2008
In-Reply-To: <6d941f120807161811n44eafa5ata705fa3189443681@mail.gmail.com>
References: <6d941f120807161811n44eafa5ata705fa3189443681@mail.gmail.com>
Message-ID: <320fb6e00807170607s32af2744j479eb2b2e545f454@mail.gmail.com>
> Comments, suggestions, changes are most welcome. Just one point, the
> presentation is this Saturday, so if you have any comments, please send
> them soon.
I've sent Tiago some specific comments directly (little things). One
issue which might deserve wider discussion is the project's short term
goals. I would suggest putting:
* Moving from CVS to Subversion
* Make Sequence objects more OO and string-like
* More file formats in Bio.SeqIO and Bio.AlignIO
And also perhaps the Numeric to numpy move, Bug 2251
http://bugzilla.open-bio.org/show_bug.cgi?id=2251
I subscribe to the numpy mailing list and they seem to have been
making big strides in the documentation. Also it looks like they plan
to make Travis Oliphant's "Guide to NumPy" book free after "SciPy
2008" - which probably means the August 2008 SciPy conference at
Caltech rather than EuroSciPy 2008 in July in Germany.
Peter
From tiagoantao at gmail.com Thu Jul 17 21:45:56 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 17 Jul 2008 22:45:56 +0100
Subject: [Biopython-dev] Biopython presentation at BOSC2008
In-Reply-To: <6d941f120807161811n44eafa5ata705fa3189443681@mail.gmail.com>
References: <6d941f120807161811n44eafa5ata705fa3189443681@mail.gmail.com>
Message-ID: <6d941f120807171445t32178835n6f5dd77f11f3f004@mail.gmail.com>
Hi all,
I would like to thank everyone who sent comments. I used the vast majority
of the comments sent, so the feedback was really useful.
Tiago
--
"Data always beats theories. 'Look at data three times and then come
to a conclusion,' versus 'coming to a conclusion and searching for
some data.' The former will win every time."
- Matthew Simmons,
http://www.tiago.org
From bugzilla-daemon at portal.open-bio.org Sat Jul 19 00:07:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Jul 2008 20:07:02 -0400
Subject: [Biopython-dev] [Bug 1999] new frame translation method
In-Reply-To:
Message-ID: <200807190007.m6J0721C023043@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=1999
mmokrejs at ribosome.natur.cuni.cz changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mmokrejs at ribosome.natur.cuni
| |.cz
From bugzilla-daemon at portal.open-bio.org Sat Jul 19 00:09:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Jul 2008 20:09:26 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200807190009.m6J09Qm2023188@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
mmokrejs at ribosome.natur.cuni.cz changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mmokrejs at ribosome.natur.cuni
| |.cz
From bugzilla-daemon at portal.open-bio.org Sat Jul 19 00:30:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Jul 2008 20:30:36 -0400
Subject: [Biopython-dev] [Bug 2448] Bio.EUtils can't handle accented author
names
In-Reply-To:
Message-ID: <200807190030.m6J0Ua27024398@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2448
mmokrejs at ribosome.natur.cuni.cz changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mmokrejs at ribosome.natur.cuni
| |.cz
------- Comment #4 from mmokrejs at ribosome.natur.cuni.cz 2008-07-18 20:30 EST -------
(In reply to comment #2)
> {u'LastName': u'Mart\xednez-Oyanedel', u'Initials': 'J', u'ForeName':
> u'Jos\xe9'},
If I remember right, this is the stringified representation of UTF-8
data that you get when you call str() or repr() on it. One could then try to
convert it back in the calling code, but one has to invent the magic code. In
my programs I avoid unicode objects, stick to UTF-8 encoded strings and pass
them back to the user. But as I say, you should never rely on print(), str()
or repr() because they are not utf8/unicode safe. That should be one of the
things fixed in Python 3.
So in summary, when I raise an exception these values always get printed in
the above escaped form, but that is the only such case. I believe that as long
as you return the values, the current code is OK. But I haven't tested it.
Grepping related stuff from my programs, I use e.g.:
self._connection = connect(unix_socket=unix_socket, db=dbname, user=username,
    passwd=password, init_command='SET AUTOCOMMIT=0', charset='utf8',
    use_unicode=False)
if self._connection.character_set_name() != 'utf8':
    # test whether we really have utf8 connection
    raise RuntimeError, "Connection to mysql not in utf8 mode: %s" % \
        self._connection.character_set_name()
value = unicode(value).encode('utf8')
http://evanjones.ca/python-utf8.html
http://www.idealliance.org/proceedings/xtech05/papers/02-08-01/
http://www.amk.ca/python/howto/unicode
http://diveintopython.org/xml_processing/unicode.html
http://www.jorendorff.com/articles/unicode/python.html
from elementtree.ElementTree import parse, Element, SubElement, ElementTree
# use 'utf8' and not 'utf-8' for Element.write() !!!
# We must supply unicode values to ElementTree and not just utf8 encoded
# strings.
_value_node.text = _value.decode('utf8')
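For readers on Python 3, where text (str) and UTF-8 encoded data (bytes) are
separate types, the escaped form and the round trip discussed above can be
sketched as follows. This is an illustration only, using the author name
quoted from comment #2; it does not touch Bio.EUtils.

```python
# Text as a Python 3 str; \xed and \xe9 are the accented i and e.
last_name = "Mart\xednez-Oyanedel"
fore_name = "Jos\xe9"

# Encoding to UTF-8 gives bytes; each accented letter takes two bytes.
encoded = fore_name.encode("utf8")

# Decoding the bytes recovers the original text unchanged.
decoded = encoded.decode("utf8")

# ascii() gives an escaped, ASCII-only form similar to the Python 2 repr
# quoted above (without the u prefix).
escaped = ascii(last_name)
```

The escaped form is only a display convention; the underlying text is
unchanged and survives the encode/decode round trip.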
From bugzilla-daemon at portal.open-bio.org Sat Jul 19 00:37:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Jul 2008 20:37:36 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
codons as stops
In-Reply-To:
Message-ID: <200807190037.m6J0baGc024748@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2530
mmokrejs at ribosome.natur.cuni.cz changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mmokrejs at ribosome.natur.cuni
| |.cz
------- Comment #6 from mmokrejs at ribosome.natur.cuni.cz 2008-07-18 20:37 EST -------
I was just about to report this bug. I use Biopython to translate EST
sequences. They are full of sequencing errors: even when one knows the CDS
region, it is often interrupted by N's or by literal STOP codons. The
current implementation in biopython-1.47 broke this for me. I haven't tested
the attached patches, but would propose making this strict check optional.
Currently there seems to be no way to pass some variable down to the code so
that it does not barf in such cases. I will attach my current hack.
From bugzilla-daemon at portal.open-bio.org Sat Jul 19 00:38:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Jul 2008 20:38:48 -0400
Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq
objects
In-Reply-To:
Message-ID: <200807190038.m6J0cmLK024884@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2532
------- Comment #3 from mmokrejs at ribosome.natur.cuni.cz 2008-07-18 20:38 EST -------
Created an attachment (id=972)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=972&action=view)
Hack not to break on Ns for unknown bases in ESTs
From bugzilla-daemon at portal.open-bio.org Sat Jul 19 12:47:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 19 Jul 2008 08:47:34 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
codons as stops
In-Reply-To:
Message-ID: <200807191247.m6JClYEO004649@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2530
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #955 is|0 |1
obsolete| |
------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-19 08:47 EST -------
(From update of attachment 955)
I've checked this code in, with most of the assertions moved into a new
unit test. This patch is now obsolete.
Checking in Bio/Data/CodonTable.py;
/home/repository/biopython/biopython/Bio/Data/CodonTable.py,v <--
CodonTable.py
new revision: 1.9; previous revision: 1.8
done
RCS file: /home/repository/biopython/biopython/Tests/test_CodonTable.py,v
done
Checking in Tests/test_CodonTable.py;
/home/repository/biopython/biopython/Tests/test_CodonTable.py,v <--
test_CodonTable.py
initial revision: 1.1
done
RCS file: /home/repository/biopython/biopython/Tests/output/test_CodonTable,v
done
Checking in Tests/output/test_CodonTable;
/home/repository/biopython/biopython/Tests/output/test_CodonTable,v <--
test_CodonTable
initial revision: 1.1
done
From bugzilla-daemon at portal.open-bio.org Sat Jul 19 12:52:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 19 Jul 2008 08:52:02 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
codons as stops
In-Reply-To:
Message-ID: <200807191252.m6JCq26c004896@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2530
------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-19 08:52 EST -------
(In reply to comment #6)
> I was just about to report this bug. I use biopython to translate EST
> sequences. They are full of sequencing errors although one knows the CDS
> region, still it is often interrupted by N's or by literal STOP codons. The
> current implementation in biopython-1.47 broke it for me. I haven't tested the
> attached patches but would propose to make this strict check optional.
> Currently it seems there is no way to pass down the code some variable not to
> barf in such cases. Will attach my current hack.
Do you have an example which "worked" in an older version of Biopython, but is
"broken" in Biopython 1.47?
From bugzilla-daemon at portal.open-bio.org Sat Jul 19 16:40:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 19 Jul 2008 12:40:58 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
codons as stops
In-Reply-To:
Message-ID: <200807191640.m6JGew57014127@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2530
------- Comment #9 from mmokrejs at ribosome.natur.cuni.cz 2008-07-19 12:40 EST -------
>gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence
AAGAAAACGAGAAGGACGGGGTTATATAGTAAGGTACAAACAGGGCANNNNNNCCATTACACGACCAACT
TCTTCGCCTTGCCCTTTTTCTCAGAGTCCTTGTGCGACAGGAACTCGACCTCGGTCGCAAGAGGCCCAGC
AAGTCGCGCTCCCTCGGGGTACCCAAGCACACTCATCTTGAAATGCTTCCCAACTCCCTCAATCCTTTCC
CGCAGCCCCGCATCCTCCTCGGTCGGTGCAAGTCGCGTCCATATCGACAATCGATAAAACTGCGGCCGCG
TCGACACGATCACACCTGTAATCAGCGACGCAGACCCACTTCCACCCGTCAGCGTCGGCGATGGGTCAAA
TGTTTCCCCGATCGCAGCCAGCATCGTATACAGCCACATCTTGTCTACGTTGGGTCGGTTTTTATCTTTG
GGCAGTTGGATACTCCATTTTCCTCCAAGCTTGTTCGCCTCGTCCTCCCATGCGGGAATAATTCCCTCCT
TGAAAAGGTAATAATTTGCCTTCTGGGGCAGTTGAGATGGCGGTATGATGTTGTTATATAACCCCCAAAA
CTCCNNNNNGCTATCAAAGNNNNNGACCCGCNNNNNGTCCNCCANNNACCCTTNNNCCNNNNNANNNCCG
GNNNNNNNNNNNNTGNGGGTCNNNNNNNNNGCTNNNNNNNNNNTNNNNNG
As of Aug 5 2007 this resulted in the following six-frame translation:
>gi|45741280|gb|CK993509.1|CK993509 gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:+1
KKTRRTGLYSKVQTG***HYTTNFFALPFFSESLCDRNSTSVARGPASRAPSGYPSTLILKCFPTPSILSRSPASSSVGASRVHIDNR*NCGRVDTITPVISDADPLPPVSVGDGSNVSPIAASIVYSHILSTLGRFLSLGSWILHFPPSLFASSSHAGIIPS
LKR**FAFWGS*DGGMMLLYNPQNS**LSK**TR**S***P*****P******V***A*****
>gi|45741280|gb|CK993509.1|CK993509 gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:+2
RKREGRGYIVRYKQG***ITRPTSSPCPFSQSPCATGTRPRSQEAQQVALPRGTQAHSS*NASQLPQSFPAAPHPPRSVQVASISTIDKTAAASTRSHL*SATQTHFHPSASAMGQMFPRSQPASYTATSCLRWVGFYLWAVGYSIFLQACSPRPPMRE*FPP
*KGNNLPSGAVEMAV*CCYITPKT***YQ***P****P*TL*****R*****G**********
>gi|45741280|gb|CK993509.1|CK993509 gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:+3
ENEKDGVI**GTNRA**PLHDQLLRLALFLRVLVRQELDLGRKRPSKSRSLGVPKHTHLEMLPNSLNPFPQPRILLGRCKSRPYRQSIKLRPRRHDHTCNQRRRPTSTRQRRRWVKCFPDRSQHRIQPHLVYVGSVFIFGQLDTPFSSKLVRLVLPCGNNSLLEKVIICLLGQLRWRYDVVI*PPKL**AIK**DP**V***P************G**********
>gi|45741280|gb|CK993509.1|CK993509 gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:-1
**********W************P***L**AQ**KLS**LKTPNILL*YGGRVDGVFRLIMEKFLP**GRTLLLRLFEPPFTS*VDGFLFLAGLHLFYTDICYDRR*PLCKLGSGCDCPPSPRRSD*CPH*HSCAGVKIANSYTCAERGWLLLRPDALS*LPQPFVKFYSHEPMGLPRAERPGERWLQLKDSVFLRLFFPFRFFNQHIT**TGQTWNDILGQEEQK
>gi|45741280|gb|CK993509.1|CK993509 gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:-2
**********G*****G*****FP*T****P***NY***SKPPIYCCSMAVELTGSSV**WKSSSLNKGVPSCSACSNLLLPHRLTGFYFWLGCICSTPTYATTDASPFVNWVAAATAHLHPDAATNVHTSTAAPASK*LTAIPALNVAGSSYAPTPFPNSLNPS*SSTHTNPWGSLALNDPENAGSSSRTACS*DSFSRSASSTSTL***RDKHGMIYWGRKSKR
>gi|45741280|gb|CK993509.1|CK993509 gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:-3
*****S***L******A*****S***P**RP**ETI**PQNPQYIVVVWR*S*RGLPFNNGKVPPLIRAYPPAPLVRTSFYLIG*RVSIFGWVASVLHRHMLRPTLAPL*TG*RLRLPTFTQTQRLMSTLAQLRRRQNS*QLYLR*TWLAPPTPRRPFLTPSTLRKVLLTRTHGAPSR*TTRRTLAPAQGQRVPETLFPVPLLQPAHY***GTNME*YIGAGRAKE
From bugzilla-daemon at portal.open-bio.org Sat Jul 19 16:44:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 19 Jul 2008 12:44:36 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
codons as stops
In-Reply-To:
Message-ID: <200807191644.m6JGiahx014350@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2530
------- Comment #10 from mmokrejs at ribosome.natur.cuni.cz 2008-07-19 12:44 EST -------
BTW, formatdb silently ignores asterisks, so you have to replace them with X
yourself; otherwise the alignment output from BLAST does not reflect reality.
I don't know whether I would prefer Biopython to give me 'X' instead of '*'.
Maybe for codons containing 'N' or 'R' I would prefer 'X', while for true STOP
codons I would prefer '*'. A nice thing about the PIR database is that
proteins really ending at a STOP codon have a trailing '*'.
From bugzilla-daemon at portal.open-bio.org Sat Jul 19 20:24:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 19 Jul 2008 16:24:41 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
codons as stops
In-Reply-To:
Message-ID: <200807192024.m6JKOffD023599@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2530
------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-19 16:24 EST -------
How did you do the six translations in comment 9? Using Bio.Seq.translate()
would have failed with a TranslationError on any "NNN" codon or similar.
By common agreement "*" is used for a stop symbol. While "X" generally means
any amino acid, I have sometimes seen it used to mean any amino acid OR a stop
codon (in the NCBI translations in certain GenBank files).
Personally I think it would be nice if there was an agreed character for an
amino acid OR stop codon (e.g. "!"). However, as far as I know no
such convention exists, so we shouldn't invent one as the default in Biopython.
P.S. The nicest way to handle translate("NNN") isn't what I filed this bug
about. It's the fact that translate("{@}") or anything else like that returns
"*" and not an error.
From bugzilla-daemon at portal.open-bio.org Sat Jul 19 21:40:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 19 Jul 2008 17:40:35 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
codons as stops
In-Reply-To:
Message-ID: <200807192140.m6JLeZQR025907@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2530
------- Comment #12 from mmokrejs at ribosome.natur.cuni.cz 2008-07-19 17:40 EST -------
Created an attachment (id=973)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=973&action=view)
translate_ESTs.py
From bugzilla-daemon at portal.open-bio.org Sun Jul 20 14:46:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 20 Jul 2008 10:46:23 -0400
Subject: [Biopython-dev] [Bug 2547] New: Translation of ambiguous codons
like NNN and TAN
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2547
Summary: Translation of ambiguous codons like NNN and TAN
Product: Biopython
Version: 1.47
Platform: PC
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
It is often useful to translate ambiguous nucleotide sequences (e.g.
EST sequences), and these may contain codons which could code for an amino acid
OR a stop codon (e.g. NNN, TNN or TAN).
See for example Bug 2530 comment 6 and comment 9.
Currently Bio.Seq.translate() will not translate such sequences and raises an
exception.
The following example shows correct translation of ambiguous codons which only
encode valid amino acid(s) OR valid stop codons (but not both):
from Bio.Seq import translate
assert translate("TAA") == "*"
assert translate("TAG") == "*"
assert translate("TAT") == "Y"
assert translate("TAC") == "Y"
#Recall ambiguous nucleotide Y means T or C (pYrimidine)
#so TAY = TAT or TAC which both code for Y (Tyr, Tyrosine)
assert translate("TAY") == "Y"
#Recall ambiguous nucleotide R means G or A (puRine)
#so TAR = TAG or TAA which both code for a stop codon
assert translate("TAR") == "*"
However, in Biopython 1.47 the following all raise an exception:
translate("TAN")
translate("TAM")
translate("TAK")
translate("TRR")
translate("TNN")
translate("NNN")
TAN, TAM, TAK, ... can code for Y or stop. More generally, "TRR" and "TNN" can
code multiple amino acids or a stop codon, and "NNN" can code for any amino
acid or a stop codon.
According to IUPAC, the single letter protein code X is an "unknown or 'other'
amino acid" (ignoring its historic and obsolete usage for selenocysteine, now
U).
http://www.chem.qmul.ac.uk/iupac/AminoAcid/A2021.html
This document does NOT cover the idea of stop codons, and I am not aware of any
additional symbol to mean "any amino acid OR a stop codon" which would be ideal
for this situation.
For comparison, the EMBOSS transeq tool will use X when given a codon which
could be either an amino acid OR a stop codon:
$ transeq -filter asis:NNNTANTARTAGTAYTAC
XX**YY
Therefore one solution would be to follow EMBOSS and return X for codons which
could be an amino acid OR a stop codon.
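A minimal sketch of that approach, in plain Python 3 and independent of
Bio.Seq, might look like the following. The function names are hypothetical,
only the standard genetic code is covered, and X is returned for any codon
with more than one possible meaning (including codons that could be two
different amino acids), which is a simplification chosen for this example.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC ambiguous nucleotide codes and the bases each one stands for.
AMBIGUOUS = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon):
    """Translate one codon, returning X when its meaning is ambiguous."""
    if len(codon) != 3:
        raise ValueError("Codon %r is not three letters long" % codon)
    try:
        choices = [AMBIGUOUS[base] for base in codon.upper()]
    except KeyError:
        # Invalid letters are an error, not a stop codon (compare Bug 2530).
        raise ValueError("Invalid codon %r" % codon) from None
    # Expand the codon into every unambiguous codon it could stand for.
    meanings = {CODON_TABLE["".join(c)] for c in product(*choices)}
    if len(meanings) == 1:
        return meanings.pop()  # one amino acid, or "*" if all are stops
    return "X"


def translate_ambiguous(sequence):
    """Translate a nucleotide string codon by codon, ignoring a partial end."""
    return "".join(
        translate_codon(sequence[i:i + 3])
        for i in range(0, len(sequence) - 2, 3)
    )
```

With these definitions, translate_ambiguous("NNNTANTARTAGTAYTAC") gives
"XX**YY", matching the transeq output above, while translate_codon("{@}")
raises a ValueError.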
See also Bug 2530 on the related issue that Bio.Seq.translate() currently
translates invalid codons as "*" (presumably an accidental side effect of the
implementation).
From bugzilla-daemon at portal.open-bio.org Sun Jul 20 14:50:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 20 Jul 2008 10:50:22 -0400
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
codons as stops
In-Reply-To:
Message-ID: <200807201450.m6KEoMVZ017607@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2530
------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-20 10:50 EST -------
Martin,
I've filed Bug 2547 ("Translation of ambiguous codons like NNN and TAN") on the
separate issue of wanting to translate ambiguous codons as found in EST
sequences.
This bug (Bug 2530) is only for the mis-translation of invalid codons as stop
characters.
If there is agreement that changing the behaviour of Bio.Seq.translate() as
described in Bug 2547 is desirable, then we end up fixing both issues at the
same time.
Peter
From biopython at maubp.freeserve.co.uk Sun Jul 20 15:03:48 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 20 Jul 2008 16:03:48 +0100
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid
codons as stops
In-Reply-To: <200807192140.m6JLeZQR025907@portal.open-bio.org>
References:
<200807192140.m6JLeZQR025907@portal.open-bio.org>
Message-ID: <320fb6e00807200803v57820ab8v2502d6e5671933cc@mail.gmail.com>
> ------- Comment #12 from mmokrejs -------
> Created an attachment (id=973)
> --> (http://bugzilla.open-bio.org/attachment.cgi?id=973&action=view)
> translate_ESTs.py
Martin,
I had some general comments on your code which you might find helpful.
Most of your variable names start with an underscore - this is very
unusual. There is a convention in Python that a single leading
underscore is used for private properties or methods of an object.
You used the following code to reverse a string by turning it into a
list and back again:
_reversed = list(_record.sequence)
_reversed.reverse()
_reversed = ''.join(_reversed)
For simply reversing a string, I would suggest using a stride of minus
one instead, reversed_string = old_string[::-1]
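As a small illustration of that suggestion (the example string is made up), both approaches give the same result:

```python
old_string = "GATTACA"

# The list-based approach: copy to a list, reverse in place, join back.
as_list = list(old_string)
as_list.reverse()
via_list = "".join(as_list)

# An extended slice with a stride of minus one does the same in one step.
reversed_string = old_string[::-1]

print(reversed_string)  # ACATTAG
```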
You then go on to take the reverse complement (without worrying about
ambiguous characters which could be present, e.g. R -> Y):
_reversed = list(_record.sequence)
_reversed.reverse()
_reversed = ''.join(_reversed)
_reversed = _reversed.translate(string.maketrans('AaTtGgCcUu', 'TtAaCcGgAa'), '')
I would suggest using the Bio.Seq.reverse_complement() function here instead.
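To show what handling the ambiguous characters involves, here is a rough pure-Python sketch. It is not how Bio.Seq.reverse_complement() is implemented; the COMPLEMENT table and the function below are written just for this example, and U is mapped to A as in the code quoted above.

```python
# Sketch only: a hand-written complement table including IUPAC ambiguity codes.
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "U": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}
# Add lower case versions so mixed case input keeps its case.
COMPLEMENT.update(dict((key.lower(), value.lower())
                       for key, value in list(COMPLEMENT.items())))


def reverse_complement(sequence):
    """Return the reverse complement of a nucleotide string."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


print(reverse_complement("ACGTR"))  # YACGT
```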
Finally are you aware of the string formatting operator (%) in python?
The following code:
_outprothandle.write(''.join(('>', _record.gi, ' ',
_record.definition, ' frame:-3', '\n',
translate(_reversed[2:]).replace('*','X'), '\n')))
might typically be written as:
_outprothandle.write('>%s %s frame:-3\n%s\n' % (_record.gi,
_record.definition, translate(_reversed[2:]).replace('*','X')))
See http://docs.python.org/lib/typesseq-strings.html for more details
(and how to use named insertion points).
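For the named insertion points, a short sketch follows. The dictionary and its values are placeholders invented for this example rather than real record data:

```python
# Placeholder values, for illustration only.
fields = {
    "gi": "12345",
    "definition": "example protein",
    "protein": "MKXL",
}

# Each %(name)s is looked up by key in the dictionary on the right.
line = ">%(gi)s %(definition)s frame:-3\n%(protein)s\n" % fields
print(line)
```

The keys can appear in any order and can be reused, which helps when the same value is needed several times in one template.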
Peter
From bugzilla-daemon at portal.open-bio.org Sun Jul 20 16:08:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 20 Jul 2008 12:08:22 -0400
Subject: [Biopython-dev] [Bug 2548] New: Updating IUPACData and
ExtendedIUPACProtein for U and O
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2548
Summary: Updating IUPACData and ExtendedIUPACProtein for U and O
Product: Biopython
Version: 1.47
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
The IUPAC data in Biopython has not been updated to officially use X for any
amino acid and U for selenocysteine (Sec). Nor do we support O for pyrrolysine
(Pyl).
I haven't found an official statement from the IUPAC-IUBMB Joint Commission on
Biochemical Nomenclature via Google, but several major resources confirm this:
http://www.ebi.ac.uk/RESID/faq.html
http://www.uniprot.org/news/2008/02/26/release
http://doc.bioperl.org/bioperl-live/Bio/Tools/IUPAC.html
Patch to follow.
From bugzilla-daemon at portal.open-bio.org Sun Jul 20 16:26:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 20 Jul 2008 12:26:10 -0400
Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and
ExtendedIUPACProtein for U and O
In-Reply-To:
Message-ID: <200807201626.m6KGQAQZ021741@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2548
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-20 12:26 EST -------
See also: http://www.chem.qmul.ac.uk/iupac/AminoAcid/A2021.html
Taking the above as the current IUPAC standard, there is no direct mention
of the use of J in NMR as designation for signals assigned either to leucine
(L) or to isoleucine (I) which cannot be distinguished from each other.
I am therefore NOT intending to add J to Biopython's IUPAC extended protein
alphabet.
From bugzilla-daemon at portal.open-bio.org Sun Jul 20 16:54:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 20 Jul 2008 12:54:51 -0400
Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and
ExtendedIUPACProtein for U and O
In-Reply-To:
Message-ID: <200807201654.m6KGsp7L022759@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2548
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-20 12:54 EST -------
Created an attachment (id=974)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=974&action=view)
Adds U and O, clearly defines X, but does not add J
Does anyone have any definitive sources on the MW of these "new" amino acids?
Also I'd like to confirm if IUPAC have officially accepted "J" or not.
From bugzilla-daemon at portal.open-bio.org Sun Jul 20 18:30:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 20 Jul 2008 14:30:17 -0400
Subject: [Biopython-dev] [Bug 2547] Translation of ambiguous codons like NNN
and TAN
In-Reply-To:
Message-ID: <200807201830.m6KIUHMb028714@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2547
mmokrejs at ribosome.natur.cuni.cz changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mmokrejs at ribosome.natur.cuni
| |.cz
------- Comment #1 from mmokrejs at ribosome.natur.cuni.cz 2008-07-20 14:30 EST -------
Regarding the selenocysteine issue, expect "inconsistencies" between data files
released from NCBI. I haven't checked now but in 2002 I had the following
communication with NCBI staff:
GenBank format requires official IUPAC amino acid code that doesn't include
selenocysteine and therefore it uses 'X'. FASTA format uses the NCBI extended
amino acid code that does include selenocysteine 'U'.
> >gi_2983532 formate dehydrogenase alpha subunit [Aquifex aeolicus]
> MNYMDISRRGFLKLSVGSVGAGILGGLGFDLTPAYARVRDLKITKAKVTKSICPYCSVSCGILAYSLSDG
> AMNVKERIIHVEGNPDDPINRGTLCPKGATLRDFVNAPDRLTKPLYRPAGSTEWKEISWDEAIEKFARWV
> KDTRDRTFIHKDKAGRVVNRCDSIVWAVGSPLGNEEGWLMVKIGIALGLSARETQATIUHAPTVASLAPT
> ------------------------^
[cut]
>
> It seems there's buggy version in
> ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Aquifex_aeolicus/AE000657.faa
> although the .gbk flatfile says "X" in case of "U".
From bugzilla-daemon at portal.open-bio.org Sun Jul 20 21:16:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 20 Jul 2008 17:16:48 -0400
Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and
ExtendedIUPACProtein for U and O
In-Reply-To:
Message-ID: <200807202116.m6KLGmdb005982@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2548
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #974 is|0 |1
obsolete| |
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-20 17:16 EST -------
Created an attachment (id=975)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=975&action=view)
Tested version of previous patch
This revision includes a workaround for missing molecular weights in the
_make_ambiguous_ranges() function, and has been tested with the full test suite
on Linux.
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 10:55:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Jul 2008 06:55:13 -0400
Subject: [Biopython-dev] [Bug 2547] Translation of ambiguous codons like NNN
and TAN
In-Reply-To:
Message-ID: <200807211055.m6LAtDHp009314@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2547
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 06:55 EST -------
(In reply to comment #1)
> Regarding the selenocysteine issue, expect "inconsistencies" between data files
> released from NCBI. I haven't checked now but in 2002 I had the following
> communication with NCBI staff ...
I think you meant to post this on Bug 2548.
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 11:04:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Jul 2008 07:04:14 -0400
Subject: [Biopython-dev] [Bug 2547] Translation of ambiguous codons like NNN
and TAN
In-Reply-To:
Message-ID: <200807211104.m6LB4E0w009769@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2547
------- Comment #3 from mmokrejs at ribosome.natur.cuni.cz 2008-07-21 07:04 EST -------
Yes, sorry. :(
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 11:10:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Jul 2008 07:10:02 -0400
Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and
ExtendedIUPACProtein for U and O
In-Reply-To:
Message-ID: <200807211110.m6LBA2H8010005@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2548
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 07:10 EST -------
I've gone over the GenBank release notes on this issue...
Quoting ftp://ftp.ncbi.nih.gov/genbank/release.notes/gb131.release.notes (Dated
August 15 2002, similar text appears in earlier files too as a warning of
intended changes)
==============================================================
1.3.3 Selenocysteine representation
At the May 1999 DDBJ/EMBL/GenBank collaborative meeting, it was learned
that IUPAC plans to adopt the letter 'U' for selenocysteine.
With this August 2002 release, selenocysteine residues are now presented
via residue abbreviation 'U', in both /translation and /transl_except
qualifiers.
==============================================================
By now they SHOULD have fixed any sequences which were using X for
selenocysteine to use U instead.
Quoting ftp://ftp.ncbi.nih.gov/genbank/release.notes/gb156.release.notes (Dated
October 15 2006, similar text appears in earlier files too as a warning of
intended changes)
==============================================================
1.3.4 New protein residue abbreviation for Pyrrolysine
Sequence databases use single-letter amino acid abbreviations to
record the primary structure (sequence) of amino acids in a polypeptide.
The table of abbreviations includes only those amino acids that are
encoded in the genetic code and directly inserted by a tRNA during the
process of protein translation. Post-translational modifications are
not represented in the sequence data itself, but may be described by
features annotated on the sequence.
The discovery of the 22nd naturally encoded amino acid, pyrrolysine,
and the recent submission of sequence records that should contain
this residue, require the adoption of a new amino acid abbreviation.
Because several letters are assigned to represent different experimental
ambiguities, the only letter still available for use is O (uppercase
letter o). Scientists working in the field have independently suggested
use of this letter, and it has a reasonable mnemonic, pyrrOlysine.
The IUPAC-IUBMB Joint Commission on Biochemical Nomenclature has agreed
that Pyl/O will be recommended for this amino acid.
The consequences for flatfile users are that O can now appear in CDS
/translation qualifiers, and that Pyl (the three-letter abbreviation)
can appear in CDS /transl_except qualifiers and in the /product and
/anticodon qualifiers of tRNA features. These changes are legal as of this
October 2006 GenBank Release.
Sample ASN.1, FASTA, GenBank flatfile, and INSDSeq XML files for CP000099,
which has a protein with a pyrrolysine residue, are available for testing
purposes at the NCBI FTP site:
ftp://ftp.ncbi.nih.gov/genbank/Pyrrolysine_Samples
Files:
CP000099.pse (print-form ASN.1 Seq-entry)
CP000099.gbff (GenBank flatfile)
CP000099.aa_fsa (protein FASTA)
CP000099.isx (INSDSeq XML)
==============================================================
And later on in the same file,
==============================================================
1.3.5 Protein residue J for leucine/isoleucine ambiguities
The residue abbreviation J is reserved for mass spectrometry experiments that
cannot distinguish leucine from isoleucine. Although this abbreviation has
been part of the IUPAC recommendations for some time, it has not previously
appeared in protein sequences in the GenBank database.
As of October 2006, abbreviation J is legal in CDS /translation qualifiers,
and Xle (the three-letter abbreviation) will be allowed in CDS /transl_except
qualifiers and in the /product and /anticodon qualifiers of tRNA features.
J will also be mapped to unknown (X) for the purpose of BLAST and other
sequence similarity search tools.
==============================================================
So, according to GenBank, "The residue abbreviation J is reserved for mass
spectrometry experiments that cannot distinguish leucine from isoleucine ...
this abbreviation has been part of the IUPAC recommendations for some time".
I would prefer a direct citation, but that seems good enough evidence to me to
include J in the Biopython IUPAC extended protein alphabet.
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 11:18:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Jul 2008 07:18:12 -0400
Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and
ExtendedIUPACProtein for U and O
In-Reply-To:
Message-ID: <200807211118.m6LBICM8010531@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2548
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 07:18 EST -------
Regarding Martin's example (erroneously added to Bugzilla as Bug 2547 comment
1), the protein GI:2983532
Martin wrote "GenBank format requires official IUPAC amino acid code that
doesn't include selenocysteine and therefore it uses 'X'."
That is out of date - IUPAC and GenBank both accept U for selenocysteine now
(see my notes in comment 4 of this bug).
Looking at these files:
ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Aquifex_aeolicus/AE000657.faa
ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Aquifex_aeolicus/AE000657.gbk
(feature translation)
They both give the same amino acid sequence for GI:2983532, which includes "U"
but not "X" as I had expected.
>gi|2983532|gb|AAC07107.1| formate dehydrogenase alpha subunit [Aquifex aeolicus VF5]
MNYMDISRRGFLKLSVGSVGAGILGGLGFDLTPAYARVRDLKITKAKVTKSICPYCSVSCGILAYSLSDG
AMNVKERIIHVEGNPDDPINRGTLCPKGATLRDFVNAPDRLTKPLYRPAGSTEWKEISWDEAIEKFARWV
KDTRDRTFIHKDKAGRVVNRCDSIVWAVGSPLGNEEGWLMVKIGIALGLSARETQATIUHAPTVASLAPT
FGRGAMTNNWVDISNSDLVFVMGGNPAENHPCGFKWAIKAREKRGAKIICIDPRFNRTAAVADIFVQIRP
GTDIAFLGGLINYVLQNEKYQKEYVRLHTTGPFIVREDFGFKDGLFTGYDPKTRSYDTTTWDYEFDPATG
YPKMDPEMKHPRCVLNILKEHYSRYTPEVVSQICGCSKEDFLRVAEEVAKCGAPNKFMTILYALGWTHHS
YGTQLIRTACMLQLLLGNIGCPGGGINALRGHSNVQGMTDLAGQNKNLPTYIKPPKPEEQTLAQHLKNRT
PRKLHPTSLNYWANYPKFFISFLKCMWGDAATPENDFAYDYLYKPEGGYNSWDKFIDDMYKGKIEGVVTA
ALNFLNNTPNAKKTVRALKNLKWMVVMDPFMIETAQFWKAEGLDPKEVKTEILVLPTAVFLEKEGSFTNS
ARWVKWKYKATDPPGDAKDEFWIFGRFFMKLKEFYEKEGGAFPEPILNLVWPYKNPYYPTAEEILTEING
YYTRDVDGHKKGERVRLFTDLRDDGSTACGGWLYCGVFPPEGNLAKRTDLSDPLGLGTYPNYAWNWPANR
RVLYNRASCDEKGRPWDPERPLLRWDPERDMWVGDIPDYPATAPPEKGIGAFIMLPEGKGRLFAAKSYVT
FKDGPLPEHYEPYESPVTNILHPNVPHNPVAKVYKSDLDLLGTPDKFPHVATTYRLTEHYHFWTKHLYGP
SLLAPVMFIEIPEELAKEKGIQNGDLVRVSTARASIEAIALVTKRIKPLKVAGKTVYTIGIPIHWGFEGL
VKGAITNFITPNVWDPNSRTPEFKGFLANIEKVKT
It is quite possible that during the transition from X to U for selenocysteine
there were inconsistencies in GenBank - but I hope/expect the NCBI have fixed
them all by now.
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 11:49:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Jul 2008 07:49:21 -0400
Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and
ExtendedIUPACProtein for U and O
In-Reply-To:
Message-ID: <200807211149.m6LBnLli012323@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2548
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #975 is|0 |1
obsolete| |
------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 07:49 EST -------
Created an attachment (id=976)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=976&action=view)
Adds J, U and O, and clearly defines X as an unknown amino acid
Based on the indirect confirmation in the GenBank release notes that J is now an
IUPAC recommendation, I have updated my patch to include J as well. Note that this
requires a trivial update to test_seq.py (included in this patch).
Still ideally needs the MW filled in for U and O.
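As a rough picture of what the extended protein letters amount to, the sketch below lists the extra one-letter codes with their three-letter abbreviations and checks a sequence against them. The names STANDARD_LETTERS, EXTENDED_CODES and unexpected_letters are hypothetical and are not taken from the patch or from Bio.Data.IUPACData.

```python
# Sketch only: hypothetical names, not the contents of the attached patch.
STANDARD_LETTERS = "ACDEFGHIKLMNPQRSTVWY"

EXTENDED_CODES = {
    "B": "Asx",  # aspartic acid or asparagine
    "Z": "Glx",  # glutamic acid or glutamine
    "J": "Xle",  # leucine or isoleucine
    "X": "Xaa",  # unknown or any amino acid
    "U": "Sec",  # selenocysteine
    "O": "Pyl",  # pyrrolysine
}

EXTENDED_LETTERS = STANDARD_LETTERS + "".join(sorted(EXTENDED_CODES))


def unexpected_letters(protein, letters=EXTENDED_LETTERS):
    """Return a sorted list of letters in protein that are not in letters."""
    return sorted(set(protein.upper()) - set(letters))


# Fragment of GI:2983532 containing the selenocysteine residue.
fragment = "SARETQATIUHAPTVASLAPT"
print(unexpected_letters(fragment))                    # []
print(unexpected_letters(fragment, STANDARD_LETTERS))  # ['U']
```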
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 15:25:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Jul 2008 11:25:59 -0400
Subject: [Biopython-dev] [Bug 2547] Translation of ambiguous codons like NNN
and TAN
In-Reply-To:
Message-ID: <200807211525.m6LFPxgs022821@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2547
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 11:25 EST -------
I've managed to cobble together my first ever Perl program from scratch, and
established that BioPerl does the same as EMBOSS - they use an "X" when the
codon could be either an amino acid OR a stop codon.
My quick BioPerl script,
================================================
use Bio::Seq;
$nuc_str = 'NNNTANTARTAGTAYTAC';
print "BioPerl translation of:\n";
$seq_obj = Bio::Seq->new(-seq => $nuc_str);
print $seq_obj->seq();
print "\n\n";
print "Sequence object's translation method:\n";
print $seq_obj->translate()->seq();
print "\n\n";
use Bio::Perl;
print "translate_as_string:\n";
print translate_as_string($nuc_str);
print "\n";
================================================
And the output:
================================================
BioPerl translation of:
NNNTANTARTAGTAYTAC
Sequence object's translation method:
XX**YY
translate_as_string:
XX**YY
================================================
There does seem to be a consensus building here!
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 15:38:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Jul 2008 11:38:03 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200807211538.m6LFc327023466@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 11:38 EST -------
For comparison, the following is copied from the BioPerl documentation about
their sequence object's translate method. It would be nice to follow some of
the same naming conventions for any optional arguments.
http://www.bioperl.org/Core/Latest/bptutorial.html#iii_3_1_manipulating_sequence_data_with_seq_methods
If we want to translate full coding regions (CDS) the way major nucleotide
databanks EMBL, GenBank and DDBJ do it, the translate() method has to perform
more checks. Specifically, translate() needs to confirm that the sequence has
appropriate start and terminator codons at the very beginning and the very end
of the sequence and that there are no terminator codons present within the
sequence in frame 0. In addition, if the genetic code being used has an
atypical (non-ATG) start codon, the translate() method needs to convert the
initial amino acid to methionine. These checks and conversions are triggered by
setting ``complete'' to 1:
$prot_obj = $my_seq_object->translate(-complete => 1);
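For discussion, the checks described in that BioPerl text might look roughly like this in Python. It is a sketch under several assumptions: translate_cds is a made-up name (not an existing Biopython function), only the standard table and unambiguous DNA are handled, the multiple-of-three check is an addition of this sketch, and dropping the terminator from the result is a choice made here rather than something taken from BioPerl.

```python
# Sketch only: translate_cds() is hypothetical, not an existing Biopython API.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    (b1 + b2 + b3, AMINO_ACIDS[16 * i + 4 * j + k])
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
)
# Start codons of the standard code (NCBI table 1).
START_CODONS = ("ATG", "TTG", "CTG")


def translate_cds(sequence):
    """Translate a complete CDS, raising ValueError if a check fails."""
    sequence = sequence.upper()
    if len(sequence) % 3 != 0:
        raise ValueError("Sequence length is not a multiple of three")
    codons = [sequence[i:i + 3] for i in range(0, len(sequence), 3)]
    if not codons or codons[0] not in START_CODONS:
        raise ValueError("Sequence does not begin with a start codon")
    if CODON_TABLE[codons[-1]] != "*":
        raise ValueError("Sequence does not end with a terminator codon")
    body = [CODON_TABLE[codon] for codon in codons[1:-1]]
    if "*" in body:
        raise ValueError("Terminator codon found within the sequence")
    # The initial amino acid is always methionine, even for TTG or CTG.
    return "M" + "".join(body)


print(translate_cds("TTGAAATAA"))  # MK
```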
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 15:41:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Jul 2008 11:41:47 -0400
Subject: [Biopython-dev] [Bug 2547] Translation of ambiguous codons like NNN
and TAN
In-Reply-To:
Message-ID: <200807211541.m6LFflk5023670@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2547
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 11:41 EST -------
For reference, using the older Bio.Translate approach suffers the same
limitation (which is not surprising if you consider they both use the same
CodonTable objects internally):
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> from Bio import Translate
>>> standard_translator = Translate.ambiguous_dna_by_id[1]
The clear cut cases are fine,
>>> standard_translator.translate(Seq("TAR", IUPAC.ambiguous_dna))
Seq('*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> standard_translator.translate(Seq("TAY", IUPAC.ambiguous_dna))
Seq('Y', HasStopCodon(ExtendedIUPACProtein(), '*'))
When the codon could be an amino acid or a stop, we raise an exception:
>>> standard_translator.translate(Seq("NNN", IUPAC.ambiguous_dna))
Traceback (most recent call last):
...
Bio.Data.CodonTable.TranslationError: NNN
>>> standard_translator.translate(Seq("TAN", IUPAC.ambiguous_dna))
Traceback (most recent call last):
...
Bio.Data.CodonTable.TranslationError: TAN
From bugzilla-daemon at portal.open-bio.org Tue Jul 22 11:32:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 22 Jul 2008 07:32:10 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807221132.m6MBWAAF016950@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #28 from mdehoon at ims.u-tokyo.ac.jp 2008-07-22 07:32 EST -------
(In reply to comment #27)
> Regarding the Bio.Sequencing.Ace changes (comment 26), some of it looks a bit
> repetitive. Might a sub-function help here?
I thought about that, but each time the repetitive code is slightly different,
and I wonder if the end result will be clearer than what we have now.
> Also, I was wondering if you managed to fix Bug 2446 as a nice bonus.
I am planning to do so. I am checking with the polyphred people if the COMMENT
blocks are really intended and are here to stay (note that the polyphred
version that writes these COMMENT blocks is a beta version). Will update the
code once I hear back from them.
From mjldehoon at yahoo.com Tue Jul 22 11:38:13 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 22 Jul 2008 04:38:13 -0700 (PDT)
Subject: [Biopython-dev] Bio.KDTree
Message-ID: <108429.69921.qm@web62404.mail.re1.yahoo.com>
Hi everybody,
Bio.KDTree is currently the only piece of C++ code in Biopython. C++ doesn't compile cleanly on all platforms (for example it is missing in the Biopython installer for Python 2.3 on Windows); some platforms don't even have a C++ compiler. For this reason, setup.py asks the user each time if Bio.KDTree should be compiled. Does anybody (Thomas?) mind if I convert this code to plain C? That would be a nice weekend project. Then we can get rid of the question in setup.py, and Bio.KDTree can be made available on all platforms.
--Michiel.
From biopython at maubp.freeserve.co.uk Tue Jul 22 16:13:34 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 22 Jul 2008 17:13:34 +0100
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <320fb6e00807160540w325fe995mea400b0014fd7c2e@mail.gmail.com>
References: <492634.64872.qm@web62414.mail.re1.yahoo.com>
<320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com>
<320fb6e00807160540w325fe995mea400b0014fd7c2e@mail.gmail.com>
Message-ID: <320fb6e00807220913g64613854j7a1deb5b4357f726@mail.gmail.com>
On June 27, Michiel wrote:
> ..., there is also a set of interconnected modules where it's not 100%
> clear if they can be removed without causing some surprises:
> Bio.builders
> Bio.config
> Bio.dbdefs
> Bio.formatdefs
> Bio.dbdefs
> Bio.expressions
> Bio.FormatIO [already deprecated and removed]
> Bio.Std
> Bio.StdHandler
> It is probably OK to remove these, since these were deprecated we did
> not get a barrage of complaints from our users. Personally, I think it is
> important to keep the code base clean, so I am in favor of removing
> these (and see if anybody complains; in that case, we can always put
> these modules back in and make a new release). But I can live with
> keeping these modules for another release round. If anybody thinks
> that that would be better, please let us know.
Bio.expressions was already deprecated, and seems to be a dependency
of the following modules, which I have now explicitly deprecated in
CVS:
Bio.expressions (deprecated in Biopython 1.44)
Bio.config
Bio.dbdefs
Bio.formatdefs
Bio.dbdefs
It probably would be fine to remove these five modules now
(Bio.expressions, Bio.config, Bio.dbdefs, Bio.formatdefs and
Bio.dbdefs), since the indirect warning from Bio.expressions should
have alerted anyone who was using them. Or we can ship one more
release with them included?
Moving on, Bio.Std and Bio.StdHandler appear to be used by:
- Bio.expressions (deprecated in Biopython 1.44)
- Bio.config (now deprecated in CVS)
- Bio.builders (used by Mindy)
- Bio.Mindy (used by Bio.config which is now deprecated)
As far as I can tell, other historic usage of Mindy (e.g. in Bio.Fasta
and Bio.GenBank) has already been deprecated and removed. I think it
would therefore also be safe to deprecate these four together
(Bio.expressions, Bio.config, Bio.builders and Bio.Mindy), or start by
deprecating Bio.Mindy on its own.
Peter
From bugzilla-daemon at portal.open-bio.org Tue Jul 22 16:29:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 22 Jul 2008 12:29:27 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807221629.m6MGTRuo002799@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-22 12:29 EST -------
Frank,
Would you mind if I removed this print statement from the add_sequence()
method?:
print "WARNING: Sequence name %s is already present. Sequence was added as %s."
% (name,unique_name)
I'd like to be able to call this method in code from Bio.SeqIO / Bio.AlignIO to
write alignments, without getting warnings printed out.
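An alternative to deleting the message outright would be to issue it through the standard warnings module, which lets calling code silence it. The sketch below only illustrates that idea: the function is a stand-alone stand-in for the real add_sequence() method, and the way it builds the unique name is invented for the example, not taken from Nexus.py.

```python
import warnings


def add_sequence(matrix, name, sequence):
    """Hypothetical stand-in: store sequence in the dict matrix under a unique name."""
    unique_name = name
    copy_number = 1
    while unique_name in matrix:
        # Invented naming scheme, purely for this example.
        copy_number += 1
        unique_name = "%s.%d" % (name, copy_number)
    if unique_name != name:
        warnings.warn("Sequence name %s is already present. "
                      "Sequence was added as %s." % (name, unique_name))
    matrix[unique_name] = sequence
    return unique_name


# A caller such as a file writer can suppress the warning locally.
matrix = {}
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    add_sequence(matrix, "taxon1", "ACGT")
    add_sequence(matrix, "taxon1", "ACGA")
print(sorted(matrix))  # ['taxon1', 'taxon1.2']
```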
Thanks
Peter
From biopython at maubp.freeserve.co.uk Tue Jul 22 16:33:53 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 22 Jul 2008 17:33:53 +0100
Subject: [Biopython-dev] Bio.KDTree
In-Reply-To: <108429.69921.qm@web62404.mail.re1.yahoo.com>
References: <108429.69921.qm@web62404.mail.re1.yahoo.com>
Message-ID: <320fb6e00807220933v1e6125a7lcb91b963a5dd5195@mail.gmail.com>
On Tue, Jul 22, 2008 at 12:38 PM, Michiel de Hoon wrote:
> Hi everybody,
>
> Bio.KDTree is currently the only piece of C++ code in Biopython. C++ doesn't
> compile cleanly on all platforms (for example it is missing in the Biopython
> installer for Python 2.3 on Windows); some platforms don't even have a C++
> compiler. For this reason, setup.py asks the user each time if Bio.KDTree
> should be compiled. Does anybody (Thomas?) mind if I convert this code to
> plain C? That would be a nice weekend project. Then we can get rid of the
> question in setup.py, and Bio.KDTree can be made available on all platforms.
If you want to spend your weekend doing this, it does sound like a
worthwhile incremental improvement to Biopython - and should simplify
the build process which is great.
Peter
P.S.
Have you noticed Bug 2489 "KDTree NN search without specifying radius"?
http://bugzilla.open-bio.org/show_bug.cgi?id=2489
From bugzilla-daemon at portal.open-bio.org Tue Jul 22 23:50:31 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 22 Jul 2008 19:50:31 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the
Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200807222350.m6MNoVXd024298@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
------- Comment #15 from mmokrejs at ribosome.natur.cuni.cz 2008-07-22 19:50 EST -------
(In reply to comment #5)
> Another bonus for people who think OO, is doing dir(my_seq) would
> list these useful methods. Right now the user has to know to go
> looking in the Bio.Seq module for a function.
I do this quite often and this is a weak point in current biopython. Good
catch!
Regarding the back_translate, I don't use it but people ask for it often so
don't remove it. Otherwise I won't know where else to get this functionality.
;-)
From bugzilla-daemon at portal.open-bio.org Wed Jul 23 00:05:09 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 22 Jul 2008 20:05:09 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807230005.m6N059QE025415@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
------- Comment #12 from fkauff at biologie.uni-kl.de 2008-07-22 20:05 EST -------
Peter,
No problem.
Cheers,
Frank
(In reply to comment #11)
> Frank,
>
> Would you mind if I removed this print statement from the add_sequence()
> method?:
>
> print "WARNING: Sequence name %s is already present. Sequence was added as %s."
> % (name,unique_name)
>
> I'd like to be able to call this method in code from Bio.SeqIO / Bio.AlignIO to
> write alignments, without getting warnings printed out.
>
> Thanks
>
> Peter
>
From bugzilla-daemon at portal.open-bio.org Wed Jul 23 11:49:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 23 Jul 2008 07:49:33 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807231149.m6NBnX4P014410@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-23 07:49 EST -------
(In reply to comment #12)
> Peter,
>
> No problem.
>
> Cheers,
> Frank
Great. I've removed that print statement (and tweaked a few doc strings) in
CVS.
Checking in Nexus.py;
/home/repository/biopython/biopython/Bio/Nexus/Nexus.py,v <-- Nexus.py
new revision: 1.19; previous revision: 1.18
done
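As an aside, one alternative to deleting such a message outright is to route it through the standard warnings module, so that calling code can silence or capture it. The sketch below is not the Nexus.py code: the helper name add_sequence is reused only for illustration, it works on a plain list of names, and the ".copyN" renaming scheme is an assumption.

```python
import warnings


def add_sequence(names, name):
    """Append name to names, renaming duplicates (hypothetical helper)."""
    unique_name = name
    copy = 1
    # Assumed renaming scheme: Alpha, Alpha.copy2, Alpha.copy3, ...
    while unique_name in names:
        copy += 1
        unique_name = "%s.copy%d" % (name, copy)
    if unique_name != name:
        # Unlike print, this can be filtered by the caller.
        warnings.warn("Sequence name %s is already present. "
                      "Sequence was added as %s." % (name, unique_name))
    names.append(unique_name)
    return unique_name
```

A caller that does not want the message could wrap the call in warnings.catch_warnings() with a suitable filter.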
I'm just working on some alphabet stuff before adding support to write "nexus"
format files with Bio.SeqIO and Bio.AlignIO
From bugzilla-daemon at portal.open-bio.org Wed Jul 23 12:33:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 23 Jul 2008 08:33:10 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807231233.m6NCXAk6018007@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-23 08:33 EST -------
Fixed in CVS - you can now write Nexus files using Bio.SeqIO or Bio.AlignIO,
provided the alphabet is declared as DNA, RNA or protein. You cannot use
generic alphabets or just nucleotide alphabets.
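To illustrate why the writer needs to know the molecule type, here is a minimal, self-contained sketch of producing a NEXUS data block. It is not the Bio.SeqIO or Bio.Nexus implementation; the function name write_nexus and the use of plain (name, sequence) tuples are assumptions. The point is only that the format line needs a concrete datatype, so a generic alphabet gives the writer nothing to put there.

```python
def write_nexus(records, datatype):
    """Return a minimal NEXUS data block for (name, sequence) pairs."""
    # The "format datatype=..." line needs a concrete molecule type.
    if datatype not in ("dna", "rna", "protein"):
        raise ValueError("Need a DNA, RNA or protein datatype, not %r"
                         % (datatype,))
    lengths = set(len(seq) for name, seq in records)
    if len(lengths) != 1:
        raise ValueError("Sequences must all be the same length")
    lines = ["#NEXUS",
             "begin data;",
             "dimensions ntax=%d nchar=%d;" % (len(records), lengths.pop()),
             "format datatype=%s missing=? gap=-;" % datatype,
             "matrix"]
    lines.extend("%s %s" % (name, seq) for name, seq in records)
    lines.extend([";", "end;"])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(write_nexus([("Alpha", "ACGT"), ("Beta", "AC-T")], "dna"))
```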
Multiple files have been changed, so a complete CVS update is the best way to
test this before the next release of Biopython.
From bugzilla-daemon at portal.open-bio.org Wed Jul 23 14:12:38 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 23 Jul 2008 10:12:38 -0400
Subject: [Biopython-dev] [Bug 2543] Bio.Nexus.Trees can't handle named
ancestors
In-Reply-To:
Message-ID: <200807231412.m6NECc33027073@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2543
fkauff at biologie.uni-kl.de changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
------- Comment #2 from fkauff at biologie.uni-kl.de 2008-07-23 10:12 EST -------
I recently got some code that is supposed to be able to deal with labeled nodes
(probably from the author of this bug - can't check now, as I'm traveling and
don't have access to the files). I haven't looked at or tested the code yet, but
will do so soon when I'm back.
Frank
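For readers unfamiliar with the term, a named ancestor is an internal node that carries its own label, as in the Newick string (A,B,(C,D)E)F; where E and F are internal nodes. The following is a minimal recursive-descent sketch of reading such labels. It is not Bio.Nexus.Trees code; the function name parse_newick is made up for illustration, and quoting and comments are ignored (a branch length such as ":0.1" would simply stay part of the label text).

```python
def parse_newick(text):
    """Parse a simple Newick string into nested (label, children) tuples."""
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip "("
            children.append(node())
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(node())
            if text[pos] != ")":
                raise ValueError("Expected ')' at position %d" % pos)
            pos += 1  # skip ")"
        # Whatever follows, up to the next delimiter, is this node's label.
        # For an internal node this is the "named ancestor".
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return (text[start:pos], children)

    return node()


if __name__ == "__main__":
    print(parse_newick("(A,B,(C,D)E)F;"))
```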
(In reply to comment #1)
> This sounds like a job for Frank (the Bio.Nexus module author).
>
> Can I ask if you've actually come across trees with named ancestor nodes in
> "real life"? That would make this bug more important. If so, the name of the
> tool would be interesting, an example tree file would be great to add to
> Biopython as a test case.
>
> If on the other hand the only named ancestor tree you've ever tried is the
> example from the Newick documentation, this doesn't seem such a high priority
> (but still worth fixing).
>
> Peter
>
From biopython at maubp.freeserve.co.uk Thu Jul 24 11:41:41 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 24 Jul 2008 12:41:41 +0100
Subject: [Biopython-dev] Removing the deprecated Bio.WWW modules
Message-ID: <320fb6e00807240441g5b21993dl7c84aebac0e2a988@mail.gmail.com>
Hi all,
We (Michiel) deprecated the Bio.WWW.* modules in Biopython 1.45, after
relocating most of the functionality:
Bio.WWW.ExPASy -> Bio.ExPASy
Bio.WWW.InterPro -> Bio.InterPro
Bio.WWW.NCBI -> Bio.Entrez
Bio.WWW.SCOP -> Bio.SCOP
Now that the deprecation warnings have been in place for a couple of
releases, I'd like to remove the four Bio.WWW.* modules, and leave
just Bio/WWW/__init__.py with a deprecation warning telling people
where to look for the relocated code.
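A stub of this kind could look roughly like the sketch below. The module mapping comes from the list above; the message wording and the helper names are assumptions, and in a real Bio/WWW/__init__.py the warning would presumably be issued at import time rather than from a function.

```python
import warnings

# Old module name -> where the functionality now lives.
RELOCATED = {
    "Bio.WWW.ExPASy": "Bio.ExPASy",
    "Bio.WWW.InterPro": "Bio.InterPro",
    "Bio.WWW.NCBI": "Bio.Entrez",
    "Bio.WWW.SCOP": "Bio.SCOP",
}


def deprecation_message():
    """Build a message telling people where the relocated code is."""
    moves = ", ".join("%s -> %s" % (old, new)
                      for old, new in sorted(RELOCATED.items()))
    return "Bio.WWW is deprecated; the code has moved: " + moves


def warn_deprecated():
    """Issue the deprecation warning (hypothetical helper)."""
    warnings.warn(deprecation_message(), DeprecationWarning, stacklevel=2)
```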
Any comments or objections?
Peter
From mjldehoon at yahoo.com Fri Jul 25 00:19:33 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 24 Jul 2008 17:19:33 -0700 (PDT)
Subject: [Biopython-dev] Removing the deprecated Bio.WWW modules
In-Reply-To: <320fb6e00807240441g5b21993dl7c84aebac0e2a988@mail.gmail.com>
Message-ID: <502434.4415.qm@web62406.mail.re1.yahoo.com>
Note that Bio.WWW.__init__.py contains some code that is used in other modules. Most (but not all) of these modules are deprecated themselves. For the non-deprecated modules, it's probably easiest to just copy the code from Bio.WWW.__init__.py over to avoid having to import Bio.WWW.
--Michiel.
--- On Thu, 7/24/08, Peter wrote:
> From: Peter
> Subject: [Biopython-dev] Removing the deprecated Bio.WWW modules
> To: "BioPython-Dev Mailing List"
> Date: Thursday, July 24, 2008, 7:41 AM
> Hi all,
>
> We (Michiel) deprecated the Bio.WWW.* modules in Biopython
> 1.45, after
> relocating most of the functionality:
>
> Bio.WWW.ExPASy -> Bio.ExPASy
> Bio.WWW.InterPro -> Bio.InterPro
> Bio.WWW.NCBI -> Bio.Entrez
> Bio.WWW.SCOP -> Bio.SCOP
>
> Now that the deprecation warnings have been in place for a
> couple of
> releases, I'd like to remove the four Bio.WWW.*
> modules, and leave
> just Bio/WWW/__init__.py with a deprecation warning telling
> people
> where to look for the relocated code.
>
> Any comments or objections?
>
> Peter
From biopython at maubp.freeserve.co.uk Fri Jul 25 10:31:49 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 25 Jul 2008 11:31:49 +0100
Subject: [Biopython-dev] Updating the installation instructions
Message-ID: <320fb6e00807250331k47ec64dcoe246933f0d02682b@mail.gmail.com>
As Nick Matzke has pointed out,
http://biopython.org/DIST/docs/install/Installation.html and
http://biopython.org/DIST/docs/install/Installation.pdf are somewhat
out of date.
I've updated the source LaTeX file in CVS to say that Python 2.5 is the
latest stable Python and that mxTextTools is now optional (but 2.0 is
preferred over 3.0), and I've removed the bits about the "Classic" Mac (pre
OS X).
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/install/Installation.tex?cvsroot=biopython
The reportlab instructions probably need updating too - although we
should double check if everything is happy with ReportLab 2 as part of
this.
If anyone wants to skim over the revised version and look for anything
I've missed or other improvements, that would be great.
Peter
From biopython at maubp.freeserve.co.uk Fri Jul 25 11:21:31 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 25 Jul 2008 12:21:31 +0100
Subject: [Biopython-dev] Removing the deprecated Bio.WWW modules
In-Reply-To: <502434.4415.qm@web62406.mail.re1.yahoo.com>
References: <320fb6e00807240441g5b21993dl7c84aebac0e2a988@mail.gmail.com>
<502434.4415.qm@web62406.mail.re1.yahoo.com>
Message-ID: <320fb6e00807250421w15b1d8a9qe9d5d178c233ec7b@mail.gmail.com>
On Fri, Jul 25, 2008 at 1:19 AM, Michiel de Hoon wrote:
> Note that Bio.WWW.__init__.py contains some code that is used in other modules.
> Most (but not all) of these modules are deprecated themselves. For the
> non-deprecated modules, it's probably easiest to just copy the code from
> Bio.WWW.__init__.py over to avoid having to import Bio.WWW.
Good catch - I didn't do my recursive grep correctly. The file
Bio/WWW/__init__.py just contains a RequestLimiter class, and this is
currently used in:
Bio/Blast/NCBIWWW.py (used in qblast, simple to recode as in Bio.Entrez)
Bio/config/_support.py (completely deprecated)
Bio/Prosite/__init__.py (in the deprecated ExPASyDictionary class)
Bio/SwissProt/SProt.py (in the deprecated ExPASyDictionary class)
Note I have just updated Bio.Prosite and Bio.SwissProt to use
Bio.ExPASy rather than Bio.WWW.ExPASy which means we can delete the
deprecated Bio/WWW/ExPASy.py, InterPro.py, NCBI.py and SCOP.py now.
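For context, a request limiter of this sort just enforces a minimum delay between successive calls to a remote service. The sketch below shows the general idea only; the actual RequestLimiter interface is not quoted in this thread, so the constructor arguments and the wait() method here are assumptions (the injectable clock and sleep arguments are there purely to make the class easy to test).

```python
import time


class RequestLimiter:
    """Enforce a minimum delay (in seconds) between requests (sketch)."""

    def __init__(self, delay, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least self.delay seconds since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Copying a small class like this into each module that needs it would avoid importing Bio.WWW at all.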
Peter
From bugzilla-daemon at portal.open-bio.org Sat Jul 26 22:05:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 26 Jul 2008 18:05:24 -0400
Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and
ExtendedIUPACProtein for U and O
In-Reply-To:
Message-ID: <200807262205.m6QM5Ow9021435@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2548
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-26 18:05 EST -------
Checking in Bio/Alphabet/IUPAC.py;
/home/repository/biopython/biopython/Bio/Alphabet/IUPAC.py,v <-- IUPAC.py
new revision: 1.3; previous revision: 1.2
done
Checking in Bio/Data/IUPACData.py;
/home/repository/biopython/biopython/Bio/Data/IUPACData.py,v <-- IUPACData.py
new revision: 1.5; previous revision: 1.4
done
Checking in Tests/test_seq.py;
/home/repository/biopython/biopython/Tests/test_seq.py,v <-- test_seq.py
new revision: 1.15; previous revision: 1.14
done
Marking as fixed, although ideally the molecular weights (MW) still need to be filled in for U and O.
From bugzilla-daemon at portal.open-bio.org Sun Jul 27 15:30:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 27 Jul 2008 11:30:37 -0400
Subject: [Biopython-dev] [Bug 2550] New: Alphabet problems when adding
sequences
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2550
Summary: Alphabet problems when adding sequences
Product: Biopython
Version: 1.47
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
#Create three sequences as Seq objects,
>>> from Bio import Alphabet
>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import Seq
>>> a = Seq("ACTG", Alphabet.generic_dna)
>>> b = Seq("AC-TG", Alphabet.Gapped(Alphabet.generic_dna, "-"))
>>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-"))
>>> a
Seq('ACTG', DNAAlphabet())
>>> b
Seq('AC-TG', Gapped(DNAAlphabet(), '-'))
>>> c
Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-'))
#Now try adding them together...
>>> b+c
Seq('AC-TGAC-TG', Gapped(DNAAlphabet(), '-'))
>>> a+b
Traceback (most recent call last):
File "", line 1, in ?
File
"/home/maubp/repository/biopython/build/lib.linux-x86_64-2.4/Bio/Seq.py", line
77, in __add__
elif other.alphabet.contains(self.alphabet):
File "/home/maubp/lib/python2.4/site-packages/Bio/Alphabet/__init__.py", line
95, in contains
return other.gap_char == self.gap_char and \
AttributeError: DNAAlphabet instance has no attribute 'gap_char'
I would expect to get:
Seq('ACTGAC-TG', Gapped(DNAAlphabet(), '-'))
Similar example, but using proteins
>>> p = Seq("ACDEFG", Alphabet.generic_protein)
>>> q = Seq("ACDEFG", IUPAC.protein)
>>> r = Seq("ACDEFG*", Alphabet.HasStopCodon(IUPAC.protein, "*"))
>>> p
Seq('ACDEFG', ProteinAlphabet())
>>> q
Seq('ACDEFG', IUPACProtein())
>>> r
Seq('ACDEFG*', HasStopCodon(IUPACProtein(), '*'))
#Now try adding these together...
>>> p+q
Seq('ACDEFGACDEFG', ProteinAlphabet())
>>> p+r
Traceback (most recent call last):
File "", line 1, in ?
File
"/home/maubp/repository/biopython/build/lib.linux-x86_64-2.4/Bio/Seq.py", line
77, in __add__
elif other.alphabet.contains(self.alphabet):
File "/home/maubp/lib/python2.4/site-packages/Bio/Alphabet/__init__.py", line
110, in contains
return other.stop_symbol == self.stop_symbol and \
AttributeError: ProteinAlphabet instance has no attribute 'stop_symbol'
Here is an example of a more reasonable failure,
>>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-"))
>>> d = Seq('AC.TG', Alphabet.Gapped(IUPAC.unambiguous_dna, '.'))
>>> c
Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-'))
>>> d
Seq('AC.TG', Gapped(IUPACUnambiguousDNA(), '.'))
>>> c+d
Traceback (most recent call last):
File "", line 1, in ?
File
"/home/maubp/repository/biopython/build/lib.linux-x86_64-2.4/Bio/Seq.py", line
80, in __add__
raise TypeError, ("incompatable alphabets", str(self.alphabet),
TypeError: ('incompatable alphabets', "Gapped(IUPACUnambiguousDNA(), '-')",
"Gapped(IUPACUnambiguousDNA(), '.')")
I am OK with this failing with a TypeError. However, one might argue that
reverting to a generic DNA alphabet with no declared gap character would be
desirable:
Seq("AC-TGAC.TG", DNAAlphabet())
From bugzilla-daemon at portal.open-bio.org Sun Jul 27 15:59:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 27 Jul 2008 11:59:50 -0400
Subject: [Biopython-dev] [Bug 2550] Alphabet problems when adding sequences
In-Reply-To:
Message-ID: <200807271559.m6RFxoej018165@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2550
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-27 11:59 EST -------
Trying to fix this by changing only the contains method of the Alphabet and
AlphabetEncoder classes is nasty, and wouldn't cover situations like this:
p = Seq("PKL-PAK", Gapped(generic_protein,"-"))
q = Seq("ADKS*", HasStopCodon(generic_protein,"*"))
where you might expect something like:
p+q == Seq("PKL-PAKADKS*", HasStopCodon(Gapped(generic_protein,"-"),"*"))
Taken literally, neither of these two alphabets contains the other - but there
is a fairly obvious consensus alphabet! I think the best solution would
require changes to the Seq object's add method to pick a consensus alphabet in
the non-simple cases where neither alphabet is clearly a sub-set of the other.
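The idea of a consensus alphabet can be sketched with a toy model. This is not Biopython code: ToyAlphabet and consensus_alphabet are made-up names, and an alphabet is reduced to a molecule type plus an optional gap character and stop symbol. The rule shown is an assumption about what a sensible consensus would do: merge the gap and stop declarations, and refuse when two different characters are declared for the same role.

```python
from collections import namedtuple

# Toy stand-in for an alphabet: molecule type, gap character, stop symbol.
ToyAlphabet = namedtuple("ToyAlphabet", ["kind", "gap", "stop"])


def consensus_alphabet(alphabets):
    """Return a toy alphabet covering all the given ones, or raise."""
    kinds = set(a.kind for a in alphabets)
    if len(kinds) != 1:
        raise ValueError("Incompatible alphabet types")
    gaps = set(a.gap for a in alphabets if a.gap is not None)
    if len(gaps) > 1:
        raise ValueError("More than one gap character present")
    stops = set(a.stop for a in alphabets if a.stop is not None)
    if len(stops) > 1:
        raise ValueError("More than one stop symbol present")
    return ToyAlphabet(kinds.pop(),
                       gaps.pop() if gaps else None,
                       stops.pop() if stops else None)


if __name__ == "__main__":
    gapped = ToyAlphabet("protein", "-", None)
    with_stop = ToyAlphabet("protein", None, "*")
    print(consensus_alphabet([gapped, with_stop]))
```

A real implementation would also have to decide what to do when one sequence uses a specific IUPAC alphabet and the other a generic one; the toy model ignores that.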
From bugzilla-daemon at portal.open-bio.org Sun Jul 27 18:54:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 27 Jul 2008 14:54:01 -0400
Subject: [Biopython-dev] [Bug 2550] Alphabet problems when adding sequences
In-Reply-To:
Message-ID: <200807271854.m6RIs1wZ025718@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2550
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-27 14:54 EST -------
Created an attachment (id=977)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=977&action=view)
Patch to Bio/Seq.py and Bio/Alphabet/__init__.py
This uses some (private) alphabet functions in Bio/Alphabet/__init__.py (where
I have already put a few bits extracted from or used by Bio.Align and
Bio.AlignIO), and makes the old Alphabet .contains method effectively obsolete.
Test case update to follow.
From bugzilla-daemon at portal.open-bio.org Sun Jul 27 18:56:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 27 Jul 2008 14:56:47 -0400
Subject: [Biopython-dev] [Bug 2550] Alphabet problems when adding sequences
In-Reply-To:
Message-ID: <200807271856.m6RIulpl025828@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2550
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-27 14:56 EST -------
Created an attachment (id=978)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=978&action=view)
Patches for test_seq.py and test_GACrossover.py
Adds a new block of tests to test_seq.py to explicitly check a number of
different alphabet combinations.
Also tweaks test_GACrossover.py to define its test alphabet as a subclass of a
suitable generic class in Bio.Alphabet, as otherwise it is not recognised as a
valid alphabet.
From bugzilla-daemon at portal.open-bio.org Sun Jul 27 19:06:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 27 Jul 2008 15:06:22 -0400
Subject: [Biopython-dev] [Bug 2550] Alphabet problems when adding sequences
In-Reply-To:
Message-ID: <200807271906.m6RJ6MBk026364@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2550
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-27 15:06 EST -------
With the patch, repeating the example in my comment 0,
>>> from Bio import Alphabet
>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import Seq
>>> a = Seq("ACTG", Alphabet.generic_dna)
>>> b = Seq("AC-TG", Alphabet.Gapped(Alphabet.generic_dna, "-"))
>>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-"))
>>> a
Seq('ACTG', DNAAlphabet())
>>> b
Seq('AC-TG', Gapped(DNAAlphabet(), '-'))
>>> c
Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-'))
>>> b+c
Seq('AC-TGAC-TG', Gapped(DNAAlphabet(), '-'))
>>> a+b
Seq('ACTGAC-TG', Gapped(DNAAlphabet(), '-'))
>>> a+c
Seq('ACTGAC-TG', Gapped(DNAAlphabet(), '-'))
i.e. All the above additions work now.
>>> p = Seq("ACDEFG", Alphabet.generic_protein)
>>> q = Seq("ACDEFG", IUPAC.protein)
>>> r = Seq("ACDEFG*", Alphabet.HasStopCodon(IUPAC.protein, "*"))
>>> p
Seq('ACDEFG', ProteinAlphabet())
>>> q
Seq('ACDEFG', IUPACProtein())
>>> r
Seq('ACDEFG*', HasStopCodon(IUPACProtein(), '*'))
>>> p+q
Seq('ACDEFGACDEFG', ProteinAlphabet())
>>> p+r
Seq('ACDEFGACDEFG*', HasStopCodon(ProteinAlphabet(), '*'))
These work too.
>>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-"))
>>> d = Seq('AC.TG', Alphabet.Gapped(IUPAC.unambiguous_dna, '.'))
>>> c
Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-'))
>>> d
Seq('AC.TG', Gapped(IUPACUnambiguousDNA(), '.'))
>>> c+d
Traceback (most recent call last):
File "", line 1, in ?
File "Bio/Seq.py", line 78, in __add__
a = Alphabet._consensus_alphabet([self.alphabet, other.alphabet])
File "/home/maubp/repository/biopython/Bio/Alphabet/__init__.py", line 199,
in _consensus_alphabet
raise ValueError("More than one gap character present")
ValueError: More than one gap character present
The error message has changed (and is more explicit), but I think this is a
real failure case.
Then based on the example in my comment 1,
>>> p = Seq("PKL-PAK", Alphabet.Gapped(Alphabet.generic_protein,"-"))
>>> q = Seq("ADKS*", Alphabet.HasStopCodon(Alphabet.generic_protein,"*"))
>>> p+q
Seq('PKL-PAKADKS*', HasStopCodon(Gapped(ProteinAlphabet(), '-'), '*'))
This works now too.
One final example of a valid failure:
>>> q = Seq("ADKS*", Alphabet.HasStopCodon(Alphabet.generic_protein,"*"))
>>> r = Seq("SRFG@", Alphabet.HasStopCodon(Alphabet.generic_protein,"@"))
>>> q+r
Traceback (most recent call last):
File "", line 1, in ?
File "Bio/Seq.py", line 78, in __add__
a = Alphabet._consensus_alphabet([self.alphabet, other.alphabet])
File "/home/maubp/repository/biopython/Bio/Alphabet/__init__.py", line 208,
in _consensus_alphabet
raise ValueError("More than one stop symbol present")
ValueError: More than one stop symbol present
I'd be grateful if anyone could test this, or comment on the code. While
adding private functions to Bio.Alphabet is a reasonable short term solution
(and means we can change arguments and names without breaking people's
scripts!), some of this functionality might be best exposed publicly.
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 09:26:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 05:26:03 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To:
Message-ID: <200807280926.m6S9Q3Cn032456@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=1944
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #943 is|0 |1
obsolete| |
------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-28 05:26 EST -------
(From update of attachment 943)
Checked in as part of
Bio/Align/Generic.py revision 1.10
Adding __len__ would also be sensible, and perhaps __nonzero__ (which could
check the number of rows AND columns).
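A rough sketch of what these two methods might do, using a toy class that stores rows as plain strings rather than the real Bio.Align.Generic.Alignment. That __len__ should count rows is an assumption; the truth test follows the suggestion above of checking both rows and columns. The hook is called __bool__ in Python 3 and __nonzero__ in Python 2.

```python
class ToyAlignment:
    """Toy alignment holding rows as equal-length strings (illustration)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __len__(self):
        # Assumed meaning: the number of rows (sequences).
        return len(self.rows)

    def __bool__(self):
        # True only if there is at least one row AND at least one column.
        return bool(self.rows) and len(self.rows[0]) > 0

    __nonzero__ = __bool__  # Python 2 spelling of the same hook
```

Without __bool__, Python falls back to __len__, so an alignment with rows but zero columns would count as true.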
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 09:37:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 05:37:27 -0400
Subject: [Biopython-dev] [Bug 2551] New: Adding advanced __getitem__ to
generic alignment, e.g. align[1:2, 5:-5]
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2551
Summary: Adding advanced __getitem__ to generic alignment, e.g.
align[1:2,5:-5]
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
BugsThisDependsOn: 2507
I'm filing this as a separate sub-issue from Bug 1944. The idea is to enhance
the minimal __getitem__ method now in CVS to allow accessing of rows
(sequences), columns, or sub-alignments.
A possible __getitem__ doc string:
Depending on the indices, you can get a SeqRecord object
(representing a single row), a Seq object (for a single column),
a string (for a single character) or another alignment
(representing some part or all of the alignment).
align[r,c] gives a single character as a string
align[r] gives a row as a SeqRecord
align[r,:] gives a row as a SeqRecord
align[:,c] gives a column as a Seq (using the alignment's alphabet)
align[:] and align[:,:] give a copy of the alignment
Anything else gives a sub alignment, e.g.
align[0:2] or align[0:2,:] uses only row 0 and 1
align[:,1:3] uses only columns 1 and 2
align[0:2,1:3] uses only rows 0 & 1 and only cols 1 & 2
Doing this nicely will build on adding annotation aware slicing support to the
SeqRecord, which is Bug 2507.
There is some __getitem__ code on Bug 1944 Attachment 732 and Bug 1944
Attachment 770.
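The indexing rules in the proposed doc string can be tried out on a toy class. This is only a sketch under simplifying assumptions: rows are plain strings instead of SeqRecord objects, a column is returned as a string instead of a Seq, and a single row combined with a column slice just returns the sliced string. The class name ToyAlignment is made up.

```python
class ToyAlignment:
    """Toy alignment holding rows as equal-length strings (illustration)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __getitem__(self, index):
        if isinstance(index, tuple):
            if len(index) != 2:
                raise TypeError("Need one or two indices")
            row_index, col_index = index
        else:
            # align[r] and align[a:b] mean "all columns"
            row_index, col_index = index, slice(None)
        if isinstance(row_index, int):
            # Single row: one character, or the (possibly sliced) row
            return self.rows[row_index][col_index]
        rows = self.rows[row_index]
        if isinstance(col_index, int):
            # Single column across the selected rows
            return "".join(row[col_index] for row in rows)
        # Anything else gives a sub-alignment (a copy for align[:, :])
        return ToyAlignment(row[col_index] for row in rows)


if __name__ == "__main__":
    align = ToyAlignment(["ACGATCAGCTAGCT",
                          "CCGATCAGCTAGCT",
                          "ACGATGAGCTAGCT"])
    print(align[0, 0], align[:, 5], align[0:2, 1:3].rows)
```

Python passes align[1:2, 5:-5] to __getitem__ as a tuple of two slice objects, which is why the method only has to distinguish tuples, integers and slices.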
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 09:37:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 05:37:29 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
element access and slicing
In-Reply-To:
Message-ID: <200807280937.m6S9bTY8000615@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2507
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
OtherBugsDependingO| |2551
nThis| |
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 09:48:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 05:48:56 -0400
Subject: [Biopython-dev] [Bug 2552] New: Adding alignments
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2552
Summary: Adding alignments
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
This is related to the very broad alignment bug 1944.
Given two alignments, it can make sense to talk about adding them together.
However we can either add by row, or by column.
e.g. Consider this alignment, a
DNAAlphabet() alignment with 3 rows and 14 columns
ACGATCAGCTAGCT Alpha
CCGATCAGCTAGCT Beta
ACGATGAGCTAGCT Gamma
Doing a+a by column would give:
DNAAlphabet() alignment with 3 rows and 28 columns
ACGATCAGCTAGCTACGATCAGCTAGCT Alpha
CCGATCAGCTAGCTCCGATCAGCTAGCT Beta
ACGATGAGCTAGCTACGATGAGCTAGCT Gamma
This sort of operation is often done to combine alignments from multiple genes
(after first sorting the rows to ensure the species names are in the same
order). To implement this would ideally require the ability to add SeqRecord
objects together, doing something sensible with the annotation and in
particular the identifiers.
Doing a+a by row would give:
DNAAlphabet() alignment with 6 rows and 14 columns
ACGATCAGCTAGCT Alpha
CCGATCAGCTAGCT Beta
ACGATGAGCTAGCT Gamma
ACGATCAGCTAGCT Alpha
CCGATCAGCTAGCT Beta
ACGATGAGCTAGCT Gamma
This particular example, a+a, is perhaps unrealistic due to the repeated
identifiers, but I imagine there are some real use cases for this operation.
More generally, suppose we have two alignments a and b. Treating each
alignment as a list of SeqRecord objects, you might expect:
a.extend(b) -> addition by row
a+b -> addition by row
However, I would suggest for alignment objects:
a.extend(b) -> addition by row, requires all sequences to be the same length
(same number of columns)
a+b -> addition by column, requires same number of sequences (rows)
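The two suggested behaviours can be sketched with a toy class. This is not Bio.Align.Generic code: rows are plain (name, sequence) tuples, the class name is made up, and keeping the left-hand alignment's names when adding by column is an assumption (the open question of how to merge SeqRecord annotation is sidestepped).

```python
class ToyAlignment:
    """Toy alignment holding (name, sequence) tuples (illustration)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def extend(self, other):
        """Addition by row: needs the same number of columns."""
        widths = set(len(seq) for name, seq in self.rows + other.rows)
        if len(widths) > 1:
            raise ValueError("Sequences must all be the same length")
        self.rows.extend(other.rows)

    def __add__(self, other):
        """Addition by column: needs the same number of rows."""
        if len(self.rows) != len(other.rows):
            raise ValueError("Alignments must have the same number of rows")
        # Assumption: the left-hand names win; rows are paired by position.
        return ToyAlignment((name, seq + other_seq)
                            for (name, seq), (other_name, other_seq)
                            in zip(self.rows, other.rows))


if __name__ == "__main__":
    a = ToyAlignment([("Alpha", "ACGATCAGCTAGCT"),
                      ("Beta", "CCGATCAGCTAGCT"),
                      ("Gamma", "ACGATGAGCTAGCT")])
    for name, seq in (a + a).rows:
        print(seq, name)
```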
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 09:53:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 05:53:34 -0400
Subject: [Biopython-dev] [Bug 2553] New: Adding SeqRecord objects to an
alignment (append or extend)
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2553
Summary: Adding SeqRecord objects to an alignment (append or
extend)
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
Currently a Bio.Align.Generic.Alignment object stores the rows as SeqRecord
objects, but only exposes a public API for adding row sequences as strings.
As suggested on Bug 1944, it would make sense to treat the Alignment as a list
of SeqRecord objects and therefore support the list methods .append() and
.extend() for the addition of more rows as SeqRecord objects.
I would make the .append() method enforce the expectation that all rows are the
same length, and that the new sequence's alphabet is compatible with the
declared alignment alphabet.
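A toy version of such an .append() method is sketched below. It is not the Bio.Align.Generic API: records are passed as plain name, sequence and molecule-type arguments rather than SeqRecord objects, and "compatible alphabet" is reduced to an equality check on the molecule type, which is a deliberate simplification.

```python
class ToyAlignment:
    """Toy alignment that checks each new row on entry (illustration)."""

    def __init__(self, kind):
        self.kind = kind  # declared molecule type, e.g. "dna"
        self.rows = []    # list of (name, sequence) tuples

    def append(self, name, seq, kind):
        """Add one row, enforcing the alignment's invariants."""
        if kind != self.kind:
            raise ValueError("Alphabet %r is not compatible with %r"
                             % (kind, self.kind))
        if self.rows and len(seq) != len(self.rows[0][1]):
            raise ValueError("Sequences must all be the same length")
        self.rows.append((name, seq))

    def extend(self, records):
        """Add several rows; each one goes through the same checks."""
        for name, seq, kind in records:
            self.append(name, seq, kind)
```

Routing .extend() through .append() keeps the checks in one place.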
See also Bug 2552 - Adding alignments
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 09:57:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 05:57:52 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To:
Message-ID: <200807280957.m6S9vqJd001617@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=1944
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
BugsThisDependsOn|2507 |
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #17 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-28 05:57 EST -------
I've filed bugs on what I think are the remaining issues raised here (Bug
1944), and am now closing this issue (as it's getting very long and hard to
follow):
Bug 2551 - The __getitem__ method (accessing part of the alignment as a
character string, row, column or sub-alignment).
Bug 2552 - Adding alignments
Bug 2553 - Adding SeqRecord objects to an alignment (append or extend)
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 09:57:54 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 05:57:54 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
element access and slicing
In-Reply-To:
Message-ID: <200807280957.m6S9vspm001632@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2507
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
OtherBugsDependingO|1944 |
nThis| |
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 10:13:38 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 06:13:38 -0400
Subject: [Biopython-dev] [Bug 2554] New: Creating an Alignment from a list
of SeqRecord objects
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2554
Summary: Creating an Alignment from a list of SeqRecord objects
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
BugsThisDependsOn: 2553
It would be nice to be able to supply a list (or iterator) of SeqRecord objects
when creating an alignment object. This would also make the
Bio.SeqIO.to_alignment() function obsolete.
Currently, the __init__ method takes just an alphabet:
def __init__(self, alphabet):
    """Initialize a new Alignment object.
    Arguments:
    o alphabet - The alphabet to use for the sequence objects that are
                 created. This alphabet must be a gapped type.
    """
    #...
My plan is to accept a list of SeqRecord objects (possibly empty) and an
optional alphabet. If the alphabet is omitted, a consensus can be determined
from the SeqRecord alphabets.
This can be made backwards compatible:
def __init__(self, records, alphabet=None):
    """Initialize a new Alignment object.
    Arguments:
    records - A list (or iterator) of SeqRecord objects, whose sequences
              are all the same length. This can be an empty list.
    alphabet - The alphabet for the whole alignment, typically a gapped
               alphabet, which should be a superset of the individual
               record alphabets. If omitted, a consensus alphabet is used.
    """
    if not (isinstance(records, Alphabet.Alphabet) \
            or isinstance(records, Alphabet.AlphabetEncoder)):
        if alphabet is None :
            #Backwards compatible mode!
            alphabet = records
            records = []
        else :
            raise ValueError("Invalid records argument")
    #...
I would expect the implementation to depend on Bug 2553 - Adding SeqRecord
objects to an alignment (append or extend).
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 10:13:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 06:13:41 -0400
Subject: [Biopython-dev] [Bug 2553] Adding SeqRecord objects to an alignment
(append or extend)
In-Reply-To:
Message-ID: <200807281013.m6SADf6o002429@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2553
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
OtherBugsDependingOnThis| |2554
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 10:49:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 06:49:45 -0400
Subject: [Biopython-dev] [Bug 2554] Creating an Alignment from a list of
SeqRecord objects
In-Reply-To:
Message-ID: <200807281049.m6SAnjbE003984@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2554
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-28 06:49 EST -------
There is an unwanted "not" in the code snippet in comment 0.
Here is a preliminary implementation of the revised __init__ method plus append
and extend (Bug 2553):
def __init__(self, records, alphabet=None):
    """Initialize a new Alignment object.
    Arguments:
    records - A list (or iterator) of SeqRecord objects, whose sequences
              are all the same length. This can be an empty list.
    alphabet - The alphabet for the whole alignment, typically a gapped
               alphabet, which should be a super-set of the individual
               record alphabets. If omitted, a consensus alphabet is used.
    NOTE - Earlier versions of Biopython only accepted a single argument,
    an alphabet. This is still supported via a backwards compatible
    "hack" so as not to disrupt existing scripts and users.
    """
    if isinstance(records, Alphabet.Alphabet) \
    or isinstance(records, Alphabet.AlphabetEncoder):
        if alphabet is None :
            #Backwards compatible mode!
            alphabet = records
            records = []
        else :
            raise ValueError("Invalid records argument")
    if alphabet is not None :
        if not (isinstance(alphabet, Alphabet.Alphabet) \
                or isinstance(alphabet, Alphabet.AlphabetEncoder)):
            raise ValueError("Invalid alphabet argument")
        self._alphabet = alphabet
    else :
        #Default while we add sequences, will take a consensus later
        self._alphabet = Alphabet.single_letter_alphabet
    self._records = []
    self.extend(records)
    if alphabet is None :
        #No alphabet was given, take a consensus alphabet
        #TODO - Use a generator expression once we drop python 2.3:
        self._alphabet = Alphabet._consensus_alphabet(
            [rec.seq.alphabet for rec in self._records])
def extend(self, records) :
    """Add more SeqRecord objects to the alignment as rows.
    They must all have the same length as the original alignment, and have
    alphabets compatible with the alignment's alphabet."""
    for rec in records :
        self.append(rec)
def append(self, record) :
    """Add one more SeqRecord object to the alignment as a new row.
    This must have the same length as the original alignment (unless this is
    the first record), and have an alphabet compatible with the alignment's
    alphabet."""
    if not isinstance(record, SeqRecord) :
        raise TypeError("New sequence is not a SeqRecord object")
    if self._records and len(record) != self.get_alignment_length() :
        raise ValueError("New sequence is not of length %i" \
                         % self.get_alignment_length())
    #Using not self._alphabet.contains(record.seq.alphabet) needs fixing
    #for AlphabetEncoders (e.g. gapped versus ungapped).
    if not Alphabet._are_alphabets_compatible(self._alphabet, \
                                              record.seq.alphabet) :
        raise ValueError("New sequence's alphabet is incompatible")
    self._records.append(record)
The unit tests look fine with this addition. Of course, new tests to verify
this functionality explicitly should then be added (and I could take advantage
of this in Bio.AlignIO too).
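To illustrate the calling convention discussed above, here is a minimal
self-contained sketch. It does not use Biopython: the Alphabet, Record and
Alignment classes are hypothetical stand-ins for Bio.Alphabet, SeqRecord and
the real Alignment class, and the alphabet compatibility and consensus steps
are left out. It only shows the backwards compatible argument handling and the
length check in append.

```python
class Alphabet:
    """Hypothetical stand-in for an alphabet object (not Bio.Alphabet)."""
    def __init__(self, letters):
        self.letters = letters


class Record:
    """Hypothetical stand-in for a SeqRecord: an id plus a sequence string."""
    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def __len__(self):
        return len(self.seq)


class Alignment:
    def __init__(self, records, alphabet=None):
        if isinstance(records, Alphabet):
            if alphabet is not None:
                raise ValueError("Invalid records argument")
            # Legacy call style Alignment(alphabet): shift the arguments.
            alphabet, records = records, []
        self._alphabet = alphabet
        self._records = []
        self.extend(records)

    def get_alignment_length(self):
        # Length of the first row, or 0 for an empty alignment.
        return len(self._records[0]) if self._records else 0

    def append(self, record):
        if not isinstance(record, Record):
            raise TypeError("New sequence is not a Record object")
        # The first record sets the length; later ones must match it.
        if self._records and len(record) != self.get_alignment_length():
            raise ValueError("New sequence is not of length %i"
                             % self.get_alignment_length())
        self._records.append(record)

    def extend(self, records):
        for rec in records:
            self.append(rec)

    def __len__(self):
        return len(self._records)


dna = Alphabet("ACGT-")
old_style = Alignment(dna)  # single alphabet argument, as in existing scripts
new_style = Alignment([Record("a", "ACG-"), Record("b", "AC-T")], dna)
```

Both call styles give a usable object; appending a record of a different
length to new_style raises ValueError.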
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 10:54:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 06:54:12 -0400
Subject: [Biopython-dev] [Bug 2554] Creating an Alignment from a list of
SeqRecord objects
In-Reply-To:
Message-ID: <200807281054.m6SAsClZ004173@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2554
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-28 06:54 EST -------
Regarding the code in comment 1, the private function
_are_alphabets_compatible() isn't in CVS, it's something I was playing with on
Bug 2550 - Alphabet problems when adding sequences.
However, I hope that this conveys my overall intention for the Alignment
object.
From bugzilla-daemon at portal.open-bio.org Wed Jul 30 02:22:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 29 Jul 2008 22:22:59 -0400
Subject: [Biopython-dev] [Bug 2557] New: AlignIO::write fails when
delegating to SeqIO::write
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2557
Summary: AlignIO::write fails when delegating to SeqIO::write
Product: Biopython
Version: 1.47
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: rsuri at cs.utexas.edu
In line 185 of "biopython/Bio/AlignIO/__init__.py" in the current CVS version,
there's a call to SeqIO::write with only 2 arguments instead of the required 3.
The call "SeqIO.write(alignment.get_all_seqs(), format)" should be
"SeqIO.write(alignment.get_all_seqs(), handle, format)" (i.e. pass the handle
object).
I know this happens when trying to output to FASTA format, and it appears to do
so more generally whenever the SeqIO module can be used for output.
From bugzilla-daemon at portal.open-bio.org Wed Jul 30 02:36:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 29 Jul 2008 22:36:07 -0400
Subject: [Biopython-dev] [Bug 2558] New: AlignIO nexus parsing chokes on
superfluous comma
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2558
Summary: AlignIO nexus parsing chokes on superfluous comma
Product: Biopython
Version: 1.47
Platform: All
URL: http://www.cs.utexas.edu/~rsuri/M3579.NX
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: rsuri at cs.utexas.edu
The URL above points to a nexus file (also available from TreeBase with Matrix
accession #M3579) that causes BioPython to raise an error when reading it with
the AlignIO module. In the "Trees" section of the input file, the final taxon
("Lecanorales") has a trailing comma that causes BioPython to fail (search for
the line beginning with "59"). I've verified that manually deleting the
offending comma is a valid workaround.
I don't know what the nexus format specification says, but this is poor form
for BioPython, in my opinion. It seems reasonable enough to allow for some
slack like this when reading formatted files.
From bugzilla-daemon at portal.open-bio.org Wed Jul 30 08:55:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 30 Jul 2008 04:55:42 -0400
Subject: [Biopython-dev] [Bug 2557] AlignIO::write fails when delegating to
SeqIO::write
In-Reply-To:
Message-ID: <200807300855.m6U8tgLU019854@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2557
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-30 04:55 EST -------
That's embarrassing for me! Bug confirmed and fixed in CVS.
I've used a very slightly simpler fix, taking advantage of the fact that you
can iterate over the SeqRecords within an alignment:
    SeqIO.write(alignment, handle, format)
I've also updated the Bio.AlignIO unit test to cover writing a couple of the
formats supported via Bio.SeqIO ("fasta" and "tab"), although it might make
sense to try all of them...
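As a rough illustration of why the simpler call works, here is a
self-contained sketch that does not use Biopython. Record, Alignment and
write_fasta are hypothetical stand-ins: a writer that only iterates over its
first argument gives the same output whether it is handed the list of records
or the iterable alignment itself.

```python
import io


class Record:
    """Hypothetical stand-in for a SeqRecord."""
    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


class Alignment:
    """Hypothetical stand-in for an alignment holding a list of records."""
    def __init__(self, records):
        self._records = list(records)

    def get_all_seqs(self):
        return self._records

    def __iter__(self):
        # Iterating over the alignment yields its records in order.
        return iter(self._records)


def write_fasta(records, handle):
    """Write records to handle in FASTA style and return how many were written."""
    count = 0
    for rec in records:
        handle.write(">%s\n%s\n" % (rec.id, rec.seq))
        count += 1
    return count


alignment = Alignment([Record("a", "ACG-"), Record("b", "AC-T")])
via_list = io.StringIO()
via_iter = io.StringIO()
write_fasta(alignment.get_all_seqs(), via_list)  # explicit list of records
write_fasta(alignment, via_iter)                 # the alignment itself
```

In both cases the handle is passed explicitly, which is the argument that was
missing in the reported call.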
Checking in Bio/AlignIO/__init__.py;
/home/repository/biopython/biopython/Bio/AlignIO/__init__.py,v <-- __init__.py
new revision: 1.10; previous revision: 1.9
done
Checking in Tests/test_AlignIO.py;
/home/repository/biopython/biopython/Tests/test_AlignIO.py,v <-- test_AlignIO.py
new revision: 1.12; previous revision: 1.11
done
Checking in Tests/output/test_AlignIO;
/home/repository/biopython/biopython/Tests/output/test_AlignIO,v <-- test_AlignIO
new revision: 1.10; previous revision: 1.9
done
Thank you for reporting this oversight,
Peter
From bugzilla-daemon at portal.open-bio.org Wed Jul 30 09:23:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 30 Jul 2008 05:23:59 -0400
Subject: [Biopython-dev] [Bug 2558] Bio.Nexus chokes on TRANSLATE block with
superfluous comma
In-Reply-To:
Message-ID: <200807300923.m6U9Nx8l021492@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2558
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|AlignIO nexus parsing chokes on superfluous comma|Bio.Nexus chokes on TRANSLATE block with superfluous comma
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-30 05:23 EST -------
This is an issue in the Bio.Nexus module, so it's a job for Frank.
Do you know if this affects all the NEXUS files from www.treebase.org? I've
tried downloading several trees, but their FTP site is just timing out for me.
According to http://www.treebase.org/treebase/submit.html they request trees be
uploaded in the NEXUS file format so it's possible that just a minority of their
trees have this trailing comma.
Note that this may be an invalid file (a TRANSLATE block with trailing comma),
but as you say it looks relatively straightforward to cope with. However, I
have had a quick look at the Bio.Nexus code, and I don't entirely understand
what Frank's parser is doing here - so it's not going to be a quick fix from me.
Quick bit of python to show the stack trace:
>>> from Bio.Nexus import Nexus
>>> n = Nexus(open("M3579.NX"))
Traceback (most recent call last):
  File "", line 1, in
TypeError: 'module' object is not callable
>>> n = Nexus.Nexus(open("M3579.NX"))
Traceback (most recent call last):
  File "", line 1, in
  File "/Users/XXX/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Nexus/Nexus.py", line 552, in __init__
    self.read(input)
  File "/Users/XXX/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Nexus/Nexus.py", line 614, in read
    self._parse_nexus_block(title, contents)
  File "/Users/XXX/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Nexus/Nexus.py", line 655, in _parse_nexus_block
    getattr(self,'_'+line.command)(line.options)
  File "/Users/XXX/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Nexus/Nexus.py", line 922, in _translate
    raise NexusError,'Format error in line %s.' % options
Bio.Nexus.Nexus.NexusError: Format error in line 1 'Rolfidium_coccocarpioides',
2 'Mycoblastus_sanguinarius', 3 'Protoblastenia_rupestris', 4
'Myxobilimbia_sabuletorum', 5 'Byssoloma_leucoblepharum', 6
'Stereocaulon_tomentosum', 7 'Scoliciosporum_umbrinum', 8
'Haematomma_ochroleucum', 9 'Glyphopeltis_ligustica', 10
'Catinaria_atropurpurea', 11 'Miriquidica_garovaglii', 12
'Sphaerophorus_globosus', 13 'Lecidea_atrosanguinea', 14
'Cladonia_peziziformis', 15 'Stereocaulon_pileatum', 16
'Frutidella_caesioatra', 17 'Fellhanera_bouteillei', 18 'Tonina_cinereovirens',
19 'Helocarpon_crassipes', 20 'Micarea_alabastrites', 21
'Squamarina_lentigera', 22 'Lecanora_intumescens', 23 'Bellemerea_diamarta', 24
'Lopadium_disciforme', 25 'Herteliana_taylorii', 26 'Lepraria_lobificans', 27
'Psilolechia_leprosa', 28 'Protomicarea_limosa', 29 'Calopadia_foliicola', 30
'Fellhanera_subtilis', 31 'Pyrrhospora_quernea', 32 'Lecidella_meiococca', 33
'Hypogymnia_physodes', 34 'Ramalina_fastigiata', 35 'Halecania_alpivaga', 36
'Platismatia_glauca', 37 'Lepraria_bergensis', 38 'Micarea_micrococca', 39
'Lecania_atrynoides', 40 'Crocynia_gossypina', 41 'Psilolechia_lucida', 42
'Lecanora_allophana', 43 'Cladonia_digitata', 44 'Schadonia_fecunda', 45
'Psorula_rufonigra', 46 'Adelolecia_pilati', 47 'Lecidea_turgidula', 48
'Micarea_sylvicola', 49 'Lecidea_fuscoatra', 50 'Psora_rubiformis', 51
'Micarea_erratica', 52 'Megalaria_grossa', 53 'Lecidea_silacea', 54
'Micarea_intrusa', 55 'Psora_decipiens', 56 'Tephromela_atra', 57
'Bacidia_rosella', 58 'Micarea_adnata', 59 'Lecanorales',.
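For illustration only, one way a parser could tolerate such a trailing comma is
sketched below. This is not the Bio.Nexus implementation; parse_translate is a
hypothetical helper, and it assumes the TRANSLATE options arrive as a single
string of comma separated "number 'name'" entries with no commas inside the
quoted names.

```python
def parse_translate(options):
    """Parse "1 'A', 2 'B'," into {"1": "A", "2": "B"} (hypothetical helper)."""
    table = {}
    for entry in options.split(","):
        entry = entry.strip()
        if not entry:
            # Empty piece, e.g. what is left after a trailing comma: skip it.
            continue
        parts = entry.split(None, 1)
        if len(parts) != 2:
            raise ValueError("Format error in translate entry %r" % entry)
        number, name = parts
        # Drop the single quotes around the taxon name.
        table[number] = name.strip("'")
    return table


sample = ("1 'Rolfidium_coccocarpioides', 2 'Mycoblastus_sanguinarius', "
          "59 'Lecanorales',")
translate = parse_translate(sample)
```

A malformed entry such as a number with no name still raises an error, so only
the empty trailing piece is forgiven.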
From bugzilla-daemon at portal.open-bio.org Wed Jul 30 12:57:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 30 Jul 2008 08:57:00 -0400
Subject: [Biopython-dev] [Bug 2558] Bio.Nexus chokes on TRANSLATE block with
superfluous comma
In-Reply-To:
Message-ID: <200807301257.m6UCv0co031445@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2558
fkauff at biologie.uni-kl.de changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
------- Comment #2 from fkauff at biologie.uni-kl.de 2008-07-30 08:57 EST -------
I'm all for a little bit of slack in parsers, but in my opinion this looks like
a straightforward syntax error in the nexus file. I work with nexus files
daily, and have never encountered such a trailing comma. What really confuses
me is that there are 58 taxa in the data set, and no. 59 Lecanorales is in
addition, with no data and no occurrence in the tree. I don't think this is
proper nexus format.
Frank
(In reply to comment #1)
From bugzilla-daemon at portal.open-bio.org Thu Jul 31 15:58:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 31 Jul 2008 11:58:08 -0400
Subject: [Biopython-dev] [Bug 2560] New: Adding BLAST support to Bio.AlignIO
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2560
Summary: Adding BLAST support to Bio.AlignIO
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
I think it can sometimes be useful to regard a BLAST output file as a series of
pairwise alignments - and therefore it makes sense to add it to Bio.AlignIO as
another input file format.
http://biopython.org/wiki/AlignIO
Note that the AlignIO API will not allow any "clumping" of the pairwise
alignments (or HSPs in Blast terminology) according to the query or the target
sequence - you just get them all one after the other.
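As a rough sketch of what "one after the other" means here, the following
self-contained example flattens a nested result structure into a stream of
two-row alignments. It does not use Bio.Blast or Bio.AlignIO: the nested
(query, hits, HSPs) tuples and the function iter_pairwise_alignments are
hypothetical, chosen only to show the flat iteration order.

```python
def iter_pairwise_alignments(results):
    """Yield one two-row alignment per HSP, with no grouping by query or target.

    results is a hypothetical list of (query_id, hits) pairs, where hits is a
    list of (target_id, hsps) pairs and each HSP is a pair of gapped strings
    (query_seq, target_seq).
    """
    for query_id, hits in results:
        for target_id, hsps in hits:
            for query_seq, target_seq in hsps:
                if len(query_seq) != len(target_seq):
                    raise ValueError("HSP rows differ in length")
                # Each alignment is just two (id, gapped sequence) rows.
                yield [(query_id, query_seq), (target_id, target_seq)]


results = [
    ("query1", [("hitA", [("ACG-T", "ACGAT"), ("GG-A", "GGTA")]),
                ("hitB", [("TTAC", "TT-C")])]),
    ("query2", [("hitA", [("CCGA", "CCGA")])]),
]
alignments = list(iter_pairwise_alignments(results))
```

The caller sees four separate pairwise alignments and has to look at the row
identifiers to tell which query or target each one belongs to.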
I will attach a rough Bio/AlignIO/BlastIO.py file which attempts to mimic the
naming conventions in the fasta-m10 parser. This currently uses Bio.Blast to
do the actual parsing, and then just uses the Blast results to build alignment
objects with two sequences each.
I suggest using the format names "blast" and "blastxml" for the plain text and
XML output formats following BioPerl (although I would prefer "blast-xml" to
"blastxml"), see http://www.bioperl.org/wiki/HOWTO:SearchIO#Design
From bugzilla-daemon at portal.open-bio.org Thu Jul 31 16:00:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 31 Jul 2008 12:00:23 -0400
Subject: [Biopython-dev] [Bug 2560] Adding BLAST support to Bio.AlignIO
In-Reply-To:
Message-ID: <200807311600.m6VG0NAq021299@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2560
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-31 12:00 EST -------
Created an attachment (id=980)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=980&action=view)
New file Bio/AlignIO/BlastIO.py
The included "self test" just parses all the unit tests (excluding the
PSI-Blast and HTML files).