From bugzilla-daemon at portal.open-bio.org Tue Jul 1 04:36:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 Jul 2008 04:36:33 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
with identical taxa names
In-Reply-To:
Message-ID: <200807010836.m618aXO8014712@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2531
fkauff at biologie.uni-kl.de changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
------- Comment #13 from fkauff at biologie.uni-kl.de 2008-07-01 04:36 EST -------
Just uploaded a new Nexus.py to CVS.
First, the taxlabels command in a taxa block is now ignored. For a standard
nexus file, taxon labels are in the matrix, and a taxon block is irrelevant.
The only exceptions are transposed matrices, which are not supported by Nexus.py
anyway.
Without the added confusion of a separate taxlabels command, it is now fairly
easy to deal with duplicate names. Both self.taxlabels and self.matrix now
carry the same set of unique taxon names.
All example files seem to work fine for me. Unless I hear otherwise, I will close
this bug.
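As a rough illustration of the duplicate-name handling described above, here is a minimal sketch. It is not the code in Nexus.py: the helper name make_unique and the ".copy" suffix scheme are assumptions made only for this example.

```python
def make_unique(labels, suffix=".copy"):
    """Return labels with repeats renamed so that every name is unique.

    Hypothetical helper: the suffix scheme is an assumption of this
    sketch, not necessarily what Nexus.py does.
    """
    seen = set()
    unique = []
    for label in labels:
        candidate = label
        n = 1
        while candidate in seen:
            # Try numbered suffixes until the name is unused.
            n += 1
            candidate = "%s%s%i" % (label, suffix, n)
        seen.add(candidate)
        unique.append(candidate)
    return unique


taxlabels = make_unique(["Homo", "Pan", "Homo", "Homo"])
```

Keeping the original labels alongside the renamed ones (for example in a parallel list) would be one way to preserve the non-unique names if that is wanted.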
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Jul 1 05:01:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 Jul 2008 05:01:29 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
with identical taxa names
In-Reply-To:
Message-ID: <200807010901.m6191TxO015999@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2531
------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-01 05:01 EST -------
Does this mean that there will be no way to see the original non-unique names
from within Bio.Nexus? I agree they are a pain, but it would be nice to
preserve them.
I haven't read the Nexus specs (restricted article), but does this comment on
the issue of repeated identifiers?
From bugzilla-daemon at portal.open-bio.org Tue Jul 1 05:13:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 Jul 2008 05:13:02 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
with identical taxa names
In-Reply-To:
Message-ID: <200807010913.m619D2vK016454@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2531
------- Comment #15 from fkauff at biologie.uni-kl.de 2008-07-01 05:13 EST -------
Yes, the original non-unique names are currently not preserved. It would be
fairly easy to keep them, if desired.
The NEXUS specs (Maddison et al.) state that non-unique names "should be avoided if
this might cause ambiguity", which imho they always do. But in my experience names
sometimes become identical due to truncation etc., so I needed a way to
deal with it instead of just throwing an error.
Frank
From bugzilla-daemon at portal.open-bio.org Tue Jul 1 09:16:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 Jul 2008 09:16:57 -0400
Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq
objects
In-Reply-To:
Message-ID: <200807011316.m61DGvGS029051@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2532
------- Comment #2 from fkauff at biologie.uni-kl.de 2008-07-01 09:16 EST -------
I opt for (b): an easy one-time addition to Bio.Alphabet, easy to use for
everyone (instead of creating their own uppercase-lowercase variants of those
terribly complicated biopython alphabet classes), and easy to change for all
other modules if lowercase-uppercase is what they want (or need).
Nexus.py and Phd.py certainly need to allow lowercase characters, as this is
very common.
Frank
From bugzilla-daemon at portal.open-bio.org Tue Jul 1 11:56:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 Jul 2008 11:56:03 -0400
Subject: [Biopython-dev] [Bug 2533] New: Support for simple "tab" format in
Bio.SeqIO
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2533
Summary: Support for simple "tab" format in Bio.SeqIO
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
Requested on the mailing list by Giovanni Marco Dall'Olio:
http://lists.open-bio.org/pipermail/biopython/2008-June/004312.html
See BioPerl:
http://www.bioperl.org/wiki/Tab_sequence_format
Suggested implementation to follow.
From bugzilla-daemon at portal.open-bio.org Tue Jul 1 11:57:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 Jul 2008 11:57:26 -0400
Subject: [Biopython-dev] [Bug 2533] Support for simple "tab" format in
Bio.SeqIO
In-Reply-To:
Message-ID: <200807011557.m61FvQN5006042@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2533
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-01 11:57 EST -------
Created an attachment (id=962)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=962&action=view)
New file Bio/SeqIO/TabIO.py
Treats the first field as the record's .id (and .name)
Treats the second field as the record's sequence.
When writing, uses only record.id and record.seq
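To make the behaviour above concrete, here is a minimal sketch of the idea in plain Python. It is not the attached TabIO.py: the function names tab_iterator and tab_write are assumptions, and it works with simple (identifier, sequence) pairs rather than SeqRecord objects so that it runs without Biopython.

```python
from io import StringIO


def tab_iterator(handle):
    """Yield (identifier, sequence) pairs from tab separated lines."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skipping blank lines is an assumption of this sketch
        # First field is the identifier, second field is the sequence.
        identifier, sequence = line.split("\t", 1)
        yield identifier.strip(), sequence.strip()


def tab_write(records, handle):
    """Write (identifier, sequence) pairs one per line; return the count."""
    count = 0
    for identifier, sequence in records:
        handle.write("%s\t%s\n" % (identifier, sequence))
        count += 1
    return count


pairs = list(tab_iterator(StringIO("gi|1\tACGT\ngi|2\tGGCC\n")))
```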
From bugzilla-daemon at portal.open-bio.org Tue Jul 1 12:00:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 Jul 2008 12:00:59 -0400
Subject: [Biopython-dev] [Bug 2533] Support for simple "tab" format in
Bio.SeqIO
In-Reply-To:
Message-ID: <200807011600.m61G0xUp006217@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2533
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-01 12:00 EST -------
Created an attachment (id=963)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=963&action=view)
Patch to add the "tab" format to Bio.SeqIO and update the unit test output
The plumbing to make Bio.SeqIO (and Bio.AlignIO) aware of the new format.
Adds the reader/writer mapping to Bio/SeqIO/__init__.py (trivial) and gives the
updated output from test_SeqIO.py (trivial to regenerate with "python
run_tests.py -g test_SeqIO.py").
From biopython at maubp.freeserve.co.uk Wed Jul 2 06:33:35 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 11:33:35 +0100
Subject: [Biopython-dev] Catching more error conditions in Bio.Entrez
Message-ID: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com>
Hello Michiel et al.,
I've already added a few if statements to the end of
Bio.Entrez._open() to catch a few errors I'd observed, and I've just
found another example:
>>> from Bio import Entrez
>>> Entrez.efetch("nucleotide", id="fiction", rettype="fasta").read()
'\n'
>>> Entrez.efetch("nucleotide", id="fiction").read()
'\n'
This seems to happen for any invalid identifier. Are you happy for me
to check for this as an error too? Are there any valid reasons to get
back an empty dataset like this?
Also, I was wondering if we should raise a ValueError rather than
IOError if we are fairly sure the problem is with the arguments rather
than the network or the server being unavailable.
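The kind of check I have in mind could look like the sketch below. The helper name check_entrez_data is hypothetical and is not part of Bio.Entrez; whether an empty response should always count as an error is exactly the open question.

```python
def check_entrez_data(data):
    """Return data unchanged, or raise ValueError if it is only whitespace.

    Hypothetical helper, not part of Bio.Entrez. ValueError is used on the
    assumption that the arguments, rather than the network, are at fault.
    """
    if not data.strip():
        raise ValueError("Empty response, possibly an invalid identifier")
    return data
```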
Peter
From sdavis2 at mail.nih.gov Wed Jul 2 07:18:43 2008
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 2 Jul 2008 07:18:43 -0400
Subject: [Biopython-dev] Catching more error conditions in Bio.Entrez
In-Reply-To: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com>
References: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com>
Message-ID: <264855a00807020418qc858370r4083f0db9db3197a@mail.gmail.com>
On Wed, Jul 2, 2008 at 6:33 AM, Peter wrote:
> Hello Michiel et al.,
>
> I've already added a few if statements to the end of
> Bio.Entrez._open() to catch a few errors I'd observed, and I've just
> found another example:
>
> >>> from Bio import Entrez
> >>> Entrez.efetch("nucleotide", id="fiction", rettype="fasta").read()
> '\n'
> >>> Entrez.efetch("nucleotide", id="fiction").read()
> '\n'
>
> This seems to happen for any invalid identifier. Are you happy for me
> to check for this as an error too? Are there any valid reasons to get
> back an empty dataset like this?
If the ability to use history is added, then an empty dataset could be
a valid return after an empty search. For id-based-searches, I'm not
sure I would raise an error for an empty set being returned anyway.
Just my $0.02.
Sean
> Also, I was wondering if we should raise a ValueError rather than
> IOError if we are fairly sure the problem is with the arguments rather
> than the network or the server being unavailable.
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>
From biopython at maubp.freeserve.co.uk Wed Jul 2 07:34:32 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 12:34:32 +0100
Subject: [Biopython-dev] Catching more error conditions in Bio.Entrez
In-Reply-To: <264855a00807020418qc858370r4083f0db9db3197a@mail.gmail.com>
References: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com>
<264855a00807020418qc858370r4083f0db9db3197a@mail.gmail.com>
Message-ID: <320fb6e00807020434p474cect7a7b0d51148d7760@mail.gmail.com>
>> This seems to happen for any invalid identifier. Are you happy for me
>> to check for this as an error too? Are there any valid reasons to get
>> back an empty dataset like this?
>
> If the ability to use history is added, then an empty dataset could be
> a valid return after an empty search. ...
Bio.Entrez has always supported the history, it's just up to the user
to take advantage of it. I've included an example in the tutorial to
explain how to do this, cut and pasted below:
from Bio import Entrez
search_handle = Entrez.esearch(db="nucleotide", term="Opuntia and rpl16",
                               usehistory="y", email="history.user at example.com")
search_results = Entrez.read(search_handle)
search_handle.close()
gi_list = search_results["IdList"]
count = int(search_results["Count"])
assert count == len(gi_list)
session_cookie = search_results["WebEnv"]
query_key = search_results["QueryKey"]
# Now use the history session cookie and query key to download the
# results in batches
batch_size = 3
out_handle = open("orchid_rpl16.fasta", "w")
for start in range(0, count, batch_size):
    end = min(count, start + batch_size)
    print("Going to download record %i to %i" % (start + 1, end))
    fetch_handle = Entrez.efetch(db="nucleotide", rettype="fasta",
                                 retstart=start, retmax=batch_size,
                                 webenv=session_cookie, query_key=query_key,
                                 email="history.user at example.com")
    data = fetch_handle.read()
    fetch_handle.close()
    out_handle.write(data)
out_handle.close()
Feedback on the tutorial or the example is of course welcome.
> For id-based-searches, I'm not sure I would raise an error for an empty
> set being returned anyway.
>
> Just my $0.02.
I was wondering about this kind of thing... maybe some more testing of
these kinds of examples would be in order.
Peter
From biopython at maubp.freeserve.co.uk Wed Jul 2 09:03:36 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 14:03:36 +0100
Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in
Bio.SeqIO
Message-ID: <320fb6e00807020603s63f8339ag5b8140f1943ceb47@mail.gmail.com>
Hi all,
Do any of you have any comments or feedback on this suggested new
"simple tab separated" format for Bio.SeqIO? To match BioPerl I plan
on calling it the "tab" format - see below.
Any real world example files would be good for the test suite.
One nice thing is it adds another output format, something we're a bit
short of in Bio.SeqIO with only fasta and some alignment formats (now
handled via Bio.AlignIO, i.e. pfam/stockholm, clustal and phylip).
Peter
---------- Forwarded message ----------
From: Peter
Date: Tue, Jul 1, 2008 at 5:06 PM
Subject: Re: [BioPython] Sequence from Fasta
To: dalloliogm at gmail.com
Cc: biopython at biopython.org
Giovanni wrote:
> yes, I think it will be useful to implement.
> I know of people who have written a customized fasta2tab script and
> use it quite frequently, so it would be good to support such a task.
> As you said before this format is commonly used in combination with
> grep/gawk scripts.
I've gone for the simple option about how to parse the first field: it's used
as the record identifier (.id) and name only (nothing clever). Here is my
suggested code, which you are welcome to download and try out.
Bug 2533 - Support for simple "tab" format in Bio.SeqIO
http://bugzilla.open-bio.org/show_bug.cgi?id=2533
If you want to try this yourself you'll need to download the new file
TabIO.py into the Bio/SeqIO folder and update Bio/SeqIO/__init__.py to
tell it about the new format (two new lines, see patch).
Peter
From biopython at maubp.freeserve.co.uk Wed Jul 2 09:21:29 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 14:21:29 +0100
Subject: [Biopython-dev] Questions about the NEXUS format
Message-ID: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com>
Hello again Frank,
As Biopython's NEXUS expert, I've got a couple of hopefully trivial
questions about the format, which connect to how best to handle it in the
Bio.SeqIO and Bio.AlignIO modules. http://biopython.org/wiki/SeqIO
http://biopython.org/wiki/AlignIO
My short questions are:
Q1: Can a file contain more than one NEXUS record (i.e. concatenation,
with more than one #NEXUS line)?
Q2: Can a NEXUS record/file contain more than one alignment (matrix block)?
If the answer to either of those is a "yes", then any example files
you could contribute would be very helpful.
I have a more complicated question too, which would help me to resolve Bug 2227:
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
Q3: Given a generic Alignment object (e.g. from one of the other
parsers), can I construct a corresponding Nexus object where the
aligned sequences are used for the matrix? If so, how?
Thank you,
Peter
From mjldehoon at yahoo.com Wed Jul 2 09:30:06 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 2 Jul 2008 06:30:06 -0700 (PDT)
Subject: [Biopython-dev] Bio.IntelliGenetics
Message-ID: <29487.55988.qm@web62410.mail.re1.yahoo.com>
Bio.IntelliGenetics contains a parser for sequence data in the IntelliGenetics format.
In this format, each sequence has a name and comments, and in addition there can also be an overall comment to the file.
Currently the parser in Bio.IntelliGenetics stores this information in Bio.IntelliGenetics.Record.Record objects (one record per sequence; the overall comment is inadvertently added to the first sequence in the file). I think it makes more sense to use a SeqRecord for that, and to deprecate Bio.IntelliGenetics.Record.Record.
In that case, Bio.SeqIO looks like a more suitable place for this parser.
The user would see something like this:
>>> from Bio import SeqIO
>>> handle = open("mydatafile.txt")
>>> records = SeqIO.parse(handle, "ig")
>>> records.comment
"This is the overall comment"
>>> for record in records:
...     # ... record is a SeqRecord.
Because of the overall comment, SeqIO.parse cannot simply return a generator function. It must return a full-fledged class, but one with an iterator.
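A minimal sketch of such a class, to show what I mean. The class name CommentedIterator and its constructor are assumptions made for this example; the real thing would wrap the parser rather than a ready-made list.

```python
class CommentedIterator(object):
    """Iterate over records while exposing a file-level comment.

    Hypothetical class: the name and the constructor arguments are
    assumptions of this sketch.
    """

    def __init__(self, comment, records):
        self.comment = comment
        self._records = iter(records)

    def __iter__(self):
        return self

    def __next__(self):
        # Delegates to the wrapped iterator; StopIteration ends the loop.
        return next(self._records)

    next = __next__  # Python 2 spelling of the same method


records = CommentedIterator("This is the overall comment", ["rec1", "rec2"])
```

The caller can read records.comment before or after looping, which a plain generator function would not allow.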
Any objections, anybody?
--Michiel
From biopython at maubp.freeserve.co.uk Wed Jul 2 09:48:31 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 14:48:31 +0100
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <29487.55988.qm@web62410.mail.re1.yahoo.com>
References: <29487.55988.qm@web62410.mail.re1.yahoo.com>
Message-ID: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com>
On Wed, Jul 2, 2008 at 2:30 PM, Michiel de Hoon wrote:
> Bio.IntelliGenetics contains a parser for sequence data in the IntelliGenetics format.
Just to be upfront, I'm not familiar with this format, but I've had a
look at the examples.
> In this format, each sequence has a name and comments, and in addition there can
> also be an overall comment to the file.
OK. This is also the case in other file formats, for example GenBank
files can have a free-format text header at the start, but we ignore
this.
How would you separate the file header comment from the first record
comment? Some files include what looks like a file header but the
lines all seem to start with "; ". Maybe look for "; LOCUS..."?
Given the whole comment seems to be free format I don't think this is
very nice.
On the other hand, some of the sample inputs include a number of
lines starting ";; Modified by ..." which would be easy to separate
(one semi colon versus two semi colons). These are clearly file-level
header lines, rather than being part of the first record.
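Separating on the double semi colon could be as simple as the sketch below. The helper name split_header is an assumption, and it presumes the ";;" lines only appear as a block at the very start of the file.

```python
def split_header(lines):
    """Separate leading ';;' file-level lines from the remaining lines.

    Hypothetical helper: assumes the ';;' lines form a single block at
    the start of the file.
    """
    header = []
    rest = list(lines)
    while rest and rest[0].startswith(";;"):
        # Drop the two semi colons and surrounding whitespace.
        header.append(rest.pop(0)[2:].strip())
    return header, rest


header, rest = split_header([";; Modified by ...", "; record comment", "SEQ1"])
```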
> Currently the parser in Bio.IntelliGenetics stores this information in
> Bio.IntelliGenetics.Record.Record objects (one record per sequence; the
> overall comment is inadvertently added to the first sequence in the file). I
> think it makes more sense to use a SeqRecord for that, and to deprecate
> Bio.IntelliGenetics.Record.Record.
If all the data extracted by the Bio.IntelliGenetics parser could be
dealt with using the SeqRecord parser added to Bio.SeqIO, then yes
deprecating Bio.IntelliGenetics sounds fine.
> In that case, Bio.SeqIO looks like a more suitable place for this parser.
> The user would see something like this:
> >>> from Bio import SeqIO
> >>> handle = open("mydatafile.txt")
> >>> records = SeqIO.parse(handle, "ig")
> >>> records.comment
> "This is the overall comment"
> >>> for record in records:
> ...     # ... record is a SeqRecord.
I see you are using "ig" as the format name, matching EMBOSS. Good :)
http://emboss.sourceforge.net/docs/themes/seqformats/ig
> Because of the overall comment, SeqIO.parse cannot simply return a
> generator function. It must return a full-fledged class, but one with an iterator.
Not necessarily. We can still use a simple generator function and either throw
away the header comment, or include it with the first record (or even with every
record). If you did create an iterator class, would you make the header available
as a property of the iterator?
Given the apparently fuzzy boundary between the file header and the first record
header, I would just opt to treat it all as a comment for the first record, and
use a simple generator function.
Peter
From fkauff at biologie.uni-kl.de Wed Jul 2 10:01:01 2008
From: fkauff at biologie.uni-kl.de (Frank Kauff)
Date: Wed, 02 Jul 2008 16:01:01 +0200
Subject: [Biopython-dev] Questions about the NEXUS format
In-Reply-To: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com>
References: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com>
Message-ID: <486B8A1D.8090806@biologie.uni-kl.de>
Hi Peter,
Peter wrote:
> Hello again Frank,
>
> As Biopython's NEXUS expert, I've got a couple of hopefully trivial
> questions about the format, which connect to how best to handle it in the
> Bio.SeqIO and Bio.AlignIO modules. http://biopython.org/wiki/SeqIO
> http://biopython.org/wiki/AlignIO
>
> My short questions are:
>
> Q1: Can a file contain more than one NEXUS record (i.e. concatenation,
> with more than one #NEXUS line)?
>
As far as I know: no. #NEXUS just indicates the file being a NEXUS file,
the concept of "records" is not part of a nexus file
> Q2: Can a NEXUS record/file contain more than one alignment (matrix block)?
>
>
I just had a quick look at the old Maddison et al. introductory paper of
Nexus, and it says that "although the nexus standard does not impose
constraints on the number of blocks, particular programs will". I don't
know of any program that would read more than one data block and keep
both of them.
> If the answer to either of those is a "yes", then any example files
> you could contribute would be very helpful.
>
> I have a more complicated question too, which would help me to resolve Bug 2227:
> http://bugzilla.open-bio.org/show_bug.cgi?id=2227
>
> Q3: Given a generic Alignment object (e.g. from one of the other
> parsers), can I construct a corresponding Nexus object where the
> aligned sequences are used for the matrix? If so, how?
>
Hmmm - not really. Nexus.py does not support "empty" nexus class objects
that could be filled with data (just tried) . But it would actually be a
nice thing to have. I'll put this on my to do list.
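Until then, one workaround would be to write a NEXUS file directly from the aligned sequences. The sketch below is only that: to_nexus is a hypothetical function, not part of Nexus.py, and it assumes taxon names without spaces or punctuation that would need quoting.

```python
def to_nexus(names, sequences, datatype="dna"):
    """Return a minimal NEXUS data block for equal-length sequences.

    Hypothetical helper, not part of Nexus.py. Assumes names need no
    quoting (no spaces or NEXUS punctuation).
    """
    if not names or len(names) != len(sequences):
        raise ValueError("Need one name per sequence")
    nchar = len(sequences[0])
    if any(len(seq) != nchar for seq in sequences):
        raise ValueError("Sequences must all be the same length")
    lines = ["#NEXUS",
             "begin data;",
             "dimensions ntax=%i nchar=%i;" % (len(names), nchar),
             "format datatype=%s missing=? gap=-;" % datatype,
             "matrix"]
    for name, seq in zip(names, sequences):
        lines.append("%s %s" % (name, seq))
    lines.extend([";", "end;"])
    return "\n".join(lines) + "\n"


nexus_text = to_nexus(["t1", "t2"], ["ACGT", "AC-T"])
```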
Cheers,
Frank
> Thank you,
>
> Peter
>
>
From biopython at maubp.freeserve.co.uk Wed Jul 2 10:01:13 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 15:01:13 +0100
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com>
References: <29487.55988.qm@web62410.mail.re1.yahoo.com>
<320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com>
Message-ID: <320fb6e00807020701k2f5bf546j2d5ef3514a24e31a@mail.gmail.com>
Hello again,
Interestingly the IntelliGenetics format looks the same as the MASE alignment
file format:
http://www.bioperl.org/wiki/Mase_multiple_alignment_format
On the other hand, the EMBOSS example is clearly not a multiple
sequence alignment:
http://emboss.sourceforge.net/docs/themes/seqformats/ig
Adding the parser to Bio.SeqIO would let us read in alignments too via
Bio.AlignIO (which will offload the parsing to Bio.SeqIO and then try
and convert the SeqRecords into an Alignment).
Peter
From biopython at maubp.freeserve.co.uk Wed Jul 2 10:06:40 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 15:06:40 +0100
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <320fb6e00807020705qea5976j9a5e2cd0896f391d@mail.gmail.com>
References: <29487.55988.qm@web62410.mail.re1.yahoo.com>
<320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com>
<320fb6e00807020705qea5976j9a5e2cd0896f391d@mail.gmail.com>
Message-ID: <320fb6e00807020706l28309346m57e7bd884a0b7b9b@mail.gmail.com>
Forgot to send this to the list, another point about IntelliGenetics vs MASE
---------- Forwarded message ----------
From: Peter
Date: Wed, Jul 2, 2008 at 3:05 PM
Subject: Re: [Biopython-dev] Bio.IntelliGenetics
To: mjldehoon at yahoo.com
> How would you separate the file header comment from the first record
> comment? Some files include what looks like a file header but the
> lines all seem to start with "; ". Maybe look for "; LOCUS..."?
> Given the whole comment seems to be free format I don't think this is
> very nice.
>
> On the other hand, some of the sample inputs include a number of
> lines starting ";; Modified by ..." which would be easy to separate
> (one semi colon versus two semi colons). These are clearly file-level
> header lines, rather than being part of the first record.
I found an old link I had added on the wiki page for SeqIO development,
http://pbil.univ-lyon1.fr/help/formats.html
This clearly describes the MASE format as having (optional) header
lines starting with two semi colons. But are MASE and
IntelliGenetics the same thing?
Peter
From biopython at maubp.freeserve.co.uk Wed Jul 2 10:12:48 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 15:12:48 +0100
Subject: [Biopython-dev] Questions about the NEXUS format
In-Reply-To: <486B8A1D.8090806@biologie.uni-kl.de>
References: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com>
<486B8A1D.8090806@biologie.uni-kl.de>
Message-ID: <320fb6e00807020712y54874e02k6854b92e1711358d@mail.gmail.com>
>> My short questions are:
>>
>> Q1: Can a file contain more than one NEXUS record (i.e. concatenation,
>> with more than one #NEXUS line)?
>
> As far as I know: no. #NEXUS just indicates the file being a NEXUS file, the
> concept of "records" is not part of a nexus file
OK, thank you.
>> Q2: Can a NEXUS record/file contain more than one alignment (matrix
>> block)?
>
> I just had a quick look at the old Maddison et al. introductory paper of
> Nexus, and it says that "although the nexus standard does not impose
> constraints on the number of blocks, particular programs will". I don't know
> of any program that would read more than one data block and keep both of
> them.
So that is a "yes in theory", but it doesn't sound worth worrying about.
>> Q3: Given a generic Alignment object (e.g. from one of the other
>> parsers), can I construct a corresponding Nexus object where the
>> aligned sequences are used for the matrix? If so, how?
>
> Hmmm - not really. Nexus.py does not support "empty" nexus class objects
> that could be filled with data (just tried) . But it would actually be a
> nice thing to have. I'll put this on my to do list.
Thanks,
Peter
From mjldehoon at yahoo.com Wed Jul 2 10:15:16 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 2 Jul 2008 07:15:16 -0700 (PDT)
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com>
Message-ID: <529945.38158.qm@web62404.mail.re1.yahoo.com>
> On the other hand, some of the sample inputs include a number of
> lines starting ";; Modified by ..." which would be easy to separate
> (one semi colon versus two semi colons). These are clearly file-level
> header lines, rather than being part of the first record.
According to the website mentioned in Bio/IntelliGenetics/__init__.py, the file-level comments have two semi colons, as opposed to the sequence-specific comments, which have one semi colon.
http://pbil.univ-lyon1.fr/help/formats.html
> If you did create an iterator class, would you make the
> header available as a property of the iterator?
I am not sure what you mean by a property of the iterator. I was thinking to simply add a field to the class.
---Michiel.
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 10:38:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 Jul 2008 10:38:52 -0400
Subject: [Biopython-dev] [Bug 2524] Handle missing libraries like TextTools
in run_tests.py
In-Reply-To:
Message-ID: <200807021438.m62Ecqma013815@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2524
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|Documentation |Unit Tests
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 10:38 EST -------
Filing under "Unit Tests".
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 10:39:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 Jul 2008 10:39:22 -0400
Subject: [Biopython-dev] [Bug 2469] requires_wise.py fails on Windows (test
suite)
In-Reply-To:
Message-ID: <200807021439.m62EdMM9013903@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2469
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|Main Distribution |Unit Tests
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 10:39 EST -------
Filing under "Unit Tests"
From biopython at maubp.freeserve.co.uk Wed Jul 2 10:56:00 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 2 Jul 2008 15:56:00 +0100
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <529945.38158.qm@web62404.mail.re1.yahoo.com>
References: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com>
<529945.38158.qm@web62404.mail.re1.yahoo.com>
Message-ID: <320fb6e00807020756r4de8ed4bi3f8b409d75996a14@mail.gmail.com>
>> If you did create an iterator class, would you make the
>> header available as a property of the iterator?
>
> I am not sure what you mean by a property of the iterator. I was
> thinking to simply add a field to the class.
Adding the file header field to the iterator class? You could do I suppose.
Right now all the Bio.SeqIO parsers use generator functions (this is
not the case in AlignIO), but I have no objection to returning iterator
classes instead.
I don't really like the idea of Bio.SeqIO parsers returning anything
other than SeqRecord objects - even if it is indirectly via a richer
iterator object. I see the Bio.SeqIO as a common unified API, and the
downside is sometimes extra data doesn't really fit.
If there really is some important meta-data in a file format that
applies to all the records, then it cannot easily be represented in
the Bio.SeqIO system except as annotation added to every single
SeqRecord. e.g. Add the header to the annotations dictionary under
"file-header" or something.
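A minimal sketch of that idea follows. The Record class is only a stand-in for SeqRecord so the example runs without Biopython, and the function name with_file_header is an assumption.

```python
class Record(object):
    """Minimal stand-in for SeqRecord, with only id and annotations."""

    def __init__(self, id):
        self.id = id
        self.annotations = {}


def with_file_header(records, header):
    """Yield each record after storing the shared header in its annotations."""
    for record in records:
        record.annotations["file-header"] = header
        yield record


annotated = list(with_file_header([Record("a"), Record("b")], "Modified by ..."))
```

This stays a simple generator, at the cost of repeating the same header on every record.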
Peter
From mjldehoon at yahoo.com Wed Jul 2 11:29:31 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 2 Jul 2008 08:29:31 -0700 (PDT)
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <320fb6e00807020705qea5976j9a5e2cd0896f391d@mail.gmail.com>
Message-ID: <318336.37817.qm@web62405.mail.re1.yahoo.com>
--- On Wed, 7/2/08, Peter wrote:
> I found an old link I had added on the wiki page for SeqIO development,
> http://pbil.univ-lyon1.fr/help/formats.html
>
> This clearly describes the MASE format as having (optional) header
> lines starting with two semi colons. But are MASE and
> IntelliGenetics the same thing?
It may be that the link in Bio/IntelliGenetics/__init__.py actually does not pertain to the IntelliGenetics format. Except for this link (which as you point out actually talks about the MASE format, not the IntelliGenetics format), I have seen no description elsewhere of these file-wide comments preceded by a double semi-colon in the IntelliGenetics format. Even Biopython doesn't treat these consistently: The tests for Bio.IntelliGenetics include comments with the double semi-colon, but the parser doesn't treat them differently from sequence-specific comments.
So let's do the following:
For the IntelliGenetics parser, do not look for double semi-colon comments. Only check if the first character in a line is a semi-colon, and if so, treat it as a sequence-specific comment. This is what Bio.IntelliGenetics currently does anyway.
Replace the parser class in Bio.IntelliGenetics by a generator function, and integrate it with Bio.SeqIO. Then, let's replace the IntelliGenetics tests by files that do not contain the double semi-colon comments.
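A rough sketch of what such a generator function might look like is below. It is not the existing Bio.IntelliGenetics code. The name ig_iterator is hypothetical, it yields plain tuples rather than SeqRecord objects, and it assumes that every record starts with at least one comment line, followed by a title line and then sequence lines ending in a "1" or "2" terminator.

```python
def ig_iterator(handle):
    """Yield (title, comment, sequence) tuples from IntelliGenetics-like text.

    Hypothetical sketch. Assumes each record has one or more ';' comment
    lines, then a title line, then sequence lines ending in '1' or '2'.
    """
    comments, title, seq = [], None, []
    for line in handle:
        line = line.rstrip()
        if not line:
            continue
        if line.startswith(";"):
            if title is not None:
                # A comment after sequence data starts the next record.
                yield title, "\n".join(comments), "".join(seq)
                comments, title, seq = [], None, []
            # ';' and ';;' lines are treated alike, as record comments.
            comments.append(line.lstrip(";").strip())
        elif title is None:
            title = line
        else:
            # Drop the trailing terminator and any spaces in the sequence.
            seq.append(line.rstrip("12").replace(" ", ""))
    if title is not None:
        yield title, "\n".join(comments), "".join(seq)


example = ";; file comment\n; first\nSEQ1\nACGT\nGG1\n; second\nSEQ2\nTTAA2\n"
parsed = list(ig_iterator(example.splitlines()))
```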
Does that sound OK?
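A minimal sketch of the generator approach proposed above, not the code that would go into Bio.SeqIO. The name iter_ig_records is hypothetical, and the record layout it assumes (comment lines starting with ';', one title line, then sequence lines with a trailing '1' or '2' on the last one) is an assumption about the format rather than something established in this thread.

```python
def iter_ig_records(lines):
    """Yield (title, comments, sequence) tuples from IntelliGenetics-style lines.

    Hypothetical helper. Any line starting with ';' is treated as a
    sequence-specific comment; ';;' lines get no special handling.
    """
    comments, title, seq = [], None, []
    for raw in lines:
        line = raw.rstrip()
        if title is None:
            if line.startswith(";"):
                comments.append(line[1:].strip())
            elif line:
                title = line
            continue
        # Assumed convention: the last sequence line ends with '1' or '2'.
        if line.endswith(("1", "2")):
            seq.append(line[:-1])
            yield title, comments, "".join(seq).replace(" ", "")
            comments, title, seq = [], None, []
        else:
            seq.append(line)
    if title is not None:
        # Tolerate a final record that has no terminator.
        yield title, comments, "".join(seq).replace(" ", "")
```

Because it is a generator over lines, it can be fed an open file handle directly and only holds one record in memory at a time.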
--Michiel.
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 12:28:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 Jul 2008 12:28:19 -0400
Subject: [Biopython-dev] [Bug 2535] New: Support for PIR / NBRF format in
Bio.SeqIO
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2535
Summary: Support for PIR / NBRF format in Bio.SeqIO
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
BioPerl and EMBOSS both refer to this as the "pir" format, although EMBOSS also
supports "nbrf" as an alternative.
http://bioperl.org/wiki/PIR_sequence_format
Patch to follow, a new parser and writer in plain python. The existing Martel
based parser in Bio.NBRF could then be deprecated.
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 12:30:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 Jul 2008 12:30:28 -0400
Subject: [Biopython-dev] [Bug 2535] Support for PIR / NBRF format in
Bio.SeqIO
In-Reply-To:
Message-ID: <200807021630.m62GUS5B025377@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2535
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 12:30 EST -------
Created an attachment (id=964)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=964&action=view)
New file Bio/SeqIO/PirIO.py
Note that the details of storing the sequence type may need tweaking for better
agreement with the de-facto conventions from the GenBank parser.
As part of this the following dictionary may be useful, from Bio/NBRF/ValSeq.py
valid_sequence_dict = { "P1": "complete protein", "F1": "protein fragment", \
"DL": "linear DNA", "DC": "circular DNA", "RL": "linear RNA", \
"RC":"circular RNA", "N3": "transfer RNA", "N1": "other"
}
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 13:37:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 Jul 2008 13:37:05 -0400
Subject: [Biopython-dev] [Bug 2535] Support for PIR / NBRF format in
Bio.SeqIO
In-Reply-To:
Message-ID: <200807021737.m62Hb5lX031417@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2535
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 13:37 EST -------
My patch doesn't accept the "N1" sequence type mentioned in Bio/NBRF/ValSeq.py
Also when recording a SeqRecord from a non-PIR input, we could try and guess
the sequence type. The alphabet itself is one clue. GenBank and EMBL files
should also record if the sequence is linear or circular, as well as a sequence
type.
For proteins, I don't see how to decide between P1 and F1 though (complete
protein vs protein fragment). Maybe default to F1?
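As a rough sketch of the guessing idea, assuming the caller already knows the molecule type (for example from the alphabet) and the topology. The function guess_pir_type and its arguments are hypothetical; the two-letter codes are those in the dictionary quoted from Bio/NBRF/ValSeq.py in comment #1.

```python
# Two-letter PIR/NBRF codes, as quoted from Bio/NBRF/ValSeq.py in comment #1.
PIR_TYPES = {
    "P1": "complete protein", "F1": "protein fragment",
    "DL": "linear DNA", "DC": "circular DNA",
    "RL": "linear RNA", "RC": "circular RNA",
    "N3": "transfer RNA", "N1": "other",
}


def guess_pir_type(molecule, circular=False, protein_default="F1"):
    """Hypothetical helper: pick a PIR type code for a non-PIR record.

    molecule is "protein", "dna" or "rna" (an assumed convention).
    """
    molecule = molecule.lower()
    if molecule == "protein":
        # Completeness cannot be inferred, so use a caller-chosen default.
        return protein_default
    if molecule in ("dna", "rna"):
        return molecule[0].upper() + ("C" if circular else "L")
    raise ValueError("Cannot guess a PIR type for %r" % molecule)
```

Keeping the protein default as a parameter leaves the P1 versus F1 question open for the caller.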
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 15:51:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 Jul 2008 15:51:49 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
for '-F F' and make it safe
In-Reply-To:
Message-ID: <200807021951.m62Jpnx3012202@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2508
------- Comment #9 from mmokrejs at ribosome.natur.cuni.cz 2008-07-02 15:51 EST -------
Even better docs:
http://blog.doughellmann.com/2007/07/pymotw-subprocess.html
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 09:24:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 09:24:32 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
with identical taxa names
In-Reply-To:
Message-ID: <200807031324.m63DOWDA018278@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2531
------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 09:24 EST -------
Hi Frank,
I see you've updated Bio/Nexus/Nexus.py with CVS revision 1.16 to record the
original taxon order with and without the name changes.
n.unaltered_taxlabels = Original names in order with duplicates
n.original_taxon_order = Modified names in order, suitable as keys to n.matrix
I'll update Bio.SeqIO / Bio.AlignIO to take advantage of this shortly, storing
the original name and the modified unique name as the SeqRecord's name and id
properties.
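To illustrate how the two lists relate, here is a small sketch that builds unique keys from a list of original names. The helper make_unique and its renaming scheme (appending a separator and a counter) are assumptions for illustration only; Nexus.py may well rename duplicates differently.

```python
def make_unique(labels, sep="."):
    """Hypothetical helper: rename duplicate labels, preserving order."""
    seen = set()
    unique = []
    for label in labels:
        candidate, n = label, 1
        # Keep bumping the counter until the name has not been used yet.
        while candidate in seen:
            n += 1
            candidate = "%s%s%d" % (label, sep, n)
        seen.add(candidate)
        unique.append(candidate)
    return unique


unaltered = ["human", "mouse", "human", "human"]  # original names, with duplicates
taxlabels = make_unique(unaltered)                # unique names, usable as dict keys
# Pair each original name with its unique key, e.g. for a record's name and id.
pairs = list(zip(unaltered, taxlabels))
```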
Peter
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 09:52:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 09:52:08 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
with identical taxa names
In-Reply-To:
Message-ID: <200807031352.m63Dq8el021720@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2531
------- Comment #17 from fkauff at biologie.uni-kl.de 2008-07-03 09:52 EST -------
Hi Peter,
I'd strongly suggest to use self.taxlabels instead of
self.original_taxon_order. The latter is only for compatibility, and
original_taxon_order just maps taxlabels. Actually it might make sense to give
a deprecation warning if original_taxon_order is used, and it should be removed
in some future release.
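One way such a deprecation warning could look, sketched on a stand-in class rather than the real Nexus class. The class name NexusLike and the warning text are hypothetical; only the attribute names come from this thread.

```python
import warnings


class NexusLike:
    """Hypothetical stand-in for the Nexus class, showing only the alias."""

    def __init__(self):
        self.taxlabels = []

    @property
    def original_taxon_order(self):
        # Old name kept for compatibility; it simply maps to taxlabels.
        warnings.warn("original_taxon_order is deprecated; use taxlabels",
                      DeprecationWarning, stacklevel=2)
        return self.taxlabels

    @original_taxon_order.setter
    def original_taxon_order(self, value):
        warnings.warn("original_taxon_order is deprecated; use taxlabels",
                      DeprecationWarning, stacklevel=2)
        self.taxlabels = value
```

Using stacklevel=2 makes the warning point at the calling code rather than at the property itself.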
Frank
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 10:06:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 10:06:46 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807031406.m63E6kct023377@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #584 is|0 |1
obsolete| |
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 10:06 EST -------
(From update of attachment 584)
With Bio/Nexus/Nexus.py revision 1.16 from CVS, it is now possible to create an
empty Nexus object and add sequences to it. This code is now obsolete.
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 10:13:38 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 10:13:38 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807031413.m63EDcGj024034@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 10:13 EST -------
Created an attachment (id=965)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=965&action=view)
Bio/Nexus/Nexus.py handle support in write_nexus_data()
With Bio/Nexus/Nexus.py revision 1.16 from CVS, it is now possible to create an
empty Nexus object and add sequences to it:
#Read in an alignment object, e.g. with Bio.AlignIO
from Bio import AlignIO
alignment = AlignIO.read(open("example.aln"), "clustal")
#Make a Nexus object
from Bio.Nexus import Nexus
handle = open("test.nex", "w")
n = Nexus.Nexus()
n.alphabet = alignment._alphabet
for record in alignment :
    n.add_sequence(record.id, record.seq.tostring())
n.write_nexus_data(handle)
handle.close()
There are two problems with write_nexus_data(), firstly it doesn't accept a
StringIO handle (see also Bug 2454).
Secondly, if given a handle it closes it. This would break the above code, or
how I typically use StringIO.
This patch addresses these points.
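The behaviour described above (accept either a filename or a handle, and never close a handle the caller passed in) can be sketched as follows. This is not the patch itself; write_text is a hypothetical helper and the duck-typing check on write() is just one possible way to tell the two cases apart.

```python
import io


def write_text(target, text):
    """Hypothetical helper: write text to a filename or an open handle.

    Anything with a write() method is treated as a handle and is left
    open, so the caller stays in charge of closing it.
    """
    if hasattr(target, "write"):
        target.write(text)
        return
    # We opened this file ourselves, so we are responsible for closing it.
    with open(target, "w") as handle:
        handle.write(text)


buffer = io.StringIO()
write_text(buffer, "#NEXUS\n")
contents = buffer.getvalue()  # still readable: the handle was not closed
```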
Frank, are you happy for me to commit this change?
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 12:02:30 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 12:02:30 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807031602.m63G2Unc032671@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 12:02 EST -------
Created an attachment (id=966)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=966&action=view)
Patch to Bio/AlignIO/NexusIO.py adding write support
This patch requires the Bio.Nexus handle fix (patch in attachment 965, comment
4).
My method for constructing an empty DNA, RNA, or Protein Nexus object is
perhaps inelegant. This is required in order to setup the alphabet,
ambiguous_values and unambiguous_letters properties which otherwise default to
DNA.
Also note that the Nexus add_sequence() method does not seem to support
duplicated taxa names. Perhaps this method could update the
unaltered_taxlabels property and use the _unique_label method to cope with
duplicate names?
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 12:08:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 12:08:26 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem
with identical taxa names
In-Reply-To:
Message-ID: <200807031608.m63G8QS3000534@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2531
------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 12:08 EST -------
I have changed my use of original_taxon_order to just taxlabels (code now in
Bio/AlignIO/NexusIO.py rather than Bio/SeqIO/NexusIO.py).
I agree, adding a deprecation warning to the original_taxon_order get/set
functions would make sense.
P.S. Thanks for adding the unaltered_taxlabels property.
From biopython at maubp.freeserve.co.uk Fri Jul 4 04:11:06 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 4 Jul 2008 09:11:06 +0100
Subject: [Biopython-dev] What happened to Biopython 1.46?
Message-ID: <320fb6e00807040111h182411d5lea14575f2906e7ba@mail.gmail.com>
We were recently talking about doing another release, but as you may
have noticed nothing has been announced.
Michiel devoted a good chunk of his weekend to preparing Biopython
1.46 and uploaded it to the servers on Sunday 29th. He didn't issue
an announcement email at the time due to the problem with the wiki
being read only (now fixed). However, on the Monday evening I
realised I'd done something really stupid in Bio.Data.CodonTable just
before the CVS freeze. Table 15 (Blepharisma Macronuclear) would be
used whenever a translation table was requested by name. This change
has been reverted, and I've added further translation checks in
test_seq.py to avoid any similar issue in future.
So, while there is a Biopython 1.46, we're not going to advertise it
because the translation functionality is subtly wrong. However, it is
up on the website, and linked to with an errata statement.
Michiel will kindly try and prepare Biopython 1.47 soon... so please
hold off any big changes in CVS until then.
And I'm hereby publicly promising to treat him to dinner - hopefully
we'll be in the same country at the same time this year!
Peter
From bugzilla-daemon at portal.open-bio.org Fri Jul 4 04:39:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Jul 2008 04:39:35 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807040839.m648dZnX025882@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
------- Comment #6 from fkauff at biologie.uni-kl.de 2008-07-04 04:39 EST -------
(In reply to comment #4)
> Created an attachment (id=965)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=965&action=view) [details]
> Bio/Nexus/Nexus.py handle support in write_nexus_data()
>
> With Bio/Nexus/Nexus.py revision 1.16 from CVS, it is now possible to create an
> empty Nexus object and add sequences to it:
> ...
> There are two problems with write_nexus_data(), firstly it doesn't accept a
> StringIO handle (see also Bug 2454).
>
> Secondly, if given a handle it closes it. This would break the above code, or
> how I typically use StringIO.
>
> This patch addresses these points.
>
> Frank, are you happy for me to commit this change?
>
Very nice. Go for it :-)
Cheers,
Frank
From bugzilla-daemon at portal.open-bio.org Fri Jul 4 04:53:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Jul 2008 04:53:10 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807040853.m648rAHL026783@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #965 is|0 |1
obsolete| |
------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-04 04:53 EST -------
(From update of attachment 965)
> > This patch addresses these points.
> >
> > Frank, are you happy for me to commit this change?
> >
>
> Very nice. Go for it :-)
>
Thanks Frank.
Checking in Nexus.py;
/home/repository/biopython/biopython/Bio/Nexus/Nexus.py,v <-- Nexus.py
new revision: 1.17; previous revision: 1.16
done
Peter
From bugzilla-daemon at portal.open-bio.org Fri Jul 4 04:56:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Jul 2008 04:56:10 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807040856.m648uAAG027012@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #966 is|0 |1
obsolete| |
------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-04 04:56 EST -------
(From update of attachment 966)
There is a slight problem with this patch on the alphabet selection (it uses
"dna" when it should use "rna").
I will postpone dealing with writing Nexus files in Bio.SeqIO / Bio.AlignIO until
after the next Biopython release.
From bugzilla-daemon at portal.open-bio.org Fri Jul 4 05:13:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Jul 2008 05:13:25 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807040913.m649DPap027929@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
------- Comment #9 from fkauff at biologie.uni-kl.de 2008-07-04 05:13 EST -------
(In reply to comment #5)
> Created an attachment (id=966)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=966&action=view) [details]
> Patch to Bio/AlignIO/NexusIO.py adding write support
>
> This patch requires the Bio.Nexus handle fix (patch in attachment 965 [details], comment
> 4).
>
> My method for constructing an empty DNA, RNA, or Protein Nexus object is
> perhaps inelegant. This is required in order to setup the alphabet,
> ambiguous_values and unambiguous_letters properties which otherwise default to
> DNA.
>
> Also note that the Nexus add_sequence() method does not seem to support
> duplicated taxa names. Perhaps this method could update the
> unaltered_taxlabels property and use the _unique_label method to cope with
> duplicate names?
>
Ok, I updated add_sequence and will commit the changes soon.
F
From bugzilla-daemon at portal.open-bio.org Fri Jul 4 05:20:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Jul 2008 05:20:07 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807040920.m649K7MI028352@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2227
------- Comment #10 from fkauff at biologie.uni-kl.de 2008-07-04 05:20 EST -------
(In reply to comment #9)
> >
> > Also note that the Nexus add_sequence() method does not seem to support
> > duplicated taxa names. Perhaps this method could update the
> > unaltered_taxlabels property and use the _unique_label method to cope with
> > duplicate names?
> >
> Ok, I updated add_sequence and will commit the changes soon.
>
Checking in biopython/Bio/Nexus/Nexus.py;
/home/repository/biopython/biopython/Bio/Nexus/Nexus.py,v <-- Nexus.py
new revision: 1.18; previous revision: 1.17
Frank
From mjldehoon at yahoo.com Fri Jul 4 06:24:06 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 4 Jul 2008 03:24:06 -0700 (PDT)
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <320fb6e00807020911w2bec03a6w5ec8b50f60a50238@mail.gmail.com>
Message-ID: <36286.77119.qm@web62412.mail.re1.yahoo.com>
> I'm assuming we'd put the new IntelliGenetics to
> SeqRecord parser in Bio/SeqIO/IgIO.py (based on
> the format name of "ig" used in EMBOSS).
OK.
> Would we then also deprecate Bio.IntelliGenetics?
Yes. Otherwise, it's replicated functionality.
> Do you want to make these changes, or should I?
Either way is fine with me. If you want to include this in Bio.SeqIO, go ahead. If you prefer me to do it, please let me know.
> > Then, let's replace the IntelliGenetics tests by
> files that do not contain the double
> > semi-colon comments.
>
> Why not just leave the double colon lines alone? The parser
> should be able to cope.
In the example files in test/IntelliGenetics, lines with a ';;' clearly have a different interpretation from the sequence-specific comments starting with ';'. I am fine with skipping the ';;' lines, but if we'd include them with the sequence-specific comments we'd be misrepresenting the file.
--Michiel.
--- On Wed, 7/2/08, Peter wrote:
> From: Peter
> Subject: Re: [Biopython-dev] Bio.IntelliGenetics
> To: mjldehoon at yahoo.com
> Date: Wednesday, July 2, 2008, 12:11 PM
> > It may be that the link in
> Bio/IntelliGenetics/__init__.py actually does not pertain
> to
> > the IntelliGenetics format. Except for this link
> (which as you point out actually talks
> > about the MASE format, not the IntelliGenetics
> format), I have seen no description
> > elsewhere of these file-wide comments preceded by a
> double semi-colon in the
> > IntelliGenetics format. Even Biopython doesn't
> treat these consistently: The tests
> > for Bio.IntelliGenetics include comments with the
> double semi-colon, but the parser
> > doesn't treat them differently from
> sequence-specific comments.
>
> Maybe we should ask BioPerl if they distinguish between the
> IntelliGenetics and MASE formats?
>
> Looking back over the old mailing list, at the time they
> did think the
> two were the same:
> http://lists.open-bio.org/pipermail/biopython-dev/2001-October/000626.html
>
> > So let's do the following:
> > For the IntelliGenetics parser, do not look for double
> semi-colon comments. Only check
> > if the first character in a line is a semi-colon, and
> if so, treat it as a sequence-specific
> > comment. This is what Bio.IntelliGenetics currently
> does anyway.
> > Replace the parser class in Bio.IntelliGenetics by a
> generator function, and integrate it with
> > Bio.SeqIO.
>
> I'm assuming we'd put the new IntelliGenetics to
> SeqRecord parser in
> Bio/SeqIO/IgIO.py (based on the format name of
> "ig" used in EMBOSS).
> Would we then also deprecate Bio.IntelliGenetics?
>
> Do you want to make these changes, or should I?
>
> > Then, let's replace the IntelliGenetics tests by
> files that do not contain the double
> > semi-colon comments.
>
> Why not just leave the double colon lines alone? The parser
> should be
> able to cope.
>
> Peter
From biopython at maubp.freeserve.co.uk Fri Jul 4 10:31:55 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 4 Jul 2008 15:31:55 +0100
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <36286.77119.qm@web62412.mail.re1.yahoo.com>
References: <320fb6e00807020911w2bec03a6w5ec8b50f60a50238@mail.gmail.com>
<36286.77119.qm@web62412.mail.re1.yahoo.com>
Message-ID: <320fb6e00807040731h787c66e6t10a4edd31dffdbc2@mail.gmail.com>
>> Do you want to make these changes, or should I?
>
> Either way is fine with me. If you want to include this in Bio.SeqIO, go ahead.
OK. I've added a simple parser to CVS as Bio/SeqIO/IgIO.py for
IntelliGenetics/MASE files using the format name "ig" to match EMBOSS.
The existing three sample files are now being used in test_SeqIO.py
and one of them also in test_AlignIO.py as well.
If anyone wants to scan over the code, I'd be delighted to have feedback.
Adding support for writing these files should be easy. Do you think
this is worth implementing?
Before we deprecate Bio.IntelliGenetics I suggest we ask on the
mailing list if anyone is using it.
> In the example files in test/IntelliGenetics, lines with a ';;' clearly have a different interpretation
> from the sequence-specific comments starting with ';'. I am fine with skipping the ';;' lines, but
> if we'd include them with the sequence-specific comments we'd be misrepresenting the file.
I am ignoring the ";;" lines at the start of the file.
Peter
From mjldehoon at yahoo.com Sat Jul 5 04:24:41 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 5 Jul 2008 01:24:41 -0700 (PDT)
Subject: [Biopython-dev] CVS freeze for release 1.47
Message-ID: <223850.14172.qm@web62404.mail.re1.yahoo.com>
Hi everybody,
I'll start on release 1.47 from now, so please don't make any commits to CVS until the release is out.
Thanks!
--Michiel.
From mjldehoon at yahoo.com Sat Jul 5 20:00:17 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 5 Jul 2008 17:00:17 -0700 (PDT)
Subject: [Biopython-dev] Biopython release 1.47
Message-ID: <287726.364.qm@web62412.mail.re1.yahoo.com>
We are pleased to announce the release of Biopython 1.47.
This release includes a new Bio.AlignIO module, updates to Bio.Blast, parsers for NCBI's Entrez E-Utilities, numerous other code improvements and fixes, and an extended and updated documentation. In particular if you use Biopython to access NCBI's E-Utilities, we encourage you to download and install this release to ensure full compliance with NCBI's access rules.
Source distributions and Windows installers are available from the Biopython website at http://biopython.org. My thanks to all code contributors who made this new release possible.
--Michiel on behalf of the Biopython developers
From sbassi at gmail.com Sun Jul 6 15:53:54 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Sun, 6 Jul 2008 16:53:54 -0300
Subject: [Biopython-dev] NCBIStandalon not compatible with previous versions,
is this a bug?
Message-ID:
NCBIStandalone changed in 1.46 due to bug #2508.
So this code that was working before, no longer works:
result, err = NCBIStandalone.blastall(b_exe, "blastn",
b_db, f_name, expectation=1e-10, descriptions=1)
The error trace is:
File "/mnt/hda2/bio/biopython-1.46/build/lib.linux-i686-2.5/Bio/Blast/NCBIStandalone.py",
line 1986, in _security_check_parameters
if ";" in value or "&&" in value :
TypeError: argument of type 'float' is not iterable
So I had to rewrite the code as:
result, err = NCBIStandalone.blastall(b_exe, "blastn",
b_db, f_name, expectation="1e-10", descriptions="1")
The problem is the function "_security_check_parameters", that assumes
that all arguments are strings.
Proposed solutions:
1) Leave it as is (this is not a bug). Some tutorial should be changed (?)
2) Modify line 1986 from:
if ";" in value or "&&" in value :
To this:
if ";" in value or "&&" in str(value) :
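Note that in option 2 the first test, ";" in value, still runs against the raw value, so a float or int would presumably still raise the same TypeError. A minimal sketch of the idea with the conversion done once up front is shown below; check_parameters is a hypothetical stand-in, not the actual _security_check_parameters code, and raising ValueError is an assumption.

```python
def check_parameters(**keywords):
    """Hypothetical stand-in for _security_check_parameters."""
    for key, value in keywords.items():
        text = str(value)  # convert once so ints and floats are handled too
        if ";" in text or "&&" in text:
            raise ValueError("Rejected value for %s: %r" % (key, value))


# Numeric values no longer trigger a TypeError.
check_parameters(expectation=1e-10, descriptions=1)
```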
From bugzilla-daemon at portal.open-bio.org Mon Jul 7 06:47:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Jul 2008 06:47:48 -0400
Subject: [Biopython-dev] [Bug 2447] EUtils cannot parse PubMed XML for ACS
journals
In-Reply-To:
Message-ID: <200807071047.m67Almjb027271@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2447
------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp 2008-07-07 06:47 EST -------
Using Biopython release 1.47, Bio.Entrez can parse the XML for this PMID:
>>> from Bio import Entrez
>>> PMID = "17238260"
>>> handle = Entrez.efetch(db='pubmed', id=PMID, retmode='xml')
>>> record = Entrez.read(handle)
>>>
Noel, can you use Bio.Entrez instead of Bio.EUtils?
From bugzilla-daemon at portal.open-bio.org Mon Jul 7 06:55:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Jul 2008 06:55:10 -0400
Subject: [Biopython-dev] [Bug 2448] Bio.EUtils can't handle accented author
names
In-Reply-To:
Message-ID: <200807071055.m67AtAWu027543@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2448
------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2008-07-07 06:55 EST -------
Using Bio.Entrez in Biopython release 1.47:
>>> from Bio import Entrez
>>> handle = Entrez.efetch(db='pubmed', id=pmids, retmode='xml')
>>> records = Entrez.read(handle)
>>> records[0]['MedlineCitation']['Article']['AuthorList']
[{u'LastName': 'Matamala', u'Initials': 'AR', u'ForeName': 'Adelio R'},
{u'LastName': 'Almonacid', u'Initials': 'DE', u'ForeName': 'Daniel E'},
{u'LastName': 'Figueroa', u'Initials': 'MF', u'ForeName': 'Maximiliano F'},
{u'LastName': u'Mart\xednez-Oyanedel', u'Initials': 'J', u'ForeName':
u'Jos\xe9'}, {u'LastName': 'Bunster', u'Initials': 'MC', u'ForeName': 'Marta
C'}]
Noel, is this sufficient for your needs?
From bugzilla-daemon at portal.open-bio.org Mon Jul 7 07:12:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Jul 2008 07:12:26 -0400
Subject: [Biopython-dev] [Bug 2448] Bio.EUtils can't handle accented author
names
In-Reply-To:
Message-ID: <200807071112.m67BCQAB028433@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2448
------- Comment #3 from baoilleach at gmail.com 2008-07-07 07:12 EST -------
Thanks Michiel, but I found a workaround a day later so don't worry about me. I
was just letting you know about the bug...
Noel
From biopython at maubp.freeserve.co.uk Mon Jul 7 09:07:24 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 7 Jul 2008 14:07:24 +0100
Subject: [Biopython-dev] NCBIStandalon not compatible with previous
versions, is this a bug?
In-Reply-To:
References:
Message-ID: <320fb6e00807070607m2cee88b1n9b2b2194d96c3c12@mail.gmail.com>
On Sun, Jul 6, 2008 at 8:53 PM, Sebastian Bassi wrote:
> NCBIStandalone changed in 1.46 due to bug #2508.
> So this code that was working before, no longer works:
>
> result, err = NCBIStandalone.blastall(b_exe, "blastn",
> b_db, f_name, expectation=1e-10, descriptions=1)
>
> The error trace is:
>
> File "/mnt/hda2/bio/biopython-1.46/build/lib.linux-i686-2.5/Bio/Blast/NCBIStandalone.py",
> line 1986, in _security_check_parameters
> if ";" in value or "&&" in value :
> TypeError: argument of type 'float' is not iterable
>
> So I had to rewrite the code as:
>
> result, err = NCBIStandalone.blastall(b_exe, "blastn",
> b_db, f_name, expectation="1e-10", descriptions="1")
>
> The problem is the function "_security_check_parameters", that assumes
> that all arguments are strings.
>
> Proposed solutions:
>
> 1) Leave it as is (this is not a bug). Some tutorial should be changed (?)
> 2) Modify line 1986 from:
> if ";" in value or "&&" in value :
> To this:
> if ";" in value or "&&" in str(value) :
I would say it's a bug, and casting into a string on line 1986 looks
like the best fix. I won't be able to do this until tomorrow
afternoon at the latest - if you could file a bug that would be
helpful in case I forget ;)
Thanks
Peter
From bugzilla-daemon at portal.open-bio.org Mon Jul 7 13:08:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Jul 2008 13:08:40 -0400
Subject: [Biopython-dev] [Bug 2538] New: _security_check_parameters assumes
all arguments are strings
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2538
Summary: _security_check_parameters assumes all arguments are
strings
Product: Biopython
Version: 1.46
Platform: PC
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: sbassi at gmail.com
This code no longer works:
result, err = NCBIStandalone.blastall(b_exe, "blastn",
b_db, f_name, expectation=1e-10, descriptions=1)
This is because the new _security_check_parameters function assumes that all
blastall parameters are strings, but expectation and descriptions are a float
and an int respectively.
Proposed fix:
Modify line 1986 from:
if ";" in value or "&&" in value :
To this:
if ";" in value or "&&" in str(value) :
From sbassi at gmail.com Mon Jul 7 16:30:14 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Mon, 7 Jul 2008 17:30:14 -0300
Subject: [Biopython-dev] Alignment problem. bug?
Message-ID:
I would like to confirm whether this is a bug or not. If I get
confirmation, I will file it in Bugzilla.
With this code:
from Bio import Clustalw
from Bio.Clustalw import MultipleAlignCL
cline = MultipleAlignCL('foralig.txt')
cline.set_output("alig.aln")
alignment = Clustalw.do_alignment(cline)
I get:
Traceback (most recent call last):
File "/mnt/hda2/py252/bin/ii.py", line 112, in
alignment = Clustalw.do_alignment(cline)
File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py",
line 125, in do_alignment
return parse_file(out_file, alphabet)
File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py",
line 47, in parse_file
generic_alignment = AlignIO.read(handle, "clustal")
File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/__init__.py",
line 299, in read
first = iterator.next()
File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/ClustalIO.py",
line 169, in next
raise ValueError("Could not parse line:\n%s" % line)
ValueError: Could not parse line:
I tested with Biopython 1.47 and 1.46 with the input file:
http://www.pastecode.com.ar/f44f28b41 (download at
http://www.pastecode.com.ar/pastebin.php?dl=f44f28b41)
The ClustalW program does run, because I can see its output on disk
(posted here: http://www.pastecode.com.ar/f275a5475). It seems that
Biopython fails to parse it.
I also tested in an older version (I guess it is 1.44) and it works
OK. So I think the problem was introduced in 1.46.
--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Tutorial libre de Python: http://tinyurl.com/2az5d5
From bugzilla-daemon at portal.open-bio.org Tue Jul 8 04:41:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 8 Jul 2008 04:41:02 -0400
Subject: [Biopython-dev] [Bug 2538] _security_check_parameters assumes all
arguments are strings
In-Reply-To:
Message-ID: <200807080841.m688f2VL020100@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2538
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-08 04:41 EST -------
Included a float in the unit test for _security_check_parameters() added in Bug
2508:
Tests/test_NCBIStandalone.py revision: 1.15;
Fixed the string assumption in:
Bio/Blast/NCBIStandalone.py revision: 1.74;
Note that in your suggested fix Sebastian, both the "in" expressions need
casting to a string.
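For anyone following along, a minimal sketch of a check that casts each value
to a string before both membership tests could look like this. The function
name and error message are illustrative assumptions; this is not the code
committed in NCBIStandalone.py revision 1.74.

```python
def check_parameters(param_dict):
    """Reject values containing ';' or '&&' (illustrative sketch only)."""
    for key, value in param_dict.items():
        # Cast once so that floats and ints can be searched like strings.
        str_value = str(value)
        if ";" in str_value or "&&" in str_value:
            raise ValueError("Rejecting suspicious argument for %s" % key)


# Numeric values no longer trigger a TypeError.
check_parameters({"expectation": 1e-10, "descriptions": 1})
```

With the cast applied to both tests, the original call using expectation=1e-10
and descriptions=1 should pass the check unchanged.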
Thanks for reporting this!
Peter
From biopython at maubp.freeserve.co.uk Tue Jul 8 04:51:31 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jul 2008 09:51:31 +0100
Subject: [Biopython-dev] Alignment problem. bug?
In-Reply-To:
References:
Message-ID: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com>
On Mon, Jul 7, 2008 at 9:30 PM, Sebastian Bassi wrote:
> I would like to confirm whether this is a bug or not. If I get
> confirmation, I will file it in Bugzilla.
It does look like a bug to me...
> With this code:
>
> from Bio import Clustalw
> from Bio.Clustalw import MultipleAlignCL
>
> cline = MultipleAlignCL('foralig.txt')
> cline.set_output("alig.aln")
> alignment = Clustalw.do_alignment(cline)
>
> I get:
>
> Traceback (most recent call last):
> File "/mnt/hda2/py252/bin/ii.py", line 112, in
> alignment = Clustalw.do_alignment(cline)
> File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py",
> line 125, in do_alignment
> return parse_file(out_file, alphabet)
> File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py",
> line 47, in parse_file
> generic_alignment = AlignIO.read(handle, "clustal")
> File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/__init__.py",
> line 299, in read
> first = iterator.next()
> File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/ClustalIO.py",
> line 169, in next
> raise ValueError("Could not parse line:\n%s" % line)
> ValueError: Could not parse line:
>
>
> I tested with Biopython 1.47 and 1.46 with the input file:
> http://www.pastecode.com.ar/f44f28b41 (download at
> http://www.pastecode.com.ar/pastebin.php?dl=f44f28b41)
> The ClustalW program does run, because I can see its output on disk
> (posted here: http://www.pastecode.com.ar/f275a5475). It seems that
> Biopython fails to parse it.
>
> I also tested in an older version (I guess it is 1.44) and it works
> OK. So I think the problem was introduced in 1.46.
For Biopython 1.46+ I switched the Bio.Clustalw parser to internally
call Bio.AlignIO, so one thing you could try is reverting
Bio/Clustalw/__init__.py to the older version (e.g. that shipped with
Biopython 1.45).
You haven't said which version of the ClustalW tool you are using -
maybe 2.0? If so, there could be a subtle change in the output
format since 1.83. If you could run the tool by hand and share the
output that would be helpful to try and track down this issue.
I don't seem to have any version of ClustalW installed on my current
machine, so it will take me a little longer to reproduce this here.
Could you file a bug please, and attach the example input and the
output when run by hand at the command line.
Thanks,
Peter
From bugzilla-daemon at portal.open-bio.org Tue Jul 8 04:52:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 8 Jul 2008 04:52:06 -0400
Subject: [Biopython-dev] [Bug 2538] _security_check_parameters assumes all
arguments are strings
In-Reply-To:
Message-ID: <200807080852.m688q6Ce020588@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2538
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-08 04:52 EST -------
Forgot to mark this as fixed - sorry for the extra email!
From biopython at maubp.freeserve.co.uk Tue Jul 8 07:02:37 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jul 2008 12:02:37 +0100
Subject: [Biopython-dev] Alignment problem. bug?
In-Reply-To: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com>
References:
<320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com>
Message-ID: <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>
On Tue, Jul 8, 2008 at 9:51 AM, Peter wrote:
> On Mon, Jul 7, 2008 at 9:30 PM, Sebastian Bassi wrote:
>> I would like to confirm whether this is a bug or not. If I get
>> confirmation, I will file it in Bugzilla.
>
> It does look like a bug to me...
I've reproduced this using Clustalw 2.0.8 (online at the EBI) and with
Clustalw 2.0.9 (installed locally). It was a problem parsing Clustal
files where the first line of the consensus was blank (and would
probably affect Clustal W 1.83 too).
I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12
Could you update this file and re-test please Sebastian? Also, may I
add a test alignment file based on your example to CVS please?
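To illustrate the kind of input involved, here is a toy reader that treats a
whitespace-only consensus line like any other non-sequence line. It is only a
sketch under assumptions: the function name and the sample data are invented,
and this is not the code in Bio/AlignIO/ClustalIO.py.

```python
def parse_clustal_rows(text):
    """Collect (name, sequence) pairs from Clustal-style text (toy sketch)."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("Does not look like a Clustal file")
    chunks = {}
    order = []
    for line in lines[1:]:
        # Skip block separators and consensus lines. A consensus line with
        # nothing conserved is whitespace only, so it is skipped here too.
        if not line.strip() or line[0] == " ":
            continue
        fields = line.split()
        name, chunk = fields[0], fields[1]
        if name not in chunks:
            chunks[name] = []
            order.append(name)
        chunks[name].append(chunk)
    return [(name, "".join(chunks[name])) for name in order]


# Invented sample: the first block has a blank consensus line.
SAMPLE = (
    "CLUSTAL W (1.83) multiple sequence alignment\n"
    "\n"
    "\n"
    "seq1            ACGT\n"
    "seq2            TGCA\n"
    "                    \n"
    "\n"
    "seq1            AAAA\n"
    "seq2            AAAA\n"
    "                ****\n"
)
rows = parse_clustal_rows(SAMPLE)
```

The point of the sketch is only that a consensus line may carry no symbols at
all, so a parser cannot rely on finding "*", ":" or "." after each block.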
Thanks,
Peter
From mjldehoon at yahoo.com Tue Jul 8 08:47:48 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 8 Jul 2008 05:47:48 -0700 (PDT)
Subject: [Biopython-dev] Bio.Sequencing
Message-ID: <570915.67657.qm@web62415.mail.re1.yahoo.com>
Hi everybody,
Is somebody currently actively maintaining Bio.Sequencing? Frank perhaps?
I'd like to make some changes to Bio.Sequencing with regards to bug #2454:
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
Just to make sure that I am not treading on other people's work.
--Michiel
From fkauff at biologie.uni-kl.de Tue Jul 8 09:12:39 2008
From: fkauff at biologie.uni-kl.de (Frank Kauff)
Date: Tue, 08 Jul 2008 15:12:39 +0200
Subject: [Biopython-dev] Bio.Sequencing
In-Reply-To: <570915.67657.qm@web62415.mail.re1.yahoo.com>
References: <570915.67657.qm@web62415.mail.re1.yahoo.com>
Message-ID: <487367C7.2050702@biologie.uni-kl.de>
Hi all,
Michiel de Hoon wrote:
> Hi everybody,
>
> Is somebody currently actively maintaining Bio.Sequencing? Frank perhaps?
>
Not me. Green lights from my side.
Frank
> I'd like to make some changes to Bio.Sequencing with regards to bug #2454:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2454
>
> Just to make sure that I am not treading on other people's work.
>
>
> --Michiel
From biopython at maubp.freeserve.co.uk Tue Jul 8 10:36:43 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jul 2008 15:36:43 +0100
Subject: [Biopython-dev] Bio.Sequencing
In-Reply-To: <570915.67657.qm@web62415.mail.re1.yahoo.com>
References: <570915.67657.qm@web62415.mail.re1.yahoo.com>
Message-ID: <320fb6e00807080736v26388f2ake12303c5b752c5e9@mail.gmail.com>
On Tue, Jul 8, 2008 at 1:47 PM, Michiel de Hoon wrote:
> Hi everybody,
>
> Is somebody currently actively maintaining Bio.Sequencing? Frank perhaps?
> I'd like to make some changes to Bio.Sequencing with regards to bug #2454:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2454
>
> Just to make sure that I am not treading on other people's work.
My only comment is to watch out for the fact that Bio.SeqIO is now
calling Bio.Sequencing for the "ace" and "phd" formats.
On a related note, I'd had some ideas for making the Ace parser more
user friendly by further extending the doc strings and defining
__str__ or __repr__ methods for some of the "line type classes" which
otherwise must be explored by using dir() to discover the properties.
I haven't actually done any work on this yet though.
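As a rough illustration of the idea, a __repr__ method can expose an object's
properties without resorting to dir(). The class and attribute names below are
hypothetical stand-ins, not the actual classes in Bio.Sequencing.Ace.

```python
class ExampleTag(object):
    """Hypothetical stand-in for one of the Ace "line type" classes."""

    def __init__(self, name, padded_start, padded_end):
        self.name = name
        self.padded_start = padded_start
        self.padded_end = padded_end

    def __repr__(self):
        # Show the attribute values rather than the default
        # "<ExampleTag object at 0x...>" text.
        return "%s(name=%r, padded_start=%r, padded_end=%r)" % (
            self.__class__.__name__,
            self.name,
            self.padded_start,
            self.padded_end,
        )


tag = ExampleTag("read1", 1, 100)
print(repr(tag))
```

At the interactive prompt, simply typing the variable name would then show the
same text.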
Peter
From sbassi at gmail.com Tue Jul 8 11:38:29 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Tue, 8 Jul 2008 12:38:29 -0300
Subject: [Biopython-dev] Alignment problem. bug?
In-Reply-To: <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>
References:
<320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com>
<320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>
Message-ID:
On Tue, Jul 8, 2008 at 8:02 AM, Peter wrote:
> I've reproduced this using Clustalw 2.0.8 (online at the EBI) and with
> Clustalw 2.0.9 (installed locally). It was a problem parsing Clustal
> files where the first line of the consensus was blank (and would
> probably affect Clustal W 1.83 too).
Yes, I used ClustalW 1.83
> I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12
> Could you update this file and re-test please Sebastian? Also, may I
> add a test alignment file based on your example to CVS please?
Ok, I will test it today. You can use my file or any possible derivation of it.
Best,
SB.
--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Tutorial libre de Python: http://tinyurl.com/2az5d5
From biopython at maubp.freeserve.co.uk Tue Jul 8 11:56:20 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jul 2008 16:56:20 +0100
Subject: [Biopython-dev] Alignment problem. bug?
In-Reply-To:
References:
<320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com>
<320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>
Message-ID: <320fb6e00807080856s55d77962h9ceedd160ca8002b@mail.gmail.com>
>> I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12
>> Could you update this file and re-test please Sebastian? Also, may I
>> add a test alignment file based on your example to CVS please?
>
> Ok, I will test it today. You can use my file or any possible derivation of it.
Thanks - I have added a two sequence version of your example as
Tests/Clustalw/odd_consensus.aln
Peter
From sbassi at gmail.com Tue Jul 8 12:52:13 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Tue, 8 Jul 2008 13:52:13 -0300
Subject: [Biopython-dev] Alignment problem. bug?
In-Reply-To: <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>
References:
<320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com>
<320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>
Message-ID:
On Tue, Jul 8, 2008 at 8:02 AM, Peter wrote:
> I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12
Just to confirm that it works now. Thank you!
Best,
SB.
From biopython at maubp.freeserve.co.uk Wed Jul 9 07:11:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Jul 2008 12:11:16 +0100
Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in
Bio.SeqIO
In-Reply-To: <320fb6e00807020603s63f8339ag5b8140f1943ceb47@mail.gmail.com>
References: <320fb6e00807020603s63f8339ag5b8140f1943ceb47@mail.gmail.com>
Message-ID: <320fb6e00807090411o44530c46wc1ffdc8cdc5442fe@mail.gmail.com>
Now that Biopython 1.47 is out, are there any comments/objections to
my committing this to CVS?
Bug 2533 - Support for simple "tab" format in Bio.SeqIO
http://bugzilla.open-bio.org/show_bug.cgi?id=2533
Thanks,
Peter
P.S. Any real world example files would be good for the test suite.
From lpritc at scri.ac.uk Wed Jul 9 08:14:04 2008
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Wed, 09 Jul 2008 13:14:04 +0100
Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in
Bio.SeqIO
In-Reply-To: <320fb6e00807090411o44530c46wc1ffdc8cdc5442fe@mail.gmail.com>
Message-ID:
Only that you might want to consider Axon Text File format as a
self-describing tab-separated format which would facilitate storage and
recovery of all attributes of a sequence. There's a spec here:
http://www.moleculardevices.com/pages/software/gn_genepix_file_formats.html
On 09/07/2008 12:11, "Peter" wrote:
> Now that Biopython 1.47 is out, are there any comments/objections to
> my committing this to CVS?
>
> Bug 2533 - Support for simple "tab" format in Bio.SeqIO
> http://bugzilla.open-bio.org/show_bug.cgi?id=2533
>
> Thanks,
>
> Peter
>
> P.S. Any real world example files would be good for the test suite.
--
Dr Leighton Pritchard B.Sc.(Hons) MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405
From biopython at maubp.freeserve.co.uk Wed Jul 9 08:30:26 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Jul 2008 13:30:26 +0100
Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in
Bio.SeqIO
In-Reply-To:
References: <320fb6e00807090411o44530c46wc1ffdc8cdc5442fe@mail.gmail.com>
Message-ID: <320fb6e00807090530j43a3e2c9y48bef4993587881f@mail.gmail.com>
On Wed, Jul 9, 2008 at 1:14 PM, Leighton Pritchard wrote:
> Only that you might want to consider Axon Text File format as a
> self-describing tab-separated format which would facilitate storage and
> recovery of all attributes of a sequence. There's a spec here:
>
> http://www.moleculardevices.com/pages/software/gn_genepix_file_formats.html
>
It's an interesting and flexible file format, but I don't see any
standard column name for "sequence" which would be of particular
interest from the point of view of the Bio.SeqIO module. If there is
a de-facto convention for storing sequence data in an Axon Text File,
then we could adopt this within Bio.SeqIO. Otherwise, I think any
Axon Text File parser added to Biopython would have to be of a much more
general nature (and not part of Bio.SeqIO).
Peter
From biopython at maubp.freeserve.co.uk Wed Jul 9 09:03:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Jul 2008 14:03:16 +0100
Subject: [Biopython-dev] Simple __getitem__ for Alignments
Message-ID: <320fb6e00807090603o6b087ceeuce0b87c13627552a@mail.gmail.com>
Now that the latest release is out (Biopython 1.47), Bio.AlignIO
should start to get used more. I anticipate more people getting
frustrated with the current Alignment object, and would like to make
another baby-step in improving it.
I'd like to add a minimal __getitem__ method, as described in Bug 1944
comment 15,
http://bugzilla.open-bio.org/show_bug.cgi?id=1944#c15
> def __getitem__(self, index) :
> """Access part of the alignment.
>
> You can access a row of the alignment as a SeqRecord using an integer
> index (think of the alignment as a list of SeqRecord objects here):
>
> first_record = my_alignment[0]
> last_record = my_alignment[-1]
>
> Right now, this is the ONLY indexing operation supported. The
> use of two indices and slice notation to extract a sub-alignment,
> row, column or letter is under discussion for a future update."""
> if isinstance(index, int) :
> #e.g. result = align[x]
> #Return a SeqRecord
> return self._records[index]
> else :
> raise TypeError, "Not currently supported, but may be in future."
From the discussion on Bug 1944, this doesn't seem to be contentious -
the debate is about more advanced slicing operations.
I'd also like to add a __len__ method which would return the number of
SeqRecord objects (i.e. the number of rows). This would then let the
alignment be treated very much like a read-only list of SeqRecord
objects. Remember, we can already iterate over the rows in the
alignment as SeqRecord objects.
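Put together, the proposed behaviour can be sketched with a small stand-alone
class. This is only an illustration under assumptions: MiniAlignment is a
made-up name, and plain strings stand in for SeqRecord objects so that the
example runs without Biopython.

```python
class MiniAlignment(object):
    """Toy container that behaves like a read-only list of rows."""

    def __init__(self, records):
        self._records = list(records)

    def __getitem__(self, index):
        # Only a single integer index is supported, as in the proposal.
        if isinstance(index, int):
            return self._records[index]
        raise TypeError("Not currently supported, but may be in future.")

    def __len__(self):
        # Number of rows in the alignment.
        return len(self._records)

    def __iter__(self):
        return iter(self._records)


align = MiniAlignment(["ACGT-", "AC-TT", "ACGTT"])
first_record = align[0]
last_record = align[-1]
number_of_rows = len(align)
```

Anything other than an integer index, such as align[0:2], raises TypeError in
this sketch, which leaves room to define slicing later.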
Any comments?
Peter
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 09:21:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 09:21:13 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807091321.m69DLD9g031282@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #20 from mdehoon at ims.u-tokyo.ac.jp 2008-07-09 09:21 EST -------
(In reply to comment #16)
I have uploaded an alternative parser in Bio.Sequencing.Phd to CVS. Feel free
to have a look and comment. If everybody is OK, I'll add a DeprecationWarning
to the previous parser.
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 09:37:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 09:37:44 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807091337.m69DbiM5031944@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #21 from fkauff at biologie.uni-kl.de 2008-07-09 09:37 EST -------
Michiel,
while you're at it - could you update my email in the source as well? And
Cymon's email is now cy at cymon.org. Thanks!
Frank
(In reply to comment #20)
> (In reply to comment #16)
> I have uploaded an alternative parser in Bio.Sequencing.Phd to CVS. Feel free
> to have a look and comment. If everybody is OK, I'll add a DeprecationWarning
> to the previous parser.
>
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 09:38:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 09:38:18 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807091338.m69DcIDu031986@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #22 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-09 09:38 EST -------
In reply to comment 20 about the updates to Bio.Sequencing.Phd, I see you've
also updated Bio.SeqIO.PhdIO in CVS (good).
I would suggest you add yourself to the copyright statement for this module,
and add some doc string entries to the new read and parse functions. I haven't
looked over the details of the new code (other than confirming test_Phd.py and
test_SeqIO.py seem happy).
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 10:28:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 10:28:36 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807091428.m69ESaGm001621@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #23 from mdehoon at ims.u-tokyo.ac.jp 2008-07-09 10:28 EST -------
(In reply to comment #21)
> Michiel,
>
> while you're at it - could you update my email in the source as well? And
> Cymon's email is now
I have updated your address, but I'd prefer to hold off on Cymon's without his
direct permission -- spammers are watching too, you know.
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 14:33:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 14:33:42 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
for '-F F' and make it safe
In-Reply-To:
Message-ID: <200807091833.m69IXgcV013783@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2508
mmokrejs at ribosome.natur.cuni.cz changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |REOPENED
Resolution|FIXED |
------- Comment #10 from mmokrejs at ribosome.natur.cuni.cz 2008-07-09 14:33 EST -------
OK, so my old code not yet converted to biopython-1.47 gives me:
_textframe = blast.blast_and_htmlize(_query_sequence, _usermode,
upload_temp_path, blast_path, uri, _align_view, _matrix)
File "/home/mmokrejs/public_html/IRES2/blast.py", line 548, in
blast_and_htmlize
_blast_out, _error_info, _blast_file = blastall(blast_path + targetdb,
query_sequence, upload_temp_path, mode='sequence', align_view=align_view,
matrix=matrix)
File "/home/mmokrejs/public_html/IRES2/blast.py", line 506, in blastall
_blast_out, _error_info = NCBIStandalone.blastall('/usr/bin/blastall',
'blastn', blast_db, _blast_file, matrix=matrix + ' -F 0', wordsize=_wordsize,
gap_open=_gap_open, gap_extend=_gap_extend, strands=_strands,
alignments=_alignments, descriptions=_descriptions, expectation=_expectation,
align_view=align_view)
File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line
1620, in blastall
_security_check_parameters(keywds)
File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line
1986, in _security_check_parameters
if ";" in value or "&&" in value :
TypeError: argument of type 'int' is not iterable
It turns out I am passing in:
{'matrix': 'NUC.4.4 -F 0', 'strands': 3, 'expectation': 100, 'wordsize': 4,
'gap_extend': 1, 'gap_open': 1, 'alignments': 99999, 'descriptions': 9999}
I don't think it makes sense to require users to pass strings instead of
numbers to the function.
While looking into the _security_check_parameters() I think you should also
check for "||" - the logical OR as interpreted by shell and redirections ">"
and "<".
FIX:
-if ";" in value or "&&" in value:
+if ";" in str(value) or "&&" in str(value) or "||" in str(value) or ">" in
str(value) or "<" in str(value):
My apologies that I did not test earlier.
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 14:38:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 14:38:08 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
for '-F F' and make it safe
In-Reply-To:
Message-ID: <200807091838.m69Ic82k014070@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2508
------- Comment #11 from mmokrejs at ribosome.natur.cuni.cz 2008-07-09 14:38 EST -------
Don't know if you want to leave in the back-door to pass in another argument
with its value. If not, prevent spaces as well. Values never contain spaces
unless wrapped by single or double-quotes. I find it perfectly legal to tell
blastall:
-d "/some/db /another/db /yet/another" to search over three databases at once.
It seems it does not reflect -d specified 3 times on its command-line.
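One way to keep such a space-separated database list as a single -d value when
a shell command string is built is to quote it. The sketch below uses
shlex.quote and shlex.split from the Python standard library (Python 3.3 or
later). It is an assumption made for illustration, not what NCBIStandalone
does, and it only builds the command string without running blastall.

```python
import shlex

databases = "/some/db /another/db /yet/another"

# Quote the value so a shell would treat it as one argument.
command = "blastall -p blastn -d %s" % shlex.quote(databases)

# Splitting with shell-like rules shows the value stays in one piece.
tokens = shlex.split(command)
```

Here tokens ends with the full database string as a single item, spaces
included.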
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 16:12:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 16:12:40 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
for '-F F' and make it safe
In-Reply-To:
Message-ID: <200807092012.m69KCeO2018087@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2508
------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-09 16:12 EST -------
The issue with non-string arguments (e.g. floats or integers) was reported by
Sebastian Bassi (Bug 2538) and has since been fixed in CVS - sadly this was
after the release of Biopython 1.47.
As you've demonstrated there are valid reasons to want to include spaces. I
would rather not add a check which requires lots of special casing.
I'm leaving this bug open to consider extending _security_check_parameters() to
prevent the use of pipes and redirection (i.e. "|", "<" and ">") which sounds
reasonable. A third opinion wouldn't hurt of course!
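A possible shape for such an extended check is sketched below. The token list
and the function name are assumptions for discussion rather than the
NCBIStandalone code. Spaces are deliberately allowed, and testing for "|" also
catches "||".

```python
FORBIDDEN_TOKENS = (";", "&&", "|", "<", ">")


def find_forbidden(param_dict):
    """Return (key, token) pairs for values containing a forbidden token."""
    problems = []
    for key, value in sorted(param_dict.items()):
        # Cast to a string so that ints and floats are handled as well.
        str_value = str(value)
        for token in FORBIDDEN_TOKENS:
            if token in str_value:
                problems.append((key, token))
    return problems


# Values with spaces, and numeric values, are accepted.
ok = find_forbidden({"matrix": "NUC.4.4 -F 0", "expectation": 100})

# Pipes and redirections are flagged.
bad = find_forbidden({"matrix": "NUC.4.4 > /tmp/out"})
```

Returning the offending pairs instead of raising immediately is just one
design choice; a caller could raise an exception whenever the list is not
empty.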
From bugzilla-daemon at portal.open-bio.org Thu Jul 10 06:30:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Jul 2008 06:30:28 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807101030.m6AAUSew025300@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #24 from fkauff at biologie.uni-kl.de 2008-07-10 06:30 EST -------
> (In reply to comment #21)
> > Michiel,
> >
> > while you're at it - could you update my email in the source as well? And
> > Cymon's email is now
>
> I have updated your address, but I'd prefer to hold off on Cymon's without his
> direct permission -- spammers are watching too, you know.
>
Contacted Cymon, reply below:
Hi Frank,
...
>
> Do you want your email address updated in the ace/phd parser code? Or
> removed (just the email, not the name, of course)? Don't know if you follow
> biopython-dev lately.
I don't actually follow the -dev list but perhaps I should, as I think
I'm going to be using and doing far more diverse bioinformatics stuff
(now that I'm employed as a bioinformatician :)
Anyway, the email can be changed to cymon.cox at gmail.com - best to go
through google I think as their spam filters tend to be pretty good.
Cheers, C.
(In reply to comment #23)
From bugzilla-daemon at portal.open-bio.org Thu Jul 10 12:24:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Jul 2008 12:24:27 -0400
Subject: [Biopython-dev] [Bug 2533] Support for simple "tab" format in
Bio.SeqIO
In-Reply-To:
Message-ID: <200807101624.m6AGORlL012526@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2533
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-10 12:24 EST -------
Checked in, marking as fixed.
Bio/SeqIO/TabIO.py initial revision: 1.1
Bio/SeqIO/__init__.py new revision: 1.33
Tests/output/test_SeqIO new revision: 1.25
From bugzilla-daemon at portal.open-bio.org Thu Jul 10 23:20:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Jul 2008 23:20:11 -0400
Subject: [Biopython-dev] [Bug 2542] New: AlignInfo.py fails a test
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2542
Summary: AlignInfo.py fails a test
Product: Biopython
Version: 1.46
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: sbassi at gmail.com
When I run:
$ python2.5 /mnt/hda2/bio/biopython-1.47/Bio/Align/AlignInfo.py
The first 2 tests pass, but then I get:
Traceback (most recent call last):
  File "/mnt/hda2/bio/biopython-1.47/Bio/Align/AlignInfo.py", line 723, in <module>
    print summary.information_content()
  File "/mnt/hda2/bio/biopython-1.47/Bio/Align/AlignInfo.py", line 508, in information_content
    raise ValueError, errstr
ValueError: Error in alphabet: not Nucleotide or Protein, supply expected frequencies
I've also tried without the AlignIO:
from Bio import Alphabet
from Bio.Align.Generic import Alignment
from Bio.Seq import Seq
from Bio.Align.AlignInfo import SummaryInfo
seq1 = 'MHQAIFIYQIGYPLKSGYIQSIRSPEYDNW'
seq2 = 'MH--IFIYQIGYALKSGYIQSIRSPEY-NW'
a = Alignment(Alphabet.ProteinAlphabet)
a.add_sequence("asp",seq1)
a.add_sequence("unk",seq2)
summary = SummaryInfo(a)
summary.information_content()
Traceback (most recent call last):
  File "/mnt/hda2/py252/bin/align.py", line 16, in <module>
    summary.information_content()
  File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Align/AlignInfo.py", line 508, in information_content
    raise ValueError, errstr
ValueError: Error in alphabet: not Nucleotide or Protein, supply expected frequencies
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 04:49:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 04:49:08 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807110849.m6B8n8Xg022720@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2542
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 04:49 EST -------
Going over your example code:
>>> from Bio import Alphabet
>>> from Bio.Align.Generic import Alignment
>>> from Bio.Align.AlignInfo import SummaryInfo
>>> seq1 = 'MHQAIFIYQIGYPLKSGYIQSIRSPEYDNW'
>>> seq2 = 'MH--IFIYQIGYALKSGYIQSIRSPEY-NW'
>>> a = Alignment(Alphabet.ProteinAlphabet)
First problem: you gave the Alignment object an Alphabet class, rather than an
instance of the class. I guess we should add an explicit check to the Alignment
object...
You should have used:
>>> a = Alignment(Alphabet.ProteinAlphabet())
Or, if you prefer perhaps:
>>> a = Alignment(Alphabet.generic_protein)
Then when we get to the information_content, there is another issue:
>>> a.add_sequence("asp",seq1)
>>> a.add_sequence("unk",seq2)
>>> summary = SummaryInfo(a)
>>> summary.information_content()
Traceback (most recent call last):
...
AttributeError: ProteinAlphabet instance has no attribute 'gap_char'
The trouble here is that SummaryInfo class is looking for a declared gap
character in the protein alphabet - and none has been declared. Your example
sequences appear to use "-" as a gap, but you haven't declared this.
Try this:
from Bio import Alphabet
from Bio.Align.Generic import Alignment
from Bio.Seq import Seq
from Bio.Align.AlignInfo import SummaryInfo
seq1 = 'MHQAIFIYQIGYPLKSGYIQSIRSPEYDNW'
seq2 = 'MH--IFIYQIGYALKSGYIQSIRSPEY-NW'
a = Alignment(Alphabet.Gapped(Alphabet.generic_protein, "-"))
a.add_sequence("asp",seq1)
a.add_sequence("unk",seq2)
summary = SummaryInfo(a)
print summary.information_content()
You mentioned having a similar issue with Bio.AlignIO - could you attach the
example file to this bug with some trivial code showing your problem?
Thanks, Peter.
P.S. Please update to Biopython 1.47 rather than using 1.46
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 05:50:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 05:50:49 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807110950.m6B9on7t025902@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2542
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 05:50 EST -------
I think I've fixed the "Quick test" failure when running Bio/Align/AlignInfo.py
directly. I don't know how I missed that before...
/home/repository/biopython/biopython/Bio/Align/AlignInfo.py,v <--
AlignInfo.py
new revision: 1.15; previous revision: 1.14
done
My opinion, from looking at the AlignInfo code and scanning back over the
CVS history, is that it was never used much with generic alphabets (as tend to
be returned by Bio.AlignIO). There may be other issues here - for example I've
spotted another problem case: doubly extended alphabets, like a protein alphabet
declared both Gapped and WithStopCodon (which you *might* want in an
alignment).
From biopython at maubp.freeserve.co.uk Fri Jul 11 06:33:22 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 11 Jul 2008 11:33:22 +0100
Subject: [Biopython-dev] Checking alphabet argument in alignments
Message-ID: <320fb6e00807110333r1938510bne7e24d1ce7b5c0b@mail.gmail.com>
I'd like to add the following check to the __init__ method of the
Bio.Align.Generic.Alignment object (our base alignment class),
> if not (isinstance(alphabet, Alphabet.Alphabet) \
> or isinstance(alphabet, Alphabet.AlphabetEncoder)):
> raise ValueError("Invalid alphabet argument")
This will prevent subtle user errors like this:
from Bio import Alphabet
from Bio.Align.Generic import Alignment
a = Alignment(Alphabet.ProteinAlphabet)
which should be:
from Bio import Alphabet
from Bio.Align.Generic import Alignment
a = Alignment(Alphabet.ProteinAlphabet())
The only downside I have thought of is if anyone has created their own
alphabet class which does NOT subclass the original
Bio.Alphabet.Alphabet class.
This same test could (should?) also be added to the Seq and MutableSeq objects.
What do people think?
Peter
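
A minimal sketch of the proposed check, for illustration only. The classes
below are stand-ins defined here so the example runs on its own (in Biopython
they would be Bio.Alphabet.Alphabet, Bio.Alphabet.AlphabetEncoder and
Bio.Align.Generic.Alignment), and the accepts() helper is hypothetical.

```python
# Stand-in classes so the sketch is self-contained; these are NOT the
# real Biopython implementations.
class Alphabet(object):
    pass


class ProteinAlphabet(Alphabet):
    pass


class AlphabetEncoder(object):
    def __init__(self, alphabet):
        self.alphabet = alphabet


class Alignment(object):
    def __init__(self, alphabet):
        # Accept only alphabet *instances*. A class object such as
        # ProteinAlphabet (without parentheses) is an instance of type,
        # so it fails this test.
        if not isinstance(alphabet, (Alphabet, AlphabetEncoder)):
            raise ValueError("Invalid alphabet argument")
        self._alphabet = alphabet


def accepts(argument):
    """Hypothetical helper: True if Alignment() accepts the argument."""
    try:
        Alignment(argument)
    except ValueError:
        return False
    return True
```

With this check, Alignment(ProteinAlphabet) fails immediately with a
ValueError instead of causing a confusing error later on, while
Alignment(ProteinAlphabet()) and a wrapped AlphabetEncoder instance are
both accepted.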
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 06:39:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 06:39:48 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807111039.m6BAdm05028072@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2542
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 06:39 EST -------
In comment 2 I wrote:
> I've spotted another problem case, doubly extended alphabets like a
> protein alphabet declared Gapped and WithStopCodon (which you *might*
> want in an alignment).
This alphabet issue is fixed in CVS, as is another corner case of a divide by
zero error where an entire column consists of ignored characters.
Please re-test with Bio/Align/AlignInfo.py revision 1.16 from CVS.
Thanks
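
To illustrate the divide by zero corner case, here is a rough sketch of a
per-column calculation with a guard for columns made up entirely of ignored
characters. This is not the code in AlignInfo.py; the function name, its
arguments and the choice to return 0.0 for an empty column are all
assumptions made for the example.

```python
import math


def column_information(column, expected, chars_to_ignore=("-",), log_base=2):
    """Relative information of one alignment column (hypothetical helper).

    column   - string holding the letters of one alignment column
    expected - dict mapping each letter to its expected frequency
    """
    counted = [letter for letter in column if letter not in chars_to_ignore]
    if not counted:
        # Every character was ignored. Without this guard the frequency
        # calculation below would divide by len(counted) == 0.
        return 0.0
    total = 0.0
    for letter in set(counted):
        observed = counted.count(letter) / float(len(counted))
        # A letter missing from the expected table raises KeyError here.
        total += observed * math.log(observed / expected[letter], log_base)
    return total
```

A fully conserved DNA column scores highest against uniform expected
frequencies, a column with all four bases in equal proportion scores zero,
and a column of gaps is simply skipped rather than raising an error.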
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 12:18:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 12:18:28 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807111618.m6BGISQ3013553@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #25 from mdehoon at ims.u-tokyo.ac.jp 2008-07-11 12:18 EST -------
(In reply to comment #24)
OK, I updated Phd.py.
The last module to consider is Ace.py; I'll upload a fixed version soon.
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 17:00:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 17:00:10 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807112100.m6BL0Aer026629@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2542
------- Comment #4 from sbassi at gmail.com 2008-07-11 17:00 EST -------
(In reply to comment #1)
> First problem: you gave the Alignment object an Alphabet class, rather than an
> instance of the class. I guess we should add an explicit check to the Alignment
> object...
Yes, that is my fault.
> You mentioned having a similar issue with Bio.AlignIO - could you attach the
> example file to this bug with some trivial code showing your problem?
Yes, this code with Bio.AlignIO also fails (I tried right now with AlignInfo.py
rev. 1.17):
from Bio.Align import AlignInfo
from Bio.Align.AlignInfo import SummaryInfo
from Bio import AlignIO
fn = open("secu3.aln")
alignment = AlignIO.read(fn, "clustal")
summary = SummaryInfo(alignment)
print summary.information_content()
And I got (this time I am not supplying any alphabet, at least not
explicitly):
Traceback (most recent call last):
  File "/mnt/hda2/py252/bin/2542.py", line 12, in <module>
    print summary.information_content()
  File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Align/AlignInfo.py", line 499, in information_content
    raise ValueError, errstr
ValueError: Error in alphabet: not Nucleotide or Protein, supply expected frequencies
> P.S. Please update to Biopython 1.47 rather than using 1.46
I was using Biopython 1.47, but I reported it as 1.46 only because 1.47 is not
available in the drop-down menu of the Bugzilla form.
>
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 17:02:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 17:02:24 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807112102.m6BL2OvF026827@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2542
------- Comment #5 from sbassi at gmail.com 2008-07-11 17:02 EST -------
Created an attachment (id=971)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=971&action=view)
This file is used by my example where information_content() fails when the
sequences are retrieved with AlignIO
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 17:16:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 17:16:03 -0400
Subject: [Biopython-dev] [Bug 2443] Specifying the alphabet in Bio.SeqIO and
Bio.AlignIO
In-Reply-To:
Message-ID: <200807112116.m6BLG3SJ027522@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2443
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Specifying the alphabet in |Specifying the alphabet in
|Bio.SeqIO.parse() |Bio.SeqIO and Bio.AlignIO
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 17:16 EST -------
I'm broadening the scope of this enhancement bug to cover Bio.SeqIO and
Bio.AlignIO (both their read() and parse() functions).
See also alphabet issues raised on Bug 2542.
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 17:19:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 17:19:50 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807112119.m6BLJoTt027660@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2542
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 17:19 EST -------
> Yes, this code with Bio.AlignIO also fails (I tried right now with
> AlignInfo.py rev. 1.17):
>
> from Bio.Align import AlignInfo
> from Bio.Align.AlignInfo import SummaryInfo
> from Bio import AlignIO
> fn = open("secu3.aln")
> alignment = AlignIO.read(fn, "clustal")
> summary = SummaryInfo(alignment)
> print summary.information_content()
>
> And I got (this time I am not supplying any alphabet, at least not
> explicitly):
>
> Traceback (most recent call last):
> ...
> ValueError: Error in alphabet: not Nucleotide or Protein, supply expected
> frequencies
Good. That seems to be working as intended - alignment formats like FASTA or
Clustal do not specify the sequence type (unlike for example the Nexus format).
Perhaps Bio.AlignIO.read() and parse() should be able to accept an optional
alphabet argument? I had already been considering this for Bio.SeqIO so this
is a natural extension. See Bug 2443.
Unless information_content() can determine the sequence type (protein or
nucleotide) from the alignment alphabet, you have to help it by supplying an
appropriate e_freq_table argument.
Perhaps:
from Bio.Alphabet import IUPAC
from Bio.SubsMat import FreqTable
from Bio.Align.AlignInfo import SummaryInfo
from Bio import AlignIO
fn = open("secu3.aln")
alignment = AlignIO.read(fn, "clustal")
summary = SummaryInfo(alignment)
#Have a generic alphabet, without a declared gap char, so must
#provide the frequencies and chars to ignore explicitly:
expected = FreqTable.FreqTable({"A":0.25,"G":0.25,"T":0.25,"C":0.25},
FreqTable.FREQ, IUPAC.unambiguous_dna)
print summary.information_content(e_freq_table=expected,
chars_to_ignore=['-'])
This is probably safest. I'm doubtful that information_content() will choose
wisely if given mixed case or lower case sequences... if that is the case it
should be filed as a new bug.
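
The concern about mixed case can be seen with plain letter counting. The
sketch below is only an illustration of why case matters when tallying a
column; letter_counts() and its fold_case option are hypothetical and say
nothing about what information_content() actually does.

```python
from collections import Counter


def letter_counts(column, fold_case=False):
    """Count the letters in one alignment column (hypothetical helper)."""
    if fold_case:
        # Treat 'a' and 'A' as the same residue.
        column = column.upper()
    return Counter(column)


mixed = "AaAa"
as_given = letter_counts(mixed)
folded = letter_counts(mixed, fold_case=True)
```

Counted as given, the column looks like an even mix of two different
letters; after folding the case it is a single fully conserved letter, so
any frequency based score would differ between the two.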
>
> > P.S. Please update to Biopython 1.47 rather than using 1.46
>
> I was using Biopython 1.47, but I reported it as 1.46 only because 1.47
> is not available in the drop-down menu of the Bugzilla form.
Thanks for the reminder - I've added that to Bugzilla now :)
I'm marking this bug as fixed now (after the updates to AlignInfo.py)
From peter at maubp.freeserve.co.uk Sat Jul 12 09:45:46 2008
From: peter at maubp.freeserve.co.uk (Peter)
Date: Sat, 12 Jul 2008 14:45:46 +0100
Subject: [Biopython-dev] Deprecating the HTML parser in Bio.Blast.NCBIWWW
Message-ID: <320fb6e00807120645u26321d71q30f72ed5808f700@mail.gmail.com>
For some time now we've been discouraging the use of the HTML and
plain text Blast parsers in favour of the XML format.
I think it would be a good idea to now officially deprecate the HTML
parser in Bio.Blast.NCBIWWW with warning messages when it is used. I
don't even know if it still works with the recent big revision to the
BLAST webpages, but I suspect not.
However, the plain text parser in Bio.Blast.NCBIStandalone still has
its uses. In particular, right now the PSI-BLAST output in XML format
lacks some of the information found in the plain text output (new vs
reused entries) so it would be premature to deprecate our plain text
PSI parser. See Bug 2502 for details:
http://bugzilla.open-bio.org/show_bug.cgi?id=2502#c18
Peter
From bugzilla-daemon at portal.open-bio.org Sun Jul 13 12:23:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 13 Jul 2008 12:23:57 -0400
Subject: [Biopython-dev] [Bug 2543] New: Bio.Nexus.Trees can't handle named
ancestors
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2543
Summary: Bio.Nexus.Trees can't handle named ancestors
Product: Biopython
Version: 1.46
Platform: PC
OS/Version: FreeBSD
Status: NEW
Severity: normal
Priority: P2
Component: Other
AssignedTo: biopython-dev at biopython.org
ReportedBy: markd at soe.ucsc.edu
The following code produces:
ValueError: invalid literal for float(): Ancestor1
from Bio.Nexus import Trees
# from http://evolution.genetics.washington.edu/phylip/newicktree.html
treeStr = "(B:6.0,(A:5.0,C:3.0,E:4.0)Ancestor1:5.0,D:11.0);"
tree = Trees.Tree(treeStr)
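
For reference, in the Newick format a label may follow the closing
parenthesis of an internal node, just as it may name a leaf. The following
is a minimal sketch of a parser that accepts such labels. It is not the
Bio.Nexus.Trees code; the function names and the nested dict layout are
made up for the example, and quoted labels and comments are not handled.

```python
def parse_newick(text):
    """Parse a Newick string into nested dicts (hypothetical minimal parser).

    Each node is {"name": str or None, "length": float or None,
    "children": [nodes]}.
    """
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    node, pos = _parse_node(text, 0)
    if text[pos] != ";":
        raise ValueError("Unexpected text at position %d" % pos)
    return node


def _parse_node(text, pos):
    """Parse one node starting at text[pos]; return (node, next position)."""
    children = []
    if text[pos] == "(":
        # Internal node: a parenthesised, comma separated list of subtrees.
        while True:
            child, pos = _parse_node(text, pos + 1)  # step over "(" or ","
            children.append(child)
            if text[pos] == ")":
                pos += 1
                break
            if text[pos] != ",":
                raise ValueError("Expected ',' or ')' at position %d" % pos)
    # An optional label follows, for leaves and for internal nodes alike.
    start = pos
    while text[pos] not in ":,();":
        pos += 1
    name = text[start:pos] or None
    # An optional branch length follows the label after a colon.
    length = None
    if text[pos] == ":":
        start = pos + 1
        pos = start
        while text[pos] not in ",();":
            pos += 1
        length = float(text[start:pos])
    return {"name": name, "length": length, "children": children}, pos


tree = parse_newick("(B:6.0,(A:5.0,C:3.0,E:4.0)Ancestor1:5.0,D:11.0);")
```

The point of the sketch is that the text after ")" is read as a label first
and only the part after ":" is converted with float(), which avoids the
"invalid literal for float(): Ancestor1" failure.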
From bugzilla-daemon at portal.open-bio.org Mon Jul 14 06:17:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 14 Jul 2008 06:17:14 -0400
Subject: [Biopython-dev] [Bug 2543] Bio.Nexus.Trees can't handle named
ancestors
In-Reply-To:
Message-ID: <200807141017.m6EAHEhg019686@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2543
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-14 06:17 EST -------
This sounds like a job for Frank (the Bio.Nexus module author).
Can I ask if you've actually come across trees with named ancestor nodes in
"real life"? That would make this bug more important. If so, the name of the
tool would be interesting, and an example tree file would be great to add to
Biopython as a test case.
If on the other hand the only named ancestor tree you've ever tried is the
example from the Newick documentation, this doesn't seem such a high priority
(but still worth fixing).
Peter
From bugzilla-daemon at portal.open-bio.org Tue Jul 15 16:07:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 Jul 2008 16:07:56 -0400
Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string,
even subclass string?
In-Reply-To:
Message-ID: <200807152007.m6FK7umn009526@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2351
------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-15 16:07 EST -------
This is a suggested implementation of the split method for our Seq object,
modelled after the python string method which it calls internally. Note that I
have made the separator non-optional on the grounds that the string method's
default of white space isn't (usually) sensible for sequences. I'm happy to
change this if people think it's better to be as close as possible to the string
method.
def split(self, sep, maxsplit=None) :
    """Split method, like that of a python string.

    Return a list of the 'words' in the string (as Seq objects),
    using sep as the delimiter string. If maxsplit is given, at
    most maxsplit splits are done.

    Unlike the python string method, sep must be specified (as
    there shouldn't be any whitespace strings in a sequence).

    e.g. print my_seq.split("-")
    """
    # Compare against None so that maxsplit=0 means "no splits",
    # as it does for the python string method.
    if maxsplit is not None :
        parts = self.data.split(sep, maxsplit)
    else :
        parts = self.data.split(sep)
    return [Seq(chunk, self.alphabet) for chunk in parts]
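
The proposal can be tried out without Biopython using a stand-in class. The
Seq class below is a hypothetical stub holding only the data and alphabet
attributes that split() needs, and the alphabet is a plain string for
brevity. This sketch tests maxsplit against None so that an explicit 0 means
"no splits", as with the python string method.

```python
class Seq(object):
    """Stand-in for Bio.Seq.Seq with just enough for split()."""

    def __init__(self, data, alphabet=None):
        self.data = data
        self.alphabet = alphabet

    def split(self, sep, maxsplit=None):
        # sep is required; there is no whitespace default as in str.split.
        if maxsplit is None:
            parts = self.data.split(sep)
        else:
            parts = self.data.split(sep, maxsplit)
        # Each piece keeps the alphabet of the original sequence.
        return [Seq(chunk, self.alphabet) for chunk in parts]

    def __repr__(self):
        return "Seq(%r, %r)" % (self.data, self.alphabet)


gapped = Seq("MH--IFIYQ-NW", "protein")
pieces = [piece.data for piece in gapped.split("-")]
first_only = [piece.data for piece in gapped.split("-", 1)]
```

Note that, as with strings, two adjacent separators produce an empty piece,
so callers splitting on a gap character may want to filter out empty Seq
objects afterwards.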
From bugzilla-daemon at portal.open-bio.org Wed Jul 16 05:39:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 16 Jul 2008 05:39:01 -0400
Subject: [Biopython-dev] [Bug 2544] New: Bio.SeqIO improvements
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2544
Summary: Bio.SeqIO improvements
Product: Biopython
Version: 1.47
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: mmokrejs at ribosome.natur.cuni.cz
$ python
Python 2.5.2 (r252:60911, Jul 2 2008, 22:55:24)
[GCC 4.3.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import SeqIO
>>> handle = open("genbank-synthetic.gb")
>>> print seq_record
ID: EF452680.2
Name: EF452680
Description: Synthetic construct nitric oxide synthase (NOS) gene, partial cds.
/comment=On Feb 4, 2008 this sequence version replaced gi:145391444.
/sequence_version=2
/source=synthetic construct
/taxonomy=['other sequences', 'artificial sequences']
/keywords=['']
/references=[,
, , ]
/accessions=['EF452680']
/data_file_division=SYN
/date=11-JUN-2008
/organism=synthetic construct
/gi=166831528
Seq('TAGGCCTCTGCTTGCCGTTTGTTTCGTCAGCGATTTTTATAGTCTCAGCCTCCT...GCC',
IUPACAmbiguousDNA())
>>>
I do not see how I could access the value 'DNA' from the LOCUS line:
LOCUS EF452680 260 bp DNA linear SYN 11-JUN-2008
No, I do not want to read seq_record.features[0].qualifiers['mol_type'][0].
Could seq_record.features have a repr() function to give me something useful
instead of this?
>>> print seq_record.features
[, , ]
>>>
I don't see anywhere in the biopython docs how to access the features.
Pasting something like the following into the docs would give a user a clue
where to look for values:
>>> print seq_record.features[0].qualifiers
{'db_xref': ['taxon:32630'], 'mol_type': ['other DNA'], 'organism': ['synthetic
construct'], 'chromosome': ['Ib'], 'PCR_primers': ['fwd_seq:
aggcctctgcttgccgtttgtttcg, rev_seq: cgccggcggcacacgctcaactaattac']}
>>> print seq_record.features[1].qualifiers
{'gene': ['NOS']}
>>> print seq_record.features[2].qualifiers
{'product': ['nitric oxide synthase'], 'codon_start': ['2'], 'EC_number':
['1.14.13.39'], 'transl_table': ['11'], 'note': ['derived from Toxoplasma
gondii'], 'db_xref': ['GI:166831529'], 'translation':
['RPLLAVCFVSDFYSLSLLHFASVPFHESDGCVGRSHWLPGKHANYVKPAGARKRPEVGCRSSCLLRSVCCDILSPVRTRGN'],
'gene': ['NOS'], 'protein_id': ['ABP65329.2']}
>>> print seq_record.features[3].qualifiers
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>>
I wonder if I could access the above dicts as seq_record.features['source']
or seq_record.features['CDS']. Where have the 'source', 'gene' and 'CDS' keys gone?
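
For what it is worth, the feature key is kept on each feature object in its
type attribute, so a lookup by key can be built in a few lines. The sketch
below uses a stand-in Feature tuple rather than the real
Bio.SeqFeature.SeqFeature class so that it runs on its own;
features_by_type() is a hypothetical helper, not part of Biopython.

```python
from collections import defaultdict, namedtuple

# Stand-in for Bio.SeqFeature.SeqFeature, reduced to the two attributes
# used in this example.
Feature = namedtuple("Feature", ["type", "qualifiers"])


def features_by_type(features):
    """Group features into a dict keyed by feature type (hypothetical helper)."""
    grouped = defaultdict(list)
    for feature in features:
        # A record can hold many features of one type, hence the lists.
        grouped[feature.type].append(feature)
    return dict(grouped)


features = [
    Feature("source", {"mol_type": ["other DNA"],
                       "organism": ["synthetic construct"]}),
    Feature("gene", {"gene": ["NOS"]}),
    Feature("CDS", {"gene": ["NOS"], "product": ["nitric oxide synthase"]}),
]
by_type = features_by_type(features)
```

With a real record the same function could be called on
seq_record.features, after which by_type['CDS'][0].qualifiers gives the
qualifiers of the first CDS feature.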
From bugzilla-daemon at portal.open-bio.org Wed Jul 16 06:30:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 16 Jul 2008 06:30:21 -0400
Subject: [Biopython-dev] [Bug 2544] Bio.GenBank and SeqFeature improvements
In-Reply-To:
Message-ID: <200807161030.m6GAUL9x017920@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2544
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Bio.SeqIO improvements |Bio.GenBank and SeqFeature
| |improvements
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-16 06:30 EST -------
(In reply to comment #0)
> $ python
>
> Python 2.5.2 (r252:60911, Jul 2 2008, 22:55:24)
> [GCC 4.3.1] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from Bio import SeqIO
> >>> handle = open("genbank-synthetic.gb")
I'm guessing the missing line here was something like:
seq_record = SeqIO.read(handle, "genbank")
> >>> print seq_record
> ID: EF452680.2
> Name: EF452680
> Description: Synthetic construct nitric oxide synthase (NOS) gene, partial cds.
> /comment=On Feb 4, 2008 this sequence version replaced gi:145391444.
> /sequence_version=2
> /source=synthetic construct
> /taxonomy=['other sequences', 'artificial sequences']
> /keywords=['']
> /references=[,
> , instance at 0x834ceac>, ]
> /accessions=['EF452680']
> /data_file_division=SYN
> /date=11-JUN-2008
> /organism=synthetic construct
> /gi=166831528
> Seq('TAGGCCTCTGCTTGCCGTTTGTTTCGTCAGCGATTTTTATAGTCTCAGCCTCCT...GCC',
> IUPACAmbiguousDNA())
> >>>
>
>
> I do not see how I could access the value 'DNA' from the LOCUS line:
> LOCUS EF452680 260 bp DNA linear SYN 11-JUN-2008
Currently the sequence type (DNA, RNA, Protein) is used internally by the
GenBank parser to determine the alphabet. It is not currently recorded in the
SeqRecord object's annotation but could be. How about something like this?:
seq_record.annotations["seq_type"]
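
As a rough illustration of where such a value would come from, the molecule
type can be picked out of the LOCUS line. The helper below is hypothetical
and is not how the GenBank parser works; it assumes whitespace separated
fields with the molecule type directly after "bp", and the "seq_type" key is
only the name proposed above.

```python
def locus_molecule_type(locus_line):
    """Return the molecule type from a LOCUS line, or None (hypothetical helper)."""
    fields = locus_line.split()
    if not fields or fields[0] != "LOCUS":
        raise ValueError("Not a LOCUS line: %r" % locus_line)
    if "bp" not in fields:
        # For example a protein record, where the length is given in "aa"
        # and no molecule type is listed.
        return None
    index = fields.index("bp") + 1
    return fields[index] if index < len(fields) else None


line = "LOCUS EF452680 260 bp DNA linear SYN 11-JUN-2008"
annotations = {"seq_type": locus_molecule_type(line)}
```

A real implementation would need to cope with older column based LOCUS
lines and with strand prefixes on the molecule type, which this sketch
ignores.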
> No, I do not want to read seq_record.features[0].qualifiers['mol_type'][0].
Assuming that the first feature is the source (typically the case), and
assuming it has a specified molecule type, then your suggestion is one
workaround. But I agree, it's not nice.
> Could seq_record.features have a repr() function to give me something useful
> instead of this?
>
> >>> print seq_record.features
> [, instance at 0x837b9cc>, ]
Yes we could add that, but you wouldn't want to do that on a typical genome
with thousands of features. Adding a repr method for the Reference object is
also something I had wondered about doing.
> I don't see documented anywhere in the biopython docs access the features,
> pasting something like the following into docs would give